Having seen the structure of the cell, we are now ready to see how DNA, several forms of RNA and ribosomes work to synthesize proteins. DNA furnishes the recipe not only for reproduction, but also for daily cell growth and maintenance. All this depends to a very great degree on gene regulation, which we will consider shortly.
The so-called central dogma of molecular biology is often paraphrased as “DNA makes RNA and RNA makes protein.” Put differently, DNA is transcribed inside the nucleus to make mRNA, which is expelled from the nucleus to the cytoplasm where it is translated to protein by ribosomes.
DNA → transcription (nucleus) → mRNA → translation (ribosome) → protein.
The recipe is expressed in “bytes” of three nucleobases; one three-base byte is referred to as a code-word. When transcribed to its complementary form in mRNA, it is called a codon.
The enzyme which does the work of “reading” a gene on the DNA and building a corresponding strand of RNA is called RNA polymerase.[1] The RNA it constructs is called mRNA, for “messenger RNA”. The DNA recipe begins with a sequence called the promoter. RNA polymerase, with the help of accessory proteins, recognizes the promoter, binds to it and launches transcription.[2] It then unwinds a part of the DNA chain and reads code-words, starting just after the promoter. As it reads the DNA, it constructs a complementary chain, called pre-mRNA, from nucleotides.
The nucleotides used, called nucleotide-triphosphates (NTPs), carry two additional phosphate groups whose bonds store a significant amount of energy, as in ATP. This energy is used to bond the nucleotides together to form the RNA. Protein synthesis is one of the most costly of cell processes in terms of energy consumption. Much of this energy is used to make enzymes essential to the functioning of the cell.
The RNA polymerase moves along the DNA, unwinding sections as it goes, reads the code words and assembles the appropriate pre-mRNA codons from NTPs. The separated DNA strands recombine in its wake. Eventually, it reaches a transcription-terminator sequence in the DNA and ends transcription. It now has gone through three steps, known as initiation, elongation (of the produced pre-mRNA) and termination. The pre-mRNA then is released into the nucleoplasm.
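The pairing rule behind transcription can be sketched in a few lines of Python. This is only an illustration: the function names `transcribe` and `codons` and the twelve-base template are made up for the example, and the template strand is assumed to be written so that the RNA reads left to right.

```python
# Base pairing from a DNA template base to the RNA base built opposite it.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Return the RNA strand complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


def codons(rna: str) -> list[str]:
    """Cut an RNA strand into three-base codons, dropping any incomplete tail."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical template made of four code-words: TAC GGT CAT ATT
template = "TACGGTCATATT"
pre_mrna = transcribe(template)
print(pre_mrna)          # AUGCCAGUAUAA
print(codons(pre_mrna))  # ['AUG', 'CCA', 'GUA', 'UAA']
```

Each three-base code-word on the DNA yields exactly one codon on the RNA, which is why the two are read in the same frame.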
Before leaving the nucleus, the pre-mRNA must be cleaned up. This is needed because genes contain non-coding sequences, sometimes called junk. The sequences which should be kept are called exons (like “expressed”) and those which should be deleted are called introns (like “intervening”). Small particles called “snurps” (for snRNPs, or small nuclear ribonucleoproteins), made up of RNA and proteins, bind together to form spliceosomes, which remove introns and splice the exons back together again, resulting in a cleaned-up form of mRNA.[3]
The mRNA is then moved out of the nucleus for the next step.
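The clean-up step can be pictured as cutting out known stretches and joining what is left. The sketch below assumes the intron positions are already known and handed in as index ranges. A real spliceosome has to find them itself, so this shows only the cut-and-join part. The function name `splice` and the sequences are hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as (start, end) index ranges with end excluded,
    and join the remaining exons in their original order."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Hypothetical pre-mRNA: exon, intron, exon
exon_1, intron, exon_2 = "AUGCCA", "GUAAGUCCAG", "GUAUAA"
pre_mrna = exon_1 + intron + exon_2

mrna = splice(pre_mrna, [(len(exon_1), len(exon_1) + len(intron))])
print(mrna)  # AUGCCAGUAUAA
```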
Protein synthesis – translation
After the mRNA leaves the nucleus, it is used to provide the input data for the synthesis of proteins. This takes place on ribosomes.
There are two sorts of ribosomes in eukaryotic cells, depending on their location.
- Free ribosomes float in the cytoplasm and make proteins which will function there.
- Membrane-bound ribosomes are attached to the rough endoplasmic reticulum, which is why it is called “rough”. Proteins produced there will either form parts of membranes or be released from the cell.
In most cells, most proteins are released into the cytoplasm.
Ribosomes are made of ribosomal RNA, or rRNA (one more kind of RNA), and proteins. They are constructed within the nucleolus as two subunits, which are released through the nuclear pores into the cytoplasm.
In addition to the mRNA and the ribosome subunits, a method is needed for supplying the appropriate amino acids to be linked by peptide bonds to make up the protein or enzyme being constructed. Enter still one more kind of RNA, transfer RNA, or tRNA.
A molecule of tRNA has on one end a binding site (adenylic acid) for a specific amino acid and, on the other, an anticodon, a sequence of three bases complementary to a codon on mRNA. It is first loaded with its amino acid and then carries it into the ribosome, where its anticodon lines up with the matching mRNA codon. Such tRNA molecules, carrying an amino acid, are called aminoacyl-tRNA.
The ribosome itself possesses three sites for tRNA:
- the A-site, where each incoming aminoacyl-tRNA enters;
- the P-site, which holds the tRNA carrying the growing peptide chain and where the current amino acid is added to the chain;
- the E-site, from which the emptied tRNA exits the ribosome.
Initially, the ribosome subunits are floating independently in the cytoplasm or attached to the RER.
To begin translation, the small ribosome subunit binds to the first ribosomal binding site on mRNA. Then the tRNA carrying methionine, the amino acid indicated by the mRNA START codon, binds its anticodon to the START codon. The large ribosome subunit binds to the small, so the ribosome is now complete with the first tRNA in the P-site and the second codon of the mRNA in the A-site. Next, tRNA for the codon in the A-site is carried in, so the first two amino acids are now in the P and A sites. The ribosome then catalyzes the formation of a peptide bond between these two amino acids. The ribosome and the mRNA move relative to each other so the first tRNA, now without its amino acid, moves to the E-site and leaves, the second tRNA, carrying the growing chain, moves to the P-site, and a new codon, the third, enters the A-site. It continues like that until a STOP codon enters the A-site and causes release of the completed peptide chain.
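The reading loop described above can be sketched as follows. This is a deliberately simplified illustration: it assumes translation starts at the first AUG found in the strand, ignores tRNA and the ribosome sites entirely, and writes amino acids as their one-letter abbreviations. The function name `translate` and the sample strand are invented. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in the usual U, C, A, G order for each of the
# three codon positions. "*" marks the three STOP codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read codons from the first START codon (AUG) until a STOP codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no START codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # STOP codon: release the chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical strand: two leading bases, then AUG CCA GUA UAA, then a tail.
print(translate("GGAUGCCAGUAUAAGC"))  # MPV (methionine, proline, valine)
```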
Once part of a strand of mRNA has left one ribosome, it can enter another. One strand may be in 3 to 10 ribosomes at once, in a different step of translation in each one. Such clusters of ribosomes translating the same mRNA strand are called polyribosomes.
Regulation of gene expression
Every cell in an organism has the same complete genome in its nucleus and so has access to all the same protein “recipes”. But heart cells should not produce proteins used only by the liver and no cell should produce proteins in quantities beyond what it can use. Controlling which proteins to express and when is called regulation. Note that this is one more instance of communication in the body, telling genetic machinery when and what to express.
Regulation of prokaryotic cells
Regulation in prokaryotic cells is relatively simple, as there is no nucleus, so transcription and translation take place almost in the same place and at the same time. Regulation in prokaryotic cells, though, almost always concerns transcription.
An example from a prokaryotic cell will show how this works – and introduce some new terminology.
The bacterium E. coli normally uses glucose for energy. But if glucose is absent and lactose is present, it can use the latter sugar. The proteins necessary for the use of lactose are coded by a sequence of genes called the lac operon. The operon includes not only the necessary genes, but, at the beginning, a promoter which indicates the beginning of the operon and is the site where RNA polymerase binds to begin the transcription. In between the promoter and the set of genes, of which there may be any number, is a sequence called the operator, which is where DNA-binding proteins bind to regulate transcription.[4]
When no lactose is present, a protein called the lac repressor is bound to the operator and the state of the lac operon is “off”. (See following figure.) The gene for the lac repressor is a constitutive gene: It is always expressed because it is the recipe for an essential protein. On the other hand, a regulated gene is expressed selectively. The lac repressor has a second, allosteric, binding site. When lactose is present, an isomer of lactose binds to the allosteric site of the repressor, which causes it to change its form and unbind from the operator. The lac operon is now in the “on” state. This form of regulation is called induction: Lactose is said to be the inducer of the lac operon and acts through the allosteric site of the lac repressor. Transcription now occurs, but slowly, because some glucose still may be present. But are the proteins coded by the lactose-digesting genes really needed? In other words, is glucose really lacking?
The answer to that question and the second part of this process depends on the presence of glucose and is regulated by a second DNA-binding protein, CAP (catabolite activator protein). CAP is also an allosteric protein with one DNA-binding site and one allosteric site which binds to cyclic AMP (cAMP). CAP is only active when it is bound to cAMP. You guessed it, cAMP levels are high when glucose levels are low.[5] In that case, cAMP-CAP binds to the promoter and enhances transcription of the genes. So lactose can be considered the “on-off” switch for transcription of genes for lactose-digestion and cAMP-CAP, the “volume control”.
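The “on-off switch” and “volume control” can be written out as a small truth table. This is a minimal sketch which reduces everything to yes/no values, whereas real cells respond to concentrations. The function name and the three output labels are made up for the illustration.

```python
def lac_operon_transcription(lactose_present: bool, glucose_present: bool) -> str:
    """Return a rough transcription level for the lac operon: off, low or high."""
    # On-off switch: without lactose, the repressor stays on the operator.
    repressor_bound = not lactose_present
    # Volume control: cAMP is high only when glucose is low, and CAP is
    # active only when bound to cAMP.
    cap_active = not glucose_present

    if repressor_bound:
        return "off"
    return "high" if cap_active else "low"


for lactose in (False, True):
    for glucose in (False, True):
        level = lac_operon_transcription(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {level}")
```

Only the combination of lactose present and glucose absent gives strong transcription, which matches the description above.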
Regulation of eukaryotic cells
In eukaryotic cells, transcription occurs inside the nucleus and translation outside, so mRNA has to be shuttled across the nuclear membrane in between the two processes. Regulation in eukaryotic cells therefore may take place inside or outside the nucleus or at any step in the expression pathway, including control of access to the gene in the DNA, control of transcription, pre-mRNA processing, mRNA lifetime, translation and even final-protein modification.
Inside the nucleus, the DNA wound around histones to make nucleosomes can be wound more tightly or more loosely to change the spacing of the nucleosomes and thereby allow or deny access to genes. This process is called epigenetic regulation.[6] Since histones are positively charged and DNA, negatively, modifying the charge by adding chemical “tags” to either modifies the configuration of the DNA.
Transcription in eukaryotic cells is controlled by transcription factors (TFs), regulatory proteins which may bind to a number of different sites on the DNA to either promote or inhibit transcription. RNA polymerase can only bind to a transcription-factor complex in eukaryotic cells, so transcription factors are essential for transcription. Such sites exist near the promoter as well as at points far away from it. The promoter is recognized by its beginning, the TATA box, which contains the seven-nucleotide sequence TATAAAA. Selective promotion or inhibition at combinations of these sites can bring about tissue-specific gene expression. General transcription factors affect any gene in all cells; regulatory transcription factors affect specific genes according to the cell. In addition, environmental changes may bring about different gene expression according to current, perhaps temporary needs.
The two types of transcription factors work together with three types of regulatory sequences to regulate transcription in eukaryotic cells.
- promoter proximal elements are, of course, near the promoter and turn transcription on;
- enhancers are far from the regulated genes and also turn the transcription on;
- silencers (or repressors) are also far away from the regulated genes and turn transcription off.
In between transcription and translation, proteins may interfere with spliceosomes to modify splicing of pre-mRNA. Thus, different selections of exons can allow different mRNAs to be produced from the same pre-mRNA, a phenomenon known as alternative splicing, as already mentioned.
Yet another type of RNA, very short-stranded microRNA or miRNA, can bind with complementary mRNA before it is translated and promote repression of its translation. miRNA associates with RISC (RNA-induced silencing complex) to degrade mRNA.
Other proteins, RNA-binding proteins (RBPs), can bind with the 5′ cap or the 3′ tail of the mRNA and either increase or decrease its stability.
Phosphorylation or attachment of other chemicals to the protein initiation complex on the mRNA can also inhibit translation. Similar bindings may take place on the protein after translation and modify its stability, lifetime or function. RNA does not hang around forever, but eventually is degraded and is no longer functional. So controlling its lifetime is another way of regulating its activity.
The development genetic toolkit – what evo devo tells us
Recall that, in prokaryotes, relatively simple operators are bound by DNA-binding transcription factors in order to activate, deactivate or induce the expression of a set of genes. As we have only begun to see, the regulation of eukaryote genes is far more complex than that. But there is more.
There are a small number[9] of families of genes which are the same or similar in various eukaryotic organisms, including all animals. These common gene families are said to constitute a genetic development toolkit, or simply genetic toolkit. Rather than expressing[10] protein components of the organism, these genes express transcription factors and signaling elements which regulate the expression of other genes having to do with the body parts or morphology of eukaryotic organisms.[11] Since the toolkit is the same (or almost) across different species, it may explain development far more simply than if all genes had to be specific to each different part of an organism.
At its simplest, two steps are involved:
- A first level of non-coding DNA displays a sequence, called a signature, which is bound by transcription-factor proteins. The TF binding may cause another part of the DNA to express, for instance, certain members of a family of toolkit genes in a specific region of an organism.
- Secondly, another non-coding DNA sequence displays signatures to which proteins of the expressed toolkit domain bind in order to control expression of other genes.
Of course, the results of step 2 can enter back into another step 1 and so on.
This is like a series of switches, which is a term used to describe the non-coding sequences which regulate coding sequences elsewhere on the DNA strand.[12] It is important to distinguish between DNA sequences, which may be non-coding switches or actual genes, and expressed proteins, which may be transcription factors, signals or somatic proteins. A number of TFs bind to a switch, turning it on or off, and the switch in turn activates or suppresses expression by genes farther down the line on the DNA strand, as in the preceding figure.
In general, there is a network of switches, with one level producing TFs which regulate the next, which then regulates the next and so on. Switches may control different development regions in different tissues and this at different stages of development.
The set of transcription factors binding to a switch may be composed of activators (regulation “on”) or repressors (regulation “off”) and it is the combinatorial logical result of the network of these factors which allows fine tuning of regulation. One switch may be activated by TFs activated by numerous other switches and may in turn participate in activating still other switches.
Switches are the means by which a relatively limited set of toolkit genes may be used differently in different regions, or even different animals, at different times in embryonic development – or evolution.
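The idea of a network of switches can be made concrete with a toy model. The sketch below rests on strong assumptions which are not claims about real cells: a switch is “on” only when all of its activators and none of its repressors are present, each switch controls a single product, and a product once made is never removed. All names (`Switch`, `express`, `tf_a` and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Switch:
    """A non-coding switch controlling the expression of one product."""
    target: str                        # product made when the switch is on
    activators: frozenset              # TFs that must all be bound
    repressors: frozenset = frozenset()  # TFs that must all be absent

    def is_on(self, present: set) -> bool:
        all_activators_bound = self.activators <= present
        no_repressor_bound = not (self.repressors & present)
        return all_activators_bound and no_repressor_bound


def express(switches: list, initial_tfs: set) -> set:
    """Apply the switches repeatedly until no new product appears."""
    present = set(initial_tfs)
    changed = True
    while changed:
        changed = False
        for switch in switches:
            if switch.target not in present and switch.is_on(present):
                present.add(switch.target)  # the product may feed later switches
                changed = True
    return present


switches = [
    # Step 1: two TFs together turn on a toolkit gene unless a repressor is there.
    Switch("toolkit_tf", frozenset({"tf_a", "tf_b"}), frozenset({"tf_r"})),
    # Step 2: the toolkit protein turns on a gene for a body part.
    Switch("wing_protein", frozenset({"toolkit_tf"})),
]

print(sorted(express(switches, {"tf_a", "tf_b"})))
# ['tf_a', 'tf_b', 'toolkit_tf', 'wing_protein']
print(sorted(express(switches, {"tf_a", "tf_b", "tf_r"})))
# ['tf_a', 'tf_b', 'tf_r']
```

The same two switches give different results in cells which start with different transcription factors, which is the point of the combinatorial logic.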
Some terminology is required to understand the evo-devo literature.
- Transcription factors are proteins and so are not on the DNA string, therefore not on the same molecule as the DNA which is regulated. They therefore are called trans-acting regulatory elements (TRE).[13]
- Switches are on the same string of DNA as the regulated gene and are called cis-acting regulatory elements (CRE).
So one can say that TREs bind to CREs to regulate gene expression.
The first toolkit family to be understood was a set of genes sharing a sequence which codes for about 60 amino acids. These genes change the morphology and function of serial body parts, as in flies. If tampered with, these genes can, for instance, make a leg grow where an antenna should be. Since the proteins change the cells they modulate into something else, they are called homeotic[14] transcription factors and their genes are homeotic genes. The protein domain they express is therefore a homeotic domain, the shared DNA sequence which codes for it is called a homeobox, and the genes of this family are called Hox genes. Their specific action differs spatially on different body parts. Hox genes occur in clusters which are in the same order on the chromosomes as that of the body regions they regulate. They indicate the position on the developing organism, which then knows whether to develop, say, a wing or an antenna at that position.
- A homeobox is a DNA sequence, found within genes, which encodes a protein domain that binds DNA, so that the proteins containing it act as transcription factors for other genes. Homeoboxes are found, for instance, in fungi, plants, fruit flies, mice, frogs, cows and humans.
- A homeodomain is the protein domain encoded by a homeobox.[15] Homeodomains can be seen as the building blocks of development and evolution.
- Hox genes are a subgroup of homeobox genes. The homeobox itself consists of about 180 base pairs, corresponding to ~60 amino acids. Homeobox genes occur in similar forms within genes for morphogenesis in animals, fungi and plants. The objects transcribed are regulated by a common homeobox, but correspond in their nature to the specific species.[16]
The following table lists just a few of the homeobox families and the organism components they regulate.
|Protein name||Phenotype regulated|
|Hox||body regions (e.g., head, thorax or abdomen)|
|Sonic hedgehog (Shh)||organogenesis (tissue patterns)|
|Ultrabithorax (Ubx)||represses insect wing formation|
Since they were discovered in studying the embryonic development (ontogeny) of organisms, this branch of biology is called evolutionary developmental biology, or “evo devo”. The collection of all such families of genes is called the evolutionary-genetic toolbox, since it regulates similar gene expression in many organisms. There are two dozen or so different families of homeoboxes.
In all animals, there exist similar gene sequences corresponding to protein domains which are transcription factors for that animal’s version of some phenotype. A Pax-6 gene from a mouse makes an eye form in a fruit fly – a fruit-fly eye, not a mouse eye.[17]
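The Pax-6 result can be pictured with the computing analogy of a shared library routine which hands control to a species-specific routine. The sketch below is only that analogy in code, not a model of the biology. The names `EYE_ROUTINES` and `pax6` and the descriptive strings are invented.

```python
# Species-specific "libraries": each organism has its own routine for an eye.
EYE_ROUTINES = {
    "mouse": lambda: "mouse eye",
    "fruit fly": lambda: "fruit-fly eye",
}


def pax6(source_species: str, host_species: str) -> str:
    """Shared toolkit routine: says "make an eye here" and passes control
    to the eye routine of whichever organism it is running in.

    source_species is deliberately unused: in this analogy the routine is
    the same whichever organism it was copied from.
    """
    return EYE_ROUTINES[host_species]()


# A mouse Pax-6 gene placed in a fruit fly calls the fly's own eye routine.
print(pax6(source_species="mouse", host_species="fruit fly"))  # fruit-fly eye
```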
Since all animals possess homeobox genes, their common ancestors must have, too. Therefore, toolkit genes must have been in existence for at least 500 million years, which means they enter into the explanation not only of embryonic development, but of evolution.
Stem cells are those which may divide and form any kind of cell.[18] But once a specific type of cell is made, it can only do certain things. This is because, through gene regulation, it no longer has access to the entire recipe book (genome), but only to those recipes which it needs to fulfill its particular function. The cell is then said to be differentiated and the process for making it is differential gene expression. Such regulation or differentiation depends on the cell’s environment. We have seen an example where the presence of lactose induces the expression of the lac operon.
Gene regulation can fill a book. And cell differentiation can fill another.
Continue with cell division and the cell cycle.
Notes
1. In fact, there are several forms of RNA polymerase, but that complexity is well beyond the scope of this document.
2. Transcription is started only if it is allowed by gene regulation, which we will consider shortly.
3. Are you wondering how the snurps can recognize the introns and exons? So am I. All I can say is that it is quite complicated and depends on short marker sequences at the intron boundaries. It is currently not understood why there are introns at all.
4. Look out for the terminology: Sean B. Carroll refers to the operator as a genetic switch, a term we will meet with in the discussion of regulation in eukaryotes.
5. If we go one step back, we see that glucose binds to an allosteric site on the enzyme adenylate cyclase, which makes cAMP from ATP, and disables it. So glucose binds to adenylate cyclase, which makes cAMP, which binds to CAP, which binds to the promoter to promote transcription of the genes. “Head bone’s connected to the neck bone…”
6. Look out, epigenetic regulation is used for somewhat different notions, too, and they are not all necessarily so.
7. Author’s own work
8. The bobby-pin curl is an idealization;
9. A couple dozen or more.
10. Remember, “express” means “code for”.
11. I.e., they allow the binding of RNA polymerase (in fact, RNA polymerase II) to produce mRNA which is exported from the nucleus to ribosomes for making proteins.
12. Switches and signatures are terms used by Sean B. Carroll.
13. In Latin, “cis” means “this side of” and “trans” means “the other side of”. Think of cis-Alpine (this side of the Alps) and trans-Alpine.
14. Homeosis is the transformation of one organ into another.
15. A domain is a conserved part of a protein sequence which can exist independently of the rest of the protein chain.
16. Some authors use the term homeobox only for Hox genes.
17. I see this nicely in an analogy with a computing program. A program to construct an organism will contain a higher-level library of homeobox routines for making, say, eyes or legs. Another library will contain the specific routines for that organism. A mouse organism will pass control at a specific place to a Pax-6 gene which will pass control to the appropriate eye routine.
18. Usually to make an identical stem cell and another cell, which may or may not be a stem cell.