CHAPTER TWO
GENE EXPRESSION
Genetic information is transferred at three levels: replication, transcription and translation.
Together, these three processes make up the Central Dogma of Molecular Biology.
Transcription and translation are the two major processes that make up gene expression.
Transcription is the rewriting of genetic information from DNA into RNA. Translation converts
the genetic information in the RNA molecule into the primary structure of a protein, that is,
the sequence of amino acids in the polypeptide chain.
TRANSCRIPTION
Transcription is the conversion of the information in DNA into an mRNA sequence. It is the first
step of gene expression. In eukaryotes, the resulting mRNA is then transported from the nucleus
into the cytoplasm (to the ribosome).
During transcription only a short part of one of the two DNA strands (the template strand) is
transcribed.
The transcription of the genetic information from the gene into the RNA molecule is catalyzed by
the enzyme RNA polymerase, whose full name is DNA-dependent RNA polymerase.
It adds nucleotides according to the rules of complementarity and joins them by
phosphodiester bonds. RNA synthesis proceeds in the 5´-3´ direction, so the template DNA
strand is read in the 3´-5´ direction. In prokaryotes, transcription produces a mature mRNA
directly, while in eukaryotes it produces a primary transcript (pre-mRNA).
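The complementarity rule described above can be sketched in a few lines of Python. This is only an illustration: the function name transcribe and the short template sequence are made up for the example, and the template is assumed to be written in the 3´-5´ direction so that the RNA comes out in the 5´-3´ direction.

```python
# Base added to the RNA opposite each base of the DNA template strand.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') for a DNA template written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5.upper())

template = "TACGGATTC"  # hypothetical template strand, written 3'->5'
print(transcribe(template))  # AUGCCUAAG
```

Note that uracil (U), not thymine, is placed opposite adenine in the RNA.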
The transcription unit of prokaryotes, which contains one or more structural genes, includes a
leader sequence located immediately after the promoter. Part of it is the Shine-Dalgarno
sequence, which is transcribed into the mRNA as 5´-AGGA-3´. Through this sequence the mRNA
binds to the 16S rRNA in the 30S subunit of the ribosome, which enables the formation of the
mRNA-ribosome complex. In prokaryotic cells the production of all forms of RNA is catalyzed
by a single type of RNA polymerase (465 kDa). The transcription process is divided into three
steps: initiation, elongation and termination.
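As a rough illustration of the Shine-Dalgarno sequence mentioned above, the sketch below searches a prokaryotic mRNA for the AGGA motif upstream of a start codon. The function name, the example sequence and the simplification that the first AUG is the start codon are all assumptions made for this example.

```python
def find_shine_dalgarno(mrna: str, motif: str = "AGGA"):
    """Return (motif position, spacer length to the first AUG), or None.

    Simplifying assumption: the first AUG in the sequence is the start codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Look for the last occurrence of the motif that lies fully before the AUG.
    position = mrna.rfind(motif, 0, start)
    if position == -1:
        return None
    spacer = start - (position + len(motif))
    return position, spacer

mrna = "UUCAGGAGGUAACAUAUGGCU"  # hypothetical leader sequence plus start codon
print(find_shine_dalgarno(mrna))  # (3, 8)
```

Here the motif starts at index 3 and is separated from the AUG by 8 nucleotides.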
Initiation
The beginning of transcription is called initiation. First the RNA polymerase binds to the
promoter, and then the double helix partially unwinds, forming a transcription bubble. A
conformational change then converts the complex to its elongation form, and the transcription
complex moves away from the promoter.
Elongation
The second phase of transcription, the lengthening (“growth”) of the mRNA chain, is called
the elongation phase. It begins when the NusA protein binds to the initiation form of the
RNA polymerase, converting it to the elongation form. In this form RNA polymerase
“moves” along the DNA, locally unwinding it and transcribing the template sequence into the
RNA chain at a rate of about 60 nucleotides per second.
Unlike replication, transcription does not need primers. The transcript grows in the 5’-3’
direction.
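The elongation rate quoted above allows a quick estimate of how long a transcript takes to make. The sketch below assumes a constant rate with no pausing, which is a simplification; the 1200-nucleotide gene length is a made-up example value.

```python
ELONGATION_RATE_NT_PER_S = 60  # rate quoted in the text

def transcription_time_seconds(length_nt: int,
                               rate: float = ELONGATION_RATE_NT_PER_S) -> float:
    """Time needed to transcribe length_nt nucleotides at a constant rate."""
    return length_nt / rate

print(transcription_time_seconds(1200))  # 20.0
```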
Termination
The end of the transcription process is called termination. It occurs in the region of the
“terminator”, a sequence that signals the end of the transcription unit. The NusA protein
helps the RNA polymerase respond to this signal: the polymerase stops moving along the DNA
chain and the transcript is released.
In contrast to prokaryotes, the synthesis of RNA in eukaryotic cells is carried out by three
kinds of RNA polymerases: RNA polymerase I, II, and III. RNA polymerase I catalyzes the
transcription of most rRNA genes. RNA polymerase II catalyzes the transcription of structural
genes, producing heterogeneous nuclear RNA (hnRNA or pre-mRNA), from which a functional mRNA
is formed after certain modifications. RNA polymerase III transcribes the genes for tRNA and
5S rRNA.
The newly synthesized RNA molecule is called the primary transcript and must be
modified. This process is called posttranscriptional modification or RNA maturation.
Posttranscriptional modifications
In eukaryotes, transcription occurs in the cell’s nucleus. The RNA synthesized in this
process is then transferred to the cytoplasm, where it is translated into protein. In prokaryotes
the transcript is ready for translation, whereas in eukaryotes it must first be modified.
These modifications are called posttranscriptional modifications or RNA
maturation, and their product is a mature mRNA. Three of the main
modifications are:
5’-capping
Here the terminal phosphate at the 5’ end of the transcript is removed by a phosphatase enzyme,
and a 7-methylguanosine is added to this end.
This structure is called the 5’-cap. The cap protects the 5’ end of the transcript from
ribonucleases that degrade RNA starting from its 5’ end.
RNA splicing
The second step is splicing, during which the RNA copies of non-coding gene sequences (introns)
are cut out of the precursor molecule. In eukaryotic cells, splicing is catalyzed by large
complexes called spliceosomes. In collaboration with small RNAs, they recognize an
intron, cut it out, and join the coding sequences (exons) together.
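Splicing can be pictured as cutting intron intervals out of a string and joining what is left. In the sketch below the intron positions are simply given as index pairs; how a spliceosome actually recognizes them is not modelled. The function name and the sequences are invented for the example.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as half-open (start, end) index pairs, and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon lying before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # last exon
    return "".join(exons)

# Hypothetical pre-mRNA: exon 1 + intron + exon 2
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "AAAUAA"
print(splice(pre_mrna, [(6, 18)]))  # AUGGCCAAAUAA
```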
3’-polyadenylation
In this process, the 3’ end of the transcript is cleaved and about 250 adenine residues are
added to form a poly(A) tail. The addition is carried out by poly(A) polymerase, using ATP as
the precursor.
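The two steps of 3’-polyadenylation, cleavage followed by addition of adenine residues, can be sketched as a simple string operation. The cleavage position is passed in as a plain index, which is an assumption of this example, and the default tail length of 250 is the approximate figure given in the text.

```python
def polyadenylate(transcript: str, cleavage_index: int, tail_length: int = 250) -> str:
    """Cut the transcript at cleavage_index and append a poly(A) tail."""
    cleaved = transcript[:cleavage_index]   # everything after the cut is discarded
    return cleaved + "A" * tail_length

# Hypothetical transcript; a short tail is used so the result is easy to read.
print(polyadenylate("AUGGCCAAAUAAGCUUUU", cleavage_index=14, tail_length=5))
# AUGGCCAAAUAAGCAAAAA
```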
GENETIC CODE
The genetic code is the set of rules by which information encoded in genetic material (mRNA)
is translated into proteins by living cells. The code defines how nucleotide
triplets, called codons, specify which amino acid will be added next during protein synthesis.
There are 64 codons, of which only 61 code for amino acids. The start triplet in
mRNA is almost always AUG, which codes for methionine. The remaining three triplets do not
code for any amino acid and play an important role in terminating the synthesis of the
polypeptide chain. These are the termination codons or stop codons. In the
mRNA they are UAA, UAG and UGA. One of them always marks the end of the coding sequence.
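The way codons are read can be sketched as a small translation routine: find the start codon, read three bases at a time, and stop at a stop codon. The codon table below is the standard genetic code written in compact form with one-letter amino acid symbols and * for stop. The function name and the example mRNA are made up, and treating the first AUG as the start codon is a simplification.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":   # UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCCAAAUAAGC"))  # MAK
```

In the example the reading frame is AUG GCC AAA UAA, giving methionine, alanine and lysine before the stop codon.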
Characteristics of the genetic code
The genetic code is:
Triplet – three consecutive nitrogenous bases in the mRNA (or
in the DNA, in the coding region of the structural gene) code for one amino acid (except
the stop triplets). This set of three
bases is called a codon or triplet. With four different bases there are 4^3 = 64 possible
codons for the 20 standard amino acids.
Universal – present in nearly all organisms. The genetic code is the same for
all examined plants and animals; certain deviations from
the universal genetic code have been observed, mainly within mitochondria.
Non-overlapping – triplets follow one after the other in the DNA (in a linear order)
without interruption, and each base belongs to only one triplet.
Degenerate – there are more coding triplets (61) than amino
acids to be coded (20). This means that more than one codon can specify the same amino acid
(redundancy). However, no codon specifies more than one amino acid (no ambiguity).
Degeneracy acts as a safety measure: the amino acids
that are used most often tend to have the largest number of codons (4-6). This offers some
protection against the effects of mutations, since a change
of one base in a triplet does not necessarily change the amino acid
in the protein.
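The numbers in the list above (64 codons, 61 coding triplets, 20 amino acids) can be checked by enumerating the standard genetic code. The sketch below uses the usual compact form of the codon table with one-letter amino acid symbols and * for stop; the variable names are chosen only for this example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(c) for c in product(BASES, repeat=3)]
table = dict(zip(codons, AMINO_ACIDS))

coding = {codon: aa for codon, aa in table.items() if aa != "*"}
codons_per_amino_acid = Counter(coding.values())

print(len(codons))                 # 64 = 4^3 possible triplets
print(len(coding))                 # 61 triplets that code for an amino acid
print(len(codons_per_amino_acid))  # 20 standard amino acids
# Leucine (L) has six codons; methionine (M) and tryptophan (W) have one each.
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"], codons_per_amino_acid["W"])  # 6 1 1
```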
NB: For many codons, the nucleotide in the third position can be changed without altering the
amino acid specified. E.g. UUU and UUC code for phenylalanine, and CUU, CUC, CUA and CUG
code for leucine. This is explained by the wobble hypothesis. Codons that designate the same
amino acid are called synonyms.
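Synonymous codons can be listed directly from the standard codon table. The sketch below also counts the groups of four codons that share their first two bases and all code for the same amino acid, that is, the cases where the third position is completely free. The helper name synonyms is invented for this example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def synonyms(amino_acid: str) -> list[str]:
    """All codons that specify the given amino acid (one-letter symbol)."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)

print(synonyms("F"))  # ['UUC', 'UUU']
print(synonyms("L"))  # ['CUA', 'CUC', 'CUG', 'CUU', 'UUA', 'UUG']

# First-two-base prefixes for which any third base gives the same amino acid.
prefixes = ["".join(p) for p in product(BASES, repeat=2)]
third_base_free = [p for p in prefixes
                   if len({CODON_TABLE[p + base] for base in BASES}) == 1]
print(len(third_base_free), "of", len(prefixes))  # 8 of 16
```

In the remaining groups a change in the third base sometimes changes the amino acid, for example AUA (isoleucine) and AUG (methionine).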