1.1 Basic Structure of Amino Acids
Amino Acids
Abstract Amino acids are carboxylic acids with an amino-group in the α-position.
With the exception of glycine, all amino acids are chiral; usually the L-form is
used in living organisms. Twenty-two different amino acids are encoded in genes.
Polycondensation of amino acids leads to peptides and proteins. The different side-
chains of the various amino acids have different physicochemical properties and
allow these amino acids to fulfil different functions inside a protein.
Amino acids contain a carboxy group, an amino group, a hydrogen atom, and a
variable side-chain R (“residue”). The simplest amino acid is glycine, where R is
a hydrogen atom. Because the α-carbon of Gly carries only three different substituents,
it is not a stereocentre. Thus glycine is not chiral, unlike all other amino acids (see
Fig. 1.1). Only L-amino acids are found in proteins. However, D-amino acids are
found in the bacterial cell wall and in several antibiotics. In humans, D-Ser is
produced by astrocytes to regulate NMDA-receptor responses to Glu and long-term
potentiation.
The carboxy group has a pKa close to 2 whereas the amino group pKa ranges
from 9 to 10. Thus, amino acids can exist in different protonation states:
[Figure 1.1: structural formulas of an amino acid in its uncharged and zwitterionic form, the “CORN” arrangement around the α-carbon, and the carbon nomenclature (ε, δ, γ, β, α) of lysine.]
Fig. 1.1 Top left: Basic structure of an amino acid. Amino acids can form zwitterions (Zwitter
(Ger.) = hermaphrodite) with a positive and a negative charge. Top right: Because the α-carbon
bears four different substituents, it is chiral (exception: glycine, where R = H). In L-amino acids, if
the α-carbon is placed on the paper plane with the hydrogen facing you, the remaining substituents
read “CORN”. Bottom: Nomenclature of carbon atoms, using lysine as example. The carboxy-
carbon is designated C′, the following carbon atoms are labelled with the letters of the Greek
alphabet. Sometimes the last C-atom is called ω, irrespective of the chain length.
You have to be able to recognise the 22 amino acids (see Fig. 1.2). The amino
acid side-chains can engage in a number of noncovalent interactions and covalent
bonds (Table 1.1):
From your chemistry lessons you know how to determine the pKa of an acid or base,
the pH at which half of the molecules are charged. A compound like an amino acid,
which can act as both acid and base, has another important property: the isoelectric
point pI, which is the pH at which the number of positive charges on the molecule
is the same as the number of negative charges. At the pI the molecule has no net
charge; its ability to interact with water is lowest, and therefore so is its
solubility.
1.2 The Isoelectric Point
[Figure 1.2: structural formulas of the 22 genetically encoded amino acids, each labelled with its name, three-letter code and one-letter code, e.g., Glycine (Gly, G), Lysine (Lys, K), Pyrrolysine (Pyl, O).]
Fig. 1.2 The 22 amino acids encoded by genes. Once incorporated into proteins, amino acids may
be further modified. Pyrrolysine has been found only in some archaea and bacteria; it is encoded by
the “amber” stop codon UAG. Selenocysteine is encoded by the UGA “opal” stop-codon (see Fig. D.1 on page 493).
Acidic groups are marked red, basic groups blue, polar groups orange, and hydrophobic groups
green. Note that Thr and Ile have a chiral β- in addition to the α-carbon. Pyl has two chiral carbon
atoms in the ring.
Table 1.1 Bonds formed by various amino acids. For posttranslational modifications
the following abbreviations were used: A = acetyl, F = fatty acid, I = isoprenoid,
M = methyl, N = nitrosyl, OH = hydroxyl, P = phosphate, S = sugar, SS = disulphide,
U = ubiquitin (-like), Y = AMPylation

Amino acid                Hydrophobic   Hydrogen   Salt   Covalent
                          interaction   bond       bond   modification
Alanine (Ala, A)          +             –          –      –
Arginine (Arg, R)         +             +++        +++    –
Asparagine (Asn, N)       –             +++        –      S
Aspartate (Asp, D)        –             ++         +++    (P), M
Cysteine (Cys, C)         +             (+)        +      SS, M, F, I, N
Glutamine (Gln, Q)        +             ++         –      –
Glutamate (Glu, E)        +             ++         +++    (P), M
Glycine (Gly, G)          (+)           –          –      F
Histidine (His, H)        –             +++        +      (P)
Lysine (Lys, K)           ++            +++        –      OH, A, M, U
Leucine (Leu, L)          +++           –          –      –
Isoleucine (Ile, I)       +++           –          –      –
Methionine (Met, M)       ++            –          –      –
Phenylalanine (Phe, F)    +++           –          –      –
Proline (Pro, P)          ++            –          –      OH
Pyrrolysine (Pyl, O)      +++           +          –      –
Selenocysteine (Sec, U)   +             ++         +      –
Serine (Ser, S)           +             +          –      P, S, F
Threonine (Thr, T)        ++            +          –      P, S, Y
Tryptophan (Trp, W)       ++            +          –      –
Tyrosine (Tyr, Y)         ++            +          –      P, Y
Valine (Val, V)           ++            –          –      OH
How can we get the isoelectric point? Looking at the titration curve of glycine
(see Fig. 1.3, top) we see that below pK1 most of the molecules bear one positive
charge at the amino group; the carboxy group is uncharged.
Above pK2 it is the other way round: the carboxy group bears a (negative) charge
and the amino group is uncharged. Right in the middle between pK1 and pK2 there
is an inflection point in the titration curve; this is the pI.
Thus we remember: In a chemical that has one acidic and one basic group, pI is
the average of the two pK values:
\[ \mathrm{pI} = \frac{1}{2}\,(\mathrm{p}K_1 + \mathrm{p}K_2) \tag{1.1} \]
But how do we do it when there are 3 or more ionisable groups?
[Fig. 1.3: Titration curves (pH versus OH-equivalents, mol/mol) of glycine (top), glutamate (middle; pK1 = 2.19, pKR = 4.25, pK2 = 9.67, pI = 3.22) and histidine (bottom; pK1 = 1.82, pKR = 6.00, pK2 = 9.17, pI = 7.59), with the structural formulas of the successive protonation states. The pI is calculated as the average between the pKa-values on each side of the uncharged form.]
First we write down the various forms of the molecule, with the corresponding
pKa values between them. Then we count the positive and negative charges on each
form and identify the electrically neutral one. The pI can be calculated as the average
of the pKa values on either side of it, just as for molecules with only 2
ionisable groups.
Glutamic acid, for example (see Fig. 1.3, middle), at very low pH carries only
one positive charge at the amino group (1st form). As the pH increases beyond
2.19, more and more protons are lost from the carboxy group on Cα. This (2nd)
form carries one positive and one negative charge, and is the neutral one. Around
pH 4.25, a second proton is lost from the terminal carboxy group (3rd form, one
positive, two negative charges) and around pH 9.67 a further proton is lost from the
amino group (4th form, no positive and 2 negative charges). Thus the pI is calculated
as the average of 2.19 and 4.25, which is 3.22.
The rationale for histidine (see Fig. 1.3, bottom) is similar. The electrically
neutral form is the 3rd, and the pI is the average of 6.00 and 9.17, which is 7.59.
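The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function name `isoelectric_point` and the representation of each ionisable group as a `(pKa, charge when protonated)` pair are assumptions made for this example, and the sketch assumes that groups lose their protons strictly in order of increasing pKa.

```python
def isoelectric_point(groups):
    """Estimate the pI with the averaging rule.

    groups: list of (pKa, charge_when_protonated) tuples (hypothetical format);
    the charge is +1 for a basic group (e.g. -NH3+) and 0 for an acidic
    group (e.g. -COOH).
    """
    pkas = sorted(pka for pka, _ in groups)
    # Net charge of the fully protonated form = number of basic groups.
    charge = sum(q for _, q in groups)
    # Each deprotonation lowers the net charge by one, so the neutral form is
    # reached after `charge` steps and lies between these two pKa values.
    if not 1 <= charge < len(pkas):
        raise ValueError("no neutral form flanked by two pKa values")
    return (pkas[charge - 1] + pkas[charge]) / 2


# pKa values as quoted in the text and in Fig. 1.3
GLUTAMATE = [(2.19, 0), (4.25, 0), (9.67, +1)]
HISTIDINE = [(1.82, 0), (6.00, +1), (9.17, +1)]

print(isoelectric_point(GLUTAMATE))
print(isoelectric_point(HISTIDINE))
```

For glutamate the fully protonated form has a net charge of +1, so the neutral form follows the first deprotonation and the function averages the first two pKa values; for histidine the starting charge is +2 and the second and third pKa values are averaged.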
The solubility of amino acids is lowest at the pI, as interaction with water is
reduced.
with pK− and pK+ the pKa-values of positive and negative groups, respectively.
1.3 The One-Letter Code
In most cases amino acid names are abbreviated with the first three letters of their
names. These abbreviations are easy to remember; however, they use up unnecessary
memory in computer databases. The 22 proteinogenic amino acids can be encoded
by the 26 letters of the Roman alphabet (leaving space for some rare amino acids),
and then each amino acid in a protein sequence uses up only 1 rather than 3 bytes of
storage space. Unfortunately, several amino acids start with the same letter (such as
Ala, Arg, Asp, and Asn), thus we cannot simply use the first letter to encode them.
The following list should help you to remember one-letter codes:
• Amino acids with a unique first letter: Cys, His, Ile, Met, Ser, Val
• Where several amino acids start with the same letter, common amino acids are
given preference: Ala, Gly, Leu, Pro, Thr
• Letters other than the first letter are used for Asn (asparagiN), Arg (aRginine),
Tyr (tYrosine)
• Similar sounding names: Asp (asparDic acid), Glu (glutEmate), Gln (Q-tamine),
Phe (Fenylalanine)
• The remaining amino acids have letters that do not occur in their name: Lys (K
close to L), Trp (W is reminiscent of the double ring), Sec (U), Pyl (O)
• X is used as placeholder, meaning “any amino acid”. B is used for “Asp or Asn”,
Z for “Gln or Glu”, J for “Ile or Leu”. The - is used to denote gaps in a protein
sequence, e.g., in sequence alignments. h is used to denote hydrophobic amino
acids (do not confuse with H for His!)
In the medical literature these codes appear mostly to denote mutations: A123Q
means that the Ala in position 123 is replaced by Gln. Certain sequences of amino
acids occur in several different proteins, where they serve a special function. Such
conserved motifs are usually named after their 1-letter amino acid abbreviations.
Thus you may encounter KDEL-motifs or DEATH-ATPases.
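The two uses of the one-letter code mentioned above, compact sequence storage and mutation names such as A123Q, can be illustrated with a short Python sketch. The names `THREE_TO_ONE`, `to_one_letter` and `parse_mutation` are hypothetical helpers introduced for this example only; the code table itself is taken from Fig. 1.2 and Table 1.1.

```python
import re

# Three-letter to one-letter codes of the 22 proteinogenic amino acids
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C", "Gln": "Q",
    "Glu": "E", "Gly": "G", "His": "H", "Ile": "I", "Leu": "L", "Lys": "K",
    "Met": "M", "Phe": "F", "Pro": "P", "Pyl": "O", "Sec": "U", "Ser": "S",
    "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(peptide):
    """Convert a hyphen-separated three-letter sequence, e.g. 'Lys-Asp-Glu-Leu'."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


def parse_mutation(code):
    """Split a code such as 'A123Q' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a point-mutation code: {code!r}")
    old, position, new = match.groups()
    return ONE_TO_THREE[old], int(position), ONE_TO_THREE[new]


print(to_one_letter("Lys-Asp-Glu-Leu"))  # the KDEL motif
print(parse_mutation("A123Q"))           # Ala in position 123 replaced by Gln
```

The placeholder letters (X, B, Z, J) are deliberately left out of the table, so the sketch raises a `KeyError` for them rather than guessing a residue.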
You may now ask why there are so many different amino acids. The answer is
that these different molecules have different properties, that let them serve different
functions inside proteins (see also Table 1.2).
One difference you have already learned about: there are amino acids whose side-
chains can bear positive or negative charges, whereas other side-chains are always
uncharged. Ionisable side-chains have different pKr values, which can be influenced
strongly by neighbouring amino acids, for example, Cys (pKr = 5–10), His (pKr =
4–10), and the carboxylic acid groups of Glu and Asp (pKr = 4–7). This is important
for proton transfer reactions in the catalytic centre of proteins (acid/base catalysis;
see Sect. 5.5 on page 131). Ionisable groups also form the ionic bonds (salt bridges)
which stabilise protein tertiary structure (see page 32).
Asp, Glu, and His residues can chelate bivalent metal ions including Fe2+, Zn2+ and
Ca2+. This is important for enzymes with metal cofactors, in hæmoglobin and in some
regulatory proteins such as calmodulin.
Some amino acids are hydrophilic (= water friendly) because they carry
ionised or polar groups (–COOH, –NH2, –OH, –SH). Other amino acids are
hydrophobic (= water fearing, fat friendly), with long aliphatic (Ile, Leu, Val) or
aromatic (Phe, Trp) side-chains. If these residues point into the solution, they force
water molecules into a local structure of higher order (i.e., lower entropy), which
is unfavourable. Burying these residues in the interior of the protein avoids this
penalty; this is the molecular basis for hydrophobic interactions.
Some amino acids have small side-chains (such as glycine), others very big,
bulky ones (such as tryptophan). The small hydrogen residue of Gly not only fits
into tight spaces (see section on collagen (page 324) for an important example), but
because it has no β-carbon it can assume secondary structures (see Sect. 2.2 on page
20) that are forbidden for all other amino acids.
Proline has its nitrogen in a ring structure, which makes the molecule very stiff,
limiting the flexibility of protein chains.
The SH-group of Cys, the unprotonated His and the OH-group of Ser and Thr
are nucleophiles which are essential residues in the active centre of many enzymes.
Some amino acids confer properties to the protein which can be used in the
laboratory: Met binds certain heavy metals which are used in X-ray structure
determination and reacts with cyanogen bromide (BrCN) leading to protein
cleavage. Cys and Lys are easily labelled with reactive probes. Aromatic amino
acids, in particular Trp, absorb UV-light at 280 nm; this can be used to measure
protein concentration. In addition they show fluorescence, which can be used to
measure distances, and their variation during conformational changes, in proteins.
1.4 Biological Function of Amino Acid Variety
Table 1.2 Properties of the 21 amino acids encoded in a mammalian genome. Posttranslational modification may change these properties considerably.
The helix propensity measures the energy by which an amino acid destabilises a poly-alanine helix

Amino acid      3-letter  1-letter  Mr    pK1     pK2     pK3    pI     Hydropathy  Helix propensity  Surface  Volume  Abundance
                                    (Da)  (COOH)  (NH3+)  (R)                       (kJ/mol)          (Å2)     (Å3)    (%)
Alanine         Ala       A          89   2.34     9.69   –       6.01  +1.8        0.00              115       67     9.0
Arginine        Arg       R         174   2.17     9.04   12.48  10.76  −4.5        0.21              225      167     4.7
Asparagine      Asn       N         132   2.02     8.80   –       5.41  −3.5        0.65              160      148     4.4
Aspartic acid   Asp       D         133   1.88     9.60    3.65   2.77  −3.5        0.43              150       67     5.5
Cysteine        Cys       C         121   1.96     8.18   10.28   5.07  +2.5        0.68              135       86     2.8
Glutamic acid   Glu       E         147   2.19     9.67    4.25   3.22  −3.5        0.39              180      114     6.2
Glutamine       Gln       Q         146   2.17     9.13   –       5.65  −3.5        0.16              190      109     3.9
Glycine         Gly       G          75   2.34     9.60   –       5.97  −0.4        1.00               75       48     7.7
Histidine       His       H         155   1.82     9.17    6.00   7.59  −3.2        0.56              195      118     2.1
Isoleucine      Ile       I         131   2.36     9.68   –       6.02  +4.5        0.41              175      124     4.6
Leucine         Leu       L         131   2.36     9.60   –       5.98  +3.8        0.21              170      124     7.5
Lysine          Lys       K         146   2.18     8.95   10.53   9.74  −3.9        0.26              200      135     7.0
Methionine      Met       M         149   2.28     9.21   –       5.74  +1.9        0.24              185      124     1.7
Phenylalanine   Phe       F         165   1.83     9.13   –       5.48  +2.8        0.54              210      135     3.5
Proline         Pro       P         115   1.99    10.96   –       6.48  −1.6        3.16              145       90     4.6
Selenocysteine  Sec       U         168   2.16     9.40    5.20   3.68                                                 rare
Serine          Ser       S         105   2.21     9.15   13.60   5.68  −0.8        0.50              115       73     7.1
Threonine       Thr       T         119   2.11     9.62   13.60   5.87  −0.7        0.66              140       93     6.0
Tryptophan      Trp       W         204   2.38     9.39   –       5.89  −0.9        0.53              255      163     1.1
Tyrosine        Tyr       Y         181   2.20     9.11   10.07   5.66  −1.3        0.49              230      141     3.5
Valine          Val       V         117   2.32     9.62   –       5.97  +4.2        0.61              155      105     6.9
1.5 Exercises
1.5.1 Problems
1.5.2 Solutions
1.1 The isoelectric point of a compound is the pH at which it has an equal number
of positive and negative charges.
1.2 Trp is aromatic and hydrophobic, Lys positively charged (ε-amino group), Gly
small (only hydrogen as side-chain), Ser is polar (OH-group). Gln does not fit into
any of these categories.
1.3 Below pH 2.2 Lys has 2 positive charges from the fully protonated α- and
ε-amino groups, and no negative charge because the carboxy group will be mostly
protonated as well. Above pH 2.2, the carboxy-group will lose its proton, resulting
in 1 negative and 2 positive charges, and a net charge of +1. Beyond pH 8.95, the
α-amino-group will lose its proton and there will be one negative and one positive charge,
resulting in a net charge of ±0. Beyond pH 10.53, the proton on the ε-amino group
will be lost as well, resulting in one negative and no positive charges. Thus the pI is
calculated as $\tfrac{1}{2}(8.95 + 10.53) = 9.74$.
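The stepwise change of charge described in this solution can be checked numerically. The following Python sketch is an illustration added here, not part of the original solution: it assumes that each group titrates independently and follows the Henderson–Hasselbalch relation, so that the protonated fraction of a group is 1 / (1 + 10^(pH − pKa)). The function name `net_charge` and the `(pKa, charge when protonated)` data format are hypothetical.

```python
def net_charge(ph, groups):
    """Average net charge of a molecule at a given pH.

    groups: list of (pKa, charge_when_protonated) tuples (hypothetical format);
    +1 for an amino group, 0 for a carboxy group. Assumes independent groups.
    """
    total = 0.0
    for pka, q_protonated in groups:
        # Fraction of molecules in which this group still carries its proton
        fraction_protonated = 1.0 / (1.0 + 10.0 ** (ph - pka))
        # Losing the proton lowers the charge of the group by one
        total += q_protonated * fraction_protonated
        total += (q_protonated - 1) * (1.0 - fraction_protonated)
    return total


# pK values of lysine from Table 1.2: carboxy, alpha-amino, epsilon-amino
LYSINE = [(2.18, 0), (8.95, +1), (10.53, +1)]

for ph in (1.0, 5.0, 9.74, 13.0):
    print(f"pH {ph:5.2f}: net charge {net_charge(ph, LYSINE):+.2f}")
```

Under these assumptions the net charge moves from close to +2 at very low pH through +1 and 0 to close to −1 at very high pH, and it passes through zero at the pI of 9.74 obtained with the averaging rule.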
1.4
A Thr also has the OH-group and differs from Ser only by one additional carbon atom. It would likely work just
as well as Ser, so the experiment would not answer the question.
B Ala is Ser minus the OH group which is the catalytically active part of Ser.
Thus Ala would be ideal to test the hypothesis.
C Trp is the bulkiest of all amino acids. If the enzyme were no longer active after
the replacement, you would not know whether this was because of the lack of the
catalytically active OH-group or because of disruption of the 3D-structure of the
enzyme.
D Glu is acidic. If you use it to replace a polar but uncharged amino acid the 3D
structure of the enzyme would likely be perturbed by salt-bridge formation. If
the enzyme didn’t work afterwards, you’d not know whether this was because of
changed 3D structure or because of the missing OH-group.
E His is basic. If you use it to replace a polar but uncharged amino acid the 3D
structure of the enzyme would likely be perturbed by salt-bridge formation. If
the enzyme didn’t work afterwards, you’d not know whether this was because of
changed 3D structure or because of the missing OH-group.
1.5
A) Phe-Ala-Val All three amino acids are hydrophobic, this peptide would be
only very sparingly soluble in both water and base.
B) Glu-Gly-Asp Glu + Asp are acidic residues and give additional charges at
alkaline pH.
C) Gln-Gly-Asn Gln and Asn have acid amide functional groups, which are
somewhat polar but do not become charged at high pH. Gly is hydrophobic.
D) Lys-Arg-His All three amino acids are basic, at high pH they will be
uncharged. This peptide would be very soluble in acid, but not in base.
E) Trp-Lys-Asn A bulky hydrophobic, a basic and a somewhat polar amino
acid would give low solubility in water and base, but somewhat better solubility
in acid.
Reference
1. G.R. Grimsley, J.M. Scholtz, C.N. Pace, A summary of the measured pK values of the ionizable
groups in folded proteins. Protein Sci. 18, 247–251 (2009). doi: 10.1002/pro.19
Chapter 2
Protein Structure
Abstract Peptides and proteins are made by condensation of amino acids, forming
peptide bonds. The sequence of amino acids in a protein is called its primary
structure. Secondary structure is determined by the dihedral angles φ, ψ of the
peptide bonds, the tertiary structure by the folding of protein chains in space.
Association of folded polypeptide molecules to complex functional proteins results
in quaternary structure. Proteins can be further modified by posttranslational
addition of small molecules.
A peptide bond is formed by the condensation of two amino acids under elimination
of water (see Fig. 2.1). Addition of further amino acids to the chain leads to
tripeptides, tetrapeptides, and so on. Chains of up to 20 amino acids are called
oligopeptides (oligo = few), and longer ones polypeptides (poly = many). Proteins
are polypeptides with a biological function.
Polypeptides range in size from a few amino acids to thousands; Fig. 2.2 shows
aspartame, a dipeptide. Proteins consist either of a single polypeptide chain, or
they are formed from separate polypeptide chains called subunits. Some proteins
contain other covalently bound components, prosthetic groups, and posttranslational
modifications (see below).
The sequence of amino acids in a protein is called its primary structure. In
biochemistry, this is always given starting with the N-terminal and ending with the
C-terminal amino acid, because this is the order in which amino acids are added
during protein synthesis in the cell (this process is discussed in detail in textbooks
of molecular cell biology, e.g., [1, 21]).
[Figure 2.1: stepwise condensation of amino acids with release of H2O. The chain runs from the amino-terminal end to the carboxy-terminal end; oligopeptide: < 20 residues, polypeptide: > 20 residues.]
Fig. 2.1 Polycondensation of amino acids to peptides and proteins. Polycondensation is a reaction
where organic molecules react with each other via their functional groups, producing small
molecules (here: water) in addition to a macromolecule
[Figure 2.2: aspartyl-phenylalanine-1-methyl ester (Aspartam®)]
2.1 Primary Structure
Cα, C′, the nitrogen and the oxygen atom of the peptide bond form a single
plane. The bond between C′ and N is somewhat shorter than a normal C–N single
bond, because of mesomerism with the C=O double bond (see Fig. 2.3). Since the
lone electron pair of N enters into the partial double bond it can no longer accept a
proton; the N in a peptide bond is not basic.
Thus the peptide bond has a “partial (≈ 40 %) double bond character” (see
Fig. 2.4). Like the C=C bond, it is planar and cannot rotate. The H and O of
the peptide bond are in the trans-configuration. Formally, we express the same
idea by saying that the dihedral angle ω of the peptide bond is fixed to 180° (see
Fig. 2.4). Because of the bulky R-groups, the trans-configuration is more stable with
most amino acids (99.7 % probability). The exception is Pro, which occurs in cis-
configuration much more frequently than other amino acids (5.8 % probability, see
Fig. 2.5).
Fig. 2.3 The geometry of the peptide bond. Left: The bonds A–B and B–C define a
plane, as do the bonds B–C and C–D. The angle between these planes is called the
dihedral angle of the B–C bond. To determine this angle in the standard way, orient that bond
into the paper plane, so that the neighbouring atoms (here A and D) point upwards. Then measure
the angle formed: clockwise is positive, anticlockwise negative. Right: Because of mesomerism, the
dihedral angle of the bond between the carboxy-carbon (C′) and the nitrogen (ω) is fixed to 180°,
with N, H, C and O lying in a single plane. Slight deviations are possible, but rare. The bond angle
at Cα (τ) is 110.8° ± 2.5° for 86 299 residues investigated in a recent study [41]. Variable are the
dihedral angles φ and ψ, which determine the secondary structure of a protein (see next section)
On the other hand, the N–Cα and Cα–C′ bonds are normal single bonds; rotation
around those is possible. The angles of rotation are named φ and ψ, respectively.
Rotation in the peptide chain is limited by two factors. First, at certain angles φ and
ψ around one amino acid an atom of that amino acid would collide with an atom
of the following amino acid (see Fig. 2.6). The angles φ, ψ which result in a clash
between C′ₙ=O and Nₙ₊₁–H are defined as 0°, 0°. Additionally, size and charge of
the R-groups can make certain positions more stable than others.
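How a dihedral angle such as φ, ψ or ω is obtained from atomic coordinates can be sketched with a little vector arithmetic. The Python function below is an illustration, not taken from the text: the name `dihedral` and the helper functions are hypothetical, and the atan2 formula used is a common textbook formulation. Its sign convention is intended to match the one given in the caption of Fig. 2.3 (looking along the B–C bond, a clockwise rotation from A to D counts as positive).

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral(a, b, c, d):
    """Dihedral angle (degrees, -180..180) of the B-C bond in the chain A-B-C-D."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)  # normal of the plane through A, B, C
    n2 = _cross(b2, b3)  # normal of the plane through B, C, D
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not a real peptide): B-C lies along the x-axis, A points "up"
A, B, C = (0, 1, 0), (0, 0, 0), (1, 0, 0)
print(dihedral(A, B, C, (1, 1, 0)))   # D on the same side as A: cis
print(dihedral(A, B, C, (1, -1, 0)))  # D opposite to A: trans
print(dihedral(A, B, C, (1, 0, 1)))   # D rotated by a quarter turn
```

Applied to the backbone atoms C′(n−1), N, Cα, C′ of a residue this would give φ, and applied to N, Cα, C′, N(n+1) it would give ψ; the coordinates would normally come from a structure file.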
Thus in a plot of ψ versus φ (RAMACHANDRAN-plot [18, 27], see Fig. 2.9 on
page 23) there are regions that are sterically forbidden, fully allowed
regions with no steric hindrance, and unfavourable regions which can be
assumed by slight bending of bonds.
[Figure 2.4: the peptide bond preceding a residue in cis (left) and trans (right) configuration.]
Fig. 2.4 cis-trans isomerism around the peptide bond. Because the C′₋₁–N bond has the character
of a partial double bond, rotation around this bond cannot occur and cis-trans isomerism results. For
steric reasons the trans-configuration is much more probable than the cis-configuration. Pro is unusual in that
the cis-configuration has a probability of 5–6 %, which is about 100 times higher than with other
amino acids
Proline is again a special case because the peptide nitrogen is part of a ring
structure; this limits φ to values between −35° and −85°. As I will show in a
moment, this has considerable consequences for protein secondary structure.
Because glycine has only a hydrogen as R-group, steric hindrance is much less a
problem than with other amino acids. Thus in a RAMACHANDRAN-plot Gly can be
found in regions forbidden for other amino acids (Fig. 2.9).
The great variety of proteins that can be observed today has arisen from a much
smaller number of ancestors during evolution. This can be shown by comparing
the primary structure of proteins; the more similar they are, the closer they are
related [10].
[Figure 2.5: histogram of the relative frequency (%) of ω for Pro residues in helix (n = 2604), sheet (n = 2846) and other (n = 14,351) structures.]
Fig. 2.5 Distribution of dihedral angle ω of Pro in 1453 “non-redundant” proteins whose structure
is known with a resolution ≤ 1.5 Å. Of the 431 146 amino acid residues 19 801 were proline. In α-
helices, all of the 2604 Pro occurred in trans (ω ≈ 180°). In β-strands, 73 out of 2846 (2.6 %) Pro
occurred in cis (ω ≈ 0°), in “other” secondary structures 1071 out of 14 351 Pro (7.5 %) were in
cis. Almost all other Pro residues occurred in trans, the number of Pro residues with intermediate
ω was 0, 1, and 11, respectively
Because each position in the primary structure can be occupied by any of the
20 common amino acids (and occasionally also by Sec and Pyl), the possible
number of combinations is huge. For example, a protein with 100 amino acids has
$20^{100} = 1.3 \times 10^{130}$ possible sequences. Given that our universe is about
$13.7 \times 10^{9}$ a $\approx 4.32 \times 10^{17}$ s old, creationists have argued that proteins cannot
have been created by a process of random mutation and selection. This argument is
fallacious, however, because it makes the (unspoken!) assumption that the function
of a protein can only be met by one particular amino acid sequence. The existence
of isoenzymes – proteins with different structure but the same function – proves
this assumption wrong (see Fig. 2.8). Interestingly, the study of protein- and DNA-
sequences [10] has confirmed, and in many cases given us additional details on, the
tree of life first proposed by C. DARWIN [9].
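The two numbers quoted in this paragraph are easy to verify. The short Python sketch below is added for illustration only; the variable names are arbitrary, and the year length of 365.25 days is an assumption made for the conversion to seconds.

```python
import math

# Number of possible sequences of a 100-residue protein built from 20 amino acids
n_sequences = 20 ** 100
exponent = math.floor(math.log10(n_sequences))
mantissa = n_sequences / 10 ** exponent
print(f"20^100 = {mantissa:.1f} x 10^{exponent}")

# Age of the universe in seconds, assuming 13.7e9 years of 365.25 days each
age_seconds = 13.7e9 * 365.25 * 24 * 3600
print(f"age of the universe = {age_seconds:.2e} s")
```

Python integers have arbitrary precision, so `20 ** 100` is computed exactly; only the final formatting converts to floating point.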
Fig. 2.6 Some φ, ψ-combinations lead to collisions between atoms of neighbouring amino acids.
Such angles are forbidden; the dihedral angles φ, ψ which result in a clash between C′ₙ=O and
Nₙ₊₁–H are defined as 0°, 0°. The green balls represent the R-groups. In addition to these next-
neighbour effects, some angles also lead to collisions between amino acids further apart. This
is a stereogram: if you look at the images cross-eyed, you will see three figures, the middle of
which is three-dimensional (this takes some practice and is not a required skill for a physician).
Cheap lorgnette-style stereo viewers are available on the Internet to help with this. They are built
around two prisms which ensure that each eye sees only the image intended for it (see
http://www.shortcourses.com/stereo/stereo1-7.html for examples)
The secondary structure of a protein is any regular, repetitive folding pattern in the
molecule. It is stabilised by hydrogen bonds (see Fig. 2.13) between the amino- and
keto-groups of the peptide bonds, which carry a partial positive and negative charge,
respectively (see Fig. 2.3b). Although each hydrogen bond has only a relatively
small bond energy (≈ 5 kJ/mol), the sum of the bond energies over all hydrogen
bonds in a protein is considerable.
The following structural motifs are particularly common [19, 28].
2.2 Secondary Structure
Fig. 2.7 Interpretation of structure diagrams of proteins, using the pentapeptide HTCPP. (a)
Space-filled diagrams show the true (VAN DER WAALS-) extension of a molecule, but even in short
peptides clarity is lost. (b) A wire diagram is clearer. The centres of the atoms are connected by
thin lines; hydrogens are not drawn. Wire diagrams look a little like structural formulas in organic
chemistry, however, atoms are shown in their true three-dimensional arrangement. (c) For larger
proteins even wire diagrams would be too cluttered. Thus the atoms forming the protein backbone
are connected by a thick line, which is used to represent the amino acid chain. (d) If the protein
contains disulphide bonds, showing only the backbone trace leaves the disulphide bond dangling
in free space, thus (e) this bond is often shown (incorrectly, but easier to interpret) connecting
the backbone traces. (f) A further abstraction is achieved by showing elements of secondary
structure instead of the backbone trace: alpha helices as red helices and β-strands as yellow arrows,
connected by the backbone trace of coils and turns (in grey). Other colouring schemes used in this
book are N-terminal (red) to C-terminal (purple), different colours for different chains or “shapely
colours” (a quasi-standard in molecular modelling) for different amino acids. Shown here is
β-lactamase (PDB-code 1m40, a protein that confers penicillin resistance to bacteria)
Fig. 2.8 Subtilisin (PDB-code 3VYV, left) and Chymotrypsin (PDB-code 1oxg, right) are both
Ser-proteases that use the classical catalytic triad (Ser, His, Asp, shown as wire diagram) in the
catalytic centre to cleave proteins. These amino acids are far apart in the sequence, but close to
each other in the folded protein. Both proteins, however, have completely different sequences and
secondary structures. This is an example of convergent evolution (“re-inventing the wheel”). The
proteins are called iso-enzymes (iso = the same)
Function of α-helices:
• An α-helix of 22 amino acids is long enough to span a lipid bilayer membrane. The
part of the helix that is inside the membrane consists of hydrophobic amino
acids that can interact with the lipid tails of the membrane. Hydrophilic amino
acids on both ends interact with the cytosol and the interstitial fluid, respectively.
The cytosolic end has positively charged amino acids and the extracellular end
more negatively charged ones, because the potential of a cell is negative inside
(−70 mV). Thus the correct orientation of the protein is ensured by the electrical
field. At the interface between the membrane and the aqueous environment one
finds predominantly aromatic amino acids and Lys.
• Amphipathic α-helices at the N-terminus of a protein serve as recognition sites for
the import into mitochondria. Every 4th or 5th amino acid is positively charged,
so that all positive charges are in the same quadrant of the helix (see Fig. 2.10
and Fig. 16.10 on page 375).
• Two α-helices wound around each other form a coiled coil. Keratin consists of
such coiled-coils (see Fig. 2.11). These are held together by disulphide bonds.
Breaking these with thioglycolic acid is the basis of the permanent wave.
• Heptad-repeats (Leu-zippers) are α-helices where every 7th amino acid is leucine
(Fig. 2.12). Such helices associate because of hydrophobic interactions between
the Leu-residues, allowing for specific dimerisation of proteins. Some DNA-
binding proteins have this structure.
Fig. 2.10 Signal-peptide for import into mitochondria. Most mitochondrial proteins are encoded
in the nucleus; they are synthesised in the cytosol and then imported via a transport system that
spans both mitochondrial membranes (see Fig. 16.10 on page 375). An amphipathic α-helix serves
as the recognition signal for binding of the nascent protein to the transporter. Note that the helical
wheel projection is viewed from the N-terminus
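Why residues at regular spacings end up on the same face of a helix can be made concrete with a helical-wheel calculation. The sketch below is an illustration under a stated assumption that is not given in this section: an ideal α-helix with 3.6 residues per turn, i.e. a rotation of 100° per residue. The names `DEGREES_PER_RESIDUE` and `wheel_angle` are hypothetical.

```python
# Assumption: ideal alpha-helix, 3.6 residues per turn = 100 degrees per residue
DEGREES_PER_RESIDUE = 100


def wheel_angle(position):
    """Angular position (0..359 degrees) of a residue on the helical wheel."""
    return (position * DEGREES_PER_RESIDUE) % 360


# Heptad repeat: every 7th residue (positions 0, 7, 14, 21, 28)
heptad = [wheel_angle(i) for i in range(0, 29, 7)]
print(heptad)

# Residues three and four positions after residue 0
print(wheel_angle(3), wheel_angle(4))
```

Under this assumption each heptad step shifts the residue by only 20° against the direction of the helix, so the leucines form a narrow stripe that winds slowly around the helix; residues three or four positions apart lie within 60° of each other and therefore share one face of the helix.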
2.2.2 β-Strand
In the β-strand, the polypeptide backbone is stretched out with φ, ψ ≈ −120°, 120°.
Several strands are aligned either in a parallel (all carboxy-terminal ends are at the
same side) or antiparallel fashion, forming hydrogen bonds between an NH group
of one strand and a C=O group in a neighbouring strand. This gives rise to a large
blanket-like structure, the β-pleated sheet. The main difference between the α-helix
and β-strand is that in the α-helix hydrogen bonds occur between residues of the
same helix, whereas in a β-pleated sheet they occur between residues of
neighbouring strands (see Fig. 2.13). Nevertheless, a single β-strand is stable because
the amino acids in this extended structure have plenty of “wiggling” space without
running into steric hindrance (look up the coordinates in the RAMACHANDRAN
plot!), resulting in entropic stabilisation. The R-groups point up- and downwards in
turn, making amphipathic sheets with polar and nonpolar or positive and negative
faces possible. The entire sheet is rarely flat, but has a right-handed twist, in
extreme cases forming a β-barrel (see Fig. 2.13). In schematic diagrams of protein
structure each β-strand is drawn as a broad arrow.
Fig. 2.11 Keratin is a heterodimer that forms coiled-coils. Depicted here is coil 2B of keratin 5
and 14 (PDB-code 3tnu; the structure of the entire keratin molecule has not been solved yet). The
helices are shown as green and orange ribbons; in addition the protein surface is shown (blue =
basic, red = acidic, yellow = polar and grey = nonpolar)
Fig. 2.12 In heptad-repeats (Leu-zipper, here tropomyosin, PDB-code 1ic2) every seventh amino
acid is Leu. This leads to specific associations of α-helices by hydrophobic interactions
Fig. 2.13 Hydrogen bonding in α-helix (left, cytochrome b562, PDB-code 256B) and β-sheets
(right, E. coli OmpA, PDB-code 1QJP). In an α-helix all hydrogen bonds between keto- and
amino-groups in the protein backbone occur between neighbouring amino acids of the same helix.
In β-sheets, however, all such hydrogen bonds occur between amino acids in different strands,
alternating between the right and left neighbour
Fig. 2.14 Anti-parallel β-pleated sheet of the silk fibroin N-terminal domain (FibNT) from the
silkworm Bombyx mori L. at pH 4.7 (PDB-code 3ua0). Two subunits (red to yellow and cyan to
blue) form the sheet. Neighbouring strands have alternating directions, and are joined by β-turns.
Hydrogen bonds holding the sheet together run at right angles to the strands
In an antiparallel β-sheet (see Fig. 2.14) the strands point in alternating directions.
They are usually joined together by β-turns (see later). Ideal values are φ, ψ = −138°,
+137°. Silk protein is an example of the use of β-sheets in biologically important
structures. The amino acids within a β-strand are already in an extended conformation,
therefore silk shows little elasticity and has an extremely high tensile strength, as any
extension would require breaking covalent bonds. On the other hand, the strands are
held together by hydrogen bonds only, giving silk cloth its wonderfully soft flow. At
neutral pH in the silk gland the fibroin protein forms soluble random coils; as the silk
thread is ejected, acidification leads to β-sheet formation and precipitation of the
Fig. 2.15 Parallel β-pleated sheet (here PDB-code 2v9s). All strands have the same direction; the
“return-legs” are either α-helices or coils. Hydrogen bonds holding the sheet together run obliquely
to the strands
protein. Each silkworm cocoon is made from a single silk thread that is 900 m long
and has a diameter of 10 µm.
In a parallel β-sheet (see Fig. 2.15) the N-termini of all strands point in the same
direction; ideal φ, ψ = −116°, +111°. The hydrogen bonds are oblique to the strand
direction, hence the parallel β-sheet is less stable than the antiparallel. The strands
in a parallel β-sheet are often joined by α-helices, which form the “return-leg”.
The ideal parallel and antiparallel β-sheets are characterised by different φ, ψ-
values. However, because of the twisting of strands in a β-sheet there are no separate
peaks for them in the RAMACHANDRAN-plot; rather, they merge into a single big
area of extended structure.
Parallel β-strands can be wound into right-handed coils (see Fig. 2.16), containing
either two (β-roll) or three (β-helix) strands per rung [40]. The β-helix is found in
enzymes whose substrates are oligosaccharides (e.g., pectinases); it also occurs in
tailspike proteins of bacteriophages and in amyloid aggregates, the cause of several
debilitating diseases (see Sect. 10.2 on page 206). In a β-helix one or two of the three
β-strands may be replaced by α-helices (see Fig. 2.17). Three β-helices or -rolls can
be arranged in coils, similar to the α-helical coiled-coils.
Fig. 2.16 Top: The tailspike protein from the bacteriophage Sf6 contains a β-helix with three
parallel β-sheets (PDB-code 2vbk). Three such helices are wound around each other, forming
a coiled-coil (not shown). Bottom: Alkaline protease from Pseudomonas aeruginosa (PDB-code
1kap) is an example of a β-roll with two parallel β-sheets. The structure is stabilised by Ca2+ ions
The PII helix is left-handed with three residues per turn and φ, ψ = −70°, +140°.
Like the single β-strand, it is stabilised by entropy, not by hydrogen bonds. Pro
frequently occurs in this structure, but not all PII helices contain Pro.
2.2.3.1 Collagen
Collagens (see Sect. 14.1.1 on page 324 for a more detailed discussion) are the
most important example of the PII-helix; they consist of three PII-helices wound
around each other (hetero- or homo-oligomer, see Fig. 2.18). The human genome
contains 42 collagen genes, which encode 28 known collagen types. Of these,
types I, II, and III are the most important. Each of the three molecules in collagen
has 1050 amino acids, with the repeating sequence Gly-X-Pro. The angle of the Pro peptide
bond (amino group part of a ring) allows the sharp turn in the molecule [3], and the
Fig. 2.17 In porcine ribonuclease inhibitor (PDB-code 2bnh) α-helices and β-strands form an α/β
coil. To prevent side-chain interference, the α-helices have to twist, resulting in a circular, rather
than helical, structure
Fig. 2.18 Collagen (PDB-code 1cag). To make the tight association between the three strands
clearer, one each is drawn space-filling, as wire diagram and as carbon-backbone. Note the
repeating Gly-X-Pro (yellow, green, brown; with X often hydroxy-Pro) sequence. Marked in blue
is a Gly→Ala mutation that prevents a close fit and destabilises the molecule. Such mutations
cause, for example, EHLERS-DANLOS-syndrome
small R-residue of Gly (only an H) allows the three protein molecules to wrap tightly
around each other. The resulting “rope” has a tensile strength higher than steel.
If only a single one of the Gly-residues in one of the collagen chains is mutated,
wrapping is no longer possible, leading to osteogenesis imperfecta (brittle bone
disease, collagen I), to EHLERS-DANLOS-syndrome (collagen I, III, or V), with too
brittle or too elastic ligaments and death by vascular or organ rupture, epidermolysis
bullosa (blistering of skin, collagen XVII), or to ALPORT-syndrome (collagen IV,
kidney, and hearing defects).
Fig. 2.19 β-turns (here in PDB-code 1qiv) are most common between the strands of an
antiparallel β-sheet
Hairpin Turns allow the protein to fold back onto itself in a 180° angle. They are
important, for example, between the different strands of an antiparallel “-sheet.
Because the C=O- and NH-groups of a turn are not all involved in hydrogen bond
formation within the protein, they are often surface-exposed and interact with water.
They may also occur in the catalytic centre of enzymes, where they are involved in
substrate binding.
Turns can contain 4 amino acid residues (β-turn, frequent, with a hydrogen bond
between NH(i) ··· O=C(i−3), see Fig. 2.19) or 3 (γ-turn, rarer). Turns with 2 (δ), 5 (α)
or 6 (π) amino acids have been described, but are very rare. Turns often contain Gly
(smallest amino acid) or Pro residues, the latter because of its specific value of φ; in
addition the Cα and Cδ of Pro can undergo CH···π interactions with neighbouring
aromatic amino acids, which, although not as strong as regular hydrogen bonds,
can stabilise the turn [3]. Turns may also be found in the catalytic centre of an
enzyme (with a CH···π bond between Pro and an aromatic substrate). The different
types of turns can be distinguished by the φ, ψ-values of their peptide bonds [39].
In addition to the α-helix there are two other, much rarer helical conformations:
3₁₀-helix with 3 residues per turn and a hydrogen bond between residues i and i+3
(φ, ψ = −50°, −25°). It occurs only at the C-terminal end of α-helices, and can
be only 4–5 residues long.
π-helix with 5 residues per turn and a hydrogen bond between residues i and
i+5. The π-helix is usually only 7–10 amino acids long and flanked on both ends
by α-helices. In effect, it introduces a kink into a long α-helix (see Fig. 2.20).
Evolutionarily, π-helices are created by insertion of an amino acid into an α-helix.
They occur at least once in about 15 % of proteins, often in the active site of an
enzyme. π-helices have variable φ, ψ-values.
There are some other structures which occur in only a few proteins, but have
important functional roles. We discuss those when we talk about some special
proteins. Each secondary structure can be characterised by the φ, ψ angles in the
protein backbone (see Fig. 2.9).
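The idea that each secondary structure has its own characteristic φ, ψ pair can be illustrated with a small sketch that assigns a measured angle pair to the nearest ideal conformation. The ideal values are those quoted in this section, with the minus signs placed according to the usual convention (extended structures lie in the upper left quadrant of the RAMACHANDRAN plot). The α-helix pair is not given in this section and is an assumed, commonly quoted value; the function names are hypothetical. Because the β-regions merge in real proteins and π-helices have variable angles, this is only a rough illustration, not a structure-assignment method.

```python
import math

# Ideal (phi, psi) pairs in degrees as quoted in this section.
# The alpha-helix entry is NOT from this section: (-57, -47) is an
# assumed, commonly quoted textbook value.
IDEAL_ANGLES = {
    "alpha-helix (assumed)": (-57.0, -47.0),
    "3-10 helix": (-50.0, -25.0),
    "beta-strand": (-120.0, 120.0),
    "antiparallel beta-sheet": (-138.0, 137.0),
    "parallel beta-sheet": (-116.0, 111.0),
    "PII helix": (-70.0, 140.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees.

    Angles are periodic, so +170 and -170 are only 20 degrees apart.
    """
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def distance(phi, psi, ideal):
    """Euclidean distance in (phi, psi) space, respecting periodicity."""
    return math.hypot(angle_diff(phi, ideal[0]), angle_diff(psi, ideal[1]))


def nearest_ideal(phi, psi):
    """Return (name, distance) of the ideal conformation closest to (phi, psi)."""
    name = min(IDEAL_ANGLES, key=lambda k: distance(phi, psi, IDEAL_ANGLES[k]))
    return name, distance(phi, psi, IDEAL_ANGLES[name])


if __name__ == "__main__":
    for phi, psi in [(-119.0, 113.0), (-60.0, -45.0), (-72.0, 145.0)]:
        name, dist = nearest_ideal(phi, psi)
        print(f"phi={phi:7.1f} psi={psi:7.1f} -> {name} (distance {dist:.1f} deg)")
```

A plain nearest-neighbour rule was chosen for brevity; real assignment programs also use hydrogen-bond patterns, which angles alone cannot capture.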
Fig. 2.20 A π-helix (cyan) introduces a kink between the flanking α-helices, here in human ferritin
(PDB-code 3ajo)
2.2.6 Coils
Coils are any structure except those mentioned above. Note that amino acids in coils
still have a defined position within the structure of a protein, thus the terms “random
coil” or “unordered”, sometimes found in the literature, are misleading. These
areas have an important function too, because they add flexibility to the protein
and allow conformational changes, for example, during enzymatic turnover. Their
peptide bonds are not involved in intra-protein hydrogen bonding, therefore they
are often exposed to interact with water, small ligands, or with other proteins. Coils
tend to tolerate mutations better than other structures and are therefore hotspots for
evolution. In Chap. 11 on page 225 we will see that it is coils that give antibodies
their specific binding properties.
Tertiary structure describes the global conformation of a protein, in other words, the
way in which the elements of its secondary structure are arranged in space. Tertiary
structure is determined by
Hydrophobic interactions of amino acid side-chains. Typical globular pro-
teins have a core of hydrophobic side-chains, whereas hydrophilic side-chains
are on the surface where they interact with water or with other proteins. If
hydrophobic residues were exposed to water, the water would have to form
an ordered cage (so-called clathrate) around them, which would decrease the
entropy of the system.
VAN DER WAALS-interactions are fluctuating dipole interactions with a bond
energy of 4–17 kJ/mol. The bond length is 4 Å.
Hydrogen bonds are interactions between permanent partial charges. The
bond length is about 3 Å; the bond energy is 2–6 kJ/mol if both partners are
partially charged and up to 21 kJ/mol if one partner is fully charged. If the
distance between the partners is too large, an indirect hydrogen bond may be
formed where water acts as a bridge (δ− ··· H2O ··· δ+).
Salt bridges are interactions between fully charged groups. The bond length is
2.8 Å. The bond energy is 10–30 kJ/mol in an aqueous environment, but can be
significantly higher if both groups are buried in a hydrophobic core.
Disulphide bonds are formed between two Cys residues after folding
of the protein into its higher-order structure (RSH + HSR′ →
RSSR′ + 2[H]). This is an oxidation (removal of hydrogen), which will
not normally occur in the reducing environment of the cytosol. However, the
environment inside the ER is oxidising. Thus disulphide bridges are found
more frequently in cell-surface and secreted proteins than in cytosolic ones.
They may occur between two Cys residues in the same polypeptide (intrachain),
or between different polypeptides (interchain). Bond length is 2.2 Å and bond
energy 167 kJ/mol.
Coordination around cofactors Several amino acids in a protein can be
involved in the coordination of metal ions (Ca, Zn, Fe, Mg, Na, K) or prosthetic
groups such as hæme or FAD.
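To put the bond energies listed above into perspective, they can be expressed as multiples of the thermal energy RT (about 2.5 kJ/mol at room temperature). The following is a minimal sketch, assuming a temperature of 298.15 K; the dictionary keys and function names are hypothetical, and the energy ranges are simply those quoted in the list.

```python
# Gas constant in kJ/(mol K)
R = 8.314462618e-3

# (low, high) bond energies in kJ/mol as quoted in the list above.
# Single values are entered with low == high; the 21 kJ/mol figure for
# hydrogen bonds is an upper limit ("up to").
BOND_ENERGIES = {
    "van der Waals": (4.0, 17.0),
    "hydrogen bond (partial charges)": (2.0, 6.0),
    "hydrogen bond (one partner charged, upper limit)": (21.0, 21.0),
    "salt bridge (aqueous)": (10.0, 30.0),
    "disulphide bond": (167.0, 167.0),
}


def thermal_energy(temp_k=298.15):
    """Thermal energy RT in kJ/mol at the given absolute temperature."""
    return R * temp_k


def in_units_of_rt(energy_kj_mol, temp_k=298.15):
    """Express a bond energy as a multiple of RT."""
    return energy_kj_mol / thermal_energy(temp_k)


if __name__ == "__main__":
    print(f"RT at 298.15 K = {thermal_energy():.2f} kJ/mol")
    for name, (low, high) in BOND_ENERGIES.items():
        print(f"{name:50s} {in_units_of_rt(low):5.1f} to {in_units_of_rt(high):5.1f} RT")
```

The weakest interactions are of the same order as RT, which is why non-covalent structure is sensitive to environmental conditions, whereas a disulphide bond lies far above thermal energy.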
In transmembrane segments, hydrophilic amino acids are in contact with water
(snorkelling effect) and hydrophobic amino acids are found in contact with the
fatty acid tails of the lipids (anti-snorkelling effect). The lipid/water interface is
formed by three amino acids with special properties: Trp, Tyr, and Lys (the so-
called aromatic belt; see Fig. 2.21). They have in common relatively long molecules
which are hydrophobic, but have a hydrophilic (polarised or ionised) end and can
make contact with lipids and water at the same time. Thus a transmembrane segment
has a well-defined position within the membrane and cannot bob up and down [15].
Fig. 2.21 Aromatic belt in a transmembrane protein, here outer membrane protein A (PDB-
code 1qjp). Trp (olive), Tyr (brown) and Lys (blue) are marked; they occur preferentially at the
membrane surface
Some proteins have several domains, that is, individually folding regions
connected by short segments. These individual domains can be isolated by gentle
proteolysis; they may maintain not only their structure, but even their catalytic
function. For example, the chaperone Hsc70 (see Fig. 15.1 on page 346) has three
domains: an ATPase-, a peptide-binding-, and a regulatory domain. Gentle treatment
with chymotrypsin will digest the links between those domains. The isolated ATPase
domain can still hydrolyse ATP.
If one looks at many different proteins, one will find certain patterns in the way
elements of secondary structures are arranged. These folding patterns are called
motifs. It is interesting to note that motifs are much more stable during evolution
than amino acid sequences. In other words, some proteins can be shown to be
homologous by their folding patterns, even though they no longer have significant
similarity in their amino acid sequence (e.g., the muscle protein actin, the enzyme
hexokinase, and the chaperone Hsc70).
According to their folding pattern, protein domains may be hierarchically
classified into groups. Because such classification is somewhat subjective,
different schemes have been suggested. One commonly used scheme is the
Structural Classification of Proteins (SCOP) database (since 2014 SCOP2,
http://scop2.mrc-lmb.cam.ac.uk/). Classification at present cannot be done
automatically, but requires expert knowledge. The following taxa are used in SCOP
(see also Figs. 2.22 and 2.23):
Class Coarse classification according to the relative content of α-helix and β-strand.
Fold Major structural similarity, the proteins have identical secondary structure
elements (at least in part) and the same topological connections. However, there
may be considerable variation in peripheral regions of a domain. Similarities may
arise from common origin or from convergent evolution.
Superfamily Domains have a common folding pattern and their functions are
similar, but sequence identity may be low. The ATPase domains of actin,
hexokinase, and Hsc70 are an example for a superfamily. Common evolutionary
origin is probable.
Family Proteins with high sequence homology (> 30 % identity) and/or similar
function. Proteins clearly have an evolutionary relationship. Identical proteins
are subclassified by species.
For a basic understanding of protein folding only the class is relevant:
all-α Proteins which contain only α-helices, or where the content of β-strands is
at least insignificant.
all-β Proteins which contain only β-strands, or where the content of α-helices is
at least insignificant.
2.3 Tertiary Structure 35
[Figure panel labels: hæmoglobin β (a.1.1.2, PDB-code 1HGA); cytochrome b562 (a.24.3.1, PDB-code 256B); lactate dehydrogenase domain 1 (c.2.1.5, PDB-code 1I0Z); thioredoxin (c.47.1.1, PDB-code 2TRX)]
Fig. 2.23 Stereo views of multidomain and membrane proteins, small proteins, and coiled-coils
Apart from SCOP there are also other approaches for protein classification,
in particular CATH (http://www.cathdb.info) and FSSP (http://ekhidna.biocenter.helsinki.fi/dali/),
but these yield largely similar results [17].
Proteins can be described by a set of concise classification strings (sccs)
according to their structure, for example, c.2.1.1 (class c = α/β, fold 2 = NAD(P)+-
binding ROSSMANN-fold domains, superfamily 1 = Alcohol dehydrogenase-like
and family 1 = Alcohol dehydrogenase). Within families, proteins are sorted by
species and isoform (Table 2.1).
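Because an sccs is simply four dot-separated fields (class, fold, superfamily, family), it can be taken apart mechanically. The sketch below is a minimal illustration, not part of SCOP or any official tool; the names Sccs, parse_sccs and same_superfamily are hypothetical, and only the four main structural classes are listed. The example strings are the ones that appear in the figure labels of this section.

```python
from typing import NamedTuple

# SCOP class letters for the four main structural classes only;
# further classes exist but are omitted in this sketch.
CLASS_NAMES = {
    "a": "all-alpha",
    "b": "all-beta",
    "c": "alpha/beta",
    "d": "alpha+beta",
}


class Sccs(NamedTuple):
    """One parsed concise classification string."""
    cls: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(text: str) -> Sccs:
    """Split a string such as 'c.2.1.5' into class, fold, superfamily, family."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {text!r}")
    cls, fold, superfamily, family = parts
    if len(cls) != 1 or not cls.isalpha():
        raise ValueError(f"class must be a single letter, got {cls!r}")
    return Sccs(cls.lower(), int(fold), int(superfamily), int(family))


def same_superfamily(a: str, b: str) -> bool:
    """True if two sccs agree in class, fold and superfamily."""
    return parse_sccs(a)[:3] == parse_sccs(b)[:3]


if __name__ == "__main__":
    for code in ["a.1.1.2", "a.24.3.1", "c.2.1.5", "c.47.1.1"]:
        p = parse_sccs(code)
        print(code, "->", CLASS_NAMES.get(p.cls, "other class"), p)
```

Comparing only the first three fields mirrors the hierarchy described above: proteins of one superfamily share class and fold but may belong to different families.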
2.4 Quaternary Structure 37
Internet Resources
Protein structures are stored in the Brookhaven Protein Data Bank (PDB)
in a unified format that can be used by modelling software such as
DeepView (formerly known as Swiss-PDB, http://www.expasy.ch/spdbv/mainpage.html).
Coordinates may be obtained from PDBlite (http://oca.ebi.ac.uk/oca-bin/pdblite),
OCA (http://bip.weizmann.ac.il/oca-bin/ocamain),
PDBsum (http://www.ebi.ac.uk/pdbsum/), or, if the EC-number is known,
from http://www.ebi.ac.uk/thornton-srv/databases/enzymes/.
PDBTM (http://pdbtm.enzim.hu/?) deals with membrane proteins. Three-dimensional
structures of nucleic acids may be retrieved from NDB (http://ndbserver.rutgers.edu/).
The protein structures presented in this book were created with DeepView
using data files obtained from OCA.
into a protein has important consequences for its function, which is often lost if the
subunits are separated (see chapter 7 on page 163).
In some proteins several polypeptides come together to form a subunit, which
repeats several times. Such subunits are called protomers. For example, hæmoglobin
is a diprotomer; each protomer consists of an α- and a β-chain (see Fig. 7.2
on page 166).
Assume a protein with 100 peptide bonds, each of which can assume 6 stable
conformations (α-helix, β↑↑-sheet, β↑↓-sheet, PII-helix, turn, coil).
Because each of these states is characterised by a φ, ψ-angle pair, this results in
$2^6 = 64$ possible angles per peptide bond and $100^{64} = 10^{128}$ for the entire protein
(note that this is an underestimate!).
Rotation around a σ-bond takes about $10^{-13}$ s, thus folding by random testing
of all possible angles would take $10^{128} \times 10^{-13}$ s $= 10^{128-13}$ s $= 10^{115}$ s. Our
best estimate for the age of the universe is $13.7 \times 10^{9}$ a $\approx 4.32 \times 10^{17}$ s. Proteins
therefore should never fold. You now also understand why it is so difficult to
calculate protein structures ab initio.
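The arithmetic of this estimate is easy to check numerically. The sketch below only reproduces the numbers given in the box (100^64 conformations, 10^-13 s per rotation, 13.7 × 10^9 years for the age of the universe); the conversion of a year into seconds assumes a Julian year of 365.25 days, and the function and variable names are hypothetical.

```python
import math

# Seconds per year; a Julian year of 365.25 days is assumed here.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_log10_seconds(n_conformations, log10_seconds_per_step=-13):
    """log10 of the time in seconds needed to test every conformation once."""
    return math.log10(n_conformations) + log10_seconds_per_step


# Number of conformations as stated in the text; Python integers are exact.
n_conformations = 100 ** 64

log10_folding_time = levinthal_log10_seconds(n_conformations)
age_of_universe_s = 13.7e9 * SECONDS_PER_YEAR
log10_excess = log10_folding_time - math.log10(age_of_universe_s)

if __name__ == "__main__":
    print(f"search time      ~ 10^{log10_folding_time:.0f} s")
    print(f"age of universe  ~ {age_of_universe_s:.3g} s")
    print(f"search time / age of universe ~ 10^{log10_excess:.0f}")
```

Working with logarithms keeps the numbers readable and makes the gap of almost a hundred orders of magnitude between the random search and the age of the universe explicit.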
In reality, folding is a rapid process; in E. coli at 37 °C a 100 amino acid protein
folds in about 5 s. During folding hydrophobic residues are buried in the interior
and hydrophilic residues appear on the outside of the protein, resulting in a compact
“molten globule” structure. This brings amino acids so close to each other that
the formation of hydrogen bonds between peptide bonds gives rise to secondary
structure.
The conformational freedom for protein folding is, however, much smaller than it
might appear at first sight. We have already discussed steric hindrance between the
atoms of neighbouring amino acids, which leads to large nonpermissible areas in the
RAMACHANDRAN plot. But steric hindrance is also possible between amino acids
farther along the protein chain, unless the protein is in an extended conformation
(upper left hand quadrant in the RAMACHANDRAN-plot) [24]. Thus it is not
possible to have a β-strand directly following an α-helix (or vice versa) without an
intervening coil. These restrictions no doubt explain the limited number of structural
motifs found in proteins.
2.5 Further Aspects of Protein Structure 39
\[
\frac{d[c]}{dt} = -k_0 [c] + k_1 [i]; \qquad [c] + [i] = n \tag{2.1}
\]
$k_0$ and $k_1$ are the rate constants for unfolding and folding, respectively; these
rate constants are related by the law of mass action:
\[
\frac{[i]_{eq}}{[c]_{eq}} = \frac{k_0}{k_1} = K \tag{2.2}
\]
If the energy penalty for the misfolded state U = 0, then the average mean
first passage time $\tau$ required to arrive for the first time at $[c] = n$ (all bonds
correctly folded) becomes
\[
\tau = \frac{1}{n k_0} (K + 1)^n \tag{2.4}
\]
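To get a feeling for Eq. 2.4, written here as τ = (K + 1)^n / (n k0), the mean first passage time can be evaluated for a few chain lengths. This is a minimal sketch: the function names are hypothetical, and the values of K and k0 used in the loop are arbitrary illustration values, not data from the text. The logarithmic form avoids overflow for large n.

```python
import math


def log10_mean_first_passage_time(n, k0, K):
    """log10 of tau = (K + 1)**n / (n * k0), i.e. Eq. 2.4 for U = 0.

    n  : number of bonds that must all be correct
    k0 : rate constant for unfolding (per second)
    K  : equilibrium constant k0 / k1 of Eq. 2.2
    """
    if n <= 0 or k0 <= 0 or K < 0:
        raise ValueError("n and k0 must be positive, K must be non-negative")
    return n * math.log10(K + 1.0) - math.log10(n * k0)


def mean_first_passage_time(n, k0, K):
    """tau in seconds; may overflow for very large n, use the log10 form then."""
    return 10.0 ** log10_mean_first_passage_time(n, k0, K)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    K = 5.0      # assumed equilibrium constant
    k0 = 1.0e9   # assumed unfolding rate constant in 1/s
    for n in (10, 50, 100):
        print(f"n = {n:3d}: tau ~ 10^{log10_mean_first_passage_time(n, k0, K):.1f} s")
```

Because n appears in the exponent, τ grows exponentially with chain length whenever K is not small, which is the unbiased random search of the preceding estimate expressed in kinetic terms.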
Fig. 2.24 Protein folding reduces free energy (G). The native structure is the one with the lowest
free energy. However, proteins may get kinetically trapped in local minima of the energy landscape.
During folding the entropy of the unfolded protein (Su) is reduced to that of the folded (Sf),
symbolised by the width of the funnel. This entropy reduction (more orderly, less probable state)
reduces the overall change in free energy of the folding process. This is shown here for a two-
dimensional reaction, but in protein folding each amino acid can adjust at least φ and ψ, so the
number of dimensions is impressive and it is not surprising that no way to calculate the 3D-structure
of a given protein sequence from first principles is known
Fig. 2.25 Movement in adenylate kinase during substrate binding [23]. Note how α-helices and
β-sheets provide an overall stable structure, with coils acting as hinges. Viewing this video
(http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3369945/bin/pcbi.1002555.s012.mov) requires a
connection to the Internet. The still image shows a superposition of the enzyme with either 2 ADP
(PDB-code 2cdn) or with AMP and the nonhydrolysable ATP-analog AMP-PNP (PDB-code 1ank)
bound
2.5.3 Morpheeins
[Plot: relative enzymatic activity (%), scale 0–100; x-axis label not recoverable]
the energy of thermal motion (RT, at room temperature 2.5 kJ/mol). Changes
in environmental conditions can break such bonds, leading to the denaturation of
proteins.
Strong acids and bases denature proteins by disrupting ionic interactions.
Organic Solvents can denature proteins by disrupting hydrophobic interac-
tions. Proteins are not soluble in organic solvents. More water soluble solvents
(e.g. ethanol or acetone) bind water and thus reduce the concentration of water
available to the protein.
Detergents disrupt hydrophobic interactions. They can denature proteins
without precipitating them.
Salts precipitate proteins because they reduce the concentration of water avail-
able to maintain protein structure.
Small hydrophilic substances such as urea can denature proteins when they
are present in high concentration, both by binding water and by binding to the
protein.
Heavy metal ions (lead, mercury) bind to carboxylate or sulphydryl groups of
proteins. That’s why they are toxic!
Heat An increase in temperature leads to increased molecular motion. This can
result in breaking hydrogen bonds. If some hydrogen bonds break, the structure
of the protein (say, an α-helix) is weakened; that is, other hydrogen bonds become
easier to break. Denaturation by increasing temperature therefore is a process
that starts quite suddenly at a certain critical temperature and is completed
at a temperature only marginally higher. Renaturation is sometimes possible
with small proteins (ribonuclease, lysozyme) under laboratory conditions, but
denaturation is irreversible in the real world (boiled egg).
Humans die if their core body temperature exceeds 42 °C, when key
proteins lose their function. For a useful application of thermal protein
denaturation see Fig. 2.26.
The covalent bonds in proteins are more robust, but peptide bonds are hydrolysed
by heating in strong acids and bases, and by proteolytic enzymes. Disulphide bonds
are cleaved by reducing agents; oxidising agents can form disulphide bonds from
SH-groups.
Fig. 2.27 Top: Dimerisation of HIV-protease (PDB-code 1DAZ) occurs by the formation of an
antiparallel β-sheet from the ends of both subunits. This brings the two catalytic aspartate residues
(D25 and D25′) close together. Middle: Space filling model of HIV-Protease. The β-sheet formed
by the N- and C-terminal ends of both subunits is clearly visible. Bottom: Substances with two
peptides linked by a stiff backbone can interdigitate into the dimerisation site of a monomer and
prevent dimerisation
2.6 Posttranslational Modifications of Proteins 47
This brings the catalytic aspartate residues (D25 in each subunit) together,
thus forming the catalytic site of the enzyme. Several pharmaceuticals are
on the market which bind to the catalytic centre of the enzyme, but these
are beginning to lose their effectiveness due to the development of resistant
virus strains. Also they are very hydrophobic compounds, which makes
their pharmaceutical use difficult. A new class of protease inhibitors binds
to the dimerisation site and prevents the formation of the active enzyme.
Development of such substances requires an intimate understanding of the
structure and function of an enzyme (see Fig. 2.27 and [4] for further details).
The human genome contains about 23 000 genes [7, 8, 35]. mRNA-processing (alternative
splicing, mRNA editing etc.) results in about 3 mRNAs per gene (Fig. 2.28).
Posttranslational modification of the proteins produced from them creates about 10
different protein species from each mRNA. Thus the human proteome consists of about
$10^6$ proteins, with different functions, regulation, destruction…
The properties of proteins can be changed by posttranslational modification; in
some cases this can be done (or undone) quickly in response to environmental
stimuli, for example, exposure to hormones. Such modifications can switch enzymes
between active and inactive states and are required for the proper targeting of a pro-
tein to subcellular structures. The following reactions are of particular importance:
2.6.1 Glycosylation
Fig. 2.28 The pathway from genome to phenome is studied at different levels with different methods. The resulting complex data can only be handled by
advanced computing techniques
[Structures shown: 5-amino-5-deoxy-D-glucopyranose (nojirimycin); 1,5-dideoxy-1,5-imino-D-glucitol (deoxynojirimycin); N-butyldeoxynojirimycin (miglustat); 1-deoxygalactonojirimycin]
Fig. 2.29 Sugar-analogues where the aldehyde-group is replaced by NH2 act as glycosylation
inhibitors. They can be used as antiviral drugs, and also in some inherited diseases (mucopolysac-
charidoses), where the enzymes that degrade glycoproteins in the lysosomes do not work properly
2.6.2 Glucation
Fig. 2.30 Glucation of proteins by the aldehyde group of glucose proceeds via an unstable
SCHIFF-base and AMADORI-rearrangement to a stable ketosamine. During roasting, this is
converted into caramels via the MAILLARD-reaction. These are responsible for the taste of
cooked food. Ketosamine may also be converted to Advanced glucation end products (AGE) by
STRECKER-degradation
Glutathione (γ-Glu-Cys-Gly)
Fig. 2.31 The tripeptide glutathione serves as a redox-coupler in our cells. Left: Structure of
glutathione. Right: Coupling of detoxification of reactive oxygen species (ROS, here H2O2) and
consumption of NADPH + H+ by glutathione
When cytosolic proteins are used in the laboratory one has to make sure
that their SH-groups are not oxidised by air oxygen, which would lead to
inactivation. The buffers therefore usually contain an antioxidant such as
β-mercaptoethanol or dithiothreitol.
Bacterially expressed eukaryotic proteins are often misfolded and pre-
cipitate as inclusion bodies because bacteria are less active in disulphide
formation than eukaryotes. However, bacteria do have an enzyme operon
(Dsb, short for disulphide bond) for formation and isomerization of protein
disulphide bonds in their periplasm.
2.6.4 Proteolysis
released into the intestine, they are activated by cleaving off a part of the enzyme
that was blocking the active site. Cascades of proteolytic enzymes make up our
blood-clotting and complement system (see Sect. 11.3 on page 249). Prohormones
(e.g. insulin) are activated in a similar manner. On the other hand, proteins no longer
needed can be inactivated by proteolysis (e.g., cyclins in cell cycle).
Proteolysis may also be used to remove signal peptides. For example, some
proteins destined for the intermembrane space of mitochondria carry a signal
sequence for mitochondrial import (see Fig. 2.10) which leads to their import into
the mitochondrial matrix. There the signal peptide is cleaved off by matrix protease,
exposing a second signal directing the protein’s export into the intermembrane space
through a different transporter.
A special form of proteolysis is protein splicing [25]. This reaction is carried
out by a protease within the protein itself, the intein. This protease cuts itself out
of the protein and rejoins the flanking segments (exteins), and all this without
requiring any external proteins, cofactors or sources of energy such as ATP! The
intein protein, once cut out of the host protein, has endonuclease activity. Intein
genes are mobile elements (parasitic DNA); the corresponding mRNA can be used
to direct the synthesis of cDNA by reverse transcriptases encoded by retroviruses
inside the cellular DNA. This cDNA is then integrated into genes of other proteins
by the endonuclease activity of the intein. As the intein cuts itself out of that protein,
this insertion has few negative consequences for the host. Inteins are therefore the
smallest possible parasites [13, 14]. They are now used as self-cleaving affinity-tags
to make protein pharmaceuticals.
2.6.5 Hydroxylation
Protein hydroxylation occurs on Pro and Lys residues. We have already discussed
the importance of Pro-hydroxylation for collagen formation (see page 28).
Proteins regulated by Pro-hydroxylation are the hypoxia induced transcription
factors (HIF) [16]. These consist of two subunits, α and β. In the presence of oxygen,
the α-subunit is hydroxylated on P402 and P564 by HIF-prolyl hydroxylases
(PHD-1, -2 and -3, EC 1.14.11.29), leading to its proteasomal destruction. In the
absence of oxygen, the α-subunits accumulate and form a complex with β, which
binds to hypoxia response elements in the cellular DNA. As a consequence, oxygen
consumption of the cell is down-regulated; it can survive a low oxygen supply for a
longer time.
This mechanism may one day be exploited to increase the survival time of
organs in infarct or transplantation, e.g., with inhibitors of PHD-1 (currently
available inhibitors produce too many side effects due to concomitant inhibi-
tion of PHD-2 and -3).
[Reaction scheme: a His residue in the peptide chain is phosphorylated on a ring nitrogen, with ATP converted to ADP]
Fig. 2.32 Phosphorylation of histidine residues in His-kinases. Although mammals do not use
regulatory phosphorylation of His, bacteria and fungi regulate the expression of pathogenicity
factors that way. Thus, His-kinases may become important drug targets
2.6.6 Phosphorylation/Dephosphorylation
The transfer of phosphoryl groups from ATP to the hydroxy groups of Ser, Thr, and
Tyr (rarely onto His-nitrogen or Asp and Glu COO−) is important for the reversible
regulation of enzyme activity. The transfer is catalysed by protein kinases, and the
removal by protein phosphatases. Thus the reaction is rapidly reversible at minimal
expense for the cell (a single high energy phosphate bond). One-third of all proteins
in the cell undergo regulatory phosphorylation/dephosphorylation cycles (Fig. 2.32).
2.6.7 Acetylation/Deacetylation
Transfer of acetyl groups from acetyl-CoA onto the ε-amino group of Lys by protein
acetylases, and their removal by protein deacetylases, are also used for regulation
of enzymatic activity. The human acetylome, containing 1750 proteins, has recently
been determined [5]. Many DNA-binding proteins are regulated by acetylation,
because the acetylated Lys is much less likely to be protonated, hence less likely
to bind to the negative charges on DNA. In addition, the change in protonation
also affects the binding of transcription factors. The activity of metabolic enzymes
may also be regulated by Lys-acetylation, for example, the glycolytic activity of
Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is increased by acetylation;
gluconeogenesis is stimulated by deacetylation.
Three classes of deacetylases are known: class I and II hydrolyse the bond with
water, whereas class III deacetylases (sirtuins) use NAD+ (see Fig. 2.33), thus their
activity depends on the nutritional status of the cell [11, 33, 38, 42]. This is probably
the mechanism behind the observation that mild caloric restriction prolongs the life
expectancy of lab animals.
[Reaction schemes: a class I or II deacetylase hydrolyses the N-acetyl-lysyl group with H2O, releasing acetate; a class III deacetylase transfers the acetyl group to NAD+, releasing nicotinamide and O-acetyl-ADP-ribose (second messenger)]
Fig. 2.33 There are three classes of deacetylases. Enzymes of classes I and II simply hydrolyse
the amide bond. However, enzymes of class III (sirtuins) use NAD+, which is available in the
fasting state, but converted to NADH + H+ in the fed state. Thus sirtuins regulate gene expression
depending on the energy available to the cell
2.6.8 Methylation/Demethylation
Transfer of methyl groups from S-adenosyl methionine (SAM) onto proteins may
also serve regulatory purposes, but we know very little about it [6]. Transfer can be
to
carboxyl groups, forming methyl esters. This reaction is used to mark dam-
aged proteins for destruction, but also in signal cascades of unknown function.
Addition of
palmitoyl- (fatty acid) groups to internal Cys or Ser
myristoyl- (fatty acid) groups to N-terminal Gly
farnesyl- or geranylgeranyl (isoprenoid) groups to C-terminal Cys
converts cytosolic enzymes into membrane-bound ones (attached to the cytosolic
leaflet). Because this is required for the activation of some enzymes, the
transferases are possible drug targets (e.g., for anticancer drugs).
2.6.10 S-Nitrosylation
2.6.11 ADP-Ribosylation
ADP-Ribosylation (see Fig. 2.35) is used by some bacterial toxins (Vibrio cholerae,
Bordetella pertussis) to inactivate cellular proteins. This is the starting point of
the patho-mechanism of the diseases associated with these bacteria (cholera and
whooping cough, respectively).
2.6.12 Deamidation
Deamidation is the removal of the acid amide group from Gln or Asn, forming Glu
and Asp, respectively. It may be followed by racemization (formation of D-amino
acids).
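The conversion described above can be written as a simple substitution on a one-letter sequence. The sketch below is only an illustration: the peptide GANQSK is a hypothetical example, and the mass shift of about +0.984 Da (replacement of -NH2 by -OH, monoisotopic) is added here as a well-known value, not taken from the text.

```python
# Deamidation products in one-letter code: Asn -> Asp, Gln -> Glu
DEAMIDATION = {"N": "D", "Q": "E"}

# Monoisotopic mass change when -NH2 is replaced by -OH (in Da)
MASS_SHIFT_DA = 0.98402


def deamidation_sites(sequence: str) -> list[int]:
    """Return the 1-based positions of all Asn and Gln residues."""
    return [i + 1 for i, aa in enumerate(sequence) if aa in DEAMIDATION]


def deamidate(sequence: str, position: int) -> str:
    """Return the sequence with the residue at 1-based `position` deamidated."""
    residue = sequence[position - 1]
    if residue not in DEAMIDATION:
        raise ValueError(f"{residue}{position} is neither Asn nor Gln")
    return sequence[: position - 1] + DEAMIDATION[residue] + sequence[position:]


peptide = "GANQSK"  # hypothetical peptide, for illustration only
sites = deamidation_sites(peptide)
product = deamidate(peptide, sites[0])
print(sites, product, f"+{MASS_SHIFT_DA:.3f} Da")
```

The sketch ignores the possible subsequent racemization, which does not change the one-letter sequence.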
[Figure: scheme showing NfκB and IkB]
[Figure: reaction scheme with the labels NAD+, cholera toxin and protein Arg]
Ubiquitin, an 8.6 kDa protein (see Fig. 3.17 on page 88), is transferred to proteins
by a group of ubiquitin-ligases, of which three classes exist. E1-ligase (UbA1) (see
Fig. 2.36) forms a thioester bond with the C-terminal glycine residue of ubiquitin
in an ATP-dependent reaction. This activated ubiquitin is then transferred to an
[Figure 2.36: reaction cycle of the E1-ubiquitin ligase; labels: E1, E2-SH, E2-S-Ub, Ub, S-Ub, AMP-Ub, ATP, AMP, PPi]
Fig. 2.36 The E1-ubiquitin ligase binds two molecules of ubiquitin: one is covalently bound to a
thiol residue of the enzyme, the second is bound to AMP. The first ubiquitin molecule is transferred
to the E2-ubiquitin ligase, then the second ubiquitin is moved to the SH-group, and the AMP is
exchanged for ATP. Then another ubiquitin is bound to the nucleotide, pyrophosphate is released
in the process, and the enzyme is ready for a new cycle
E2-ligase (UbCs) and from there to an E3-ligase. All three ligases bind ubiquitin as
thioester. There is only one (or, in some species, a few) E1 (UbA1) but several UbCs
and many E3-ligases, which are often specific for a single target. Ubiquitin is usually
transferred to the ε-amino group of a lysine, forming an isopeptide bond. Binding
to the N-terminus, or to Cys, Ser, and Thr has been described, but is rare. Ubiquitin
contains seven functionally distinct Lys-residues, whose ε-amino groups form
isopeptide bonds with the C-terminal Gly of other ubiquitin molecules, resulting in
long chains of poly-ubiquitin. Protein degradation in the proteasome, for example,
results from poly-ubiquitination at Lys-11 and/or Lys-48. For the discovery of
ubiquitin-mediated protein degradation, A. CIECHANOVER, A. HERSHKO & I. ROSE
received the Nobel Prize in Chemistry in 2004.
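The seven Lys-residues and the C-terminal Gly can be located directly in the sequence. The following sketch uses the 76-residue sequence of human ubiquitin as commonly listed in sequence databases; the sequence is not given in the text and is included here as an assumption for illustration.

```python
# Human ubiquitin, one-letter code (assumed from public sequence databases)
UBIQUITIN = (
    "MQIFVKTLTG" "KTITLEVEPS" "DTIENVKAKI" "QDKEGIPPDQ"
    "QRLIFAGKQL" "EDGRTLSDYN" "IQKESTLHLV" "LRLRGG"
)


def residue_positions(sequence: str, residue: str) -> list[int]:
    """Return the 1-based positions of `residue` in `sequence`."""
    return [i + 1 for i, aa in enumerate(sequence) if aa == residue]


# Lys residues whose side-chain amino groups can form isopeptide bonds
lysines = residue_positions(UBIQUITIN, "K")
# The last residue, which is linked to the target (or to another ubiquitin)
c_terminal_residue = UBIQUITIN[-1]

print("length:", len(UBIQUITIN))
print("Lys positions:", lysines)
print("C-terminal residue:", c_terminal_residue)
```

With this sequence the Lys positions include 11 and 48, the two sites named above for proteasomal targeting.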
There is a whole family of ubiquitin-like modifiers which are transferred in a
similar manner, but whose function we are only beginning to understand (see, e.g.,
[36] for a recent review). Transfer is often by an E2-ligase directly; E3-ligases are
required for ubiquitin presumably because of the large number of different proteins
labelled with this marker. These ubiquitin-like modifiers (UbLs) are involved in the
regulation of endocytosis, apoptosis, cell cycle, DNA repair and other processes.
There are even ubiquitin-like proteins (Isg15 and Fat10), which are regulated by
interferon and modulate immune response. The mechanisms involved in these
regulatory pathways are, however, poorly understood.
Ubiquitin and ubiquitin-like modifiers are removed by deconjugases that cleave
the isopeptide bond. These are often highly specific for both the tag and the modified
target.
2.7 The Relationship Between Protein Structure and Function...
[Figure 2.37: reaction scheme; labels: folding, cyclisation, τ = 10 min, dehydration, oxidation, τ = 1 h, O2, H2O2; Aequorea victoria GFP, λex = 395 + 488 nm, λem = 508 nm]
Fig. 2.37 Maturation of the fluorophore in GFP. The reaction does not require enzymes or
cofactors except molecular oxygen
Fig. 2.38 Stereo representation of the crystal structure of GFP (PDB-code 1ema)
Fig. 2.39 Amino acids that interact with the fluorophore in the active centre of GFP
2.8 Exercises
2.8.1 Problems
2.2. Why can the isoelectric point of a protein not be predicted from its amino acid
composition and the known isoelectric points of the amino acids?
2.4. Which of the following amino acids would you expect to find in the core of a
protein?
A Lys
B Arg
C Glu
D Leu
E Asp