
Chapter 1

Amino Acids

Abstract Amino acids are carboxylic acids with an amino group in the α-position.
With the exception of glycine, all amino acids are chiral; living organisms usually
use the L-form. Twenty-two different amino acids are encoded in genes.
Polycondensation of amino acids leads to peptides and proteins. The side-chains of
the various amino acids have different physicochemical properties, which allow
these amino acids to fulfil different functions inside a protein.

1.1 Basic Structure of Amino Acids

Amino acids contain a carboxy group, an amino group, a hydrogen atom, and a
variable side-chain R (“residue”). The simplest amino acid is glycine, where R is
a hydrogen atom. Because the α-carbon of Gly carries two identical substituents
(two hydrogen atoms), it is not a stereocentre; thus glycine, unlike all other amino
acids, is not chiral (see Fig. 1.1). Only L-amino acids are found in proteins. However,
D-amino acids are found in the bacterial cell wall and in several antibiotics. In
humans, D-Ser is produced by astrocytes to regulate NMDA-receptor responses to Glu
and long-term potentiation.
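The carboxy group has a pKa close to 2, whereas the pKa of the amino group ranges
from 9 to 10. Thus, amino acids can exist in different protonation states:

$$\mathrm{H_3N^{+}{-}CHR{-}COOH} \;\rightleftharpoons\; \mathrm{H_3N^{+}{-}CHR{-}COO^{-}} \;\rightleftharpoons\; \mathrm{H_2N{-}CHR{-}COO^{-}}$$

At neutral pH, amino acids exist as zwitterions, molecules that possess both a
positive and a negative charge.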
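The relative amounts of the three protonation states follow from the two acid-dissociation equilibria. The following is a minimal Python sketch, assuming a simple amino acid with exactly one carboxy and one amino group and using the glycine pKa values from Fig. 1.3 (2.34 and 9.60); the function name `species_fractions` is illustrative only.

```python
def species_fractions(ph: float, pk1: float, pk2: float) -> tuple[float, float, float]:
    """Mole fractions (cation, zwitterion, anion) of a simple amino acid.

    cation     H3N+-CHR-COOH   (dominant below pK1)
    zwitterion H3N+-CHR-COO-   (dominant between pK1 and pK2)
    anion      H2N-CHR-COO-    (dominant above pK2)
    """
    h = 10.0 ** -ph    # [H+]
    k1 = 10.0 ** -pk1  # dissociation constant of the carboxy group
    k2 = 10.0 ** -pk2  # dissociation constant of the amino group
    # Relative populations follow from K1 = [Z][H+]/[C] and K2 = [A][H+]/[Z]
    cation, zwitterion, anion = h * h, h * k1, k1 * k2
    total = cation + zwitterion + anion
    return cation / total, zwitterion / total, anion / total


if __name__ == "__main__":
    for ph in (1.0, 2.34, 7.0, 9.60, 12.0):
        c, z, a = species_fractions(ph, pk1=2.34, pk2=9.60)  # glycine
        print(f"pH {ph:5.2f}: cation {c:.3f}  zwitterion {z:.3f}  anion {a:.3f}")
```

At pH 7 more than 99 % of glycine is present as the zwitterion; at pH = pK1 the cation and the zwitterion are present in equal amounts.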
Several amino acids contain additional acidic or basic groups in their side-chains.
The following table gives the pKa values of these side-chains in water; inside a
protein, the pKa depends on the unique environment created by neighbouring
side-chains [1]:

© Springer International Publishing Switzerland 2015
E. Buxbaum, Fundamentals of Protein Structure and Function,
DOI 10.1007/978-3-319-19920-7_1


Fig. 1.1 Top left: Basic structure of an amino acid. Amino acids can form zwitterions (Zwitter
(Ger.) = hermaphrodite) with a positive and a negative charge. Top right: Because the α-carbon
bears four different substituents, it is chiral (exception: glycine, where R = H). In L-amino acids,
if the α-carbon is placed on the paper plane with the hydrogen facing you, the remaining substituents
read “CORN” clockwise. Bottom: Nomenclature of carbon atoms, using lysine as an example. The
carboxy carbon is designated C′; the following carbon atoms are labelled with the letters of the Greek
alphabet. Sometimes the last C-atom is called ω, irrespective of the chain length

Group          Amino acid  pKa in water
Carboxy        Glu, Asp    4.1
Selenol        Sec         5.2
Imidazole      His         6.2
Sulphydryl     Cys         8.3
Phenolic OH    Tyr         10.1
Amino          Lys         10.5
Guanidino      Arg         12.0
Alcoholic OH   Ser, Thr    15.9
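For a single ionisable group, the HENDERSON–HASSELBALCH equation gives the charged fraction directly. A minimal Python sketch using the water pKa values of the table above; the pH of 7.4 is an assumption added for illustration, and `charged_fraction` is a hypothetical helper name.

```python
# pKa values of side-chains in water (table above).
# True = acidic group (negative when deprotonated),
# False = basic group (positive when protonated).
SIDE_CHAIN_PKA = {
    "Glu, Asp": (4.1, True),
    "Sec": (5.2, True),
    "His": (6.2, False),
    "Cys": (8.3, True),
    "Tyr": (10.1, True),
    "Lys": (10.5, False),
    "Arg": (12.0, False),
    "Ser, Thr": (15.9, True),
}


def charged_fraction(pka: float, is_acid: bool, ph: float) -> float:
    """Fraction of a group that carries a charge at the given pH."""
    if is_acid:
        return 1.0 / (1.0 + 10.0 ** (pka - ph))  # deprotonated -> negative
    return 1.0 / (1.0 + 10.0 ** (ph - pka))      # protonated -> positive


if __name__ == "__main__":
    ph = 7.4  # assumed physiological pH
    for name, (pka, is_acid) in SIDE_CHAIN_PKA.items():
        sign = "-" if is_acid else "+"
        print(f"{name:9s} ({sign}) {charged_fraction(pka, is_acid, ph):.3f}")
```

In water at pH 7.4 only about 6 % of His side-chains are protonated, whereas Asp, Glu, Lys and Arg are almost fully charged; inside a protein the shifted pKa values [1] can change this picture considerably.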

You have to be able to recognise the 22 amino acids (see Fig. 1.2). The amino
acid side-chains can engage in a number of noncovalent interactions and covalent
bonds (Table 1.1):

1.2 The Isoelectric Point

From your chemistry lessons you know how to determine the pKa of an acid or base,
the pH at which half of the molecules are charged. A compound like an amino acid,
which can act as both acid and base, has another important property: the isoelectric
point pI, which is the pH at which the number of positive charges on the molecule
is the same as the number of negative charges. At the pI the molecule has no net
charge, so its ability to interact with water, and with it its solubility, is lowest.

Fig. 1.2 The 22 amino acids encoded by genes. Once incorporated into proteins, amino acids may
be further modified. Pyrrolysine has been found only in some archaea and bacteria; it is encoded
by the “amber” stop codon UAG. Selenocysteine is encoded by the UGA “opal” stop codon (see
Fig. D.1 on page 493). Acidic groups are marked red, basic groups blue, polar groups orange, and
hydrophobic groups green. Note that Thr and Ile have a chiral β- in addition to the α-carbon. Pyl
has two chiral carbon atoms in the ring
Table 1.1 Bonds formed by various amino acids. For posttranslational modifications
the following abbreviations were used: A = acetyl, F = fatty acid, I = isoprenoid,
M = methyl, N = nitrosyl, OH = hydroxyl, P = phosphate, S = sugar, SS = disulphide,
U = ubiquitin (-like), Y = AMPylation

Amino acid               Hydrophobic  Hydrogen  Salt   Covalent
                         interaction  bond      bond   modification
Alanine (Ala, A)         +            –         –      –
Arginine (Arg, R)        +            +++       +++    –
Asparagine (Asn, N)      –            +++       –      S
Aspartate (Asp, D)       –            ++        +++    (P), M
Cysteine (Cys, C)        +            (+)       +      SS, M, F, I, N
Glutamine (Gln, Q)       +            ++        –      –
Glutamate (Glu, E)       +            ++        +++    (P), M
Glycine (Gly, G)         (+)          –         –      F
Histidine (His, H)       –            +++       +      (P)
Lysine (Lys, K)          ++           +++       –      OH, A, M, U
Leucine (Leu, L)         +++          –         –      –
Isoleucine (Ile, I)      +++          –         –      –
Methionine (Met, M)      ++           –         –      –
Phenylalanine (Phe, F)   +++          –         –      –
Proline (Pro, P)         ++           –         –      OH
Pyrrolysine (Pyl, O)     +++          +         –      –
Selenocysteine (Sec, U)  +            ++        +      –
Serine (Ser, S)          +            +         –      P, S, F
Threonine (Thr, T)       ++           +         –      P, S, Y
Tryptophan (Trp, W)      ++           +         –      –
Tyrosine (Tyr, Y)        ++           +         –      P, Y
Valine (Val, V)          ++           –         –      OH

How can we get the isoelectric point? Looking at the titration curve of glycine
(see Fig. 1.3, top) we see that below pK1 most of the molecules bear one positive
charge at the amino group; the carboxy group is uncharged.
Above pK2 it is the other way round: the carboxy group bears a (negative) charge
and the amino group is uncharged. Right in the middle between pK1 and pK2 there
is an inflection point in the titration curve; this is the pI.
Thus we remember: In a chemical that has one acidic and one basic group, pI is
the average of the two pK values:
$$\mathrm{pI} = \tfrac{1}{2}\,(\mathrm{p}K_1 + \mathrm{p}K_2) \qquad (1.1)$$
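Eq. (1.1) can be derived from the two dissociation equilibria. Writing C, Z and A for the cationic, zwitterionic and anionic forms (labels introduced here for the derivation), and K1, K2 for the dissociation constants of the carboxy and amino groups: the zwitterion itself is neutral, so at the pI the small amounts of cation and anion must balance, [C] = [A]. With the glycine values from Fig. 1.3:

```latex
\begin{gathered}
K_1 = \frac{[\mathrm{Z}]\,[\mathrm{H^+}]}{[\mathrm{C}]}, \qquad
K_2 = \frac{[\mathrm{A}]\,[\mathrm{H^+}]}{[\mathrm{Z}]}
\;\Longrightarrow\;
\frac{[\mathrm{C}]}{[\mathrm{A}]} = \frac{[\mathrm{H^+}]^2}{K_1 K_2} \\
[\mathrm{C}] = [\mathrm{A}]
\;\Longrightarrow\; [\mathrm{H^+}]^2 = K_1 K_2
\;\Longrightarrow\; \mathrm{pI} = \tfrac{1}{2}\,(\mathrm{p}K_1 + \mathrm{p}K_2) \\
\mathrm{pI}_{\mathrm{Gly}} = \tfrac{1}{2}\,(2.34 + 9.60) = 5.97
\end{gathered}
```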
But how do we do it when there are 3 or more ionisable groups?
Fig. 1.3 Titration curves of glycine, glutamic acid and histidine (pH plotted against added
OH⁻ equivalents, mol/mol). Glycine: pK1 = 2.34, pK2 = 9.60, pI = 5.97. Glutamate:
pK1 = 2.19, pKR = 4.25, pK2 = 9.67, pI = 3.22. Histidine: pK1 = 1.82, pKR = 6.00,
pK2 = 9.17, pI = 7.59. At the isoelectric point amino acids carry an equal number of
positive and negative charges, thus they have no net charge. The isoelectric point can be
calculated as the average between the pKa values on each side of that uncharged form

First we write down the various forms of the molecule, with the corresponding
pKa values between them. Then we count the positive and negative charges on each
form and identify the electrically neutral one. The pI can be calculated as the average
of the pKa values on either side of it, just as for molecules with only two
ionisable groups.
Glutamic acid (see Fig. 1.3, middle), for example, carries only one positive charge,
at the amino group, at very low pH (1st form). As the pH rises beyond 2.19, more
and more protons are lost from the α-carboxy group. This (2nd) form carries one
positive and one negative charge, and is the neutral one. Around pH 4.25, a second
proton is lost from the side-chain carboxy group (3rd form, one positive and two
negative charges), and around pH 9.67 a further proton is lost from the amino group
(4th form, no positive and two negative charges). Thus the pI is calculated as the
average of 2.19 and 4.25, which is 3.22.
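The rationale for histidine (see Fig. 1.3, bottom) is similar. At very low pH both the
α-amino group and the imidazole ring are protonated (net charge +2); loss of the
carboxy proton (pK1 = 1.82) gives a net charge of +1, and deprotonation of the
imidazole ring (pKR = 6.00) yields the electrically neutral 3rd form. The pI is
therefore the average of 6.00 and 9.17, which is 7.59.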
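The recipe above (order the forms, find the neutral one, average the flanking pKa values) can be written as a short function. This is a minimal Python sketch, assuming that every deprotonation step lowers the net charge by exactly one and that the pKa values are well separated; the function name is hypothetical.

```python
def isoelectric_point(pkas: list[float], charge_at_low_ph: int) -> float:
    """pI of a molecule whose ionisable groups each change the charge by 1.

    pkas: all pKa values (any order); charge_at_low_ph: net charge of the
    fully protonated form. Assumes the pKa values are well separated, so that
    the forms appear one after the other as the pH rises.
    """
    pkas = sorted(pkas)
    # Form k (k = 0 .. len(pkas)) lies between pkas[k-1] and pkas[k] and has
    # lost k protons, so its net charge is charge_at_low_ph - k.
    k = charge_at_low_ph  # index of the electrically neutral form
    if not 1 <= k <= len(pkas) - 1:
        raise ValueError("the neutral form is not flanked by two pKa values")
    # Average of the pKa values on either side of the neutral form
    return (pkas[k - 1] + pkas[k]) / 2.0


if __name__ == "__main__":
    print(isoelectric_point([2.19, 4.25, 9.67], +1))   # glutamic acid
    print(isoelectric_point([1.82, 6.00, 9.17], +2))   # histidine
    print(isoelectric_point([2.18, 8.95, 10.53], +2))  # lysine
```

Up to floating-point rounding this reproduces the pI values of 3.22 and 7.59 given above and, for lysine, the value of 9.74 derived in the solution to Problem 1.3.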
The solubility of amino acids is lowest at the pI, as interaction with water is
reduced.

This has clinical applications: in the inherited disease cystinuria, large
amounts of cystine are excreted in the urine. As cystine is poorly soluble at
neutral pH, this can result in kidney stones. Keeping the urine pH above 8.5,
well above the pI, makes cystine more soluble and reduces the amount
precipitating in the urinary tract.

Solubility of Proteins Is Minimal at the pI


Just as for amino acids, the pI of a protein is the pH at which there are as
many positive as negative charges. At this point solubility of the protein is
minimal, which can be used to purify a protein by precipitation (see Sect. 3.1.2
on page 66). The behaviour of proteins toward ion exchange columns and
during electrophoresis also depends on charge. Because the number of ionic
interactions between the amino acids of a protein is smallest at the pI, protein
stability is reduced.
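Naïvely, the pI of a protein could be calculated simply from the amino acid
composition and the HENDERSON–HASSELBALCH equation:

$$n_- = \sum_{i} \frac{1}{1 + 10^{\,\mathrm{p}K_{-,i} - \mathrm{pH}}}, \qquad n_+ = \sum_{j} \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_{+,j}}} \qquad (1.2)$$

where the sums run over all acidic groups i and all basic groups j, with pK− and
pK+ the pKa values of the negative and positive groups, respectively. The pI is the
pH at which n− = n+.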

Interactions between charged groups within the protein significantly
change the pKa values [1]. In addition, even the tabulated pKa values for
amino acids in water vary among different authors by about 0.5 pH
units. Thus the protein charge at different pH, and the pI, have to be
determined in vitro rather than in silico. There are, however,
programs available which try to estimate the pI of a protein, for example,
protcalc.sourceforge.net/. On isoelectric.ovh.org/ the
issue is discussed further and the algorithms commonly used are described.
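The naive calculation of Eq. (1.2) can be sketched in a few lines of Python. This is a minimal sketch, assuming the water pKa values of Table 1.2 (pK3 column) for the side-chains and, as a hypothetical simplification, the α-amino and α-carboxy pKa values of free alanine for the chain termini. It deliberately ignores the pKa shifts inside proteins that make such estimates unreliable; all names are illustrative.

```python
# Side-chain pKa values in water (pK3 column of Table 1.2)
ACIDIC = {"D": 3.65, "E": 4.25, "U": 5.20, "C": 10.28, "Y": 10.07}  # negative when deprotonated
BASIC = {"H": 6.00, "K": 10.53, "R": 12.48}                          # positive when protonated
PK_N_TERM = 9.69  # assumption: alpha-amino pKa of free Ala (Table 1.2)
PK_C_TERM = 2.34  # assumption: alpha-carboxy pKa of free Ala (Table 1.2)


def net_charge(seq: str, ph: float) -> float:
    """n_plus - n_minus of Eq. (1.2) for a one-letter sequence at this pH."""
    acid_pkas = [ACIDIC[aa] for aa in seq if aa in ACIDIC] + [PK_C_TERM]
    base_pkas = [BASIC[aa] for aa in seq if aa in BASIC] + [PK_N_TERM]
    n_minus = sum(1.0 / (1.0 + 10.0 ** (pk - ph)) for pk in acid_pkas)
    n_plus = sum(1.0 / (1.0 + 10.0 ** (ph - pk)) for pk in base_pkas)
    return n_plus - n_minus


def estimate_pi(seq: str, tol: float = 1e-4) -> float:
    """Find the pH where n_plus = n_minus by bisection.

    The net charge falls monotonically as the pH rises, so there is at most
    one crossing; this sketch assumes it lies between pH 0 and 14.
    """
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0.0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for peptide in ("G", "DEAD", "KKRH"):
        print(peptide, round(estimate_pi(peptide), 2))
```

For a single residue with no ionisable side-chain the result is simply the average of the two terminal pKa values, as in Eq. (1.1).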

1.3 The One-Letter Code

In most cases amino acid names are abbreviated with the first three letters of their
names. These abbreviations are easy to remember; however, they use up unnecessary
memory in computer databases. The 22 proteinogenic amino acids can be encoded
by the 26 letters of the Roman alphabet (leaving space for some rare amino acids),
and then each amino acid in a protein sequence uses up only 1 rather than 3 bytes of
storage space. Unfortunately, several amino acids start with the same letter (such as
Ala, Arg, Asp, and Asn), so we cannot simply use the first letter to encode them.
The following list should help you to remember one-letter codes:
• Amino acids with a unique first letter: Cys, His, Ile, Met, Ser, Val
• Where several amino acids start with the same letter, common amino acids are
given preference: Ala, Gly, Leu, Pro, Thr
• Letters other than the first letter are used for Asn (asparagiN), Arg (aRginine),
Tyr (tYrosine)
• Similar sounding names: Asp (asparDic acid), Glu (glutEmate), Gln (Qtamine),
Phe (Fenylalanine)
• The remaining amino acids have letters that do not occur in their names: Lys (K
is close to L), Trp (W is reminiscent of the double ring), Sec (U), Pyl (O)
• X is used as a placeholder, meaning “any amino acid”. B is used for “Asp or Asn”,
Z for “Gln or Glu”, and J for “Ile or Leu”. A hyphen (-) is used to denote gaps in a
protein sequence, e.g., in sequence alignments. h is used to denote hydrophobic amino
acids (do not confuse with H for His!)
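Translating between the two notations only needs a lookup table. The following Python sketch is illustrative: the names `THREE_TO_ONE` and `to_one_letter` are hypothetical, the hyphen-separated input format is an assumption, and writing unknown residues as X (“any amino acid”) is a choice made here.

```python
# Three-letter -> one-letter codes for the 22 genetically encoded amino acids
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C", "Gln": "Q",
    "Glu": "E", "Gly": "G", "His": "H", "Ile": "I", "Leu": "L", "Lys": "K",
    "Met": "M", "Phe": "F", "Pro": "P", "Ser": "S", "Thr": "T", "Trp": "W",
    "Tyr": "Y", "Val": "V", "Sec": "U", "Pyl": "O",
}


def to_one_letter(seq: str) -> str:
    """Convert a hyphen-separated three-letter sequence, e.g. 'Met-Ala-Sec'.

    Residues not in the table are written as 'X' ("any amino acid").
    """
    return "".join(THREE_TO_ONE.get(res, "X") for res in seq.split("-"))


if __name__ == "__main__":
    three = "Met-Lys-Trp-Sec-Pyl"
    one = to_one_letter(three)
    print(one)  # → MKWUO
    # Storage: 3 bytes vs. 1 byte per residue (separators ignored)
    print(len(three.replace("-", "").encode("ascii")), len(one.encode("ascii")))  # → 15 5
```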
In the medical literature these codes appear mostly to denote mutations: A123Q
means that the Ala in position 123 is replaced by Gln. Certain sequences of amino
acids occur in several different proteins, where they serve a special function. Such
conserved motifs are usually named after their one-letter amino acid abbreviations.
Thus you may encounter KDEL motifs or DEATH-ATPases.
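A point-mutation code such as A123Q can be decoded with a regular expression. A minimal Python sketch, assuming the simple “wild-type letter, position, replacement letter” form described above (more complex notations for insertions or deletions are not handled); `parse_mutation` is a hypothetical name.

```python
import re

# One-letter -> three-letter codes (22 genetically encoded amino acids)
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "U": "Sec", "O": "Pyl",
}

# wild-type residue, sequence position, replacement residue, e.g. A123Q
POINT_MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutation(code: str) -> tuple[str, int, str]:
    """'A123Q' -> ('Ala', 123, 'Gln')."""
    m = POINT_MUTATION.fullmatch(code)
    if m is None or m.group(1) not in ONE_TO_THREE or m.group(3) not in ONE_TO_THREE:
        raise ValueError(f"not a point mutation in one-letter code: {code!r}")
    return ONE_TO_THREE[m.group(1)], int(m.group(2)), ONE_TO_THREE[m.group(3)]


if __name__ == "__main__":
    wild, pos, mutant = parse_mutation("A123Q")
    print(f"{wild} at position {pos} replaced by {mutant}")  # → Ala at position 123 replaced by Gln
```

Ambiguity codes such as X, B or Z are rejected here, since they do not name a single amino acid.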

1.4 Biological Function of Amino Acid Variety

You may now ask why there are so many different amino acids. The answer is
that these different molecules have different properties that allow them to serve different
functions inside proteins (see also Table 1.2).
One difference you have already learned about: there are amino acids whose side-
chains can bear positive or negative charges, whereas other side-chains are always
uncharged. Charged side-chains have different pKR values, which can be influenced
strongly by neighbouring amino acids, for example, Cys (pKR = 5–10), His (pKR =
4–10), and the carboxylic acid groups of Glu and Asp (pKR = 4–7). This is important
for proton transfer reactions in the catalytic centre of proteins (acid/base catalysis;
see Sect. 5.5 on page 131). Ionisable groups also form the ionic bonds (salt bridges)
which stabilise protein tertiary structure (see page 32).
Asp, Glu, and His residues can chelate divalent metal ions including Fe, Zn and
Ca. This is important for enzymes with metal cofactors, in hæmoglobin and in some
regulatory proteins such as calmodulin.
Some amino acids are hydrophilic (= water friendly) because they carry
ionised or polar groups (COOH, NH2, OH, SH). Other amino acids are
hydrophobic (= water fearing, fat friendly), with long aliphatic (Ile, Leu, Val), or
aromatic (Phe, Trp) side-chains. If these residues point into the solution, they force
water molecules into a local structure of higher order (i.e., lower entropy), which
is unfavourable. Burying these residues in the interior of the protein avoids this
penalty; this is the molecular basis for hydrophobic interactions.
Some amino acids have small side-chains (such as glycine), others very big,
bulky ones (such as tryptophan). The small hydrogen residue of Gly not only fits
into tight spaces (see section on collagen (page 324) for an important example), but
because it has no β-carbon it can assume secondary structures (see Sect. 2.2 on page
20) that are forbidden for all other amino acids.
Proline has its nitrogen in a ring structure, which makes the molecule very stiff,
limiting the flexibility of protein chains.
The SH-group of Cys, the unprotonated His and the OH-group of Ser and Thr
are nucleophiles which are essential residues in the active centre of many enzymes.
Some amino acids confer properties to the protein which can be used in the
laboratory: Met binds certain heavy metals which are used in X-ray structure
determination and reacts with cyanogen bromide (BrCN) leading to protein
cleavage. Cys and Lys are easily labelled with reactive probes. Aromatic amino
acids, in particular Trp, absorb UV-light at 280 nm; this can be used to measure
protein concentration. In addition they show fluorescence, which can be used to
measure distances within proteins and how these distances vary during conformational changes.
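Concentration measurements at 280 nm rest on the Beer–Lambert law, A = ε · c · l. A minimal Python sketch, assuming commonly used literature values for the molar absorption coefficients of Trp, Tyr and cystine (these numbers are not from this text) and ignoring the small contribution of Phe; the function names and the example peptide are hypothetical.

```python
# Molar absorption coefficients at 280 nm (M^-1 cm^-1). These residue values
# are commonly used literature values, assumed here; they are not from this text.
EPS_TRP = 5500.0
EPS_TYR = 1490.0
EPS_CYSTINE = 125.0  # per disulphide bond (Cys-Cys pair)


def extinction_coefficient(seq: str, n_disulphides: int = 0) -> float:
    """Estimate epsilon_280 of a protein from its one-letter sequence."""
    return seq.count("W") * EPS_TRP + seq.count("Y") * EPS_TYR + n_disulphides * EPS_CYSTINE


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law A = epsilon * c * l, solved for c (mol/L)."""
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    eps = extinction_coefficient("MWKYYG")  # hypothetical peptide: 1 Trp + 2 Tyr
    print(eps)                               # → 8480.0
    print(molar_concentration(0.424, eps))   # concentration in mol/L
```

Because Trp contributes most to the absorbance, proteins without Trp give much smaller ε values and therefore less sensitive measurements.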
Table 1.2 Properties of the 21 amino acids encoded in a mammalian genome. Posttranslational
modification may change these properties considerably. The helix propensity measures the energy
by which an amino acid destabilises a poly-alanine helix.

Amino acid      3-letter 1-letter Mr    pK1    pK2    pK3    pI     Hydropathy Helix       Surface Volume Abundance
                                  (Da)  (COOH) (NH3+) (R)                      propensity  (Å²)    (Å³)   (%)
                                                                               (kJ/mol)
Alanine         Ala      A        89    2.34   9.69   –      6.01   +1.8       0.00        115     67     9.0
Arginine        Arg      R        174   2.17   9.04   12.48  10.76  −4.5       0.21        225     167    4.7
Asparagine      Asn      N        132   2.02   8.08   –      5.41   −3.5       0.65        160     148    4.4
Aspartic acid   Asp      D        133   1.88   9.60   3.65   2.77   −3.5       0.43        150     67     5.5
Cysteine        Cys      C        121   1.96   8.18   10.28  5.07   +2.5       0.68        135     86     2.8
Glutamic acid   Glu      E        147   2.19   9.67   4.25   3.22   −3.5       0.39        180     114    6.2
Glutamine       Gln      Q        146   2.17   9.13   –      5.65   −3.5       0.16        190     109    3.9
Glycine         Gly      G        75    2.34   9.60   –      5.97   −0.4       1.00        75      48     7.7
Histidine       His      H        155   1.82   9.17   6.00   7.59   −3.2       0.56        195     118    2.1
Isoleucine      Ile      I        131   2.36   9.68   –      6.02   +4.5       0.41        175     124    4.6
Leucine         Leu      L        131   2.36   9.60   –      5.98   +3.8       0.21        170     124    7.5
Lysine          Lys      K        146   2.18   8.95   10.53  9.74   −3.9       0.26        200     135    7.0
Methionine      Met      M        149   2.28   9.21   –      5.74   +1.9       0.24        185     124    1.7
Phenylalanine   Phe      F        165   1.83   9.13   –      5.48   +2.8       0.54        210     135    3.5
Proline         Pro      P        115   1.99   10.96  –      6.48   −1.6       3.16        145     90     4.6
Selenocysteine  Sec      U        168   2.16   9.40   5.20   3.68   –          –           –       –      rare
Serine          Ser      S        105   2.21   9.15   13.60  5.68   −0.8       0.50        115     73     7.1
Threonine       Thr      T        119   2.11   9.62   13.60  5.87   −0.7       0.66        140     93     6.0
Tryptophan      Trp      W        204   2.38   9.39   –      5.89   −0.9       0.53        255     163    1.1
Tyrosine        Tyr      Y        181   2.20   9.11   10.07  5.66   −1.3       0.49        230     141    3.5
Valine          Val      V        117   2.32   9.62   –      5.97   +4.2       0.61        155     105    6.9
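The hydropathy column of Table 1.2 is a per-residue scale; averaging it over a whole sequence, or over a sliding window, is a simple way to spot hydrophobic stretches. A minimal Python sketch using the values of Table 1.2 (Sec has no tabulated value and is therefore not supported); the window size of 9 is an arbitrary illustrative default, and the function names are hypothetical.

```python
# Hydropathy values from Table 1.2 (positive = hydrophobic)
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq: str) -> float:
    """Average hydropathy of a one-letter sequence."""
    return sum(HYDROPATHY[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Sliding-window averages, one value per window position."""
    return [mean_hydropathy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    # Tripeptides from Problem 1.5
    for pep in ("FAV", "EGD", "QGN", "KRH", "WKN"):
        print(pep, round(mean_hydropathy(pep), 2))
```

Note that hydropathy alone ignores the pH-dependent charges that decide Problem 1.5.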

1.5 Exercises

1.5.1 Problems

1.1. Define the isoelectric point of a compound.


1.2. Connect the following properties with amino acids:
1) hydrophobic A) Tryptophan
2) positively charged B) Serine
3) small C) Lysine
4) polar D) Glycine
5) aromatic E) Glutamine
1.3. Lysine has the pKa values 2.18 (carboxy group), 8.95 (α-amino group) and
10.53 (ε-amino group). The pI is ______.
1.4. You are working on a research project to elucidate the reaction mechanism of
an enzyme. You think that a particular serine residue in the protein is required for
catalytic activity. To test this hypothesis you want to genetically replace this amino
acid by another, and then test whether the enzyme is still active.
Which amino acid should you choose to replace the Ser?
A Threonine
B Alanine
C Tryptophan
D Glutamic acid
E Histidine
1.5. Which of the following tripeptides would be the most soluble in 1 M NaOH:
A) Phe-Ala-Val
B) Glu-Gly-Asp
C) Gln-Gly-Asn
D) Lys-Arg-His
E) Trp-Lys-Asn

1.5.2 Solutions

1.1 The isoelectric point of a compound is the pH at which it has an equal number
of positive and negative charges.
1.2 Trp is aromatic and hydrophobic, Lys positively charged (ε-amino group), Gly
small (only hydrogen as side-chain), Ser is polar (OH-group). Gln does not fit into
any of these categories.
1.3 Below pH 2.2 Lys has 2 positive charges from the fully protonated α- and
ε-amino groups, and no negative charge because the carboxy group will be mostly
protonated as well. Above pH 2.2, the carboxy group will lose its proton, resulting
in 1 negative and 2 positive charges, and a net charge of +1. Beyond pH 8.95, the
α-amino group will lose its proton and there will be one negative and one positive
charge, resulting in a net charge of 0. Beyond pH 10.53, the proton on the ε-amino
group will be lost as well, resulting in one negative and no positive charges. Thus
the pI is calculated as ½ × (8.95 + 10.53) = 9.74.
1.4
A Thr also has the OH group and differs from Ser only by an additional methyl group.
It would likely work just as well as Ser, so the experiment would not answer the question.
B Ala is Ser minus the OH group, which is the catalytically active part of Ser.
Thus Ala would be ideal to test the hypothesis.
C Trp is the bulkiest of all amino acids. If the enzyme were no longer active after
the replacement, you would not know whether this was because of the lack of the
catalytically active OH-group or because of disruption of the 3D-structure of the
enzyme.
D Glu is acidic. If you use it to replace a polar but uncharged amino acid the 3D
structure of the enzyme would likely be perturbed by salt-bridge formation. If
the enzyme didn’t work afterwards, you’d not know whether this was because of
changed 3D structure or because of the missing OH-group.
E His is basic. If you use it to replace a polar but uncharged amino acid the 3D
structure of the enzyme would likely be perturbed by salt-bridge formation. If
the enzyme didn’t work afterwards, you’d not know whether this was because of
changed 3D structure or because of the missing OH-group.
1.5
A) Phe-Ala-Val All three amino acids are hydrophobic, this peptide would be
only very sparingly soluble in both water and base.
B) Glu-Gly-Asp Glu and Asp are acidic residues and give additional negative charges
at alkaline pH; this is therefore the most soluble of the five peptides in 1 M NaOH.
C) Gln-Gly-Asn Gln and Asn have acid amide functional groups, which are
somewhat polar but do not become charged at high pH. Gly is hydrophobic.
D) Lys-Arg-His All three amino acids are basic, at high pH they will be
uncharged. This peptide would be very soluble in acid, but not in base.
E) Trp-Lys-Asn A bulky hydrophobic, a basic and a somewhat polar amino
acid would give low solubility in water and base, but somewhat better solubility
in acid.

Reference

1. G.R. Grimsley, J.M. Scholtz, C.N. Pace, A summary of the measured pK values of the ionizable
groups in folded proteins. Protein Sci. 18, 247–251 (2009). doi: 10.1002/pro.19
Chapter 2
Protein Structure

Nature is greatest in the smallest things.


(C. LINNÆUS: Systema Naturae)

Abstract Peptides and proteins are made by condensation of amino acids, forming
peptide bonds. The sequence of amino acids in a protein is called its primary
structure. Secondary structure is determined by the backbone dihedral angles φ and ψ,
the tertiary structure by the folding of protein chains in space.
Association of folded polypeptide molecules to complex functional proteins results
in quaternary structure. Proteins can be further modified by posttranslational
addition of small molecules.

2.1 Primary Structure

A peptide bond is formed by the condensation of two amino acids under elimination
of water (see Fig. 2.1). Addition of further amino acids to the chain leads to
tripeptides, tetrapeptides, and so on. Chains of up to 20 amino acids are called
oligopeptides (oligo = few), and longer ones polypeptides (poly = many). Proteins
are polypeptides with a biological function.
Peptides range in size from two amino acids to thousands; Fig. 2.2 shows
aspartame, a dipeptide methyl ester. Proteins consist either of a single polypeptide chain, or
they are formed from separate polypeptide chains called subunits. Some proteins
contain other covalently bound components, prosthetic groups, and posttranslational
modifications (see below).
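Because one molecule of water is eliminated for each peptide bond, the approximate mass of a linear peptide is the sum of the free amino acid masses minus 18 Da per bond. A minimal Python sketch using the rounded Mr values of Table 1.2; the function names are hypothetical, and the size cut-off follows the definition given above (up to 20 residues = oligopeptide).

```python
# Mr of the free amino acids (Da), rounded, from Table 1.2
FREE_AA_MASS = {
    "G": 75, "A": 89, "S": 105, "P": 115, "V": 117, "T": 119, "C": 121,
    "L": 131, "I": 131, "N": 132, "D": 133, "Q": 146, "K": 146, "E": 147,
    "M": 149, "H": 155, "F": 165, "U": 168, "R": 174, "Y": 181, "W": 204,
}
WATER = 18  # Da, eliminated for every peptide bond formed


def peptide_mass(seq: str) -> int:
    """Approximate Mr of a linear peptide: sum of amino acids minus water."""
    return sum(FREE_AA_MASS[aa] for aa in seq) - WATER * (len(seq) - 1)


def peptide_class(seq: str) -> str:
    """Naming convention from the text: up to 20 residues = oligopeptide."""
    return "oligopeptide" if len(seq) <= 20 else "polypeptide"


if __name__ == "__main__":
    print(peptide_mass("DF"), peptide_class("DF"))  # → 280 oligopeptide
```

For aspartame, the methyl ester replaces the carboxy H of Asp-Phe by CH3 and adds another 14 Da, giving about 294 Da.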
The sequence of amino acids in a protein is called its primary structure. In
biochemistry, this is always given starting with the N-terminal and ending with the
C-terminal amino acid, because this is the order in which amino acids are added
during protein synthesis in the cell (this process is discussed in detail in textbooks
of molecular cell biology, e.g., [1, 21]).

© Springer International Publishing Switzerland 2015
E. Buxbaum, Fundamentals of Protein Structure and Function,
DOI 10.1007/978-3-319-19920-7_2


Fig. 2.1 Polycondensation of amino acids to peptides and proteins. Polycondensation is a reaction
where organic molecules react with each other via their functional groups, producing small
molecules (here: water) in addition to a macromolecule

Fig. 2.2 Aspartame (aspartyl-phenylalanine-1-methyl ester, Aspartam®) is a peptide used as an
artificial sweetener in the food industry

Cα, C′, the nitrogen and the oxygen atom of the peptide bond form a single
plane. The bond between C′ and N is somewhat shorter than a normal C–N single
bond, because of mesomery with the C=O double bond (see Fig. 2.3). Since the
lone electron pair of N enters into the partial double bond, it can no longer accept a
proton; the N in a peptide bond is not basic.
Thus the peptide bond has a “partial (≈ 40 %) double bond character” (see
Fig. 2.4). Like the C=C bond, it is planar and cannot rotate. The H and O of
the peptide bond are in the trans-configuration. Formally, we express the same
idea by saying that the dihedral angle ω of the peptide bond is fixed to 180° (see
Fig. 2.4). Because of the bulky R-groups, the trans-configuration is more stable with
most amino acids (99.7 % probability). The exception is Pro, which occurs in the
cis-configuration much more frequently than other amino acids (5.8 % probability,
see Fig. 2.5).

Fig. 2.3 The geometry of the peptide bond. Left: The bonds AB and BC define a
plane, as do the bonds BC and CD. The angle between these planes is called the
dihedral angle of the BC bond. To determine this angle in the standard way, orient that
bond into the paper plane so that the neighbouring atoms (here A and D) point upwards.
Then measure the angle formed: clockwise is positive, anticlockwise negative. Right:
Because of mesomery, the dihedral angle ω of the bond between the carboxy carbon (C′)
and the nitrogen is fixed to 180°, with N, H, C and O lying in a single plane. Slight
deviations are possible, but rare. The bond angle at Cα is 110.8° ± 2.5° for 86 299
residues investigated in a recent study [41]. The dihedral angles φ and ψ, which
determine the secondary structure of a protein (see next section), are variable
On the other hand, the N–Cα and Cα–C′ bonds are normal single bonds; rotation
around them is possible. The angles of rotation are named φ and ψ, respectively.
Rotation in the peptide chain is limited by two factors. First, at certain angles φ and
ψ around one amino acid, an atom of that amino acid would collide with an atom
of the following amino acid (see Fig. 2.6). The angles φ, ψ that result in a clash
between C′n=O and Nn+1–H are defined as 0°, 0°. Second, the size and charge of
the R-groups can make certain positions more stable than others.
Thus in a plot of φ versus ψ (RAMACHANDRAN plot [18, 27], see Fig. 2.9 on
page 23) there are regions that are sterically forbidden, fully allowed regions
with no steric hindrance, and unfavourable regions which can be reached by
slight bending of bonds.
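The sign convention of Fig. 2.3 (clockwise positive) is the one produced by the standard atan2 formula for the dihedral angle of four points. A minimal Python sketch, assuming atom coordinates given as (x, y, z) tuples; applied to C′(i−1), N, Cα, C′ it gives φ, to N, Cα, C′, N(i+1) it gives ψ, and to Cα, C′, N(i+1), Cα(i+1) it gives ω. The helper names are illustrative.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Dihedral angle A-B-C-D about the B-C bond, in degrees [-180, 180].

    Positive when, looking along B->C, the front bond B-A must be turned
    clockwise to eclipse the rear bond C-D.
    """
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    B, C = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(dihedral((0.0, 1.0, 0.0), B, C, (1.0, -1.0, 0.0)))  # trans → 180.0
    print(dihedral((0.0, 1.0, 0.0), B, C, (1.0, 1.0, 0.0)))   # cis → 0.0
```

With real coordinates from a structure file, ω values near 180° indicate trans and values near 0° cis peptide bonds, as in Fig. 2.5.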
Fig. 2.4 cis–trans isomerism around the peptide bond. Because the C′(−1)–N bond has the
character of a partial double bond, rotation around this bond cannot occur and cis–trans isomerism
results. For steric reasons the trans-configuration is much more probable than the cis-configuration.
Pro is unusual in that the cis-configuration has a probability of 5–6 %, which is about 100 times
higher than with other amino acids

Proline is again a special case because the peptide nitrogen is part of a ring
structure; this limits φ to values between −85° and −35°. As I will show in a
moment, this has considerable consequences for protein secondary structure.
Because glycine has only a hydrogen as R-group, steric hindrance is much less of a
problem than with other amino acids. Thus in a RAMACHANDRAN plot Gly can be
found in regions forbidden for other amino acids (Fig. 2.9).

Internet Resources on Protein Structure


Amino acid sequences of proteins and the nucleotide sequences of their
genes are stored in databases that are accessible from the World Wide Web,
such as the nonredundant protein sequence database Owl
(http://www.bioinf.man.ac.uk/dbbrowser/OWL/) and ExPASy (http://au.expasy.org/).
Multiple sequence alignment services are offered by BCM
(http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html). For taxonomic
questions in general, http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html may be
consulted. Several discussion groups in the bionet hierarchy on Usenet deal with
proteins, especially bionet.molbio.proteins.

2.1.1 Protein Sequences and Evolution

The great variety of proteins that can be observed today has arisen from a much
smaller number of ancestors during evolution. This can be shown by comparing
the primary structure of proteins; the more similar they are, the more closely they are
related [10].
Fig. 2.5 Distribution of the dihedral angle ω of Pro (relative frequency in %, logarithmic scale,
versus ω in °, shown separately for helix, sheet and other structures) in 1453 “non-redundant”
proteins whose structure is known with a resolution ≤ 1.5 Å. Of the 431 146 amino acid residues,
19 801 were proline. In α-helices, all of the 2604 Pro occurred in trans (ω ≈ 180°). In β-strands,
73 out of 2846 (2.6 %) Pro occurred in cis (ω ≈ 0°); in “other” secondary structures, 1071 out of
14 351 Pro (7.5 %) were in cis. Almost all other Pro residues occurred in trans; the numbers of
Pro residues with intermediate ω were 0, 1, and 11, respectively

Because each position in the primary structure can be occupied by any of the
20 common amino acids (and occasionally also by Sec and Pyl), the possible
number of combinations is huge. For example, a protein with 100 amino acids has
20¹⁰⁰ ≈ 1.3 × 10¹³⁰ possible sequences. Given that our universe is only about
13.7 × 10⁹ a ≈ 4.32 × 10¹⁷ s old, creationists have argued that proteins cannot
have arisen through a process of random mutation and selection. This argument is
fallacious, however, because it makes the (unspoken!) assumption that the function
of a protein can only be met by one particular amino acid sequence. The existence
of isoenzymes – proteins with different structure but the same function – proves
this assumption wrong (see Fig. 2.8). Interestingly, the study of protein- and DNA-
sequences [10] has confirmed, and in many cases given us additional details on, the
tree of life first proposed by C. DARWIN [9].
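The two large numbers in this argument are easy to check. A short Python sketch; converting years to seconds with the Julian year (365.25 days) is an assumption made here.

```python
n_sequences = 20 ** 100                  # exact integer arithmetic
seconds_per_year = 365.25 * 24 * 3600    # Julian year (assumption)
age_universe_s = 13.7e9 * seconds_per_year

print(f"{n_sequences:.2e}")     # → 1.27e+130
print(f"{age_universe_s:.2e}")  # → 4.32e+17
```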
Fig. 2.6 Some φ, ψ combinations lead to collisions between atoms of neighbouring amino acids.
Such angles are forbidden; the dihedral angles φ, ψ that result in a clash between C′n=O and
Nn+1–H are defined as 0°, 0°. The green balls represent the R-groups. In addition to these
nearest-neighbour effects, some angles also lead to collisions between amino acids further apart.
This is a stereogram: if you look at the images cross-eyed, you will see three figures, the middle of
which is three-dimensional (this takes some practice and is not a required skill for a physician).
Cheap lorgnette-style stereo viewers are available on the Internet to help with this. They are built
around two prisms which ensure that each eye sees only the image intended for it (see
http://www.shortcourses.com/stereo/stereo1-7.html for examples)

2.2 Secondary Structure

The secondary structure of a protein is any regular, repetitive folding pattern in the
molecule. It is stabilised by hydrogen bonds (see Fig. 2.13) between the amino- and
keto-groups of the peptide bonds, which carry a partial positive and negative charge,
respectively (see Fig. 2.3b). Although each hydrogen bond has only a relatively
small bond energy (≈ 5 kJ/mol), the sum of the bond energies over all hydrogen
bonds in a protein is considerable.
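To get a feeling for this sum, the following Python sketch assumes the ≈ 5 kJ/mol per bond quoted above and the i → i+4 pattern of the α-helix described in Sect. 2.2.1, under which an ideal helix of n residues has n − 4 backbone hydrogen bonds. The helix lengths are arbitrary examples. These are gross sums only; as Sect. 2.5.2 explains, the net contribution to stability is much smaller, because the unfolded chain can form hydrogen bonds with water instead.

```python
# Gross sum of backbone hydrogen-bond energies in one ideal alpha-helix.
E_HBOND_KJ = 5.0   # approximate energy per hydrogen bond in kJ/mol (value from the text)

def helix_hbonds(n_residues: int) -> int:
    """C=O of residue i bonds to N-H of residue i+4, so n residues give n - 4 bonds."""
    return max(0, n_residues - 4)

for n in (10, 22, 100):                 # arbitrary example helix lengths
    total = helix_hbonds(n) * E_HBOND_KJ
    print(f"{n:3d} residues: {helix_hbonds(n):3d} H-bonds, about {total:.0f} kJ/mol in total")
```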
The following structural motifs are particularly common [19, 28].

Fig. 2.7 Interpretation of structure diagrams of proteins, using the pentapeptide HTCPP. (a)
Space-filled diagrams show the true (VAN DER WAALS-) extension of a molecule, but even in short
peptides clarity is lost. (b) A wire diagram is clearer. The centres of the atoms are connected by
thin lines; hydrogens are not drawn. Wire diagrams look a little like structural formulas in organic
chemistry; however, atoms are shown in their true three-dimensional arrangement. (c) For larger
proteins even wire diagrams would be too cluttered. Thus the atoms forming the protein backbone
are connected by a thick line, which is used to represent the amino acid chain. (d) If the protein
contains disulphide bonds, showing only the backbone trace leaves the disulphide bond dangling
in free space; thus (e) this bond is often shown (incorrectly, but easier to interpret) connecting
the backbone traces. (f) A further abstraction is achieved by showing elements of secondary
structure instead of the backbone trace: α-helices as red helices and β-strands as yellow arrows,
connected by the backbone trace of coils and turns (in grey). Other colouring schemes used in this
book are N-terminal (red) to C-terminal (purple), different colours for different chains, or "shapely
colours" (a quasi-standard in molecular modelling) for different amino acids. Shown here is β-lactamase (PDB-code 1m40, a protein that confers penicillin resistance to bacteria)

Fig. 2.8 Subtilisin (PDB-code 3VYV, left) and chymotrypsin (PDB-code 1oxg, right) are both
Ser-proteases that use the classical catalytic triad (Ser, His, Asp; shown as wire diagram) in the
catalytic centre to cleave proteins. These amino acids are far apart in the sequence, but close to
each other in the folded protein. The two proteins, however, have completely different sequences and
secondary structures. This is an example of convergent evolution ("re-inventing the wheel"). The
proteins are called iso-enzymes (iso = the same)

2.2.1 The α-Helix

The polypeptide chain is wound in a counterclockwise spiral around an imaginary
axis, with 3.6 amino acids per turn. Such a spiral is called right-handed, because
if you hold your right hand with the thumb pointing from N- towards C-terminus
the fingers curl counterclockwise. Left-handed helices are improbable for L-amino
acids, because the β-carbon and the carbonyl oxygen would collide.
Each turn is about 5.4 Å long; the rise per amino acid is 1.5 Å. The angle
between successive residues is about 100°, with φ = −57°, ψ = −47°. The R-groups
point outward. This compact, rod-like structure is maintained by hydrogen bonding
between a carbonyl oxygen (partial negative charge) and the amide hydrogen (partial
positive charge) of the amino acid 4 positions further along the chain. With 3.6 amino
acids per turn and 13 atoms in this "hydrogen bond loop", α-helices are
sometimes called 3.6₁₃-helices.
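The helix parameters above are linked by simple arithmetic. This Python sketch derives the angle per residue and the pitch from 3.6 residues per turn and a rise of 1.5 Å per residue, and gives the length of a 22-residue helix (the membrane-spanning length discussed below). For comparison, the hydrophobic core of a lipid bilayer is roughly 30 Å thick; that figure is an approximate literature value, not taken from this text.

```python
# Ideal alpha-helix geometry from the values quoted in the text.
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE_A = 1.5                              # Angstrom along the helix axis

angle_per_residue = 360.0 / RESIDUES_PER_TURN         # degrees around the axis
pitch_A = RESIDUES_PER_TURN * RISE_PER_RESIDUE_A      # Angstrom per full turn

def helix_length_A(n_residues: int) -> float:
    """Length along the axis of an ideal alpha-helix with n residues."""
    return n_residues * RISE_PER_RESIDUE_A

print(f"angle per residue: {angle_per_residue:.0f} degrees")
print(f"pitch: {pitch_A:.1f} Angstrom")
print(f"22-residue helix: {helix_length_A(22):.0f} Angstrom long")
```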
Because all peptide bonds in an α-helix point in the same direction, their dipoles add up:
the helix has a dipole moment (partial positive charge at the N-terminus, partial negative
charge at the C-terminus) and can bind to charged molecules. Proline and glycine don’t fit well
into the α-helix. The α-helix is the most common secondary structure in proteins. It occurs
both in many fibrous (long, stretched-out) proteins (such as myosin and keratin)
and in many globular (compact-shaped) proteins. Often α-helices have a polar side
(facing the outside of a protein) and a nonpolar one which is buried in the interior
(amphipathic helix).
Proline introduces kinks when it occurs within α-helices, because its backbone nitrogen
carries no hydrogen and therefore cannot donate a hydrogen bond. However, it is ideally suited
as the N-terminal amino acid of helices, where other amino acids would leave a "lonely"
backbone amide without a hydrogen-bond partner.

Fig. 2.9 RAMACHANDRAN-plot of "representative/non-redundant" proteins with ≤ 1.5 Å resolution
from the PDB database (2014-05-27, 1453 proteins with 431 146 amino acids). For each
combination of dihedral angles φ and ψ, the frequency was counted (bin size 1°) and represented
by a colour of the rainbow (from purple = rare to dark red = frequent). In the top left corner of
the diagram are the extended structures (β-strands, PII-helix and turn). The big peak below is at the
coordinates of the α-helix; the small peak to the right is the left-handed αL-helix, a structure that
can be only a few amino acids long. Note that certain φ/ψ-combinations result in steric clashes;
these therefore do not occur in proteins (frequency = 0, represented by white)
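The counting procedure described in this caption can be sketched in a few lines of Python. The function names and the example angles are hypothetical (real input would be φ/ψ values computed from PDB coordinates); angles are wrapped so that +180° and −180° fall into the same bin.

```python
# Counting (phi, psi) pairs in 1-degree bins, as for a Ramachandran plot.
import math
from collections import Counter

def bin_angle(angle_deg: float, width: float = 1.0) -> int:
    """Map an angle in degrees onto a bin index covering [-180, 180)."""
    wrapped = (angle_deg + 180.0) % 360.0 - 180.0   # +180 and -180 are the same angle
    return math.floor(wrapped / width)

def ramachandran_counts(pairs, width: float = 1.0) -> Counter:
    """Frequency of each (phi, psi) bin; bins that are never hit have count 0."""
    return Counter((bin_angle(phi, width), bin_angle(psi, width)) for phi, psi in pairs)

# Hypothetical example data, not taken from the PDB
angles = [(-57.2, -47.9), (-57.8, -47.1), (-120.5, 130.0), (57.0, 47.0), (180.0, 180.0)]
counts = ramachandran_counts(angles)
print(counts.most_common(2))
```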

Functions of α-helices:
• An α-helix of 22 amino acids is long enough to span a lipid bilayer. The
part of the helix that is inside the membrane consists of hydrophobic amino
acids that can interact with the lipid tails of the membrane. Hydrophilic amino
acids on both ends interact with the cytosol and the interstitial fluid, respectively.
The cytosolic end has positively charged amino acids and the extracellular end
more negatively charged ones, because the membrane potential of a cell is negative inside
(about −70 mV). Thus the correct orientation of the protein is ensured by the electrical
field. At the interface between the membrane and the aqueous environment one
finds predominantly aromatic amino acids and Lys.
• Amphipathic α-helices at the N-terminus of a protein serve as recognition sites for
import into mitochondria. Every 4th or 5th amino acid is positively charged,
so that all positive charges are in the same quadrant of the helix (see Fig. 2.10
and Fig. 16.10 on page 375).
• Two α-helices wound around each other form a coiled coil. Keratin consists of
such coiled-coils (see Fig. 2.11). These are held together by disulphide bonds.
Breaking these with thioglycolic acid is the basis of the permanent wave.
• Heptad-repeats (Leu-zippers) are α-helices where every 7th amino acid is leucine
(Fig. 2.12). Such helices associate because of hydrophobic interactions between
the Leu-residues, allowing for specific dimerisation of proteins. Some DNA-binding
proteins have this structure.

Fig. 2.10 Signal-peptide for import into mitochondria. Most mitochondrial proteins are encoded
in the nucleus; they are synthesised in the cytosol and then imported via a transport system that
spans both mitochondrial membranes (see Fig. 16.10 on page 375). An amphipathic α-helix serves
as the recognition signal for binding of the nascent protein to the transporter. Note that the helical
wheel projection is viewed from the N-terminus
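The helical wheel projection used for mitochondrial import signals and the heptad repeat of Leu-zippers can both be explored with a small Python sketch. It assumes an ideal α-helix with exactly 100° between successive residues; the function names are hypothetical. Residues 3 or 4 positions apart end up on the same face, and every 7th residue drifts by only 20° per heptad.

```python
# Helical-wheel positions: each residue sits about 100 degrees further round the axis.
DEG_PER_RESIDUE = 100.0   # 360 degrees / 3.6 residues per turn (ideal helix assumed)

def wheel_angle(i: int) -> float:
    """Angular position (0-360 degrees) of residue i, counted from residue 0."""
    return (i * DEG_PER_RESIDUE) % 360.0

def angular_gap(a: float, b: float) -> float:
    """Smallest angle between two positions on the wheel."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

# Residues 3 or 4 apart lie close together on the wheel
for step in (1, 2, 3, 4, 7):
    print(step, angular_gap(wheel_angle(0), wheel_angle(step)))

# Leu at every 7th position (heptad repeat) drifts only 20 degrees per heptad
print([wheel_angle(i) for i in (0, 7, 14, 21)])
```

In real coiled coils the two helices are slightly supercoiled around each other, which compensates for this drift so that the Leu residues stay on one face over many heptads.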

2.2.2 β-Strand

In the β-strand, the polypeptide backbone is stretched out with φ, ψ ≈ −120°, 120°.
Several strands are aligned either in a parallel (all carboxy-terminal ends are on the
same side) or antiparallel fashion, forming hydrogen bonds between an NH group
of one strand and a C=O group in a neighbouring strand. This gives rise to a large
blanket-like structure, the β-pleated sheet. The main difference between the α-helix
and the β-strand is that in the α-helix hydrogen bonds occur between residues of the
same helix, whereas in a β-pleated sheet they occur between residues of neighbouring
strands (see Fig. 2.13). Nevertheless, a single β-strand is stable because
the amino acids in this extended structure have plenty of "wiggling" space without
running into steric hindrance (look up the coordinates in the RAMACHANDRAN
plot!), resulting in entropic stabilisation. The R-groups point alternately up and down,
making amphipathic sheets with polar and nonpolar or positive and negative
faces possible. The entire sheet is rarely flat, but has a right-handed twist, in
extreme cases forming a β-barrel (see Fig. 2.13). In schematic diagrams of protein
structure each β-strand is drawn as a broad arrow.

Fig. 2.11 Keratin is a heterodimer that forms coiled-coils. Depicted here is coil 2B of keratins 5
and 14 (PDB-code 3tnu; the structure of the entire keratin molecule has not been solved yet). The
helices are shown as green and orange ribbons; in addition, the protein surface is shown (blue =
basic, red = acidic, yellow = polar and grey = nonpolar)

Fig. 2.12 In heptad-repeats (Leu-zipper, here tropomyosin, PDB-code 1ic2) every seventh amino
acid is Leu. This leads to specific associations of α-helices by hydrophobic interactions

Fig. 2.13 Hydrogen bonding in α-helix (left, cytochrome b562, PDB-code 256B) and β-sheets
(right, E. coli OmpA, PDB-code 1QJP). In an α-helix all hydrogen bonds between keto- and
amino-groups in the protein backbone occur between amino acids of the same helix (residues i and i+4).
In β-sheets, however, all such hydrogen bonds occur between amino acids in different strands,
alternating between the right and left neighbour

Fig. 2.14 Antiparallel β-pleated sheet of the silk fibroin N-terminal domain (FibNT) from the
silkworm Bombyx mori L. at pH 4.7 (PDB-code 3ua0). Two subunits (red to yellow and cyan to
blue) form the sheet. Neighbouring strands have alternating directions and are joined by β-turns.
Hydrogen bonds holding the sheet together run at right angles to the strands

2.2.2.1 The Antiparallel β-Sheet

In an antiparallel β-sheet (see Fig. 2.14) the strands point in alternating directions.
They are usually joined together by β-turns (see later). Ideal φ, ψ = −138°,
137°. Silk-protein is an example of the use of β-sheets in biologically important
structures. The amino acids within a β-strand are already in an extended conformation;
therefore silk shows little elasticity and has an extremely high tensile strength, as any
extension would require breaking covalent bonds. On the other hand, the strands are
held together by hydrogen bonds only, giving silk cloth its wonderful soft flow. At
neutral pH in the silk gland the fibroin protein is present as soluble random coils; as the silk
thread is ejected, acidification leads to β-sheet formation and precipitation of the
protein. Each silkworm cocoon is made from a single silk thread that is 900 m long
and has a diameter of 10 µm.

Fig. 2.15 Parallel β-pleated sheet (here PDB-code 2v9s). All strands have the same direction; the
"return-legs" are either α-helices or coils. Hydrogen bonds holding the sheet together run obliquely
to the strands

2.2.2.2 The Parallel β-Sheet

In a parallel β-sheet (see Fig. 2.15) the N-termini of all strands point in the same
direction; ideal φ, ψ = −116°, 111°. The hydrogen bonds are oblique to the strand
direction, hence the parallel β-sheet is less stable than the antiparallel one. The strands
in a parallel β-sheet are often joined by α-helices, which form the "return-leg".
The ideal parallel and antiparallel β-sheets are characterised by different φ/ψ-values.
However, because of the twisting of strands in a β-sheet there are no separate
peaks for them in the RAMACHANDRAN-plot; rather, they merge into a single big
area of extended structure.

2.2.2.3 β-Helix and β-Roll

Parallel β-strands can be wound into right-handed coils (see Fig. 2.16), containing
either two (β-roll) or three (β-helix) strands per rung [40]. The β-helix is found in
enzymes whose substrates are oligosaccharides (e.g., pectinases); it also occurs in
tailspike proteins of bacteriophages and in amyloid aggregates, the cause of several
debilitating diseases (see Sect. 10.2 on page 206). In a β-helix one or two of the three
β-strands may be replaced by α-helices (see Fig. 2.17). Three β-helices or β-rolls can
be arranged in coils, similar to the α-helical coiled-coils.

Fig. 2.16 Top: The tailspike protein from the bacteriophage Sf6 contains a β-helix with three
parallel β-sheets (PDB-code 2vbk). Three such helices are wound around each other, forming
a coiled-coil (not shown). Bottom: Alkaline protease from Pseudomonas aeruginosa (PDB-code
1kap) is an example of a β-roll with two parallel β-sheets. The structure is stabilised by Ca²⁺-ions

2.2.3 The PII (syn.: Poly-Pro or Polypeptide II) Helix

The PII helix is left-handed with three residues per turn and φ, ψ = −70°, 140°.
Like the single β-strand, it is stabilised by entropy, not by hydrogen bonds. Pro
frequently occurs in this structure, but not all PII helices contain Pro.

2.2.3.1 Collagen

Collagens (see Sect. 14.1.1 on page 324 for a more detailed discussion) are the
most important example of the PII-helix: they consist of three PII-helices wound
around each other (hetero- or homo-oligomer, see Fig. 2.18). The human genome
contains 42 collagen genes, which encode 28 known collagen types. Of these,
types I, II, and III are the most important. Each of the three chains in collagen
has 1050 amino acids, made up of repeats of the sequence Gly-X-Pro. The angle of the Pro peptide
bond (its amino group is part of a ring) allows the sharp turn in the molecule [3], and the
small R-group of Gly (only an H) allows the three protein molecules to wrap tightly
around each other. The resulting "rope" has a tensile strength higher than steel.
If even a single one of the Gly-residues in one of the collagen chains is mutated,
tight wrapping is no longer possible. Depending on the collagen affected, this leads to osteogenesis imperfecta (brittle bone
disease, collagen I), to EHLERS-DANLOS-syndrome (collagen I, III, or V; too
brittle or too elastic ligaments and death by vascular or organ rupture), to epidermolysis
bullosa (blistering of skin, collagen XVII), or to ALPORT-syndrome (collagen IV;
kidney and hearing defects).

Fig. 2.17 In porcine ribonuclease inhibitor (PDB-code 2bnh) α-helices and β-strands form an α/β
coil. To prevent side-chain interference, the α-helices have to twist, resulting in a circular, rather
than helical, structure

Fig. 2.18 Collagen (PDB-code 1cag). To make the tight association between the three strands
clearer, one each is drawn space-filling, as wire diagram, and as carbon-backbone. Note the
repeating Gly-X-Pro (yellow, green, brown; with X often hydroxy-Pro) sequence. Marked in blue
is a Gly→Ala mutation that prevents a close fit and destabilises the molecule. Such mutations
cause, for example, EHLERS-DANLOS-syndrome
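Because Gly must occupy every third position, a single substitution such as the Gly→Ala mutation shown in Fig. 2.18 can be located by checking that register. The Python sketch below is hypothetical: it assumes a one-letter sequence (G = Gly, P = Pro, A = Ala) that starts in register with a Gly, and the toy sequence is made up for illustration.

```python
# Find positions that break the Gly-X-Y register of a collagen chain.
from typing import List

def broken_gly_positions(seq: str, start: int = 0) -> List[int]:
    """Return 1-based positions where Gly (G) is expected, i.e. every third
    residue beginning at index `start`, but another residue is found."""
    return [i + 1 for i in range(start, len(seq), 3) if seq[i] != "G"]

# Hypothetical toy chain: four Gly-Pro-Pro triplets, the third with Gly -> Ala
toy = "GPP" * 2 + "APP" + "GPP"
print(broken_gly_positions(toy))
```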

Fig. 2.19 β-turns (here in PDB-code 1qiv) are most common between the strands of an antiparallel β-sheet

Heating turns collagen into gelatine. The dissociation temperature of collagen is
influenced by the hydroxylation of proline. Vitamin C (ascorbic acid) is required
for the correct function of Pro-hydroxylase (see Fig. 14.4 on page 328). In scurvy,
the dissociation temperature of collagen drops below the body temperature of 37 °C,
explaining the connective tissue weakness typical of this condition.
Apart from collagen, PII-helices also occur in the proline-rich segments bound by SH3-domains,
which are found in proteins involved in signal transduction.

2.2.4 Hairpin Turns

Hairpin turns allow the protein chain to fold back onto itself, reversing its direction by 180°. They are
important, for example, between the different strands of an antiparallel β-sheet.
Because the C=O and NH groups of a turn are not all involved in hydrogen bond
formation within the protein, they are often surface-exposed and interact with water.
They may also occur in the catalytic centre of enzymes, where they are involved in
substrate binding.
Turns can contain 4 amino acid residues (β-turn, frequent, with a hydrogen bond
NHᵢ ⋯ O=Cᵢ₋₃, see Fig. 2.19) or 3 (γ-turn, rarer). Turns with 2 (δ), 5 (α)
or 6 (π) amino acids have been described, but are very rare. Turns often contain Gly
(the smallest amino acid) or Pro residues, the latter because of its specific value of φ. In
addition, the Cα and Cδ of Pro can undergo CH⋯π interactions with neighbouring
aromatic amino acids, which, although not as strong as regular hydrogen bonds,
can stabilise the turn [3]. In the catalytic centre of an enzyme, such a CH⋯π bond
may also form between Pro and an aromatic substrate. The different
types of turns can be distinguished by the φ/ψ-values of their peptide bonds [39].

γ-Turns and Cardiovascular Disease

γ-turns occur in fibrinogen; these are recognised by the GPIIb-IIIa receptor
on platelets, leading to platelet aggregation and thrombus formation. Mimetics
that structurally resemble γ-turns can block that receptor and are candidates
for drugs that reduce blood clotting in patients at risk for strokes.

2.2.5 Rare Structures

In addition to the α-helix there are two other, much rarer helical conformations:
3₁₀-helix: 3 residues per turn and a hydrogen bond between residues i and i+3
(φ, ψ = −50°, −25°). It occurs mainly at the C-terminal end of α-helices, and is
usually only 4–5 residues long.
π-helix: about 4.4 residues per turn and a hydrogen bond between residues i and
i+5. The π-helix is usually only 7–10 amino acids long and flanked on both ends
by α-helices. In effect, it introduces a kink into a long α-helix (see Fig. 2.20).
Evolutionarily, π-helices arise by insertion of an amino acid into an α-helix.
They occur at least once in about 15 % of proteins, often in the active site of an
enzyme. π-helices have variable φ/ψ-values.
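The naming scheme of these helices follows from counting atoms. In a backbone hydrogen bond C=O(i) ⋯ H–N(i+n), the ring contains the O and C′ of residue i, the N, Cα and C′ of each of the n − 1 residues in between, and the N and H of residue i+n, that is 3n + 1 atoms. This Python sketch reproduces the subscripts 10, 13 and 16; the 4.4 residues per turn used for the π-helix is the value usually quoted for the 4.4₁₆-helix.

```python
# Number of atoms in the ring closed by a backbone hydrogen bond C=O(i) ... H-N(i+n).
def hbond_ring_atoms(n: int) -> int:
    """O and C' of residue i, N, C-alpha and C' of the n - 1 residues in between,
    and N and H of residue i+n."""
    return 2 + 3 * (n - 1) + 2

helices = (("3-10 helix", 3, 3.0), ("alpha helix", 4, 3.6), ("pi helix", 5, 4.4))
for name, n, per_turn in helices:
    ring = hbond_ring_atoms(n)
    print(f"{name}: i -> i+{n}, {ring}-atom ring, notation {per_turn}_{ring}")
```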
There are some other structures which occur in only a few proteins, but have
important functional roles. We discuss these when we talk about the proteins
concerned. Each secondary structure can be characterised by the φ/ψ angles of the
protein backbone (see Fig. 2.9).
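As a rough illustration of this idea, the Python sketch below assigns a (φ, ψ) pair to whichever ideal conformation from this section lies closest on the periodic Ramachandran plot. The dictionary and function names are hypothetical, and the left-handed α entry (57°, 47°) is an assumed mirror image of the α-helix values, not a figure from the text. Real assignment programs such as DSSP rely mainly on hydrogen-bond patterns, so this nearest-point rule is only a teaching sketch.

```python
# Crude assignment of a residue to the nearest ideal (phi, psi) pair.
import math

IDEAL_ANGLES = {                   # degrees; all but "left-handed alpha" are from the text
    "alpha helix": (-57.0, -47.0),
    "3-10 helix": (-50.0, -25.0),
    "antiparallel beta": (-138.0, 137.0),
    "parallel beta": (-116.0, 111.0),
    "PII helix": (-70.0, 140.0),
    "left-handed alpha": (57.0, 47.0),   # assumption: mirror image of the alpha helix
}

def angle_diff(a: float, b: float) -> float:
    """Signed difference a - b wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def nearest_structure(phi: float, psi: float) -> str:
    """Name of the ideal conformation closest to (phi, psi) on the periodic plot."""
    return min(
        IDEAL_ANGLES,
        key=lambda name: math.hypot(angle_diff(phi, IDEAL_ANGLES[name][0]),
                                    angle_diff(psi, IDEAL_ANGLES[name][1])),
    )

print(nearest_structure(-60.0, -45.0))
print(nearest_structure(-140.0, 135.0))
```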

Fig. 2.20 π-helix (cyan) introduces a kink between the flanking α-helices, here human ferritin
(PDB-code 3ajo)

2.2.6 Coils

Coils are any structure except those mentioned above. Note that amino acids in coils
still have a defined position within the structure of a protein, thus the terms “random
coil” or “unordered”, sometimes found in the literature, are misleading. These
areas have an important function too, because they add flexibility to the protein
and allow conformational changes, for example, during enzymatic turnover. Their
peptide bonds are not involved in intra-protein hydrogen bonding, therefore they
are often exposed to interact with water, small ligands, or with other proteins. Coils
tend to tolerate mutations better than other structures and are therefore hotspots for
evolution. In Chap. 11 on page 225 we will see that it is coils that give antibodies
their specific binding properties.

2.3 Tertiary Structure

Tertiary structure describes the global conformation of a protein, in other words, the
way in which the elements of its secondary structure are arranged in space. Tertiary
structure is determined by:
Hydrophobic interactions of amino acid side-chains: Typical globular proteins
have a core of hydrophobic side-chains, whereas hydrophilic side-chains
are on the surface where they interact with water or with other proteins. If
hydrophobic residues were exposed to water, the water would have to form
an ordered cage (so-called clathrate) around them, which would decrease the
entropy of the system.
VAN DER WAALS-interactions are fluctuating dipole interactions with a bond
energy of 4–17 kJ/mol. The bond length is ≈ 4 Å.
Hydrogen bonds are interactions between permanent partial charges. The
bond length is about 3 Å; the bond energy is 2–6 kJ/mol if both partners are
partially charged and up to 21 kJ/mol if one partner is fully charged. If the
distance between the partners is too large, an indirect hydrogen bond may be
formed where water acts as a bridge (δ− ⋯ H₂O ⋯ δ+).
Salt bridges are interactions between fully charged groups. The bond length is
2.8 Å. The bond energy is 10–30 kJ/mol in an aqueous environment, but can be
significantly higher if both groups are buried in a hydrophobic core.
Disulphide bonds are formed between two Cys residues after folding of the protein
into its higher-order structure (RSH + HSR′ → RSSR′ + 2[H]). This is an oxidation
(removal of hydrogen), which will not normally occur in the reducing environment
of the cytosol. However, the environment inside the ER is oxidising. Thus disulphide
bridges are found more frequently in cell-surface and secreted proteins than in cytosolic ones.
They may occur between two Cys residues in the same polypeptide (intrachain),
or between different polypeptides (interchain). The bond length is 2.2 Å and the bond
energy 167 kJ/mol.
Coordination around cofactors: Several amino acids in a protein can be
involved in the coordination of metal ions (Ca, Zn, Fe, Mg, Na, K) or prosthetic
groups such as hæme or FAD.
In transmembrane segments, hydrophilic amino acids are in contact with water
(snorkelling effect) and hydrophobic amino acids are found in contact with the
fatty acid tails of the lipids (anti-snorkelling effect). The lipid/water interface is
formed by three amino acids with special properties: Trp, Tyr, and Lys (the so-called
aromatic belt; see Fig. 2.21). What they have in common are relatively long side-chains
that are hydrophobic but carry a hydrophilic (polarised or ionised) end, so that they can
make contact with lipids and water at the same time. Thus a transmembrane segment
has a well-defined position within the membrane and cannot bob up and down [15].

Fig. 2.21 Aromatic belt in a transmembrane protein, here outer membrane protein A (PDB-
code 1qjp). Trp (olive), Tyr (brown) and Lys (blue) are marked; they occur preferentially at the
membrane surface

Some proteins have several domains, that is, individually folding regions
connected by short segments. These individual domains can be isolated by gentle
proteolysis; they may maintain not only their structure, but even their catalytic
function. For example, the chaperone Hsc70 (see Fig. 15.1 on page 346) has three
domains: an ATPase-, a peptide-binding-, and a regulatory domain. Gentle treatment
with chymotrypsin will digest the links between those domains. The isolated ATPase
domain can still hydrolyse ATP.

2.3.1 Classification of Proteins by Folding Pattern

If one looks at many different proteins, one will find certain patterns in the way
elements of secondary structure are arranged. These folding patterns are called
motifs. It is interesting to note that motifs are much more stable during evolution
than amino acid sequences. In other words, some proteins can be shown to be
homologous by their folding patterns, even though they no longer have significant
similarity in their amino acid sequence (e.g., the muscle protein actin, the enzyme
hexokinase, and the chaperone Hsc70).
According to their folding pattern, protein domains may be hierarchically
classified into groups. Because such classification is somewhat subjective,
different schemes have been suggested. One commonly used scheme is the
Structural Classification of Proteins (SCOP) database (since 2014 SCOP2,
http://scop2.mrc-lmb.cam.ac.uk/). Classification at present cannot be done
automatically, but requires expert knowledge. The following taxa are used in SCOP
(see also Figs. 2.22 and 2.23):
Class: Coarse classification according to the relative content of α-helix and β-strand.
Fold: Major structural similarity; the proteins have identical secondary structure
elements (at least in part) and the same topological connections. However, there
may be considerable variation in peripheral regions of a domain. Similarities may
arise from common origin or from convergent evolution.
Superfamily: Domains have a common folding pattern and their functions are
similar, but sequence identity may be low. The ATPase domains of actin,
hexokinase, and Hsc70 are an example of a superfamily. Common evolutionary
origin is probable.
Family: Proteins with high sequence similarity (> 30 % identity) and/or similar
function. Proteins clearly have an evolutionary relationship. Identical proteins
are subclassified by species.
For a basic understanding of protein folding only the class is relevant:
all-α: Proteins which contain only α-helices, or where the content of β-strands is
negligible.
all-β: Proteins which contain only β-strands, or where the content of α-helices is
negligible.

α/β: Proteins which contain alternating or interspersed α-helices and β-strands.
Mainly parallel β-sheets (β-α-β units).
α+β: Proteins which contain segregated α-helices and β-strands. Mainly antiparallel β-sheets.
Small proteins: Usually dominated by metal ligands, hæme, and/or disulphide
bridges.
Intrinsically disordered proteins: See Sect. 10.1 on page 203 for a description of this class.

[Figure panels: Hæmoglobin β (a.1.1.2, PDB-code 1HGA); Cytochrome b562 (a.24.3.1, PDB-code 562B); Immunoglobulin domain (b.1.1.1, PDB-code 1FC2); Bacteriochlorophyll (b.75.1.1, PDB-code 4BCL); Lactate dehydrogenase domain 1 (c.2.1.5, PDB-code 1I0Z); Thioredoxin (c.47.1.1, PDB-code 2TRX); Ribonuclease A (d.5.1.1, PDB-code 1DZA); Lysozyme (d.2.1.2, PDB-code 132L)]

Fig. 2.22 Examples of α, β, α/β and α+β protein structures



[Figure panels: Ovalbumin (e.1.1.1, PDB-code 1OVA); DnaK C-terminus (e.20.1.1, PDB-code 1dkz); Bacteriorhodopsin (f.13.1.1, PDB-code 1M0K); OmpA membrane domain (f.4.1.1, PDB-code 1QJP); Insulin (g.1.1.1, PDB-code 1MSO); Rubredoxin (g.41.2.1, PDB-code 1BRF); Botulinum neurotoxin (h.4.2.1, PDB-code 1EPW); Fibrinogen (h.1.8.1, PDB-code 1M1J)]

Fig. 2.23 Stereo views of multidomain and membrane proteins, small proteins, and coiled-coils

Apart from SCOP there are also other approaches for protein classification,
in particular CATH (http://www.cathdb.info) and FSSP (http://ekhidna.biocenter.helsinki.fi/dali/),
but these yield largely similar results [17].
Proteins can be described by a SCOP concise classification string (sccs)
according to their structure, for example, c.2.1.1 (class c = α/β, fold 2 = NAD(P)+-binding
ROSSMANN-fold domains, superfamily 1 = Alcohol dehydrogenase-like,
and family 1 = Alcohol dehydrogenase). Within families, proteins are sorted by
species and isoform (Table 2.1).
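Because an sccs is just four dot-separated fields, it is easy to process programmatically. The Python sketch below is a hypothetical helper, not part of any SCOP software: it splits a string such as c.2.1.5 (lactate dehydrogenase domain 1 in Fig. 2.22) into class, fold, superfamily and family, and tests whether two domains share a superfamily.

```python
# Split a SCOP concise classification string (sccs) into its four levels.
from typing import NamedTuple

class Sccs(NamedTuple):
    class_: str        # letter, e.g. "a" = all-alpha, "c" = alpha/beta
    fold: int
    superfamily: int
    family: int

def parse_sccs(code: str) -> Sccs:
    """Parse strings such as 'c.2.1.5'; raises ValueError on malformed input."""
    parts = code.strip().split(".")
    if len(parts) != 4 or len(parts[0]) != 1 or not parts[0].isalpha():
        raise ValueError(f"not an sccs: {code!r}")
    return Sccs(parts[0], int(parts[1]), int(parts[2]), int(parts[3]))

def same_superfamily(a: str, b: str) -> bool:
    """True if two domains share class, fold and superfamily."""
    return parse_sccs(a)[:3] == parse_sccs(b)[:3]

print(parse_sccs("c.2.1.5"))
print(same_superfamily("c.2.1.5", "c.47.1.1"))
```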

Table 2.1 Number of entities in the SCOP database (02/2009; entering data into SCOP2 is not yet complete)

Class                                     Folds   Superfamilies   Families
a) All alpha proteins                       284             507        871
b) All beta proteins                        174             354        742
c) Alpha and beta proteins (a/b)            147             244        803
d) Alpha and beta proteins (a+b)            376             552       1055
e) Multidomain proteins                      66              66         89
f) Membrane and cell surface proteins        58             110        123
g) Small proteins                            90             129        219
Total                                      1195            1962       3902

Internet Resources
Protein structures are stored in the Brookhaven Protein Data Bank (PDB)
in a unified format that can be used by modelling software such as
DeepView (formerly known as Swiss-PDB, http://www.expasy.ch/spdbv/mainpage.html).
Coordinates may be obtained from PDBlite (http://oca.ebi.ac.uk/oca-bin/pdblite),
OCA (http://bip.weizmann.ac.il/oca-bin/ocamain),
PDBsum (http://www.ebi.ac.uk/pdbsum/), or, if the EC-number is known,
from http://www.ebi.ac.uk/thornton-srv/databases/enzymes/.
PDBTM (http://pdbtm.enzim.hu/?) deals with membrane proteins. Three-dimensional
structures of nucleic acids may be retrieved from NDB (http://ndbserver.rutgers.edu/).
The protein structures presented in this book were created with DeepView
using data files obtained from OCA.

2.4 Quaternary Structure

Quaternary structure describes how several polypeptide chains come together to
form a single functional protein. As with tertiary structure, it is determined by ionic
and hydrophobic interactions between amino acid R-groups. Many proteins consist
of several subunits. Depending on the number of subunits, we speak of monomers,
dimers, trimers, and so on. Depending on whether these subunits are identical, we
put homo- or hetero- in front. Thus a heterodimer is a protein consisting of two
different polypeptide chains. As we will see later, association of several subunits
into a protein has important consequences for its function, which is often lost if the
subunits are separated (see Chap. 7 on page 163).
In some proteins several polypeptides come together to form a subunit, which
repeats several times. Such subunits are called protomers. For example, hæmoglobin
is a diprotomer; each protomer consists of an α- and a β-chain (see Fig. 7.2
on page 166).

2.5 Further Aspects of Protein Structure

2.5.1 LEVINTHAL’s Paradox

Assume a protein with 100 peptide bonds, each of which can assume 6 stable
conformations (α-helix, β↑↑-sheet, β↑↓-sheet, PII-helix, turn, coil).
Because each of these states is characterised by a φ/ψ-angle pair, this results in
2⁶ = 64 possible angles per peptide bond and 64¹⁰⁰ ≈ 4 × 10¹⁸⁰ conformations for the entire protein
(note that this is an underestimate!).
Rotation around a σ-bond takes about 10⁻¹³ s, thus folding by random testing
of all possible angles would take about 4 × 10¹⁸⁰ × 10⁻¹³ s ≈ 4 × 10¹⁶⁷ s. Our
best estimate for the age of the universe is 13.7 × 10⁹ a ≈ 4.32 × 10¹⁷ s. Proteins
therefore should never fold. You now also understand why it is so difficult to
calculate protein structures ab initio.
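The order-of-magnitude estimate can be reproduced in Python. Working with logarithms avoids handling numbers with hundreds of digits; the inputs are the assumptions stated in the text (100 bonds, 64 angle combinations per bond, 10⁻¹³ s per trial, age of the universe 4.32 × 10¹⁷ s).

```python
# Levinthal estimate: 100 peptide bonds, 64 angle combinations each, 1e-13 s per trial.
import math

n_bonds = 100
states_per_bond = 64
time_per_trial_s = 1e-13
age_universe_s = 4.32e17

log10_conformations = n_bonds * math.log10(states_per_bond)     # log10(64^100)
log10_search_time = log10_conformations + math.log10(time_per_trial_s)

print(f"conformations ~ 10^{log10_conformations:.1f}")
print(f"search time   ~ 10^{log10_search_time:.1f} s")
print(f"ratio to age of universe ~ 10^{log10_search_time - math.log10(age_universe_s):.0f}")
```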
In reality, folding is a rapid process; in E. coli at 37 °C a protein of 100 amino acids
folds in about 5 s. During folding, hydrophobic residues are buried in the interior
and hydrophilic residues appear on the outside of the protein, resulting in a compact
"molten globule" structure. This brings amino acids so close to each other that
the formation of hydrogen bonds between peptide bonds gives rise to secondary
structure.
The conformational freedom for protein folding is, however, much smaller than it
might appear at first sight. We have already discussed steric hindrance between the
atoms of neighbouring amino acids, which leads to large non-permissible areas in the
RAMACHANDRAN plot. But steric hindrance is also possible between amino acids
farther along the protein chain, unless the protein is in an extended conformation
(upper left-hand quadrant in the RAMACHANDRAN-plot) [24]. Thus it is not
possible to have a β-strand directly following an α-helix (or vice versa) without an
intervening coil. These restrictions probably explain the limited number of structural
motifs found in proteins.

Thermodynamics of LEVINTHAL’s Paradox

ZWANZIG et al. have looked at LEVINTHAL’s paradox in thermodynamic
terms [43]. If one assumes that folding is the process of converting n
bonds between n + 1 amino acids from an incorrect (i) to the correct (c)
conformation, then the reaction

$$ c \;\underset{k_1}{\overset{k_0}{\rightleftharpoons}}\; i $$

is described by the following differential equation:

$$ \frac{d[c]}{dt} = -k_0[c] + k_1[i], \qquad [c] + [i] = n \tag{2.1} $$

$k_0$ and $k_1$ are the rate constants for unfolding and folding, respectively; these
rate constants are related by the law of mass action:

$$ \frac{[i]_\text{eq}}{[c]_\text{eq}} = \frac{k_0}{k_1} = K \tag{2.2} $$

Note that K is a thermodynamic property and completely independent of the
processes that lead to folding and unfolding.
If there are  incorrect states of a bond (the number of correct states is
of course 1) and if a correctly folded bond has a free energy of  and an
incorrectly folded bond a free energy of  CU, it can be shown from statistical
mechanics that
k
K D 0 D e kT
U
(2.3)
k1

If there is no energy penalty for the misfolded state (U = 0), then the mean
first passage time τ required to reach [c] = n for the first time (all bonds
correctly folded) becomes

$$\tau = \frac{1}{n k_0}\,(\nu + 1)^n \tag{2.4}$$

which is a mathematical statement of LEVINTHAL’s paradox, since $(\nu + 1)^n$ is
the number of possible states of the protein. If, however, there is an energy
penalty for misfolding, then the first passage time becomes much shorter.
Even for U = 2kT per bond (where k is BOLTZMANN’s constant and T
the absolute temperature, so that kT is a measure of the thermal energy at that
temperature) τ is on the order of seconds!
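The effect of a small energy penalty can be checked numerically. The sketch below is a minimal illustration under our own modelling assumption (not spelled out in the box): the number N of correctly folded bonds performs a random walk, gaining a bond with rate (n − N)·k1 and losing one with rate N·k0, consistent with eq. (2.1). The mean first passage time from N = 0 to N = n then follows from the standard birth-death formula. The values n = 100, ν = 2 and k0 = 10⁹ s⁻¹ are hypothetical.

```python
import math


def mean_first_passage_time(n: int, nu: float, u_over_kt: float, k0: float) -> float:
    """Mean time (s) to reach [c] = n (all bonds correct), starting from [c] = 0.

    Assumed model: the number of correct bonds N hops N -> N+1 with rate
    (n - N) * k1 (a bond folds) and N -> N-1 with rate N * k0 (a bond unfolds).
    """
    K = nu * math.exp(-u_over_kt)  # eq. (2.3): K = k0 / k1
    k1 = k0 / K
    # Unnormalised equilibrium weights of the states N = 0..n: C(n, N) * K**(-N)
    weights = [math.comb(n, N) * K ** (-N) for N in range(n + 1)]
    tau = 0.0
    cumulative = 0.0
    for j in range(n):
        cumulative += weights[j]
        # Birth-death chain: mean time to step from state j to state j + 1
        tau += cumulative / ((n - j) * k1 * weights[j])
    return tau


def levinthal_estimate(n: int, nu: float, k0: float) -> float:
    """Eq. (2.4), valid for U = 0."""
    return (nu + 1) ** n / (n * k0)


if __name__ == "__main__":
    n, nu, k0 = 100, 2, 1e9  # hypothetical, illustrative values
    print(f"U = 0:    tau = {mean_first_passage_time(n, nu, 0.0, k0):.2e} s")
    print(f"eq. 2.4:  tau = {levinthal_estimate(n, nu, k0):.2e} s")
    print(f"U = 2 kT: tau = {mean_first_passage_time(n, nu, 2.0, k0):.2e} s")
```

With these values, U = 0 gives a time vastly longer than the age of the universe, close to the estimate of eq. (2.4), whereas U = 2kT brings it down to a fraction of a second.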
40 2 Protein Structure


Fig. 2.24 Protein folding reduces free energy (G). The native structure is the one with the lowest
free energy. However, proteins may get kinetically trapped in local minima of the energy landscape.
During folding the entropy of the unfolded protein (Su) is reduced to that of the folded (Sf),
symbolised by the width of the funnel. This entropy reduction (more orderly, less probable state)
reduces the overall change in free energy of the folding process. This is shown here for a
two-dimensional reaction, but in protein folding each amino acid can adjust at least φ and ψ, so the
number of dimensions is enormous and it is not surprising that no way is known to calculate the
3D structure of a given protein sequence from first principles.

2.5.2 Energetics and Kinetics of Protein Folding

Amino acids in a folding protein can interact either with other amino acids or with
water. Thus only the difference $\Delta G_{folding} = \sum G_{aa} - \sum G_{aw}$ is available
to stabilise the native structure of proteins (see Fig. 2.24). Although folding decreases
the enthalpy (H), which stabilises the native structure, it also decreases the entropy (S),
which tends to destabilise the native structure. Thus protein folding is a compromise
between opposing forces, and the actual stabilisation energy is only about 20–40 kJ/mol,
about 10× the thermal energy at room temperature (kT for individual molecules or
RT ≈ 2.5 kJ/mol for one mole) [43]. This marginal stability of proteins has a good side,
however: it allows the protein flexibility required for ligand binding and enzymatic
activity (see Fig. 2.25).
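What a stabilisation energy of 20–40 kJ/mol means at equilibrium can be illustrated with a two-state model (native ⇌ unfolded, an assumption that ignores folding intermediates), in which the unfolding equilibrium constant is $e^{-\Delta G/RT}$. This is a minimal sketch, not a description of any particular protein:

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def fraction_unfolded(delta_g_stab: float, temperature: float = 298.15) -> float:
    """Equilibrium fraction of unfolded molecules for a two-state protein.

    delta_g_stab = G(unfolded) - G(native) in J/mol (the stabilisation energy).
    """
    k_unfold = math.exp(-delta_g_stab / (R * temperature))  # [U]/[N]
    return k_unfold / (1.0 + k_unfold)


for dg_kj in (20, 40):  # the range quoted in the text
    print(f"{dg_kj} kJ/mol -> fraction unfolded {fraction_unfolded(dg_kj * 1e3):.1e}")
```

At 20 kJ/mol about 3 in 10 000 molecules are unfolded at any moment, at 40 kJ/mol about 1 in 10 million; each additional RT of stability lowers the unfolded fraction by roughly a factor of e.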

Fig. 2.25 Movement in adenylate kinase during substrate binding [23]. Note how α-helices and
β-sheets provide an overall stable structure, with coils acting as hinges. Viewing this video
(http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3369945/bin/pcbi.1002555.s012.mov) requires a
connection to the Internet. The still image shows a superposition of the enzyme with either two ADP
(PDB-code 2cdn) or with AMP and the nonhydrolysable ATP-analogue AMP-PNP (PDB-code 1ank)
bound.

2.5.3 Morpheeins

Morpheeins are proteins that form different homo-oligomers according to the
following schema:

[Figure: schema of morpheein homo-oligomer equilibria; axis: relative enzymatic activity (%), 0–100]

These oligomers may have different enzymatic properties. DOSS (PBGS)-porphyria is an
example of a disease caused by a mutation that shifts the equilibrium between oligomers
of an enzyme. Such diseases may in the future be treated by small molecules that
increase the likelihood that the protein occurs in the desired conformation. Do not mix
up morpheeins (proteins that morph) with the drug morphine (from Gr. MORPHEUS =
god of dreams).

2.5.4 Molecular Chaperones and Chaperonins

Although protein folding is a spontaneous process driven by thermodynamic forces and
hence, given enough time, all proteins will eventually arrive at their native structure,
the process can become kinetically trapped in local minima of the energy landscape
(see Fig. 2.24). Such metastable intermediates often expose hydrophobic patches on their
surface, which leads to protein aggregation. Unlike quaternary structures, such
aggregates have no reproducible structure and no biological function. On the contrary,
they may interfere with cellular function (see Sect. 10.2 on page 206). Note that
proteins in the cytosol are very closely packed at about 300–400 mg/ml. The average
distance between protein molecules is just one protein diameter, and the space between
them is filled with water, salts, and metabolites. This close packing increases the
probability of unwanted interactions.
Cells have two lines of defence against misfolded proteins:
Molecular chaperones bind to unfolded proteins and prevent their aggregation until
these proteins can achieve folding [37]. Binding/unbinding cycles of chaperones may
or may not require the hydrolysis of ATP. Examples: Hsp90, Hsc70, crystallins.
Molecular chaperonins use the energy of ATP hydrolysis to unfold misfolded proteins
actively, giving them a second chance to arrive at the proper fold. Example:
GroES/GroEL. The client protein is enclosed in a “beaker” formed by the chaperonin
[32], in which it can try to refold without disturbance from other proteins (at
“infinite dilution”). The beaker has a diameter of about 45 Å, enough to contain
proteins (or protein domains) of up to 60 kDa.
Note: Neither chaperones nor chaperonins actively fold proteins; they merely protect
them against aggregation during the folding process. For a more detailed discussion
of this topic see Chap. 15 on page 343.

2.5.5 Protein Denaturation

Secondary, tertiary, and quaternary structures of a protein are determined by
relatively weak interactions, such as hydrogen bonds or hydrophobic interactions.
The energy of a hydrogen bond is only about 4 kJ/mol, which is not much more than
the energy of thermal motion (RT ≈ 2.5 kJ/mol at room temperature). Changes
in environmental conditions can break such bonds, leading to the denaturation of
proteins.
Strong acids and bases denature proteins by disrupting ionic interactions.
Organic solvents can denature proteins by disrupting hydrophobic interactions.
Proteins are not soluble in organic solvents. Water-miscible solvents (e.g. ethanol
or acetone) bind water and thus reduce the concentration of water available to the
protein.
Detergents disrupt hydrophobic interactions. They can denature proteins without
precipitating them.
Salts precipitate proteins because they reduce the concentration of water available
to maintain protein structure.
Small hydrophilic substances such as urea can denature proteins when they are present
in high concentration, both by binding water and by binding to the protein.
Heavy metal ions (lead, mercury) bind to carboxylate or sulphydryl groups of proteins.
That’s why they are toxic!
Heat An increase in temperature leads to increased molecular motion, which can break
hydrogen bonds. If some hydrogen bonds break, the structure of the protein (say, an
α-helix) is weakened; that is, other hydrogen bonds become easier to break.
Denaturation by increasing temperature is therefore a cooperative process that starts
quite suddenly at a certain critical temperature and is completed at a temperature only
marginally higher. Renaturation is sometimes possible with small proteins
(ribonuclease, lysozyme) under laboratory conditions, but in the real world
denaturation is usually irreversible (boiled egg).
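The sharpness of thermal denaturation can be sketched with a two-state model in which the whole molecule unfolds as one cooperative unit (an assumption), using $\Delta G_{unfold}(T) = \Delta H\,(1 - T/T_m)$ and neglecting the heat-capacity change. The melting temperature Tm = 333 K and ΔH = 400 kJ/mol are hypothetical values chosen for illustration:

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def fraction_denatured(temperature: float, t_m: float, delta_h: float) -> float:
    """Fraction of unfolded molecules in a two-state (van 't Hoff) model.

    dG_unfold(T) = dH * (1 - T / Tm) is zero at the melting temperature Tm;
    the heat-capacity change on unfolding is ignored.
    """
    delta_g = delta_h * (1.0 - temperature / t_m)
    k_unfold = math.exp(-delta_g / (R * temperature))
    return k_unfold / (1.0 + k_unfold)


# Hypothetical protein: Tm = 333 K (60 degC), dH = 400 kJ/mol
for t in range(318, 349, 5):
    print(t, "K:", round(fraction_denatured(t, 333.0, 400e3), 3))
```

With these values the protein goes from roughly 1 % to roughly 98 % unfolded within ±10 K of Tm; a larger ΔH (more cooperative unfolding) makes the transition even sharper.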

Humans die if their core body temperature exceeds 42 °C, when key
proteins lose their function. For a useful application of thermal protein
denaturation see Fig. 2.26.

The covalent bonds in proteins are more robust, but peptide bonds are hydrolysed
by heating in strong acids and bases, and by proteolytic enzymes. Disulphide bonds
are cleaved by reducing agents; oxidising agents can form disulphide bonds from
SH-groups.

2.5.6 Protein Folding

Some proteins, such as ribonuclease, can be denatured completely by heat or the
addition of denaturants. If the solution is cooled, or if the denaturant is removed
by dialysis, the protein spontaneously returns to its native conformation, and
enzymatic activity is restored. These experiments, first carried out by C. ANFINSEN
in the 1950s [2], showed that the secondary and tertiary structure of a protein is
determined solely by its amino acid sequence. No external information is necessary.

Fig. 2.26 Lionfish (here Pterois volitans L. 1758) produce a toxic slime on the tip of
the rays of their fins; touching them is intensely painful. The active ingredient is a
mixture of proteins that also causes high blood pressure and vomiting, and interferes
with breathing. Envenomation with lionfish toxin is treated by exposing the affected
body part to hot water (45–50 °C, so hot that it is just bearable). The water is
changed frequently for about 30 min. This denatures the toxic proteins, bringing
almost immediate relief.

This has an important consequence: a mutation of only a single amino acid
may interfere with protein folding, resulting in loss of function. For example,
cystic fibrosis is a disease caused by the lack of a functional transmembrane
chloride channel (cystic fibrosis transmembrane conductance regulator, CFTR). See
page 449 for a discussion of the pathomechanism of cystic fibrosis.
Another example is collagen, where single amino acid mutations
lead to osteogenesis imperfecta or EHLERS-DANLOS syndrome (see page
324). Protein folding problems as a cause of disease have been reviewed in [34].

There are essentially three models to describe protein folding:
Framework model Proteins adopt their secondary structure first, using the
information contained in their sequence. Once the secondary structure is
established, long-range interactions between amino acids stabilise the tertiary
structure.
Hydrophobic collapse After synthesis, proteins adopt a conformation in which
hydrophobic amino acids are buried inside and hydrophilic amino acids are on
the surface of the protein. This structure is called a “molten globule”. Inside the
molten globule, long-range interactions between amino acids are established,
allowing the tertiary structure to form. The secondary structure is established last.
Nucleation/Condensation This model combines the other two: proteins collapse
as bonds form between a few key residues. This reduction in the distance between
amino acids leads to the formation of long-range (tertiary structure) and short-range
(secondary structure) interactions at the same time. As a consequence, folding
occurs in one big step without folding intermediates. For many proteins, especially
those with < 100 amino acids, this agrees well with experimental evidence. Large
proteins often consist of independently folding modules (domains), whose folding
rates can vary considerably. These modules then have to come together in the final
structure of the protein.
Folding is a fairly fast process: synthesis and folding of a 100 amino acid protein in
an E. coli cell is complete in less than 5 s (at 37 °C).
Folding starts even while a protein is being synthesised on a ribosome, that is, when
only part of the protein molecule is available for the folding reaction. But even
when an entire protein is totally unfolded and then allowed to refold, some strong
interactions form first and then determine which other interactions may form later.
As LEVINTHAL [20] already noted, such specific folding pathways may lead to
native states that are not at the absolute minimum of the free energy function (i.e.,
not at equilibrium), but are rather metastable states in local energy minima. As long
as the activation energy for going to the absolute minimum is high enough, such
metastable states can live long enough to be biologically functional.
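How high “high enough” must be can be estimated with the EYRING equation of transition-state theory, $k = (k_B T/h)\,e^{-\Delta G^{\ddagger}/RT}$, which is not used in the text itself. The sketch assumes a transmission coefficient of 1; the barrier heights are hypothetical:

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)


def eyring_rate(delta_g_act: float, temperature: float = 310.0) -> float:
    """Rate constant (1/s) for crossing a free-energy barrier delta_g_act (J/mol)."""
    return (K_B * temperature / H) * math.exp(-delta_g_act / (R * temperature))


def mean_lifetime(delta_g_act: float, temperature: float = 310.0) -> float:
    """Mean lifetime (s) of a metastable state that decays over this barrier."""
    return 1.0 / eyring_rate(delta_g_act, temperature)


for barrier_kj in (40, 80, 100, 120):  # hypothetical barrier heights
    tau = mean_lifetime(barrier_kj * 1e3)
    print(f"{barrier_kj:>3} kJ/mol -> lifetime {tau:.1e} s ({tau / 86400:.1e} days)")
```

At 37 °C every additional ≈ 5.9 kJ/mol (RT·ln 10) of barrier extends the lifetime tenfold, so a barrier of about 120 kJ/mol already keeps a metastable state alive for the order of a year.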

If the activation energy is lowered, rapid misfolding may result, leading to
folding diseases. This can be caused by mutations that increase the rate
of formation of the lower energy states (e.g., HUNTINGTON chorea). These
diseases are discussed in Sect. 10.2 on page 206.

Protein Structure and the Development of Pharmaceuticals

Knowledge of the structure of enzymes and receptors is essential for the
design of pharmaceuticals. For example, human immunodeficiency virus
(HIV) protease is required for cleaving inactive viral precursor polyproteins
into the mature, active proteins. Without this processing, virus particles do
not mature and are noninfectious. HIV protease is a homodimer; each subunit
is 99 amino acids long. The subunits are held together by an antiparallel β-sheet
formed by amino acids 1–4 and 96–99 of both subunits (see Fig. 2.27).


Fig. 2.27 Top: Dimerisation of HIV protease (PDB-code 1DAZ) occurs by the formation of an
antiparallel β-sheet from the ends of both subunits. This brings the two catalytic aspartate residues
(D25 and D25′) close together. Middle: Space-filling model of HIV protease. The β-sheet formed
by the N- and C-terminal ends of both subunits is clearly visible. Bottom: Substances with two
peptides linked by a stiff backbone can interdigitate into the dimerisation site of a monomer and
prevent dimerisation.
2.6 Posttranslational Modifications of Proteins 47

This brings the catalytic aspartate residues (D25 in each subunit) together,
thus forming the catalytic site of the enzyme. Several pharmaceuticals on the
market bind to the catalytic centre of the enzyme, but these are beginning to
lose their effectiveness due to the development of resistant virus strains. In
addition, they are very hydrophobic compounds, which makes their
pharmaceutical use difficult. A new class of protease inhibitors binds to the
dimerisation site and prevents the formation of the active enzyme.
Development of such substances requires an intimate understanding of the
structure and function of an enzyme (see Fig. 2.27 and [4] for further details).

2.6 Posttranslational Modifications of Proteins

The human genome contains ≈ 23 000 genes [7, 8, 35]. mRNA processing (alternative
splicing, mRNA editing etc.) results in ≈ 3 mRNAs per gene (Fig. 2.28).
Posttranslational modification of the proteins produced from them creates ≈ 10
different protein species from each mRNA. Thus the human proteome consists of
≈ 10⁶ proteins, with different functions, regulation, and degradation.
The properties of proteins can be changed by posttranslational modification; in
some cases this can be done (or undone) quickly in response to environmental
stimuli, for example, exposure to hormones. Such modifications can switch enzymes
between active and inactive states and are required for the proper targeting of a
protein to subcellular structures. The following reactions are of particular importance:

2.6.1 Glycosylation

Glycosylation is the enzymatic transfer of oligosaccharide (sugar) trees to proteins
(see page 379 for a more detailed discussion of the process). They are attached
either to the OH-groups of Ser or Thr (O-linked) or to the acid amide group of Asn
(N-linked). Other amino acids (Arg, Tyr, Trp, Hyl, Hyp) are involved much less
frequently, e.g., in collagen. Addition occurs in the ER and the GOLGI apparatus, to
the extracellular domain of membrane proteins and to secreted proteins. Cytosolic
proteins are rarely glycosylated. In bacteria, glycosylation occurs in the periplasm.
Glycosylation is required for proper protein folding. Glycosylation inhibitors
(see Fig. 2.29) are used as antiviral drugs (e.g. nojirimycin or deoxynojirimycin).
Sugar trees are also required as “address labels” in the intracellular transport
of proteins between compartments. For example, in I-cell disease the transport of
lysosomal proteins into this compartment fails because the enzyme that attaches the
mannose-6-phosphate label to them is defective. These proteins are secreted into the
bloodstream instead and, as a consequence, the lysosomes are nonfunctional.

Fig. 2.28 The pathway from genome to phenome is studied at different levels with different
methods. The resulting complex data can only be handled by advanced computing techniques.

[Fig. 2.29 structures: 5-Amino-5-deoxy-D-glucopyranose (Nojirimycin); 1,5-Dideoxy-1,5-imino-D-glucitol
(Deoxynojirimycin); N-Butyldeoxynojirimycin (Miglustat); 1-Deoxygalactonojirimycin]

Fig. 2.29 Sugar analogues in which the ring oxygen is replaced by nitrogen act as glycosylation
inhibitors. They can be used as antiviral drugs, and also in some inherited lysosomal storage
diseases, where the enzymes that degrade glycoconjugates in the lysosomes do not work properly.
On the cell surface, the sugar trees of membrane proteins serve as recognition sites
for cell–cell interactions, as immunological determinants and (since everything has
to have a downside too) as docking sites for bacteria and viruses.

2.6.2 Glucation

In glycosylation, sugars are enzymatically transferred to proteins in a carefully
orchestrated process. Glucation, however, is a spontaneous reaction between sugar
aldehydes (and some other reducing compounds) and amino groups (SCHIFF-base
formation, see Fig. 2.30) in proteins, nucleic acids, and lipids.

The rate of glucation depends on the concentration of glucose in the
blood. This has a direct medical application: in diabetics, the concentration
of glucated hæmoglobin (HbA1c) depends on the average blood glucose
concentration during the lifespan of an erythrocyte (about 120 days).
Glucated proteins can react further to form advanced glucation end products
(AGE); this is thought to be involved in aging and in long-term diabetic
damage. There is a PAMP-receptor for AGEs, called RAGE, stimulation of
which is proinflammatory, procoagulant, and promitotic. This is thought to be
responsible for at least some of the damage (see, e.g., [12, 22, 26, 29–31]).
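The relationship between HbA1c and mean blood glucose is roughly linear. The sketch below uses the regression from the ADAG study (eAG = 28.7 × HbA1c − 46.7, in mg/dL), which is not part of this text and describes only a population average; the conversion to mmol/L uses the molar mass of glucose (≈ 180.16 g/mol):

```python
def estimated_average_glucose(hba1c_percent: float) -> float:
    """Estimated average glucose (mg/dL) from HbA1c (%), ADAG linear regression."""
    return 28.7 * hba1c_percent - 46.7


def mg_dl_to_mmol_l(glucose_mg_dl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L (1 mmol/L = 18.016 mg/dL)."""
    return glucose_mg_dl / 18.016


for a1c in (5.0, 7.0, 9.0):
    eag = estimated_average_glucose(a1c)
    print(f"HbA1c {a1c:.0f} % -> eAG {eag:.0f} mg/dL ({mg_dl_to_mmol_l(eag):.1f} mmol/L)")
```

Because HbA1c integrates glucose exposure over the erythrocyte lifespan, a single high reading of blood glucose changes it very little, which is exactly why it is used to judge long-term control.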

Fig. 2.30 Glucation of proteins by the aldehyde group of glucose proceeds via an unstable
SCHIFF-base and AMADORI-rearrangement to a stable ketosamine. During roasting, this is
converted into caramels via the MAILLARD-reaction. These are responsible for the taste of
cooked food. Ketosamine may also be converted to Advanced glucation end products (AGE) by
STRECKER-degradation

2.6.3 Disulphide Bond Formation

Oxidation of the SH-groups of two cysteine residues leads to the formation of a
covalent bond. The cell uses the tripeptide glutathione (see Fig. 2.31) as a redox
buffer:

$$\mathrm{{-}SH} + \mathrm{HS{-}} + \mathrm{GSSG} \rightleftharpoons \mathrm{{-}S{-}S{-}} + 2\,\mathrm{GSH}$$

Disulphide bond formation does not happen in the reducing environment of the
cytosol, but in the ER (or the bacterial periplasm), which is oxidising. A special
enzyme, protein disulphide isomerase (PDI), makes sure that the right Cys residues
undergo disulphide bond formation.
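The difference between the reducing cytosol and the oxidising ER can be expressed as a redox potential of the glutathione couple (GSSG + 2 H⁺ + 2 e⁻ ⇌ 2 GSH) using the NERNST equation. This is a sketch assuming the commonly quoted standard potential E°′ ≈ −240 mV at pH 7 (not given in this text); the GSH and GSSG concentrations are hypothetical:

```python
import math

R = 8.314462618    # gas constant, J/(mol K)
F = 96485.33212    # Faraday constant, C/mol
E0_PRIME = -0.240  # V, commonly quoted standard potential of GSSG/2 GSH at pH 7


def glutathione_potential(gsh: float, gssg: float, temperature: float = 310.0) -> float:
    """Potential (V) of the GSSG/2 GSH couple; concentrations in mol/L.

    Two GSH are produced per GSSG, hence [GSH] appears squared.
    """
    n_electrons = 2
    return E0_PRIME - (R * temperature / (n_electrons * F)) * math.log(gsh ** 2 / gssg)


# Hypothetical compartments: reducing cytosol vs. more oxidising ER
print(f"cytosol: {glutathione_potential(5e-3, 5e-5) * 1e3:.0f} mV")
print(f"ER:      {glutathione_potential(5e-3, 2e-3) * 1e3:.0f} mV")
```

A higher proportion of GSSG makes the potential less negative (more oxidising), which favours disulphide formation; note that because [GSH] enters squared, the potential also depends on total glutathione concentration, not only on the GSH:GSSG ratio.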

[Structure: Glutathione (γ-Glu-Cys-Gly)]

Fig. 2.31 The tripeptide glutathione serves as a redox-coupler in our cells. Left: Structure of
glutathione. Right: Coupling of detoxification of reactive oxygen species (ROS, here H2 O2 ) and
consumption of NADPH + H+ by glutathione

When cytosolic proteins are used in the laboratory one has to make sure
that their SH-groups are not oxidised by atmospheric oxygen, which would lead to
inactivation. The buffers therefore usually contain an antioxidant such as
β-mercaptoethanol or dithiothreitol.
Bacterially expressed eukaryotic proteins are often misfolded and precipitate
as inclusion bodies because bacteria are less active in disulphide formation
than eukaryotes. However, bacteria do have a set of enzymes (Dsb, short for
disulphide bond) for the formation and isomerisation of protein disulphide
bonds in their periplasm.

Some mucolytic pharmaceuticals, including N-acetylcysteine (ACC), work
by breaking SS-bonds in mucus proteins, decreasing the viscosity of mucus
and making it easier to clear from the airways.

2.6.4 Proteolysis

Proteolysis is involved in the activation of proenzymes, for example, in the digestive
system. Digestive enzymes (e.g. trypsin, chymotrypsin) are produced as inactive
precursors (zymogens), so that they cannot harm the cells secreting them. Once
released into the intestine, they are activated by cleaving off a part of the enzyme
that was blocking the active site. Cascades of proteolytic enzymes make up our
blood-clotting and complement systems (see Sect. 11.3 on page 249). Prohormones
(e.g. insulin) are activated in a similar manner. On the other hand, proteins no longer
needed can be inactivated by proteolysis (e.g., cyclins in the cell cycle).
Proteolysis may also be used to remove signal peptides. For example, some
proteins destined for the intermembrane space of mitochondria carry a signal
sequence for mitochondrial import (see Fig. 2.10), which leads to their import into
the mitochondrial matrix. There the signal peptide is cleaved off by a matrix protease,
exposing a second signal that directs the protein’s export into the intermembrane
space through a different transporter.
A special form of proteolysis is protein splicing [25]. This reaction is carried
out by a protease within the protein itself, the intein. This protease cuts itself out
of the protein and rejoins the flanking segments (exteins), all without requiring
any external proteins, cofactors or sources of energy such as ATP! The intein
protein, once cut out of the host protein, has endonuclease activity. Intein genes
are mobile elements (parasitic DNA); the corresponding mRNA can be used to
direct the synthesis of cDNA by reverse transcriptases encoded by retroviral
sequences in the cellular DNA. This cDNA is then integrated into genes of other
proteins by the endonuclease activity of the intein. As the intein cuts itself out of
that protein, this insertion has few negative consequences for the host. Inteins are
therefore the smallest possible parasites [13, 14]. They are now used as self-cleaving
affinity tags in the production of protein pharmaceuticals.

2.6.5 Hydroxylation

Protein hydroxylation occurs on Pro and Lys residues. We have already discussed
the importance of Pro-hydroxylation for collagen formation (see page 28).
Other proteins regulated by Pro-hydroxylation are the hypoxia-inducible transcription
factors (HIF) [16]. These consist of two subunits, α and β. In the presence of oxygen,
the α-subunit is hydroxylated on P402 and P564 by HIF-prolyl hydroxylases
(PHD-1, -2 and -3, EC 1.14.11.29), leading to its proteasomal destruction. In the
absence of oxygen, the α-subunits accumulate and form a complex with β, which
binds to hypoxia response elements in the cellular DNA. As a consequence, oxygen
consumption of the cell is down-regulated, so that it can survive a low oxygen supply
for a longer time.

This mechanism may one day be exploited to increase the survival time of
organs after infarction or during transplantation, e.g., with inhibitors of PHD-1
(currently available inhibitors produce too many side effects due to concomitant
inhibition of PHD-2 and -3).


Fig. 2.32 Phosphorylation of histidine residues in His-kinases. Although mammals do not use
regulatory phosphorylation of His, bacteria and fungi regulate the expression of pathogenicity
factors that way. Thus, His-kinases may become important drug targets

2.6.6 Phosphorylation/Dephosphorylation

The transfer of phosphoryl groups from ATP to the hydroxy groups of Ser, Thr, and
Tyr (rarely onto His-nitrogen or Asp and Glu COO⁻) is important for the reversible
regulation of enzyme activity. The transfer is catalysed by protein kinases, and the
removal by protein phosphatases. Thus the reaction is rapidly reversible at minimal
expense for the cell (a single high-energy phosphate bond). One-third of all proteins
in the cell undergo regulatory phosphorylation/dephosphorylation cycles (Fig. 2.32).
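The speed and reversibility of such a kinase/phosphatase cycle can be sketched with simple mass-action kinetics. The sketch assumes that both enzymes work far below saturation (first order in their substrate), so the ultrasensitive zero-order regime is not modelled; all rate constants are hypothetical:

```python
import math


def fraction_phosphorylated(t: float, k_kinase: float, k_phosphatase: float,
                            p0: float = 0.0) -> float:
    """Phosphorylated fraction P(t) of a protein pool.

    Assumes both enzymes act far below saturation, so that
    dP/dt = k_kinase * (1 - P) - k_phosphatase * P (rate constants in 1/s).
    """
    k_total = k_kinase + k_phosphatase
    p_steady = k_kinase / k_total
    return p_steady + (p0 - p_steady) * math.exp(-k_total * t)


# Hypothetical values: phosphatase 1.0/s; kinase 0.1/s at rest, 1.0/s after a hormone signal
p_rest = 0.1 / (0.1 + 1.0)
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"t = {t:3.1f} s: {fraction_phosphorylated(t, 1.0, 1.0, p_rest):.2f} phosphorylated")
```

The switch from about 9 % to 50 % phosphorylated happens with the time constant 1/(k_kinase + k_phosphatase), i.e. within a few seconds here, and reverses just as quickly once kinase activity returns to its resting value.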

2.6.7 Acetylation/Deacetylation

Transfer of acetyl groups from acetyl-CoA onto the ε-amino group of Lys by protein
acetylases, and their removal by protein deacetylases, are also used for regulation
of enzymatic activity. The human acetylome, containing 1750 proteins, has recently
been determined [5]. Many DNA-binding proteins are regulated by acetylation,
because the acetylated Lys can no longer be protonated and hence no longer binds
to the negative charges on DNA. In addition, the change in protonation also affects
the binding of transcription factors. The activity of metabolic enzymes may also be
regulated by Lys-acetylation; for example, the glycolytic activity of
glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is increased by acetylation,
whereas gluconeogenesis is stimulated by deacetylation.
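The charge argument can be made quantitative with the HENDERSON-HASSELBALCH equation. This sketch assumes a side-chain pKa of about 10.5 for Lys (the value for the free amino acid; inside proteins it varies with the local environment):

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a basic group present in its protonated form."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


LYS_EPSILON_PKA = 10.5  # approximate side-chain pKa of free lysine (assumption)

for ph in (6.5, 7.4, 8.5):
    print(f"pH {ph}: Lys {fraction_protonated(ph, LYS_EPSILON_PKA):.4f} protonated")
```

An unmodified Lys is thus practically always positively charged at physiological pH, whereas acetyl-Lys is an amide with no protonatable amino group left; acetylation therefore removes essentially one full positive charge per residue.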

This is interesting because the metabolism of cancer cells is shifted towards
glycolysis (WARBURG effect), which makes acetylases and deacetylases potential
targets for anticancer drugs.

Three classes of deacetylases are known: classes I and II hydrolyse the bond with
water, whereas class III deacetylases (sirtuins) use NAD+ (see Fig. 2.33); thus their
activity depends on the nutritional status of the cell [11, 33, 38, 42]. This is probably
the mechanism behind the observation that mild caloric restriction prolongs the life
expectancy of lab animals.

[Fig. 2.33 schematic: N-acetyl-lysyl group + H2O → lysyl group (NH3+) + acetate (class I or II deacetylase);
N-acetyl-lysyl group + NAD+ → lysyl group (NH3+) + nicotinamide + O-acetyl-ADP-ribose, a second messenger
(class III deacetylase)]

Fig. 2.33 There are three classes of deacetylases. Enzymes of classes I and II simply hydrolyse
the amide bond. However, enzymes of class III (sirtuins) use NAD+, which is available in the
fasting state, but converted to NADH + H+ in the fed state. Thus sirtuins regulate gene expression
depending on the energy available to the cell.

2.6.8 Methylation/Demethylation

Transfer of methyl groups from S-adenosyl methionine (SAM) onto proteins may
also serve regulatory purposes, but we know very little about it [6]. Transfer can be
to:
carboxyl groups, forming methyl esters. This reaction is used to mark damaged
proteins for destruction, but also in signal cascades of unknown function.
amino groups, forming methyl amines. The function is largely unknown. Lysine
demethylases (e.g. LSD1) have been described for histones, so at least there the
modification is reversible.
thiol groups, forming thioethers. The function is unknown.
Unlike phosphorylation, methylation has never been observed to occur on hydroxyl
groups.

2.6.9 Addition/Removal of Hydrophobic Tails

Addition of
palmitoyl- (fatty acid) groups to internal Cys or Ser
myristoyl- (fatty acid) groups to N-terminal Gly
farnesyl- or geranylgeranyl (isoprenoid) groups to C-terminal Cys
converts cytosolic proteins into membrane-bound proteins (anchored in the cytosolic
leaflet). Because this is required for the activation of some enzymes, the transferases
are potential drug targets (e.g., for anticancer drugs).

2.6.10 S-Nitrosylation

S-Nitrosylation (see Fig. 2.34) occurs on Cys residues: $\mathrm{RSH} + \mathrm{NO} \rightleftharpoons \mathrm{RS\dot{N}OH}$.
The Cys must be positioned between a basic and an acidic residue (either in primary,
tertiary, or quaternary structure), because the reaction requires acid-base catalysis.
Nitrosylation serves as an additional pathway of NO signalling besides cGMP-dependent
kinases. This pathway, too, has not been fully explored.

2.6.11 ADP-Ribosylation

ADP-ribosylation (see Fig. 2.35) is used by some bacterial toxins (Vibrio cholerae,
Bordetella pertussis) to modify cellular proteins; cholera toxin, for example, locks the
α-subunit of Gs in its active state. This is the starting point of the pathomechanism of
the diseases associated with these bacteria (cholera and whooping cough, respectively).

2.6.12 Deamidation

Deamidation is the removal of the acid amide group from Gln or Asn, forming Glu
and Asp, respectively. It may be followed by racemization (formation of D-amino
acids).


Fig. 2.34 Regulation of a pathway by multiple posttranslational modification events. Upon
noxious stimuli, the Inhibitor of NF-κB (IκB) protein is phosphorylated by Inhibitor of NF-κB
kinase (IKK) and releases Nuclear factor κB (NF-κB). The latter enters the nucleus and changes
gene expression, whereas the phosphorylated IκB is ubiquitinated and thus marked for destruction
by the proteasome. Nitrosylation of IKK prevents the phosphorylation of IκB, nitrosylation of
NF-κB prevents its transport into the nucleus, and nitrosylation of a ubiquitin ligase subunit
prevents the transfer of ubiquitin to phosphorylated IκB.

This has medical implications:
Enzymatic deamidation by bacterial pathogenicity factors (cytotoxic necrotising
factors) of heterotrimeric G-proteins and small GTPases → GTPase activity is
inhibited; the protein cannot go from the active GTP-bound to the inactive
GDP-bound form.
Spontaneous deamidation (Asn faster than Gln) can be used for age determination in
forensic science: long-lived, low-turnover proteins (bone, teeth);
archaeology: the rate constant depends on temperature, humidity, soil pH, etc.,
hence the method is better suited for determining the relative age within a series
of finds.
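Why temperature limits absolute dating can be seen from a minimal sketch that treats deamidation as an irreversible first-order reaction (a simplification of the real chemistry) with an ARRHENIUS temperature dependence. The rate constant of 0.01 per year at 283 K and the activation energy of 100 kJ/mol are hypothetical values:

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def age_from_deamidation(fraction_deamidated: float, k: float) -> float:
    """Age (in the time unit of k) assuming irreversible first-order deamidation."""
    return -math.log(1.0 - fraction_deamidated) / k


def rate_at(temperature: float, k_ref: float, t_ref: float, e_act: float) -> float:
    """Arrhenius scaling of a rate constant from t_ref to temperature (both in K)."""
    return k_ref * math.exp(-e_act / R * (1.0 / temperature - 1.0 / t_ref))


# Hypothetical values: k = 0.01 per year at 283 K, activation energy 100 kJ/mol
k_cool = 0.01
k_warm = rate_at(288.0, k_cool, 283.0, 100e3)  # burial site 5 K warmer
for k in (k_cool, k_warm):
    print(f"k = {k:.4f}/yr -> age {age_from_deamidation(0.5, k):.0f} yr")
```

With these assumptions, a burial site only 5 K warmer roughly doubles the rate constant and so halves the age inferred from the same deamidated fraction; finds from the same site share this uncertainty, so their relative ages remain meaningful.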

[Fig. 2.35 schematic: Protein-Arg + NAD+ → Protein-Arg-ADP-ribose + nicotinamide, catalysed by cholera toxin]

Fig. 2.35 ADP-ribosylation of proteins

2.6.13 AMPylation (Adenylylation)

In AMPylation (adenylylation), AMP is transferred from ATP onto critical Thr hydroxy
groups of Rho GTPases and other important regulatory proteins, for example by Vibrio
outer protein S (VopS) of Vibrio parahaemolyticus. This results in depolymerisation
of the actin cytoskeleton; affected cells round up. As in ADP-ribosylation from NAD+
or glycation from UDP-glucose, the bacterial toxin uses a readily available energy-rich
substrate to disable critical host proteins and interfere with host defences.
AMPylation, however, is (at least in bacteria) also used for control of metabolism:
the glutamine synthetase of E. coli is controlled in part by AMPylation of Tyr-397;
the responsible adenylyl transferase (adenylic acid = old name for AMP) is in turn
controlled by uridylylation.

2.6.14 Transfer of Peptides

Ubiquitin, an 8.6 kDa protein (see Fig. 3.17 on page 88), is transferred to proteins
by a group of ubiquitin-ligases, of which three classes exist. E1-ligase (UbA1) (see
Fig. 2.36) forms a thioester bond with the C-terminal glycine residue of ubiquitin
in an ATP-dependent reaction. This activated ubiquitin is then transferred to an
E2-ligase (UbCs) and from there to an E3-ligase. E1, E2 and some E3-ligases (HECT
type) bind ubiquitin as a thioester. There is only one (or, in some species, a few) E1
(UbA1) but several UbCs and many E3-ligases, which are often specific for a single
target. Ubiquitin is usually transferred to the ε-amino group of a lysine, forming an
isopeptide bond. Binding to the N-terminus, or to Cys, Ser, and Thr has been described,
but is rare. Ubiquitin contains seven functionally distinct Lys-residues, whose ε-amino
groups form isopeptide bonds with the C-terminal Gly of other ubiquitin molecules,
resulting in long chains of poly-ubiquitin. Protein degradation in the proteasome, for
example, results from poly-ubiquitination at Lys-11 and/or Lys-48. For the discovery
of ubiquitin-mediated protein degradation, A. CIECHANOVER, A. HERSHKO & I. ROSE
received the Nobel Prize for Chemistry in 2004.

Fig. 2.36 The E1-ubiquitin ligase binds two molecules of ubiquitin: one is covalently bound to a
thiol residue of the enzyme, the second is bound to AMP. The first ubiquitin molecule is transferred
to the E2-ubiquitin ligase, then the second ubiquitin is moved to the SH-group, and the AMP is
exchanged for ATP. Then another ubiquitin is bound to the nucleotide, pyrophosphate is released
in the process, and the enzyme is ready for a new cycle.
There is a whole family of ubiquitin-like modifiers that are transferred in a similar
manner but whose functions we are only beginning to understand (see, e.g., [36] for a
recent review). Transfer is often catalysed directly by an E2-ligase; E3-ligases are
presumably required for ubiquitin because of the large number of different proteins
labelled with this marker. These ubiquitin-like modifiers (UbLs) are involved in the
regulation of endocytosis, apoptosis, the cell cycle, DNA repair, and other processes.
Some ubiquitin-like proteins (Isg15 and Fat10) are even regulated by interferon and
modulate the immune response. The mechanisms involved in these regulatory pathways
are, however, poorly understood.
Ubiquitin and ubiquitin-like modifiers are removed by deconjugases that cleave
the isopeptide bond. These are often highly specific for both the tag and the modified
target.
2.7 The Relationship Between Protein Structure and Function: Green Fluorescent Protein

GFP is produced by the cœlenterate Aequorea victoria (Hydrozoa, MURBACH &
SHEARER, 1902). It accepts energy from a chemiluminescent protein, aequorin, which
would otherwise emit blue light, by bioluminescence resonance energy transfer (BRET)
and re-emits it as green light. The function of bioluminescence in these animals is
unknown. Because of its intense green fluorescence, GFP has become a favourite marker
in molecular biology. The coding sequence of GFP is fused in frame to the gene of the
protein under investigation, so that a fusion protein is expressed. Both the amount of
target protein produced and its subcellular localization can then be studied by
fluorescence video microscopy.
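For a C-terminal GFP fusion, the stop codon of the target's open reading frame (ORF) must be removed and the GFP coding sequence appended in the same reading frame. The helper below is a minimal sketch of that bookkeeping step only; the function names and the 9-nucleotide toy "genes" are hypothetical, and real constructs often also contain a linker and cloning sites, which are ignored here.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def fuse_c_terminal(target_orf: str, tag_orf: str) -> str:
    """Fuse a tag ORF (e.g. GFP) to the 3' end of a target ORF, keeping the frame."""
    target_orf, tag_orf = target_orf.upper(), tag_orf.upper()
    for name, orf in (("target", target_orf), ("tag", tag_orf)):
        if len(orf) % 3:
            raise ValueError(f"{name} ORF length is not a multiple of 3")
    # Drop the target's stop codon so that translation reads through into the tag.
    if target_orf[-3:] in STOP_CODONS:
        target_orf = target_orf[:-3]
    return target_orf + tag_orf


def has_premature_stop(orf: str) -> bool:
    """True if any codon before the last one is a stop codon."""
    return any(orf[i:i + 3] in STOP_CODONS for i in range(0, len(orf) - 3, 3))


# Hypothetical toy ORFs: target = Met-Lys-stop, tag = Met-Gly-stop.
fused = fuse_c_terminal("ATGAAATAA", "ATGGGCTAA")
print(fused)                      # ATGAAAATGGGCTAA
print(has_premature_stop(fused))  # False
```

The premature-stop check is a quick sanity test: if the upstream stop codon were left in place, the ribosome would terminate before reaching the GFP sequence and no fluorescent fusion protein would be made.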
The fluorophore of GFP is formed by cyclisation and oxidation of three neighbouring
amino acids, Ser-65, Tyr-66, and Gly-67 (see Fig. 2.37); this process proceeds
spontaneously and requires no other proteins or cofactors except molecular oxygen.
GFP can therefore be used as a marker in any aerobic cell. The reaction extends the
system of conjugated double bonds (π-system) compared with Tyr and shifts the
absorption maximum from 280 to 395 nm. GFP consists mainly of antiparallel β-strands,
which together form a β-barrel. An α-helix carrying the fluorophore runs through the
centre of the barrel (see Fig. 2.38), where the fluorophore is protected from
collisions with water and, in particular, oxygen. Any such collision would quench
fluorescence by removing energy from the excited fluorophore. The structure thus
explains the high quantum efficiency of GFP (≈0.8 green photons emitted per blue
photon absorbed).
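The wavelengths and the quantum efficiency in this paragraph can be turned into energies with E = hc/λ (hc ≈ 1239.84 eV·nm). The sketch below assumes absorption at 395 nm, emission at the 508 nm maximum given with Fig. 2.37, and the quantum yield of 0.8 quoted above, and estimates how much of the absorbed energy reappears as green light; it uses peak wavelengths only and ignores the width of the spectra. The function names are illustrative.

```python
HC_EV_NM = 1239.84  # Planck constant x speed of light, in eV*nm


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy E = hc / lambda, in electronvolts."""
    return HC_EV_NM / wavelength_nm


def emitted_photons(absorbed: float, quantum_yield: float = 0.8) -> float:
    """Expected number of fluorescence photons for a given number absorbed."""
    return absorbed * quantum_yield


def energy_efficiency(ex_nm: float, em_nm: float, quantum_yield: float = 0.8) -> float:
    """Fraction of absorbed light energy re-emitted, using peak wavelengths only."""
    return quantum_yield * photon_energy_ev(em_nm) / photon_energy_ev(ex_nm)


for wl in (280, 395, 508):  # Tyr absorption, GFP phenol band, GFP emission
    print(f"{wl} nm -> {photon_energy_ev(wl):.2f} eV")
# 280 nm -> 4.43 eV
# 395 nm -> 3.14 eV
# 508 nm -> 2.44 eV
print(emitted_photons(1000))                  # 800.0
print(f"{energy_efficiency(395, 508):.2f}")   # 0.62
```

So even with a photon yield of 0.8, only about 62 % of the absorbed energy is re-emitted; the rest is lost because each green photon carries less energy than the absorbed one.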

[Scheme: 238-aa precursor containing Ser-65, Tyr-66, and Gly-67 → folding → cyclisation → dehydration (−H2O) → oxidation by O2 (τ = 1 h, releasing H2O2); further time constants shown: τ = 10 min and τ = 3 min. Mature Aequorea victoria GFP: λex = 395 + 488 nm, λem = 508 nm]

Fig. 2.37 Maturation of the fluorophore in GFP. The reaction does not require enzymes or
cofactors except molecular oxygen.
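Fig. 2.37 gives time constants for the maturation steps (10 min, 3 min, and 1 h). If, as a simplifying assumption not made in the text, each step is treated as an irreversible first-order reaction, the fraction of molecules with a mature fluorophore follows the cumulative distribution of a sum of exponential waiting times (a hypoexponential distribution). That distribution does not depend on the order of the steps, so the sketch needs only the three time constants. The function names are hypothetical.

```python
import math

# Time constants (minutes) shown in the maturation scheme of Fig. 2.37.
TAUS_MIN = (10.0, 3.0, 60.0)


def mature_fraction(t_min: float, taus=TAUS_MIN) -> float:
    """Fraction of molecules with a mature fluorophore after t_min minutes.

    Assumes irreversible first-order steps with distinct time constants, so the
    total maturation time follows a hypoexponential distribution.
    """
    rates = [1.0 / tau for tau in taus]
    if len(set(rates)) != len(rates):
        raise ValueError("this closed form needs distinct time constants")
    if t_min <= 0:
        return 0.0
    survival = 0.0  # probability that a molecule is not yet mature
    for i, k_i in enumerate(rates):
        coeff = 1.0
        for j, k_j in enumerate(rates):
            if j != i:
                coeff *= k_j / (k_j - k_i)
        survival += coeff * math.exp(-k_i * t_min)
    return 1.0 - survival


def mean_maturation_time(taus=TAUS_MIN) -> float:
    """Mean time to maturity: for sequential steps the time constants add up."""
    return sum(taus)


for t in (30, 60, 120, 240):
    print(f"{t:>4} min: {mature_fraction(t):.2f} mature")
print(mean_maturation_time(), "min on average")
```

Because the slowest step (oxidation) dominates the sum, roughly half of the newly made molecules are still dark after one hour under these assumptions, which matters when GFP is used to follow rapid changes in expression.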
Fig. 2.38 Stereo representation of the crystal structure of GFP (PDB-code 1ema)

Fig. 2.39 Amino acids that interact with the fluorophore in the active centre of GFP

GFP has two absorption maxima: that of the non-ionised fluorophore (phenol) at
395 nm (UV) and that of the ionised fluorophore (phenolate) at 488 nm (blue). Ser-65
donates a hydrogen bond to Glu-222, which makes deprotonation of Glu-222 easier (see
Fig. 2.39). The negative charge on this residue then prevents deprotonation of the
phenolic OH group of the fluorophore. If Ser-65 is mutated to Ala, ionisation of the
fluorophore becomes easier: the absorption maximum at 395 nm is reduced, and that at
488 nm becomes stronger. The phenolate ion forms a hydrogen bond with Thr-203; if
this residue is mutated to Ile, the fluorophore is stabilised in the phenol form, and
the absorption maximum at 395 nm becomes stronger at the expense of that at 488 nm.
If Thr-203 is mutated to an aromatic amino acid such as Tyr, stacking of the
π-systems red-shifts both the absorption and emission maxima by 20 nm because of the
reduced excited-state energy (yellow fluorescent protein). If Tyr-66 is replaced by
Trp or His, the maxima are blue-shifted to 436/476 nm (cyan fluorescent protein) and
390/450 nm (blue fluorescent protein), respectively.
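The mutant spectra described above fit naturally into a small lookup table. The sketch below reads the quoted pairs as excitation/emission maxima, takes the wild-type pair as the phenolate band (488 nm) with the 508 nm emission from Fig. 2.37, and derives the YFP pair by adding the 20 nm shift stated for Thr-203 → Tyr. It then computes the Stokes shift (the gap between excitation and emission maxima) both in nm and in wavenumbers; the table layout and function names are illustrative.

```python
# Excitation/emission maxima (nm) as quoted in the text; YFP = wild type + 20 nm.
VARIANTS = {
    "GFP (wild type)": (488, 508),
    "YFP (Thr-203 -> Tyr)": (488 + 20, 508 + 20),
    "CFP (Tyr-66 -> Trp)": (436, 476),
    "BFP (Tyr-66 -> His)": (390, 450),
}


def stokes_shift_nm(ex_nm: float, em_nm: float) -> float:
    """Distance between excitation and emission maxima on the wavelength scale."""
    return em_nm - ex_nm


def stokes_shift_cm1(ex_nm: float, em_nm: float) -> float:
    """Same shift as an energy difference in wavenumbers (1 nm^-1 = 1e7 cm^-1)."""
    return 1e7 / ex_nm - 1e7 / em_nm


# List the variants from the bluest to the reddest emission.
for name, (ex, em) in sorted(VARIANTS.items(), key=lambda item: item[1][1]):
    print(f"{name:22s} ex {ex} nm  em {em} nm  "
          f"shift {stokes_shift_nm(ex, em):3.0f} nm = {stokes_shift_cm1(ex, em):5.0f} cm^-1")
```

Expressing the shift in wavenumbers is useful because equal shifts in nm correspond to larger energy differences at shorter wavelengths: the 20 nm shift of YFP is a slightly smaller energy gap than the 20 nm shift of wild-type GFP.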
2.8 Exercises

2.8.1 Problems

2.1. Which of the following amino acids cannot be posttranslationally modified?


A Cysteine
B Serine
C Tyrosine
D Alanine
E Asparagine

2.2. Why can the isoelectric point of a protein not be predicted from its amino acid
composition and the known isoelectric points of the amino acids?

2.3. The above picture shows a cartoon of the structure of lactate dehydrogenase.
To which SCOP class of proteins does this enzyme belong?
A all α
B all β
C α/β
D α+β
E coiled coil

2.4. Which of the following amino acids would you expect to find in the core of a
protein?
A Lys
B Arg
C Glu
D Leu
E Asp
