Small Molecule & Protein Docking
CHEM 430
Molecular Docking Models
Over the years biochemists have developed numerous models
to capture the key elements of the molecular recognition
process. Although very simpli%ed, these models have proven
highly useful to the scienti%c community.
The Lock and Key Theory
As far back as 1890 Emil Fischer proposed a model called the
"lock-and-key model" that explained how biological systems
function. A substrate %ts into the active site of a macromolecule,
just like a key %ts into a lock. Biological 'locks' have unique
stereochemical features that are necessary to their function.
The Induced-Fit Theory
In 1958 Daniel Koshland introduced the "induced-%t theory". The
basic idea is that in the recognition process, both ligand and
target mutually adapt to each other through small
conformational changes, until an optimal %t is achieved.
The Conformation Ensemble Model
In addition to small induced-%t adaptation, it has been observed
that proteins can undergo much larger conformational changes.
A recent model describes proteins as a pre-existing ensemble of
conformational states. The plasticity of the protein allows it to
switch from one state to another.
From the Lock and Key to the Ensemble Model
Lock-and-key, induced-%t and the conformation ensemble model
are not contradictory. Each one focuses on a particular aspect of
the recognition process. The lock-and-key model introduces the
principle of 3D complementarity, the induced-%t model explains
how complementarity is achieved, and the ensemble model
depicts the conformational complexity of proteins.
Number of Citations for Docking Programs
—ISI Web of Science (2005)
Sousa, S.F., Fernandes, P.A. & Ramos, M.J. (2006)
Protein-Ligand Docking: Current Status
and Future Challenges Proteins, 65:15-26
R
C
R
C
C C
Docking
What is Docking?
Given the 3D structures of two molecules,
determine the best binding modes.
Molecular Complementarity in Computational
Docking
Computational docking exploit the concept of molecular
complementarity. The structures interact like a hand in a glove,
where both the shape and the physico-chemical properties of
the structures contribute to the %t.
Shape complementarity is the primary criterion for evaluating
the %t in the computational docking of two candidate structures
In addition to shape compatibility, chemical and physico-
chemical complementarity are also important criteria in the
docking between candidate structures
Energy Dictates Molecular Associations
The process of "self-assembly" is dictated by forces that are
energy based: the complex has a lower potential energy than its
constituent parts, and this keeps the parts together
The goal of computational docking is to %nd the 3D
con%guration of the complex that minimizes the energy
Defining a Docking
✽ Position y
✽ x, y, z
✽ Orientation
✽ qx, qy, qz, qw
✽ Torsions τ1
✽ τ1, τ2, … τn
z
De(nition of the "Pose"
A "pose" is a term widely adopted for describing the geometry
of a particular complex (also called "binding mode")
It refers to a precise con%guration which is characterized not
only by the relative orientation of the docked molecules but also
their respective conformations
Key aspects of docking…
Scoring Functions
Predicting the energy of a particular pose
Often a trade-off between speed and
accuracy
Search Methods
Finding an optimal pose
Which search method should I use?
Dimensionality
Can we trust the answer?
B
Bound vs. Unbound
Receptor surface 10 highly penetrating residues
Ligand
Kallikrein A/trypsin inhibitor
complex (PDB codes 2KAI,6PTI)
Molecular Flexibility in Protein-Ligand Docking
The mutual adaptation of a ligand with its receptor is crucial to
understanding ligand binding and protein function
One of the major challenges in molecular docking is how to
account for this adaptation in docking calculations
The docking problem can be classi%ed according to the way
Dexibility is modeled. In ascending order of complexity:
1. Rigid body docking ignores the Dexibility of the molecules
and treats them like rigid objects
2. Rigid receptor – Dexible ligand docking: only the ligand is
treated as Dexible, receptor is rigid
3. Flexible receptor – Dexible ligand docking: both protein and
ligand are treated as Dexible.
Molecular Flexibility in Protein-Ligand Docking
C
C
R
a priori
R
B
Katchalski-Katzir Method
> Exhaustive enumeration of the rotational and
translational degrees of freedom
> Since the scoring function is represented as a
convolution, the enumeration of the translational
degrees of freedom is efficient from exploiting
FFT methods, reduces O(N6) to O(N3 logN3)
> Implemented in the MolFit and FTDock programs,
geometric, but extensions include electrostatics
> Time consuming, yet conceptually simple and
appealing; often used for initial screening
> Volumetric – does not use a MS representation
Nidogen-Laminin Complex
Experiment
Predicted
R
or
C
R
C R
Protein Preparation
The preparation of the protein calls for great care. Important
decisions include the choice of the tautomeric forms of histidine
residues, the protonation states of amino-acids and
conformations of some residues; their incorrect assignments
may lead to docking errors.
Small Molecule Preparation
Before generating and docking the 3D structures of a library of
ligands, it is important to "clean up" the 2D structures being
used by removing any counter ions, salts, or water molecules
that might be part of the registered structure
All reactive or otherwise undesirable compounds must also be
removed
Possibly generate all optical isomers (enantiomers), cis/trans
isomers, tautomers, and protonation states of the structures
For most docking programs the tautomeric and protonation
state of the ligands to be docked is de%ned by the user; in
general the structure considered to be dominant at a neutral pH
is generated; here also, incorrect assignments may lead to
docking errors
C
C
C
C
C
Surface Representation
! Dense MS surface ! Sparse surface
(Connolly) (Shuo Lin et al.)
The DOCK algorithm – Rigid docking
• The DOCK algorithm developed
by Kuntz and co-workers is
generally considered one of the
major advances in protein–ligand
docking [Kuntz et al., JMB, 1982,
161, 269]
• The earliest version of the DOCK
algorithm only considered rigid
body docking and was designed to
identify molecules with a high
degree of shape complementarity
to the protein binding site.
• The first stage of the DOCK
method involves the construction
of a negative image of the
binding site consisting of a series
of overlapping spheres of varying
radii, derived from the molecular
surface of the protein
AR Leach, VJ Gillet, An Introduction to Cheminformatics
The DOCK algorithm – Rigid docking
• Ligand atoms are then matched to
the sphere centres so that the
distances between the atoms
equal the distances between the
corresponding sphere centres,
within some tolerance.
• The ligand conformation is then
oriented into the binding site. After
checking to ensure that there are
no unacceptable steric
interactions, it is then scored.
• New orientations are produced by
generating new sets of matching
ligand atoms and sphere centres.
The procedure continues until all
possible matches have been
considered.
AR Leach, VJ Gillet, An Introduction to Cheminformatics
C
C
C
C
C
Proteins: Struct. Func. And Gen. 12
Clique-Detection
• Nodes comprise of matches between protein and ligand
• Edges connect distance compatible pairs of nodes
• In a clique all pair of nodes are connected
C R
C R
C
C C C R
Proteins 28
B
Sampling Hyperspace
✽ Say we are searching in D-dimensional hyperspace…
✽ We want to evaluate each of the D dimensions N times.
✽ The number of “evals” needed, n, is: n = ND
∴ N = n1/D
✽ For example, if n = 106 and…
✽ D=6, N = (106)1/6 = 10 evaluations per dimension
✽ D=36, N = (106)1/36 = ~1.5 evaluations per dimension
✽ Clearly, the more dimensions, the tougher it gets.
SAMPLING HYPERSPACE !
Very CPU time consuming...
N=T360/i
N: number of conformations
T: number of rotable bonds
I: incremental degrees
Methotrexate
10 rotable bonds
30º increments (discrete)
1012 plausible conformations!
Dihydrofolate reductase with a methotrexate (4dfr.pdb)
Spectrum of Search:
Breadth and Level-of-Detail
Level-of-Detail
Search Breadth
✽ Local
✽ Atom types
✽ Molecular Mechanics (MM) ✽ Bond stretching
✽ Intermediate ✽ Bond-angle bending
✽ Monte Carlo Simulated Annealing ✽ Rotational barrier potentials
(MC SA)
✽ Brownian Dynamics
✽ Molecular Dynamics (MD)
✽ Implicit solvation
✽ Global ✽ Polarizability
✽ Docking
✽ What’s rigid and what’s
flexible?
Two Kinds of Search
Systematic Stochastic
Exhaustive, deterministic Random, outcome varies
Outcome is dependent on Must repeat the search or
granularity of sampling perform more steps to improve
Feasible only for low- chances of success
dimensional problems Feasible for larger problems
Stochastic Search Methods
✽ Simulated Annealing (SA)*
✽ Evolutionary Algorithms (EA)
✽ Genetic Algorithm (GA)*
✽ Others
✽ Tabu Search (TS)
✽ Particle Swarm Optimisation (PSO)
✽ Hybrid Global-Local Search Methods
✽ Lamarckian GA (LGA)*
‘*Supported in AutoDock
C
C
C
R
state variables
C
Single Point Crossover
B
+ =
Two Point Crossover
B
+ =
C
Uniform Crossover
B
+ =
B
R
Genetic Algorithm
Use of a Genetic Algorithm as a sampling method
• Each conformation is described as a set of
rotational angles.
• 64 possible angles are allowed to each of the bond
in the ligand.
• Each plausible dihedral angle is codified in a set of 4
binary bits (26=64)
3
• Each conformation is codified by a so called
2
chromosome with 4 × 6 bits (0 or 1)
1
111010.010110.001011.010010
Φ1 Φ2 ...
Φ1= 1×25 + 1×24 + 1×23 + 0×22 + 1×21 + 0×20 = 58°
Population (ie, set of chromosomes or configurations)
011010.010110.011010.010111 Chromosome
111010.010110.001011.010010
001010.010101.000101.010001
101001.101110.101010.001000
001010.101000.011101.001011
Gene
011010.010110.011010.010111
Single
Mutation
011010.011110.011110.010111
C
001010.010101.000101.010001
011010.010110.011010.010111
Recombination
(Crossover)
001010.010101.011010.010111
011010.010110. 000101.010001
011010.010110.011010.010111 111110.010010.011110.010101
111010.010110.001011.010010 Migration 101010.110110.011011.011010
001010.010101.000101.010001 001010.010101.000101.010001
101001.101110.101010.001000 101101.101010.101011.001100
001010.101000.011101.001011 011010.100000.011001.101011
R
B
back
C
C
Side Chain Rotamer Libraries
Exploring the conformational space of protein side chains is a
complex optimization problem that leads to a combinatorial
explosion of conformers
Protein side chains can be represented adequately by a small
set of discrete rotamers
Analysis of side-chain Ramachadran plots of pdb structures
show that 17 of the 20 amino acids can be represented
adequately by 67 side-chain rotamers
This approach greatly simpli%ed the problem and enabled side-
chain Dexiblity to be tackled
C
Place-and-Join Algorithm
The place-and-join method splits the molecule into rigid
subparts
Each subpart is docked independently
Assembly of the fragments is then done by looking at their
relative location and assessing the possibility of re-connecting
them with the connectivity of the initial molecule, in
geometrically correct conformations
Incremental-Based Methods
The incremental based approach starts with an initial core
docked in the active site, and new fragments are progressively
added and minimized; the treatment is terminated when the
entire molecule is formed
GOLD
(Genetic Optimisation
for Ligand Docking)
Performs automated docking with
full acyclic ligand flexibility, partial
cyclic ligand flexibility and partial
protein flexibility in and around
active site.
Scoring: includes H-bonding term,
pairwise dispersion potential
(hydrophobic interactions),
molecular and mechanics term for
internal energy.
Analysis shows algorithm more likely to fail if ligand is large or highly flexible,
and more likely to succeed if ligand is polar
• The GA is encoded to search for H-bonding networks first;
• Fitness function contains a term for dispersive interactions but takes no account
of desolvation, thus underestimates The Hydrophobic Effect
C
R
C
R
C
C
C
C
C
C
" # " #
! Aij Bij Cij! Dij
∆G = G1 12 − 6 + G2 12 − 10 + Ehbond
i,j
rij rij i,j
rij rij
! qi qj ! 2
/2σ 2
+ G3 + G4 ∆Gtor + G5 Si V j e−rij
i,j
ϵ(r)rij i,j
Grid Maps
Precompute
interactions for each
type of atom
100X faster than
pairwise methods
Drawbacks: receptor is
conformationally rigid,
limits the search space
Setting up the AutoGrid Box
✽ Macromolecule atoms in the rigid part
✽ Center:
✽ center of ligand;
✽ center of macromolecule;
✽ a picked atom; or
✽ typed-in x-, y- and z-coordinates.
✽ Grid point spacing:
✽ default is 0.375Å (from 0.2Å to 1.0Å: ).
✽ Number of grid points in each dimension:
✽ only give even numbers (from 2 × 2 × 2 to 126 × 126 × 126).
✽ AutoGrid adds one point to each dimension.
✽ Grid Maps depend on the orientation of the macromolecule.
Scoring Functions
∆Gbinding = ∆GvdW + ∆Gelec + ∆Ghbond + ∆Gdesolv + ∆Gtors
Dispersion/Repulsion
∆Gbinding = ∆GvdW + ∆Gelec + ∆Ghbond + ∆Gdesolv + ∆Gtors
Soft Van der Waals Repulsion Functions
When calculated with force %elds from molecular mechanics,
steric clashes correspond to high energies
Modifying the normal Van der Waals repulsion function (in blue)
into a softer curve (such as the yellow one) enables them to be
less dominant and to simulate the plasticity of the receptor
without changing its geometry
Fig. 2: Another approach
implemented in AutoDock software
Electrostatics and Hydrogen Bonds
∆Gbinding = ∆GvdW + ∆Gelec + ∆Ghbond + ∆Gdesolv + ∆Gtors
Improved H-bond Directionality
Hydrogen
affinity AutoGrid 3
Oxygen Guanine Cytosine
affinity
AutoGrid 4
Guanine Cytosine
Huey, Goodsell, Morris, and Olson (2004) Letts. Drug Des. & Disc., 1: 178-183
Desolvation
∆Gbinding = ∆GvdW + ∆Gelec + ∆Ghbond + ∆Gdesolv + ∆Gtors
Torsional Entropy
∆Gbinding = ∆GvdW + ∆Gelec + ∆Ghbond + ∆Gdesolv + ∆Gtors
AutoDock 4 Scoring Function Terms
ΔGbinding = ΔGvdW + ΔGelec + ΔGhbond + ΔGdesolv + Δgtors
http://autodock.scripps.edu/science/equations
http://autodock.scripps.edu/science/autodock-4-desolvation-free-energy/
• ΔGvdW = ΔGvdW
12-6 Lennard-Jones potential (with 0.5 Å smoothing)
• ΔGelec
with Solmajer & Mehler distance-dependent dielectric
• ΔGhbond
12-10 H-bonding Potential with Goodford Directionality
• ΔGdesolv
Charge-dependent variant of Stouten Pairwise Atomic
Solvation Parameters
• ΔGtors
Number of rotatable bonds
AutoDock Empirical
Free Energy Force Field
⎛A Physics-based approach from
Bij ⎞ molecular mechanics
W vdw ∑⎜⎜ 12 − 6 ⎟⎟ +
ij
i, j ⎝ rij rij ⎠ Calibrated with 188 complexes from
LPDB, Ki’s from PDB-Bind
⎛C Dij ⎞
W hbond ∑ E(t)⎜⎜ 12 − 10 ⎟⎟ +
ij
Standard error = 2.52 kcal/mol
i, j ⎝ rij rij ⎠
qq
W elec ∑ i j +
i, j ε(rij )rij
W sol ∑ ( SiV j + S jVi )e
(−rij2 / 2σ 2
)
+
i, j
W tor N tor
Scoring Function in AutoDock 4
A Bij C Dij qq (−r 2 / 2σ 2 )
V = W vdw∑ 12 − 6 + W hbond ∑ E(t) 12 − 10 + W elec ∑ i j + W sol ∑( SiVj + SjVi )e ij
ij ij
i, j rij rij i, j rij rij i, j ε (rij )rij i, j
✽ Desolvation includes terms for all atom types
✽ Favorable term for C, A (aliphatic and aromatic carbons)
✽ Unfavorable term for O, N
✽ Proportional to the absolute value of the charge on the atom
✽ Computes the intramolecular desolvation energy for moving atoms
✽ Calibrated with 188 complexes from LPDB, Ki’s from
PDB-Bind
Standard error (in Kcal/mol):
✽ 2.62 (extended)
✽ 2.72 (compact)
✽ 2.52 (bound)
✽ 2.63 (AutoDock 3, bound)
✽ Improved H-bond directionality
AutoDock Vina Scoring
Function
Combination of knowledge-based and empirical approach
∆Gbinding = ∆Ggauss + ∆Grepulsion + ∆Ghbond + ∆Ghydrophobic + ∆Gtors
∆G gauss
Attractive term for dispersion, two gaussian functions
∆Grepulsion
Square of the distance if closer than a threshold value
∆Ghbond
Ramp function - also used for interactions with metal ions
∆Ghydrophobic
Ramp function
∆Gtors
Proportional to the number of rotatable bonds
Calibrated with 1,300 complexes from PDB-Bind
Standard error = 2.85 kcal/mol
C
C
AutoDock and Vina Search
Methods
Global search algorithms:
Simulated Annealing (Goodsell et al. 1990)
Genetic Algorithm (Morris et al. 1998)
Local search algorithm:
Solis & Wets (Morris et al. 1998)
Hybrid global-local search algorithm:
Lamarckian GA (Morris et al. 1998)
Iterated Local Search:
Genetic Algorithm with Local Gradient
Optimization (Trott and Olson 2010)
R
pdbq
pdbqs
B
Practical Considerations
What problems are feasible?
Depends on the search method:
Vina > LGA > GA >> SA >> LS
AutoDock SA : can output trajectories, D < 8
torsions.
AutoDock LGA : D < 8-16 torsions.
Vina : good for 20-30 torsions.
When are AutoDock and Vina not suitable?
Modeled structure of poor quality;
Too many torsions (32 max);
Target protein too flexible.
Redocking studies are used to validate the
method
Using AutoDock: Step-by-Step
✽ Set up ligand PDBQT—using ADT’s “Ligand” menu
✽ OPTIONAL: Set up flexible receptor PDBQT—using
ADT’s “Flexible Residues” menu
✽ Set up macromolecule & grid maps—using ADT’s “Grid”
menu
✽ Pre-compute AutoGrid maps for all atom types in your set of
ligands—using “autogrid4”
✽ Perform dockings of ligand to target—using “autodock4”,
and in parallel if possible.
✽ Visualize AutoDock results—using ADT’s “Analyze” menu
✽ Cluster dockings—using “analysis” DPF command in
“autodock4” or ADT’s “Analyze” menu for parallel docking
results.
AutoDock 4 File Formats
Prepare the Following Input Files
✽ Ligand PDBQT file
✽ Rigid Macromolecule PDBQT file
✽ Flexible Macromolecule PDBQT file (“Flexres”)
✽ AutoGrid Parameter File (GPF)
✽ GPF depends on atom types in:
✽ Ligand PDBQT file
✽ Optional flexible residue PDBQT files)
✽ AutoDock Parameter File (DPF)
Run AutoGrid 4
✽ Macromolecule PDBQT + GPF Grid Maps, GLG
Run AutoDock 4
✽ Grid Maps + Ligand PDBQT [+ Flexres PDBQT]
+ DPF DLG (dockings & clustering)
Things you need to do before using
AutoDock 4
Ligand:
✽ Add all hydrogens, compute Gasteiger charges, and merge
non-polar H; also assign AutoDock 4 atom types
✽ Ensure total charge corresponds to tautomeric state
✽ Choose torsion tree root & rotatable bonds
Macromolecule:
✽ Add all hydrogens (PDB2PQR flips Asn, Gln, His),
compute Gasteiger charges, and merge non-polar H; also
assign AutoDock 4 atom types
✽ Assign Stouten atomic solvation parameters
✽ Optionally, create a flexible residues PDBQT in addition to
the rigid PDBQT file
✽ Compute AutoGrid maps
Preparing Ligands and Receptors
✽ AutoDock uses ‘United Atom’ model
✽ Reduces number of atoms, speeds up docking
✽ Need to:
✽ Add polar Hs. Remove non-polar Hs.
✽ Both Ligand & Macromolecule
✽ Replace missing atoms (disorder).
✽ Fix hydrogens at chain breaks.
✽ Need to consider pH:
✽ Acidic & Basic residues, Histidines.
✽ http://molprobity.biochem.duke.edu/
✽ Other molecules in receptor:
✽ Waters; Cofactors; Metal ions.
✽ Molecular Modelling
Using AutoDockelsewhere.
4 with ADT
Atom Types in AutoDock 4
✽ One-letter or two-letter atom type codes
✽ More atom types than AD3:
✽ 22
✽ Same atom types in both ligand and receptor
✽ http://autodock.scripps.edu/wiki/NewFeatures
✽ http://autodock.scripps.edu/faqs-help/faq/
how-do-i-add-new-atom-types-to-autodock-4
✽ http://autodock.scripps.edu/faqs-help/faq/
where-do-i-set-the-autodock-4-force-field-parameters
Partial Atomic Charges are required
for both Ligand and Receptor
✽ Partial Atomic Charges:
✽ Peptides & Proteins; DNA & RNA
✽ Gasteiger (PEOE) - AD4 Force Field
✽ Organic compounds; Cofactors
✽ Gasteiger (PEOE) - AD4 Force Field;
✽ MOPAC (MNDO, AM1, PM3);
✽ Gaussian (6-31G*).
✽ Integer total charge per residue.
✽ Non-polar hydrogens:
✽ Always merge
Carbon Atoms can be either Aliphatic
or Aromatic Atom Types
✽ Solvation Free Energy
✽ Based on a partial-charge-dependent variant of Stouten
method.
✽ Treats aliphatic (‘C’) and aromatic (‘A’) carbons differently.
✽ Need to rename ligand aromatic ‘C’ to ‘A’.
✽ ADT determines if ligand is a peptide:
✽ If so, uses a look-up dictionary.
✽ If not, inspects geometry of ‘C’s in rings. Renames ‘C’ to ‘A’
if flat enough.
✽ Can adjust ‘planarity’ criterion (15° detects more rings than
default 7.5°).
Defining Ligand Flexibility
✽ Set Root of Torsion Tree:
✽ By interactively picking, or
✽ Automatically.
✽ Smallest ‘largest sub-tree’.
✽ Interactively Pick Rotatable Bonds:
✽ No ‘leaves’;
✽ No bonds in rings;
✽ Can freeze:
✽ Peptide/amide/selected/all;
✽ Can set the number of active torsions that move either
the most or the fewest atoms
Setting Up Your Environment
✽ At TSRI:
✽ Modify .cshrc
✽ Change PATH & stacksize:
✽ setenv PATH (/mgl/prog/$archosv/bin:/tsri/python:$path)
✽ % limit stacksize unlimited
✽ ADT Tutorial, every time you open a Shell or Terminal, type:
✽ % source /tsri/python/share/bin/initadtcsh
✽ To start AutoDockTools (ADT), type:
✽ % cd tutorial
✽ % adt1
✽ Web
✽ http://autodock.scripps.edu
Choose the Docking Algorithm
✽ SA.dpf — Simulated Annealing
✽ GA.dpf — Genetic Algorithm
✽ LS.dpf — Local Search
✽ Solis-Wets (SW)
✽ Pseudo Solis-Wets (pSW)
✽ GALS.dpf — Genetic Algorithm with
Local Search, i.e. Lamarckian GA
Run AutoGrid
✽ Check: Enough disk space?
✽ Maps are ASCII, but can be ~2-8MB !
✽ Start AutoGrid from the Shell:
% autogrid4 –p mol.gpf –l mol.glg &
% autogrid4 -p mol.gpf -l mol.glg ; autodock4 -p mol.dpf -l mol.dlg
✽ Follow the log file using:
% tail -f mol.glg
✽ Type <Ctrl>-C to break out of the ‘tail -f’
command
✽ Wait for “Successful Completion” before starting
AutoDock
Run AutoDock
✽ Do a test docking, ~ 25,000 evals
✽ Do a full docking, if test is OK, ~ 250,000 to
50,000,000 evals
✽ From the Shell:
✽ % autodock4 –p yourFile.dpf –l yourFile.dlg &
✽ Expected time? Size of docking log?
✽ Distributed computation
✽ At TSRI, Linux Clusters
✽ % submit.py stem 20
% recluster.py stem 20 during 3.5
Analyzing AutoDock Results
✽ In ADT, you can:
✽ Read & view a single DLG, or
✽ Read & view many DLG results files in a
single directory
✽ Re-cluster docking results by conformation
& view these
✽ Outside ADT, you can re-cluster several
DLGs
✽ Useful in distributed docking
✽ % recluster.py stem 20 [during|end] 3.5
Viewing Conformational Clusters by
RMSD
✽ List the RMSD tolerances
✽ Separated by spaces
✽ Histogram of conformational clusters
✽ Number in cluster versus lowest energy in that cluster
✽ Picking a cluster
✽ makes a list of the conformations in that cluster;
✽ set these to be the current sequence for states player.