
J. Med. Chem.

1996, 39, 2887-2893 2887

Articles

The Properties of Known Drugs. 1. Molecular Frameworks


Guy W. Bemis* and Mark A. Murcko
Vertex Pharmaceuticals, 130 Waverly Street, Cambridge, Massachusetts 02139-4242

Received April 19, 1996X

In order to better understand the common features present in drug molecules, we use shape
description methods to analyze a database of commercially available drugs and prepare a list
of common drug shapes. A useful way of organizing this structural data is to group the atoms
of each drug molecule into ring, linker, framework, and side chain atoms. On the basis of the
two-dimensional molecular structures (without regard to atom type, hybridization, and bond
order), there are 1179 different frameworks among the 5120 compounds analyzed. However,
the shapes of half of the drugs in the database are described by the 32 most frequently occurring
frameworks. This suggests that the diversity of shapes in the set of known drugs is extremely
low. In our second method of analysis, in which atom type, hybridization, and bond order are
considered, more diversity is seen; there are 2506 different frameworks among the 5120
compounds in the database, and the most frequently occurring 42 frameworks account for only
one-fourth of the drugs. We discuss the possible interpretations of these findings and the way
they may be used to guide future drug discovery research.

Introduction
The drug design process is largely driven by the instincts, intuition, and experiences of pharmaceutical research scientists. It is often instructive to attempt to “capture” these experiences by analyzing the historical record, i.e., successful drug design projects of the past. The inferences drawn from this analysis can play an important role in shaping our thinking on current and future projects. For this reason, we would like to analyze the structures of a large number of drugs, the ultimate product of a successful drug design effort.

There is a wealth of information implicitly encoded in the two-dimensional and three-dimensional structures of molecules that are currently sold as drugs. This includes toxicity, stability (both chemical and metabolic), synthetic accessibility, starting material costs, and the like. Our goal for this paper is to begin to deconvolute this information in order to apply it to the design of new drugs.

There are several computational tools available for this analysis: substructure searching using one of several commercially available software packages (e.g., Merlin, ISIS, Unity) (refs 1-3), automated ring searching using one of several published algorithms (refs 4-8), and shape descriptor methods (refs 9-12). We use shape descriptor methods because they are easily implemented and are flexible enough to allow the analysis to be performed in an automated way.

We analyze the Comprehensive Medicinal Chemistry (CMC) database (ref 13), which contains two-dimensional and predicted three-dimensional structures and important biochemical properties for known drugs. The CMC database has been developed from Pergamon’s Comprehensive Medicinal Chemistry series (ref 14).

* To whom correspondence should be addressed. E-mail: bemis@vpharm.com.
X Abstract published in Advance ACS Abstracts, July 1, 1996.

Figure 1. Graph representation of molecules.

Methods
The current version of the CMC database (v. 94.1) includes more than 6700 compounds. However, many of these do not meet our criteria for various reasons, e.g., imaging agents, dental resins, and veterinary compounds. Thus, our first task was to identify and remove these compounds. We eliminated all compounds for which no therapeutic activity class was given, as well as compounds which fell into any of the following classes: radiopaque agents, contrast agents, solvents, anesthetics, disinfectants, topicals, local agents, spermicides, wetting agents, flavoring agents, pharmaceutical aids, surgical aids, dental, surfactants, sunscreens, ultraviolet screens, emetics, preservatives, aerosol propellants, chelators, keratolytics, insecticides, astringents, herbicides, laxatives, sweeteners, dental caries prophylactics, adhesives, dentistry, pharmaceutic aids, veterinary, buffers, scabicides, and ectoparasiticides. After this process, the CMC database had 5120 remaining entries (ref 15).

Our analysis of the structures in the CMC database has been carried out on two levels, using atomic properties and graph properties. Atomic properties include such information as element type, atomic hybridization, and atomic charge. Graph properties of molecules are the connectivity properties of the atoms representing a molecule, that is, the information that may be derived from a molecular structure by considering each atom to be a vertex and each bond to be an edge on a graph (ref 16). The graph for a particular molecule may be considered an archetype for each instance of that molecular shape.
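To make the distinction between atomic properties and graph properties concrete, the following Python sketch (an illustration added here, not code from the paper) strips element types and bond orders from three six-membered rings and checks that they collapse to the same graph. The names `Molecule`, `ring6`, and `archetype` are hypothetical. Hydrogens are omitted and aromatic rings are written with alternating bond orders, both simplifying assumptions. Comparing edge sets directly works only because the atoms are numbered the same way in every ring; a general comparison would need a numbering-independent shape description.

```python
from typing import NamedTuple


class Molecule(NamedTuple):
    """Toy molecule: element symbols plus bonds as (atom_i, atom_j, bond_order)."""
    name: str
    elements: tuple
    bonds: tuple


def ring6(name, elements, orders):
    """Build a six-membered ring; bond k joins atom k and atom (k + 1) % 6."""
    bonds = tuple((k, (k + 1) % 6, orders[k]) for k in range(6))
    return Molecule(name, tuple(elements), bonds)


def archetype(mol):
    """Discard element types and bond orders; keep only vertices and edges."""
    edges = frozenset(frozenset((i, j)) for i, j, _order in mol.bonds)
    return len(mol.elements), edges


# Heavy atoms only; alternating 2/1 bond orders stand in for aromaticity.
pyridine = ring6("pyridine", "NCCCCC", (2, 1, 2, 1, 2, 1))
benzene = ring6("benzene", "CCCCCC", (2, 1, 2, 1, 2, 1))
cyclohexane = ring6("cyclohexane", "CCCCCC", (1, 1, 1, 1, 1, 1))

# The three instances differ in atomic properties but share one graph.
print(archetype(pyridine) == archetype(benzene) == archetype(cyclohexane))  # True
```

The atomic-property level of analysis corresponds to keeping the `elements` and bond orders that `archetype` throws away.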

S0022-2623(96)00292-0 CCC: $12.00 © 1996 American Chemical Society


2888 Journal of Medicinal Chemistry, 1996, Vol. 39, No. 15 Bemis and Murcko

That is, for a molecule such as pyridine (Figure 1a), the molecular graph or archetype is the graph with six vertices (Figure 1b). The same archetype represents molecules such as benzene, cyclohexane, and pyran, among many others (Figure 1c). Thus the structures of molecules can be readily analyzed in terms of a hierarchy in which molecular archetypes are at the top, and individual molecules are at the bottom (Figure 1).

When analyzing drug molecules, one is faced with a slightly more complicated set of graphs than in the simple example shown in Figure 1. To demonstrate this point, we might consider the antidepressant thioridazine, which is shown along with its graph representation or archetype in Figure 2. We can now pick out structural elements which can be used to further order groups of atoms within a molecular graph. We may dissect any molecule into four units: ring, framework, linker, and side chains. We adopted the following definitions to aid our analysis.

Figure 2. Graph representation of a typical drug molecule.

Ring Systems. We define ring systems to be cycles within the graph representation of molecules and cycles sharing an edge (a connection between two atoms or a bond). For example, benzene, naphthalene, and anthracene are all single ring systems. Treating cycles this way makes sense from a chemical structural point of view. As an approximation, the cycles and fused cycles in a molecule represent rigid units in which many degrees of freedom are removed from a collection of atoms.

Linker Atoms. Atoms that are on the direct path connecting two ring systems are defined as linker atoms. As can be seen in Figure 3, thioridazine has a two-atom linker connecting the two ring systems. Molecules such as biphenyl have a zero-atom linker: the six-membered rings are different ring systems.

Side Chain Atoms. Any nonring, nonlinker atoms are defined as side chain atoms. Figure 3 shows that thioridazine has two side chains: a single-atom side chain attached to the six-ring and a two-atom side chain attached to the fused tricyclic ring system.

Framework. The framework is defined as the union of ring systems and linkers in a molecule. As shown in Figure 3, the thioridazine molecule consists of two ring systems: a six-ring and three linearly fused six-rings. Together these rings and linkers define the framework of this molecule. The concept of a framework is central to our paper, and provides an important distinction between our present work and work done previously (ref 6).

Figure 3. Distinguishing between ring systems, linkers, and side chains.

We can now classify molecules and their constituent atom groupings into a hierarchy as shown in Figure 4. This classification scheme is very useful for analyzing the structures of drug molecules for several reasons. First, well-represented frameworks can be identified, and emphasis can be placed on these for new drug discovery. Second, linkers and ring systems can be identified for potential use in a combinatorial-type approach to compound library generation. Third, compound libraries may be evaluated for their relationship to the shapes of known drugs. In other words, we can evaluate how well the diversity space of a library overlaps with our representation of drug-space.

Figure 4. Hierarchical description of molecules.

We begin our analysis by identifying side chain atoms, which is done as follows. Each atom bonded to only one other atom is identified as a side chain atom and removed from the molecule. This process is repeated until either the molecule disappears (acyclic molecules) or until each atom is bonded to at least two other atoms. The remaining atoms are identified as the framework atoms. The next step in our analysis is the identification of atoms within the framework that are in rings (or cycles in the graph) using a depth-first search (ref 17). Any atom not part of a ring is identified as a linker atom. This process follows the hierarchy shown in Figure 4.

The molecular frameworks obtained in this manner were grouped into clusters of identical shape description. Our analysis has been carried out in two ways: we have conducted both a purely graph theoretical analysis and an analysis which also considers atomic properties. Both methods follow essentially the same formal procedure with the only difference being the shape descriptor used. For the graph analysis we used two-dimensional triangle shape descriptors (ref 12) and for the analysis including atomic properties we used topological torsions (ref 11). For computation of topological torsions, we found it necessary to retain the π electrons associated with framework atoms when side chains were removed. For example, cyclohexanone would have the sp2 oxygen tagged as a side chain atom, and the sp2 carbon tagged as having two associated π electrons. On the basis of the topological torsion representation, the cyclohexanone framework would therefore have a different shape description than the cyclohexane framework. The cyclohexanone framework is therefore represented with two dots next to the sp2 carbon to indicate the associated electrons. We have used this notation in Charts 2 and 3.

Results
First we summarize the results of the graph theory (archetype) analysis and then the atomic property (instance) analysis. Finally, we discuss the relationship between the two kinds of analysis.

From the graph theory analysis, there are 1179 different frameworks among the 5120 compounds analyzed. Of these frameworks, 783 (66%) are unique, i.e.,
they are found in only one drug molecule. Chart 1 shows graph frameworks for compounds in the CMC database as classified by connectivity triangles. We have shown only frameworks that exist in at least 20 drugs. This set of 32 frameworks accounts for 50% of the 5120 total drug molecules. Clearly the six-ring is the most commonly used framework for these drugs. Acyclic molecules (those with no framework) account for 306 (6%) of the molecules we examined.

Chart 1. Graph Frameworks for Compounds in the CMC Database as Classified by Connectivity Triangles (Numbers Indicate Frequency of Occurrence)

Our second method of analysis uses topological torsions (ref 11) for classification. Several atom properties (atom type, hybridization, and bond order) are considered. Somewhat more diversity is seen; there are 2506 different frameworks among the 5120 compounds in the database. Again, a large majority of these frameworks (1908, or 76%) are unique. Chart 2 shows atomic property-based drug frameworks (drug instances) that occur in the CMC at least 10 times. Naturally, because this classification scheme accounts for hybridization and bond order, one would expect a more diverse set of frameworks to be required to represent the drug database. Even so, this set of 41 frameworks accounts for
1235 (24%) of the 5120 molecules we examined. Clearly benzene is the most commonly used framework for these drugs.

Chart 2. Atomic Frameworks for Compounds in the CMC Database as Classified by Topological Torsions (Numbers Indicate Frequency of Occurrence)

It is instructive to understand the relationship between the graph theory frameworks, which can be viewed as providing a “high-level” or “generic” classification scheme, and the atom property-based frameworks, which further subdivide classes of frameworks based on their chemical properties. As an example, we may consider the atomic property based framework for the most popular graph theory based framework, the six-ring. Chart 3 shows the set of six-ring atomic frameworks that accounts for the 606 six-ring scaffolds found in our filtered version of the CMC database. Overwhelmingly, the most common six-ring atomic framework is benzene. Of the drug molecules we considered, 8.5% (433 out of 5120) have benzene as their molecular framework.

Chart 3. All Six-Membered Rings Found in the CMC Database (Numbers Indicate Frequency of Occurrence)

Chart 1 can be further broken down (by inspection) into rings and linkers. The linkers present are chains with zero to seven nodes shown in Chart 4 (rings and linkers). Rings have a dashed line showing points where linkers can potentially be attached. By using this set of 14 rings (with eight potential attachment points) and eight linkers, we can derive the molecular frameworks for over half of known drugs (as defined by our subset of the CMC database).

A problem sometimes encountered when using molecular shape descriptors is multiple representations, cases where different shapes are represented by identical shape descriptions. There are a number of ways to deal with this problem, such as adding more detail to the shape descriptor or using multiple shape descriptors (ref 6). For small data sets such as the CMC, perhaps the simplest solution is to look through groups of molecules with identical shape descriptions and pick out cases of multiple representation. This is the method we used. An example of multiple representation is found in the topological torsion shape description of these two molecules:

[Structure drawings of frameworks A and B are not reproduced in this text.]

We found two examples of the B molecular framework grouped with 30 examples of the type A framework so we assigned them to separate clusters.

Finally, we should note that as a control, a partial analysis was performed also on the complete CMC database (approximately 6700 compounds), and the results were substantially the same.

Discussion
This is our first attempt at classifying the shapes of drug molecules, and our goal is to provide a “high-level overview” of the gross structural features of these molecules. Accordingly, for purposes of this research, we have deliberately defined “shape” in simple terms. The first classification scheme ignores such important features as the details of substituents on rings, chain branching, bond order, atom types, stereochemistry, and three-dimensional conformation. The second classification method does account for bond order and atom types.

There is no reason to believe that the set of 5120 molecules in our database represents all the possible shapes that a drug may take. However, it is instructive to examine the universe of known drugs to see what patterns may exist. Once these patterns have been deduced, the drug designer may apply them in various ways. For example, one might attempt to bias a de novo design program or a combinatorial chemistry effort to produce a set of molecules which either contains or does not contain these patterns.

The reader must bear in mind that “shape” in this work refers to the two-dimensional topological graph of the molecules. While three-dimensional shape is partially encoded in the two-dimensional graph of a molecule, we expect that the three-dimensional conformations of drugs with the same topological shape will not all be similar, although certain conformations would be expected to appear more frequently than others.

Of course, the preferences we have identified for certain shapes do not necessarily reveal some fundamental truth about drugs, receptors, metabolism, or toxicity. Instead, they may reflect the constraints imposed by the scientists who have produced these drugs. Constraints due to synthetic or patent considerations, cost, or a general conservatism (i.e., a tendency to make
new compounds which are structurally similar to known compounds) all may be reflected in these findings. However, half of the known drugs fall into only 32 shape categories. The drugs which possess these topological shapes (Chart 1) are quite different in polarity, conformation, hydrogen-bonding potential, and other properties; they bind to different classes of receptor; and they serve different pharmacological needs. And yet, they all have the same topological shape.

Chart 4. Graph Representations of the Rings and Linkers for the Most Common Drug Frameworks Found in Chart 1 (footnote a)
(a) Linkers are depicted with open valences on each end; the number of nodes in each linker is given to the left. Rings are depicted with dashed lines indicating possible points of attachment for linkers.

In part, the results in Chart 1 stem from the simplicity of our classification scheme, but they may also reflect some of the properties which are beneficial for producing drugs. For example, if we consider the set of 32 frameworks in Chart 1, we see that most (23) contain at least two six-rings linked or fused together. We also see that only three of these frameworks have more than five rotatable bonds.

A “pharmacological promiscuity” parameter could be provided for each of our frameworks. This was suggested to us by one external reviewer and several internal reviewers. This parameter would be defined by the ratio of targets to frameworks, that is, the number of pharmacological targets acted upon by drugs composed of a particular framework divided by the number of drugs made from that framework.

As an example, the biphenyl molecular framework (Chart 2) constitutes 16 drugs in our database. The CMC lists the following distinct therapeutic classes for these drugs: antiamebic, antifungal, antiinfective, antihypercholesteremic, antihyperlipoproteinemic, fasciolicide, antirheumatic, analgesic, anti-inflammatory, antithrombotic, uricosuric, and antiarrhythmic. The pharmacological promiscuity parameter for this molecular framework is therefore 12/16 or 0.75.

This parameter would be extremely useful for several purposes such as choosing a scaffold upon which to begin a combinatorial design effort. Unfortunately, the exact pharmacological target for each drug is not known, and often multiple therapeutic categories are listed for drugs, so this analysis would require either dealing with a very restricted subset of drugs or grouping together similar low-level pharmacological targets.

It is intriguing to consider ways in which our analysis might be used to direct a de novo design effort. For example, on the basis of the above-mentioned observation that two six-membered rings are a common motif,
one might begin a de novo exercise by docking two benzene rings into the active site using shape-based methods that ignore electrostatics (refs 18, 19). Next, one could link or fuse these rings into a single ligand using one of several algorithms (refs 20-23), placing special emphasis on scaffolds found in Chart 1 (directed linking). Finally, one could assign atom types for the ligand based on electrostatic complementarity with the active site (ref 19), placing special emphasis on the atomic distributions found for the scaffolds found in Charts 2 and 3 (directed atom assignment). Some minimization would likely be needed as different atomic hybridizations are overlaid on the initial benzene fragments.

Many other approaches also are possible. For example, one might attempt to utilize the frameworks found in Chart 2. These could be used as seed structures for de novo structure generation by random combination of fragments (refs 24, 25) and linkers such as those in the ILIAD database (ref 23). Finally, our collection of “rings and linkers” in Chart 4 might be used in conjunction with fragment perception algorithms (ref 26) and similarity methods (ref 27) to select compounds for synthesis and testing from a combinatorial library or compound collection database.

Future research in the area of “drug database mining” will focus on other properties of known drugs including their flexibility, log P, solubility, and a more detailed shape description that includes such features as charge and hydrogen bonding potential.

Acknowledgment. We would like to thank Ajay, Chris Baker, Joshua Boger, Chris Lepre, Roger Tung, Pat Walters, Keith Wilson, and Bob Zelle for their suggestions and their careful reading of the manuscript. We thank Scott Thomas for help with processing the CMC database.

References
(1) Available from DAYLIGHT Chemical Information Systems, Inc., Irvine, CA.
(2) Available from MDL Information Systems, Inc., San Leandro, CA.
(3) Available from Tripos, Inc., St. Louis, MO.
(4) Klingebiel, U.; Specht, K. Automatic Generation of the Chemical Ringcode from a Connectivity Chart. J. Chem. Inf. Comput. Sci. 1980, 20, 113-116.
(5) Randic, M. Ring ID Numbers. J. Chem. Inf. Comput. Sci. 1988, 28, 142-147.
(6) Nilakantan, R.; Bauman, N.; Haraki, K.; Venkataraghavan, R. A Ring-Based Chemical Structural Query System: Use of a Novel Ring-Complexity Heuristic. J. Chem. Inf. Comput. Sci. 1990, 30, 65-68.
(7) Domokos, L. Beilstein Ring Search System. 1. General Design. J. Chem. Inf. Comput. Sci. 1993, 33, 663-667.
(8) Fan, B. T.; Panaye, A.; Doucet, J.-P.; Barbu, A. Ring Perception. A New Algorithm for Directly Finding the Smallest Set of Smallest Rings from a Connection Table. J. Chem. Inf. Comput. Sci. 1993, 33, 657-662.
(9) Concepts and Applications of Molecular Similarity; Johnson, M. A., Maggiora, G. M., Eds.; John Wiley & Sons, Inc.: New York, 1990.
(10) Carhart, R. E.; Smith, D. H.; Venkataraghavan, R. Atom Pairs as Molecular Features in Structure-Activity Studies: Definition and Applications. J. Chem. Inf. Comput. Sci. 1985, 25, 82-85.
(11) Nilakantan, R.; Bauman, N.; Dixon, J. S.; Venkataraghavan, R. Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors. J. Chem. Inf. Comput. Sci. 1987, 27, 82-85.
(12) Bemis, G. W.; Kuntz, I. D. A fast and efficient method for 2D and 3D molecular shape description. J. Comput.-Aided Mol. Des. 1992, 6, 607-628.
(13) Comprehensive Medicinal Chemistry (CMC-3D) Release 94.1 is available from MDL Information Systems Inc., San Leandro, CA.
(14) Comprehensive Medicinal Chemistry, Vol. 6; Hansch, C., Sammes, P. G., Taylor, J. B., Series Eds.; Pergamon: Oxford, 1990.
(15) A similar process of removing compounds from the CMC has been carried out as part of an analysis of the molecular weights of known drugs: Kim, E. E.; Baker, C. T.; Dwyer, M. D.; Murcko, M. A.; Rao, B. G.; Tung, R. D.; Navia, M. A. Crystal Structure of HIV-1 Protease in Complex with VX-478, a Potent and Orally Bioavailable Inhibitor of the Enzyme. J. Am. Chem. Soc. 1995, 117, 1181-1182.
(16) For a good introduction to molecules as graphs, see: Hansen, P. J.; Jurs, P. C. Chemical Applications of Graph Theory. J. Chem. Ed. 1988, 65, 574-580.
(17) Cormen, T. H.; Leiserson, C. E.; Rivest, R. L. Introduction to Algorithms; MIT Press: Cambridge, 1990; pp 477-485.
(18) Kuntz, I. D.; Blaney, J. M.; Oatley, S. J.; Langridge, R.; Ferrin, T. E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 1982, 161, 269-288.
(19) Meng, E. C.; Shoichet, B. K.; Kuntz, I. D. Automated docking with grid-based energy evaluation. J. Comput. Chem. 1992, 13, 505-524.
(20) Roe, D. C.; Kuntz, I. D. BUILDER v.2: Improving the chemistry of a de novo design strategy. J. Comput.-Aided Mol. Des. 1995, 9, 269-282.
(21) Lewis, R. A.; Roe, D. C.; Huang, C.; Ferrin, T. E.; Langridge, R.; Kuntz, I. D. Automated site-directed drug design using molecular lattices. J. Mol. Graph. 1992, 10, 66-78.
(22) Lauri, G.; Bartlett, P. A. CAVEAT: a Program to Facilitate the Design of Organic Molecules. J. Comput.-Aided Mol. Des. 1994, 8, 51-66.
(23) Gillet, V. J.; Newell, W.; Mata, P.; Myatt, G.; Sike, S.; Zsoldos, Z.; Johnson, A. P. SPROUT: Recent Developments in the De Novo Design of Molecules. J. Chem. Inf. Comput. Sci. 1994, 34, 207-217.
(24) Nilakantan, R.; Bauman, N.; Venkataraghavan, R. A Method for Automatic Generation of Novel Chemical Structures and Its Potential Applications to Drug Discovery. J. Chem. Inf. Comput. Sci. 1991, 31, 527-530.
(25) Pearlman, D. A.; Murcko, M. A. CONCERTS: Dynamic connection of fragments as an approach to de novo ligand design. J. Med. Chem. 1996, 39, 1651-1663.
(26) Barakat, M. T.; Dean, P. M. The Atom Assignment Problem in Automated De Novo Drug Design. 2. A Method for Molecular Graph and Fragment Perception. J. Comput.-Aided Mol. Des. 1995, 9, 351-358.
(27) Willett, P. Algorithms for the Calculation of Similarity in Chemical Structure Databases. In Concepts and Applications of Molecular Similarity; Johnson, M. A., Maggiora, G. M., Eds.; John Wiley & Sons, Inc.: New York, 1990; pp 43-64.

JM9602928
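
As an illustration of the decomposition described under Methods (repeated removal of singly bonded atoms, followed by classification of the remaining framework atoms as ring or linker atoms), the procedure might be sketched in Python as below. This is a minimal sketch added for clarity, not the authors' implementation. The function names `decompose` and `_connected_without` and the toy molecule are hypothetical. Ring membership is tested with a simple depth-first reachability check per bond, which stands in for whatever depth-first ring search the authors used. Atoms left with no remaining neighbors are also removed, an assumption made so that acyclic molecules disappear completely.

```python
from collections import defaultdict


def _connected_without(src, dst, adj, atoms, skipped_bond):
    """Depth-first search: is dst reachable from src without using skipped_bond?"""
    stack, seen = [src], {src}
    while stack:
        atom = stack.pop()
        if atom == dst:
            return True
        for nbr in adj[atom] & atoms:
            if {atom, nbr} == skipped_bond or nbr in seen:
                continue
            seen.add(nbr)
            stack.append(nbr)
    return False


def decompose(n_atoms, bonds):
    """Label atoms 0..n_atoms-1 as side-chain, ring, or linker atoms."""
    adj = defaultdict(set)
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)

    # Step 1: repeatedly remove atoms bonded to at most one remaining atom.
    framework = set(range(n_atoms))
    while True:
        terminal = {a for a in framework if len(adj[a] & framework) <= 1}
        if not terminal:
            break
        framework -= terminal
    side_chain = set(range(n_atoms)) - framework

    # Step 2: a bond lies on a cycle if its two atoms stay connected when
    # the bond itself is ignored; atoms on such bonds are ring atoms.
    ring = set()
    for i, j in bonds:
        if i in framework and j in framework and _connected_without(
            i, j, adj, framework, {i, j}
        ):
            ring.update((i, j))

    # Step 3: framework atoms that are not in any ring are linker atoms.
    return {"side_chain": side_chain, "framework": framework,
            "ring": ring, "linker": framework - ring}


# Hypothetical toy graph: two six-rings joined by a two-atom linker, with a
# one-atom side chain on the first ring and a two-atom side chain on the second.
ring_a = [(k, (k + 1) % 6) for k in range(6)]
ring_b = [(6 + k, 6 + (k + 1) % 6) for k in range(6)]
linker = [(0, 12), (12, 13), (13, 6)]
side_chains = [(3, 14), (9, 15), (15, 16)]
parts = decompose(17, ring_a + ring_b + linker + side_chains)
print(sorted(parts["linker"]))      # [12, 13]
print(sorted(parts["side_chain"]))  # [14, 15, 16]
```

With this labeling, a biphenyl-like graph (two six-rings joined by a single bond) yields an empty linker set, matching the zero-atom linker described in the definitions, and an acyclic chain yields an empty framework.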
