QSAR (Quantitative Structure-Activity Relationship)
Quantitative structure-activity relationships, collectively referred to as QSARs, are theoretical models that can be
used to predict the physicochemical and biological properties of molecules. A structure-activity relationship (SAR)
is a (qualitative) association between a chemical substructure and the
potential of a chemical containing the substructure to exhibit a certain
biological effect. A quantitative structure-activity relationship (QSAR) is a
mathematical model that relates a quantitative measure of chemical structure
(e.g. a physicochemical property) to a physical property or to a biological
effect (e.g. a toxicological endpoint).
This approach attempts to identify and
quantify the physicochemical properties of a drug and to see whether any of
these properties has an effect on the drug's biological activity. If such a relationship holds true, an
equation can be drawn up which quantifies the relationship and allows the
medicinal chemist to say with some confidence that the property has an important
role in the distribution or mechanism of action of the drug. By quantifying
physicochemical properties, it should be possible to calculate in advance what
the biological activity of a novel analogue might be.
There are many practical purposes of a QSAR, and these techniques are widely used
in many situations. The purposes of in silico studies therefore include the following:
To predict biological activity and physicochemical properties by rational means.
To comprehend and rationalize the mechanisms of action within a series of chemicals.
Underlying these aims, the reasons for wishing to develop these models include:
Savings in the cost of product development (e.g. in the pharmaceutical, pesticide,
personal products, etc. areas).
Predictions could reduce the requirement for lengthy and expensive animal tests.
Reduction (and even, in some cases, replacement) of animal tests, thus reducing animal
use and the associated pain and discomfort to animals.
Other areas of promoting green and greener chemistry to increase efficiency and
eliminate waste by not following leads unlikely to be successful.
Graphs and equations
A range of compounds is synthesized in
order to vary one physicochemical property (e.g. log P) and to test how this affects
the biological activity (log 1/C). A graph is then drawn with the biological
activity on the y-axis and the physicochemical property on the x-axis. It is
necessary to draw the best possible line through the data points on the graph.
This is done by a procedure known as 'linear regression analysis by the least
squares method'. The best line will be the one closest to the data points. To
measure how close the line is to the data points, vertical lines are drawn from each
point to the line. These vertical distances are measured and then squared in order to eliminate
negative values. The squares are then added up to give a total. The best line
through the points will be the line for which this total is a minimum.
The equation of the straight line will be
y = k1x + k2, where k1 and k2 are constants. By varying k1 and k2, different
equations are obtained until the best line is found. This whole process can
be done quickly by a computer program. How well the equation fits the data is
given by a term known as the regression (or correlation) coefficient (r). This coefficient can
also be calculated by computer. For a perfect fit r² = 1. Good fits generally
have r² values of 0.95 or above.
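The least squares procedure described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `least_squares_line` and the data points are hypothetical, and the closed-form formulas are the standard ones for a straight-line fit, used here in place of trial-and-error variation of k1 and k2.

```python
from math import fsum


def least_squares_line(xs, ys):
    """Fit y = k1*x + k2 by minimising the sum of squared vertical distances.

    Returns (k1, k2, r2), where r2 is the squared correlation coefficient.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y values must each vary")
    k1 = sxy / sxx               # slope
    k2 = mean_y - k1 * mean_x    # intercept
    # Total of the squared vertical distances from each point to the line.
    ss_res = fsum((y - (k1 * x + k2)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - ss_res / syy
    return k1, k2, r2


# Hypothetical data: log P on the x-axis, log (1/C) on the y-axis.
log_p_values = [0.5, 1.0, 1.5, 2.0, 2.5]
log_inv_c_values = [1.1, 1.6, 2.0, 2.6, 3.0]

k1, k2, r2 = least_squares_line(log_p_values, log_inv_c_values)
print(f"log(1/C) = {k1:.3f} log P + {k2:.3f}, r2 = {r2:.4f}")
```

The closed-form solution gives the same line that would be reached by varying k1 and k2 until the total of the squared distances is smallest.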
There are many physical, structural and
chemical properties which have been studied by the QSAR approach, but the most
commonly studied are hydrophobic, electronic and steric. This is because it is
possible to quantify these effects relatively easily.
The hydrophobic character of a drug is
crucial to how easily it crosses the cell membranes and may also be important
in receptor interactions. Changing substituents on a drug may well have
significant effects on its hydrophobic character and hence its biological
activity. Therefore it is important to have a means of predicting this quantitatively.
The partition coefficient (P)
The hydrophobic character of a drug
can be measured experimentally by testing the drug’s relative distribution in
an octanol/water mixture. Hydrophobic molecules will prefer to dissolve in the
octanol layer of this two-phase system, whereas hydrophilic molecules will
prefer the aqueous layer. The relative distribution is known as the partition
coefficient and is obtained from the following equation:
P = (concentration of drug in octanol) / (concentration of drug in aqueous solution)
Hydrophobic compounds will have a high P value, whereas hydrophilic compounds will have a
low P value.
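As a small worked example of the partition coefficient, the sketch below computes P and log P from a pair of measured concentrations. The function names and the concentration values are hypothetical; both concentrations are assumed to be in the same units.

```python
from math import log10


def partition_coefficient(conc_octanol, conc_water):
    """P = concentration in octanol / concentration in the aqueous layer."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return conc_octanol / conc_water


def log_p(conc_octanol, conc_water):
    """log P, the form in which hydrophobicity is usually tabulated."""
    return log10(partition_coefficient(conc_octanol, conc_water))


# Hypothetical measurements (same concentration units in both layers).
hydrophobic_log_p = log_p(50.0, 0.5)   # drug mostly in the octanol layer
hydrophilic_log_p = log_p(0.5, 5.0)    # drug mostly in the aqueous layer

print(f"hydrophobic compound: log P = {hydrophobic_log_p:.2f}")
print(f"hydrophilic compound: log P = {hydrophilic_log_p:.2f}")
```

A positive log P corresponds to a compound that prefers octanol, and a negative log P to one that prefers water.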
If a graph is drawn by plotting log (1/C) versus log P and a straight line is
obtained, this shows that there is a relationship between hydrophobicity and biological
activity. Such a line would have the following equation:
log (1/C) = k1 log P + k2
The electronic effect of various
substituents will clearly have an effect on a drug’s ionization or polarity.
This in turn may have an effect on how easily a drug can pass through cell
membranes or how strongly it can bind to a receptor.
Hammett substituent constant (σ)
The Hammett substituent constant (σ) (1940) is a measure of the electron-withdrawing or electron-donating effect exerted by a
substituent on the reaction center. It is defined from the ionization of substituted benzoic acids as
σx = log (Kx/KH), where Kx and KH are the dissociation constants of the substituted and unsubstituted acid.
Electron-withdrawing groups (e.g. Cl, CN, CF3) stabilize the carboxylate ion, giving a larger Kx and positive σ values.
Electron-donating groups (e.g. alkyl) shift the equilibrium to the left (favouring the protonated acid), giving a
lower Kx and negative σ values.
The Hammett constant takes into account both resonance and inductive effects; thus, the
value depends on whether the substituent is para or meta.
The ortho position is not measured because of steric effects. In some
positions only inductive effects operate, and in others both resonance and
inductive effects play a part; some substituents can take part in resonance forms that stabilize the negatively charged carboxylate. Electronic substituent constants are also
available for aliphatic groups.
Hammett constants can be related to
the free energy of ionization via the van 't Hoff relationship, because σ is defined from the equilibrium
constant, K. For this reason the Hammett
relationship is also referred to as a linear free energy relationship
(LFER).
Uses: There is only one known example where Hammett constants alone effectively
predict activity (insecticidal diethyl phenyl phosphates; these compounds do not
have to pass into or through a cell membrane to have activity):
log (1/C) = 2.282 σ − 0.348
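To make the electronic term concrete, the sketch below derives a Hammett constant from two pKa values and feeds it into the diethyl phenyl phosphate equation quoted above. It assumes the standard definition σ = log (Kx/KH), which is the same as pKa(benzoic acid) minus pKa(substituted benzoic acid). The function names are hypothetical and the pKa of the substituted acid is an invented illustrative value, not a measurement.

```python
def hammett_sigma(pka_unsubstituted, pka_substituted):
    """sigma = log10(Kx / KH) = pKa(unsubstituted) - pKa(substituted)."""
    return pka_unsubstituted - pka_substituted


def predicted_log_inv_c(sigma, slope=2.282, intercept=-0.348):
    """log(1/C) = 2.282*sigma - 0.348 (coefficients from the equation in the text)."""
    return slope * sigma + intercept


# Benzoic acid pKa is about 4.20; the substituted value is hypothetical.
sigma = hammett_sigma(4.20, 3.70)
activity = predicted_log_inv_c(sigma)

print(f"sigma = {sigma:.2f}")
print(f"predicted log(1/C) = {activity:.3f}")
```

A stronger acid (lower pKa) than benzoic acid gives a positive σ, consistent with an electron-withdrawing substituent, and the positive slope then predicts higher activity.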
Steric effects are much harder to quantify. Examples of steric parameters are:
Taft's steric factor (Es) (~1956), an experimental value based on rate constants.
Molar refractivity (MR), a measure of the volume occupied by an atom or group; its equation includes the molecular weight, density, and the index of refraction.
Verloop steric parameter, calculated by a computer program that uses bond angles, van der Waals radii, and bond lengths.
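It is proposed that drug action can be divided into two stages: 1)
transport and 2) binding. The Hansch equation combines the hydrophobic, electronic and steric contributions:
log (1/C) = k1 log P − k2 (log P)² + k3 σ + k4 Es + k5
Hansch analysis looks
at the size and sign of each component of the equation.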
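The sketch below evaluates a Hansch-type equation for one analogue and locates the optimum log P. It assumes the parabolic form log (1/C) = k1 log P − k2 (log P)² + k3 σ + k4 Es + k5; the function names and every coefficient and descriptor value are hypothetical, chosen only to show how the size and sign of each term contribute.

```python
def hansch_log_inv_c(log_p, sigma, es, k1, k2, k3, k4, k5):
    """Assumed form: log(1/C) = k1*logP - k2*logP**2 + k3*sigma + k4*Es + k5."""
    return k1 * log_p - k2 * log_p ** 2 + k3 * sigma + k4 * es + k5


def optimum_log_p(k1, k2):
    """log P at which the parabolic hydrophobic term peaks.

    Setting d/d(logP) of (k1*logP - k2*logP**2) to zero gives k1 / (2*k2).
    """
    if k2 <= 0:
        raise ValueError("k2 must be positive for the parabola to have a maximum")
    return k1 / (2 * k2)


# Hypothetical coefficients, as if obtained from a regression on a training set.
coefficients = dict(k1=1.0, k2=0.25, k3=0.5, k4=0.1, k5=2.0)

best_log_p = optimum_log_p(coefficients["k1"], coefficients["k2"])
activity = hansch_log_inv_c(log_p=2.0, sigma=0.2, es=-1.0, **coefficients)

print(f"optimum log P = {best_log_p:.2f}")
print(f"predicted log(1/C) = {activity:.2f}")
```

With a positive k3, electron-withdrawing substituents (positive σ) raise the predicted activity, and the negative squared term means that activity falls off on either side of the optimum log P.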
Values of r much lower than 0.9 indicate that the equation is not reliable. Accuracy depends on using enough analogues, on the accuracy of the data, and on the choice of parameters.
Craig plots
A Craig plot is a plot of one parameter against another, for example π vs. σ. It is used to decide quickly which analogues to synthesize if the Hansch equation is known.
Applications of Hansch analysis
1) Classification
2) Diagnosis of mechanism of drug action
3) Prediction of activity (congeneric series)
4) Lead compound optimization
Advantages of Hansch analysis
A) Descriptors (π, σ, Es, etc.) derived from small organic molecules may be applied to biological systems.
B) Predictions are quantitative and may be evaluated statistically.
C) Quick and easy.
D) Potential extrapolation: conclusions reached may be extended to chemical substituents not included in the original analysis.
Disadvantages of Hansch analysis
A) Descriptors are required for the substituents being studied.
B) A large number of compounds is required (a training set for which physicochemical parameters and biological activity are available).
C) Limitations associated with using small-molecule descriptors, such as steric factors, on biological systems (i.e. descriptors from physical chemistry).
D) Partial protonation of drugs under physiological conditions (this can be included in the mathematical model).
E) Predictions are limited to a structural class (congeneric series).
F) Extrapolations beyond the values of the descriptors used in the study are limited.
G) Correlation between physical descriptors. For example, hydrophobicity will have some correlation with size and, thus, with the Taft steric term.
APPLICATIONS OF QSAR
The ability to predict a biological activity is valuable in any number of industries. Whilst some QSARs appear to be little more than academic studies, there are a large number of applications of these models within industry, academia and governmental (regulatory) agencies.
A small number of potential uses are listed below:
The rational identification of new leads with pharmacological, biocidal or pesticidal activity.
The optimization of pharmacological, biocidal or pesticidal activity.
The rational design of numerous other products such as surface-active agents, perfumes, dyes, and fine chemicals.
The identification of hazardous compounds at early stages of product development or the screening of inventories of existing compounds.
The designing out of toxicity and side-effects in new compounds.
The prediction of toxicity to humans through deliberate or occasional exposure.
The prediction of a variety of physicochemical properties of molecules (whether they be pharmaceuticals, pesticides, personal products, fine chemicals, etc.).
The prediction of the fate of molecules which are released into the environment.
The rationalization and prediction of the combined effects of molecules, whether in mixtures or formulations.
The key feature of the role of in silico technologies in all of these areas is that predictions can be made from molecular structure alone.
CONCLUSIONS
QSAR is a broadly used tool for relating the effects (e.g. activities and properties of interest) of a series of molecules to their structural properties. It is used in many areas of science. It is a dynamic area that integrates new technologies at a staggering rate. (4, 5)
Prediction of Activity Spectra for Substances (PASS)
The computer program PASS (Prediction of Activity Spectra for Substances) predicts the biological activity spectrum of a compound based on its structural formula. This approach is based on a robust analysis of structure-activity relationships in a heterogeneous training set that includes many thousands of compounds from different chemical series.
In contrast to molecular modelling methods and QSAR methods, the computer program PASS is able to predict many kinds of biological activity for compounds from different chemical series on the basis of just their 2D structural formulae, and very rapidly.
Some advantages of PASS
Possibility of application at early stages of research.
Reasonable accuracy of prediction.
Predictions are rather fast.
Standard structure format is used.
Possibility of creating an exclusive knowledge base.
Possibility of free testing.
Conclusion
The computer system PASS has already found new leads with antiulcer, antitumor and antiamnestic activity, and has discovered new mechanisms of action for some compounds with known effects.
Interpreting the results of prediction
PASS Inet predicts the biological activity spectrum (783 pharmacological effects, mechanisms of action, and specific toxicities) on the basis of the structural formula of the compound. In the "Prediction Results" you obtain the total number of chemical descriptors of your compound, and the number of descriptors that are new compared with the descriptors in the 30,900 compounds of the PASS training set. Compounds are considered equivalent in PASS if they have the same molecular formula and the same set of MNA (Multilevel Neighbourhoods of Atoms) descriptors. Since the MNA descriptors do not represent the stereochemical features of a molecule, compounds that differ only in stereochemistry are formally considered equivalent. Equivalent structures are excluded from the training set during the PASS Inet prediction.
The result of prediction is presented as a list of activities with the corresponding Pa and Pi, sorted in descending order of the difference (Pa − Pi) > 0. Pa and Pi are the estimates of the probability for the compound to be
active and inactive, respectively, for each type of activity in the biological
activity spectrum. Their values vary from 0.000 to 1.000. It is reasonable to expect that
only those types of activity for which Pa > Pi may be revealed by the compound,
and so only these are put into the biological activity spectrum.
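The selection and ordering rule just described (keep activities with Pa > Pi, sort by Pa − Pi in descending order) can be sketched as follows. This is not PASS code: the function name, the tuple layout and the activity names and probabilities are all hypothetical.

```python
def activity_spectrum(predictions):
    """Keep activities with Pa > Pi and sort them by (Pa - Pi), largest first.

    `predictions` is a list of (activity_name, pa, pi) tuples.
    """
    kept = [(name, pa, pi) for name, pa, pi in predictions if pa > pi]
    return sorted(kept, key=lambda row: row[1] - row[2], reverse=True)


# Hypothetical prediction output for one compound.
raw_predictions = [
    ("activity A", 0.82, 0.01),
    ("activity B", 0.30, 0.45),   # Pa < Pi, so it is left out of the spectrum
    ("activity C", 0.55, 0.10),
    ("activity D", 0.61, 0.05),
]

spectrum = activity_spectrum(raw_predictions)
for name, pa, pi in spectrum:
    print(f"{name}: Pa = {pa:.3f}, Pi = {pi:.3f}")
```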
If Pa > 0.7, the compound
is very likely to reveal this activity in experiments, but in this case the
chance that the compound is an analogue of known pharmaceutical agents
is also high.
If 0.5 < Pa < 0.7, the compound is likely to reveal this activity in experiments, but the probability is lower, and the compound is less similar to known pharmaceutical agents.
If Pa < 0.5, the compound is unlikely to reveal this activity in experiments, but if the presence of this activity is confirmed experimentally, the compound might be a new chemical entity.
Thus, when planning experiments and choosing the activities for which the compound is to be tested, one should balance the novelty of the pharmacological action against the risk of obtaining a negative result in experimental testing. Certainly, one will also take into account any particular interest in some kinds of activity, the experimental facilities available, etc. (6)
AUTODOCK
AutoDock is molecular modelling simulation software. It is especially effective for protein-ligand docking. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. AutoDock 4 consists of two main programs: autodock performs the docking of the ligand to a set of grids describing the target protein; autogrid precalculates these grids.
AutoDock 4 introduced three major improvements:
1. The docking results are more accurate and reliable.
2. It can optionally model flexibility in the target macromolecule.
3. It enables AutoDock's use in evaluating protein-protein interactions.
AutoDock 4.0 is not only faster than earlier versions; it also allows side chains in the macromolecule to be flexible. As before, rigid docking is blindingly fast, and high-quality flexible docking can be done in around a minute. Up to 40,000 rigid dockings can be done in a day on one CPU.
The initial applications of AutoDock were in the analysis of binding modes and catalytic properties of protein and nucleic acid complexes, and a typical study would include results from several dozen docking simulations.
More recently, however, enhancements in the performance of AutoDock, combined with the availability of high-speed computers and clusters of computers, have allowed much larger experiments, where entire compound libraries are screened against pharmaceutically relevant targets.
AutoDock 4.2 relies on a number of approximations to predict the conformation and free energy of binding during a docking simulation. The ligand is treated as flexible, but unlike traditional molecular mechanics methods, only torsional degrees of freedom are explored, holding bond angles and bond lengths constant. This allows very rapid transformations of coordinates during the search, but may cause problems if the complex requires significant distortion of the ligand upon binding. In addition, the simple tree-like structure of the data representation used for the ligand does not allow direct modelling of flexibility in rings, although several methods to reclose ring structures during a docking experiment are currently available in AutoDock.
During the docking simulation, a grid-based method is used for energy evaluation: interaction energies are precalculated around the target structure and then used as a look-up table to allow rapid evaluation of the ligand-protein interaction. However, the use of this grid-based method requires that the target molecule be treated as rigid, unless specific side chains are treated explicitly outside the grid.
Several search methods are available in AutoDock, including genetic algorithms, simulated annealing, and local search. All of these methods are stochastic, so repeated docking simulations are often used to validate the exhaustiveness of the search and the solution. AutoDock will consistently dock "drug-like" molecules with up to about 10 degrees of torsional freedom.
For virtual screening, the predicted free energy of binding must also be accurate enough to allow ranking of compounds, ensuring that compounds that are predicted to bind most strongly actually do bind when tested experimentally. Most computational docking techniques, including AutoDock, predict free energies with a standard deviation of about 2–3 kcal/mol. This is not sufficient, unfortunately, to provide confident ranking. Rather, we typically refer to the process of "enrichment," where the set of compounds that are predicted to bind tightly is enriched in compounds that actually show strong binding upon testing.
AutoDock has been widely used, and there are many examples of its successful application in the literature; it has been reported to be the most cited docking software. It is very fast, provides high-quality predictions of ligand conformations, and gives good correlations between predicted and experimental inhibition constants. AutoDock has also been shown to be useful in blind docking, where the location of the binding site is not known. (7, 8)
APPLICATIONS
X-ray crystallography
Structure-based drug design
Lead optimization
Virtual screening (HTS)
Combinatorial library design
Protein-protein docking
Chemical mechanism studies
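As a rough illustration of why an uncertainty of 2–3 kcal/mol in the predicted free energy of binding limits confident ranking, the sketch below converts a binding free energy into an inhibition constant using the general thermodynamic relation ΔG = RT ln Ki. This is a sketch under stated assumptions, not AutoDock's own code: the function names, the temperature of 298.15 K and the example ΔG value are hypothetical.

```python
from math import exp

# Gas constant in kcal/(mol*K).
GAS_CONSTANT_KCAL = 1.987204e-3


def inhibition_constant(delta_g_kcal_per_mol, temperature_k=298.15):
    """Ki in mol/L from a binding free energy, via delta_G = R*T*ln(Ki)."""
    return exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


def fold_change_in_ki(error_kcal_per_mol, temperature_k=298.15):
    """Multiplicative change in Ki caused by a given error in delta_G."""
    return exp(error_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


# Hypothetical predicted binding free energy for one ligand.
ki = inhibition_constant(-8.0)
fold_2 = fold_change_in_ki(2.0)
fold_3 = fold_change_in_ki(3.0)

print(f"Ki for -8.0 kcal/mol: {ki:.2e} M")
print(f"a 2 kcal/mol error changes Ki by a factor of about {fold_2:.0f}")
print(f"a 3 kcal/mol error changes Ki by a factor of about {fold_3:.0f}")
```

Under these assumptions, an error of a few kcal/mol shifts the implied inhibition constant by one to two orders of magnitude, which is why the text speaks of enrichment rather than exact ranking.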