<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>An Empirical Study on the Influence of
Genetic Operators for Molecular Docking Optimization. Inria Research Report</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0249-6399</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/s10723-010-9166-8</article-id>
      <title-group>
        <article-title>A Comparative Study of Different Optimization Algorithms for Mo- lecular Docking</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Afanasiev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Oferkin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikhail Posypkin</string-name>
          <email>mposypkin@mail.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anton Rubtsov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexey Sulimov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladimir Sulimov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dimonta, Ltd</institution>
          ,
          <addr-line>117186, Nagornaya Str. 15, build. 8, Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Systems Analysis RAS</institution>
          ,
          <addr-line>Address 117312, Moscow, pr. 60-letiya Oktyabrya, 9</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Research Computer Center at Lomonosov Moscow State University</institution>
          ,
          <addr-line>119992, Leninskie Gory 1, bldg 4, Moscow 11999 2, Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <volume>124</volume>
      <issue>1</issue>
      <fpage>8</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>Motivation: Computer modeling of protein-ligand interactions is one of the most important phases in a drug design process. The core part of this modeling is a resolution of a global unconstrained optimization problem. This paper presents a comparative computational experiments aimed at studying the efficiency of the different optimization methods applied to the docking problem. We present experimental results for different optimization algorithms and draw conclusions about their efficiency.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Many diseases are caused by foreign protein activity or protein
malfunction. For the treatment of these diseases we can try to
block these proteins by small organic molecules. These molecules
can selectively bind to proteins and thus block their work. This
simplified conception allows development of the drugs, using
purposeful design of new organic compounds – inhibitors for the
given target-protein. The selection of the effective ligands for
inhibition of the target enzyme is usually very laborious, long, and
expensive process. Contemporary molecular modeling tools can
accelerate this process and make it much less expensive. Virtual
screening by means of ligands docking is widely recognized as a
helpful approach in modern drug design
        <xref ref-type="bibr" rid="ref1 ref2">(Kitchen 2004, Zoete
2009)</xref>
        .
      </p>
      <p>The goal of docking is to find the positions of interacting ligand
and a protein with a minimal energy. The stability of this position
is characterized by the energy value. The ligands with minimal
values are candidates for further consideration as potential drugs.
Thus the docking problem is reduced to the global optimization
problem
→
, ∈</p>
      <p>(1)
where is a tuple that determines the ligand position and
bounding box restricting the search area to a reasonable region.
is a
*To whom correspondence should be addressed.</p>
      <p>Having fast and robust optimization algorithms for solving the
problem (1) is crucial for an efficient docking. In this paper we
evaluated different optimization algorithms for resolution of the
problem (1). We performed numerical experiments, analyzed the
results and suggested the most successful combination of
optimization algorithms for this problem.
2</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        The comparative efficiency of different optimization algorithms
has been studied in various papers
        <xref ref-type="bibr" rid="ref3 ref4">(Rosin 1997, Morris 1998,
Tavares 2008, Tavares 2009)</xref>
        . Authors consider global optimization
approaches (genetic algorithms, simulated annealing), local search
techniques (L-BFGS, Solis-Wets method) and their combination.
      </p>
      <p>The best results were obtained by a combined approach when a
genetic algorithm is combined with a local search. In this approach
a fraction of all individuals in a generation are further optimized by
applying a local search technique.</p>
      <p>Two local search techniques addressed in the literature showed the
best results. The first method proposed in (Solis 1981) is a direct
search method with an adaptive step size, which performs a
randomized local minimization of a given candidate solution.
Depending on whether a new solution if found or not, a success or a failure
is recorded. If several successes occur in a row, the step is
increased to move more quickly. If the opposite occurs, the step is
decreased. A bias term is applied to drive the search in successful
directions. The method terminates when a certain lower-bound
threshold for is passed or when a maximum number of steps is
reached. The second method is the
Broyden-Fletcher-GoldfarbShanno method (Nocedal 2006). L-BFGS is a quasi-Newton
method, where both the function to minimize and its gradient must be
supplied by the user. The method stops as soon as it finds a local
optimum or when a threshold number of iterations is exceeded.</p>
      <p>The direct evaluation of the protein-ligand interaction energy is
computationally expensive. Therefore in modern docking tools the
grid of potentials representing protein-ligand interactions is
calculated separately before the docking procedure and the energy is
approximated using precalculated values. The resulting function is
Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
not differentiable in the grid nodes and thus L-BFGS method is not
applicable.</p>
      <p>Bond lengths and valence angles have been frozen in the
course of the docking procedure.</p>
      <p>However of the energy is calculated directly local search methods
employing gradient information can be advantageous. It is shown
in (Tavares 2009) that L-BFGS method (Nocedal 2006)
significantly improves the performance of genetic algorithms and gives
better results w.r.t. Solis-Wets method.</p>
      <p>Though the efficiency of various search methods was
addressed in the literature some important methods were
missed. For instance semi-local methods e.g. Monotonic
Basin Hopping that proved to be very efficient for atomic
cluster conformation problem (Leary 2000) were not
considered at all. In this paper we classify search methods into
three groups: local, semi-local and global and performs the
systematic evaluation of the several techniques and their
combinations.
3</p>
    </sec>
    <sec id="sec-3">
      <title>CALCULATING PROTEIN-LIGAND</title>
    </sec>
    <sec id="sec-4">
      <title>INTERACTION ENERGY</title>
      <p>The protein-ligand interaction energy is the objective function in
the problem (1). Its accurate evaluation is crucial for a success of
the whole interaction simulation and thus for the validity of the
numerical experiments.</p>
      <p>
        Two very popular programs implementing docking algorithms
GOLD (Cole 2005) and AutoDock
        <xref ref-type="bibr" rid="ref4">(Morris 1998, Morris 2005)</xref>
        employ too simplified force field, either neglecting electrostatic
interaction in GOLD, or too simplified treating of desolvation
terms in AutoDock.
      </p>
      <p>For our study we considered the interaction model called SOL
proposed in (Romanov 2004, Romanov 2008, Oferkin 2011). The
main idea of this model is to describe with maximal possible
accuracy the protein-ligand interactions, using the docking procedure
based on contemporary molecular mechanics. The main distinctive
features of SOL are:
• A rigid target-protein with the active site represented by
a set of grids for different type potentials, describing
protein-ligand interactions (electrostatic, Van der Waals
(VdW) forces) in the frame of Merck Molecular Force
Field (MMFF) (Halgren 1996).
• Quite rigorous description of solvation-desolvation
effects upon ligand binding process, based on Generalized
Born approximation (Ghosh 1998) and included in the
set of potential grids.
• The grid of potentials representing protein-ligand
interactions are calculated separately before the docking
procedure. Electrostatic, VdW and solvation-desolvation
potentials were calculated on the 101x101x101 grid inside
this cube
• All ligands are considered as fully flexible, i.e. all
topologically available torsion degrees of freedom were
unfrozen and allowed to rotate freely, directed only by
ligand internal energy preferences in the frame of MMFF.
4</p>
    </sec>
    <sec id="sec-5">
      <title>OPTIMIZATION METHODS UNDER TEST</title>
      <p>We considered three types of optimization methods: local,
semilocal and global.
4.1</p>
      <sec id="sec-5-1">
        <title>Local methods</title>
        <p>The goal of local methods is to find the local minimum i.e. the
point that gives the minimal function value in a some
neighborhood. We considered four methods described in table 1.</p>
        <p>The classical Newton method requires fewer iterations than
conjugate gradients but each iteration involves the resolution of the
system of linear algebraic equations. The truncated Newton methods
are based on the idea that an exact solution of the Newton equation
at every step is unnecessary and can be computationally wasteful
in the framework of a basic descent method. Any descent direction
will suffice when the objective function is not well approximated
by a convex quadratic and, as a solution to the minimization
problem is approached, more effort in solution of the Newton equation
may be warranted.</p>
        <p>Another very successful approach to local unconstrained
optimization is Broyden–Fletcher–Goldfarb–Shanno (BFGS) method. This
algorithm belongs to the family of quasi-Newton methods. In these
methods the Hessian matrix of second derivatives need not be
evaluated directly. Instead, the Hessian matrix is approximated using
rank-one updates specified by gradient evaluations (or approximate
gradient evaluations). The L-BFGS algorithm studied in this paper
stores only a few vectors that represent the approximation
implicitly. Due to its moderate memory requirement, L-BFGS method is
particularly well suited for optimization problems with a large
number of variables.</p>
        <p>The three methods outlined above require the evaluation of the
objective function gradient. However for docking problem the
gradient is either not known symbolically or computationally
intractable. Thus one has to approximate the gradient numerically or
use methods that don’t rely on gradient information. For the
Powell’s method the objective function need not be differentiable, and
no derivatives are taken. The method minimizes the function by a
bi-directional search along each search vector. The new position
can then be expressed as a linear combination of the search
vectors. The new displacement vector becomes a new search vector,
and is added to the end of the search vector list. Meanwhile the
search vector which contributed most to the new direction, i.e. the
one which was most successful, is deleted from the search vector
list. The algorithm iterates an arbitrary number of times until no
significant improvement is made. The method is useful for
calculating the local minimum of a continuous but complex function,
especially one without an underlying mathematical definition,
because it is not necessary to take derivatives.
4.2</p>
      </sec>
      <sec id="sec-5-2">
        <title>Semi-local methods</title>
        <p>For functions with multiple extremes a local search methods get
stuck in local minima. The semi-local methods can “escape” from
a local minimum by exploring its neighborhood. Such methods
have proved their efficiency for molecular conformation problems
(Wales 1997). We considered two such methods summarized in the
Table 2.
The monotonic sequence basin hopping method tries to improve
the minimum until the number of attempts exceeds the threshold
value . Starting from some point ∈ MBH performs the
following steps ( = 0 at the beginning):
1. Select randomly a point ∈ .
2. Obtain point by applying local minimization to
∈ : = .
3. If ≤ then assign ≔ , ≔ 0. Otherwise
assign ≔ + 1..</p>
        <p>4. If ≥ . then finish, otherwise go to 1.
where = ∈ : | − | ≤ !, = 1, … , # is a box
neighborhood of the given radius ! .</p>
        <p>Thus the MBH method is parameterized by the threshold value ,
the neighborhood radius ! and the local search method .
The best probe method is based on the same idea as MBH. But
unlike MBH it chooses points on the n-dimensional sphere rather
than in a box and uses adaptive neighborhood size. Starting with a
large radius it reduces its size if attempts didn’t lead to an
improvement.
4.3</p>
      </sec>
      <sec id="sec-5-3">
        <title>Global methods</title>
        <p>The goal of global methods is to diversify the search in order to
explore the whole feasible set. At the moment we considered only
one globalization strategy: the Monte-Carlo method. It generated a
sequence of uniformly distributed random generated points in a
feasible box. After that each point is used a starting point for
semilocal methods.
5</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTS</title>
      <p>We performed several experiments for different protein-ligand
pairs. Methods demonstrate the same relative behavior for all
variants and thus we present data only for one pair: the target protein
is thrombin (PDB code 1O2G) and the ligand is 4-aminopyridine
in the protonated form.
5.1</p>
      <sec id="sec-6-1">
        <title>Testing results for local methods</title>
        <p>The results for local methods are summarized in the table 3. The
average energy is calculated as a mean value of 64 runs with
randomly generated initial points. The best energy is a lowest value
found throughout these runs. The percentage of errors shows the
number of runs that produced a value greater than the initial value.
As expected the local methods generally improve the energy value.
The Powell method remarkably outperforms methods that rely on
Taylor formula. This is an expectable result as the objective
function obtained as a result of piece-wise linear approximation on a
mesh is non-differentiable in minima. Therefore the Taylor series
gives a poor approximation for a goal function in a neighborhood
of such points. Thus the gradient information can only be used to
bias the search to the descending direction but not as a stopping
criterion.
5.2</p>
      </sec>
      <sec id="sec-6-2">
        <title>Testing results for semi-local methods</title>
        <p>Results of the previous section clearly indicate that the Powell
method is the best local search strategy among the considered set.
Thus we used this method as a local method in MBH and BP
semi-local methods. After a set of experiments we found that the
best results are obtained if MBH uses the radius ! = 0.1 and the
BP uses the initial sphere radius ! = 0.5 . The accuracy of results
for both methods depends on the threshold values: the higher
threshold value the lower the minimum. To put both methods in the
equal conditions we set the threshold value to 30 and 90 for BP
and MBH respectively. With such parameters both methods take
approximately the same time.</p>
        <p>The Table 4 compares the results of the basin hopping and best
probe methods. The results were averaged over 10 runs with
different random initial points generated in a box .
Both methods under test gives approximately the same best results,
however the average behavior of MBH is much better.
5.3</p>
      </sec>
      <sec id="sec-6-3">
        <title>Testing results for global methods</title>
        <p>At the moment we tried only one globalization strategy: Monte
Carlo method that generates random initial points for MBH
methods. Points are generated in the bounding box . Table 5
summarizes results obtained from 10 runs of this combination. The
threshold value for MBH method was set to 90 and the initial radius
was set to 0.1. The Powell method was used for a local search. The
number of initial points generated by Monte-Carlo methods was
set to 64.
The obtained results demonstrate that even a very simple
globalization strategy gives a noticeable improvement over plain
semilocal and local methods. But the running time is considerably
higher.
6</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>SCIENCE GATEWAY INTEGRATION</title>
      <p>The ultimate goal of our work is to create software environment
for molecular simulation. Such environments are useful to isolate
the end-user from technical details of the application.</p>
      <p>The use of high-performance computing in docking is inevitable
because in practice millions of ligands have to be processed
independently. Docking is perfectly suitable for the desktop grid
computing (Kiss, Greenwel 2010, Kiss 2010). We are going to develop
this application and deploy it at ISA RAS desktop grid
(boinc.isa.ru/dcsdg) and on a combined infrastructure that connects
ISA RAS desktop grid and service grid infrastructure provided by
EDGeS VO (www.edges.eu). Such combined infrastructures get
much attention in the grid community in recent years (Urbach
2009). This approach provides the transparent seamless integration
of desktop and service grids and results in huge consolidated
computing power.</p>
      <p>The deployment will done on top of the BNB-Grid (Evtushenko
2009) programming environment – a library for solving large-scale
optimization problems in the grid and supercomputers. This tool is
currently ported to the desktop grid and is validated for running in
the EDGeS VO. Docking will be one of the supported applications.
Another one – atomic cluster conformation problem (Leary 2000)
is already running in production. With such approach we’ll reuse
the one deployed application for different problems. This is very
important for desktop grids and combined infrastructures where the
deployment and validation requires lots of efforts.</p>
      <p>The docking is only one element of a complex workflow in a drug
design process. The good software environment is crucial for
making the docking software tools useful for a wide community of
researchers. One of such environments is described in (Kim 2008).
It flexibly integrates the convenient Web-based user interface and
the powerful processing back-end deployed in the Grid. Such
software architecture seems to be the standard approach for docking
and other applications demanding for the huge computing power.
To implement a consistent and flexible software environment for
drug design one needs a powerful workflow engine. This is crucial
for combining different building blocks in Drug Design in a
flexible and configurable way. We plan to use P-PGRADE portal
(Farkas 2011) which is capable to construct complex workflows and
harness different types of grids and clouds.
7</p>
    </sec>
    <sec id="sec-8">
      <title>CONCLUSIONS</title>
      <p>The docking problem is basically a global optimization problem
(1) where objective f x is a protein-ligand interaction energy.
Possessing the powerful techniques for resolving this problem is
crucial for the efficiency of docking. In this paper we considered
several optimization methods for problem (1). Numerical
experiments showed that the best results can be achieved by combining
some globalization techniques (e.g. Monte-Carlo method) and the
monotonic basin hopping semi-local method coupled with the
Powell local search algorithm.</p>
      <p>We also outlined the future software environment which will
provide a consistent and convenient access for a wide range of
researchers to the drug-design experimetns.</p>
    </sec>
    <sec id="sec-9">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This study was supported in part by grant 10-07-00595 from the
Russian Foundation for Basic Research and the DEGISCO FP/7
European (contract no. 261561).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Kitchen</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Decornez</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Furr</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ; Bajorath,
          <string-name>
            <surname>J.</surname>
          </string-name>
          (
          <year>2004</year>
          )
          <article-title>Docking and scoring in virtual screening for drug discovery: methods and applications</article-title>
          .
          <source>Nat. Rev. Drug Discov., 3</source>
          ,
          <fpage>935</fpage>
          -
          <lpage>949</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Zoete</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grosdidier</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Michielin</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          (
          <year>2009</year>
          )
          <article-title>Docking, virtual high throughput screening and in silico fragment-based drug design</article-title>
          .
          <source>J. Cell. Mol. Med</source>
          .,
          <volume>13</volume>
          ,
          <fpage>238</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Rosin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Halliday</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hart</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Belew,
          <string-name>
            <surname>R.</surname>
          </string-name>
          (
          <year>1997</year>
          )
          <article-title>A comparison of global and local search methods in drug docking</article-title>
          .
          <source>Proc 7th Intl Conf on Genetic Algorithms</source>
          . San Francisco, CA.,
          <volume>221</volume>
          -
          <fpage>228</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goodsell</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ; Halliday,
          <string-name>
            <surname>R.S.</surname>
          </string-name>
          ; Huey,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ; Hart,
          <string-name>
            <given-names>W.E.</given-names>
            ;
            <surname>Belew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.K.</given-names>
            ;
            <surname>Olson</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.J.</surname>
          </string-name>
          (
          <year>1998</year>
          )
          <article-title>Automated docking using a Lamarckian genetic algorithm and empirical binding free energy function</article-title>
          .
          <source>J. Comp. Chem</source>
          .,
          <volume>19</volume>
          ,
          <fpage>1639</fpage>
          -
          <lpage>1662</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>