<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting virus mutations through relational learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elisa Cilia</string-name>
          <email>ecilia@ulb.ac.be</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Teso</string-name>
          <email>teso@disi.unitn.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Ammendola</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tom Lenaerts</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author" corresp="yes">
          <string-name>Andrea Passerini</string-name>
          <email>passerini@disi.unitn.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AI-lab, Vakgroep Computerwetenschappen, Vrije Universiteit Brussel</institution>
          ,
          <addr-line>Pleinlaan 2, 1050 - Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ambiotec sas, R&amp;D, Via Appia Nord 47, 00142 Cisterna di Latina (LT)</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Computer Science and Information Engineering, University of Trento</institution>
          ,
          <addr-line>via Sommarive 14, I-38123 (Povo) Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>MLG, Département d'Informatique, Université Libre de Bruxelles</institution>
          ,
          <addr-line>Boulevard du Triomphe - CP 212, 1050 - Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Background: Viruses are typically characterized by high mutation rates, which allow them to quickly develop drug-resistant mutations. Mining relevant rules from mutation data can be extremely useful to understand the virus adaptation mechanism and to design drugs that effectively counter potentially resistant mutants. Results: We propose a simple relational learning approach for mutant prediction where the input consists of mutation data with drug-resistance information, either as sets of mutations conferring resistance to a certain drug, or as sets of mutants with information on their susceptibility to the drug. The algorithm learns a set of relational rules characterizing drug resistance and uses them to generate a set of potentially resistant mutants. Conclusions: Promising results were obtained in generating resistant mutations for both nucleoside and non-nucleoside HIV reverse transcriptase inhibitors. The approach can be generalized quite easily to learning mutants characterized by more complex rules correlating multiple mutations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Background</title>
      <p>
        HIV is a pandemic cause of lethal pathologies in
more than 33 million people. Its horizontal
transmission through mucosae is difficult to control and treat
because the virus is highly virulent and infects
several types of immune surveillance cells, such as
those carrying the CD4 receptor (CD4+ cells).
The major problem in treating the infection is drug
selectivity: the virus penetrates the cell, where it
releases its genetic material and replicates by exploiting
the cell's own machinery. The replicating apparatus in
the cell is therefore a drug target, and HIV antiviral
molecules must reach several cell types, such as
macrophages and T lymphocytes, to interfere with viral
replication. HIV releases single-stranded RNA, a
reverse transcriptase and an integrase into the cell
cytoplasm. The RNA is quickly reverse-transcribed into a
double-stranded DNA molecule, which is integrated into the
host genome. The integration events induce a
cellular response, which begins with the transcription
of the Tat gene by RNA polymerase II. Tat is a
well-known protein responsible for HIV activation,
since it recruits host cytoplasmic proteins
involved in the expression of viral genes.
Remarkably, HIV can establish a life-long latent infection
by suppressing its transcription, thus making
most antiviral drugs aimed at controlling viral
replication ineffective. Replicating viruses, moreover,
adopt several drug resistance strategies; for
instance, HIV acquires amino acid mutations
that reduce the efficacy of pharmaceutical compounds.
The present work aims at gaining knowledge
on mutations that may occur in the viral reverse
transcriptase (RT) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. RT is an important target for
developing antiretroviral medicines, and different types of
molecules have been found active against it: the Nucleoside
Reverse Transcriptase Inhibitors (NRTI) and the
Non-Nucleoside RT Inhibitors (NNRTI). Although RT inhibitors are
active, HIV frequently and quickly changes the
RT-encoding sequence, thus acquiring drug resistance.
Since antiviral therapy is based on cocktails of molecules
that include new RT inhibitors, a computational approach
to predict possible mutation sites and their
sensitivity to drugs is an important tool in drug discovery
for antiretroviral therapy.
      </p>
      <p>
        Computational methods can assist here by
exploring the space of potential virus mutants,
providing potential avenues for anticipatory drugs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To
achieve such a goal, one first needs to understand
what kind of mutants may lead to resistance.
      </p>
      <p>
        A general engineering technique for building
artificial mutants is referred to as rational design [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
The technique consists of modifying existing
proteins by site-directed mutagenesis. It relies on
deep domain knowledge to identify
candidate mutations that may affect protein structure or
function. The process typically involves extensive
trial-and-error experiments and also aims at
improving the understanding of the mechanisms underlying
a protein's behavior.
      </p>
      <p>
        In this work we report on our initial attempt to
develop an artificial system mimicking the rational
design process. An Inductive Logic Programming
(ILP) learner [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is trained to extract general rules
describing mutations relevant to a certain
behavior, i.e. resistance towards certain inhibitors. The
learned rules are then used to infer novel mutations
which may induce similar resistance.
      </p>
      <p>
        We consider here two learning settings. In the
first one, we rely on a training set made of single
amino acid mutations known to confer resistance to
a certain class of inhibitors (we will refer to this
as mutation-based learning; preliminary
promising results in this setting were described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). In
the second, we learn the rules directly from mutants
(comprising 1 to n amino acid mutations) that
have been experimentally tested for their resistance
to the same classes of inhibitors (we will refer to this
as mutant-based learning). This second setting is
actually the most common situation, in which one is
presented with a number of mutants together with
some evidence of their susceptibility to certain
treatments. The goal is then to extract rules and
potential mutations explaining why certain mutants
confer resistance, and which other mutations could produce
the same effect.
      </p>
      <p>
        Many machine learning methods have been
applied in the past to mutation data for predicting the
effect of single amino acid mutations on protein
stability [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], on
protein function [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] or on drug susceptibility [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. To
the best of our knowledge, this is the first attempt
to learn relational features of mutations affecting a
protein's behavior and to use them for generating novel
relevant mutations. Furthermore, even though we focus
on single amino acid mutations in our experimental
evaluation, our approach can be straightforwardly
extended to multiple point mutations, and we are
actively working in this direction. Note that other
approaches first generate all potential
mutations and then decide which of them lead to
resistance, whereas we produce the relevant ones directly.
      </p>
      <p>
        We report an experimental evaluation focused on
HIV RT. RT is a well-studied protein: a large
number of mutants have been shown to resist one or
more drugs, and databases exist that collect these
data from different sources and make them available
for further analyses [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        We tested the ability of our approach to
generate drug-resistant amino acid mutations for NRTI
and NNRTI. Our results show statistically
significant improvements for both drug classes over the
baseline results obtained through a random
generator. Mutant-based results confirm our previous
findings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], extending them to the more natural setting
in which only information on mutant behavior is
observed. Additional background knowledge allows us to
produce more compact and simpler hypotheses here,
which have the advantage of generating a reduced
number of candidate mutations.
      </p>
      <p>The approach can be in general applied in
mutation studies aimed at understanding protein
function. By searching for residues most likely to have
a functional role in an active site, the approach can
for instance be used in the engineering of enzyme
mutants with an improved activity for a certain
substrate.</p>
      <sec id="sec-1-1">
        <title>Datasets</title>
        <p>
          We applied our approach to predict HIV RT
mutations conferring resistance to two classes of
inhibitors: Nucleoside RT Inhibitors (NRTI) and
Non-Nucleoside RT Inhibitors (NNRTI). The two classes
of inhibitors differ in the targeted sites and rely on
quite different mechanisms [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ]. NNRTI inhibit
the reverse transcriptase by binding to a pocket near
the enzyme active site, thereby directly interfering with the
enzyme function. NRTI are instead incorporated into
the newly synthesized viral DNA, preventing its
elongation.
        </p>
        <p>
          We collected datasets for both mutation-based
and mutant-based learning. The former (Dataset 1)
is a dataset of amino acid mutations derived from
the Los Alamos National Laboratory (LANL) HIV
resistance database<sup>1</sup> by Richter et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], who used
it to mine relational rules among mutations. It
consists of 95 amino acid mutations labeled as resistant
to NRTI and 56 labeled as resistant to NNRTI, over
a set of 581 observed mutations. For the
mutant-based setting, we collected (Dataset 2) HIV RT
mutation data from the Stanford University HIV
Drug Resistance Database. The database provides a dataset
of selected mutants of HIV RT with results of
susceptibility studies to various drugs, and was
previously employed [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for predicting the drug resistance of
novel (given) mutants<sup>2</sup>. It is composed of 838
different mutants annotated with susceptibility levels
(low, medium and high) to drugs belonging to the
NRTI (639 mutants) and NNRTI (747 mutants) drug
classes. We considered a setting aimed at identifying
amino acid mutations conferring high susceptibility
(with respect to medium or low), and considered a
mutant as highly susceptible to a drug class if it was
annotated as being highly susceptible to at least one
drug from that class.
        </p>
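        <p>The class-level labelling rule just described (a mutant is positive for a drug class if it is highly susceptible to at least one drug of that class) can be sketched in Python as follows. The drug names, the drug-to-class map and the per-drug levels are hypothetical placeholders, not records from the Stanford database; treating a mutant never tested against a class as unlabelled (None) is our own assumption.</p>
        <preformat>
```python
from typing import Dict, Optional

# Hypothetical drug-to-class map; placeholder names, not real drugs.
DRUG_CLASS = {"drug_a": "NNRTI", "drug_b": "NNRTI", "drug_c": "NRTI"}

# Hypothetical per-drug susceptibility levels for three mutants.
annotations: Dict[str, Dict[str, str]] = {
    "mutant_1": {"drug_a": "high", "drug_b": "medium"},
    "mutant_2": {"drug_a": "low", "drug_c": "high"},
    "mutant_3": {"drug_b": "medium"},
}


def class_label(levels: Dict[str, str], drug_class: str) -> Optional[bool]:
    """True if 'high' for at least one drug of the class, False if tested
    only with lower levels, None if never tested against the class."""
    tested = [lvl for drug, lvl in levels.items() if DRUG_CLASS[drug] == drug_class]
    if not tested:
        return None  # no annotation for this drug class
    return any(lvl == "high" for lvl in tested)


nnrti = {m: class_label(lv, "NNRTI") for m, lv in annotations.items()}
nrti = {m: class_label(lv, "NRTI") for m, lv in annotations.items()}
print(nnrti)
print(nrti)
```
        </preformat>
        <p>In this sketch, mutants labelled None for a class would simply be left out of that class's learning task.</p>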
      </sec>
      <sec id="sec-1-2">
        <title>Learning in first order logic</title>
        <p>Our aim is to learn a first-order logic hypothesis
for a target concept, i.e. a mutation conferring
resistance to a certain drug, and use it to infer novel
mutations consistent with that hypothesis. We rely
on definite clauses, which are the basis of the
Prolog programming language. A definite clause is an
expression of the form h ← b<sub>1</sub> ∧ ... ∧ b<sub>n</sub>,
where h and the b<sub>i</sub> are atomic literals. Atomic
literals are expressions of the form p(t<sub>1</sub>, ..., t<sub>n</sub>),
where p/n is a predicate symbol of arity n and the
t<sub>i</sub> are terms, in our experiments either constants
(denoted by lower case) or variables (denoted by upper case).
The atomic literal h is also called the
head of the clause, typically the target predicate,
and b<sub>1</sub> ∧ ... ∧ b<sub>n</sub> its body. Intuitively, a
clause states that the head h holds whenever
the body b<sub>1</sub> ∧ ... ∧ b<sub>n</sub> holds. For instance,
a simple hypothesis like res_against(A,nnrti) ←
mutation(A,C) ∧ close_to_site(C) would
indicate that a mutation C in the proximity of a binding
site confers on mutant A resistance against NNRTI.
Learning in this setting consists of searching for a
set of definite clauses H = {c<sub>1</sub>, ..., c<sub>m</sub>} covering all
or most positive examples, and none or few
negative ones if available. First-order clauses can thus
be interpreted as relational features that
characterize the target concept. The main advantage of these
logic-based approaches with respect to other
machine learning techniques is the expressivity and
interpretability of the learned models: models can be
readily interpreted by human experts and provide
direct explanations for the predictions.</p>
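        <p>To make clause coverage concrete, the Python sketch below evaluates the example clause res_against(A,nnrti) ← mutation(A,C) ∧ close_to_site(C) over a few ground facts. It is a toy stand-in for Prolog resolution, not Aleph: all mutant identifiers and facts are hypothetical, and the body is checked by enumerating the candidate bindings of C.</p>
        <preformat>
```python
# Ground facts mutation(Mutant, Mutation); all identifiers are hypothetical.
MUTATION_FACTS = {
    ("m1", "y181c"),
    ("m2", "k103n"),
    ("m2", "v179i"),
    ("m3", "t200a"),
}
# Ground facts close_to_site(Mutation), hypothetical for this toy example.
CLOSE_TO_SITE_FACTS = {"y181c"}


def covers(mutant: str) -> bool:
    """True if res_against(mutant, nnrti) follows from the single clause
    res_against(A, nnrti) :- mutation(A, C), close_to_site(C)."""
    # The body is satisfied if some binding of C makes both literals true.
    return any(a == mutant and c in CLOSE_TO_SITE_FACTS for a, c in MUTATION_FACTS)


positives = ["m1", "m2"]  # mutants labelled resistant (hypothetical)
negatives = ["m3"]        # mutants labelled non-resistant (hypothetical)
covered_pos = [m for m in positives if covers(m)]
covered_neg = [m for m in negatives if covers(m)]
print("covered positives:", covered_pos)
print("covered negatives:", covered_neg)
```
        </preformat>
        <p>Here the clause covers m1 but neither m2 nor the negative example, so a learner would keep adding clauses until the remaining positive example is covered as well.</p>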
      </sec>
      <sec id="sec-1-3">
        <title>Background knowledge</title>
        <p>We built a relational knowledge base for the problem
domain. Table 1 summarizes the predicates we
included as background knowledge. We represented
the amino acids of the wild type with their
positions in the primary sequence (aa/2) and the
specific mutations characterizing them (mut/4). Target
predicates were encoded as resistance of the
mutation or mutant to a certain drug (res_against/2).
Note that this encoding considers mutations at the
amino acid rather than nucleotide level, i.e. a
single amino acid mutation can involve up to three
nucleotide changes. While focusing on single nucleotide
changes would drastically expand the space of
possible mutations, including the cost (in terms of
nucleotide changes) of a certain amino acid mutation
could help refine our search procedure.</p>
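        <p>As a rough illustration of the nucleotide cost mentioned above, the following Python sketch computes the minimum number of nucleotide substitutions separating two amino acids under the standard genetic code. It assumes that any codon of the wild-type residue may be present, whereas in practice the cost depends on the actual codon in the viral sequence. For instance, a lysine-to-histidine change such as 219H (K219 in the wild type) needs at least two substitutions, consistent with the remark on 219H in the mutant-based results.</p>
        <preformat>
```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order;
# '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(aa: str) -> list[str]:
    """All codons encoding the one-letter amino acid `aa`."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def min_nt_changes(wt: str, mut: str) -> int:
    """Minimum Hamming distance between any codon of `wt` and any codon of `mut`."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(wt)
        for c2 in codons_for(mut)
    )


print(min_nt_changes("K", "N"))  # 1
print(min_nt_changes("K", "H"))  # 2
```
        </preformat>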
        <p>Additional background knowledge was included
in order to highlight characteristics of residues and
mutations:</p>
        <p>typeaa/2 indicates the type of the natural amino
acids according to the Venn diagram grouping
based on amino acid properties proposed
in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. For example, serine is a tiny and
polar amino acid.</p>
        <p>color/2 indicates the type of the natural amino
acids according to the coloring proposed
in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. For example, the magenta class includes
basic amino acids such as lysine and arginine, while
the blue class includes acidic amino acids such as
aspartic and glutamic acid. Unlike the types of the
previous case, these groups of amino acids do not overlap.</p>
        <p>same_type_aa/3 indicates whether two residues
belong to the same type T, i.e. a change from one
residue to the other conserves the type of the
amino acid.</p>
        <p>same_color_type/2 indicates whether two residues
belong to the same coloring group, i.e. a
change from one residue to the other conserves
the coloring group of the amino acid.</p>
        <p>same_type_mut_t/3 indicates that a residue
substitution at a certain position does not modify
the amino acid type T with respect to the wild
type. For example, mutation i123v conserves
the aliphatic amino acid type while mutation
i123d does not (i.e. different_type_mut_t/3
holds for it).</p>
        <p>same_color_type_mut/2 indicates that a residue
substitution at a certain position does not
modify the amino acid coloring group with
respect to the wild type. For example,
mutation d123e conserves the blue amino acid
group while mutation d123a does not (i.e.
different_color_type_mut/2 holds for it).</p>
        <p><sup>1</sup> <ext-link ext-link-type="uri" xlink:href="http://www.hiv.lanl.gov/content/sequence/RESDB/">http://www.hiv.lanl.gov/content/sequence/RESDB/</ext-link></p>
        <p><sup>2</sup> Downloadable at <ext-link ext-link-type="uri" xlink:href="http://hivdb.stanford.edu/cgi-bin/GenoPhenoDS.cgi">http://hivdb.stanford.edu/cgi-bin/GenoPhenoDS.cgi</ext-link></p>
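        <p>The type- and color-conservation predicates can be sketched in Python as set-membership tests. Only a partial, illustrative grouping is encoded: an aliphatic set and a polar set (the latter matching the polar substitutions listed for position 190 in the generative-phase example) stand in for the Venn-diagram types of [<xref ref-type="bibr" rid="ref14">14</xref>], and only the magenta and blue coloring groups named in the text are included, so other residues get no type or color. The function names mirror the Prolog predicates but are otherwise hypothetical.</p>
        <preformat>
```python
from typing import Optional, Set

# Partial, illustrative groupings only (not the full published schemes).
TYPES = {
    "aliphatic": set("ILV"),
    "polar": set("YWTSRQNKHEDC"),
}
COLORS = {"magenta": set("KR"), "blue": set("DE")}


def types_of(aa: str) -> Set[str]:
    """All (possibly overlapping) types that contain residue `aa`."""
    return {t for t, members in TYPES.items() if aa in members}


def color_of(aa: str) -> Optional[str]:
    """The single coloring group of `aa`, or None if not in this partial table."""
    for color, members in COLORS.items():
        if aa in members:
            return color
    return None


def same_type_mut(wt: str, mut: str) -> Set[str]:
    """Types T conserved by wt -> mut (non-empty iff same_type_mut_t/3 holds for some T)."""
    return types_of(wt).intersection(types_of(mut))


def same_color_type_mut(wt: str, mut: str) -> bool:
    """True if wt -> mut stays inside one coloring group (same_color_type_mut/2)."""
    color = color_of(wt)
    return color is not None and color == color_of(mut)


print(same_type_mut("I", "V"))        # i123v: {'aliphatic'}
print(same_type_mut("I", "D"))        # i123d: set()
print(same_color_type_mut("D", "E"))  # d123e: True
print(same_color_type_mut("D", "A"))  # d123a: False
```
        </preformat>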
        <p>Other background knowledge facts and rules
were added in order to express structural relations
along the primary sequence and the catalytic propensity
of the involved residues:</p>
        <p>close_to_site/1 indicates whether a specific
position lies fewer than 5 positions away from a
residue belonging to a binding or active site.
In our specific case, the background theory
incorporates knowledge about a metal binding
site and a heterodimerization site.</p>
        <p>location/2 indicates in which fragment of the
primary sequence the amino acid is located.
Locations are numbered from 0 by dividing the
sequence into fragments of 10 amino acids.</p>
        <p>catalytic_propensity/2 indicates whether an
amino acid has a high, medium or low
catalytic propensity according to [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].</p>
        <p>mutated_residue_cp/5 indicates how, in a mutated
position, the catalytic propensity has changed
(e.g. from low to high).</p>
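        <p>The two structural predicates can be sketched in Python as follows. The site residues are those highlighted in the Figure 2 description (metal binding site D110, D185, D186; heterodimerization site W401, W414). Reading "fewer than 5 positions" as a strict inequality, and numbering 1-based positions so that fragment 0 covers residues 1 to 10, are our assumptions.</p>
        <preformat>
```python
# Site residues named in the Figure 2 description.
METAL_BINDING_SITE = [110, 185, 186]
HETERODIMERIZATION_SITE = [401, 414]
SITE_POSITIONS = METAL_BINDING_SITE + HETERODIMERIZATION_SITE


def close_to_site(pos: int, max_dist: int = 5) -> bool:
    """close_to_site/1: `pos` lies fewer than `max_dist` positions from a site residue."""
    return any(max_dist > abs(pos - site) for site in SITE_POSITIONS)


def location(pos: int, width: int = 10) -> int:
    """location/2: 0-based index of the `width`-residue fragment holding 1-based `pos`.

    Assumption: fragment 0 covers positions 1-10, fragment 1 covers 11-20, and so on.
    """
    return (pos - 1) // width


for pos in (103, 181, 191):
    print(pos, close_to_site(pos), location(pos))
```
        </preformat>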
      </sec>
      <sec id="sec-1-4">
        <title>Algorithm overview</title>
        <p>The proposed approach is sketched in Figure 1.</p>
        <sec id="sec-1-4-1">
          <title>Step 1: Learning phase</title>
          <p>The first step is the learning phase. An ILP learner
is fed with a logical representation of the data D
and of the domain knowledge B to be incorporated,
and it returns a first-order logical hypothesis H for
the concept of mutation conferring resistance to a
certain class of inhibitors.</p>
          <p>In this context there are two suitable ways to
learn the target concept, depending on the type of
input data and their labeling:</p>
          <p>a) the one-class classification setting, learning a
model from positive instances only. This is the
approach we employ for Dataset 1: positive
examples are mutations for which experimental
proof of resistance to a drug is available, but no
safe claim can be made on non-annotated mutations.</p>
          <p>b) the binary classification setting, learning to
discriminate between positive and negative
instances. This setting is appropriate for
Dataset 2: in our experiments, positive examples are
mutants labelled as highly susceptible to the drug
class, and negative examples are those with medium
or low susceptibility.</p>
          <p>
            The hypothesis is derived using the Aleph (A
Learning Engine for Proposing Hypotheses) ILP
system (<ext-link ext-link-type="uri" xlink:href="http://www.comlab.ox.ac.uk/activities/machinelearning/Aleph/aleph.html">http://www.comlab.ox.ac.uk/activities/machinelearning/Aleph/aleph.html</ext-link>),
which can learn in both settings. In the
one-class classification case, it incrementally builds a
hypothesis covering all positive examples, guided by a
Bayesian evaluation function described in [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ],
which scores candidate solutions according to an estimate of
the Bayes posterior probability and thereby trades
off hypothesis size and generality. In the binary
classification case, Aleph builds the hypothesis covering
all positive examples and none or few negative ones,
guided by an m-estimate evaluation function [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ].
          </p>
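          <p>For reference, the m-estimate in its standard textbook form smooths a clause's precision on the covered examples towards the prior proportion of positive examples, with m controlling the strength of the prior. The Python sketch below implements that formula; we have not checked it against Aleph's internal code, and the default m = 2 is an arbitrary illustrative choice.</p>
          <preformat>
```python
def m_estimate(p: int, n: int, P: int, N: int, m: float = 2.0) -> float:
    """Standard m-estimate of a clause's precision.

    p, n: positive and negative training examples covered by the clause
    P, N: total positive and negative training examples (prior = P / (P + N))
    m: weight of the prior (2.0 is an arbitrary illustrative default)
    """
    prior = P / (P + N)
    return (p + m * prior) / (p + n + m)


# A broad clause with two errors versus a narrow clause with no errors.
broad = m_estimate(p=30, n=2, P=100, N=200)
narrow = m_estimate(p=3, n=0, P=100, N=200)
print(round(broad, 3), round(narrow, 3))
```
          </preformat>
          <p>Although the narrow clause has a raw precision of 1.0, its m-estimate is lower than that of the broader clause, which is how the estimate penalises clauses supported by very few examples.</p>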
          <p>Aleph adds clauses to the hypothesis based on
their coverage of training examples. Given a learned
model, the first clauses are those covering most
training examples and thus usually the most
representative of the underlying concept.</p>
          <p>In Figure 2 we show a simple example of a learned
hypothesis covering a set of training mutants from
Dataset 2. The learned hypothesis models the
ability of a mutation to confer on the mutant resistance to
NNRTI and is composed of four first-order clauses,
each one covering different sets of mutations of the
wild type, as highlighted in colors: yellow for the first
clause, blue for the second, red for the third, and
green for the fourth. Some mutations are
covered by more than one clause, as shown by the color
overlaps. Bold letters indicate residues involved in
the RT metal binding site (D110, D185 and D186)
or the heterodimerization site (W401 and W414).</p>
        </sec>
        <sec id="sec-1-4-2">
          <title>Step 2: Generative phase</title>
          <p>The second step of our approach is the generative
phase, in which the learned hypothesis is employed
to find novel mutations that can confer drug
resistance to an RT mutant. A set of candidate
mutations can be generated by using the Prolog inference
engine starting from the rules in the learned model.
The rules are actually constraints on the
characteristics that a mutation of the wild type should have
in order to confer resistance to a certain inhibitor,
according to the learned hypothesis.</p>
          <p>Algorithm 1 details the mutation generation
procedure. For simplicity, we assume a model
H for a single drug class. The procedure works by
querying the Prolog inference engine for all
possible variable assignments that satisfy the hypothesis
clauses, each representing a mutation by its position
and the amino acid replacing the wild-type residue.
The set of mutations generated by the model is
ranked according to a scoring function S<sub>M</sub> before
being returned by the algorithm. We defined S<sub>M</sub> as
the number of clauses in H that a candidate
mutation m satisfies.</p>
          <p>
            Referring to the model in Figure 2, for instance,
the generated candidate mutations that satisfy the
first two clauses are all the mutations changing the
glycine in position 190 into a polar amino acid: 190Y,
190W, 190T, 190S, 190R, 190Q, 190N, 190K, 190H,
190E, 190D and 190C, where, for example, the notation
190Y indicates a change of the wild-type amino acid
located in position 190 into a tyrosine (Y). Among
these, 190S and 190E are already part of
the known NNRTI surveillance mutations (see [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]).
          </p>
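          <p>The generate-and-rank loop of Algorithm 1 can be sketched in Python under a simplifying assumption: each learned clause is reduced to a Boolean test on (position, wild-type residue, replacing residue), and candidates are enumerated explicitly instead of by querying the Prolog engine. The two clauses in the usage example are hypothetical stand-ins, not the clauses of Figure 2, picked only so that the mutations satisfying both reproduce the twelve polar substitutions at position 190 listed above.</p>
          <preformat>
```python
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POLAR = set("YWTSRQNKHEDC")  # the polar residues listed in the 190 example

Mutation = Tuple[int, str]                # (position, replacing amino acid)
Clause = Callable[[int, str, str], bool]  # (position, wild type, replacement) -> bool


def generate(wild_type: Dict[int, str], clauses: List[Clause]) -> List[Tuple[Mutation, int]]:
    """Return every single amino acid mutation satisfying at least one clause,
    ranked by S_M, the number of clauses it satisfies (highest first)."""
    scored = []
    for pos, wt in sorted(wild_type.items()):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # replacing a residue with itself is not a mutation
            score = sum(clause(pos, wt, aa) for clause in clauses)
            if score:
                scored.append(((pos, aa), score))
    # sort() is stable (also with reverse=True), so ties keep (position, residue) order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Two hypothetical clauses chosen so that the mutations satisfying both
# are exactly the polar substitutions at position 190.
toy_clauses: List[Clause] = [
    lambda pos, wt, aa: pos == 190,
    lambda pos, wt, aa: aa in POLAR,
]
# Glycine at 190 is stated in the text; position 300 with alanine is a placeholder.
ranked = generate({190: "G", 300: "A"}, toy_clauses)
top = [mut for mut, score in ranked if score == 2]
print(len(ranked), top)
```
          </preformat>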
        </sec>
      </sec>
      <sec id="sec-1-5">
        <title>Learning from mutations</title>
        <p>We first learn general rules characterizing known
resistance mutations (from Dataset 1) to be used for
predicting novel candidate ones.</p>
        <p>We divided the dataset of mutations into a
training and a test set (70/30) in a stratified way, i.e.
preserving in both the training and the test set
the proportion of examples belonging to each of the
two drug classes. The resulting training set is
composed of 106 mutations in total, while the test set
is composed of 45 mutations.</p>
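        <p>A stratified 70/30 split of this kind can be sketched in Python as below, with placeholder identifiers instead of actual mutations. Rounding each class's test share with round() is an assumption, since the rounding rule is not stated; with the Dataset 1 class sizes (95 NRTI and 56 NNRTI mutations) it gives 28 + 17 = 45 test and 106 training examples.</p>
        <preformat>
```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Tuple


def stratified_split(examples: List, label_of: Callable[[object], Hashable],
                     test_frac: float = 0.3, seed: int = 0) -> Tuple[List, List]:
    """Split `examples` into (train, test), preserving each class's proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ex in examples:
        by_class[label_of(ex)].append(ex)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)  # shuffle within each class
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Placeholder mutation identifiers with the Dataset 1 class sizes.
data = [("nrti", i) for i in range(95)] + [("nnrti", i) for i in range(56)]
train, test = stratified_split(data, label_of=lambda ex: ex[0], seed=42)
print(len(train), len(test))
```
        </preformat>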
        <p>We trained the ILP learner on the training set
and evaluated on the test set the mutations
generated by the learned model. The evaluation
procedure takes the set of generated mutations and
computes its enrichment in test mutations. We
compare the recall of the approach, i.e. the fraction of
test mutations generated by the model, with the
recall of a baseline algorithm that randomly chooses
a set (of the same cardinality) of possible mutations
among all legal ones.</p>
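        <p>The recall comparison can be sketched in Python as below. The 20-residue wild-type sequence, the generated set and the test set are hypothetical toy data; the baseline draws, without replacement, as many legal mutations as the model generated and averages recall over repeated draws. A paired significance test on per-split recall values could then be run, for example, with scipy.stats.wilcoxon, which is not shown here.</p>
        <preformat>
```python
import random
from typing import Iterable, List, Set, Tuple

Mutation = Tuple[int, str]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def recall(generated: Iterable[Mutation], test_set: Set[Mutation]) -> float:
    """Fraction of test mutations that also appear among the generated ones."""
    return len(test_set.intersection(generated)) / len(test_set)


def legal_mutations(wild_type: str) -> List[Mutation]:
    """All single amino acid substitutions of a sequence (1-based positions)."""
    return [(i + 1, aa) for i, wt in enumerate(wild_type) for aa in AMINO_ACIDS if aa != wt]


def random_baseline(legal: List[Mutation], size: int, test_set: Set[Mutation],
                    runs: int = 30, seed: int = 0) -> float:
    """Mean recall of `size` mutations sampled uniformly from `legal`, over `runs` draws."""
    rng = random.Random(seed)
    return sum(recall(rng.sample(legal, size), test_set) for _ in range(runs)) / runs


wild_type = "MKVLAGHTRDEYWPQSNCIF"  # hypothetical 20-residue sequence
legal = legal_mutations(wild_type)   # 20 * 19 = 380 candidates
test_set = {(3, "A"), (5, "K"), (7, "Y"), (12, "D"), (18, "W")}
generated = [(3, "A"), (5, "K"), (7, "Y"), (1, "L"), (2, "R"),
             (4, "I"), (6, "S"), (8, "S"), (9, "K"), (10, "E")]
print(recall(generated, test_set), random_baseline(legal, len(generated), test_set))
```
        </preformat>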
        <p>Results averaged over 30 random splits of the
dataset are reported in Table 2. On each split, and for
each of the two learning tasks (NNRTI, NRTI), we
performed 30 runs of our algorithm (Aleph has a
random component generating the seed for the
hypothesis search) and of the random generation
algorithm. In this setting, the average sizes of
the learned hypotheses for NNRTI and NRTI are 8
and 7 rules respectively. For each task we also report,
in columns 3 and 4 of Table 2, the mean number of
generated mutations over the 30 splits and the
number of test set mutations for reference. In both cases,
improvements are statistically significant according
to a paired Wilcoxon test (α = 0.01). Figure 3 gives
further details about these results by showing how
the mean recall on the test set varies when it is
computed on the generated mutations satisfying only
one clause, two clauses, three clauses and so on (x
axis). The mean recall trend is shown in orange for
the proposed generative approach and in green for
a random generator of mutations, for both classes of
inhibitors.</p>
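        <p>Per-threshold recall curves of this kind can be computed along the lines of the Python sketch below. We read "satisfying one clause, two clauses, ..." as "at least k clauses", which makes each curve non-increasing in k; an "exactly k" reading would only change the filter. The scores and test mutations are hypothetical.</p>
        <preformat>
```python
from typing import Dict, Set, Tuple

Mutation = Tuple[int, str]


def recall_by_min_score(scored: Dict[Mutation, int], test_set: Set[Mutation],
                        max_k: int) -> Dict[int, float]:
    """Recall restricted to generated mutations with S_M of at least k, for k = 1..max_k."""
    curve = {}
    for k in range(1, max_k + 1):
        kept = {mut for mut, score in scored.items() if score >= k}
        curve[k] = len(kept.intersection(test_set)) / len(test_set)
    return curve


# Hypothetical S_M scores of generated mutations and a hypothetical test set.
scored = {(10, "A"): 2, (10, "E"): 2, (12, "V"): 1, (15, "N"): 1, (20, "T"): 1}
test_set = {(10, "A"), (15, "N"), (30, "C")}
curve = recall_by_min_score(scored, test_set, max_k=3)
print(curve)
```
        </preformat>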
        <p>We finally learned a model on the whole dataset
in order to generate a single set of mutations for
further inspection. We report five examples of novel
mutations with the highest score for each one of the
tasks: 90I, 98I, 103I, 106P, 179I for NNRTI and 60A,
153M, 212L, 229F, 239I for NRTI.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref21">20</xref>
          ], the authors found a set of novel
mutations conferring resistance to efavirenz and
nevirapine, which are NNRTI. Our mutation generation
algorithm partially confirmed their findings. Mutation
90I was scored high (getting the maximum score of
5 satisfied clauses), mutation 101H was generated
with a score of 3 out of 5, mutations 196R and 138Q
with score 1 out of 5, while mutation 28K was not
generated at all by our system as a candidate for
conferring resistance to NNRTI.
        </p>
        <p>An example of a learned hypothesis is reported in
Figure 4. For instance, according to the model,
one of the features a mutation can have for
conferring resistance to NNRTI is the change of
a basic (magenta) residue of the wild type, e.g. lysine
or arginine, into a residue with completely different
physico-chemical characteristics (rule 16).</p>
        <p>Another example for resistance to NNRTI is
that a non-conserved mutation is present in positions
between 98 and 106 of the wild-type sequence (rule
8).</p>
      </sec>
      <sec id="sec-1-6">
        <title>Learning from mutants</title>
        <p>The next set of experiments is focused on learning
mutations from mutant data (Dataset 2). Learned
models are still limited to single amino acid
mutations, and so are novel mutants generated by the
system.</p>
        <p>We randomly split the mutants in Dataset 2
into training and test sets 30 times, avoiding
having mutants containing the same resistance mutation
(according to the labelling used in Dataset 1) in both
the training and the test set. For each of the 30 splits, we
evaluated the recall of the generated mutations on
the known resistance mutations (from Dataset 1),
after first removing all the mutations that were also
present in the training set. Comparison is again
made against a baseline algorithm generating random
legal mutations.</p>
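        <p>The split constraint above can be met in several ways, and the exact procedure is not detailed here. One possible approach, sketched below in Python, merges mutants that share a resistance mutation into groups with a union-find structure and assigns whole groups to one side; it also computes the evaluation targets, i.e. known resistance mutations not already present in training mutants. Mutant identifiers and their mutations are hypothetical.</p>
        <preformat>
```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def grouped_split(mutants: Dict[str, Set[str]], resistance: Set[str],
                  test_frac: float = 0.3, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split mutants so that no resistance mutation occurs in both train and test."""
    parent = {m: m for m in mutants}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner: Dict[str, str] = {}  # resistance mutation -> first mutant seen carrying it
    for m, muts in mutants.items():
        for r in muts.intersection(resistance):
            if r in owner:
                parent[find(m)] = find(owner[r])  # merge the two groups
            else:
                owner[r] = m

    groups = defaultdict(list)
    for m in mutants:
        groups[find(m)].append(m)
    group_list = list(groups.values())
    random.Random(seed).shuffle(group_list)

    target = test_frac * len(mutants)
    train: List[str] = []
    test: List[str] = []
    for g in group_list:
        # Fill the test set with whole groups until it reaches the target size.
        (test if target > len(test) else train).extend(g)
    return train, test


def evaluation_targets(train: List[str], mutants: Dict[str, Set[str]],
                       resistance: Set[str]) -> Set[str]:
    """Known resistance mutations not seen in any training mutant."""
    seen = set().union(*(mutants[m] for m in train))
    return resistance - seen


mutants = {
    "a": {"k103n", "v179d"},
    "b": {"k103n"},
    "c": {"y181c"},
    "d": {"t200a"},
    "e": {"y181c", "g190a"},
}
resistance = {"k103n", "y181c", "g190a"}
train, test = grouped_split(mutants, resistance, test_frac=0.4, seed=1)
print(sorted(train), sorted(test), evaluation_targets(train, mutants, resistance))
```
        </preformat>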
        <p>Results averaged over the 30 random splits are
reported in Table 3. In this setting, the average sizes
of the learned hypotheses for NNRTI and NRTI are
5 and 3 rules respectively. The small size of the
models substantially reduces the space of
possible mutations to be generated. Our approach shows
an average recall of 17% and 7% on the test
mutations, which is statistically significantly higher than
the recall obtained by randomly generating
mutations (paired Wilcoxon test, α = 0.01). Figure 5 gives
further details about these results by showing the
mean recall trend as the number of satisfied
clauses of the generated mutations varies.</p>
        <p>
          Figure 6 shows the hypothesis learned by the
system when trained on the whole set of mutants
in Dataset 2. The hypothesis for resistance to NNRTI
identifies many (10 out of 19) of the known resistance
surveillance mutations reported in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]: 103N, 106A, 181C, 181I,
181V, 188C, 188H, 190A, 190E and 190S. In
particular, mutation 181C has the highest predicted score,
as it satisfies four clauses of the learned model.
Interestingly, other, previously unreported mutations
are suggested with a high score (3 out of 4 satisfied
clauses) by the generative algorithm, for example
181N, 181D, 318C and 232C. These may be novel
mutations conferring resistance to NNRTI.
        </p>
        <p>29% of the mutations labeled as conferring
resistance to NNRTI in Dataset 1 are also identified.
Apart from the surveillance mutations, these also
include 98G, 227C, 190C, 190Q, 190T and 190V. Key
positions for resistance to NNRTI, like 181 and
190, can be easily identified from the model in
Figure 6, thanks to rules 15 and 23, which anchor the
mutation to the specific position without restricting
the change to a specific group of amino acids. A
quite uncommon NNRTI-selected mutation, 238N,
appears among the generated ones. Indeed, as
reported in the summaries of the Stanford HIV Drug
Resistance Database, this mutation, in combination with
103N, causes high-level resistance to nevirapine and
efavirenz.</p>
        <p>
          Also in the case of NRTI, the generative
algorithm suggests a few known surveillance mutations
reported in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]: 67E, 67G, 67N, 116Y, 184V and 184I
(6 out of 34). 18% of the mutations labeled as
conferring resistance to NRTI in Dataset 1 are also
identified. Apart from known surveillance mutations,
these also include 44D, 62V, 67A, 67S, 69R and 184T.
Key positions for resistance to NRTI as
predicted by our model in Figure 6 are 33, 67, 184,
194 and 218. Of these, the effects of mutations in
position 67 and especially 184 have been well studied,
while positions 33, 194 and 218 seem to be novel. We
also found that the predicted mutation 219H, which
appears neither among the known
surveillance mutations nor among the resistance mutations
in Dataset 1, is indeed reported in the Stanford HIV Drug
Resistance Database as a quite uncommon NRTI-selected
mutation. Note that this uncommon mutation requires
two nucleotide changes to emerge.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Discussion and Future Work</title>
      <p>
        The results shown in the previous section are a
promising starting point for generalizing our approach
to more complex settings. We showed that the
approach scales from a few hundred mutations as
learning examples to almost a thousand complete
mutants. Moreover, the learned hypotheses
significantly constrain the space of all possible single
amino acid mutations to be considered, paving the
way for extending the method to multi-site
mutant generation. This is a clear
advantage over existing alternative machine learning
approaches, which would first have to generate all
possible mutants in order to evaluate them.
For RT mutants with just two mutated amino
acids, this would already imply testing more than a
hundred million candidate mutants. At the same time,
this simple ILP approach cannot attain the
accuracy levels of a purely statistical approach. We
are currently investigating more sophisticated
statistical relational learning approaches to improve
the accuracy of the learned models and to
assign weights to the clauses, to be used for selecting
the most relevant generated mutations. Such models could
be used as a pre-filtering stage, producing
candidate mutants to be further analysed by complex
statistical approaches and by additional tools evaluating,
for instance, their stability. An additional direction
for refining our predictions consists of jointly learning
models of resistance to different drugs (e.g. NNRTI
and NRTI), possibly further refining the joint
models on a per-class basis. On a predictive (rather than
generative) task, this was shown [
        <xref ref-type="bibr" rid="ref22">21</xref>
        ] to provide
improvements over learning distinct per-drug models.
      </p>
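      <p>The size of the exhaustive search space can be made explicit with simple counting. For a protein of L residues with 19 alternative amino acids per site, there are 19L single mutants and C(L,2)·19² mutants with substitutions at exactly two distinct positions. The sketch below assumes L = 560, the length of the p66 subunit of HIV-1 RT. This may differ from the sequence actually used in the experiments. The function names are hypothetical.</p>
      <preformat>
```python
from math import comb

ALTERNATIVES = 19  # amino acids a residue can be substituted with


def single_mutants(length: int) -> int:
    """Mutants with exactly one substituted position."""
    return length * ALTERNATIVES


def double_mutants(length: int, ordered: bool = False) -> int:
    """Mutants with substitutions at exactly two distinct positions."""
    unordered = comb(length, 2) * ALTERNATIVES ** 2
    # A naive nested loop over (first, second) substitution counts each pair twice.
    return 2 * unordered if ordered else unordered


RT_LENGTH = 560  # assumption: full-length p66 subunit
print(f"{single_mutants(RT_LENGTH):,}")                # 10,640
print(f"{double_mutants(RT_LENGTH):,}")                # 56,503,720
print(f"{double_mutants(RT_LENGTH, ordered=True):,}")  # 113,007,440
```
      </preformat>
      <p>Under these assumptions there are tens of millions of distinct double mutants, or over a hundred million if the two substitutions are enumerated as an ordered pair. By comparison, there are roughly ten thousand single mutants.</p>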
      <p>The approach is not restricted to learning
drug-resistance mutations in viruses. More generally, it
can be applied to learn mutants having certain
properties of interest, e.g. improved or more specific
activity of an enzyme with respect to a substrate, in a
full protein engineering fashion.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>
        In this work we proposed a simple relational
learning approach applicable to evolution prediction and
protein engineering. The algorithm relies on a
training set of mutation data annotated with drug
resistance information, builds a relational model
characterizing resistant mutations, and uses it to generate
novel, potentially resistant ones. Encouraging
preliminary results on HIV RT data indicate a
statistically significant enrichment in resistance-conferring
mutations among those generated by the system, in
both the mutation-based and the mutant-based learning
settings. Albeit preliminary, our results suggest that
the proposed approach for learning mutations has the
potential to guide mutant engineering, as well as to
predict virus evolution in order to devise
appropriate countermeasures. More detailed
background knowledge, possibly including 3D
information whenever available, is needed to
further focus the set of generated mutations, possibly
together with post-processing stages in which mutants are
evaluated by statistical machine learning approaches [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
In the near future we also plan to generalize the
proposed approach to jointly generate sets of related
mutations, shifting the focus from the generation of
single amino acid mutations to mutants with
multiple mutations.
      </p>
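      <p>One standard way to quantify this kind of enrichment is a one-sided hypergeometric test. This is not necessarily the test used in this work. The test asks how likely it is that a random set of the same size as the generated one would contain at least as many known resistance mutations. In the sketch below the function names are hypothetical, and the counts in the usage lines are placeholders, not results reported here.</p>
      <preformat>
```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: candidate mutations; successes: known resistance mutations
    among them; draws: mutations generated; k: resistance mutations generated.
    """
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def fold_enrichment(k: int, population: int, successes: int, draws: int) -> float:
    """Hit rate among generated mutations divided by the background hit rate."""
    return (k * population) / (draws * successes)


# Hypothetical counts, for illustration only.
print(fold_enrichment(10, 1000, 50, 40))  # 5.0
print(hypergeom_sf(10, 1000, 50, 40))
```
      </preformat>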
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>EC and TL acknowledge the support of the F.W.O.
(Belgium) and the F.R.S.-F.N.R.S. (Belgium). ST and AP
acknowledge the support of the Italian Ministry of
University and Research under grant PRIN 2009LNP494
(Statistical Relational Learning: Algorithms and
Applications).</p>
    </sec>
    <sec id="sec-5">
      <title>Algorithms</title>
      <sec id="sec-5-1">
        <title>Algorithm 1 - Mutation generation algorithm.</title>
        <p>Algorithm for the discovery of novel relevant mutations.</p>
        <preformat>input:  background knowledge B, learned model H
output: rank R of the most relevant mutations
procedure GenerateMutations(B, H)
    D_M ← ∅
    A ← all assignments a that satisfy at least one clause c_i ∈ H
    for a ∈ A do
        m ← mutation corresponding to the assignment a ∈ A
        score ← S_M(m)                 ▷ number of clauses c_i satisfied by a
        D_M ← D_M ∪ {(m, score)}
    end for
    R ← RankMutations(D_M, B, H)       ▷ rank relevant mutations
    return R
end procedure</preformat>
        <p>[Figure: example hypothesis whose clauses are used to generate a rank of novel relevant mutations. Clauses: mut(A,B,C,D) AND position(C,190); mut(A,B,C,D) AND position(C,190) AND typeaa(polar,D); mut(A,y,C,D) AND typeaa(aliphatic,D); mut(A,B,C,a) AND position(C,106).]</p>
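        <p>Below is a minimal Python sketch of Algorithm 1 under strong simplifying assumptions:</p>
        <list list-type="bullet">
          <list-item><p>The four example clauses that accompany the algorithm are hand-translated into Python predicates.</p></list-item>
          <list-item><p>The amino acid types in TYPEAA are a hypothetical simplification.</p></list-item>
          <list-item><p>WILD_TYPE is a toy fragment standing in for the aa(Pos,AA) facts.</p></list-item>
          <list-item><p>RankMutations is replaced by a plain sort on the score S_M, the number of satisfied clauses. The original procedure also receives B and H and may rank differently.</p></list-item>
        </list>
        <p>Instead of querying a Prolog engine for the satisfying assignments, the sketch enumerates all single substitutions and keeps those covered by at least one clause. This is affordable for single mutants, but it is exactly the exhaustive enumeration that the relational formulation avoids for multi-site mutants. All names are illustrative.</p>
        <preformat>
```python
from typing import Callable, Dict, List, Set, Tuple

# (wild-type residue, position, new residue): the B, C, D arguments of
# mut(A,B,C,D); the identifier A is omitted in this sketch.
Mutation = Tuple[str, int, str]
Clause = Callable[[Mutation], bool]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical, simplified typeaa/2 facts; the paper's classification may differ.
TYPEAA: Dict[str, Set[str]] = {
    "aliphatic": set("AVLI"),
    "polar": set("STNQ"),
}


def typeaa(t: str, aa: str) -> bool:
    return aa in TYPEAA.get(t, set())


# Example clauses hand-translated into predicates over (wt, pos, new);
# the Prolog constants y and a become the one-letter codes Y and A.
HYPOTHESIS: List[Clause] = [
    lambda m: m[1] == 190,                                # mut(A,B,C,D) AND position(C,190)
    lambda m: m[1] == 190 and typeaa("polar", m[2]),      # ... AND typeaa(polar,D)
    lambda m: m[0] == "Y" and typeaa("aliphatic", m[2]),  # mut(A,y,C,D) AND typeaa(aliphatic,D)
    lambda m: m[2] == "A" and m[1] == 106,                # mut(A,B,C,a) AND position(C,106)
]

# Toy wild-type fragment standing in for the aa(Pos,AA) facts (hypothetical input).
WILD_TYPE: Dict[int, str] = {106: "V", 181: "Y", 188: "Y", 190: "G"}


def generate_mutations(wild_type: Dict[int, str],
                       hypothesis: List[Clause]) -> List[Tuple[Mutation, int]]:
    """Score each single substitution by the number of clauses it satisfies."""
    scored: List[Tuple[Mutation, int]] = []
    for pos, wt in sorted(wild_type.items()):
        for new in AMINO_ACIDS:
            if new == wt:
                continue  # identity, not a mutation
            m = (wt, pos, new)
            score = sum(1 for clause in hypothesis if clause(m))
            if score:  # keep mutations covered by at least one clause
                scored.append((m, score))
    # Stand-in for RankMutations: higher score first, then position, then residue.
    scored.sort(key=lambda item: (-item[1], item[0][1], item[0][2]))
    return scored


for mutation, score in generate_mutations(WILD_TYPE, HYPOTHESIS)[:5]:
    print(mutation, score)
```
        </preformat>
        <p>On this toy input, the four polar substitutions at position 190 satisfy two clauses each and are ranked first. They are followed by single-clause mutations such as ('V', 106, 'A').</p>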
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Tables</title>
      <table-wrap>
        <label>Table 1</label>
        <caption>
          <title>Background knowledge predicates</title>
          <p>Summary of the background knowledge facts and rules. MutID is a mutation or a mutant identifier, depending on the type of the learning problem.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Background knowledge predicate</th>
              <th>Description</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>aa(Pos,AA)</td><td>indicates a residue in the wild-type sequence</td></tr>
            <tr><td>mut(MutID,AA,Pos,AA1)</td><td>indicates a mutation: mutation or mutant identifier, position, and the amino acids involved before and after the substitution</td></tr>
            <tr><td>res_against(MutID,Drug)</td><td>indicates whether a mutation or mutant is resistant to a certain drug</td></tr>
            <tr><td>color(Color,AA)</td><td>indicates the coloring group of a natural amino acid</td></tr>
            <tr><td>typeaa(T,AA)</td><td>indicates the type (e.g. aliphatic, charged, aromatic, polar) of a natural amino acid</td></tr>
            <tr><td>same_color_type(R1,R2)</td><td>indicates whether two residues belong to the same coloring group</td></tr>
            <tr><td>same_typeaa(R1,R2,T)</td><td>indicates whether two residues are of the same type T</td></tr>
            <tr><td>same_color_type_mut(MutID,Pos)</td><td>indicates a mutation to a residue of the same coloring group</td></tr>
            <tr><td>different_color_type_mut(MutID,Pos)</td><td>indicates a mutation changing the coloring group of the residue</td></tr>
            <tr><td>same_type_mut_t(MutID,Pos,T)</td><td>indicates a mutation to a residue of the same type T</td></tr>
            <tr><td>different_type_mut_t(MutID,Pos)</td><td>indicates a mutation changing the type of the residue</td></tr>
            <tr><td>close_to_site(Pos)</td><td>indicates whether a specific position is close to a binding or active site, if any</td></tr>
            <tr><td>location(L,Pos)</td><td>indicates in which fragment of the primary sequence the amino acid is located</td></tr>
            <tr><td>catalytic_propensity(AA,CP)</td><td>indicates whether an amino acid has a high, medium or low catalytic propensity</td></tr>
            <tr><td>mutated_residue_cp(Rw,Pos,Rm,CPold,CPnew)</td><td>indicates how the catalytic propensity has changed at a mutated position (e.g. from low to high)</td></tr>
          </tbody>
        </table>
      </table-wrap>
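      <p>Table 1 describes the derived predicates only informally. The sketch below gives one hypothetical reading: it derives same_type_mut_t and different_type_mut_t from mut and typeaa facts. It assumes that "changing the type" means the old and new residue share no type. The TYPEAA classification and the three mut facts are illustrative placeholders, not the background knowledge used in this work.</p>
      <preformat>
```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified typeaa(T,AA) facts: amino acid -> set of types T.
TYPEAA: Dict[str, Set[str]] = {
    "A": {"aliphatic"},
    "V": {"aliphatic"},
    "L": {"aliphatic"},
    "Y": {"aromatic", "polar"},
    "K": {"charged"},
    "N": {"polar"},
}

# Illustrative mut(MutID,AA,Pos,AA1) facts: (identifier, old residue, position, new residue).
MUT: List[Tuple[str, str, int, str]] = [
    ("m1", "V", 106, "A"),
    ("m2", "Y", 188, "L"),
    ("m3", "K", 103, "N"),
]


def same_type_mut_t(facts) -> List[Tuple[str, int, str]]:
    """same_type_mut_t(MutID,Pos,T): the old and new residue share the type T."""
    return [
        (mid, pos, t)
        for mid, old, pos, new in facts
        for t in sorted(TYPEAA[old].intersection(TYPEAA[new]))
    ]


def different_type_mut_t(facts) -> List[Tuple[str, int]]:
    """different_type_mut_t(MutID,Pos): the old and new residue share no type."""
    return [
        (mid, pos)
        for mid, old, pos, new in facts
        if TYPEAA[old].isdisjoint(TYPEAA[new])
    ]


print(same_type_mut_t(MUT))       # [('m1', 106, 'aliphatic')]
print(different_type_mut_t(MUT))  # [('m2', 188), ('m3', 103)]
```
      </preformat>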
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cao</surname>
            <given-names>ZW</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            <given-names>LY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            <given-names>CJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            <given-names>ZL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>X</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>HH</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>YZ</given-names>
          </string-name>
          :
          <article-title>Computer prediction of drug resistance mutations in proteins</article-title>
          .
          <source>Drug Discovery Today: BIOSILICO</source>
          <year>2005</year>
          ,
          <volume>10</volume>
          (
          <issue>7</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rubingh</surname>
            <given-names>DN</given-names>
          </string-name>
          :
          <article-title>Protein engineering from a bioindustrial point of view</article-title>
          .
          <source>Current Opinion in Biotechnology</source>
          <year>1997</year>
          ,
          <volume>8</volume>
          (
          <issue>4</issue>
          ):
          <fpage>417</fpage>
          -
          <lpage>422</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Muggleton</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Raedt</surname>
            <given-names>L</given-names>
          </string-name>
          :
          <article-title>Inductive Logic Programming: Theory and Methods</article-title>
          .
          <source>University Computing</source>
          <year>1994</year>
          ,
          <fpage>629</fpage>
          -
          <lpage>682</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cilia</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passerini</surname>
            <given-names>A</given-names>
          </string-name>
          :
          <article-title>Frankenstein Junior: a relational learning approach toward protein engineering</article-title>
          .
          <source>In Proceedings of the Annotation, Interpretation and Management of Mutations (AIMM) Workshop@ECCB2010</source>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Capriotti</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fariselli</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casadio</surname>
            <given-names>R</given-names>
          </string-name>
          :
          <article-title>A three-state prediction of single point mutations on protein stability changes</article-title>
          .
          <source>BMC Bioinformatics</source>
          <year>2008</year>
          ,
          <volume>9</volume>
          (
          <issue>Suppl 2</issue>
          ):
          <fpage>S6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Needham</surname>
            <given-names>CJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bradford</surname>
            <given-names>JR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bulpitt</surname>
            <given-names>AJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Care</surname>
            <given-names>MA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Westhead</surname>
            <given-names>DR</given-names>
          </string-name>
          :
          <article-title>Predicting the effect of missense mutations on protein function: analysis with Bayesian networks</article-title>
          .
          <source>BMC Bioinformatics</source>
          <year>2006</year>
          ,
          <volume>7</volume>
          :
          <fpage>405</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bromberg</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rost</surname>
            <given-names>B</given-names>
          </string-name>
          :
          <article-title>SNAP: predict effect of nonsynonymous polymorphisms on function</article-title>
          .
          <source>Nucleic Acids Research</source>
          <year>2007</year>
          ,
          <volume>35</volume>
          (
          <issue>11</issue>
          ):
          <fpage>3823</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Rhee</surname>
            <given-names>SY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wadhera</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben-Hur</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brutlag</surname>
            <given-names>DL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shafer</surname>
            <given-names>RW</given-names>
          </string-name>
          :
          <article-title>Genotypic predictors of human immunodeficiency virus type 1 drug resistance</article-title>
          .
          <source>Proceedings of the National Academy of Sciences of the United States of America</source>
          <year>2006</year>
          ,
          <volume>103</volume>
          (
          <issue>46</issue>
          ):
          <fpage>17355</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>G</given-names>
            <surname>¨otte</surname>
          </string-name>
          <string-name>
            <given-names>M</given-names>
            ,
            <surname>Li</surname>
          </string-name>
          <string-name>
            <given-names>X</given-names>
            ,
            <surname>Wainberg</surname>
          </string-name>
          <string-name>
            <surname>M</surname>
          </string-name>
          :
          <article-title>HIV-1 reverse transcription: a brief overview focused on structurefunction relationships among molecules involved in initiation of the reaction</article-title>
          .
          <source>Archives of biochemistry and biophysics</source>
          <year>1999</year>
          ,
          <volume>365</volume>
          (
          <issue>2</issue>
          ):
          <fpage>199</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Shafer</surname>
            <given-names>R</given-names>
          </string-name>
          :
          <article-title>Rationale and Uses of a Public HIV Drug-Resistance Database</article-title>
          .
          <source>Journal of Infectious Diseases</source>
          <year>2006</year>
          ,
          <volume>194</volume>
          (Supplement 1):
          <fpage>S51</fpage>
          -
          <lpage>S58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>De Clercq</surname>
            <given-names>E</given-names>
          </string-name>
          :
          <article-title>HIV inhibitors targeted at the reverse transcriptase</article-title>
          .
          <source>AIDS research and human retroviruses</source>
          <year>1992</year>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <fpage>119</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Spence</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kati</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            <given-names>K</given-names>
          </string-name>
          :
          <article-title>Mechanism of inhibition of HIV-1 reverse transcriptase by nonnucleoside inhibitors</article-title>
          .
          <source>Science</source>
          <year>1995</year>
          ,
          <volume>267</volume>
          (
          <issue>5200</issue>
          ):
          <fpage>988</fpage>
          -
          <lpage>993</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Richter</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Augustin</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kramer</surname>
            <given-names>S</given-names>
          </string-name>
          :
          <article-title>Finding Relational Associations in HIV Resistance Mutation Data</article-title>
          .
          <source>In Proceedings of Inductive Logic Programming (ILP)</source>
          , Volume 9
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Betts</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Russell</surname>
            <given-names>R</given-names>
          </string-name>
          :
          <article-title>Amino-Acid Properties and Consequences of Substitutions</article-title>
          .
          <source>Bioinformatics for geneticists</source>
          <year>2003</year>
          ,
          <fpage>311</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Taylor</surname>
            <given-names>WR</given-names>
          </string-name>
          :
          <article-title>The classification of amino acid conservation</article-title>
          .
          <source>Journal of Theoretical Biology</source>
          <year>1986</year>
          ,
          <volume>119</volume>
          (
          <issue>2</issue>
          ):
          <fpage>205</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Bartlett</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Porter</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borkakoti</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thornton</surname>
            <given-names>J</given-names>
          </string-name>
          :
          <article-title>Analysis of catalytic residues in enzyme active sites</article-title>
          .
          <source>Journal of Molecular Biology</source>
          <year>2002</year>
          ,
          <volume>324</volume>
          :
          <fpage>105</fpage>
          -
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Muggleton</surname>
            <given-names>S</given-names>
          </string-name>
          :
          <article-title>Learning from Positive Data</article-title>
          .
          <source>In Inductive Logic Programming Workshop</source>
          <year>1996</year>
          :
          <fpage>358</fpage>
          -
          <lpage>376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Džeroski</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bratko</surname>
            <given-names>I</given-names>
          </string-name>
          :
          <article-title>Handling noise in Inductive Logic Programming</article-title>
          .
          <source>In ILP92, Report ICOT TM1182. Edited by Muggleton S</source>
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Bennett</surname>
            <given-names>DE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camacho</surname>
            <given-names>RJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Otelea</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuritzkes</surname>
            <given-names>DR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fleury</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiuchi</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heneine</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kantor</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            <given-names>MR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schapiro</surname>
            <given-names>JM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandamme</surname>
            <given-names>AM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sandstrom</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boucher</surname>
            <given-names>CAB</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van de Vijver</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rhee</surname>
            <given-names>SY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>TF</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pillay</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shafer</surname>
            <given-names>RW</given-names>
          </string-name>
          :
          <article-title>Drug resistance mutations for surveillance of transmitted HIV-1 drug-resistance: 2009 update</article-title>
          .
          <source>PLoS ONE</source>
          <year>2009</year>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <fpage>e4724</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          20.
          <string-name>
            <surname>Deforche</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camacho</surname>
            <given-names>RJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grossman</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            <given-names>MA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Laethem</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katzenstein</surname>
            <given-names>DA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrigan</surname>
            <given-names>PR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kantor</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shafer</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandamme</surname>
            <given-names>AM</given-names>
          </string-name>
          :
          <article-title>Bayesian network analyses of resistance pathways against efavirenz and nevirapine</article-title>
          .
          <source>AIDS (London, England)</source>
          <year>2008</year>
          ,
          <volume>22</volume>
          (
          <issue>16</issue>
          ):
          <fpage>2107</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          21.
          <string-name>
            <surname>Cilia</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Landwehr</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passerini</surname>
            <given-names>A</given-names>
          </string-name>
          :
          <article-title>Relational Feature Mining with Hierarchical Multitask kFOIL</article-title>
          .
          <source>Fundamenta Informaticae</source>
          <year>2011</year>
          ,
          <volume>113</volume>
          (
          <issue>2</issue>
          ):
          <fpage>151</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          22.
          <string-name>
            <surname>Theys</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deforche</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beheydt</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moreau</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Laethem</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemey</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camacho</surname>
            <given-names>RJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rhee</surname>
            <given-names>SY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shafer</surname>
            <given-names>RW</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Wijngaerden</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandamme</surname>
            <given-names>AM</given-names>
          </string-name>
          :
          <article-title>Estimating the individualized HIV-1 genetic barrier to resistance using a nelfinavir fitness landscape</article-title>
          .
          <source>BMC Bioinformatics</source>
          <year>2010</year>
          ,
          <volume>11</volume>
          :
          <fpage>409</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>