<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Frankenstein Junior: a relational learning approach toward protein engineering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elisa Cilia</string-name>
          <email>cilia@disi.unitn.it</email>
          <email>elisa.cilia@iasma.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Passerini</string-name>
          <email>passerini@disi.unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Information Engineering, University of Trento</institution>
          ,
          <addr-line>via Sommarive 14, I-38123 (Povo) Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>FEM-IASMA Research &amp; Innovation Center</institution>
          ,
          <addr-line>via Edmund Mach 1, I-38010 S. Michele all'Adige (TN)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Background: Mining relevant features from protein mutation data is fundamental for understanding the characteristics of a protein functional site. The mined features could also be used for engineering novel proteins with useful functions. Results: We propose a simple relational learning approach for protein engineering. First, we learn a set of relational rules from mutation data, then we use them to generate a set of candidate mutations that are most likely to confer resistance to a certain inhibitor or improved activity on a specific substrate. We tested our approach on a dataset of HIV drug resistance mutations, comparing it to a baseline random generator. Statistically significant improvements are obtained on both categories of nucleoside and non-nucleoside HIV reverse transcriptase inhibitors. Conclusions: Our promising preliminary results suggest that the proposed approach for learning mutations has potential in guiding mutant engineering, as well as in predicting virus evolution in order to try and devise appropriate countermeasures. The approach can be generalized quite easily to learning mutants characterized by more complex rules correlating multiple mutations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Background</title>
      <p>The primary aim of mining relevant features from protein
mutation data is to understand the
properties of functional sites, for instance which residues
are more likely to have a functional role. The same
mined information can be used to engineer mutants
of a protein with improved activity on a certain
substrate or resistance to a certain inhibitor.</p>
      <p>Rational design is an engineering technique that
modifies existing proteins by site-directed mutagenesis.
It assumes knowledge or intuition about the
effects of specific mutations on the protein function.
The process typically involves extensive
trial-and-error experiments and is also used with the aim
of improving the understanding of the mechanisms of a
protein's behavior and taking the necessary
countermeasures.</p>
      <p>In this work we take the first steps towards the
use of a relational learning approach to protein
engineering by focusing on learning relevant single
mutations (mutations that change the amino acid type in
the protein sequence) that can affect the behavior of
a protein. Predicting the effect of single mutations
helps to reduce the size of the search space for
rational design.</p>
      <p>
        In the proposed approach an Inductive Logic
Programming (ILP) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] learner is trained to extract
general rules describing mutations relevant to a
certain behavior (e.g. drug resistance of HIV) and the
learned rules are then used to infer novel potentially
relevant mutations.
      </p>
      <p>We focused on learning relevant mutations of the
HIV reverse transcriptase (RT). The HIV RT is a
DNA polymerase enzyme that transcribes RNA into
DNA, allowing it to be integrated into the genome
of the host cell and replicated along with it, and it
is therefore crucial for virus propagation. Viruses
typically have a very high mutation rate, and a
mutation can confer on the mutant resistance to one or
more drugs, for instance by modifying the inhibitor
target site on the protein. In the case of HIV
drug resistance, which we address in the present
work, building artificial mutants that can resist
the inhibitor could help to predict the virus
evolution early and thus to design more effective drugs.</p>
      <p>
        Many machine learning methods have been
applied in the past to mutation data for predicting
the effect of single point mutations on protein stability [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
and the effect of mutations on the protein
function [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or drug susceptibility [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To our
knowledge, this is the first proposal for learning
relational features of mutations affecting a protein's
behavior and using them to generate novel relevant
mutations. Furthermore, even though we focus on single
point mutations in our experimental evaluation, our
approach can be extended quite straightforwardly to
multiple point mutations, and we are actively
working in this direction. Conversely, the predictive
approaches mentioned above would quickly become infeasible
because of the explosion in the number of candidate
configurations to evaluate.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Results and Discussion</title>
      <sec id="sec-2-1">
        <title>Dataset</title>
        <p>
          We applied our approach to the same dataset of
mutations already used in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for extracting relational
rules among mutations. The dataset is derived from
the Los Alamos National Laboratories (LANL) HIV
resistance database (http://www.hiv.lanl.gov/content/sequence/RESDB/)
and reports mutations of the
HIV reverse transcriptase (RT). Richter et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
formulated the learning problem as a mining task and
applied a relational association rule miner to derive
rules relating different mutations and their resistance
properties.
        </p>
        <p>
          In previous work [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] we used the same dataset
for learning a relational model of mutant resistance
with the hierarchical kFOIL algorithm, a statistical
relational learner [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Here we use the dataset for
inferring rules that can characterize a single mutation
as conferring resistance to a certain class of RT inhibitors. These
drug classes are: a) Nucleoside RT Inhibitors
(NRTI); b) Non-Nucleoside RT Inhibitors (NNRTI);
c) Non-Competitive RT Inhibitors (NCRTI); d)
Pyrophosphate Analogue RT Inhibitors (PARTI). The
four classes of inhibitors differ in the targeted sites
and rely on quite different mechanisms. NNRTI and
NCRTI inhibit the reverse transcriptase by binding
to the enzyme active site, therefore directly
interfering with the enzyme function. NRTI are instead
incorporated into the newly synthesized viral DNA,
preventing its elongation. Finally, PARTI target
the pyrophosphate binding site and are employed,
as part of a salvage therapy, on patients in which the
HIV infection shows resistance to the other classes of
antiretroviral drugs. The final dataset is composed
of 164 mutations labeled as resistant out of a set of
581 observed mutations (extracted from 2339
mutants). Among the 164 mutations, 95 are labeled as
resistant to NRTI, 56 to NNRTI, 5 to NCRTI and 8
to PARTI.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Learning in first-order logic</title>
        <p>Our aim is to learn a first-order logic hypothesis for
the target concept, i.e. mutation conferring
resistance to a certain drug, and use it to infer novel
mutations consistent with such a hypothesis. We rely
on definite clauses, which are the basis of the
Prolog programming language. A definite clause is an
expression of the form h ← b1 AND ... AND bn,
where h and the bi are atoms. Atoms are
expressions of the form p(t1, ..., tn) where p/n is a
predicate symbol of arity n and the ti are terms,
either constants (denoted by lower case) or variables
(denoted by upper case) in our experiments. The
atom h is also called the head of the clause,
typically the target predicate, and b1 AND ... AND
bn its body. Intuitively, a clause states that
the head h holds whenever the body b1 AND
... AND bn holds. For instance, a simple
hypothesis like res_against(A,ncrti) ← mutation(A,C)
AND close_to_site(C) would indicate that a
mutation C in the proximity of a binding site confers
resistance against an NCRTI on mutant A. Learning in
this setting consists of searching for a set of definite
clauses H = {c1, ..., cm} covering all or most positive
examples, and none or few negative ones if available.
First-order clauses can thus be interpreted as
relational features that characterize the target concept.
The main advantage of these logic-based approaches
with respect to other machine learning techniques is
the expressivity and interpretability of the learned
models. Models can be readily interpreted by
human experts and provide direct explanations for the
predictions.</p>
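        <p>As a minimal illustration of how such a clause is read, the following Python sketch evaluates the example clause res_against(A,ncrti) ← mutation(A,C) AND close_to_site(C) over a few ground facts. The mutant identifiers m1 and m2, the positions, and the choice of representing a mutation C by its position are assumptions made only for this example; the actual system relies on a Prolog engine.</p>
        <preformat>
```python
# Toy ground facts; identifiers m1, m2 and the positions are hypothetical.
mutation_facts = {("m1", 103), ("m2", 300)}   # mutation(A, C)
close_to_site_facts = {103}                   # close_to_site(C)


def res_against_ncrti(facts_mutation, facts_close):
    """Return every A for which the clause body holds.

    Clause: res_against(A, ncrti) :- mutation(A, C), close_to_site(C).
    The head holds for A when some C makes both body atoms true.
    """
    return sorted({a for (a, c) in facts_mutation if c in facts_close})


covered = res_against_ncrti(mutation_facts, close_to_site_facts)
print(covered)
```
        </preformat>
        <p>In this toy setting only m1 is covered, because it is the only mutant with a mutation that also satisfies close_to_site.</p>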
      </sec>
      <sec id="sec-2-3">
        <title>Background knowledge</title>
        <p>We built a relational knowledge base for the domain
at hand. Table 1 summarizes the predicates we
included as background knowledge. We represented
the amino acids of the wild type with their positions
in the primary sequence (aa/2) and the specific
mutations characterizing them (mut_prop/4). Target
predicates were encoded as resistance of the
mutation to a certain drug (res_against/2).</p>
        <p>
          Additional background knowledge was included
in order to highlight characteristics of residues and
relationships between mutations:
color/2 indicates the type of the natural amino
acids according to the coloring proposed in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
For example, the magenta class includes basic
amino acids such as lysine and arginine, while the
blue class includes acidic amino acids such as
aspartic and glutamic acids.
same_type/2 indicates whether two residues belong
to the same type, i.e. a change from one
residue to the other conserves the type of the
amino acid.
same_type_mut/2 indicates that a residue
substitution at a certain position does not modify
the amino acid type with respect to the wild
type. For example, mutation d123e conserves
the amino acid type while mutation d123a does
not (i.e. different_type_mut/2 holds for it).
        </p>
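        <p>The type-conservation predicates can be sketched in Python as follows. This is only an illustration: the colour table contains just the two classes named above (magenta for lysine and arginine, blue for aspartic and glutamic acid), the label "unlisted" is a placeholder for every residue not covered here, and the function names mirror the predicates but take a mutation string such as d123e instead of the MutationID and Pos arguments.</p>
        <preformat>
```python
# Partial colour table: only the two classes named in the text are filled in.
# "unlisted" is a placeholder label, not part of the cited colouring.
COLOR = {"k": "magenta", "r": "magenta", "d": "blue", "e": "blue"}


def color(aa):
    return COLOR.get(aa.lower(), "unlisted")


def same_type(aa1, aa2):
    """True when both residues fall in the same known colour class."""
    c1, c2 = color(aa1), color(aa2)
    return c1 != "unlisted" and c1 == c2


def parse_mutation(mutation):
    """Split a string such as 'd123e' into (wild type, position, mutant)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def same_type_mut(mutation):
    wild, _pos, mutant = parse_mutation(mutation)
    return same_type(wild, mutant)


def different_type_mut(mutation):
    return not same_type_mut(mutation)


print(same_type_mut("d123e"), different_type_mut("d123a"))
```
        </preformat>
        <p>With a complete colour table the placeholder class would disappear; here it simply makes any substitution involving an unlisted residue count as a change of type.</p>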
        <p>
          Other background knowledge facts and rules
were added in order to express structural relations
along the primary sequence and the catalytic propensity
of the involved residues:
close_to_site/1 indicates whether a specific
position is less than 5 positions away from a
residue belonging to a binding or active site.
In our specific case, the background theory
incorporates knowledge about a metal binding
site and a heterodimerization site.
location/2 indicates in which fragment of the
primary sequence the amino acid is located.
Locations are numbered from 0 by dividing the
sequence into a certain number of fragments.
catalytic_propensity/2 indicates whether an
amino acid has a high, medium or low
catalytic propensity according to [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
mutated_residue_cp/3 indicates how, in a mutated
position, the catalytic propensity has changed
(e.g. from low to high).
        </p>
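        <p>The two positional predicates can be sketched as follows. The site positions are those of the metal binding site (D110, D185, D186) and the heterodimerization site (W401, W414) used in this work; the fragment length of 10 and the 1-based position convention in location are hypothetical choices, since the number of fragments is not stated here.</p>
        <preformat>
```python
# Site residues named in the text: metal binding and heterodimerization sites.
SITE_POSITIONS = [110, 185, 186, 401, 414]


def close_to_site(pos, sites=SITE_POSITIONS, window=5):
    """True when pos is fewer than `window` positions away from a site residue."""
    return any(window > abs(pos - s) for s in sites)


def location(pos, fragment_length=10):
    """Index (from 0) of the sequence fragment that contains pos.

    fragment_length is a hypothetical value; positions are taken as 1-based.
    """
    return (pos - 1) // fragment_length


print(close_to_site(106), close_to_site(105), location(11))
```
        </preformat>
        <p>Position 106 is four residues from D110 and therefore counts as close, while position 105 is exactly five residues away and does not.</p>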
      </sec>
      <sec id="sec-2-4">
        <title>Algorithm overview</title>
        <p>The proposed approach is sketched in Figure 1.</p>
        <p>The first step is the learning phase, in which an
ILP learner is fed with a logical representation of
the data D (the mutations experimentally observed
to confer resistance to a certain inhibitor) and of
the domain knowledge B we want to incorporate,
and it returns a first-order logical hypothesis H for
the concept of mutation conferring resistance to a
certain drug.</p>
        <p>The hypothesis is derived using the Aleph (A
Learning Engine for Proposing Hypotheses) ILP
system (http://www.comlab.ox.ac.uk/activities/machinelearning/Aleph/aleph.html).
Aleph can also learn from positive
examples only. This is the most suitable approach in
our case, as the positive examples are the mutations
experimentally proved to confer resistance to a drug,
but no safe claim can be made about the other
mutations when there is insufficient evidence, for example
due to the lack of an exhaustive set of laboratory
experiments.</p>
        <p>
          Aleph incrementally builds a hypothesis covering
all positive examples guided by a Bayesian
evaluation function, described in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], scoring candidate
solutions according to an estimate of the Bayes
posterior probability that allows a trade-off between hypothesis
size and generality.
        </p>
        <p>In Figure 2 we show an example of a learned
hypothesis covering a set of examples. The learned
hypothesis models the ability of a mutation to confer
resistance to NCRTIs and is composed of three
first-order clauses, each one covering different sets
of mutations of the wild type as highlighted in
colors: blue for the first clause, yellow for the second
and red for the third one. Some mutations are
covered by more than one clause, as shown by the color
overlaps.</p>
        <p>Our background knowledge incorporates
information about the RT metal binding site, which is
composed of the aspartic acids D110, D185 and D186,
and about the heterodimerization site, which is composed of
W401 and W414 (in bold in Figure 2).</p>
        <p>The second step is the generative phase, in which
the learned hypothesis is employed to find novel
mutations that can confer drug resistance to an RT
mutant. A set of candidate mutations can be generated
by using the Prolog inference engine starting from
the rules in the learned model. The rules are
in effect constraints on the characteristics that a
mutation of the wild type should have in order to confer
resistance to a certain inhibitor.</p>
        <p>Algorithm 1 details the mutation generation
procedure. We assume here, for simplicity, that we have a
model H for a single drug class. The procedure
works by querying the Prolog inference engine for
all possible variable assignments that satisfy the
hypothesis clauses. In order to generate novel
mutations, the mut_prop/4 predicate is modified here
to return all legal mutations instead of only
those appearing in the training instances. The set
of mutations identified by the variable assignments
and not present in the training set is ranked
according to a scoring function SM before being returned
by the algorithm. In this work we defined SM as the
number of clauses in H that a candidate mutation
m satisfies.</p>
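        <p>The generation and scoring steps can be sketched in Python under simplifying assumptions. Here the learned clauses are stood in for by plain boolean functions rather than Prolog clauses, the wild type is a small dictionary of positions, and all names (generate_mutations, toy_clauses, toy_wild_type, toy_training) are hypothetical. The score is the number of clauses a candidate satisfies, as defined above.</p>
        <preformat>
```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def generate_mutations(wild_type, training, clauses):
    """Rank unseen single mutations by the number of clauses they satisfy.

    wild_type: dict position -> wild-type residue
    training:  set of (position, new_residue) pairs seen in training
    clauses:   list of boolean functions of (wild_residue, position, new_residue)
    """
    scored = []
    for pos, wild in wild_type.items():
        for new in AMINO_ACIDS:
            if new == wild:
                continue  # not a mutation
            score = sum(1 for clause in clauses if clause(wild, pos, new))
            if score == 0:
                continue  # satisfies no clause, so it is not a candidate
            if (pos, new) in training:
                continue  # discard mutations observed in the training set
            scored.append((f"{pos}{new}", score))
    # Highest score first; ties broken by mutation name for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy model with two clauses, written as plain predicates.
toy_clauses = [
    lambda wild, pos, new: pos in (101, 103),   # stands in for a location test
    lambda wild, pos, new: new == "I",          # stands in for an amino acid test
]
toy_wild_type = {101: "K", 103: "K", 200: "T"}
toy_training = {(103, "N")}

ranking = generate_mutations(toy_wild_type, toy_training, toy_clauses)
print(ranking[:3])
```
        </preformat>
        <p>In this toy run the two candidates satisfying both clauses, 101I and 103I, are ranked first, and the training mutation 103N is excluded from the output.</p>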
        <p>
          Referring to the model in Figure 2, some of the
generated candidate mutations with the highest score are
101P, 102A, 103F, 103I, 104M, 181I, 181V, 183M,
188L and 188F, where for example the notation 101P
indicates a change of the wild type amino acid located
at position 101 into a proline (P). This list also includes
also known surveillance mutations [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>Performance evaluation</title>
        <p>We divided the dataset of mutations into a
training and a test set (70/30) in a stratified way, that is,
by preserving in both the training and the test set
the proportion of examples belonging to each of the
four drug classes. The resulting training set is
composed of a total of 116 mutations while the test set
is composed of 48 mutations.</p>
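        <p>A stratified 70/30 split of this kind can be sketched as below. The per-class sizes are the ones reported for the dataset; the mutation identifiers are placeholders, and rounding the per-class training size half up is an assumption that happens to reproduce the reported set sizes, not necessarily the procedure actually used.</p>
        <preformat>
```python
import random


def stratified_split(labelled, train_tenths=7, seed=0):
    """Split (item, label) pairs so each label keeps roughly the same share.

    The per-class training size is rounded half up with integer arithmetic.
    """
    rng = random.Random(seed)
    by_label = {}
    for item, label in labelled:
        by_label.setdefault(label, []).append(item)
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_train = (len(items) * train_tenths + 5) // 10
        train += [(x, label) for x in items[:n_train]]
        test += [(x, label) for x in items[n_train:]]
    return train, test


# Class sizes from the text; the mutation identifiers are placeholders.
sizes = {"NRTI": 95, "NNRTI": 56, "NCRTI": 5, "PARTI": 8}
data = [(f"{label}_{i}", label) for label, n in sizes.items() for i in range(n)]
train, test = stratified_split(data)
print(len(train), len(test))
```
        </preformat>
        <p>With these class sizes the sketch yields 116 training and 48 test mutations.</p>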
        <p>We trained the ILP learner on the training set
and evaluated on the test set the set of
mutations generated using the learned model. The
evaluation procedure takes the generated mutations and
computes their enrichment in test mutations by
counting how many generated mutations are actually
observed in at least one test example. We compare the
recall of the approach with the recall of an algorithm
that chooses at random a set (of the same cardinality)
of possible mutations among all legal ones.</p>
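        <p>Recall (also called sensitivity or TP rate) is <inline-formula><tex-math>\frac{t^{+}}{t^{+} + f^{-}}</tex-math></inline-formula>, where
<inline-formula><tex-math>t^{+}</tex-math></inline-formula>, the true positives, is the number of test set
mutations found among the generated mutations, and <inline-formula><tex-math>t^{+} + f^{-}</tex-math></inline-formula>, true positives plus false negatives,
corresponds to the total number of test set mutations.</p>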
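        <p>The comparison with the random generator can be sketched as follows. The sets of legal, test and generated mutations are invented toy data, and the function names are hypothetical; the sketch only shows how recall is computed for a generated set and how a random set of the same cardinality is drawn.</p>
        <preformat>
```python
import random


def recall(generated, test_set):
    """Fraction of test mutations that appear among the generated ones."""
    hits = len(set(generated).intersection(test_set))
    return hits / len(test_set)


def random_baseline(legal, size, seed=0):
    """Draw `size` distinct mutations uniformly from all legal ones."""
    return random.Random(seed).sample(sorted(legal), size)


# Hypothetical toy data.
legal = {f"{pos}{aa}" for pos in range(1, 11) for aa in "AILV"}
test_set = {"3I", "5V", "7L", "9A"}
generated = ["3I", "5V", "2A", "8L"]

print(recall(generated, test_set))
baseline = random_baseline(legal, len(generated))
print(recall(baseline, test_set))
```
        </preformat>
        <p>Here two of the four test mutations are generated, giving a recall of 0.5; the recall of the random set depends on the seed.</p>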
        <p>Results averaged over 30 random splits of the
dataset are reported in Table 2. On each split we
performed 30 runs of our algorithm and of the
random generation algorithm for each of the different
learning tasks (NNRTI, NRTI, NCRTI and PARTI).
For each task we also report, in columns 3 and 4,
the mean number of generated mutations over the
30 splits and the number of test set mutations for
reference.</p>
        <p>We evaluated the statistical significance of the
performance differences between the two algorithms
by paired Wilcoxon tests on the averaged recall
reported on each split. We employed a significance level
of 0.05.</p>
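        <p>For illustration, a paired Wilcoxon signed-rank test can be written in a few lines of Python. The sketch below computes an exact two-sided p-value by enumerating all sign patterns, which is only practical for small samples; for 30 splits a library routine such as scipy.stats.wilcoxon would be the natural choice. The recall values (expressed as percentages) are invented toy numbers, not results from this study.</p>
        <preformat>
```python
from itertools import product


def signed_rank_test(xs, ys):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped; tied absolute differences share the
    average rank. Enumeration is only practical for small samples.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i != len(order):
        j = i
        while j + 1 != len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    w_min = min(w_plus, total - w_plus)
    # Count sign patterns whose smaller rank sum is at most the observed one.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if w_min >= min(s, total - s):
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Invented per-split recall percentages for the two algorithms.
ours = [31, 28, 35, 30, 33, 29]
rand = [5, 9, 4, 8, 3, 11]
w_plus, p_value = signed_rank_test(ours, rand)
print(w_plus, p_value)
```
        </preformat>
        <p>With six pairs that all favour the first algorithm, the exact two-sided p-value is 2/64 = 0.03125, which is below the 0.05 level.</p>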
        <p>The improvement of the algorithm with respect
to the random generation of mutations is statistically
significant on the NRTI and NNRTI tasks, which are
the tasks on which we can learn the hypothesis from
the largest sets of training examples.</p>
        <p>Figure 2 shows the trend of the mean recall over
all splits when cutting the number of generated
mutations (from 1 generated mutation to more than
8000). The advantage of our approach remains quite
stable when reducing the set of candidates,
producing almost nine times more test mutations than
random in the 100 highest scoring ones for NNRTI
(Figure 3 shows the trend of the recall curves of the plot
in Figure 3(a) for the highest ranked 150 mutations).</p>
        <p>We finally learned a model on the whole set of
training mutations in order to generate a single set of
mutations for further inspection. Below we report
five examples of novel mutations with the highest
rank for each of the tasks:
NNRTI: 90I, 98I, 103I, 106P, 179I;
NRTI: 60A, 153M, 212L, 229F, 239I;
NCRTI: 183V, 183L, 188V, 188F, 188I;
PARTI: 84R, 86E, 88Y, 88V, 89N.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], the authors found a set of novel
mutations conferring resistance to efavirenz and
nevirapine, which are NNRTIs. Our mutation generation
algorithm partially confirmed their findings.
Mutation 90I was ranked high (5/5), mutation 101H was
generated with a rank of 3/5, mutations 196R and
138Q with rank 1/5, while mutation 28K was not
generated at all by our system as a candidate for
conferring resistance to NNRTI.
        </p>
        <p>An example of a learned hypothesis is reported in
Figure 5. For instance, according to the model, among
the features a mutation should have for conferring
resistance to an NNRTI, there is the change of a
basic (magenta) residue of the wild type, e.g. lysine
or arginine, into a residue with completely different
physico-chemical characteristics (rule 16).</p>
        <p>Another example for the resistance to NNRTI is
that a non-conserved mutation is present at positions
between 98 and 106 of the wild type sequence (rule
8).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>In this work we proposed a simple relational learning
approach toward protein engineering. Starting from
HIV reverse transcriptase mutation data we built a
relational knowledge base and we mined relevant
relational features for modeling mutant resistance by
using an ILP learner. Based on the learned relational
rules we generate a set of candidate mutations
satisfying them.</p>
      <p>
        Albeit preliminary, our results suggest that the
proposed approach for learning mutations has
potential in guiding mutant engineering, as well as in
predicting virus evolution in order to try and devise
appropriate countermeasures. More detailed
background knowledge, possibly including 3D
information whenever available, is necessary in order to
further focus the set of generated mutations, possibly
together with post-processing stages involving mutant
evaluation by statistical machine learning approaches [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
In the near future we also plan to generalize the
proposed approach to jointly generate sets of related
mutations, shifting the focus from single point
mutations to entire mutants.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Authors' contributions</title>
      <p>EC built the background knowledge, implemented
the algorithm and conducted the experimental
evaluation. AP suggested the overall idea and
coordinated the study. Both authors contributed to
writing the article. Both authors read and approved the
final manuscript.</p>
    </sec>
    <sec id="sec-5">
      <title>Algorithms</title>
      <sec id="sec-5-1">
        <title>Algorithm 1 - Mutation generation algorithm.</title>
        <p>Algorithm for novel relevant mutations discovery.</p>
        <preformat>input: dataset of training mutations D, background knowledge B, learned model H
output: rank of the most relevant mutations R
procedure GenerateMutations(D, B, H)
    Initialize DM ← ∅
    A ← find all assignments a that satisfy at least one clause ci ∈ H
    for a ∈ A do
        m ← mutation corresponding to the assignment a
        score ← SM(m)                 ▷ number of clauses ci satisfied by a
        if not m ∈ D then             ▷ discard mutations observed in the training set
            DM ← DM ∪ {(m, score)}
        end if
    end for
    R ← RankMutations(DM, B, H)       ▷ rank relevant mutations
    return R
end procedure</preformat>
        <p>[Figure 1: flow diagram. Dataset of training mutations D and background knowledge B → ILP learner → hypothesis H → generator of relevant mutations → rank of novel relevant mutations R]</p>
        <p>Figure 2 - Model for the resistance to NCRTI
An example of learned hypothesis for the NCRTI task with highlighted examples covered by the hypothesis
clauses.</p>
        <p>&gt;wt ...AGLKKKKSVTVLDVG...YQYMDDLYVG...WETWWTEY...WIPEWEFVN...</p>
        <p>The fragments shown span positions 98-112, 181-190, 398-405 and 410-418.</p>
        <p>mut_prop(A,B,C,D) AND location(11.0,C)
mut_prop(A,B,C,D) AND mutated_residue_cp(C,high,small)
mut_prop(A,B,C,D) AND color(green,B) AND close_to_site(C)
mut_prop(A,B,C,D) AND color(red,D)
mut_prop(A,B,C,D) AND color(red,D) AND color(red,B)
mut_prop(A,B,C,D) AND location(7.0,C) AND mutated_residue_cp(C,medium,medium)
mut_prop(A,B,C,D) AND location(7.0,C)
same_type_mut(A,B)
mut_prop(A,B,C,D) AND mutated_residue_cp(C,medium,high)
mut_prop(A,B,C,D) AND mutated_residue_cp(C,medium,small)
different_type_mut(A,B) AND location(11.0,B)
mut_prop(A,B,C,D) AND mutated_residue_cp(C,small,small)
same_type_mut(A,B)
mut_prop(A,B,C,D) AND aminoacid(B,v)
mut_prop(A,B,C,D) AND location(15.0,C)
mut_prop(A,B,C,D) AND aminoacid(D,i)
mut_prop(A,B,C,D) AND location(11.0,C)
mut_prop(A,B,C,D) AND color(red,D)
mut_prop(A,B,C,D) AND color(magenta,B) AND different_type_mut(A,C)
mut_prop(A,B,C,D) AND location(21.0,C)
mut_prop(A,B,C,D) AND location(11.0,C)
mut_prop(A,B,C,D) AND mutated_residue_cp(C,high,small)
mut_prop(A,B,C,D) AND color(green,B) AND close_to_site(C)
mut_prop(A,B,C,D) AND location(9.0,C)</p>
        <table-wrap>
          <caption>
            <title>Background Knowledge Predicates</title>
          </caption>
          <table>
            <thead>
              <tr><th>Predicate</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>aa(Pos,AA)</td><td>indicates a residue in the wild type sequence</td></tr>
              <tr><td>mut_prop(MutationID,AA,Pos,AA1)</td><td>indicates a mutation: mutation identifier, position and amino acids involved, before and after the substitution</td></tr>
              <tr><td>res_against(MutationID,Drug)</td><td>indicates whether a mutation is resistant to a certain drug</td></tr>
              <tr><td>color(Color,AA)</td><td>indicates the type of a natural amino acid</td></tr>
              <tr><td>same_type(R1,R2)</td><td>indicates whether two residues are of the same type</td></tr>
              <tr><td>same_type_mut(MutationID,Pos)</td><td>indicates a mutation to a residue of the same type</td></tr>
              <tr><td>different_type_mut(MutationID,Pos)</td><td>indicates a mutation changing the type of residue</td></tr>
              <tr><td>close_to_site(Pos)</td><td>indicates whether a specific position is close to a binding or active site, if any</td></tr>
              <tr><td>location(L,Pos)</td><td>indicates in which fragment of the primary sequence the amino acid is located</td></tr>
              <tr><td>catalytic_propensity(AA,CP)</td><td>indicates whether an amino acid has a high, medium or low catalytic propensity</td></tr>
              <tr><td>mutated_residue_cp(Pos,CPold,CPnew)</td><td>indicates how, in a mutated position, the catalytic propensity has changed (e.g. from low to high)</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Muggleton</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Raedt</surname>
            <given-names>L</given-names>
          </string-name>
          :
          <article-title>Inductive Logic Programming: Theory and Methods</article-title>
          .
          <source>University Computing</source>
          <year>1994</year>
          , :
          <fpage>629</fpage>
          –
          <lpage>682</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Capriotti</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fariselli</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casadio</surname>
            <given-names>R</given-names>
          </string-name>
          :
          <article-title>A three-state prediction of single point mutations on protein stability changes</article-title>
          .
          <source>BMC bioinformatics</source>
          <year>2008</year>
          , 9
          <issue>Suppl 2</issue>
          :
          <fpage>S6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Needham</surname>
            <given-names>CJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bradford</surname>
            <given-names>JR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bulpitt</surname>
            <given-names>AJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Care</surname>
            <given-names>Ma</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Westhead</surname>
            <given-names>DR</given-names>
          </string-name>
          :
          <article-title>Predicting the effect of missense mutations on protein function: analysis with Bayesian networks</article-title>
          .
          <source>BMC bioinformatics</source>
          <year>2006</year>
          , 7:
          <fpage>405</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bromberg</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rost</surname>
            <given-names>B</given-names>
          </string-name>
          :
          <article-title>SNAP: predict effect of nonsynonymous polymorphisms on function</article-title>
          .
          <source>Nucleic acids research</source>
          <year>2007</year>
          ,
          <volume>35</volume>
          (
          <issue>11</issue>
          ):
          <fpage>3823</fpage>
          –
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rhee</surname>
            <given-names>SY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wadhera</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben-Hur</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brutlag</surname>
            <given-names>DL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shafer</surname>
            <given-names>RW</given-names>
          </string-name>
          :
          <article-title>Genotypic predictors of human immunodeficiency virus type 1 drug resistance</article-title>
          .
          <source>Proceedings of the National Academy of Sciences of the United States of America</source>
          <year>2006</year>
          ,
          <volume>103</volume>
          (
          <issue>46</issue>
          ):
          <fpage>17355</fpage>
          –
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Richter</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Augustin</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kramer</surname>
            <given-names>S</given-names>
          </string-name>
          :
          <article-title>Finding Relational Associations in HIV Resistance Mutation Data</article-title>
          .
          <source>In Proceedings of Inductive Logic Programming (ILP)</source>
          , Volume 9
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cilia</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Landwehr</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passerini</surname>
            <given-names>A</given-names>
          </string-name>
          :
          <article-title>Mining Drug Resistance Relational Features with Hierarchical Multitask kFOIL</article-title>
          .
          <source>In Proceedings of BioLogical@AI*</source>
          IA2009
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Landwehr</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passerini</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Raedt</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frasconi</surname>
            <given-names>P</given-names>
          </string-name>
          :
          <article-title>Fast learning of relational kernels</article-title>
          .
          <source>Machine Learning</source>
          <year>2010</year>
          ,
          <volume>78</volume>
          (
          <issue>3</issue>
          ):
          <fpage>305</fpage>
          –
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Taylor</surname>
            <given-names>WR</given-names>
          </string-name>
          :
          <article-title>The classification of amino acid conservation</article-title>
          .
          <source>Journal of Theoretical Biology</source>
          <year>1986</year>
          ,
          <volume>119</volume>
          (
          <issue>2</issue>
          ):
          <fpage>205</fpage>
          –
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Bartlett</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Porter</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borkakoti</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thornton</surname>
            <given-names>J</given-names>
          </string-name>
          :
          <article-title>Analysis of catalytic residues in enzyme active sites</article-title>
          .
          <source>Journal of Molecular Biology</source>
          <year>2002</year>
          ,
          <volume>324</volume>
          :
          <fpage>105</fpage>
          –
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Muggleton</surname>
            <given-names>S</given-names>
          </string-name>
          :
          <article-title>Learning from Positive Data</article-title>
          .
          <source>In Inductive Logic Programming Workshop</source>
          <year>1996</year>
          :
          <fpage>358</fpage>
          –
          <lpage>376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Bennett</surname>
            <given-names>DE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camacho</surname>
            <given-names>RJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Otelea</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuritzkes</surname>
            <given-names>DR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fleury</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiuchi</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heneine</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kantor</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            <given-names>MR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schapiro</surname>
            <given-names>JM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandamme</surname>
            <given-names>AM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sandstrom</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boucher</surname>
            <given-names>CAB</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van de Vijver</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rhee</surname>
            <given-names>SY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>TF</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pillay</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shafer</surname>
            <given-names>RW</given-names>
          </string-name>
          :
          <article-title>Drug resistance mutations for surveillance of transmitted HIV-1 drug-resistance: 2009 update</article-title>
          .
          <source>PLoS ONE</source>
          <year>2009</year>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <fpage>e4724</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Deforche</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camacho</surname>
            <given-names>RJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grossman</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            <given-names>MA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Laethem</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katzenstein</surname>
            <given-names>DA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrigan</surname>
            <given-names>PR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kantor</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shafer</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandamme</surname>
            <given-names>AM</given-names>
          </string-name>
          :
          <article-title>Bayesian network analyses of resistance pathways against efavirenz and nevirapine</article-title>
          .
          <source>AIDS (London, England)</source>
          <year>2008</year>
          ,
          <volume>22</volume>
          (
          <issue>16</issue>
          ):
          <fpage>2107</fpage>
          –
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>