<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GROOLS: Reactive Graph Reasoning for Genome Annotation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jonathan Mercier</string-name>
          <email>jmercier@genoscope.cns.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Vallenet</string-name>
          <email>vallenet@genoscope.cns.fr</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CNRS-UMR8030</institution>
          ,
          <addr-line>Evry</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Direction des Sciences du Vivant, CEA, Institut de Génomique</institution>
          ,
          <addr-line>Genoscope, LABGeM, Evry</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Université d'Evry Val-d'Essonne</institution>
          ,
          <addr-line>Evry</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>GROOLS (Genomic Rule Oriented Object Logic System) is an expert system that helps biologists evaluate the functional annotation of a genome through biological processes such as metabolic pathways. The reasoning is conducted using a Business Rule Management System (BRMS) working on a generic representation of biological knowledge that captures complex data and relationships. We use the Object-Oriented First Order Logic (OOFOL) extended to the four-valued logic of Belnap. Prior biological knowledge is organized in a directed acyclic graph and evaluated by applying reactive graph reasoning using observation facts. Two types of observations are considered: predictions from bioinformatics methods and assertions that correspond to experimental evidence in the studied organism. Once all facts are spread over the graph, a conclusion is made about the state of prior knowledge (e.g. confirmed presence, missing, unexpected absence). The GROOLS implementation is based on the jBoss DROOLS framework.</p>
      </abstract>
      <kwd-group>
        <kwd>genome annotation</kwd>
        <kwd>metabolic network reconstruction</kwd>
        <kwd>business rules</kwd>
        <kwd>object-oriented logic</kwd>
        <kwd>four-valued logic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        GROOLS (Genomic Rule Oriented Object Logic System) is an applied research
project mixing biology, informatics and logic. It aims to standardize the use
of biological results to annotate a genome. This expert system will help
biologists (i.e. bio-annotators) to evaluate the quality of gene functional annotation
through biological processes like metabolic pathways. Starting from a set of
predicted genes, the bio-annotator tries to assign precise molecular functions to
corresponding proteins by integrating various predictions from bioinformatics
methods, which are mainly based on comparative sequence analysis with
proteins having experimentally validated functions. This laborious task may lead
to inconsistencies in the annotations because of the lack of experimental results and
the difficulty of finding a correct trade-off between the sensitivity and the specificity of the
methods used for functional inference.
Observations on the organism, like Biolog growth phenotype experiments [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
may help genome annotation notably for metabolic network reconstruction [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
For example, if an organism is able to grow using a metabolite X then a metabolic
pathway for compound X degradation is required in the organism as well as
enzymes that catalyze the chemical reactions of the pathway. During the genome
annotation process, the consistency and completeness of the predicted enzymatic
functions could then be evaluated using these experimental results.
Such reasoning can be conducted using a Business Rule Management System
(BRMS) working on a generic representation of biological knowledge that
captures complex data and relations like “is-a” and “has-a”. These prior knowledge
objects are theories and are organized in a directed acyclic graph. In GROOLS,
we use the Object-Oriented First Order Logic (OOFOL) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] extended to the
four-valued logic of Belnap [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Approach</p>
    </sec>
    <sec id="sec-2">
      <title>Metabolic pathway representation</title>
      <p>Metabolic pathways are groups of chemical reactions taking part in the same
biological process. As depicted in Figure 1, different pathway variants may occur
in organisms to achieve the same overall chemical transformation. These variants
may have common or specific parts, called here reaction blocks.</p>
      <p>
        In GROOLS, metabolic pathways with their variants made of reaction blocks
will be represented as prior knowledge objects organized in a Directed Acyclic
Graph (DAG) (Figure 2). Except for leaf nodes, nodes are flagged with (i) “And”
when a knowledge node requires all of its sub-knowledge nodes to be present, or (ii) “Or” when
it requires only one of them to be present. Considering
the example depicted in Figure 2, if reactions 1, 3 and 4 are present (i.e.
predicted by gene functional annotation) and pathwayX is required (i.e. the
organism degrades compound X), then VariantX1 can be considered as present
with one missing reaction (reaction 2). Indeed, biological knowledge is often
incomplete, and such missing information is commonly called a “pathway hole”
or an “orphan enzyme”: an enzymatic reaction that should occur in an organism
but has no annotated gene coding for the enzyme [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
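      <p>The And/Or flags described above can be illustrated with a minimal Java sketch. It is not the GROOLS code: the class KnowledgeNode, the method missingLeaves, the leaf-only NodeType value and the second variant (VariantX2 with reactions 5 and 6) are hypothetical names introduced for illustration, and the layout is a simplified stand-in for Figure 2. Choosing, for an “Or” node, the alternative with the fewest missing leaves is also an assumption of this sketch.</p>
      <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical node of a prior-knowledge DAG (illustrative, not the GROOLS API). */
class KnowledgeNode {
    enum NodeType { AND, OR, LEAF }

    final String name;
    final NodeType type;
    final List<KnowledgeNode> children = new ArrayList<>();

    KnowledgeNode(String name, NodeType type, KnowledgeNode... children) {
        this.name = name;
        this.type = type;
        this.children.addAll(Arrays.asList(children));
    }

    /** Leaves that are still missing for this node to be satisfied. */
    Set<String> missingLeaves(Set<String> presentLeaves) {
        Set<String> missing = new TreeSet<>();
        switch (type) {
            case LEAF:
                if (!presentLeaves.contains(name)) {
                    missing.add(name);
                }
                break;
            case AND:
                // "And": every sub-knowledge is required, so all gaps accumulate.
                for (KnowledgeNode child : children) {
                    missing.addAll(child.missingLeaves(presentLeaves));
                }
                break;
            case OR:
                // "Or": one sub-knowledge suffices; keep the alternative
                // with the fewest gaps (assumption of this sketch).
                Set<String> best = null;
                for (KnowledgeNode child : children) {
                    Set<String> candidate = child.missingLeaves(presentLeaves);
                    if (best == null || candidate.size() < best.size()) {
                        best = candidate;
                    }
                }
                if (best != null) {
                    missing.addAll(best);
                }
                break;
        }
        return missing;
    }
}

class PathwayDemo {
    static KnowledgeNode leaf(String name) {
        return new KnowledgeNode(name, KnowledgeNode.NodeType.LEAF);
    }

    /** Simplified, hypothetical layout: two variants sharing reaction1. */
    static KnowledgeNode examplePathway() {
        KnowledgeNode reaction1 = leaf("reaction1"); // shared node, hence a DAG
        KnowledgeNode variantX1 = new KnowledgeNode("VariantX1", KnowledgeNode.NodeType.AND,
                reaction1, leaf("reaction2"), leaf("reaction3"), leaf("reaction4"));
        KnowledgeNode variantX2 = new KnowledgeNode("VariantX2", KnowledgeNode.NodeType.AND,
                reaction1, leaf("reaction5"), leaf("reaction6"));
        return new KnowledgeNode("pathwayX", KnowledgeNode.NodeType.OR, variantX1, variantX2);
    }

    public static void main(String[] args) {
        Set<String> present = new HashSet<>(Arrays.asList("reaction1", "reaction3", "reaction4"));
        // Prints [reaction2]: the single "pathway hole" of VariantX1.
        System.out.println(examplePathway().missingLeaves(present));
    }
}
```
]]></preformat>
      <p>With reactions 1, 3 and 4 present, the sketch reports reaction 2 as the only missing leaf, which matches the pathway hole discussed for VariantX1.</p>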
    </sec>
    <sec id="sec-3">
      <title>Prior knowledge and observation model</title>
      <p>A bio-annotator uses at least three types of facts: (i) “prediction facts” are
predictions from bioinformatics methods or from human expertise that integrates
several method results; (ii) “assertion facts” are experimental evidence obtained in the
studied organism; (iii) “prior knowledge” gathers all metabolic pathways that
were experimentally elucidated in at least one organism and represents the
current knowledge accumulated over years of empirical research in biology. Prediction
and assertion facts will be considered as observations and will be used to draw
conclusions about the state of prior biological knowledge. As shown in Figure 3,
we designed an object-oriented model with interfaces to ease the integration of
heterogeneous objects from different external databases.</p>
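      <p>As a rough illustration, the interface-based model could be written in Java along the following lines, reusing the method names that appear in Figure 3. Several points are assumptions of this sketch rather than facts from the paper: java.time.LocalDateTime stands in for the DateTime type, the Evidence and Conclusion types are empty placeholders, the LEAF value of NodeType is added for convenience, PriorKnowledge is assumed to extend Fact, and SimplePrediction is a hypothetical implementation used only to exercise the interfaces.</p>
      <preformat><![CDATA[
```java
import java.time.LocalDateTime;

enum FourState { TRUE, FALSE, BOTH, NONE }

enum NodeType { AND, OR, LEAF }   // LEAF is an assumption of this sketch

enum Evidence { UNSPECIFIED }     // placeholder: evidence values are not described

interface Conclusion { }          // placeholder: see the 16 states of Table 2

interface Fact {
    String getId();
    String getName();
    String getSource();
    LocalDateTime getDate();      // stand-in for the DateTime type of Figure 3
    FourState getPresence();
}

interface Observation extends Fact {
    String getPriorKnowledgeId();
    Evidence getEvidence();
}

interface Assertion extends Observation { }

interface Prediction extends Observation { }

interface PriorKnowledge extends Fact {
    PriorKnowledge[] getPartOf();
    NodeType getNodeType();
    Conclusion getConclusion();
    void setConclusion(Conclusion conclusion);
    void setPresence(FourState presence);
}

/** Hypothetical immutable prediction, only here to exercise the interfaces. */
final class SimplePrediction implements Prediction {
    private final String id;
    private final String name;
    private final String source;
    private final String priorKnowledgeId;
    private final FourState presence;
    private final LocalDateTime date = LocalDateTime.now();

    SimplePrediction(String id, String name, String source,
                     String priorKnowledgeId, FourState presence) {
        this.id = id;
        this.name = name;
        this.source = source;
        this.priorKnowledgeId = priorKnowledgeId;
        this.presence = presence;
    }

    @Override public String getId() { return id; }
    @Override public String getName() { return name; }
    @Override public String getSource() { return source; }
    @Override public LocalDateTime getDate() { return date; }
    @Override public FourState getPresence() { return presence; }
    @Override public String getPriorKnowledgeId() { return priorKnowledgeId; }
    @Override public Evidence getEvidence() { return Evidence.UNSPECIFIED; }
}

class ModelDemo {
    public static void main(String[] args) {
        Observation observation = new SimplePrediction(
                "pred-1", "hypothetical enzyme prediction", "hypothetical-method",
                "reaction1", FourState.TRUE);
        System.out.println(observation.getPriorKnowledgeId() + " -> " + observation.getPresence());
    }
}
```
]]></preformat>
      <p>Keeping Assertion and Prediction as separate sub-interfaces of Observation lets rules distinguish experimental evidence from computed predictions while sharing the link to a prior knowledge identifier.</p>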
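      <p>Figure 3 (class diagram; contents recovered from the extracted text):</p>
      <list list-type="bullet">
        <list-item><p>Fact: getId() : String; getName() : String; getSource() : String; getDate() : DateTime; getPresence() : FourState</p></list-item>
        <list-item><p>Observation: getPriorKnowledgeId() : String; getEvidence() : Evidence</p></list-item>
        <list-item><p>Assertion</p></list-item>
        <list-item><p>Prediction</p></list-item>
        <list-item><p>PriorKnowledge: getPartOf() : PriorKnowledge[]; getNodeType() : NodeType; getConclusion() : Conclusion; setConclusion( Conclusion conclusion ) : void; setPresence( FourState presence ) : void</p></list-item>
        <list-item><p>Multiplicity labels: 0..* and 1</p></list-item>
      </list>
      <p>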
Prior knowledge will be evaluated through observations by applying reactive
graph reasoning using the Object-Oriented First Order Logic (OOFOL) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
extended to a four-valued logic of Belnap [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Predictions will directly impact the leaf
nodes of the DAG, while assertions will generally impact root nodes. Prediction
and assertion facts take four different values, {present, absent, both, unknown}
and {required, avoided, both, unknown} respectively, which correspond to the {true,
false, both, none} values of the four-valued logic.
      </p>
      <p>
From a logical point of view, a class
is a structure and an instance of a class (an object) is a theory. An object can
contain other objects, meaning that a theory is a graph of smaller theories. Each
theory is defined by a coherent set of values. A class <italic>C</italic> is defined as a tuple
⟨<italic>L</italic><sub>C</sub>, <italic>A</italic><sub>C</sub>, <italic>I</italic><sub>C</sub>⟩, where <italic>L</italic><sub>C</sub> is the vocabulary (class methods), <italic>A</italic><sub>C</sub> a set of axioms
(attributes) and <italic>I</italic><sub>C</sub> the part of the vocabulary to implement (<italic>I</italic><sub>C</sub> ⊂ <italic>L</italic><sub>C</sub>).
      </p>
      <p>
The truth of a theory can be inferred from its sub-theories following a logic
designed to cope with various and contradictory information sources. A theory
is true if all sub-theories are true. If all sub-theories are false, then the theory
is false. If some sub-theories are true and others false, then the theory takes the
value “both” to denote ambiguity. If no information matches a theory, then the
theory takes the state “none” (see Table 1). More precisely, a theory linked to its
sub-theories with a logical “And” (NodeType knowledge attribute) is evaluated
using the priority “false &gt; both &gt; none &gt; true”. For a logical “Or”, the
priority is “true &gt; none &gt; both &gt; false”. We are also evaluating an optimistic
version of the logic where the “And” priority is modified to “true &gt; false &gt; both &gt;
none” when sub-theories are linked to a single theory. This will help us to deal
with incomplete predictions. Lastly, once all facts are spread over the DAG, a
conclusion with 16 possible states is made on prior knowledge using the combination of assertion
and prediction values (see Table 2).
      </p>
      <p>
The GROOLS system is still under heavy development and evaluation. We hope
that this tool will help biologists to evaluate the overall coherence of
individual predicted functions through the integration of additional information
from biological processes like metabolic pathways.
      </p>
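      <p>The priority orders given for “And” and “Or” nodes can be sketched in Java as follows. This is one possible reading, not the GROOLS rule base: it assumes that the value of a theory is the highest-priority value found among its sub-theories, and that a node with no evaluated sub-theory is “none”. The class BelnapLogic and its methods combine, and and or are hypothetical names.</p>
      <preformat><![CDATA[
```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper evaluating a theory from its sub-theories. */
class BelnapLogic {
    enum FourState { TRUE, FALSE, BOTH, NONE }

    // Priority orders quoted in the text, highest priority first.
    static final List<FourState> AND_PRIORITY = Collections.unmodifiableList(
            Arrays.asList(FourState.FALSE, FourState.BOTH, FourState.NONE, FourState.TRUE));
    static final List<FourState> OR_PRIORITY = Collections.unmodifiableList(
            Arrays.asList(FourState.TRUE, FourState.NONE, FourState.BOTH, FourState.FALSE));

    /** Returns the highest-priority state found among the sub-theories. */
    static FourState combine(List<FourState> priority, Collection<FourState> subTheories) {
        for (FourState candidate : priority) {
            if (subTheories.contains(candidate)) {
                return candidate;
            }
        }
        // Assumption: no evaluated sub-theory means no information.
        return FourState.NONE;
    }

    static FourState and(FourState... subTheories) {
        return combine(AND_PRIORITY, Arrays.asList(subTheories));
    }

    static FourState or(FourState... subTheories) {
        return combine(OR_PRIORITY, Arrays.asList(subTheories));
    }

    public static void main(String[] args) {
        System.out.println(and(FourState.TRUE, FourState.TRUE));  // TRUE
        System.out.println(and(FourState.TRUE, FourState.NONE));  // NONE
        System.out.println(and(FourState.TRUE, FourState.FALSE)); // FALSE
        System.out.println(or(FourState.TRUE, FourState.FALSE));  // TRUE
        System.out.println(or(FourState.BOTH, FourState.FALSE));  // BOTH
    }
}
```
]]></preformat>
      <p>Because combine takes the priority order as a parameter, a variant such as the optimistic “And” order mentioned above could be tried by passing a different list, without changing the evaluation code.</p>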
      <p>
        Through a collaborative project between INRIA and the Swiss Institute of
Bioinformatics, a first prototype with similar deductive reasoning was
implemented in the HERBS system using the Jess rule engine. The GROOLS implementation
is based on the jBoss DROOLS framework, an open-source rule engine
written in Java [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It natively supports an object-oriented language and uses the
PHREAK algorithm to reason over structured data. It acts as a bridge between
our Java application and the business logic. Genomic data with functional
predictions and human expert annotations will be extracted from the Prokaryotic
Genome DataBase (PkGDB) of the MicroScope platform [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Metabolic pathways
will be extracted from MetaCyc [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and UniPathway [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] resources.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amir</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Object-oriented first-order logic</article-title>
          .
          <source>Linköping Electronic Articles in Computer and Information Science</source>
          (http://www.ida.liu.se/ext/etai) 4,
          <fpage>63</fpage>
          -
          <lpage>84</lpage>
          (
          <year>1999</year>
          ), http://www.ep.liu.se/ej/etai/1999/008/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Caspi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Billington</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dreher</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Foerster</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fulcher</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holland</surname>
            ,
            <given-names>T.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keseler</surname>
            ,
            <given-names>I.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kothari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases</article-title>
          .
          <source>Nucleic Acids Res</source>
          <volume>42</volume>
          (
          <issue>D1</issue>
          ),
          <fpage>D459</fpage>
          -
          <lpage>D471</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Francke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siezen</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teusink</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Reconstructing the metabolic network of a bacterium from its genome</article-title>
          .
          <source>Trends in Microbiology</source>
          <volume>13</volume>
          (
          <issue>11</issue>
          )
          ,
          <fpage>550</fpage>
          -
          <lpage>558</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Belnap Jr</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          :
          <article-title>A useful four-valued logic</article-title>
          . In:
          <source>Modern Uses of Multiple-Valued Logic</source>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>37</lpage>
          . Springer (
          <year>1977</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mackie</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassan</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulsen</surname>
            ,
            <given-names>I.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tetu</surname>
            ,
            <given-names>S.G.</given-names>
          </string-name>
          :
          <article-title>Biolog phenotype microarrays for phenotypic characterization of microbial cells</article-title>
          . In:
          <source>Environmental Microbiology</source>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>130</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Morgat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coissac</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coudert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Axelsen</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keller</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bairoch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bridge</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougueleret</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xenarios</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>UniPathway: a resource for the exploration and annotation of metabolic pathways</article-title>
          .
          <source>Nucleic Acids Res</source>
          . p.
          <elocation-id>gkr1023</elocation-id>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Proctor</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neale</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frandsen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Drools documentation</article-title>
          .
          <source>JBoss.org, Tech. Rep</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sorokina</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stam</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Médigue</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lespinet</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vallenet</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Profiling the orphan enzymes</article-title>
          .
          <source>Biol. Direct</source>
          <volume>9</volume>
          (
          <issue>10</issue>
          ) (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Vallenet</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belda</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calteau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cruveiller</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engelen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lajus</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Fevre</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Longin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mornico</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roche</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rouy</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salvignol</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scarpelli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thil Smith</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Médigue</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>MicroScope-an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>41</volume>
          (
          <issue>Database issue</issue>
          ),
          <fpage>D636</fpage>
          -
          <lpage>D647</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>