<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How can semantic annotations support the identi cation of network similarities?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian Rosenke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dagmar Waltemath</string-name>
          <email>dagmar.waltemath@uni-rostock.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Rostock</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Computational models in open model repositories support biologists in understanding and investigating biological questions. The availability of alternative models results in a need for model selection algorithms. This selection can be based on information retrieval search, full-text search, selection by ontology concepts etc. We here describe our approach to solving a serious aspect of model selection, that is, the problem of comparing models of reaction networks. Speci cally, we discuss how graph algorithmic approaches can help to compare models that are semantically enriched by annotations. While a graph comparison of naked models is infeasible, the knowledge gained from the semantic annotations and domain speci c structures can reduce the complexity. Our concept has the potential to improve model search, and it can contribute to the de nition of similarity measures.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Modeling is a state-of-the-art method in systems biology research, and its
importance for experimental studies and result analyses is steadily increasing.
Computational models describe biological systems, which are analysed and often
simulated, to gain further knowledge about a system under study, or to predict future
experiments. Usually, models are provided to the research community in XML
standard formats such as SBML [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or CellML [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These XML-based encodings
are annotated with terms from bio-ontologies [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to enable model visualisation,
comparison and search, and thereby improve model reuse. See Figure 1 for an
example of an SBML encoded and annotated model on the cell cycle. Models
are distributed via open repositories such as the BioModels Database [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], JWS
Online [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], or the CellML model repository [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Understanding and further developing models remains a major challenge in
biology today due to the ambiguity and varying levels of abstraction in model
descriptions. Other aspects that complexify the study of models are the increasing
number and size of models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Even for well-annotated models, basic
operations, such as identifying suitable submodels and comparing models, can hardly
be handled with or without computational support.
      </p>
      <p>In this work, we outline a path towards e cient computational methods that
assist scientists in nding models and comparing them to each other. Many
models are described by reaction networks, which are built from nodes for reactants,
BRENDA</p>
      <p>Cyclindependent
Kinase
2.7.11.22
Protein-serine
/ threonine
Kinase
2.7.11
Transferring
phosphoruscontaining
Groups
2.7</p>
      <p>Transferase
2</p>
      <p>;
cyclin
total
cdc2
p-cyclin
cdc2-P
p-cyclin
cdc2
cdc2k-P
cdc2k
p-cyclin
;</p>
      <p>GO
protein
dephosphorylation
00070
protein
phosphorylation
0006468
regulation of
cyclin-dependent
protein
kinase activity
0000079
protein
metabolic
process
0019538
cellular
protein
modi cation
process
0006464
cellular
protein
metabolic
process
0044267
protein
modi cation
process
0036211
biological
process
0065007
regulation
of protein
modi cation
process
00031399
reaction products and modi ers. The goal is to extend current search
functionality by a model similarity measure that compares these network structures using
graph algorithmic approaches.</p>
      <p>
        Similar works have already developed algorithms that calculate model
similarities, as for instance in. Proposed solutions either use information retrieval
to calculate similarities between models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]; XML di algorithms for di erence
detection within versions of a model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]; focus on the similarities of semantic
annotations of model entities [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; or use network similarity approaches [
        <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
        ].
However, none of them fully integrates available, domain-speci c information
and graph-based approaches in one similarity measure. Moreover, existing
algorithms provide heuristic or approximate solutions that do not yet represent real
alternatives for the still common manual way of processing models. To master
the demanding challenges introduced by the contemporary way of working with
models, we incorporate knowledge about domain characteristics and so-called
semantic annotations, which have so far not been considered together with the
structural composition of networks.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>This work is tailored towards models encoded in standard representation formats
as SBML and CellML. The key concepts to counter the severe computational
hardness of comparing \naked" graphs are to incorporate (1) information on
structural constraints common to networks of biological models and (2)
knowledge from semantic annotations.</p>
      <p>
        Incorporating the model structure can tremendously reduce the
complexity of the proposed graph-based correlation procedures. This is because
networks in models for biological systems are subject to certain structural
restrictions simply by originating from real world chemical and physical processes.
Chemical reactions, for instance, do almost never incorporate more than two
or three reactants and products. Beside this general observation, more complex
limits to biological reaction networks can be found. For example, certain pattern,
called motifs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], tend to appear regularly in many biological networks.
      </p>
      <p>Incorporating semantic annotations enhances or reduces the probability
of matching nodes based on the linked ontology concepts. Many models are
accurately annotated, including information about the biological meaning of
model entities, about the roles that entities play in reactions, about the types
of reactions, or about the nature of a parameter. We use these links to concepts
in external ontologies to determine similarities between model entities. If two
entities carry the same, or a similar annotation, and they structural restrictions
apply, then our algorithm boosts the respective similarity value.
3</p>
    </sec>
    <sec id="sec-3">
      <title>A First Graph-Based Approach using Annotations</title>
      <p>To support our argumentation in this work we present a rst graph-based
approach where knowledge about semantic annotations is used to obtain an e cient
solution. More precisely, we want to e ciently answer the question, if a query
model Q is a submodel of another model M . Thus we look for an injective
mapping of the nodes from Q to those from M such that the edges are matched.</p>
      <p>Computationally, this is extremely di cult in general. But using the links
into the ontologies, made available by the semantic annotations, can create
valuable connections between Q and M which essentially reduce the possible degree
of freedom for the mapping . Our approach works, if nearly every node in
Q has at most two possible candidate target nodes in M . Although this is a
strong restriction, our algorithm is more general and powerful than many other
approaches.</p>
      <p>Basically, we compute a formula in 2-cnf that describes the valid mappings
by its satisfying assignments, that is, we reduce the problem to 2-SAT, a
tractable version of the satis ablity problem from proposal logic. For every node
q from Q and every possible target node m of M we introduce a variable mq
which is true if and only if (q) = m. As q has at most two targets m; m0 we
are able to include the clauses (mq _ m0q) and (:mq _ :m0q) to express that (q)
is either m or m0. To make injective, we have to state for every node m in
M that at most one node from Q maps to m. For this end, we add the clause
(:mq _ :mq0 ) for every pair q; q0 of Q-nodes that both possibly map to m. As
has to preserve the adjacency of nodes, we add the clause (:mq _ :m0q0 ) for all
Q nodes q; q0 with possible target nodes m; m0 if the adjacency between q and
q0 in Q is di erent from the adjacency between m and m0 in M .</p>
      <p>The length of the resulting 2-cnf formula is at most quadratical in the size of
Q and widely independent of M . This easily makes queries Q of up to hundreds
of nodes feasible.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Summary</title>
      <p>
        This work presents rst thoughts on using a combination of algorithms for graph
similarity and semantic annotations as a mean to map simulation models
describing biological systems. The concept will be implemented in a data management
system that supports the management and linking of model les, ontologies used
to semantically enrich the model les, and all data that needs to be accessed and
stored during and after the computation of network similarities [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Our concept
for model comparison can improve model search, and it can contribute to the
de nition of similarity measures. It can also be used to identify overlapping parts
in a network, for example between di erent versions of a model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hucka</surname>
          </string-name>
          et al.:
          <article-title>The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models</article-title>
          ..
          <source>Bioinformatics 19</source>
          .4:
          <fpage>524</fpage>
          -
          <lpage>531</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cuellar</surname>
          </string-name>
          et al.:
          <article-title>An overview of CellML 1.1, a biological model description language</article-title>
          .
          <source>Simulation 79.12</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Horridge</surname>
          </string-name>
          et al.:
          <article-title>The state of bio-medical ontologies</article-title>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Li</surname>
          </string-name>
          et al.:
          <article-title>BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models</article-title>
          .
          <source>BMC Systems Biology</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Brett and
          <article-title>Snoep: Web-based kinetic modelling using JWS Online Bioinformatics 20</article-title>
          .13:
          <fpage>2143</fpage>
          -
          <lpage>44</lpage>
          ,
          <year>2004</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lloyd</surname>
          </string-name>
          et al.:
          <article-title>The CellML Model Repository</article-title>
          . Bioinformatics,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Henkel</surname>
          </string-name>
          et al.:
          <article-title>Ranked retrieval of computational biology models</article-title>
          .
          <source>BMC bioinformatics 11</source>
          .1:
          <issue>423</issue>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Tyson:
          <article-title>Modeling the cell division cycle: cdc2 and cyclin interactions</article-title>
          .
          <source>Proc. Natl. Acad. Sci</source>
          .
          <volume>88</volume>
          (
          <issue>16</issue>
          ):
          <fpage>7328</fpage>
          -
          <lpage>7332</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Waltemath</surname>
          </string-name>
          et al.:
          <article-title>Improving the reuse of computational models through version control</article-title>
          .
          <source>Bioinformatics</source>
          <volume>29</volume>
          .6:
          <fpage>742</fpage>
          -
          <lpage>748</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Schulz</surname>
          </string-name>
          et al.:
          <article-title>Propagating semantic information in biochemical network models</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>13</volume>
          (
          <issue>1</issue>
          ),
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Przulj</surname>
          </string-name>
          :
          <article-title>Biological network comparison using graphlet degree distribution</article-title>
          .
          <source>Bioinformatics</source>
          <volume>2</volume>
          (
          <issue>23</issue>
          ):
          <volume>177</volume>
          {
          <fpage>183</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gay</surname>
          </string-name>
          et al.:
          <article-title>A graphical method for reducing and relating models in systems biology</article-title>
          .
          <source>Bioinformatics</source>
          <volume>18</volume>
          (26):i575{
          <fpage>i581</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <article-title>Tyson and Novak: Functional motifs in biochemical reaction networks</article-title>
          .
          <source>Annual review of physical chemistry 61:219</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>