<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Analysis of Differences In Biological Pathway Resources</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Lucy L. Wang, John H. Gennari, and Neil F. Abernethy Department of Biomedical Informatics and Medical Education, University of Washington Seattle</institution>
          ,
          <addr-line>Washington 98195</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Integrating content from multiple biological pathway resources is necessary to fully exploit pathway knowledge for the benefit of biology and medicine. Differences in content, representation, coverage, and more occur between databases, and are challenges to resource merging. We introduce a typology of representational differences between pathway resources and give examples using several databases: BioCyc, KEGG, PANTHER pathways, and Reactome. We also detect and quantify annotation mismatches between HumanCyc and Reactome. The typology of mismatches can be used to guide entity and relationship alignment between these databases, helping us identify and understand deficiencies in our knowledge, and allowing the research community to derive greater benefit from the existing pathway data.</p>
      </abstract>
      <kwd-group>
        <kwd>pathway database</kwd>
        <kwd>knowledge representation</kwd>
        <kwd>resource comparison</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Describing and studying biological pathways is
necessary for understanding biological and disease processes.
Biological functions and processes follow from
complex networks of interactions among gene products and
molecules. Through the study of pathways of known
biochemical reactions, we can gain deeper insights into
these interactions. Many of these relationships and
reactions have been catalogued in pathway resources such as
Reactome, BioCyc, KEGG, and others [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1–5</xref>
        ].
      </p>
      <p>
        As of April 2016, PathGuide, a pathway resource
aggregator, lists 547 pathway resources [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], each providing
specialized knowledge in niche areas of biology. Efforts
have been made to integrate some of these databases.
PathwayCommons catalogs human pathway resources
under a unified biological pathway exchange umbrella
(BioPAX), allowing easier querying of pathways across
22 different resources [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. Tools such as Consensus
Pathway DB [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and hiPATHDB [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] offer querying
and visualization of pathways from multiple databases.
Statistical frameworks like R Spider seek to
probabilistically combine protein interactions from various
pathway databases into merged networks [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. These
tools improve querying of multiple resources and pave
the way towards more comprehensive network models
of human biological processes.
      </p>
      <p>
        Some work has also been done in inter-resource
comparison, quantifying the overlap between different
databases [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15">12–15</xref>
        ]. These comparison studies emphasize
differences in entity membership in pathways and
differing counts of unique entities and pathways, but do not
focus on cross-resource entity alignment. Existing tools
for entity normalization of proteins [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and
metabolites [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] may provide a starting point for alignment.
Other studies emphasize aligning metabolic pathways of
different species in order to find analogous but missing
relationships [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ], merging resources for combined
network analysis [
        <xref ref-type="bibr" rid="ref11 ref20">11, 20</xref>
        ], or defining conserved
pathway elements across existing pathway resources [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
However, although these tools and studies represent
progress, none of them aims to align representations
across resources.
      </p>
      <p>
        Given the number and uniqueness of pathway
resources, inter-resource merging is a challenge. In order to
successfully align and integrate the content of multiple
knowledge bases, we must contend with variability in
content correctness, standards usage, knowledge
representation choices, and coverage. Pathway data sharing
standards such as BioPAX, SBML, and PSI MI [
        <xref ref-type="bibr" rid="ref22 ref23 ref8">8, 22,
23</xref>
        ] assist in the interchange of pathway resources, but
even resources available in the same standard still retain
differences in content and representation. Nonetheless,
our goal is to align knowledge, so that users can benefit
from a semantic union across multiple resources.
      </p>
      <p>
        To align resources, we must comprehensively
understand the types of differences one may encounter. Stobbe
et al. have made an excellent start in this direction,
providing numerous examples and descriptions of the
sorts of differences among metabolic pathway resources
[
        <xref ref-type="bibr" rid="ref13 ref24">13, 24</xref>
        ]. Here, we extend this work, aiming at a typology
of mismatches among pathway resources. In particular,
we describe and give examples of mismatches in (a)
annotation, (b) existence, (c) reaction semantics, and (d)
granularity. By classifying mismatches, we enable the
better understanding and discussion of resource
differences, and allow for improved consensus formation in
multiple pathway resource applications.
      </p>
      <p>We also present some results in quantifying annotation
mismatches between two popular human pathway
resources: HumanCyc and Reactome. Results demonstrate
the pervasiveness of representational differences and
suggest further work towards consensus pathway
representations. Understanding the types of mismatches that
exist between resources is a first step towards expanding
and deriving the full benefit of our pathway knowledge.</p>
      <p>II. MISMATCHES IN PATHWAY RESOURCES: A TYPOLOGY</p>
      <p>
To provide examples of mismatches, we retrieved
reaction representations from HumanCyc, KEGG,
PANTHER, and Reactome. Fig. 1 shows several different
representations of a step of glycolysis in Homo sapiens:
the conversion of phosphoenolpyruvate and ADP to
pyruvate and ATP modulated by the enzyme pyruvate
kinase. In this single, well-studied biochemical reaction,
we see a variety of important mismatches, of which a
subset are described below.</p>
      <sec id="sec-1-1">
        <title>A. Annotation</title>
        <p>
          We first consider annotation mismatches on the
participating physical entities. Inconsistencies arise when
two pathway resources refer to the same entity with
different identifiers or different names. Pyruvate is
represented by all four resources (Fig. 1), but is
annotated with two identifiers, ChEBI:15361 (HumanCyc and
PANTHER) and ChEBI:32816 (KEGG and Reactome).
The ChEBI:15361 entity “pyruvate” and ChEBI:32816
entity “pyruvic acid” are the conjugate base and acid of
one another in ChEBI. The display name for pyruvate
also differs between resources, and is given as
“pyruvate” (HumanCyc), “Pyruvate” (KEGG, PANTHER), or
“PYR” (Reactome). Differences in identifiers and names
are also seen for all other participants in this reaction.
        </p>
        <p>
          Pathways were retrieved from Reactome v55 (http://reactome.org)
and HumanCyc v19.5 (http://humancyc.org) BioPAX3
exports and through PathwayCommons v7 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Glycolysis
pathways for KEGG and PANTHER are located at
http://purl.org/pc2/7/#Pathway 307add3cea6530288cc1016267ec055b and
http://identifiers.org/panther.pathway/P00024 respectively and are
supplemented by the pathway diagrams at http://kegg.jp and
http://pantherdb.org.
        </p>
        <p>In order to resolve these mismatches, we must either
enforce consistent labeling of entities across resources,
or somehow infer alignment of similar but differently
annotated entities across resources. The former strategy
is usually impractical; in this case, we can infer similarity
by treating ChEBI identifiers that refer to conjugate
acid/base pairs as synonyms.</p>
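        <p>The synonym strategy described above can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the function names are hypothetical, and the only conjugate pair listed is the pyruvate example from the text. A real run would load conjugate acid/base relations from ChEBI.</p>
        <preformat preformat-type="code">
```python
# Hypothetical sketch: group ChEBI identifiers that are treated as synonyms.
# The single pair below is the pyruvate / pyruvic acid example from the text.
CONJUGATE_PAIRS = [("ChEBI:15361", "ChEBI:32816")]


def build_canonical_map(pairs):
    """Map every identifier to one representative of its synonym group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root so results are deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return {x: find(x) for x in list(parent)}


def same_entity(id_a, id_b, canonical):
    """True when two identifiers are equal or share a synonym group."""
    return canonical.get(id_a, id_a) == canonical.get(id_b, id_b)


canonical = build_canonical_map(CONJUGATE_PAIRS)
print(same_entity("ChEBI:15361", "ChEBI:32816", canonical))
```
        </preformat>
        <p>A union-find structure is used here so that chains of related identifiers collapse into a single group; a plain dictionary of pairs would also work when no chains occur.</p>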
        <p>A second type of annotation mismatch occurs when
entities lack cross-referenced identifiers, e.g., no
identifiers are given for ADP or ATP in PANTHER pathways.
Other features such as string name, entity relationships,
and local network topology can be used to align entities
between resources when identifiers are insufficient.</p>
      </sec>
      <sec id="sec-1-2">
        <title>B. Existence</title>
        <p>Existence refers to missing or extraneous physical
entities, reactions, relationships, or information, e.g.,
entities that participate in a reaction or reactions that are
members of a pathway in one resource but not another,
or a connection between two reactions that occurs in
one resource but not another. In Reactome, for example,
the conversion of fructose 6-phosphate to fructose
2,6-bisphosphate is a reaction in the glycolysis pathway. This
reaction is not included in the glycolysis pathway of
the other three resources. Although the reaction involves
entities that participate in glycolysis, there is uncertainty
in whether it is important to the overall process.</p>
        <p>
          Another example of an existence mismatch is the
inclusion of H+ in the conversion of phosphoenolpyruvate
to pyruvate in HumanCyc (Fig. 1). The ion is included
in order to balance reaction charge, but according to
BioPAX3 documentation, reaction participants should be
neutral and ions such as H+ and Mg2+ are not
recommended for inclusion [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. Other potential existence
mismatches could occur if one resource lacks
relevant information about a relationship between two
entities, or one resource specifically negates the existence
of a relationship asserted in another resource.
        </p>
        <p>Existence mismatches can be resolved by taking the
most common representation between many resources
(democratic) or by integrating all possible
representations (exhaustive). An exhaustive consensus
method is unlikely to leave out information, but it may
produce a large and unwieldy alignment.</p>
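        <p>The two consensus strategies can be illustrated with a short sketch. The participant sets below are simplified and illustrative; only the presence of H+ in the HumanCyc representation is taken from the text, and the function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import Counter

# Illustrative participants for the pyruvate kinase step in each resource.
participants = {
    "HumanCyc": {"phosphoenolpyruvate", "ADP", "pyruvate", "ATP", "H+"},
    "KEGG": {"phosphoenolpyruvate", "ADP", "pyruvate", "ATP"},
    "PANTHER": {"phosphoenolpyruvate", "ADP", "pyruvate", "ATP"},
    "Reactome": {"phosphoenolpyruvate", "ADP", "pyruvate", "ATP"},
}


def exhaustive_consensus(sets_by_resource):
    """Union of everything any resource asserts."""
    return set().union(*sets_by_resource.values())


def democratic_consensus(sets_by_resource):
    """Keep only items asserted by more than half of the resources."""
    votes = Counter(item for s in sets_by_resource.values() for item in s)
    threshold = len(sets_by_resource) / 2
    return {item for item, n in votes.items() if n > threshold}


print(sorted(exhaustive_consensus(participants)))
print(sorted(democratic_consensus(participants)))
```
        </preformat>
        <p>With these inputs the exhaustive result retains H+ and the democratic result drops it. The strict-majority threshold is one possible choice; other voting rules could be substituted.</p>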
      </sec>
      <sec id="sec-1-3">
        <title>C. Reaction semantics</title>
        <p>
          Many differences in reaction representation have been
described in Stobbe et al., such as using the terms left
and right, product and substrate, and input and output to
describe participants in reactions [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. In BioPAX, the
properties conversionDirection, stepDirection, left, and
right are used to indicate reaction direction, as well as
the identities of reactants and products [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. In KEGG,
PANTHER, and Reactome, phosphoenolpyruvate is
labeled left and pyruvate right, with a reaction direction of
left-to-right. However, in HumanCyc,
phosphoenolpyruvate is labeled right and pyruvate left and the reaction
direction is right-to-left, a choice dictated by the Enzyme
Commission system [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Even though HumanCyc is in
the minority, its choice follows recommendations from
the BioPAX3 specifications [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
        <p>Fig. 2: The oxidative decarboxylation of isocitrate can be represented
as a two-step process with an oxalosuccinate intermediary (left) and
as a one-step process (right).</p>
        <p>Resolving this type of semantic mismatch between
resources requires knowledge about the ordering of
reactions, which can be derived from pathway design,
or when reactions are taken out of context, may depend
on chemical kinetics and the reacting environment. For
well-studied pathways, a consensus ordering usually
exists. When participant left and right labels differ between
resources and ordering is unclear, the BioPAX
pathwayOrder object (designed to relay reaction topology) can
sometimes be used along with reaction direction to infer
the correct sequence.</p>
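        <p>When the reaction direction is known, left and right labels can be normalized into substrates and products before comparison. The sketch below assumes a simplified record with left, right, and a direction string; the direction values mirror the wording used in the text and are not taken from any particular export, and the function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
def substrates_and_products(left, right, direction):
    """Return (substrates, products) regardless of left/right labeling."""
    key = direction.strip().lower()
    if key == "left-to-right":
        return frozenset(left), frozenset(right)
    if key == "right-to-left":
        return frozenset(right), frozenset(left)
    raise ValueError("direction must be left-to-right or right-to-left")


# Labeling described for KEGG, PANTHER, and Reactome (participants simplified).
majority_style = substrates_and_products(
    left={"phosphoenolpyruvate", "ADP"},
    right={"pyruvate", "ATP"},
    direction="left-to-right",
)

# Labeling described for HumanCyc (participants simplified).
humancyc_style = substrates_and_products(
    left={"pyruvate", "ATP"},
    right={"phosphoenolpyruvate", "ADP"},
    direction="right-to-left",
)

print(majority_style == humancyc_style)
```
        </preformat>
        <p>Reversible or unspecified directions are deliberately rejected in this sketch, since those cases need the pathway context discussed above.</p>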
      </sec>
      <sec id="sec-1-4">
        <title>D. Granularity</title>
        <p>Mismatches of granularity occur when resources
represent the same entity or process in different degrees of
detail. One example is complex naming. Many reaction
enzymes are complexes made up of multiple protein
subunits. A reaction may be annotated with a protein
modifier when it is actually catalyzed by a
complex: a dimer, trimer, etc. In Fig. 1, Reactome makes
this distinction by annotating to the “pyruvate kinase
tetramer,” a complex composed of the pyruvate kinase
protein referenced from the other three resources. Due
to the lack of standardized complex naming, however,
we often cannot easily align complexes and proteins
between resources.</p>
        <p>Another type of granularity mismatch occurs at the
reaction level. For example, one resource may choose to
represent the elementary steps of a reaction, including
intermediate chemical species. A single reaction in one
resource may be represented as several in another, with
the same ultimate inputs and outputs. For example, the
oxidative decarboxylation of isocitrate is a two-step
process, catalyzed by the enzyme isocitrate dehydrogenase,
producing α-ketoglutarate from isocitrate via an
oxalosuccinate intermediate. The reaction can be represented
both with and without the intermediate species, as in
Fig. 2. In these cases, we can study the ultimate inputs
and outputs of ordered reaction sequences to determine
the appropriate reaction-level alignment.</p>
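        <p>Comparing the ultimate inputs and outputs of an ordered reaction sequence can be sketched as follows. Cofactors are omitted for brevity, the data structures are illustrative, and the function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def net_reaction(steps):
    """Collapse ordered (inputs, outputs) steps into net inputs and outputs.

    Species that are produced by one step and consumed by another
    (intermediates) cancel out.
    """
    consumed, produced = Counter(), Counter()
    for inputs, outputs in steps:
        consumed.update(inputs)
        produced.update(outputs)
    # Counter subtraction keeps only positive counts.
    net_inputs = consumed - produced
    net_outputs = produced - consumed
    return set(net_inputs), set(net_outputs)


two_step = [
    (["isocitrate"], ["oxalosuccinate"]),
    (["oxalosuccinate"], ["alpha-ketoglutarate"]),
]
one_step = [(["isocitrate"], ["alpha-ketoglutarate"])]

print(net_reaction(two_step) == net_reaction(one_step))
```
        </preformat>
        <p>Both representations reduce to the same net conversion, which is the signal used to propose a reaction-level alignment.</p>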
        <p>III. ANNOTATION DIFFERENCES BETWEEN TWO RESOURCES</p>
        <p>
We identify and enumerate mismatches in entity
annotation between two exemplar resources: HumanCyc and
Reactome. Compared to other mismatches, a
disagreement in the annotation of entities could be viewed as
primary: if two resources disagree on physical entities,
then they are also likely to disagree on the reactions and
pathways in which these entities participate.</p>
        <p>The most confident match between entities in two
resources arises when both identifiers and names match.
For example, the molecule ATP matches both on name
and ChEBI identifier for KEGG and Reactome (Fig. 1).
Less confident are identifier matches without string name
matches (e.g. HumanCyc and KEGG use different names
for the entity cross-referenced to UniProt:P30613), and
string name matches without identifier matches (e.g.
HumanCyc and PANTHER cross-reference to
different ChEBI identifiers for the entity named
“phosphoenolpyruvate”).</p>
        <p>From HumanCyc and Reactome, we extract all
proteins and small molecules with cross-referenced
identifiers (UniProt for proteins, ChEBI for small molecules)
and names. String names are taken as all objects of
the BioPAX properties name, displayName, and
standardName on the entity of interest. Using only string
names and UniProt/ChEBI identifiers, there are four
possible ways that entities can match between these two
resources. Entities in HumanCyc can match to entities
in Reactome on ID and name (+I/+N), ID but not name
(+I/-N), name but not ID (-I/+N), and on neither ID nor
name (-I/-N). For this initial analysis, we define string
name matches as case-insensitive equivalence, so small
differences in spelling do not produce a match.</p>
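        <p>The four match categories can be expressed compactly. The sketch below is illustrative only: entities are plain dictionaries rather than BioPAX objects, the names attached to each identifier are assumptions, and an identifier match and a name match are allowed to come from different target entities, which may differ from the exact rule used in the analysis.</p>
        <preformat preformat-type="code">
```python
def match_category(entity, targets):
    """Classify an entity as +I/+N, +I/-N, -I/+N, or -I/-N against targets."""
    ids = set(entity["ids"])
    names = {name.casefold() for name in entity["names"]}
    id_match = any(ids.intersection(t["ids"]) for t in targets)
    name_match = any(
        names.intersection(n.casefold() for n in t["names"]) for t in targets
    )
    return ("+I" if id_match else "-I") + "/" + ("+N" if name_match else "-N")


# Illustrative target entities; the name strings are assumptions.
reactome_like = [
    {"ids": {"ChEBI:16761"}, "names": {"ATP"}},
    {"ids": {"UniProt:P30613"}, "names": {"PKLR"}},
]

atp = {"ids": {"ChEBI:456216"}, "names": {"atp"}}
kinase = {"ids": {"UniProt:P30613"}, "names": {"pyruvate kinase"}}
other = {"ids": {"ChEBI:0"}, "names": {"placeholder"}}

for entity in (atp, kinase, other):
    print(match_category(entity, reactome_like))
```
        </preformat>
        <p>Names are compared with casefold so that the comparison is case-insensitive, matching the definition given above.</p>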
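        <p>For each entity in HumanCyc, SPARQL queries are
used to determine whether a matching entity exists in
Reactome, and similarly, Reactome entities are matched
to HumanCyc entities.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <table-wrap>
        <label>Table I</label>
        <table>
          <thead>
            <tr><th /><th>+N</th><th>-N</th><th>Total</th></tr>
          </thead>
          <tbody>
            <tr><th colspan="4">HumanCyc protein matches to Reactome</th></tr>
            <tr><td>+I</td><td>1264</td><td>759</td><td>2023</td></tr>
            <tr><td>-I</td><td>55</td><td>659</td><td>714</td></tr>
            <tr><td>Total</td><td>1319</td><td>1418</td><td>2737</td></tr>
            <tr><th colspan="4">Reactome protein matches to HumanCyc</th></tr>
            <tr><td>+I</td><td>1495</td><td>1390</td><td>2885</td></tr>
            <tr><td>-I</td><td>88</td><td>13976</td><td>14064</td></tr>
            <tr><td>Total</td><td>1583</td><td>15366</td><td>16949</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table II</label>
        <table>
          <thead>
            <tr><th /><th>+N</th><th>-N</th><th>Total</th></tr>
          </thead>
          <tbody>
            <tr><th colspan="4">HumanCyc small molecule matches to Reactome</th></tr>
            <tr><td>+I</td><td>247</td><td>140</td><td>387</td></tr>
            <tr><td>-I</td><td>479</td><td>744</td><td>1223</td></tr>
            <tr><td>Total</td><td>726</td><td>884</td><td>1610</td></tr>
            <tr><th colspan="4">Reactome small molecule matches to HumanCyc</th></tr>
            <tr><td>+I</td><td>425</td><td>276</td><td>701</td></tr>
            <tr><td>-I</td><td>890</td><td>1300</td><td>2190</td></tr>
            <tr><td>Total</td><td>1315</td><td>1576</td><td>2891</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Resulting matches for proteins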
are given in Table I. Out of 2737 unique HumanCyc
proteins, 2078 (75.9%) match to Reactome entities using
identifiers and/or string names. Out of 16949 unique
Reactome proteins, 2973 (17.5%) match to a HumanCyc
protein on identifiers and/or name. Reactome references
many protein isoforms, causing the large imbalance in
unique protein counts between the two resources. These
match ratios are illustrated in Fig. 3.</p>
      <p>Table II shows matches for small molecules. In
HumanCyc, 866 (53.8%) out of 1610 small molecules
match on annotation to an entity in Reactome. In
Reactome, 1591 (55.0%) out of 2891 small molecules match
to entities in HumanCyc, with a large proportion (890
out of 1591) matching on string names only.</p>
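      <p>The match percentages quoted above follow directly from the cell counts in Tables I and II. The short check below recomputes them; the dictionary layout and function name are illustrative.</p>
      <preformat preformat-type="code">
```python
# Cell counts from Tables I and II, keyed by match category.
TABLES = {
    "HumanCyc proteins": {"+I/+N": 1264, "+I/-N": 759, "-I/+N": 55, "-I/-N": 659},
    "Reactome proteins": {"+I/+N": 1495, "+I/-N": 1390, "-I/+N": 88, "-I/-N": 13976},
    "HumanCyc small molecules": {"+I/+N": 247, "+I/-N": 140, "-I/+N": 479, "-I/-N": 744},
    "Reactome small molecules": {"+I/+N": 425, "+I/-N": 276, "-I/+N": 890, "-I/-N": 1300},
}


def match_summary(counts):
    """Return (matched, total, percent) where matched excludes -I/-N."""
    total = sum(counts.values())
    matched = total - counts["-I/-N"]
    return matched, total, round(100 * matched / total, 1)


for label, counts in TABLES.items():
    matched, total, percent = match_summary(counts)
    print(f"{label}: {matched} of {total} ({percent}%)")
```
      </preformat>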
      <p>
        Cross-referenced identifiers are the gold standard of
matching between two resources. Therefore, groups
+I/+N and +I/-N likely consist of true matches. Group
-I/+N can be used to learn about representational
differences. Some of the cross-references for entities in
this group point to secondary accession identifiers, which
redirect to other identifiers in the same database. For
example, UniProt:A0AVP9 redirects to UniProt:Q8IWU4,
the entity Zinc transporter 8. For small molecules only,
we also find annotation to ChEBI conjugate acids or
bases (e.g., HumanCyc annotates with ChEBI:456216
(ATP(3-)), a conjugate base of ChEBI:16761 (ATP),
which is used in Reactome), or annotation to tautomers
(e.g., ChEBI:16828 and ChEBI:57912 for L-tryptophan
and the L-tryptophan zwitterion respectively).
Annotation mismatches of the above subtypes are detected
by querying the UniProt or ChEBI APIs using the
BioServices 1.4.8 Python package [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
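      <p>The subtype detection described above can be approximated offline with small lookup tables. The sketch below does not call the UniProt or ChEBI services or the BioServices package; the tables are hypothetical stand-ins seeded only with the examples given in the text, and the function name is an assumption.</p>
      <preformat preformat-type="code">
```python
# Hypothetical lookup tables seeded with the examples from the text.
SECONDARY_TO_PRIMARY = {"UniProt:A0AVP9": "UniProt:Q8IWU4"}
CONJUGATES = {frozenset({"ChEBI:456216", "ChEBI:16761"})}
TAUTOMERS = {frozenset({"ChEBI:16828", "ChEBI:57912"})}


def relate(id_a, id_b):
    """Label how two cross-referenced identifiers are related, if at all."""
    if id_a == id_b:
        return "identical"
    primary_a = SECONDARY_TO_PRIMARY.get(id_a, id_a)
    primary_b = SECONDARY_TO_PRIMARY.get(id_b, id_b)
    if primary_a == primary_b:
        return "secondary accession"
    pair = frozenset({primary_a, primary_b})
    if pair in CONJUGATES:
        return "conjugate acid/base"
    if pair in TAUTOMERS:
        return "tautomer"
    return "unresolved"


print(relate("UniProt:A0AVP9", "UniProt:Q8IWU4"))
print(relate("ChEBI:456216", "ChEBI:16761"))
print(relate("ChEBI:16828", "ChEBI:57912"))
```
      </preformat>
      <p>In practice the three tables would be populated from the identifier databases rather than written by hand.</p>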
      <p>Within -I/+N matches, the 55 HumanCyc and 88
Reactome proteins had 208 pairwise string name matches. Of
these, 28 pairs had cross-referenced identifiers that are
UniProt secondary accession IDs, indicating that they
likely refer to the same entity. We could not confirm
the identities of the other 180 pairs through UniProt
accession identifiers. For small molecules in the -I/+N
group, the 479 HumanCyc and 890 Reactome molecules
had 1869 pairwise string name matches. Of these, at
least 1506 pairs referred to similar entities. Annotation
to ChEBI conjugate acids or bases accounted for the
majority of these (1122), followed by annotation with
ChEBI tautomer IDs (301), and ChEBI secondary
accession numbers (83).</p>
    </sec>
    <sec id="sec-11">
      <title>IV. DISCUSSION</title>
      <p>In order to reduce redundancy and errors when merging
information from different knowledge bases, we must
correctly align entities and other assertions between
resources. Entity alignment is a necessary first step
before we can clarify higher-order concepts such as
complexes, reactions, and pathways. As demonstrated using
HumanCyc and Reactome, many proteins and small
molecules can be matched between two resources using
annotation features such as cross-referenced identifiers
and string names. Among entities that share only string
names, many have related identifiers that can be matched
computationally. Related identifiers can be used to help
improve the accuracy of annotations.</p>
      <p>Moving beyond annotation, other issues of semantics
and granularity come into play. For future work, we
intend to incorporate other features, such as entity
relationships and graph properties like degree and bipartite
connectivity to assist in entity alignment.</p>
      <p>Several limitations exist in this work. First, we only
compared entities between two pathway resources,
HumanCyc and Reactome. We expect to expand our
analysis to include other resources as well. Although some
of our current methods rely on BioPAX, our general
ideas about physical entities and their annotations can
be applied to data represented using other biological
pathway knowledge standards.</p>
      <p>Another limitation arises in the way we identify
annotation mismatches. We only assessed proteins and small
molecules with UniProt or ChEBI identifiers, excluding
those entities without cross-references or with
cross-references to other databases. This was partially for
simplicity and partially to limit the size of the
comparison problem. For example, an agreement on one set
of identifiers and a disagreement on another yields yet
another class of mismatches.</p>
      <p>Lastly, we were limited by our use of exact (case-insensitive)
string name matching to identify potential matched entities.
By doing so, we limit our ability to detect positive
matches and yield more conservative results, e.g.,
“fructose 1,6 bisphosphate” does not match to “D-fructose
1,6-bisphosphate”; the second is a stereoisomer of the
first (generic) molecule, and they may play similar
roles in reactions. Fuzzy string matches may perform
better. However, we want to minimize the false positive
rate, e.g., “fructose 1,6-bisphosphate” and “fructose
2,6-bisphosphate” only differ by one character but refer to
different molecules. With these caveats, the typology we
present affords an opportunity to test different algorithms
for the systematic alignment of pathway resources.</p>
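      <p>The trade-off between exact and fuzzy name matching can be seen with a small experiment. The sketch below uses the similarity ratio from the Python standard library difflib module as one possible fuzzy measure; it is not a method evaluated in this work, and the function name is hypothetical.</p>
      <preformat preformat-type="code">
```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# Generic molecule and its stereoisomer: a plausible match.
plausible = ("fructose 1,6 bisphosphate", "D-fructose 1,6-bisphosphate")
# Different molecules that differ by a single character: a false positive risk.
risky = ("fructose 1,6-bisphosphate", "fructose 2,6-bisphosphate")

for a, b in (plausible, risky):
    exact = a.casefold() == b.casefold()
    print(f"{a!r} vs {b!r}: exact={exact}, similarity={similarity(a, b):.2f}")
```
      </preformat>
      <p>Neither pair matches exactly, yet both score above 0.9 on this measure, so a similarity threshold alone cannot separate the plausible match from the false positive.</p>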
    </sec>
    <sec id="sec-12">
      <title>V. CONCLUSION</title>
      <p>The complexity of pathway content is a barrier to
resource integration, but as described above, we are also
challenged by representational and content differences.
Standards like BioPAX help clarify some differences
between resources, but they do not solve all problems
of interoperability. In order to draw from the spectrum
of knowledge we have built as a community, the content
of these resources must be aligned and integrated into
something greater than the parts. Doing so involves
identifying the differences between resources, and
resolving those differences to understand shared meaning.
Our results show that a sizable portion of physical
entities can be aligned between pathway resources using
existing cross-referenced identifiers and string names.
However, annotation features alone are likely insufficient
for matching a majority of entities between resources.
Knowledge of entity relationships, reaction semantics,
granularity, and more about these resources is necessary
to create and evaluate potential alignments. Much of the
work can be done computationally, and the typology
above should guide the engineering of future matching
algorithms.</p>
      <p>To align and integrate knowledge across resources, the
research community must have strategies for resolving
these different sorts of mismatches. Some mismatches,
such as those of annotation, can largely be resolved
using the existing data. Other issues of semantics, such
as differences in how standard languages are used to
express the same knowledge, pose a bigger challenge.
Resource developers should be allowed to make
different choices in knowledge representation. However, this
flexibility should not come at the cost of increased error
or decreased interoperability. A better understanding of
how specific mismatches occur will provide an incentive
for resources to work toward interoperable data and
representations.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGEMENTS</title>
      <p>The authors thank Peter Karp for helpful comments on
an early draft of this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mundo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Haw</surname>
          </string-name>
          et al.
          <article-title>The Reactome pathway knowledgebase</article-title>
          .
          <source>Nucleic Acids Res</source>
          ,
          <volume>42</volume>
          (Database issue):
          <fpage>D472</fpage>
          -
          <lpage>477</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wagg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krummenacker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Karp</surname>
          </string-name>
          .
          <article-title>Computational prediction of human metabolic pathways from the complete human genome</article-title>
          .
          <source>Genome Biology</source>
          ,
          <volume>6</volume>
          (
          <issue>R2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kanehisa</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Goto</surname>
          </string-name>
          .
          <article-title>KEGG: Kyoto encyclopedia of genes and genomes</article-title>
          .
          <source>Nucleic Acids Res</source>
          ,
          <volume>28</volume>
          :
          <fpage>27</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Campbell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kejariwal</surname>
          </string-name>
          et al.
          <article-title>PANTHER: a library of protein families and subfamilies indexed by function</article-title>
          .
          <source>Genome Res</source>
          ,
          <volume>13</volume>
          :
          <fpage>2129</fpage>
          -
          <lpage>2141</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Riutta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Nunes</surname>
          </string-name>
          et al.
          <article-title>WikiPathways: capturing the full diversity of pathway knowledge</article-title>
          .
          <source>Nucleic Acids Res</source>
          ,
          <volume>44</volume>
          (
          <issue>D1</issue>
          ):
          <fpage>D488</fpage>
          -
          <lpage>D494</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cary</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Sander</surname>
          </string-name>
          .
          <article-title>Pathguide: a pathway resource list</article-title>
          .
          <source>Nucleic Acids Res</source>
          ,
          <volume>34</volume>
          (Database issue):
          <fpage>D504</fpage>
          -
          <lpage>506</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cerami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gross</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Demir</surname>
          </string-name>
          et al.
          <article-title>Pathway Commons, a web resource for biological pathway data</article-title>
          .
          <source>Nucleic Acids Res</source>
          ,
          <volume>39</volume>
          (Database issue):
          <fpage>D685</fpage>
          -
          <lpage>690</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Demir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cary</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Paley</surname>
          </string-name>
          et al.
          <article-title>The biopax community standard for pathway data sharing</article-title>
          .
          <source>Nature Biotechnology</source>
          ,
          <volume>28</volume>
          (
          <issue>9</issue>
          ):
          <fpage>935</fpage>
          -
          <lpage>42</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamburov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wierling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lehrach</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Herwig</surname>
          </string-name>
          .
          <article-title>Consensuspathdb - a database for integrating human functional interaction networks</article-title>
          .
          <source>Nucleic Acids Res</source>
          ,
          <volume>37</volume>
          (Database issue):
          <fpage>D623</fpage>
          -
          <lpage>628</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Rho</surname>
          </string-name>
          et al.
          <article-title>hipathdb: a human-integrated pathway database with facile visualization</article-title>
          .
          <source>Nucleic Acids Res</source>
          ,
          <volume>40</volume>
          (Database issue):
          <fpage>D797</fpage>
          -
          <lpage>802</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Antonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krestyaninova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Hermjakob</surname>
          </string-name>
          .
          <article-title>R spider: a network-based analysis of gene lists by combining signaling and metabolic pathways from reactome and kegg databases</article-title>
          .
          <source>Nucleic Acids Res</source>
          ,
          <volume>38</volume>
          (
          <issue>Web Server issue</issue>
          ):
          <fpage>W78</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Soh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Wong</surname>
          </string-name>
          .
          <article-title>Consistency, comprehensiveness, and compatibility of pathway databases</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>11</volume>
          :
          <fpage>449</fpage>
          -
          <lpage>64</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stobbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Houten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>van Kampen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Moerland</surname>
          </string-name>
          .
          <article-title>Critical assessment of human metabolic pathway databases: a stepping stone for future integration</article-title>
          .
          <source>BMC Systems Biology</source>
          ,
          <volume>5</volume>
          :
          <fpage>165</fpage>
          -
          <lpage>183</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Travers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kothari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Caspi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Karp</surname>
          </string-name>
          .
          <article-title>A systematic comparison of the metacyc and kegg pathway databases</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>14</volume>
          :
          <fpage>112</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chowdhury</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sarkar</surname>
          </string-name>
          .
          <article-title>Comparison of human cell signaling pathway databases - evolution, drawbacks and challenges</article-title>
          .
          <source>Database</source>
          , page
          <elocation-id>bau126</elocation-id>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Cheng</surname>
          </string-name>
          .
          <article-title>Integrating various resources for gene name normalization</article-title>
          .
          <source>PLoS One</source>
          ,
          <volume>7</volume>
          (
          <issue>9</issue>
          ):e43558,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Wohlgemuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haldiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Willighagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kind</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Fiehn</surname>
          </string-name>
          .
          <article-title>The chemical translation service - a web-based tool to improve standardization of metabolomic reports</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>26</volume>
          (
          <issue>20</issue>
          ):
          <fpage>2647</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ay</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kahveci</surname>
          </string-name>
          .
          <article-title>Submap: aligning metabolic pathways with subnetwork mappings</article-title>
          .
          <source>Journal of Computational Biology</source>
          ,
          <volume>18</volume>
          (
          <issue>3</issue>
          ):
          <fpage>219</fpage>
          -
          <lpage>35</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alberich</surname>
          </string-name>
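          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Llabrés</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simeoni</surname>
          </string-name>
          , and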
          <string-name>
            <given-names>M.</given-names>
            <surname>Tuduri</surname>
          </string-name>
          .
          <article-title>Mp-align: alignment of metabolic pathways</article-title>
          .
          <source>BMC Systems Biology</source>
          ,
          <volume>8</volume>
          :
          <fpage>58</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Petrochilos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shojaie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gennari</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Abernethy</surname>
          </string-name>
          .
          <article-title>Using random walks to identify cancer-associated modules in expression data</article-title>
          .
          <source>BioData Mining</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ):
          <fpage>17</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Morgat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Coissac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Coudert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Axelsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Keller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bairoch</surname>
          </string-name>
          et al.
          <article-title>Unipathway: a resource for the exploration and annotation of metabolic pathways</article-title>
          .
          <source>Nucleic Acids Research</source>
          ,
          <volume>40</volume>
          (Database issue):
          <fpage>D761</fpage>
          -
          <lpage>769</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hucka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Finney</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Sauro</surname>
          </string-name>
          et al.
          <article-title>The systems biology markup language (sbml): a medium for representation and exchange of biochemical network models</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>19</volume>
          (
          <issue>4</issue>
          ):
          <fpage>524</fpage>
          -
          <lpage>31</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hermjakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Montecchi-Palazzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Bader</surname>
          </string-name>
          et al.
          <article-title>The hupo psi's molecular interaction format - a community standard for the representation of protein interaction data</article-title>
          .
          <source>Nature Biotechnology</source>
          ,
          <volume>22</volume>
          :
          <fpage>177</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stobbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Moerland</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>van Kampen</surname>
          </string-name>
          .
          <article-title>Knowledge representation in metabolic pathway databases</article-title>
          .
          <source>Brief Bioinform</source>
          ,
          <volume>15</volume>
          (
          <issue>3</issue>
          ):
          <fpage>455</fpage>
          -
          <lpage>470</lpage>
          ,
          <month>May</month>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <collab>BioPAX Workgroup</collab>
          .
          <article-title>Biopax - biological pathways exchange language, level 3, release version 1 documentation</article-title>
          .
          <month>July</month>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Cokelaer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pultz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Harder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Serra-Musach</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Saez-Rodriguez</surname>
          </string-name>
          .
          <article-title>Bioservices: a common python package to access biological web services programmatically</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>29</volume>
          (
          <issue>24</issue>
          ):
          <fpage>3241</fpage>
          -
          <lpage>2</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>