<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The integrative use of anatomy ontology and protein-protein interaction networks to study evolutionary phenotypic transitions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pasan C. Fernando</string-name>
          <email>pasan.fernando@coyotes.usd.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erliang Zeng</string-name>
          <email>erliang.zeng@usd.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paula M. Mabee</string-name>
          <email>paula.mabee@usd.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Biology Department, University of South Dakota, Vermillion</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>7</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>Evolutionary phenotypic transitions, such as the fin to limb transition, are widely studied in evolutionary biology. Recent advances in next-generation technologies have produced large volumes of genomic and proteomic data, which can be used to analyze the genetic basis of evolutionary phenotypic transitions. Protein-protein interaction (PPI) networks can be used to predict candidate genes and identify gene modules related to evolutionary phenotypes; however, they suffer from low gene prediction accuracy. Therefore, an integrative framework was developed using PPI networks and anatomy ontology, which significantly improved the accuracy of network-based candidate gene predictions in zebrafish and mouse. This integrative framework will also be used to identify gene modules associated with the fin to limb transition and to study the changes in these modules that lead to the phenotypic change.</p>
      </abstract>
      <kwd-group>
        <kwd>Anatomy ontology</kwd>
        <kwd>network analysis</kwd>
        <kwd>protein-protein interactions</kwd>
        <kwd>data integration</kwd>
        <kwd>gene prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The process of evolution is accompanied by numerous
important phenotypic transitions, such as the fin to limb
transition in vertebrates, which contributed to the wealth of
phenotypic diversity observed among different species today.
Understanding the relationship between genes and their
phenotypes is important in explaining the changes in those
phenotypes. Traditionally, wet lab methods were used to
discover gene-to-phenotype relations. Although their
predictions are more accurate, wet lab candidate gene
prediction methods are costly in resources and time,
which led to the popularity of faster computational
candidate gene prediction methods [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that use
the genomic and proteomic data accumulated in public
databases.
      </p>
      <p>
        The use of PPI networks for candidate gene prediction has
become popular due to the availability of large PPI datasets
for model organisms. Network analysis algorithms can be
used to analyze PPI networks and detect gene modules
corresponding to phenotypes in question [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Other gene
prediction methods only discover direct gene to phenotype
relationships, but network analysis further identifies gene
interactions that are important in regulating the phenotype.
Understanding the modular structure of gene interactions is
extremely important in studying their role in the development
of phenotypes because it is the gene interactions that
determine the outcome rather than the individual genes.
      </p>
      <p>
        The biggest challenge of using PPI networks is the low
candidate gene prediction accuracy due to the low quality of
the networks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. PPI networks are known to contain a
high number of false-positive interactions, and some
networks are still incomplete [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Before using PPI networks
to study evolutionary phenotypic transitions, their quality
must be improved to obtain better results. Because we are
focusing on anatomical phenotypes, such as pectoral fin
development and forelimb development, we propose an
integrative framework that uses anatomy ontology to
combine known gene-phenotype relationships from the
literature with the PPI networks. This
integration is expected to improve PPI network quality and
predict candidate genes with higher accuracy. To test this
hypothesis, we use known anatomical phenotype annotations
from mouse and zebrafish. After the evaluation, the integrated
networks will be used to detect gene modules associated with
the fin to limb transition in mouse and zebrafish, and the
modules will be compared to observe the genetic changes
corresponding to the phenotypic transition.
      </p>
    </sec>
    <sec id="sec-2">
      <title>METHODS</title>
      <p>
        The first step of the integrative framework is constructing
gene networks that are entirely based on the known gene to
anatomical phenotype annotations. The anatomical profiles
for mouse and zebrafish were downloaded from the Monarch
Initiative data repository (https://monarchinitiative.org/),
which retrieves data from model organism databases.
Monarch Initiative data is manually pre-processed to remove
unwanted annotations and the genes are annotated to Uberon
anatomy ontology terms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Uberon (http://uberon.github.io/)
is a cross-species anatomy ontology that integrates
species-specific anatomy ontologies, such as Mouse Anatomy
Ontology (MA) and Zebrafish Anatomy Ontology (ZFA),
which makes it suitable for evolutionary analyses involving
multiple species [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Semantic similarity scores between anatomy ontology
terms were calculated to obtain pairwise gene similarity
values for all the genes in mouse and zebrafish. Semantic
similarity is a quantitative value that represents similarity
between two ontology terms based on their location in the
ontological structure and their gene annotations [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Four
different semantic similarity methods (Lin, Resnik, Schlicker,
and Wang) were used to generate pairwise gene similarity
matrices, which in turn were used to generate gene networks
that are entirely based on the anatomy ontology annotations of
the genes (anatomy-based gene networks). These networks
were filtered using a gene similarity score cutoff to remove
interactions with low scores. In these networks, the genes with
higher similarity scores are the ones that are annotated to
similar anatomy ontology terms.
      </p>
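      <p>The scoring steps described above can be illustrated with a minimal Python sketch. It computes Resnik and Lin term similarities on a toy ontology, aggregates them into pairwise gene similarities, and keeps the gene pairs that reach a cutoff. The toy terms, the gene names, the best-match average aggregation, and the 0.5 cutoff are all assumptions made for illustration; they do not reproduce the Uberon annotations, the aggregation strategy, or the cutoff used in this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from itertools import combinations

# Hypothetical toy ontology (term -> parents); not actual Uberon content.
PARENTS = {
    "anatomical structure": [],
    "appendage": ["anatomical structure"],
    "fin": ["appendage"],
    "limb": ["appendage"],
    "pectoral fin": ["fin"],
    "pelvic fin": ["fin"],
    "forelimb": ["limb"],
}

# Hypothetical gene -> term annotations.
ANNOTATIONS = {
    "geneA": {"pectoral fin"},
    "geneB": {"pelvic fin"},
    "geneC": {"forelimb"},
    "geneD": {"pectoral fin", "forelimb"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), with n_t genes annotated to t or a descendant."""
    n_genes = len(ANNOTATIONS)
    ic = {}
    for term in PARENTS:
        n_t = sum(
            any(term in ancestors(t) for t in terms)
            for terms in ANNOTATIONS.values()
        )
        if n_t:
            ic[term] = math.log(n_genes / n_t)
    return ic


IC = information_content()


def resnik(t1, t2):
    """IC of the most informative common ancestor of two terms."""
    return max(IC.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))


def lin(t1, t2):
    """Resnik score normalized by the ICs of both terms."""
    denom = IC.get(t1, 0.0) + IC.get(t2, 0.0)
    return 2 * resnik(t1, t2) / denom if denom > 0 else 0.0


def gene_similarity(g1, g2, term_sim=lin):
    """Best-match average over two term sets (an assumed aggregation)."""
    def directed(src, dst):
        return sum(max(term_sim(s, d) for d in dst) for s in src) / len(src)

    a, b = ANNOTATIONS[g1], ANNOTATIONS[g2]
    return (directed(a, b) + directed(b, a)) / 2


def anatomy_network(cutoff=0.5):
    """Keep gene pairs whose similarity reaches the (hypothetical) cutoff."""
    edges = {}
    for g1, g2 in combinations(sorted(ANNOTATIONS), 2):
        score = gene_similarity(g1, g2)
        if score >= cutoff:
            edges[(g1, g2)] = score
    return edges


if __name__ == "__main__":
    for pair, score in anatomy_network().items():
        print(pair, round(score, 3))
```
]]></preformat>
      <p>In this toy example, genes that share a specific term are linked, whereas genes whose terms meet only at a general ancestor fall below the cutoff. The Schlicker and Wang measures could be passed through the same term_sim parameter.</p>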
      <p>
        The PPI networks for mouse and zebrafish were
downloaded from the STRING database
(https://string-db.org/). Then, the PPI networks were integrated with the
anatomy-based gene networks using pairwise gene similarity
scores of the two networks in a probabilistic model. In the
integrated network, only the gene pairs that receive high
similarity scores from both the input networks have high gene
similarity scores. To assess the candidate gene prediction
performance of the integrated networks and the PPI networks,
Uberon anatomy ontology terms that have at least 10
gene annotations were used from the zebrafish and mouse
anatomical profiles downloaded from the Monarch Initiative
data repository. The Hishigaki prediction method [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] was used as
the network-based candidate gene prediction algorithm and
leave-one-out cross-validation was used as the evaluation
technique. Receiver operating characteristic (ROC) and
precision-recall curves were generated for the comparison of
different network types. Although the goal was to compare the
integrated versus PPI networks, the anatomy-based gene
networks were also included in the comparison.
      </p>
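      <p>The integration and evaluation steps can be sketched in Python as follows. The probabilistic model is not specified in this text, so the sketch assumes that the two scores of a gene pair are multiplied (a joint probability under independence) and that a pair missing from one network receives a neutral prior of 0.5; both choices, together with the edge scores, gene names, and the 0.3 cutoff, are hypothetical. The prediction score follows the chi-square form of the Hishigaki method on direct neighbors, and leave-one-out cross-validation is emulated by hiding the query gene's own annotation before scoring it.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical edge scores in [0, 1]; gene names are placeholders.
PPI = {
    ("g1", "g2"): 0.9, ("g1", "g3"): 0.8, ("g2", "g3"): 0.9,
    ("g3", "g4"): 0.7, ("g4", "g5"): 0.9, ("g5", "g6"): 0.8,
}
ANATOMY = {
    ("g1", "g2"): 0.9, ("g1", "g3"): 0.9, ("g2", "g3"): 0.8,
    ("g4", "g5"): 0.2,
}
# Hypothetical genes annotated to a single anatomy term.
ANNOTATED = {"g1", "g2", "g3"}


def integrate(ppi, anatomy, prior=0.5):
    """Multiply the two scores of each pair (assumed independence).

    Assumption: a pair missing from one network gets a neutral prior.
    """
    pairs = ppi.keys() | anatomy.keys()
    return {p: ppi.get(p, prior) * anatomy.get(p, prior) for p in pairs}


def adjacency(network, cutoff):
    """Build neighbor sets from edges whose score reaches the cutoff."""
    adj = defaultdict(set)
    for (a, b), score in network.items():
        if score >= cutoff:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def hishigaki_score(gene, adj, annotated, n_genes):
    """Chi-square of observed vs. expected annotated neighbors of a gene."""
    known = annotated - {gene}  # leave-one-out: hide the gene's own label
    observed = len(adj[gene] & known)
    expected = len(adj[gene]) * len(known) / (n_genes - 1)
    return (observed - expected) ** 2 / expected if expected > 0 else 0.0


def loocv_scores(network, annotated, cutoff=0.3):
    """Score every gene while its own annotation is withheld."""
    genes = sorted({g for pair in network for g in pair})
    adj = adjacency(network, cutoff)
    return {g: hishigaki_score(g, adj, annotated, len(genes)) for g in genes}


def auc(scores, annotated):
    """Area under the ROC curve as the share of correctly ordered pairs."""
    pos = [s for g, s in scores.items() if g in annotated]
    neg = [s for g, s in scores.items() if g not in annotated]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = loocv_scores(integrate(PPI, ANATOMY), ANNOTATED)
    print(scores)
    print("AUC:", round(auc(scores, ANNOTATED), 3))
```
]]></preformat>
      <p>Because the chi-square statistic is two-sided, a gene with fewer annotated neighbors than expected can also receive a high score; a signed variant may be worth considering when such genes should rank low.</p>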
    </sec>
    <sec id="sec-3">
      <title>PRELIMINARY RESULTS AND DISCUSSION</title>
      <p>The ROC and precision-recall curve comparisons for
mouse and zebrafish indicate that the integrated networks
significantly outperform the original PPI networks when
predicting candidate genes (only the zebrafish ROC curve
comparisons for the four semantic similarity calculation
methods are shown in Fig. 1). This result is consistent among
the four semantic similarity calculation methods used. The
higher candidate gene prediction accuracy of the integrated
networks suggests that the integration improved their
quality. Although anatomy-based gene
networks (shown in blue in Fig. 1) have the highest
performance for most of the semantic similarity
calculation methods, they are not suitable for candidate gene
prediction or identifying network modules because they only
contain genes that have at least one anatomy ontology term
annotation. The number of such genes is low compared to the integrated and
PPI networks. For instance, the zebrafish anatomy-based gene
network constructed using the Schlicker method contains
5,386 genes, whereas the corresponding integrated network
contains 12,755 genes. The integrated networks contain a
large number of unknown genes coming from PPI networks,
which can be potential candidates for anatomical phenotypes.
Therefore, integrated networks are more useful for
downstream network analysis.</p>
      <p>The integrated network with the highest performance
for mouse and zebrafish will be used for detecting gene
modules associated with the fin to limb transition. Because the
quality of the integrated networks is higher than that of the PPI
networks, the gene modules are expected to be more accurate. The gene
modules for pectoral fin and pelvic fin in zebrafish will be
compared with gene modules for forelimb and hindlimb in
mouse, respectively, to identify modular changes in genes during
the fin to limb transition. This work showcases how anatomy
ontology can be used to improve the quality of candidate gene
predictions and to perform efficient network analyses to study
evolutionary transitions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ulitsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Shamir</surname>
          </string-name>
          ,
          <article-title>"Network-based prediction of protein function,"</article-title>
          <source>Molecular Systems Biology</source>
          , vol.
          <volume>3</volume>
          ,
          <year>2007</year>
          , p.
          <fpage>88</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            von
            <surname>Mering</surname>
          </string-name>
          et al.,
          <article-title>"Comparative assessment of large-scale data sets of protein-protein interactions,"</article-title>
          <source>Nature</source>
          , vol.
          <volume>417</volume>
          ,
          <year>2002</year>
          , pp.
          <fpage>399</fpage>
          -
          <lpage>403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Mungall</surname>
          </string-name>
          et al.,
          <article-title>"The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species,"</article-title>
          <source>Nucleic Acids Res</source>
          , vol.
          <volume>45</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>D712</fpage>
          -
          <lpage>D722</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Haendel</surname>
          </string-name>
          et al.,
          <article-title>"Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon,"</article-title>
          <source>Journal of biomedical semantics</source>
          , vol.
          <volume>5</volume>
          ,
          <year>2014</year>
          , p.
          <fpage>21</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. O.</given-names>
            <surname>Falcão</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lord</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>"Semantic Similarity in Biomedical Ontologies,"</article-title>
          <source>PLoS Comput Biol</source>
          , vol.
          <volume>5</volume>
          ,
          <year>2009</year>
          , p.
          <fpage>e1000443</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hishigaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nakai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tanigami</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Takagi</surname>
          </string-name>
          ,
          <article-title>"Assessment of prediction accuracy of protein function from protein-protein interaction data,"</article-title>
          <source>Yeast</source>
          , vol.
          <volume>18</volume>
          ,
          <year>2001</year>
          , pp.
          <fpage>523</fpage>
          -
          <lpage>531</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>