<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Phenoscape Knowledgebase: tools and APIs for computing across phenotypes from evolutionary diversity and model organisms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>James P. Balhoff</string-name>
          <email>jbalhoff@rti.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>RTI International</institution>
          ,
          <addr-line>Research Triangle Park, NC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>II. DATA INGESTION AND INFERENCE MATERIALIZATION</title>
      <p>
Phenoscape curators annotate character-by-taxon matrices
associated with phylogenetic publications using the Phenex
software [9], following the curation process described by
Dahdul et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The resulting NeXML data files, along with
tab-delimited data dumps obtained from model organism
databases, are translated into OWL models (including
definitions of EQ phenotypes) using the Phenoscape data
ingest pipeline (phenoscape-owl-tools, available with all other
described software under an MIT open source license in the
Phenoscape GitHub repository, https://github.com/phenoscape).
OWL transformation code for individual data
sources is kept concise and readable by using a Scala-based
domain-specific language for OWL axioms (Scowl) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Next,
all OWL data files, along with referenced ontologies, are
processed to standardize property IRIs; in many cases
community ontologies use non-standard or competing IRIs for
the “same” property; for example, our system accounts for ten variants
of the part_of property. We rename these, rather than assert
equivalence axioms, to simplify downstream reasoning and
querying.
      </p>
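      <p>The renaming step can be pictured as a lookup from variant IRIs to one canonical IRI, applied to every statement. The following is a minimal sketch rather than the pipeline's actual code: it works on plain (subject, predicate, object) tuples instead of OWL axioms, and the two variant IRIs are hypothetical placeholders, since the ten part_of variants handled by the real system are not listed here.</p>
      <preformatted><![CDATA[
```python
# Canonical IRI for part_of as used by OBO ontologies.
PART_OF = "http://purl.obolibrary.org/obo/BFO_0000050"

# Hypothetical variant IRIs, for illustration only; the actual list of
# part_of variants handled by the pipeline is not given in the text.
VARIANT_TO_CANONICAL = {
    "http://example.org/variants/part_of": PART_OF,
    "http://example.org/variants/OBO_REL_part_of": PART_OF,
}


def standardize_property_iris(triples, mapping=VARIANT_TO_CANONICAL):
    """Return triples with variant predicate IRIs renamed to canonical ones.

    Each triple is a (subject, predicate, object) tuple of strings.
    Predicates without a mapping entry pass through unchanged.
    """
    return [(s, mapping.get(p, p), o) for (s, p, o) in triples]


triples = [
    ("ex:fin_ray", "http://example.org/variants/part_of", "ex:fin"),
    ("ex:fin", PART_OF, "ex:body"),
    ("ex:fin", "http://www.w3.org/2000/01/rdf-schema#label", "fin"),
]
normalized = standardize_property_iris(triples)
```
]]></preformatted>
      <p>After renaming, a query or reasoner only has to know one IRI per property, which is the simplification that asserting equivalence axioms would not provide.</p>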
      <p>
        Based on the content of the data and reference ontologies,
several “ontologies” are programmatically generated, for the
purpose of precomputing inferred concept hierarchies
supporting various Phenoscape use cases: 1) materialization of
the transitive closure for selected properties; 2) generation of
grouping phenotypes for semantic similarity queries; 3)
generation of “absence” concepts for custom negation
reasoning. The ELK reasoner [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is then used to compute the
inferred classification hierarchy, and inferred subclass axioms
are materialized into concrete assertions. Because ELK does
not support negation reasoning, we have also implemented a
custom procedure to compute a class hierarchy for a
predetermined set of negations [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. To make reasoning on this
large dataset feasible, we extract only the class axioms (TBox)
for input into ELK. This is sufficient for our purposes since
most of the data is in the form of compositional class
expressions; however, it does restrict use cases that would rely
on inference of property assertions or instance classification.
All data, including asserted and inferred class axioms as well
as instance data, are loaded into the Blazegraph RDF triple
store [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], constituting approximately 100 million RDF triples.
A separate OWL file including only class axioms (~870,000
logical axioms) is saved for later use in reasoner queries.
      </p>
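      <p>As an illustration of the first use case above, materializing the transitive closure of a property, the sketch below computes every implied pair from a set of asserted edges. It is a generic graph traversal written for this example, not the Phenoscape implementation (which generates OWL axioms and classifies them with ELK); the class names are made up.</p>
      <preformatted><![CDATA[
```python
from collections import defaultdict


def transitive_closure(edges):
    """Return all (x, z) pairs reachable through one or more edges."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)

    closure = set()
    for start in list(successors):
        # Depth-first walk from each start node; `seen` guards against cycles.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


# Asserted part_of statements (illustrative names).
asserted = [("ex:fin_ray", "ex:fin"), ("ex:fin", "ex:body")]

# Pairs that are implied by transitivity but were never asserted; these are
# the statements that materialization would write out as concrete assertions.
inferred = transitive_closure(asserted) - set(asserted)
```
]]></preformatted>
      <p>Storing the inferred pairs alongside the asserted ones lets a plain triple store query answer questions that would otherwise need a reasoner at query time.</p>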
    </sec>
    <sec id="sec-2">
      <title>III. SEMANTIC SIMILARITY PRECOMPUTATION</title>
      <p>Phenotypic profiles for evolutionarily variable taxonomic
nodes and model organism genes are computed as described in
[7], and loaded into Blazegraph. The inferred phenotype class
hierarchy is used to precompute semantic similarity scores
between all pairs of evolutionary variation profiles and gene
phenotype profiles, using an information content-based metric.
The set of pair comparisons is broken into chunks and
processed in parallel on a compute cluster to reduce
computation time. The resulting set of similarity scores, along
with computed statistical support, is loaded into Blazegraph,
resulting in a final triple store totaling approximately 300
million RDF triples.</p>
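      <p>The text does not spell out the metric, so the following sketch assumes one common information content (IC) formulation: the IC of a class is the negative log of the fraction of profiles annotated to it or to one of its subclasses, and the score for a pair of profiles is the highest IC among the classes subsuming phenotypes in both. The class names, profile names, and chunk size are invented for the example, and the statistical support mentioned above is omitted.</p>
      <preformatted><![CDATA[
```python
import math
from itertools import islice, product


def subsumers(profile, ancestors):
    """All classes subsuming any phenotype in the profile (reflexive)."""
    result = set()
    for phenotype in profile:
        result.add(phenotype)
        result.update(ancestors.get(phenotype, ()))
    return result


def information_content(profiles, ancestors):
    """IC(c) = -log(fraction of profiles annotated to c or a subclass of c)."""
    counts = {}
    for profile in profiles.values():
        for cls in subsumers(profile, ancestors):
            counts[cls] = counts.get(cls, 0) + 1
    total = len(profiles)
    return {cls: -math.log(n / total) for cls, n in counts.items()}


def max_ic_similarity(profile_a, profile_b, ancestors, ic):
    """Highest IC among classes subsuming phenotypes in both profiles."""
    shared = subsumers(profile_a, ancestors) & subsumers(profile_b, ancestors)
    return max((ic[cls] for cls in shared), default=0.0)


def chunked(pairs, size):
    """Yield lists of at most `size` pairs, e.g. one list per cluster job."""
    iterator = iter(pairs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Inferred superclasses for each phenotype class (illustrative names).
ancestors = {
    "fin_absent": {"fin_phenotype", "phenotype"},
    "fin_small": {"fin_phenotype", "phenotype"},
    "eye_small": {"eye_phenotype", "phenotype"},
    "fin_phenotype": {"phenotype"},
    "eye_phenotype": {"phenotype"},
}
taxa = {"taxon:A": {"fin_absent"}, "taxon:B": {"eye_small"}}
genes = {"gene:x": {"fin_small"}, "gene:y": {"eye_small"}}
profiles = {**taxa, **genes}
ic = information_content(profiles, ancestors)

scores = {}
for chunk in chunked(product(sorted(taxa), sorted(genes)), 3):
    # Each chunk is independent, so chunks could run as separate jobs.
    for taxon, gene in chunk:
        scores[(taxon, gene)] = max_ic_similarity(
            taxa[taxon], genes[gene], ancestors, ic
        )
```
]]></preformatted>
      <p>Because each pair is scored independently once the IC table is fixed, splitting the pair list into chunks needs no coordination between workers beyond collecting their results.</p>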
    </sec>
    <sec id="sec-3">
      <title>IV. WEB SERVICE INTERFACES</title>
      <p>Access to data in the Phenoscape KB is provided by two web
service applications, implemented in Scala using the Spray HTTP
toolkit [15]. The first, Owlery, provides a generic JSON
format API to any OWL API-based reasoner; for the
Phenoscape KB we use ELK, loaded with extracted class
axioms from the ontologies and data. We use Owlery to
support web application queries that require reasoning on
arbitrary OWL class expressions. Owlery provides web services
for description logic queries (obtaining subclasses and
superclasses), and also supports reasoner-based query
expansion using our Owlet package. The Owlery API is
documented at http://docs.owlery.apiary.io/. The second web
service application, phenoscape-kb-services, is the primary
public Phenoscape API and provides Phenoscape
application-specific services such as annotation query, semantic similarity,
annotation support for presence/absence inference, and term
info. Phenoscape-kb-services obtains most results via SPARQL
query to the Blazegraph triple store. For use cases requiring
computation by the ELK reasoner, SPARQL queries with
embedded OWL expressions are first expanded using the
Owlet service provided by the Owlery API before being
submitted to Blazegraph. The phenoscape-kb-services API
returns most results in both JSON and tab-delimited text
formats. Documentation for the phenoscape-kb-services API
can be found at http://docs.phenoscapekb.apiary.io/.</p>
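      <p>The query expansion step can be sketched as a text rewrite: an OWL class expression embedded in a SPARQL query is replaced by an explicit list of the named classes a reasoner returns for it, so the triple store never has to interpret OWL. The code below is a simplified illustration only. The typed-literal marker, the regular expression, and the stub reasoner are assumptions made for this example and should not be read as the exact Owlet syntax or the Owlery API; in the real system the expression would be answered by ELK.</p>
      <preformatted><![CDATA[
```python
import re

# Marker for an embedded class expression. This typed-literal form is a
# simplified stand-in chosen for this sketch, not the exact Owlet syntax.
EMBEDDED = re.compile(r'(\?\w+)\s+rdfs:subClassOf\s+"([^"]+)"\^\^ow:omn\s*\.')


def expand_query(sparql, get_subclasses):
    """Replace each embedded class expression with a FILTER over named classes.

    `get_subclasses` is a caller-supplied function that takes the expression
    text and returns an iterable of class IRIs, e.g. by asking a reasoner.
    """
    def replace(match):
        variable, expression = match.group(1), match.group(2)
        iris = sorted(get_subclasses(expression))
        listing = ", ".join("<" + iri + ">" for iri in iris)
        return "FILTER(" + variable + " IN (" + listing + "))"

    return EMBEDDED.sub(replace, sparql)


def stub_reasoner(expression):
    """Stand-in for a reasoner call; returns made-up subclass IRIs."""
    if expression == "part_of some fin":
        return {"http://example.org/fin_ray", "http://example.org/fin_spine"}
    return set()


query = (
    "SELECT ?taxon WHERE { ?taxon ex:exhibits ?phenotype . "
    '?phenotype rdfs:subClassOf "part_of some fin"^^ow:omn . }'
)
expanded = expand_query(query, stub_reasoner)
```
]]></preformatted>
      <p>The expanded query contains only ordinary SPARQL, so it can be submitted to the triple store unchanged; the cost is that the filter list grows with the number of subclasses the reasoner returns.</p>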
    </sec>
    <sec id="sec-4">
      <title>V. APPLICATIONS</title>
      <p>
        A public web user interface for the Phenoscape
Knowledgebase is available at http://kb.phenoscape.org/. The
web interface is a client-side browser application developed in
the AngularJS JavaScript framework [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The application is
implemented entirely upon web service calls to the
phenoscape-kb-services API, ensuring that all functionality is
available via a documented, public API. Using the web
application, researchers can query evolutionary descriptions
and gene phenotypes relevant to particular structures and
qualities, search for taxonomic groups exhibiting variation
similar to the phenotypic effects of a gene of interest [7] (and
vice versa), and export synthetic presence/absence
supermatrices using the OntoTrace system [8]. Building upon
the same web service API, we have also implemented an R
package, rPhenoscape, which makes some of the functionality
of the KB available within the R statistical computing
environment.
      </p>
      <p>[Figure residue: a diagram whose labels include Phenoscape web UI, R, KB web service API, Blazegraph triple store, Owlery API, ELK, RDF, Tbox axioms, and semantic similarity (precompute scores for gene x taxon comparisons).]</p>
    </sec>
    <sec id="sec-5">
      <title>VI. CONCLUSION</title>
      <p>The Phenoscape Knowledgebase architecture illustrates one
approach to integration of multiple datasets in a rich
ontological framework. Deriving the full benefit of the
sophisticated knowledge represented in OBO library ontologies
will often require application of automated reasoners and
programmatic generation of axioms and concepts to facilitate
particular use cases. Here we have provided an overview of the
mix of reusable components and special purpose code we have
developed to support Phenoscape; it is our hope that continued
evolution of the Phenoscape KB architecture will result in
further identification and development of reusable tools which
can support similar efforts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>Genome Biol</source>
          .
          <year>2010</year>
          ;
          <volume>11</volume>
          : R2. doi:10.1186/gb-2010-11-1-r2
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>ZFIN: enhancements and updates to the Zebrafish Model Organism Database</article-title>
          .
          <source>Nucl Acids Res</source>
          .
          <year>2011</year>
          ;
          <volume>39</volume>
          :
          <fpage>D822</fpage>
          -
          <lpage>9</lpage>
          . doi:10.1093/nar/gkq1077
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <year>2015</year>
          ;
          <volume>43</volume>
          :
          <fpage>D726</fpage>
          -
          <lpage>36</lpage>
          . doi:10.1093/nar/gku967
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <year>2015</year>
          ;
          <volume>43</volume>
          :
          <fpage>D756</fpage>
          -
          <lpage>63</lpage>
          . doi:10.1093/nar/gku956
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Köhler</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doelken</surname>
            <given-names>SC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mungall</surname>
            <given-names>CJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauer</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Firth</surname>
            <given-names>HV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bailleul-Forestier</surname>
            <given-names>I</given-names>
          </string-name>
          , et al.
          <article-title>The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <year>2014</year>
          ;
          <volume>42</volume>
          :
          <fpage>D966</fpage>
          -
          <lpage>74</lpage>
          . doi:10.1093/nar/gkt1026
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>Mol Biol Evol</source>
          .
          <year>2016</year>
          ;
          <volume>33</volume>
          :
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          . doi:10.1093/molbev/msv223
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>Genesis</source>
          .
          <year>2015</year>
          ;
          <volume>53</volume>
          :
          <fpage>561</fpage>
          -
          <lpage>571</lpage>
          . doi:10.1002/dvg.22878
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>Syst Biol</source>
          .
          <year>2015</year>
          ; doi:10.1093/sysbio/syv031
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <year>2010</year>
          ;
          <volume>5</volume>
          : e10500. doi:10.1371/journal.pone.0010500
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Dahdul</surname>
            <given-names>WM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balhoff</surname>
            <given-names>JP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engeman</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grande</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hilton</surname>
            <given-names>EJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kothari</surname>
            <given-names>C</given-names>
          </string-name>
          , et al.
          <article-title>Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature</article-title>
          .
          <source>PLoS One</source>
          .
          <year>2010</year>
          ;
          <volume>5</volume>
          : e10708. doi:10.1371/journal.pone.0010708
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Balhoff</surname>
            <given-names>JP</given-names>
          </string-name>
          .
          <article-title>Scowl: a Scala DSL for programming with the OWL API</article-title>
          .
          <source>The Journal of Open Source Software</source>
          .
          <year>2016</year>
          . doi:10.21105/joss.00023
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Kazakov</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krötzsch</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simančík</surname>
            <given-names>F</given-names>
          </string-name>
          .
          <article-title>The Incredible ELK</article-title>
          .
          <source>J Automat Reason</source>
          . Springer Netherlands;
          <year>2013</year>
          ;
          <volume>53</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>61</lpage>
          . doi:10.1007/s10817-013-9296-3
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Balhoff</surname>
            <given-names>JP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dececchi</surname>
            <given-names>TA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mabee</surname>
            <given-names>PM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapp</surname>
            <given-names>H</given-names>
          </string-name>
          .
          <article-title>Presence-absence reasoning for evolutionary phenotypes</article-title>
          .
          <source>Proceedings of Phenotype Day of the Bio-ontologies SIG at ISMB 2014</source>
          .
          <year>2014</year>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . Available: http://phenoday2014.bio-lark.org/pdf/11.pdf
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          Blazegraph [Internet]. [cited 29 May
          <year>2016</year>
          ]. Available: https://www.blazegraph.com
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          spray | REST/HTTP for your Akka/Scala Actors [Internet]. [cited 29 May
          <year>2016</year>
          ]. Available: http://spray.io
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          AngularJS - Superheroic JavaScript MVW Framework [Internet]. [cited 29 May
          <year>2016</year>
          ]. Available: https://angularjs.org
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>