<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GOfox: Semantics-based simplified hierarchical classification and interactive visualization to support GO enrichment analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edison Ong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yongqun He</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Michigan</institution>
          ,
          <addr-line>Ann Arbor, Michigan</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <abstract>
        <p>Gene Ontology (GO)-based statistical enrichment analysis is a popular approach to identify statistically enriched biological processes, molecular functions, and cellular components that are associated with a list of genes. However, such GO enrichment analysis often generates a large number of enriched GO terms that are difficult to interpret and analyze. To address this issue, we developed GOfox, a web tool that utilizes OWL-based ontology semantics and RDF triple store SPARQL queries to generate full or simplified hierarchical GO subsets to classify and display enriched GO terms. GOfox integrates and extends features from OntoFox and Ontobee, two ontology tools developed in our laboratory. GOfox also includes a newly developed algorithm that generates a simplified hierarchical classification by taking the multiple inheritance of GO into account. Furthermore, GOfox provides an interactive visualization that supports GO subset tree exploration and term editing. GOfox is freely available at http://gofox.hegroup.org/.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        A biological/biomedical ontology is a set of computer- and
human-interpretable terms and relations that represents
entities in a biological/biomedical domain and how they relate
to each other. Hundreds of biological ontologies have been
developed. The most widely used biological ontology is the
Gene Ontology (GO), which systematically and
semantically represents three major attributes associated with gene
products: Biological Process (BP), Molecular Function
(MF), and Cellular Component (CC)
        <xref ref-type="bibr" rid="ref1">(Ashburner et al.,
2000)</xref>
        . One major GO application is GO-based statistical
enrichment analysis. The rationale of such an analysis is that,
given a group of genes, co-functioning genes have a higher
(enriched) chance of being identified together as a relevant
group by high-throughput technologies (e.g., microarrays and
RNA-Seq). Because hundreds or more enriched terms are often
detected, the linear list of enriched terms can be very large and
overwhelming, which dilutes the focus of the analysis on related
terms.
      </p>
      <p>
        To address the ever-increasing number of enriched GO
terms resulting from high-throughput studies, we developed
GOfox to support GO enrichment analysis by
integrating and extending the features of OntoFox
        <xref ref-type="bibr" rid="ref2">(Xiang et al.,
2010)</xref>
        and Ontobee <xref ref-type="bibr" rid="ref3">(Xiang et al., 2011)</xref>. OntoFox
fetches ontology terms and axioms. It includes several
semantics-based algorithms for extracting different levels of
intermediate-layer terms between user-selected terms and a
top-level term of the ontology
        <xref ref-type="bibr" rid="ref2">(Xiang et al., 2010)</xref>
        . Ontobee
is the default OBO ontology linked data server; it
facilitates ontology data sharing, visualization, query, integration,
and analysis <xref ref-type="bibr" rid="ref3">(Xiang et al., 2011)</xref>. Ontobee also supports
ontology visualization, including term hierarchies, definitions,
and annotations. By integrating and extending the features
of OntoFox and Ontobee, GOfox represents the
enriched GO terms in an interactive hierarchical layout
along with term-related information, and it allows users to
manually modify the summarized enrichment result.
To account for the multiple inheritance used in GO
development, GOfox includes a new algorithm that trims down the
size of the enriched GO subset tree. In addition, GOfox
retrieves from Ontobee and displays information about the selected
GO term, such as its definition, database cross-references, and
comments. This report provides the first
introduction of GOfox, which aims to help researchers better
visualize and analyze the results of GO gene enrichment studies.
      </p>
    </sec>
    <sec id="sec-2">
      <title>GOFOX SYSTEM OVERALL DESIGN</title>
      <p>The overall design and workflow are shown in Fig. 1.
Using the web form shown in Fig. 2, a user can input enriched GO
terms, or other GO terms of interest, along with their P-values. The user
can then define a P-value cutoff (or another cutoff) and how
intermediate terms are treated. After receiving the user’s request,
the GOfox server extracts a subset of GO that contains
the input terms and related GO terms using PHP, Java, and
SPARQL. Specifically, the server queries the He
group’s RDF triple store using SPARQL and retrieves a
subset of GO. The query results are in RDF/XML
format and are converted to the OWL format using the OWL
API (http://owlapi.sourceforge.net/). Then, based on the
user’s preference, GOfox runs the selected simplification algorithm
and generates results for downloading, visualization, and
editing (Fig. 1). The results are temporarily stored in the He
group’s RDF triple store and deleted on a regular basis.</p>
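      <p>As a rough illustration of the retrieval step, the following Python sketch builds a SPARQL query that collects the input terms, their ancestors, and the direct named-class subclass links among them. It is not the GOfox server code (which uses PHP, Java, and SPARQL); the function names, the query shape, and the graph IRI supplied by the caller are assumptions made for this example.</p>
      <preformat><![CDATA[
```python
from typing import Iterable

# Standard OBO PURL base used for GO term IRIs.
OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def go_iri(go_id: str) -> str:
    """Turn an identifier such as 'GO:0006355' into its OBO PURL IRI."""
    return OBO_PREFIX + go_id.strip().replace(":", "_")


def build_ancestor_query(go_ids: Iterable[str], graph_iri: str) -> str:
    """Build a SPARQL 1.1 query (hypothetical shape, not the GOfox query).

    The query returns (term, parent) pairs for every input term and each of
    its ancestors. graph_iri is whatever named graph holds GO in the store;
    its value is an assumption left to the caller.
    """
    values = " ".join("<" + go_iri(g) + ">" for g in go_ids)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?term ?parent\n"
        "FROM <" + graph_iri + ">\n"
        "WHERE {\n"
        "  VALUES ?input { " + values + " }\n"
        "  ?input rdfs:subClassOf* ?term .\n"
        "  ?term rdfs:subClassOf ?parent .\n"
        "  FILTER(isIRI(?parent))\n"
        "}"
    )


query = build_ancestor_query(["GO:0006355", "GO:0006412"], "http://example.org/go")
print(query)
```
]]></preformat>
      <p>The isIRI filter keeps only links to named classes, so anonymous restriction parents are skipped. The property path in the query requires SPARQL 1.1 support in the triple store, which is assumed here rather than stated in this report.</p>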
      <p>Fig. 1. GOfox program architecture and workflow design.</p>
      <p>
        The new GOfox algorithm “Include Computed Simplified
Intermediates” (SIM) was developed on the basis of the OntoFox
“Include Computed Intermediates” (COM) algorithm. COM
removes all intermediate GO terms that match the
following rules: 1) the intermediate GO term is not included in
the user’s input; 2) the intermediate GO term has only one
parent and one child GO term
        <xref ref-type="bibr" rid="ref2">(Xiang et al., 2010)</xref>
        .
Although COM works well for most ontologies, it often does
not generate ideal results for ontologies (e.g., GO) that use
multiple inheritance. SIM was developed to resolve this issue.
      </p>
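      <p>The two COM rules can be sketched in Python as follows. This is a minimal illustration and not the OntoFox implementation: the dictionary representation (each term mapped to its direct parents), the function name, and the choice to reconnect the single child to the single parent when a term is removed are assumptions.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Set


def com_simplify(parents: Dict[str, Set[str]], inputs: Set[str]) -> Dict[str, Set[str]]:
    """Collapse chain-like intermediates (hypothetical helper, not OntoFox code).

    parents maps each term to its direct is_a parents inside the extracted subset.
    """
    graph = {term: set(ps) for term, ps in parents.items()}
    # Make sure terms that only appear as parents (e.g. the root) are keys too.
    for ps in list(graph.values()):
        for p in ps:
            graph.setdefault(p, set())

    changed = True
    while changed:
        changed = False
        for term in list(graph):
            if term in inputs:
                continue  # rule 1: user-supplied terms are always kept
            children = [c for c, ps in graph.items() if term in ps]
            if len(graph[term]) == 1 and len(children) == 1:
                # rule 2: exactly one parent and one child, so bypass the term
                (parent,) = graph[term]
                child = children[0]
                graph[child].discard(term)
                graph[child].add(parent)
                del graph[term]
                changed = True
    return graph


# Toy hierarchy with made-up term names: root, A, B are intermediates.
toy = {"root": set(), "A": {"root"}, "B": {"A"}, "C": {"B"}, "D": {"root"}}
simplified = com_simplify(toy, inputs={"C", "D"})
```
]]></preformat>
      <p>In the toy hierarchy, A and B each have one parent and one child and are not input terms, so both are bypassed and C ends up directly under root.</p>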
      <p>SIM first goes through the COM steps. The COM
results are then further simplified by selectively removing some
intermediate terms that have multiple parents (i.e., multiple
inheritance) in the following three steps. First, SIM
reformats the OWL-formatted results by removing indirect
subclass relationships. For example, the subclass axiom
(regulates) some (transcription, DNA-templated) is
removed because the parent-child relationship is not a
direct ‘is a’ relationship. Second, SIM removes intermediate
GO terms that match the following rules: 1) the intermediate
GO term is not included in the user’s input; 2) the
intermediate GO term has fewer than two child GO terms within
the user’s input list (note that, unlike COM, SIM does not consider the
one-parent condition). Third, SIM further trims
down the list by removing subclass relationships
between GO terms and the three GO top-level terms BP,
CC, and MF. The requirements for removal are: 1) the
term is a direct subclass of BP, CC, or MF; 2) there exists
another direct subclass relationship between the GO term
and a term other than the three GO top-level terms.</p>
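      <p>The second and third SIM steps can be sketched in Python as shown below; the first step is assumed to have been applied already, so the input contains only direct ‘is a’ links. This is one possible reading of the rules and not the GOfox source code. In particular, it interprets “child GO terms within the user’s input list” as direct children that are themselves input terms, never removes the three top-level terms, evaluates the removal rule once on the incoming graph, and reattaches orphaned terms to the nearest kept ancestors. All of these choices, and the function name, are assumptions.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Set

# GO top-level terms: biological_process, molecular_function, cellular_component
GO_TOPS = frozenset({"GO:0008150", "GO:0003674", "GO:0005575"})


def sim_steps_2_and_3(parents: Dict[str, Set[str]], inputs: Set[str],
                      tops=GO_TOPS) -> Dict[str, Set[str]]:
    """Hypothetical sketch of SIM steps 2 and 3 on a term -> direct-parents map."""
    graph = {term: set(ps) for term, ps in parents.items()}
    for ps in list(graph.values()):
        for p in ps:
            graph.setdefault(p, set())

    def input_children(term: str) -> int:
        # Number of direct children of `term` that are user input terms.
        return sum(1 for c, ps in graph.items() if term in ps and c in inputs)

    # Step 2: drop non-input intermediates with fewer than two input children.
    removed = {t for t in graph
               if t not in inputs and t not in tops and input_children(t) < 2}

    def kept_parents(term: str) -> Set[str]:
        # Walk upward, skipping removed terms, to find the nearest kept ancestors.
        found, seen, stack = set(), set(), list(graph[term])
        while stack:
            p = stack.pop()
            if p in seen:
                continue
            seen.add(p)
            if p in removed:
                stack.extend(graph[p])
            else:
                found.add(p)
        return found

    slim = {t: kept_parents(t) for t in graph if t not in removed}

    # Step 3: drop links to top-level terms when another direct parent exists.
    result = {}
    for term, ps in slim.items():
        non_top = ps - tops
        result[term] = non_top if (ps & tops and non_top) else ps
    return result
```
]]></preformat>
      <p>For example, with a single top-level term BP, intermediates A (under BP) and B (under A), and input terms x1 (under B), x2 (under B and BP), and x3 (under A), the sketch removes A, keeps B because it has two input children, and drops the direct link from x2 to BP because x2 also has the parent B.</p>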
      <p>While GOfox still offers the COM algorithm as an option,
the SIM algorithm provides an additional way of
shortening the list of GO terms displayed.</p>
    </sec>
    <sec id="sec-2b">
      <title>GOFOX FEATURES AND WEB INTERFACE</title>
      <p>GOfox provides many features for generating a hierarchical
classification from a list of user-provided enriched GO terms.
Fig. 2 demonstrates how GOfox works. Specifically,
a user can type in GO terms or upload a text file
as input. The user can provide standard P-values or other
P-values, such as false discovery rate-adjusted P-values. A
different cutoff value can also be used. The user can then select
an intermediates retrieval setting: COM, SIM, or
all intermediates. GOfox runs after “Run GOfox” is
clicked (Fig. 2A).</p>
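      <p>The input handling described above can be illustrated with a short Python sketch that reads GO terms with their P-values and applies a cutoff. The exact input format accepted by GOfox is not specified in this report, so the layout assumed here (one term per line, a GO identifier followed by whitespace and a numeric value), the inclusive cutoff, and the function name are assumptions.</p>
      <preformat><![CDATA[
```python
import re
from typing import Dict

# GO identifiers consist of the prefix "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")


def parse_enriched_terms(text: str, cutoff: float = 0.05) -> Dict[str, float]:
    """Return {GO id: P-value} for lines whose value passes the cutoff.

    Assumed layout (hypothetical): "GO:0006355<whitespace>0.001" per line.
    Lines that do not match this layout are skipped.
    """
    kept: Dict[str, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not GO_ID.match(fields[0]):
            continue
        try:
            p_value = float(fields[1])
        except ValueError:
            continue
        if p_value <= cutoff:
            kept[fields[0]] = p_value
    return kept


sample = "GO:0006355\t0.001\nGO:0008150 0.2\nnot a term\nGO:0006412\t4e-5\n"
selected = parse_enriched_terms(sample, cutoff=0.05)
```
]]></preformat>
      <p>With the sample text, GO:0006355 and GO:0006412 pass the 0.05 cutoff, while GO:0008150 (0.2) and the malformed line are dropped. The same function works for adjusted P-values, since only the numeric comparison matters.</p>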
      <p>After the results are generated, GOfox provides an
Ontobee-like term visualization interface (Fig. 2B). This
feature is useful for biologists who are not familiar with using
the Protégé OWL editor to display output files. The user can
interactively explore the hierarchy of retrieved GO terms
and hide unwanted GO terms from the web page.</p>
    </sec>
    <sec id="sec-3">
      <title>AVAILABILITY AND LICENSE</title>
      <p>GOfox is freely available at http://gofox.hegroup.org/.
The source code is released on GitHub under the
Apache License 2.0: https://github.com/ontoden/gofox.</p>
    </sec>
    <sec id="sec-4">
      <title>SUMMARY</title>
      <p>GOfox is a simplified hierarchical classification tool that helps
users interpret the results of GO enrichment analysis. GOfox
addresses a critical issue, i.e., the difficulty of visualizing,
selecting, and further analyzing the growing number of enriched
GO terms produced by popular GO enrichment analysis studies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Ashburner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ball</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blake</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Botstein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherry</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolinski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwight</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eppig</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hill</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Issel-Tarver</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasarskis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matese</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ringwald</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sherlock</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium</article-title>
          .
          <source>Nat Genet</source>
          <volume>25</volume>
          ,
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Xiang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courtot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brinkman</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruttenberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>OntoFox: web-based support for ontology reuse</article-title>
          .
          <source>BMC Res Notes</source>
          <volume>3</volume>
          :
          <issue>175</issue>
          ,
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Xiang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mungall</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruttenberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Ontobee: A linked data server and browser for ontology terms</article-title>
          ,
          <source>in: The 2nd International Conference on Biomedical Ontologies (ICBO), CEUR Workshop Proceedings</source>
          , Pages
          <fpage>279</fpage>
          -
          <lpage>281</lpage>
          [http://ceur-ws.org/Vol833/paper248.pdf].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>