<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OxO - a gravy of ontology mapping extracts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon Jupp</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Liener</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sirarat Sarntivijai</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Vrousgou</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tony Burdett</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helen Parkinson</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Samples, Phenotypes and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory</institution>
          ,
          <addr-line>Cambridge</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Data is increasingly being annotated and described using controlled terminology or ontology standards. There are often multiple ontologies for any given domain, so the ability to map related or similar concepts is a necessary tool for data integration. Several techniques and tools have emerged that support ontology mapping, but finding and harmonising mappings from multiple sources remains a challenge for users. To address this we have developed OxO, a repository of known ontology mappings and cross-references extracted from multiple datasources. OxO provides a Web interface and API to access mappings and the functionality for users to upload their own sets of mappings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Access to mappings is extremely valuable for large
data integration efforts, such as the OpenTargets
        <xref ref-type="bibr" rid="ref2">(Koscielny,
An et al. 2017)</xref>
        platform, which pools data supporting
gene-disease/phenotype associations from a wide variety of
resources. OpenTargets requires that disease and phenotype
information be normalised to terms in the Experimental
Factor Ontology
        <xref ref-type="bibr" rid="ref3">(Malone, Holloway et al. 2010)</xref>
        . However, data
coming into the platform may have been pre-annotated with
a different ontology. For example, OpenTargets incorporates
information from UniProt, where disease terms are annotated
with OMIM labels, whereas it identifies the Orphanet Rare
Disease Ontology (ORDO), which is imported by EFO, as the
preferred standard for rare diseases in the platform. The
OMIM accessions associated with UniProt diseases therefore need to
be mapped to ORDO terms to unify the representation
of disease in the platform. The Monarch Initiative
        <xref ref-type="bibr" rid="ref4">(Mungall,
Koehler et al. 2016)</xref>
        makes extensive use of cross-reference
mappings to build a unified representation of disease, and the
Pistoia Alliance is currently running the Ontologies Mapping
Project to evaluate tools that support ontology mapping
(http://www.pistoiaalliance.org/ontologies-mapping-rfiguidelines/).
      </p>
      <p>
        A good source of ontology mappings is the
ontologies themselves. Many of the Open Biomedical Ontologies
use the xref property to indicate a cross-reference between
entities. Whilst the semantics of xrefs are not explicitly
defined by OBO, they have predominantly been used to indicate
equivalence between terms. Mappings can be found in a
number of other places, including dedicated resources
like the UMLS that provide a large number of mappings for
medical terminologies. However, challenges remain when
integrating term identifiers from multiple sources due to a
lack of consistency in how identifiers are reported
        <xref ref-type="bibr" rid="ref6">(McMurry et al. 2016)</xref>
        . For example, MSH:D009202 and
MESH:D009202 are both compact URIs (CURIEs) in
common use for the same MeSH term.
      </p>
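      <p>The prefix inconsistency described above can be illustrated with a minimal Python sketch. The alias table and the function name normalise_curie are assumptions made for illustration; they are not taken from the OxO code base, and the choice of "MeSH" as the preferred prefix is arbitrary.</p>
      <preformat><![CDATA[
```python
# Hypothetical alias table: maps prefix variants (upper-cased) to one
# preferred prefix. The entries are illustrative assumptions only.
PREFIX_ALIASES = {
    "MSH": "MeSH",
    "MESH": "MeSH",
}


def normalise_curie(curie, aliases=PREFIX_ALIASES):
    """Rewrite the prefix of a CURIE to its preferred form.

    Prefixes are compared case-insensitively; unknown prefixes are kept as-is.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    preferred = aliases.get(prefix.upper(), prefix)
    return f"{preferred}:{local_id}"


if __name__ == "__main__":
    # Both spellings from the text resolve to the same identifier.
    print(normalise_curie("MSH:D009202"))
    print(normalise_curie("MESH:D009202"))
```
]]></preformat>
      <p>Once both spellings resolve to a single identifier, mappings reported under either prefix can be merged onto the same term.</p>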
      <p>OxO extracts ontology mappings from multiple sources and
harmonises the identifiers to provide an integrated resource
of mappings. OxO constructs a graph of the mappings that
allows users to explore how mappings from different
resources intersect. The OxO mapping graph can be used to
explore both direct and indirect mappings between terms,
and can thus help to traverse gaps where no
direct mappings exist between two ontologies.</p>
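      <p>As a rough illustration of direct versus indirect mappings, the following Python sketch runs a breadth-first search over a toy mapping graph. It assumes that mappings can be treated as undirected edges, which may not hold for every mapping type; the function names and the toy identifiers are hypothetical and do not reflect how OxO stores or queries its graph.</p>
      <preformat><![CDATA[
```python
from collections import deque


def build_graph(mappings):
    """Build an undirected adjacency map from (term_a, term_b) pairs."""
    graph = {}
    for a, b in mappings:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def mappings_within(graph, start, max_distance):
    """Return {term: distance} for terms reachable within max_distance hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        if distances[term] == max_distance:
            continue  # do not expand beyond the requested distance
        for neighbour in sorted(graph.get(term, ())):
            if neighbour not in distances:
                distances[neighbour] = distances[term] + 1
                queue.append(neighbour)
    del distances[start]  # the start term is not a mapping of itself
    return distances


if __name__ == "__main__":
    # Toy identifiers, not real mappings.
    toy = [("A:1", "B:1"), ("B:1", "C:1"), ("C:1", "D:1")]
    graph = build_graph(toy)
    print(mappings_within(graph, "A:1", 1))  # direct mappings only
    print(mappings_within(graph, "A:1", 2))  # direct and two-hop mappings
```
]]></preformat>
      <p>A distance of 1 returns only direct mappings; larger distances add terms that are linked through intermediate identifiers.</p>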
    </sec>
    <sec id="sec-2">
      <title>METHODS</title>
      <p>
        OxO crawls the Ontology Lookup Service API
        <xref ref-type="bibr" rid="ref5">(Jupp et al.
2015)</xref>
        to discover mappings based on annotation properties
such as the OBO xref property. OxO uses identifiers.org, the
OBO library, and prefixcommons.org to identify
datasources using the identifier prefix. The prefix is also used to
assign identifiers to either an ontology/terminology category
or a database category. For example, the Gene Ontology
provides cross-reference mappings to databases like
Reactome, whereas the Disease Ontology provides
cross-references as mappings to other disease terminologies, such
as MeSH. Additional mappings of interest to OpenTargets,
such as disease terminology mappings for SNOMED-CT,
ICD, MedDRA, OMIM, NCIt and MeSH, are also integrated
into OxO directly from the UMLS. Curators at the EBI are
able to log into OxO and upload their own mappings, and we
encourage external users to submit any mappings they have
generated to us for review and integration into OxO.
      </p>
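      <p>The prefix-based categorisation might look roughly like the Python sketch below. The registry is a hand-written stand-in for the information that OxO draws from identifiers.org, the OBO library and prefixcommons.org; the registry contents, the function names and the local identifiers in the usage example are all hypothetical.</p>
      <preformat><![CDATA[
```python
# Hypothetical prefix registry, written by hand for illustration.
PREFIX_CATEGORY = {
    "GO": "ontology",
    "DOID": "ontology",
    "MeSH": "ontology",
    "Reactome": "database",
}


def categorise(curie, registry=PREFIX_CATEGORY):
    """Return the category of a CURIE's prefix, or 'unknown' if unregistered."""
    prefix = curie.split(":", 1)[0]
    return registry.get(prefix, "unknown")


def split_xrefs(xrefs, registry=PREFIX_CATEGORY):
    """Group cross-referenced CURIEs by the category of their prefix."""
    grouped = {}
    for curie in xrefs:
        grouped.setdefault(categorise(curie, registry), []).append(curie)
    return grouped


if __name__ == "__main__":
    # Placeholder local identifiers, not real records.
    print(split_xrefs(["Reactome:0001", "MeSH:0002", "XYZ:0003"]))
```
]]></preformat>
      <p>Prefixes that are missing from the registry fall into an "unknown" group, which corresponds to identifiers whose source cannot be identified automatically.</p>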
    </sec>
    <sec id="sec-3">
      <title>RESULTS</title>
      <p>OxO identified 75 database sources that have records
mapped to ontology terms and 104 ontologies that have
mappings to another ontology or database. In total, OxO
identified over 1.4 million xrefs for terms where we can
automatically identify their source. There remain around 80
identifier prefixes that we have yet to map to an authoritative
source (a list of unmapped prefixes is available at
https://goo.gl/xasfND). The OxO user interface
(http://www.ebi.ac.uk/spot/oxo) allows users to explore
mappings between terminologies and export the data in
a variety of formats.</p>
      <p>Users can use the OxO graph to traverse potential gaps in
coverage. For example, OxO found direct mappings for
“Retinoblastoma” from NCIt (NCIt:C7541) to DOID:768
and UMLS:C0035335. By expanding the search space to 2
hops, we find additional mappings from NCIt:C7541 to HP,
SNOMEDCT, MEDDRA, ORPHANET, MeSH and OMIM
that are derived from a number of different sources (see
Figure 1). The more sources that support these mappings, the
greater the confidence the user can have that these mappings
are valid. OxO is also being integrated into the Ontology
Lookup Service user interface, so users can view mappings
for terms directly from OLS rather than the OxO interface.</p>
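      <p>One simple way to make the link between supporting sources and confidence concrete is to count the distinct sources that assert each mapping. The Python sketch below does this for toy records; it is an illustrative proxy under the assumption that mappings are symmetric, not a description of any scoring performed by OxO, and all names in it are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def support_by_mapping(records):
    """Map each unordered term pair to the set of sources asserting it.

    records: iterable of (term_a, term_b, source_name) tuples.
    """
    support = defaultdict(set)
    for term_a, term_b, source in records:
        support[frozenset((term_a, term_b))].add(source)
    return dict(support)


def rank_by_support(records):
    """Return (pair, n_sources) tuples, most widely supported first."""
    support = support_by_mapping(records)
    ranked = [
        (tuple(sorted(pair)), len(sources))
        for pair, sources in support.items()
    ]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy records: (term, term, source). Not real mappings.
    toy = [
        ("X:1", "Y:1", "source_a"),
        ("Y:1", "X:1", "source_b"),
        ("X:1", "Z:1", "source_a"),
    ]
    for pair, n_sources in rank_by_support(toy):
        print(pair, n_sources)
```
]]></preformat>
      <p>Pairs asserted by several independent sources appear first, giving a curator a starting order for review.</p>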
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION</title>
      <p>
        Mappings between ontologies are a useful tool for data
integration, but they are often poorly represented and
semantically ambiguous. Cross-references are used to represent a
range of mapping types, including equivalence, subclass and
other, looser kinds of relatedness. There is still much work to do
before we can infer equivalence through logical axioms
within ontologies, so these kinds of mappings will remain a
crucial tool, especially in the mapping of disease
terminologies. By bringing these mappings together and harmonising
the identifiers, OxO provides a platform for curators to
collaborate on improving the mappings that exist. We plan to
use OxO as background knowledge to feed into automated
ontology matching algorithms such as LogMap
        <xref ref-type="bibr" rid="ref1">(Jiménez-Ruiz and Cuenca Grau 2011)</xref>
        to compute a reference set of
gold standard mappings for a number of disease ontologies.
OxO is open source and the source code is available from
https://github.com/EBISPOT/OLS-mapping-service
      </p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This resource is funded in part by EMBL-EBI core funds,
CORBEL and ELIXIR-EXCELERATE. CORBEL receives funding from the European
Union’s Horizon 2020 research and innovation programme
under grant agreement No 654248.</p>
      <p>ELIXIR-EXCELERATE is funded by the
European Commission within the Research Infrastructures
programme of Horizon 2020, grant agreement number
676559. We also acknowledge the Open Targets project and the organisers and
participants of the 2016 BioHackathon in Tsuruoka, Japan.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>LogMap: Logic-Based and Scalable Ontology Matching</article-title>
          .
          <source>The Semantic Web - ISWC 2011: 10th International Semantic Web Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part I</source>
          .
          <string-name>
            <given-names>L.</given-names>
            <surname>Aroyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Welty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alani</surname>
          </string-name>
          et al. Berlin, Heidelberg, Springer Berlin Heidelberg:
          <fpage>273</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Koscielny</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , et al. (
          <year>2017</year>
          ).
          <article-title>Open Targets: a platform for therapeutic target identification and validation</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>45</volume>
          (
          <issue>D1</issue>
          ):
          <fpage>D985</fpage>
          -
          <lpage>D994</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Malone</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al. (
          <year>2010</year>
          ).
          <article-title>Modeling sample variables with an Experimental Factor Ontology</article-title>
          .
          <source>Bioinformatics</source>
          <volume>26</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1112</fpage>
          -
          <lpage>1118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Mungall</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          , et al. (
          <year>2016</year>
          ).
          <article-title>k-BOOM: A Bayesian approach to ontology structure inference, with applications in disease ontology construction</article-title>
          . http://biorxiv.org/content/biorxiv/early/2016/04/15/048843.full.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Jupp</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al. (
          <year>2015</year>
          ).
          <article-title>A new Ontology Lookup Service at EMBL-EBI</article-title>
          . In:
          <string-name>
            <surname>Malone</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al. (eds.)
          <source>Proceedings of SWAT4LS International Conference 2015</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>McMurry</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al. (
          <year>2016</year>
          ).
          <article-title>Identifiers for the 21st century: How to design, provision, and reuse identifiers to maximize data utility and impact</article-title>
          . http://doi.org/10.5281/zenodo.163459
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>