<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology Alignment in Ecotoxicological Effect Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Erik B. Myklebust</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ernesto Jiménez-Ruiz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiaoyan Chen</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raoul Wolf</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Knut Erik Tollefsen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City, University of London</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Norwegian Institute for Water Research</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Norwegian University of Life Sciences</institution>
          ,
          <addr-line>Ås</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>SIRIUS, University of Oslo</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Oxford</institution>
          ,
          <addr-line>Oxford</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Toxicological and Risk Assessment Knowledge Graph (TERA) [1] integrates several disparate datasets relevant to ecological risk assessment and effect prediction. TERA is being used in conjunction with knowledge graph embedding models to improve the extrapolation of chemical effect data at the Norwegian Institute for Water Research (Norsk institutt for vannforskning, NIVA) [1].<sup>1</sup> The largest publicly available repository of effect data is the ECOTOXicology knowledge base (ECOTOX), developed by the US Environmental Protection Agency [2]. The dataset consists of 940k experiments using 12k compounds and 13k species. ECOTOX contains a taxonomy of species; however, it only covers the species represented in the ECOTOX effect data. Hence, to enable extrapolation of effects across a larger taxonomic domain, an alignment to the NCBI taxonomy has to be established. However, no complete and public set of mappings exists between the 47,785 ECOTOX taxa and the 2,140,344 NCBI taxa. In this paper we present the ECOTOX-NCBI alignment results of three ontology matching algorithms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Although no complete and public alignment exists between ECOTOX
and NCBI, a partial mapping curated by experts can be obtained through the ECOTOX
Web.<sup>2</sup> We have gathered a total of 2,321 mappings for validation purposes. We have
used three methods to align the two vocabularies: (i) the LogMap system [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], (ii)
AgreementMakerLight (AML), and (iii) a baseline string matching algorithm based on
Levenshtein distance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
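      <p>As a rough illustration of baseline (iii), the following Python sketch scores label pairs with a normalised Levenshtein similarity and keeps those scoring above 0.8, the threshold listed for the string distance method in Table 1. The normalisation by the longer label, the lowercasing, the function names and the toy identifiers are assumptions made for this example; the source does not specify them.</p>
      <preformat>
```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(b) > len(a):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means identical labels.

    Assumption: distance is divided by the length of the longer label.
    """
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def string_match(source: Dict[str, str],
                 target: Dict[str, str],
                 threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return (source_id, target_id, score) for label pairs above threshold."""
    mappings = []
    for s_id, s_label in source.items():
        for t_id, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > threshold:
                mappings.append((s_id, t_id, score))
    return mappings


# Toy labels and identifiers, invented for illustration only.
ecotox = {"ecotox:1": "Daphnia magna", "ecotox:2": "Danio rerio"}
ncbi = {"ncbi:A": "Daphnia magna", "ncbi:B": "Daphnia pulex",
        "ncbi:C": "Danio rerio"}
print(string_match(ecotox, ncbi))
```
      </preformat>
      <p>The all-pairs loop is kept for readability only; at the scale of the ECOTOX and NCBI vocabularies some form of candidate pre-selection would presumably be needed.</p>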
      <p>Table 1 shows the alignment results over the ground truth samples. Note that the
results represent 1-to-1 alignments because, in our setting, an entity from
ECOTOX is expected to match a single entity in NCBI, and vice versa. Hence, 1-to-N
(respectively N-to-1) alignments were filtered according to the confidence computed by each system.
LogMap and AML produce mapping sets with similar recall and (estimated) precision,
with LogMap producing a larger number of mappings. The baseline matcher, as
expected, achieves both a lower recall and a lower (estimated) precision. This shows that a simple
string matching solution may not be enough in this setting. Table 1 also shows the
results of the consensus alignment between AML and LogMap and of the unions of different
mapping sets. Note that the lower recall of the union is due to the overconfidence of the
string distance method during 1-to-1 filtering.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Algorithm</th><th># mappings</th><th>Recall</th><th>Precision (*)</th></tr>
          </thead>
          <tbody>
            <tr><td>LogMap</td><td /><td /><td /></tr>
            <tr><td>AML</td><td /><td /><td /></tr>
            <tr><td>String distance (&gt; 0.8)</td><td /><td /><td /></tr>
            <tr><td>Union all</td><td /><td /><td /></tr>
            <tr><td>Consensus (LogMap ∩ AML)</td><td /><td /><td /></tr>
            <tr><td>LogMap ∪ AML</td><td /><td /><td /></tr>
          </tbody>
        </table>
      </table-wrap>
      <fn-group>
        <fn>
          <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
        </fn>
        <fn id="fn1">
          <label>1</label>
          <p>Knowledge Graphs at NIVA: https://github.com/NIVA-Knowledge-Graph/</p>
        </fn>
        <fn id="fn2">
          <label>2</label>
          <p>ECOTOX search interface: https://cfpub.epa.gov/ecotox/search.cfm</p>
        </fn>
      </fn-group>
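      <p>The 1-to-1 filtering, the recall computation and the consensus and union sets discussed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the greedy highest-confidence-first strategy, the function names, and the idea that confidences from different systems are directly comparable are not taken from the source, and the toy mappings are invented. The example is constructed so that an overconfident string-based mapping displaces a correct one in the union, which is one way the recall of a union can fall below that of its parts.</p>
      <preformat>
```python
from typing import Iterable, List, Set, Tuple

Mapping = Tuple[str, str, float]  # (ECOTOX id, NCBI id, confidence)


def one_to_one(mappings: Iterable[Mapping]) -> List[Mapping]:
    """Greedy 1-to-1 filter (assumed strategy): visit mappings from most to
    least confident and keep one only if both entities are still unused."""
    kept, used_source, used_target = [], set(), set()
    for src, tgt, conf in sorted(mappings, key=lambda m: m[2], reverse=True):
        if src not in used_source and tgt not in used_target:
            kept.append((src, tgt, conf))
            used_source.add(src)
            used_target.add(tgt)
    return kept


def pairs(mappings: Iterable[Mapping]) -> Set[Tuple[str, str]]:
    """Drop confidences, keeping only (source, target) pairs."""
    return {(src, tgt) for src, tgt, _ in mappings}


def recall(mappings: Iterable[Mapping],
           reference: Set[Tuple[str, str]]) -> float:
    """Fraction of reference pairs found in the mapping set."""
    return len(pairs(mappings).intersection(reference)) / len(reference)


def consensus(a: Iterable[Mapping],
              b: Iterable[Mapping]) -> Set[Tuple[str, str]]:
    """Pairs proposed by both systems."""
    return pairs(a).intersection(pairs(b))


def union_one_to_one(*mapping_sets: Iterable[Mapping]) -> List[Mapping]:
    """Merge mapping sets, keep the best confidence per pair, then filter."""
    best = {}
    for mapping_set in mapping_sets:
        for src, tgt, conf in mapping_set:
            best[(src, tgt)] = max(best.get((src, tgt), 0.0), conf)
    return one_to_one((s, t, c) for (s, t), c in best.items())


# Toy data, invented for illustration only.
reference = {("e1", "n1"), ("e2", "n2")}
logmap = [("e1", "n1", 0.95), ("e2", "n2", 0.90)]
string_based = [("e1", "n1", 1.0), ("e2", "n9", 0.97)]  # e2-n9 is wrong

print(recall(one_to_one(logmap), reference))
print(recall(union_one_to_one(logmap, string_based), reference))
print(consensus(logmap, string_based))
```
      </preformat>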
    </sec>
    <sec id="sec-2">
      <title>Conclusions</title>
      <p>
        The alignment techniques used achieve relatively good recall over the
available (incomplete) reference mappings. However, aligning such large and challenging
datasets required some preprocessing before ontology alignment systems could cope
with them. The preprocessing involved splitting NCBI into manageable fragments,
leading to a set of matching subtasks instead of a single task. Thus, the alignment of
ECOTOX and NCBI has the potential to become a new track of the Ontology Alignment
Evaluation Initiative (OAEI)<sup>3</sup> [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to push the limits of state-of-the-art systems. The
output of the different OAEI participants could be merged into a rich consensus
alignment that could become the reference to integrate ECOTOX and NCBI. At the same
time, as the alignment between ECOTOX and NCBI is neither public nor complete, the
consensus mappings could also be a very relevant resource for the ecotoxicology
community.
      </p>
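      <p>The idea of replacing one large matching task with a set of subtasks can be sketched in Python as below. The source does not say how NCBI was fragmented, so the fixed-size chunking, the exact-label matcher and all names here are hypothetical stand-ins chosen only to show the control flow: each fragment is matched against ECOTOX separately and the resulting mappings are merged.</p>
      <preformat>
```python
from typing import Callable, Dict, Iterator, List, Tuple

Mapping = Tuple[str, str, float]  # (source id, target id, confidence)
Matcher = Callable[[Dict[str, str], Dict[str, str]], List[Mapping]]


def fragments(entities: Dict[str, str], size: int) -> Iterator[Dict[str, str]]:
    """Yield consecutive chunks of at most `size` entities.

    Assumption: fixed-size chunks; the actual fragmentation is unspecified.
    """
    items = list(entities.items())
    for start in range(0, len(items), size):
        yield dict(items[start:start + size])


def match_in_subtasks(source: Dict[str, str],
                      target: Dict[str, str],
                      matcher: Matcher,
                      fragment_size: int) -> List[Mapping]:
    """Run the matcher once per target fragment and merge the outputs."""
    mappings: List[Mapping] = []
    for fragment in fragments(target, fragment_size):
        mappings.extend(matcher(source, fragment))
    return mappings


def exact_label_matcher(source: Dict[str, str],
                        target: Dict[str, str]) -> List[Mapping]:
    """Hypothetical stand-in for a real system: case-insensitive equality."""
    index = {label.lower(): t_id for t_id, label in target.items()}
    return [(s_id, index[label.lower()], 1.0)
            for s_id, label in source.items()
            if label.lower() in index]


# Toy vocabularies, invented for illustration only.
ecotox = {"e1": "Daphnia magna", "e2": "Danio rerio"}
ncbi = {"n1": "Homo sapiens", "n2": "Daphnia magna", "n3": "Mus musculus",
        "n4": "Danio rerio", "n5": "Lemna minor"}

result = match_in_subtasks(ecotox, ncbi, exact_label_matcher, fragment_size=2)
print(sorted(result))
```
      </preformat>
      <p>Because each subtask sees only part of the target vocabulary, merged outputs may contain competing mappings for the same entity and would still need a 1-to-1 filtering step such as the one described in the Introduction.</p>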
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Myklebust</surname>
            ,
            <given-names>E.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tollefsen</surname>
            ,
            <given-names>K.E.</given-names>
          </string-name>
          :
          <article-title>Knowledge Graph Embedding for Ecotoxicological Effect Prediction</article-title>
          . In: ISWC. (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. U.S. EPA:
          <article-title>ECOTOXicology knowledgebase</article-title>
          (ECOTOX) (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Large-scale interactive ontology matching: Algorithms and implementation</article-title>
          . In:
          <source>ECAI</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Levenshtein</surname>
            ,
            <given-names>V.I.</given-names>
          </string-name>
          :
          <article-title>Binary Codes Capable of Correcting Deletions, Insertions and Reversals</article-title>
          .
          <source>Soviet Physics Doklady</source>
          <volume>10</volume>
          (
          <year>1966</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Algergawy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Results of the Ontology Alignment Evaluation Initiative 2019</article-title>
          . In: 14th International Workshop on Ontology Matching. (
          <year>2019</year>
          )
          <fpage>46</fpage>
          -
          <lpage>85</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn id="fn3">
        <label>3</label>
        <p>OAEI: http://oaei.ontologymatching.org/</p>
      </fn>
    </fn-group>
  </back>
</article>