<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Results of AML in OAEI 2017</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Faria</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Booma Sowkarthiga Balasubramani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vivek R. Shivaprabhu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabela Mott</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Catia Pesquita</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco M. Couto</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabel F. Cruz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADVIS Lab, Department of Computer Science, University of Illinois at Chicago</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Instituto Gulbenkian de Ciência</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LaSIGE, Faculdade de Ciências, Universidade de Lisboa</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>State</institution>
          ,
          <addr-line>Purpose, General Statement</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>AgreementMakerLight (AML) is an automated ontology matching system that was developed with both extensibility and efficiency in mind. This paper describes its configuration for the OAEI 2017 competition and discusses its results. For this OAEI edition, we built upon the instance matching foundations we laid last year, and tackled the new Hobbit track and its new evaluation platform. AML was the only system to participate in all OAEI tracks this year, and was the top performing system or among the top performing ones in nearly all tracks, including the new Hobbit track. It was awarded the IBM Research prize for the best performing system in all instance matching related tracks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Presentation of the System</title>
      <p>
For the sake of brevity, this section focuses mainly on the features of AML that are new
for this edition of the OAEI. For a complete description of AML’s matching strategy,
please refer to the results papers of the last two OAEI editions [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>AML-Hobbit</title>
        <p>The Hobbit track datasets required profound adaptations to AML. First, although the
ontology files were included in the training sets, in the Hobbit client only the instances
were provided to the matching systems. This meant that the datasets could not be
correctly parsed using OWL API [8], and required us to create an N-Triples parser tailored
to these datasets (i.e., with the contextual information from the ontology files
hardcoded into the parser). Second, the unusual characteristics of the matching tasks, which
involve matching traces based on their geographical points, required that we implement
dedicated data structures and matching algorithms.</p>
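        <p>As a rough illustration of what a dataset-specific N-Triples parser can look like, the sketch below reads one statement per line with a regular expression and uses a hardcoded set of predicates in place of the missing ontology. It is written in Python for brevity and is not AML's parser; the pattern is simplified (no blank nodes, no escaped quotes), and the predicate IRI in <monospace>OBJECT_PROPERTIES</monospace> is a hypothetical placeholder, not part of the Hobbit vocabulary.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical predicate IRIs standing in for the context that would normally
# come from the ontology (Tbox) files; the real vocabulary is not shown here.
OBJECT_PROPERTIES = {"http://example.org/hasPoint"}

# One N-Triples statement: subject IRI, predicate IRI, object, final dot.
# Simplified: blank nodes and escaped quotes inside literals are not handled.
TRIPLE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"([^"]*)"(?:\^\^<[^>]*>|@[\w-]+)?)'
    r'\s*\.\s*$'
)


def parse_line(line):
    """Return (subject, predicate, object, is_literal), or None if not a triple."""
    m = TRIPLE.match(line)
    if m is None:
        return None
    subj, pred, obj_iri, obj_lit = m.groups()
    if obj_iri is not None:
        return subj, pred, obj_iri, False
    return subj, pred, obj_lit, True


def is_object_property(predicate):
    """Hardcoded stand-in for a lookup that would otherwise use the ontology."""
    return predicate in OBJECT_PROPERTIES
```
]]></preformat>
        <p>Hardcoding the vocabulary keeps parsing independent of the ontology files, at the cost of tying the parser to one family of datasets.</p>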
        <sec id="sec-2-1-1">
          <title>Linking</title>
          <p>The Linking task focused on finding equivalent traces by matching their geographical
points. The information available for points could include geographical coordinates,
address, timestamp, and velocity. The target dataset resulted from a transformation of
the source dataset, in which some information was omitted and some was altered. Of
particular note was the conversion of the geographical coordinates to different coordinate
systems. This required us to do the reverse conversion to the decimal system, which we
performed during parsing.</p>
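          <p>The coordinate systems involved are not named here. Purely as an illustration, and assuming one of them is the degrees/minutes/seconds notation with a hemisphere letter, the reverse conversion to signed decimal degrees could be sketched as follows (Python, hypothetical function name):</p>
          <preformat><![CDATA[
```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter (N, S, E, W)
    to signed decimal degrees. South and west map to negative values."""
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value
```
]]></preformat>
          <p>Normalizing every point to a single representation at parsing time means that the matching step can compare coordinates directly.</p>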
          <p>The main difficulty of the task was its size, as each trace included on average 2000
points, and the full task consisted of matching 10000 traces. An efficient matching
strategy was therefore paramount.</p>
          <p>To enable such a strategy, we adopted a HashMap-based data structure with inverted
indexes, analogous to AML’s other matching structures, but where geographical points
were used as keys. To this end, we defined a hash code for points based on the
combination of their coordinates. This made it possible to find matching points in O(1) time
and therefore match the trace datasets in O(n) time, with n being the total number of
points in the ontology with the fewest points. We used the address and timestamp of the
points to filter the matches, and found the velocity to be unnecessary.</p>
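          <p>The idea of an inverted index keyed on geographical points can be sketched as below. This is a minimal Python illustration under our own assumptions, not AML's Java implementation: points are (latitude, longitude) pairs, the key is the pair of coordinates rounded to a fixed precision (the <monospace>decimals</monospace> parameter is hypothetical), and the address and timestamp filters are omitted.</p>
          <preformat><![CDATA[
```python
from collections import defaultdict


def point_key(lat, lon, decimals=6):
    """Hashable key for a point: its coordinates rounded to a fixed precision."""
    return (round(lat, decimals), round(lon, decimals))


def build_index(traces):
    """Inverted index: point key -> set of ids of the traces containing it."""
    index = defaultdict(set)
    for trace_id, points in traces.items():
        for lat, lon in points:
            index[point_key(lat, lon)].add(trace_id)
    return index


def match_traces(source, target):
    """Count shared points for each (source trace, target trace) pair.

    The target is indexed once; each source point then costs one hash lookup,
    which takes constant time on average.
    """
    index = build_index(target)
    shared = defaultdict(int)
    for src_id, points in source.items():
        for lat, lon in points:
            for tgt_id in index.get(point_key(lat, lon), ()):
                shared[(src_id, tgt_id)] += 1
    return dict(shared)
```
]]></preformat>
          <p>Trace pairs can then be ranked by their number of shared points, so no pairwise comparison of traces is ever performed.</p>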
        </sec>
        <sec id="sec-2-1-2">
          <title>Spatial</title>
          <p>The Spatial tasks focused on determining whether traces were related according to a
number of different topological relations (e.g., contains, crosses, disjoint). In this case,
the traces were given as a list of coordinate pairs corresponding to their points, and no
transformation of the data was necessary.</p>
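          <p>To make the notion of a topological relation between traces concrete, the sketch below tests the disjoint relation for two traces treated as polylines, using the standard orientation-based segment intersection test. It is a pure-Python stand-in for illustration only and is not the geometry library used by AML; it assumes planar coordinates, at least two points per trace, and compares all segment pairs, so it is quadratic in the number of segments.</p>
          <preformat><![CDATA[
```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, c):
    """True if c, already known to be collinear with a and b, lies on segment ab."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 share at least one point."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(p1, p2, q1)) or
            (o2 == 0 and _on_segment(p1, p2, q2)) or
            (o3 == 0 and _on_segment(q1, q2, p1)) or
            (o4 == 0 and _on_segment(q1, q2, p2)))


def disjoint(trace_a, trace_b):
    """True if no segment of trace_a shares a point with a segment of trace_b."""
    segs_a = list(zip(trace_a, trace_a[1:]))
    segs_b = list(zip(trace_b, trace_b[1:]))
    return not any(segments_intersect(a1, a2, b1, b2)
                   for a1, a2 in segs_a for b1, b2 in segs_b)
```
]]></preformat>
          <p>A dedicated geometry library would normally add spatial indexing and cover the remaining relations (contains, crosses, and so on), which this sketch does not attempt.</p>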
          <p>To tackle these tasks, we adopted the ESRI Geometry API, which can be used for
constructing geometries and performing spatial operations and topological relationship
tests on them.</p>
          <p>Only a few changes were made to AML’s matching strategy for the SEALS tracks since
the OAEI 2016 edition [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
          </p>
        </sec>
        <sec id="sec-2-1-3">
          <title>Ontology Parser</title>
          <p>We made a few changes to AML’s ontology parser to cope with typical omissions in
instance matching datasets, such as undeclared properties. By default, the OWL API
interprets undeclared properties to be annotation properties, which leads to erroneous
parsing of the dataset, and hinders AML’s performance.</p>
          <p>We also modified the ontology parser to process OBO logical definitions
directly from OWL, as the new versions of the Disease and Phenotype track datasets
already included these definitions (last year they did not, and that required us to use
external files with the definitions).</p>
        </sec>
        <sec id="sec-2-1-4">
          <title>Translator</title>
          <p>We improved AML’s Translator by adding a translation to English of the input
ontologies in addition to the reciprocal translation we were already performing. This not only
increases the likelihood that a direct match can be found between ontology entities, but
also enables the use of WordNet [9].</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Adaptations made for the evaluation</title>
        <p>The Hobbit submission of AML is, as a whole, an adaptation made for the evaluation, as
the specificities of the Hobbit evaluation (namely the absence of a Tbox) and the tasks
(which are almost exclusively based on spatial coordinates) demanded a dedicated
submission.</p>
        <p>In addition, as in previous years, our SEALS submission included precomputed
translations, to circumvent Microsoft® Translator’s query limit.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Link to the system and parameters file</title>
        <p>AML is an open source ontology matching system and is available through GitHub:
https://github.com/AgreementMakerLight.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Anatomy</title>
        <p>AML’s result in the Anatomy track was the same as last year, with 95% precision, 93.6%
recall, 94.4% F-measure, and 83.2% recall++. It remains the best performing system in
this track.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Conference</title>
        <p>AML’s performance in the Conference track was also the same as last year. It remains
the best performing system in this track, with the highest F-measure on the full
reference alignment 1 (74%), the full reference alignment 2 (70%), and on both evaluation
modalities with the uncertain reference alignment (Discrete: 78%; Continuous: 77%).
Concerning the logical reasoning evaluation, AML had no consistency principle
violations, but did have conservativity principle violations. AML deliberately does
not take this aspect into account, given that many of these violations were empirically
found to be false positives.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Disease and Phenotype</title>
        <p>AML generated 2029 mappings in the HP-MP task, 75 of which were unique. It had the
highest F-measure according to the 2-vote silver standard, with 87.2%. In the HP-MeSH
task, it generated 5638 mappings, of which 678 were unique. It also had the highest
F-measure according to the 2-vote silver standard, with 87.1%. In the HP-OMIM task, it
generated 6681 mappings of which 679 were unique, and was third in F-measure with
87.8%. In the DOID-ORDO task, it generated the most mappings (4779) and the most
unique mappings (1520), and as a result had a relatively low F-measure according to
the 2-vote silver standard (66.1%).</p>
      </sec>
      <sec id="sec-3-4">
        <title>Hobbit</title>
        <p>AML produced a perfect result (100% F-measure) in Linking and all Spatial tasks, with
the sole exception of the Spatial disjoint mainbox task, where it timed out. In Linking,
it had the lowest run time in both the sandbox and mainbox modalities (the other
participant timed out in the mainbox task). In Spatial, it generally had the highest run time in
the sandbox modalities, but had the lowest run time in the mainbox modality of several
tasks, which suggests that it is more scalable than the other participants.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Instance Matching</title>
        <p>In the SPIMBENCH sub-track, AML obtained the second highest F-measure in the
sandbox modality (91.8%) and the highest F-measure in the mainbox modality (92.2%).
In the DOREMUS sub-track, AML’s results were underwhelming, with only 61.3%
F-measure in the Heterogeneities task and 58.2% F-measure in the False Positives Trap
task. These tasks were considerably more difficult than the homonym tasks of last year.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Interactive Matching</title>
        <p>AML’s performance was equivalent to last year’s, as we were unable to devote time to
addressing the issues we detected in its user interaction module. In the Anatomy dataset,
AML had the highest F-measure (95.8% with 0% errors), the second lowest number of
oracle requests, and the lowest impact of errors, with a drop in performance under 3%
between 0 and 30% errors. In the Conference dataset, it was second in F-measure with
0% errors, but first when errors were introduced (for all error rates). Despite this, it was
more impacted by errors than LogMap, due to the fact that it made considerably more
user interactions.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Large Biomedical Ontologies</title>
        <p>AML had the same results as last year in this track, except that the alignment it
produced for the SNOMED-NCI whole ontologies tasks had more unsatisfiabilities. This
is a consequence of the fact that this year we opted to switch off the use of the ELK
reasoner when parsing the ontologies, due to the SPIMBENCH ontologies being
inconsistent. Although AML’s ontology parser captures most of the subclass and equivalence
relationships identified by ELK (which is why there are only differences in this task),
it doesn’t capture all of them. AML obtained either the highest or the second
highest F-measure in all tasks, and had the highest average F-measure overall with 82.7%
(ignoring the XMAP results, since this system uses the UMLS metathesaurus as
background knowledge, which is the basis of the reference alignments).</p>
      </sec>
      <sec id="sec-3-8">
        <title>Multifarm</title>
        <p>AML improved its results in matching different ontologies, and remains the system
with the highest F-measure (46%). However, its performance in matching the same
ontologies decreased, and it has only the fourth best F-measure (26%). This decrease
was reportedly due to some errors in parsing the alignments for which a confidence
higher than 1 was generated, an issue which we will investigate and address.</p>
      </sec>
      <sec id="sec-3-9">
        <title>Process Model</title>
        <p>
          AML obtained the same result as last year in the University Admission dataset, with
70.2% F-measure. This remains the highest F-measure of all OAEI and PMMC [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
participants. In the new Birth Registration dataset, it obtained the highest F-measure among
OAEI participants (42.0%), but would rank only fifth among PMMC participants.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>General comments</title>
      <sec id="sec-4-1">
        <title>Comments on the results</title>
        <p>AML was the only system to participate in all tracks this year, and was either the best
performing or among the top performing systems in nearly all tasks, including the new
Hobbit track and the new datasets in the Process Model Matching and Disease and
Phenotype tracks. AML was also consistently among the fastest systems and among
those that produced the most coherent alignments. As was the case last year, these
results reflect our continued effort to extend and improve AML while ensuring that it
remains both effective and efficient.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Comments on the OAEI test cases</title>
        <p>While we welcome the efforts of the OAEI organizers to expand it with new datasets,
we must comment on some of the issues we encountered during this year’s competition,
and suggest some possible improvements for future editions.</p>
        <p>In the new Hobbit track, the late release of information on the submission
process and evaluation datasets hindered participation, understandable as that is for a
massive new venture such as the Hobbit evaluation platform. More importantly, the fact that
Tbox data was unavailable through the platform meant that participating systems had
to be trained specifically to interpret the Hobbit Abox data, which we feel violates the
spirit of the OAEI.</p>
        <p>We were also not fully satisfied with the evaluation of the Disease and Phenotype track.
Generating silver standards from the alignments produced by the participating systems
via voting is a reasonable starting point for producing a reference alignment, but they
should not be used as-is for evaluating matching systems, as the evaluation will be
unreliable and superficial. We hope that future efforts focus on improving the evaluation
prior to adding more datasets.</p>
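        <p>To illustrate the concern, a vote-based silver standard and the resulting scores can be sketched as follows. This is our own minimal Python illustration, not the evaluation code used by the track organizers: mappings are represented as (source, target) pairs, and a mapping enters the silver standard when at least <monospace>min_votes</monospace> systems propose it.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def silver_standard(alignments, min_votes=2):
    """Mappings proposed by at least min_votes of the given systems."""
    votes = Counter(m for mappings in alignments.values() for m in set(mappings))
    return {m for m, v in votes.items() if v >= min_votes}


def evaluate(found, reference):
    """Precision, recall and F-measure of a set of mappings against a reference."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```
]]></preformat>
        <p>Under this scheme, every mapping that only one system finds counts as a false positive for that system, whether or not it is correct, so a system with many unique mappings is penalized in precision by construction.</p>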
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In 2017, AML was the only system to participate in all tracks, and was among the best
performing systems in nearly all tasks (with the sole exception of the Instance Matching
DOREMUS sub-track). However, our efforts to participate in the new Hobbit track left
little time for making other improvements to AML, and as a result, its performance in
most tracks remained the same as last year. That said, our efforts were fully rewarded,
as AML was awarded the IBM Research prize for the best performing system in all
instance matching related tracks.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>DF was funded by the EC H2020 grant 676559 ELIXIR-EXCELERATE. CP and FMC
were funded by the Portuguese FCT through the LASIGE Strategic Project
(UID/CEC/00408/2013). CP was also funded by FCT (PTDC/EEI-ESS/4633/2014).
The research of IFC, BSB and VRS was partially funded by NSF awards CNS-1646395,
III-1618126, CCF-1331800, and III-1213013, and by a Bill &amp; Melinda Gates
Foundation Grand Challenges Explorations grant.</p>
      <p>7. D. Faria, C. Pesquita, E. Santos, M. Palmonari, I. F. Cruz, and F. M. Couto. The AgreementMakerLight Ontology Matching System. In OTM Conferences - ODBASE, pages 527–541, 2013.</p>
      <p>8. M. Horridge and S. Bechhofer. The OWL API: A Java API for OWL ontologies. Semantic Web, 2(1):11–21, 2011.</p>
      <p>9. G. A. Miller. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39–41, 1995.</p>
      <p>10. C. Pesquita, D. Faria, C. Stroe, E. Santos, I. F. Cruz, and F. M. Couto. What’s in a “nym”? Synonyms in Biomedical Ontology Matching. In International Semantic Web Conference (ISWC), pages 526–541, 2013.</p>
      <p>11. E. Santos, D. Faria, C. Pesquita, and F. M. Couto. Ontology alignment repair through modularization and confidence-based heuristics. PLoS ONE, 10(12):e0144807, 2015.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>G.</given-names>
            <surname>Antunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bakhshandeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Borbinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cardoso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dadashnia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Francescomarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fettke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          , et al.
          <article-title>The process model matching contest 2015</article-title>
          .
          In
          <source>6th EMISA Workshop</source>
          , pages
          <fpage>127</fpage>
          -
          <lpage>155</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Palandri Antonelli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Stroe</surname>
          </string-name>
          .
          <article-title>AgreementMaker: Efficient Matching for Large Real-World Schemas and Ontologies</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1586</fpage>
          -
          <lpage>1589</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stroe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Caimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fabiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          .
          <article-title>Using AgreementMaker to Align Ontologies for OAEI 2011</article-title>
          .
          In
          <source>ISWC International Workshop on Ontology Matching (OM)</source>
          , volume
          <volume>814</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          , pages
          <fpage>114</fpage>
          -
          <lpage>121</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nanavaty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Balasubramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taheri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          .
          <article-title>AML results for OAEI 2015</article-title>
          . In Ontology Matching Workshop. CEUR,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Balasubramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cardoso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Curado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          .
          <article-title>OAEI 2016 results of AML</article-title>
          . In Ontology Matching Workshop. CEUR,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          .
          <article-title>Automatic Background Knowledge Selection for Matching Biomedical Ontologies</article-title>
          .
          <source>PLoS One</source>
          ,
          <volume>9</volume>
          (
          <issue>11</issue>
          ):
          <fpage>e111226</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>