OAEI 2016 Results of AML

Daniel Faria¹, Catia Pesquita², Booma S. Balasubramani³, Catarina Martins²,
João Cardoso⁴, Hugo Curado², Francisco M. Couto², and Isabel F. Cruz³

¹ Instituto Gulbenkian de Ciência, Portugal
² LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
³ ADVIS Lab, Department of Computer Science, University of Illinois at Chicago, USA
⁴ INESC-ID, Instituto Superior Técnico, Universidade de Lisboa


         Abstract. AgreementMakerLight (AML) is an automated ontology matching
         system based primarily on element-level matching and on the use of external
         resources as background knowledge. This paper describes its configuration for
         the OAEI 2016 competition and discusses its results.
         For this OAEI edition, we tackled instance matching for the first time, thus
         expanding the coverage of AML to all types of ontology matching tasks. We also
         explored the use of OBO logical definitions to match ontologies, for the first
         time in the OAEI.
         AML was the top-performing system in five tracks (including the Instance and
         instance-based Process Model tracks) and one of the top-performing systems in
         three others (including the new Disease and Phenotype track, in which it was
         one of three prize recipients).


1      Presentation of the System
1.1     State, Purpose, General Statement
AgreementMakerLight (AML) is an automated ontology matching system derived from
AgreementMaker [3, 4] and designed to tackle large-scale matching problems [6]. It is
based primarily on lexical matching techniques, with an emphasis on the use of external
resources as background knowledge.
This year, our development of AML was focused primarily on tackling instance matching,
an aspect of ontology matching that was missing from its portfolio. However, we
also made several developments with regard to class matching, namely with the use of
OBO logical definitions.
For this OAEI edition, we also decided to use a configuration file for each track to
specify the parameters of the matching task (such as whether to match classes,
properties, and/or instances), rather than submit a preconfigured system. With this,
we aim to make our participation in the OAEI more transparent.

1.2     Specific Techniques Used
For the sake of brevity, this section describes only the features of AML that are new
for this edition of the OAEI. For a complete description of AML’s matching strategy,
please refer to last year’s OAEI results paper [5].

1.2.1 Ontology Data
To store data about ontology individuals, we expanded AML’s Lexicon and RelationshipMap
data structures [6] and created the new ValueMap. These data structures are currently
organized as follows:

 – The Lexicon of each Ontology stores the local names (unless they are alphanumeric
   codes), labels, and other lexical annotations of classes, individuals, and properties,
   after normalizing them.
 – The ValueMap of each Ontology stores all other annotations of individuals and their
   data property values.
 – The global RelationshipMap stores relations between classes, between individuals,
   and between properties, as well as the classes instantiated by individuals and
   property domains and ranges.
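The organization above can be pictured with a minimal Python sketch. It is not AML’s actual implementation: the class and method names, the normalization rule, and the pattern used to recognize alphanumeric codes are hypothetical assumptions made for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical test for "alphanumeric code" local names such as HP_0005815.
CODE_PATTERN = re.compile(r"^[A-Za-z]*[_:]?\d+$")


def normalize(name: str) -> str:
    """Hypothetical normalization: split camelCase and underscores, lowercase."""
    name = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    name = re.sub(r"[_\-]+", " ", name)
    return " ".join(name.lower().split())


@dataclass
class Lexicon:
    """One per ontology: normalized names of classes, individuals and properties."""
    names: dict = field(default_factory=lambda: defaultdict(set))

    def add_name(self, entity: str, name: str) -> None:
        self.names[entity].add(normalize(name))

    def add_local_name(self, entity: str, local_name: str) -> None:
        # Local names are only stored when they are not alphanumeric codes.
        if not CODE_PATTERN.match(local_name):
            self.add_name(entity, local_name)


@dataclass
class ValueMap:
    """One per ontology: other annotations and data property values of individuals."""
    values: dict = field(default_factory=lambda: defaultdict(lambda: defaultdict(set)))

    def add(self, individual: str, prop: str, value: str) -> None:
        self.values[individual][prop].add(value)


@dataclass
class RelationshipMap:
    """Global: typed edges, e.g. (individual, 'instanceOf', class)."""
    edges: set = field(default_factory=set)

    def add(self, source: str, relation: str, target: str) -> None:
        self.edges.add((source, relation, target))

    def classes_of(self, individual: str) -> set:
        return {t for s, r, t in self.edges if s == individual and r == "instanceOf"}
```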

1.2.2 Instance Matching
For instance matching, AML’s core strategy consists of three matching algorithms:

 – The HybridStringMatcher which matches two entities by computing the maximum
   of the string similarity, word similarity, and WordNet similarity between their
   Lexicon entries. It is the algorithm AML already used to match properties.
 – The ValueStringMatcher which matches two individuals by computing the maximum
   string similarity between their ValueMap entries, penalizing matches where
   the annotation or data property is not the same.
 – The Value2LexiconMatcher which employs the same combination of similarity
   metrics as the HybridStringMatcher, but compares Lexicon entries of one entity
   with ValueMap entries of the other and vice versa.
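A minimal sketch of the three matchers follows, under stated assumptions: `difflib.SequenceMatcher.ratio()` stands in for AML’s string similarity, the Jaccard overlap of word sets stands in for its word similarity, the WordNet similarity is omitted, and `PROPERTY_MISMATCH_PENALTY` is a hypothetical value. Lexicon entries are passed as sets of normalized names and ValueMap entries as dictionaries from property to values.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Dict, Iterable, Set

PROPERTY_MISMATCH_PENALTY = 0.8  # hypothetical penalty factor


def string_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def word_sim(a: str, b: str) -> float:
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def hybrid_sim(a: str, b: str) -> float:
    # WordNet similarity is omitted in this sketch.
    return max(string_sim(a, b), word_sim(a, b))


def hybrid_string_matcher(names1: Iterable[str], names2: Iterable[str]) -> float:
    """Maximum hybrid similarity over all pairs of Lexicon entries."""
    names2 = list(names2)
    return max((hybrid_sim(a, b) for a, b in product(names1, names2)), default=0.0)


def value_string_matcher(values1: Dict[str, Set[str]],
                         values2: Dict[str, Set[str]]) -> float:
    """Maximum string similarity between values, penalized if properties differ."""
    best = 0.0
    for (p1, vs1), (p2, vs2) in product(values1.items(), values2.items()):
        penalty = 1.0 if p1 == p2 else PROPERTY_MISMATCH_PENALTY
        for v1, v2 in product(vs1, vs2):
            best = max(best, penalty * string_sim(v1, v2))
    return best


def value2lexicon_matcher(names1: Set[str], values1: Dict[str, Set[str]],
                          names2: Set[str], values2: Dict[str, Set[str]]) -> float:
    """Lexicon entries of one entity against ValueMap entries of the other, both ways."""
    vals1 = [v for vs in values1.values() for v in vs]
    vals2 = [v for vs in values2.values() for v in vs]
    return max(hybrid_string_matcher(names1, vals2),
               hybrid_string_matcher(vals1, names2))
```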

AML’s similarity score is the maximum of the scores of these three algorithms, but it
uses a linear combination of the three scores to break ties when performing alignment
selection. AML deviates from this core strategy in three circumstances:

 – When the matching problem requires translation, in which case it employs the same
   matching strategy used for classes and properties when translation is involved.
 – When the ontologies have a high individual connectivity (indicating that there is a
   network or pipeline of individuals), in which case it employs the ProcessMatcher
   algorithm that was developed for matching business process models [2]. It
   combines string similarity with structural similarity.
 – When the fraction of individuals with exactly matching values in the ontologies is
   high (meaning that matches based on values have low significance), in which case
   it employs only the HybridStringMatcher.
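The score combination and the choice among strategies can be sketched as follows. The tie-break weights, the two thresholds, the order in which the three exceptions are checked, and the greedy one-to-one selection are all assumptions for illustration; the text above does not specify them.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (hybrid, value-string, value-to-lexicon)

# Hypothetical parameters: the text does not give AML's actual values.
TIEBREAK_WEIGHTS = (0.5, 0.3, 0.2)
CONNECTIVITY_THRESHOLD = 2.0          # mean links per individual
EXACT_VALUE_FRACTION_THRESHOLD = 0.5  # fraction of individuals with exact value matches


def rank_key(scores: Scores) -> Tuple[float, float]:
    """Rank by the maximum score; break ties with a linear combination of the three."""
    linear = sum(w * s for w, s in zip(TIEBREAK_WEIGHTS, scores))
    return (max(scores), linear)


def select(candidates: Dict[Tuple[str, str], Scores],
           threshold: float) -> List[Tuple[str, str, float]]:
    """Greedy one-to-one selection (an assumed selection scheme)."""
    ranked = sorted(candidates.items(), key=lambda kv: rank_key(kv[1]), reverse=True)
    used_src, used_tgt = set(), set()
    alignment = []
    for (src, tgt), scores in ranked:
        similarity = max(scores)
        if similarity < threshold:
            break  # candidates are sorted, so all remaining ones are lower
        if src in used_src or tgt in used_tgt:
            continue
        used_src.add(src)
        used_tgt.add(tgt)
        alignment.append((src, tgt, similarity))
    return alignment


def choose_strategy(needs_translation: bool,
                    links_per_individual: float,
                    exact_value_fraction: float) -> str:
    """Pick the matching strategy; the order of the checks is an assumption."""
    if needs_translation:
        return "translation"
    if links_per_individual > CONNECTIVITY_THRESHOLD:
        return "ProcessMatcher"
    if exact_value_fraction > EXACT_VALUE_FRACTION_THRESHOLD:
        return "HybridStringMatcher only"
    return "core"
```

With two candidates sharing the same maximum score, the one with the higher linear combination is selected first, which is the tie-breaking role described above.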

1.2.3 Exploring OBO Logical Definitions
OBO [12] logical definitions (or cross-products) provide definitions of ontology classes
by establishing intersections between other classes, typically from different ontologies.
For example, the logical definition of the Human Phenotype Ontology (HP) [10] class
HP:0005815 (“supernumerary ribs”) corresponds to an intersection of the Phenotypic
Quality Ontology (PATO) class PATO:0002002 (“has extra parts of type”) and the
Foundational Model of Anatomy (FMA) [11] class FMA:7574 (“rib”) via an ‘inheres in’
relation. We had previously developed a variant of AML for computing this type of
compound mapping [8].
For this year’s OAEI, we explored the use of these logical definitions to match ontologies
that contain them. Continuing the previous example, the Mammalian Phenotype Ontology
(MP) [13] contains the class MP:0000480 (“increased rib number”), which an English
speaker would readily recognize as corresponding to the HP class above. However, this
correspondence is very hard for lexical ontology matching algorithms to detect. Logical
definitions can help us find this mapping, as MP defines this class as an intersection of
the same class PATO:0002002 and the UBERON [7] class UBERON:0002228 (“rib”). As
UBERON has cross-references to FMA, we can automatically establish a correspondence
between UBERON:0002228 and FMA:7574, and thus find the mapping
HP:0005815 <=> MP:0000480.
Because the versions of HP and MP used in the OAEI did not include the logical definitions
in the ontology files (unlike the versions available at the OBO portal), we used an
external file containing these definitions as background knowledge.
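The reasoning in the HP/MP example can be sketched in a few lines of Python. The identifiers come from the text; representing a definition as a (quality, relation, entity) triple, using the relation label `inheres_in` for the MP definition (the text does not name that relation), and the function name are assumptions. This is an illustration, not AML’s implementation.

```python
from typing import Dict, Set, Tuple

# A logical definition as (quality class, relation, entity class).
Definition = Tuple[str, str, str]

hp_defs: Dict[str, Definition] = {"HP:0005815": ("PATO:0002002", "inheres_in", "FMA:7574")}
# Assumption: the MP definition uses the same 'inheres in' relation.
mp_defs: Dict[str, Definition] = {"MP:0000480": ("PATO:0002002", "inheres_in", "UBERON:0002228")}
# UBERON cross-references to FMA.
uberon_xrefs: Dict[str, Set[str]] = {"UBERON:0002228": {"FMA:7574"}}


def match_by_definitions(defs1: Dict[str, Definition],
                         defs2: Dict[str, Definition],
                         xrefs: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Map two classes when their definitions share the quality and relation and
    their entities are identical or linked by a cross-reference."""
    def same_entity(e1: str, e2: str) -> bool:
        return e1 == e2 or e1 in xrefs.get(e2, set()) or e2 in xrefs.get(e1, set())

    mappings = set()
    for c1, (q1, r1, e1) in defs1.items():
        for c2, (q2, r2, e2) in defs2.items():
            if q1 == q2 and r1 == r2 and same_entity(e1, e2):
                mappings.add((c1, c2))
    return mappings
```

Indexing the definitions by quality class would avoid the quadratic loop on full-size ontologies.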

1.2.4 Thesaurus Matching
For this year’s OAEI, we also employed a matching algorithm based on a thesaurus that
is automatically derived from the ontologies by comparing labels and synonyms of the
same classes, as described in a previous study [9]. We had not used this strategy in
previous OAEI editions because our original implementation was too broad and,
consequently, both too imprecise and too computationally inefficient. We addressed these
problems with a more restrictive implementation.
Currently, the algorithm infers synonyms to populate the thesaurus only when two Lexicon
entries for a class have the same number of words and all their words are equal except
for one, in which case the two differing words are inferred to be synonymous.
Additionally, the new Lexicon entries generated for classes using the thesaurus are now
used only to check for literal full-name matches, whereas previously they were also used
with string similarity algorithms.
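A minimal sketch of this restricted thesaurus inference, and of using the generated names for exact full-name matches only, is shown below. Lexicons are modeled as dictionaries from class to normalized names; the class identifiers and names in the check are hypothetical, and the code illustrates the rule described above rather than reproducing AML’s implementation.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

Lexicon = Dict[str, Set[str]]  # class -> normalized names


def infer_synonyms(lexicon: Lexicon) -> Dict[str, Set[str]]:
    """Infer word synonyms from names of the same class that have the same
    number of words and differ in exactly one word position."""
    thesaurus = defaultdict(set)
    for names in lexicon.values():
        for a, b in combinations(sorted(names), 2):
            wa, wb = a.split(), b.split()
            if len(wa) != len(wb):
                continue
            diffs = [(x, y) for x, y in zip(wa, wb) if x != y]
            if len(diffs) == 1:
                x, y = diffs[0]
                thesaurus[x].add(y)
                thesaurus[y].add(x)
    return thesaurus


def expand(name: str, thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Generate new names by replacing one word with an inferred synonym."""
    words = name.split()
    variants = set()
    for i, word in enumerate(words):
        for syn in thesaurus.get(word, ()):
            variants.add(" ".join(words[:i] + [syn] + words[i + 1:]))
    return variants


def exact_matches(lex1: Lexicon, lex2: Lexicon,
                  thesaurus: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Use generated names only for literal full-name matches."""
    index2 = defaultdict(set)
    for cls, names in lex2.items():
        for name in names:
            index2[name].add(cls)
    matches = set()
    for cls1, names in lex1.items():
        for name in names:
            for variant in expand(name, thesaurus):
                for cls2 in index2.get(variant, ()):
                    matches.add((cls1, cls2))
    return matches
```

For brevity the check below infers the thesaurus from a single Lexicon; since the thesaurus is derived from the ontologies being matched, both Lexicons would normally contribute.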


1.3   Adaptations made for the evaluation


The adaptations made for the evaluation were: the preprocessing of cross-references
from Uberon and DOID for use in the Anatomy and Large Biomedical Ontologies
tracks, due to namespace differences; the use of an external logical definitions file,
due to the absence of these in the versions of the ontologies used in the Disease and
Phenotype track; and the precomputing of translations, due to Microsoft® Translator’s
query limit.
1.4   Link to the system and parameters file
AML is an open source ontology matching system and is available through GitHub
(https://github.com/AgreementMakerLight) as an Eclipse project, as a stand-alone Jar
application, and as a package for running through the SEALS client.


2     Results
2.1   Anatomy
Thanks to the new thesaurus matching algorithm, AML improved both its recall and
recall+ to the highest results ever obtained in this track (93.6% and 83.2%, respectively).
However, its precision dropped by 0.6% and its F-measure by 0.1% in comparison with
last year. It remains the best-performing system in this track.

2.2   Benchmark
As in previous years, AML obtained a very high precision in this track (this year the
highest, at 100%) but a low recall (24%) and consequently a low F-measure as well
(38%). AML remains focused on matching real-world ontologies, and we have not
prioritized the Benchmark track.

2.3   Conference
AML’s performance in the Conference track was exactly the same as last year’s, as the
new developments do not affect this track. It remains the best-performing system overall
in this track, with the highest F-measure on full reference alignment 1 (74%), on full
reference alignment 2 (70%, tied with CroMatch), and on both evaluation modalities with
the uncertain reference alignment (Discrete: 78%; Continuous: 77%).
Concerning the logical reasoning evaluation, AML again had no consistency principle
violations, but it did have conservativity principle violations, which AML deliberately
does not take into account, given that many of them are false positives.

2.4   Disease and Phenotype
AML was considered one of the three top systems in the Disease and Phenotype track.
In the HP-MP task, it obtained F-measures of 86% and 89.7% according to the 2-vote
and 3-vote silver standards, respectively, and produced 122 unique mappings with
86.7% precision. In the DOID-ORDO task, it obtained F-measures of 90.8% and 87.5%
according to the 2-vote and 3-vote silver standards, respectively, and produced 308
unique mappings, with an estimated precision of 86.7%. AML’s performance in capturing
the manually created mappings was poorer (75.9% and 0% recall for HP-MP and
DOID-ORDO, respectively), since the majority of these mappings are subsumption
mappings and AML focuses on equivalence matching.
2.5   Instance Matching

In the Sabine sub-track, AML obtained the second highest F-measure in the Sabine
Linguistic task, with 91.8%, and the highest F-measure in the Sabine Linking task, with
88.9%.
In the Synthetic sub-track, AML obtained the highest F-measure in the UOBM mainbox
task, with 51.2%, and the second highest F-measure in the SPIMBENCH mainbox task,
with 81.6%. Interestingly, it ranked lower on the corresponding sandbox versions (second
in UOBM with 66.5%, and third in SPIMBENCH with 82%) and was the system that lost
the least performance between the sandbox and the mainbox tasks. Additionally, it is
important to mention that AML does not process or attempt to match individuals without
a class assignment; both the UOBM and SPIMBENCH ontologies contained a number of such
individuals that were supposed to be matched, which lowered AML’s scores.
In the Doremus sub-track, AML obtained the highest F-measure in all three tasks, with
91.8% in the 9 heterogeneities task, 84.8% in the larger 4 heterogeneities task, and
88.6% in the false positives trap task.
Overall, AML obtained the top F-measure in five of the seven Instance Matching tasks,
and second in the other two, making it overall the most successful instance matching
system in the OAEI 2016.


2.6   Interactive Matching

AML performed worse than last year in this track, due to changes to its user interface to
enable alignment revision, which affected the internal functioning of the interactive
matching algorithm. We were unable to completely solve this issue in time for the
evaluation. Nevertheless, in the Anatomy dataset, AML still had the highest F-measure
(95.8% with 0% errors), the lowest number of oracle requests, and the lowest impact of
errors, with a drop in performance under 3% between 0% and 30% errors. In the Conference
dataset, it was surpassed by Alin in F-measure and by LogMap in both the number of
requests and the impact of errors.


2.7   Large Biomedical Ontologies

As in the Anatomy track, the introduction of the thesaurus matching algorithm improved
AML’s recall on the Large Biomedical Ontologies track, and as a result AML had a higher
F-measure in all tasks than in previous years. Despite
this, it was surpassed in F-measure on the FMA-NCI small and FMA-SNOMED small
tasks, obtaining only the second-highest F-measure (ignoring the XMAP results, since
this system uses the UMLS metathesaurus as background knowledge, which is the basis
of the reference alignments). Nevertheless, it remains the best performing system able
to complete all the tasks of this track, and the one that produces the most coherent
alignments.
2.8   Multifarm
AML obtained the top F-measure when matching different ontologies, and the third best
when matching the same ontologies, due to lower recall. Despite not being a system
specifically targeting cross-lingual matching, AML achieves a good ranking in this track
by using a translation module.

2.9   Process Model
AML obtained the top F-measure result in this track, with 70.2%, surpassing not only
all other ontology matching systems, but also all process model matching systems from
last year’s process model matching competition [1].


3     General comments
3.1   Comments on the results
AML remained among the top-performing systems in nearly all preexisting tracks, while
also obtaining top results in the new tracks: Disease and Phenotype, in which it was
one of the prize winners; Process Model, in which it surpassed the results of
(non-ontology) process model matchers; and Instance Matching, with its all-new datasets.
It was also consistently among the fastest systems and among those that produced the
most coherent alignments. These results reflect our continued effort to extend AML to
cover all types of ontology matching tasks while ensuring that it remains both effective
and efficient.

3.2   Comments on the OAEI test cases
We welcome the efforts to expand the scope of the OAEI with new tracks and to improve
existing ones. We take this opportunity to highlight some issues we encountered during
this year’s competition and to suggest some possible improvements for future editions.
This year there were several issues with the test cases from the Instance Matching track:
there were encoding problems associated with the Sabine datasets; there were instances
without class assignments in the Synthetic and Doremus datasets, and in the case of the
former, some of these instances were supposed to be matched; and the target ontology
in the SPIMBENCH mainbox dataset was inconsistent. These are all issues that can be
found in real-world datasets, and both the developers and users of ontology matching
systems should be aware of them, but we believe that asking systems to handle such
specific issues involves a high level of manual work and tuning of the systems, making
their comparison less straightforward and transparent.
We also find that the evaluation in the Disease and Phenotype track still has room for
improvement. Generating silver standards from the alignments produced by the
participating systems via voting is a reasonable starting point for producing a reference
alignment, but an insightful evaluation would also require manually validating the silver
standards, as well as the unique mappings produced by each system. Since only the latter
manual evaluation was done, and for only up to 30 mappings, the results are distorted:
the evaluation includes wrong mappings (those that multiple systems get wrong) and
misses correct mappings (those that only one system finds). Additionally, we propose
that in future editions the versions of the HP and MP ontologies used in this track
include logical definitions, so that other systems can also exploit them.
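To make this concern concrete, a k-vote silver standard, the unique mappings of a system, and the usual precision, recall, and F-measure can be computed as below. Counting each system’s alignment as one vote, including that of the system being evaluated, is an assumption of this sketch, and the three-system data in the check is hypothetical: the mapping found only by the first system falls outside the 2-vote silver standard and therefore counts against its precision, whether or not it is correct.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Mapping = Tuple[str, str]


def silver_standard(alignments: Iterable[Set[Mapping]], min_votes: int) -> Set[Mapping]:
    """Keep mappings proposed by at least `min_votes` systems."""
    votes = Counter(m for alignment in alignments for m in set(alignment))
    return {m for m, v in votes.items() if v >= min_votes}


def unique_mappings(alignments: List[Set[Mapping]], index: int) -> Set[Mapping]:
    """Mappings found only by the system at position `index`."""
    others = set().union(*[a for i, a in enumerate(alignments) if i != index])
    return alignments[index] - others


def precision_recall_f(alignment: Set[Mapping],
                       reference: Set[Mapping]) -> Tuple[float, float, float]:
    tp = len(alignment & reference)
    p = tp / len(alignment) if alignment else 0.0
    r = tp / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```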


4   Conclusion
In 2016, AML was the top-performing system in five tracks (Anatomy, Conference,
Instance, Multifarm, and Process Model) and one of the top-performing systems in three
others (Disease and Phenotype, Interactive, and Large Biomedical Ontologies). It fully
met our goals and expectations for this year’s competition, and rewarded our investment
in instance matching (with top results in both Instance and Process Model) and our use
of logical definitions (with a prize in the Disease and Phenotype track).
Nevertheless, we note with enthusiasm the improvement of other matching systems in
tracks such as Anatomy, Conference, and Large Biomedical Ontologies. While in previous
years one could have concluded that ontology matching was stagnating, and that surpassing
the results of the top systems would be a tall order, the results of this year’s OAEI show
that this is not the case.


Acknowledgments
The authors are thankful to Daniela Oliveira (Insight Centre for Data Analytics, NUI
Galway, Ireland) for her support in the alignment of phenotype ontologies, and to André
Oliveira, Filipa Marques and Tânia Maldonado for their contribution to analyzing the
test cases and results. FMC, CM and CP were funded by the Portuguese FCT through
the LASIGE Strategic Project (UID/CEC/00408/2013). CP was also funded by FCT
(PTDC/EEI-ESS/4633/2014). The research of IFC and BS was partially supported by
a grant from the Bloomberg Philanthropies and by NSF awards CNS-1646395,
III-1618126, CCF-1331800, III-1213013, and IIS-1143926.


References
 1. G. Antunes, M. Bakhshandeh, J. Borbinha, J. Cardoso, S. Dadashnia, C. Francescomarino,
    M. Dragoni, P. Fettke, A. Gal, C. Ghidini, et al. The process model matching contest 2015.
    In 6th EMISA Workshop, pages 127–155, 2015.
 2. J. Cardoso, M. Bakhshandeh, D. Faria, C. Pesquita, and J. Borbinha. Ontology-Based
    Approach for Heterogeneity Analysis of EA Models. In Workshop on Business Process
    Management and Ontologies, 2016.
 3. I. F. Cruz, F. Palandri Antonelli, and C. Stroe. AgreementMaker: Efficient Matching for
    Large Real-World Schemas and Ontologies. PVLDB, 2(2):1586–1589, 2009.
 4. I. F. Cruz, C. Stroe, F. Caimi, A. Fabiani, C. Pesquita, F. M. Couto, and M. Palmonari. Using
    AgreementMaker to Align Ontologies for OAEI 2011. In ISWC International Workshop on
    Ontology Matching (OM), volume 814 of CEUR Workshop Proceedings, pages 114–121,
    2011.
 5. D. Faria, C. Martins, A. Nanavaty, D. Oliveira, B. S. Balasubramani, A. Taheri, C. Pesquita,
    F. M. Couto, and I. F. Cruz. AML results for OAEI 2015. In Ontology Matching Workshop.
    CEUR, 2015.
 6. D. Faria, C. Pesquita, E. Santos, M. Palmonari, I. F. Cruz, and F. M. Couto. The
    AgreementMakerLight Ontology Matching System. In OTM Conferences - ODBASE, pages
    527–541, 2013.
 7. C. J. Mungall, C. Torniai, G. V. Gkoutos, S. Lewis, and M. A. Haendel. Uberon, an
    Integrative Multi-species Anatomy Ontology. Genome Biology, 13(1):R5, 2012.
 8. D. Oliveira and C. Pesquita. Compound matching of biomedical ontologies. In International
    Conference on Biomedical Ontology, volume 1515. CEUR, 2015.
 9. C. Pesquita, D. Faria, C. Stroe, E. Santos, I. F. Cruz, and F. M. Couto. What’s in a “nym”?
    Synonyms in Biomedical Ontology Matching. In International Semantic Web Conference
    (ISWC), pages 526–541, 2013.
10. P. N. Robinson, S. Köhler, S. Bauer, D. Seelow, D. Horn, and S. Mundlos. The human
    phenotype ontology: a tool for annotating and analyzing human hereditary disease. The
    American Journal of Human Genetics, 83(5):610–615, 2008.
11. C. Rosse, J. L. Mejino Jr, et al. A reference ontology for biomedical informatics: the
    Foundational Model of Anatomy. Journal of Biomedical Informatics, 36(6):478–500, 2003.
12. B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. J. Goldberg, K. Eilbeck,
    A. Ireland, C. J. Mungall, et al. The OBO Foundry: coordinated evolution of ontologies to
    support biomedical data integration. Nature Biotechnology, 25(11):1251–1255, 2007.
13. C. L. Smith and J. T. Eppig. The mammalian phenotype ontology as a unifying standard for
    experimental and high-throughput phenotyping data. Mammalian Genome, 23(9-10):653–668,
    2012.