=Paper= {{Paper |id=Vol-2788/oaei20_paper3 |storemode=property |title=OAEI 2020 results for AML and AMLC |pdfUrl=https://ceur-ws.org/Vol-2788/oaei20_paper3.pdf |volume=Vol-2788 |authors=Beatriz Lima,Daniel Faria,Francisco M. Couto,Isabel Cruz,Catia Pesquita |dblpUrl=https://dblp.org/rec/conf/semweb/LimaFCCP20 }} ==OAEI 2020 results for AML and AMLC== https://ceur-ws.org/Vol-2788/oaei20_paper3.pdf
               OAEI 2020 results for AML and AMLC

                    Beatriz Lima¹, Daniel Faria², Francisco M. Couto¹,
                            Isabel F. Cruz³, and Catia Pesquita¹

          ¹ LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
          ² BioData.pt & INESC-ID, Lisboa, Portugal
          ³ ADVIS Lab, Department of Computer Science, University of Illinois at Chicago, USA




         Abstract. AgreementMakerLight (AML) is a scalable and extensible ontology
         matching system with an alignment repair functionality and a strong focus on the
         use of external knowledge. In OAEI 2020, AML’s development focused mainly
         on expanding its range of complex matching algorithms, but there were also
         improvements to its instance matching pipeline and to its ontology parsing
         algorithm. AML remains the system with the broadest coverage of OAEI tracks, and
         is among the top performing systems overall.


1      Presentation of the System
1.1     State, Purpose, General Statement
AgreementMakerLight (AML) is an ontology matching system inspired by AgreementMaker
[1, 2, 10] but designed anew to tackle the matching of very large ontologies
efficiently [7]. It is a general purpose system that is able to successfully tackle problems
across the whole spectrum of ontology matching, irrespective of their domain.
AML is primarily based on lexical matching algorithms [8], but also includes structural
algorithms for both matching and filtering, as well as its own logical repair
algorithm [9]. It is capable of using external background knowledge, and even automatically
selecting background knowledge sources for any given ontologies to match [6].
AMLC is a new version of AML developed to tackle complex ontology matching. At
this time, it remains separate from the main AML codebase and OAEI submission, but
we aim to merge the two versions in the near future.
This year, our development focused mainly on the implementation of pattern mining
ontology matching algorithms in AMLC, based on association rules and inspired by the
work of Zhou et al. [11]. As of our OAEI submission, AMLC included only variants
of these algorithms for detecting simple class and property mappings, but we are in the
process of implementing variants for complex mappings.
As has been the case in recent years, we also participated in the SPIMBENCH and Link
Discovery tracks via the HOBBIT platform. In the case of SPIMBENCH, we participated
with the HOBBIT adaptation of the main AML codebase. In the case of Link
Discovery, we participated with a specialized version of AML, AML-Spatial, due to
the unique characteristics of the matching tasks in this track and to the unavailability of
the TBox assertions in the HOBBIT datasets.

    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
    License Attribution 4.0 International (CC BY 4.0).

1.2   Specific Techniques Used
This section describes only the features of AML that are new for OAEI 2020. It also
describes AMLC, a variant of AML tailored to complex matching. For further information
on AML’s simple matching strategy, please consult AML’s original paper [7] as
well as the AML OAEI results publications of 2016-2018 [4, 3, 5].

    Our main development this year was a modular association rule mining framework
for ontology matching, inspired by the work of Zhou et al. [11]. This strategy resembles
market basket analysis: given a populated dataset, we take into account how frequently
two entities of different ontologies are related to common instances. Our framework
features a central association rule mining algorithm implementation
that selects patterns (i.e., mappings) based on their confidence and support, and a
suite of algorithms devoted to finding individual types of patterns and computing their
confidence and support from the set of instances. As of the OAEI submission we
had implemented only algorithms for detecting simple class and property mappings, but
we are in the process of implementing algorithms for each type of complex mapping.
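
As an illustration only, the support and confidence computation for simple class mappings could be sketched as follows. This is not AMLC's implementation: the function name, the input format (a dictionary from instances to the classes they are typed with), the thresholds, and the choice of keeping the weaker of the two rule directions as the mapping confidence are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product


def mine_class_mappings(instance_types, source_classes, target_classes,
                        min_support=2, min_confidence=0.6):
    """Mine simple class mappings as association rules over shared instances.

    instance_types maps each instance to the set of classes it is typed with,
    drawn from both ontologies (a hypothetical input format).
    """
    class_count = defaultdict(int)  # number of instances per class
    pair_count = defaultdict(int)   # instances shared by a (source, target) pair

    for classes in instance_types.values():
        src = classes & source_classes
        tgt = classes & target_classes
        for c in src | tgt:
            class_count[c] += 1
        for s, t in product(src, tgt):
            pair_count[(s, t)] += 1

    mappings = []
    for (s, t), support in pair_count.items():
        if support < min_support:
            continue
        # Assumption: an equivalence mapping needs the rule to hold in both
        # directions, so the weaker of the two confidences is kept.
        conf = min(support / class_count[s], support / class_count[t])
        if conf >= min_confidence:
            mappings.append((s, t, support, round(conf, 3)))
    # Highest confidence first, then highest support.
    return sorted(mappings, key=lambda m: (-m[3], -m[2], m[0], m[1]))


# Toy populated dataset with made-up class names.
instances = {
    "i1": {"src:Paper", "tgt:Article"},
    "i2": {"src:Paper", "tgt:Article"},
    "i3": {"src:Paper", "tgt:Article", "tgt:Document"},
    "i4": {"src:Author", "tgt:Person"},
    "i5": {"src:Author", "tgt:Person"},
    "i6": {"src:Review", "tgt:Document"},
}
source = {"src:Paper", "src:Author", "src:Review"}
target = {"tgt:Article", "tgt:Person", "tgt:Document"}

result = mine_class_mappings(instances, source, target)
for s, t, support, conf in result:
    print(f"{s} = {t} (support={support}, confidence={conf})")
```

In this toy dataset the candidate pairs with a single shared instance fall below the support threshold, so only the two well-supported pairs are returned.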

1.3   Adaptations Made for the Evaluation
As has been the case in recent OAEI editions, the Link Discovery submission of AML
is adapted to these particular tasks and datasets, as their specificities (namely the
absence of a TBox) demand a dedicated submission. The same is also true to some extent
of AML’s Complex Matching submission.
As usual, our submission included precomputed dictionaries with translations, to
circumvent Microsoft® Translator’s query limit.

1.4   Link to the System and Parameters File
AML is an open source ontology matching system and is available through GitHub:
https://github.com/AgreementMakerLight.

2     Results
AML’s OAEI 2020 results are summarized in Table 1 and discussed in the following
subsections.

2.1   Anatomy
AML had a 0.7% increase in precision and a 0.9% decrease in recall, resulting in a
0.2% decrease in F-measure, in comparison with its performance in recent years. These
differences are an unexpected consequence of minor changes in AML’s general
configuration.

Table 1: Summary of OAEI 2020 results for AML and AMLC.

Task                    Precision  Recall/Coverage  F-measure  Run time (s)  Rank¹
--- Anatomy ---
Mouse-Human             0.956      0.927            0.941      29            1
--- Biodiversity & Ecology ---
FLOPO-PTO               0.766      0.820            0.792      53.7          3
ENVO-SWEET              0.810      0.927            0.865      38.8          1
ANAEETHES-GEMET         0.976      0.764            0.857      4.2           3
AGROVOC-NALT            0.955      0.835            0.890      139.5         1a
--- Complex ---
Conference              0.31       0.37             0.34       -             1a
Populated Conference    0.23-0.51  0.26-0.31        N/A        -             N/A
Hydrography             0.45       0.05             0.10       -             1b
Geolink                 0.50       0.23             0.32       -             2
Populated Geolink       0.50       0.23             0.32       -             4
Populated Enslaved      0.73       0.28             0.40       -             1
Taxon                   0.19-0.40  0                N/A        -             N/A
--- Conference ---
OntoFarm (ra1-M3)       0.84       0.66             0.74       -             1
OntoFarm (ra2-M3)       0.82       0.61             0.70       -             1
OntoFarm (rar2-M3)      0.78       0.62             0.69       -             2
OntoFarm (Discrete)     0.79       0.77             0.78       -             1
OntoFarm (Continuous)   0.80       0.74             0.77       -             1
DBpedia-OntoFarm        0.48       0.67             0.56       -             1
--- Disease & Phenotype ---
HP-MP                   0.910      0.79             0.816      102           3
DOID-ORDO               0.682      0.834            0.750      200           2
--- Interactive Matching ---
Anatomy (error 0.0)     0.972      0.933            0.952      37.3          1
Anatomy (error 0.1)     0.962      0.929            0.945      37.5          1
Anatomy (error 0.2)     0.951      0.928            0.939      37.4          1
Anatomy (error 0.3)     0.942      0.924            0.933      37.2          1
Conference (error 0.0)  0.91       0.698            0.79       30.1          2
Conference (error 0.1)  0.843      0.682            0.754      30            1
Conference (error 0.2)  0.777      0.677            0.723      30.3          1
Conference (error 0.3)  0.721      0.65             0.684      30.5          1
--- Knowledge Graph ---
Aggregate (class)       0.98       0.81             0.89       -             1
Aggregate (property)    0.92       0.57             0.70       -             6
Aggregate (instance)    0.90       0.80             0.85       -             3b
Aggregate (all)         0.90       0.80             0.85       3055          3b
--- Large Biomedical Ontologies ---
FMA-NCI small           0.958      0.91             0.933      38            1
FMA-NCI whole           0.806      0.881            0.842      82            1
FMA-SNOMED small        0.923      0.762            0.835      101           1
FMA-SNOMED whole        0.685      0.710            0.697      181           3
SNOMED-NCI small        0.906      0.746            0.818      629           1
SNOMED-NCI whole        0.862      0.687            0.765      381           1
--- Link Discovery ---
Spatial (mainbox)       1.0        1.0              1.0        11172         1b
--- Multifarm ---
Different Ontologies    0.72       0.35             0.47       170           1
Same Ontologies         0.94       0.28             0.17       -             2
--- SPIMBENCH ---
SPIMBENCH (mainbox)     0.839      0.884            0.860      38772         4

¹ according to F-measure; a: only system with results; b: tied with other systems
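
For readers who wish to relate the three quality columns of Table 1, the F-measure is the harmonic mean of precision and recall. The short sketch below recomputes it for two rows of the table; the function name is ours, and since the official values are computed by the OAEI organizers from unrounded figures, a recomputation from the rounded table entries may differ in the last digit for other rows.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from Table 1.
anatomy = round(f_measure(0.956, 0.927), 3)     # Mouse-Human
envo_sweet = round(f_measure(0.810, 0.927), 3)  # ENVO-SWEET

print(anatomy, envo_sweet)
```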


2.2    Biodiversity and Ecology

AML improved its results on both the FLOPO-PTO and the ENVO-SWEET tasks in
comparison with last year. It was surpassed by two versions of LogMap on the
FLOPO-PTO task, but remained the best performing system in the ENVO-SWEET task.
With respect to the new tasks, AML ranked third in the ANAEETHES-GEMET task,
and was the only system able to produce results in the AGROVOC-NALT task.


2.3    Complex Matching

AMLC was one of three tools able to generate complex correspondences, and the only
tool able to produce results in the (non-populated) Conference task, which uses the
simple reference alignment as input. While its performance was among the best in most
tasks, it remains mediocre in comparison with its performance in simple matching tasks,
underscoring the fact that there is much room for improvement in complex ontology
matching.
Unfortunately, we were unable to finish implementing the suite of pattern mining
algorithms for complex ontology matching in time for this OAEI edition, which likely
would have improved AML’s performance substantially in populated complex tasks.


2.4    Conference

AML had the exact same results as in recent years, with F-measures of 74% according
to the full reference alignment (ra1), 70% according to the extended reference alignment
(ra2), 78% according to the discrete uncertain reference alignment, and 77% according
to the continuous one, ranking first in all four evaluation variants. It ranked second in
the evaluation with the violation free version of the extended reference alignment (rar2),
likely because AML’s repair algorithm deliberately does not address conservativity
violations, as we do not subscribe to conservativity as a guiding principle in ontology
matching.
AML was one of only five systems able to participate in a new unannounced task
consisting of matching DBpedia to the OntoFarm ontologies, and had the highest
F-measure among those five.

2.5    Disease and Phenotype
AML ranked third and second in F-measure in the HP-MP and DOID-ORDO tasks,
respectively. However, as has been the trend, AML was one of the systems with the
highest number of unique mappings (i.e., mappings not proposed by any other system).
Since the evaluation in this track is based on a 3-vote consensus alignment, rather than a
true reference alignment, and unique mappings are not otherwise assessed, this severely
affects AML’s evaluation, making its results below average in comparison with other
biomedical matching tasks.
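
The effect of a vote-based consensus on unique mappings can be illustrated with a small sketch. The system names, mapping labels and helper functions below are hypothetical and do not reproduce the track's actual evaluation code; the point is only that a mapping proposed by a single system can never reach three votes, so it counts against that system's precision whether or not it is correct.

```python
from collections import Counter


def consensus_alignment(system_alignments, min_votes=3):
    """Keep the mappings proposed by at least min_votes systems."""
    votes = Counter()
    for mappings in system_alignments.values():
        votes.update(set(mappings))
    return {m for m, n in votes.items() if n >= min_votes}


def precision_against(alignment, reference):
    """Fraction of an alignment's mappings that appear in the reference."""
    alignment = set(alignment)
    if not alignment:
        return 0.0
    return len(alignment & reference) / len(alignment)


# Hypothetical systems and mappings; "m4" is proposed by system A only.
alignments = {
    "A": {"m1", "m2", "m3", "m4"},
    "B": {"m1", "m2", "m3"},
    "C": {"m1", "m2", "m3"},
    "D": {"m1", "m2"},
}

consensus = consensus_alignment(alignments, min_votes=3)
precision_a = precision_against(alignments["A"], consensus)
print(sorted(consensus), precision_a)
```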

2.6    Interactive Matching
AML had a lower performance than last year in the Anatomy track, undoubtedly tied to
its change in performance in the non-interactive version of the track. Its results in the
Conference track remained the same. Overall it remains the interactive system that is
the least impacted by the oracle errors.
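
The error rates listed for this track in Table 1 refer to an oracle that answers validation requests incorrectly with a fixed probability. A minimal sketch of such an oracle is given below, assuming it is backed by a reference alignment and that each answer is drawn independently; the class name, the seeding and the independence assumption are ours and are not taken from the OAEI evaluation platform.

```python
import random


class NoisyOracle:
    """Simulated user that validates mappings with a fixed error rate."""

    def __init__(self, reference, error_rate, seed=0):
        self.reference = set(reference)
        self.error_rate = error_rate
        self.rng = random.Random(seed)  # seeded for reproducible runs

    def ask(self, mapping):
        truth = mapping in self.reference
        # With probability error_rate the oracle returns the wrong answer.
        if self.rng.random() < self.error_rate:
            return not truth
        return truth


# Hypothetical reference alignment with a single mapping.
reference = {("mouse:heart", "human:Heart")}
query = ("mouse:heart", "human:Heart")

perfect = NoisyOracle(reference, error_rate=0.0)
inverted = NoisyOracle(reference, error_rate=1.0)
noisy = NoisyOracle(reference, error_rate=0.3)

answers = [noisy.ask(query) for _ in range(1000)]
print(sum(answers) / len(answers))  # fraction of correct answers, roughly 0.7
```

A matching system that is robust to such an oracle has to avoid accepting or rejecting a mapping on a single answer alone, for instance by weighing the answer against its own similarity score.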

2.7    Large Biomedical Ontologies
AML’s performance in this track was similar to last year’s, but with slight increases in
F-measure across all tasks, likely due to the same changes that affected its performance
in the Anatomy track. It remains the best performing system in five out of the six tasks.

2.8    Knowledge Graph
In contrast to last year, AML was able to complete all five tasks in a timely
manner, achieving a global F-measure of 0.85, which ranked it third overall. It had the best
performance in matching classes.

2.9    Link Discovery
As in previous years, AML and all other participants produced a perfect result (100%
F-measure) in the Spatial track. AML had the highest overall run time among participating
systems, though not in every task.


2.10    Multifarm
AML’s results were slightly better than last year’s, with a 2% increase in F-measure
in the different ontologies modality and a 1% increase in the same ontologies
modality. These differences are due to correcting a minor configuration problem when using
AML’s word-matching algorithm in a multilingual setting.


2.11    SPIMBENCH

AML obtained the same results as last year, with an F-measure of 86%, which ranked
it fourth.



3      General Comments on the Results

In 2020, AML was once again the system that tackled the most OAEI tracks and
datasets, and maintained its status as one of the best performing and broadest matching
systems competing in the OAEI.
Nonetheless, there is still some work to be done in terms of complex matching, in order
to be able to provide more robust results. We will strive to refine and improve AML’s
complex matching pipeline, particularly by upgrading our association rule based
approach.


4      Conclusions

As in recent years, AML was the matching system that participated in the most OAEI
tracks and datasets, and it was among the top performing systems in most of them.
AML’s performance was very similar to that of recent years in all of the long-standing
OAEI tracks, as most of our development effort went into tackling new challenges, such
as pattern mining approaches for complex matching.
Complex matching remains one of the biggest challenges in ontology matching, and
will remain the main focus of AML’s development in the near future.


Acknowledgments

DF was funded by the Portuguese FCT Grant 22231 BioData.pt (co-financed by FEDER).
CP and BL are supported by FCT through project SMILAX (PTDC/EEI-ESS/4633/2014),
and the LASIGE Research Unit (UIDB/00408/2020 and UIDP/00408/2020).
    FMC was also funded by PTDC/CCI-BIO/28685/2017. The research of IFC was
partially funded by NSF award III-1618126 and by NIGMS-NIH award R01GM125943.



References

 1. I. F. Cruz, F. Palandri Antonelli, and C. Stroe. AgreementMaker: Efficient Matching for
    Large Real-World Schemas and Ontologies. PVLDB, 2(2):1586–1589, 2009.
 2. I. F. Cruz, C. Stroe, F. Caimi, A. Fabiani, C. Pesquita, F. M. Couto, and M. Palmonari. Using
    AgreementMaker to Align Ontologies for OAEI 2011. In ISWC International Workshop on
    Ontology Matching (OM), volume 814 of CEUR Workshop Proceedings, pages 114–121.
    CEUR-WS.org, 2011.
 3. D. Faria, B. S. Balasubramani, V. R. Shivaprabhu, I. Mott, C. Pesquita, F. M. Couto, and
    I. F. Cruz. Results of AML in OAEI 2017. In ISWC International Workshop on Ontology
    Matching (OM), volume 2032 of CEUR Workshop Proceedings, pages 122–128. CEUR-
    WS.org, 2017.
 4. D. Faria, C. Pesquita, B. S. Balasubramani, C. Martins, J. Cardoso, H. Curado, F. M. Couto,
    and I. F. Cruz. OAEI 2016 results of AML. In ISWC International Workshop on Ontology
    Matching (OM), volume 1766 of CEUR Workshop Proceedings, pages 138–145.
    CEUR-WS.org, 2016.
 5. D. Faria, C. Pesquita, B. S. Balasubramani, T. Tervo, D. Carriço, R. Garrilha, F. M. Couto,
    and I. F. Cruz. Results of AML Participation in OAEI 2018. In ISWC International Workshop
    on Ontology Matching (OM), volume 2288 of CEUR Workshop Proceedings, pages 125–131.
    CEUR-WS.org, 2018.
 6. D. Faria, C. Pesquita, E. Santos, I. F. Cruz, and F. M. Couto. Automatic Background
    Knowledge Selection for Matching Biomedical Ontologies. PLoS ONE, 9(11):e111226, 2014.
 7. D. Faria, C. Pesquita, E. Santos, M. Palmonari, I. F. Cruz, and F. M. Couto. The
    AgreementMakerLight Ontology Matching System. In OTM Conferences - ODBASE, pages 527–541,
    2013.
 8. C. Pesquita, D. Faria, C. Stroe, E. Santos, I. F. Cruz, and F. M. Couto. What’s in a "nym"?
    Synonyms in Biomedical Ontology Matching. In International Semantic Web Conference
    (ISWC), pages 526–541, 2013.
 9. E. Santos, D. Faria, C. Pesquita, and F. M. Couto. Ontology Alignment Repair Through
    Modularization and Confidence-based Heuristics. PLoS ONE, 10(12):e0144807, 2015.
10. W. Sunna and I. F. Cruz. In International Conference on GeoSpatial Semantics (GeoS),
    pages 82–97. Springer.
11. L. Zhou, M. Cheatham, and P. Hitzler. Towards Association Rule-Based Complex Ontology
    Alignment. In X. Wang, F. A. Lisi, G. Xiao, and E. Botoeva, editors, Semantic Technology,
    pages 287–303, Cham, 2020. Springer International Publishing.