AML and AMLC results for OAEI 2021

                  Daniel Faria, Beatriz Lima, Marta Contreiras Silva,
                        Francisco M. Couto, and Catia Pesquita

               LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal


        Abstract. AgreementMakerLight (AML) is an ontology matching system with
        a scalable and extensible framework that enables it to tackle a variety of ontology
        matching tasks. For the OAEI 2021, AML’s development focused exclusively on
        expanding its range of complex matching algorithms, which feature in its com-
        plex matching version, AMLC. AML remains one of the systems with the broad-
        est coverage of OAEI tracks and with the best overall performance.


1     Presentation of the System

1.1    State, Purpose, General Statement

AgreementMakerLight (AML) is an ontology matching system with a scalable and
extensible framework, inspired by AgreementMaker [1, 2, 11], but developed to tackle the
efficient matching of very large ontologies [7]. Despite an emphasis on the biomedical
domain, AML is a general-purpose ontology matching system capable of tackling
problems across the whole spectrum of ontology matching, in any domain.
    AML relies primarily on lexical matching algorithms [9], but also features structural
algorithms for both matching and filtering, as well as a logical repair algorithm [10]. It
makes extensive use of external sources of background knowledge, featuring an
automatic selection algorithm to identify the background knowledge sources that should be
used for any given matching task [6].
    AMLC is a version of AML developed for complex ontology matching, which remains
separate from the main AML codebase and OAEI submission, as tackling complex
ontology matching required refactoring most of the AML framework. This year, our work
focused exclusively on furthering the development of AMLC.
    As has been the case in recent years, we also participated in the SPIMBENCH and Link
Discovery tracks via the HOBBIT platform. In the case of SPIMBENCH, we
participated with the HOBBIT adaptation of the main AML codebase. In the case of Link
Discovery, we participated with a specialized version of AML, AML-Spatial, due to
the unique characteristics of the matching tasks in this track and to the unavailability of
the TBox assertions in the HOBBIT datasets.

    Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons
    License Attribution 4.0 International (CC BY 4.0).

1.2   Specific Techniques Used

This section describes only the features of AMLC that are new for OAEI 2021. For
further information on AML’s simple matching strategy, please consult AML’s original
paper [7] as well as the AML OAEI results publications of 2016-2018 [4, 3, 5].

    Our main development this year was the extension of our modular association rule
mining (ARM) framework for ontology matching, inspired by Zhou et al. [12], but
where the fact that the complex matching patterns are known a priori is exploited to
steer the ARM process, rather than just to filter the final results. Our framework
features a central ARM implementation that selects patterns (i.e., mappings) based on their
confidence and support, and a suite of algorithms devoted to finding individual types of
patterns and computing their confidence and support from among the set of shared
instances in a matching task. Whereas last year we had only implemented algorithms for
detecting simple class and property mappings, as of OAEI 2021, AMLC includes
algorithms for most types of complex patterns.
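
    To make the notions of support and confidence concrete, the following is a minimal
sketch for the simplest pattern type (a simple class mapping). It is not AMLC's
implementation: it assumes that each shared instance is described by the set of source
classes and the set of target classes it belongs to, and the function name, thresholds
and toy data are all hypothetical.

```python
from collections import Counter
from itertools import product


def mine_class_rules(instances, min_support=2, min_confidence=0.5):
    """Return rules (source_class, target_class, support, confidence).

    `instances` is an iterable of (source_classes, target_classes) pairs,
    one per shared instance. Support is the number of shared instances typed
    with both classes; confidence is that support divided by the number of
    shared instances typed with the source class.
    """
    source_counts = Counter()
    pair_counts = Counter()
    for source_classes, target_classes in instances:
        source_counts.update(set(source_classes))
        pair_counts.update(product(set(source_classes), set(target_classes)))

    rules = []
    for (src, tgt), support in pair_counts.items():
        confidence = support / source_counts[src]
        if support >= min_support and confidence >= min_confidence:
            rules.append((src, tgt, support, confidence))
    # Most confident rules first; ties broken by support, then by name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical toy data: three shared instances.
shared = [
    ({"src:Person"}, {"tgt:Agent"}),
    ({"src:Person"}, {"tgt:Agent"}),
    ({"src:Person"}, {"tgt:Organization"}),
]
rules = mine_class_rules(shared)
```

    In this toy example only the candidate src:Person / tgt:Agent passes both thresholds.
A pattern-specific algorithm for a complex mapping would differ in how it enumerates
candidates, but could report the same two measures to the central selection step.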


1.3   Adaptations Made for the Evaluation
As has been the case in recent OAEI editions, the Link Discovery submission of AML
is adapted to these particular tasks and datasets, as their specificities (namely the
absence of a TBox) demand a dedicated submission. The same is also true, to some extent,
of AML’s Complex Matching submission.
    As usual, our submission included precomputed dictionaries with translations, to
circumvent Microsoft® Translator’s query limit.
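
    The idea of shipping precomputed translations can be illustrated with a minimal
sketch of a dictionary-first lookup. This is not AML's code: the class, its methods and
the `fake_remote` stand-in for a rate-limited translation service are hypothetical, and
no real translation API is called.

```python
class TranslationCache:
    """Look labels up in a precomputed dictionary before using a remote service."""

    def __init__(self, dictionary, remote_translate=None):
        # Keys are (lowercased label, source language, target language).
        self.dictionary = dict(dictionary)
        self.remote_translate = remote_translate
        self.remote_calls = 0

    def translate(self, label, source_lang, target_lang):
        key = (label.lower(), source_lang, target_lang)
        if key in self.dictionary:
            return self.dictionary[key]
        if self.remote_translate is None:
            return None  # no precomputed entry and no fallback available
        self.remote_calls += 1
        translation = self.remote_translate(label, source_lang, target_lang)
        self.dictionary[key] = translation  # remember it for later queries
        return translation


def fake_remote(label, source_lang, target_lang):
    """Hypothetical stand-in for a rate-limited translation service."""
    return label.upper()


cache = TranslationCache(
    {("conferência", "pt", "en"): "conference"}, remote_translate=fake_remote
)
hit = cache.translate("Conferência", "pt", "en")  # served from the dictionary
miss = cache.translate("artigo", "pt", "en")      # one remote call
again = cache.translate("artigo", "pt", "en")     # now served from the dictionary
```

    With a dictionary that covers all labels in the evaluation datasets, the remote
service is never queried during a run.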


1.4   Link to the System and Parameters File
AML is an open source ontology matching system and is available through GitHub:
https://github.com/AgreementMakerLight.


2     Results
AML’s and AMLC’s OAEI 2021 results are summarized in Table 1. Results that were
different from OAEI 2020 [8] are discussed in the subsections below.


2.1   Biodiversity and Ecology
AML’s results were the same as last year for the ANAEETHES-GEMET task, but on
the ENVO-SWEET and AGROVOC-NALT tasks it had a lower precision but higher
recall than last year, which we can only ascribe to differences in the datasets, since
AML was unchanged. AML ranked first in all three tasks.
       Table 1: Summary of AML’s and AMLC’s OAEI 2021 results.

Task                     Precision  Recall/Coverage  F-measure  Run time (s)  Rank*
-- Anatomy --
Mouse-Human              0.956      0.927            0.941      29            1
-- Biodiversity & Ecology --
ENVO-SWEET               0.745      0.895            0.813      47            1
ANAEETHES-GEMET          0.976      0.764            0.839      21            1
AGROVOC-NALT             0.853      0.904            0.877      139.5         1a
-- Common Knowledge Graphs --
Mouse-Human              0.0        0.0              0.0        319           8
-- Complex --
Conference               0.31       0.37             0.34       -             1a
Hydrography              0.49       0.04             0.08       -             3
Geolink                  0.49       0.22             0.30       -             2
Populated Geolink        0.49       0.22             0.30       -             2
Populated Enslaved       0.46       0.12             0.18       -             3
-- Conference --
OntoFarm (rar2-M3)       0.78       0.62             0.69       -             1
OntoFarm (Discrete)      0.79       0.77             0.78       -             1
OntoFarm (Continuous)    0.80       0.74             0.77       -             1
DBpedia-OntoFarm         0.50       0.73             0.59       -             2
-- Disease & Phenotype --
HP-MP                    0.911      0.72             0.804      117           3
DOID-ORDO                0.691      0.833            0.755      231           1
-- Interactive Matching --
Anatomy (error 0.0)      0.972      0.933            0.952      16            1
Anatomy (error 0.1)      0.961      0.931            0.946      16            1
Anatomy (error 0.2)      0.952      0.927            0.939      16            1
Anatomy (error 0.3)      0.942      0.925            0.933      16            1
Conference (error 0.0)   0.91       0.698            0.79       21            2
Conference (error 0.1)   0.845      0.687            0.758      21            1
Conference (error 0.2)   0.777      0.665            0.717      21            1
Conference (error 0.3)   0.724      0.65             0.685      21            1
-- Knowledge Graph --
Aggregate (class)        0.98       0.81             0.89       -             1
Aggregate (property)     0.92       0.57             0.70       -             6
Aggregate (instance)     0.90       0.80             0.85       -             3b
Aggregate (all)          0.90       0.80             0.85       3026          3b
-- Large Biomedical Ontologies --
FMA-NCI small            0.958      0.91             0.933      44            1
FMA-NCI whole            0.806      0.881            0.842      92            1
FMA-SNOMED small         0.923      0.762            0.835      124           1
FMA-SNOMED whole         0.685      0.710            0.697      183           3
SNOMED-NCI small         0.906      0.746            0.818      1026          1
SNOMED-NCI whole         0.862      0.687            0.765      375           1
-- Link Discovery --
Spatial (mainbox)        1.0        1.0              1.0        -             1b
-- Material Sciences and Engineering --
-- MultiFarm --
MultiFarm                0.72       0.35             0.47       10200         1
-- SPIMBENCH --
SPIMBENCH (mainbox)      0.839      0.884            0.860      46517         2

* according to F-measure; a: only system with results; b: tied with other systems
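
    As a reading aid for Table 1, the sketch below recomputes the F-measure column from
the precision and recall columns, assuming the standard balanced F-measure (the harmonic
mean of precision and recall). The function name is ours. Because the table reports
rounded values, recomputed figures can differ from the published ones in the last digit
for some rows.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # no valid mappings at all
    return 2 * precision * recall / (precision + recall)


# Precision and recall values taken from Table 1.
anatomy = round(f_measure(0.956, 0.927), 3)     # Anatomy, Mouse-Human
envo_sweet = round(f_measure(0.745, 0.895), 3)  # Biodiversity & Ecology
common_kg = f_measure(0.0, 0.0)                 # Common Knowledge Graphs
```

    For these three rows the recomputed values agree with the table (0.941, 0.813 and 0.0).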

2.2    Common Knowledge Graphs
In this new track, AML only generated instance mappings, as its automatic configuration
detected this as an instance matching problem. However, the track is in fact a class
matching problem, so the instance mappings generated by AML were not considered
in the evaluation. As a result, its alignment had no valid mappings and an F-measure
of 0.

2.3    Complex Matching
AMLC’s results were either identical to or slightly worse than those of last year, indicating
that, despite our efforts, more work is needed on our complex matching algorithms.
Nevertheless, it is worth mentioning that AMLC remains the only system capable of
producing complex mappings in some of the complex tasks.

2.4    Conference
AML had the exact same results as in recent years on the OntoFarm suite of tasks,
ranking first in all evaluation modalities. It had a higher precision and recall than last
year on the DBpedia-OntoFarm suite (ranking second), which we can only ascribe to
changes in the dataset.

2.5    Disease and Phenotype
AML had similar results to those of recent years, with minor changes being attributed
to the fact that the evaluation is based on a silver standard, as there were no changes on
the side of AML.

2.6    Interactive Matching
AML had the same performance as last year in the tasks with an error rate of 0, as expected.
In the tasks with an error rate greater than 0, small differences can be observed between
the results of this year and last year, due to the stochastic nature of the Oracle’s errors,
which makes each evaluation run unique. The differences between the two years are small
because the evaluation is the average of 10 runs.
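
    The effect of a stochastic Oracle and of averaging over runs can be illustrated
with a small simulation. This is a sketch under our own assumptions, not the OAEI
evaluation code: the Oracle is modeled as flipping the correct answer with a probability
equal to the error rate, and the query set, function names and seed are hypothetical.

```python
import random
from statistics import mean


def noisy_oracle(is_correct, error_rate, rng):
    """Return the Oracle's reply, which is wrong with probability error_rate."""
    return (not is_correct) if rng.random() < error_rate else is_correct


def simulate_run(truth, error_rate, rng):
    """Fraction of queries that the Oracle answers correctly in one run."""
    replies = [noisy_oracle(t, error_rate, rng) for t in truth]
    return sum(r == t for r, t in zip(replies, truth)) / len(truth)


def averaged(truth, error_rate, runs=10, seed=0):
    """Average of the per-run fractions over several independent runs."""
    rng = random.Random(seed)
    return mean(simulate_run(truth, error_rate, rng) for _ in range(runs))


truth = [True] * 60 + [False] * 40  # hypothetical set of 100 queried mappings
exact = averaged(truth, 0.0)        # error rate 0: every run is identical
noisy = averaged(truth, 0.3)        # error rate 0.3: runs differ, the mean is stable
```

    With an error rate of 0 every run gives the same outcome, whereas with a positive
error rate individual runs vary and only their average is comparable across years.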
2.7   MultiFarm

AML’s results were slightly better than last year’s, with a 2% increase in F-measure
in the different ontologies modality and a 1% increase in the same ontologies modality.
These differences are due to correcting a minor configuration problem when using
AML’s word-matching algorithm in a multilingual setting.


3     Conclusions

In 2021, AML was once again one of the systems that successfully tackled most OAEI
tracks and datasets, as well as one of the best performing systems overall.
The fact that AML remains the best performing system in some of the oldest recurring
tracks, despite no new developments on our side, suggests that either there has been no
interest from the ontology matching community in tackling these challenges, or AML’s
results are sufficiently close to the best that can be achieved with an automated ontology
matching algorithm to make it difficult to surpass them.
    With regard to complex matching, AMLC featured an expanded suite of algorithms,
but nevertheless failed to improve upon last year’s results. Clearly more work is needed,
in what is the most challenging sub-field of ontology matching, to attain a performance
that would enable practical use of AMLC.


Acknowledgments

The authors dedicate this article to the memory of Isabel F. Cruz, whose guidance and
encouragement were instrumental to the development of AML and its continued success
throughout the years.
    This work was supported by FCT through the LASIGE Research Unit (UIDB/00408/2020
and UIDP/00408/2020). It was also partially supported by the KATY project,
which has received funding from the European Union’s Horizon 2020 research and
innovation program under grant agreement No 101017453.


References

 1. I. F. Cruz, F. Palandri Antonelli, and C. Stroe. AgreementMaker: Efficient Matching for
    Large Real-World Schemas and Ontologies. PVLDB, 2(2):1586–1589, 2009.
 2. I. F. Cruz, C. Stroe, F. Caimi, A. Fabiani, C. Pesquita, F. M. Couto, and M. Palmonari. Using
    AgreementMaker to Align Ontologies for OAEI 2011. In ISWC International Workshop on
    Ontology Matching (OM), volume 814 of CEUR Workshop Proceedings, pages 114–121.
    CEUR-WS.org, 2011.
 3. D. Faria, B. S. Balasubramani, V. R. Shivaprabhu, I. Mott, C. Pesquita, F. M. Couto, and
    I. F. Cruz. Results of AML in OAEI 2017. In ISWC International Workshop on Ontology
    Matching (OM), volume 2032 of CEUR Workshop Proceedings, pages 122–128. CEUR-
    WS.org, 2017.
 4. D. Faria, C. Pesquita, B. S. Balasubramani, C. Martins, J. Cardoso, H. Curado, F. M. Couto,
    and I. F. Cruz. OAEI 2016 results of AML. In ISWC International Workshop on Ontology
    Matching (OM), volume 1766 of CEUR Workshop Proceedings, pages 138–145.
    CEUR-WS.org, 2016.
 5. D. Faria, C. Pesquita, B. S. Balasubramani, T. Tervo, D. Carriço, R. Garrilha, F. M. Couto,
    and I. F. Cruz. Results of AML Participation in OAEI 2018. In ISWC International Workshop
    on Ontology Matching (OM), volume 2288 of CEUR Workshop Proceedings, pages 125–131.
    CEUR-WS.org, 2018.
 6. D. Faria, C. Pesquita, E. Santos, I. F. Cruz, and F. M. Couto. Automatic Background
    Knowledge Selection for Matching Biomedical Ontologies. PLoS ONE, 9(11):e111226, 2014.
 7. D. Faria, C. Pesquita, E. Santos, M. Palmonari, I. F. Cruz, and F. M. Couto. The Agreement-
    MakerLight Ontology Matching System. In OTM Conferences - ODBASE, pages 527–541,
    2013.
 8. B. Lima, D. Faria, F. M. Couto, I. F. Cruz, and C. Pesquita. OAEI 2020 results for AML and
    AMLC. In ISWC International Workshop on Ontology Matching (OM), volume 2788 of CEUR
    Workshop Proceedings, 2020.
 9. C. Pesquita, D. Faria, C. Stroe, E. Santos, I. F. Cruz, and F. M. Couto. What’s in a “nym”?
    Synonyms in Biomedical Ontology Matching. In International Semantic Web Conference
    (ISWC), pages 526–541, 2013.
10. E. Santos, D. Faria, C. Pesquita, and F. M. Couto. Ontology Alignment Repair Through
    Modularization and Confidence-based Heuristics. PLoS ONE, 10(12):e0144807, 2015.
11. W. Sunna and I. F. Cruz. Structure-Based Methods to Enhance Geospatial Ontology Align-
    ment. In International Conference on GeoSpatial Semantics (GeoS), volume 4853 of Lecture
    Notes in Computer Science (LNCS), pages 82–97. Springer, 2007.
12. L. Zhou, M. Cheatham, and P. Hitzler. Towards Association Rule-Based Complex Ontology
    Alignment. In X. Wang, F. A. Lisi, G. Xiao, and E. Botoeva, editors, Semantic Technology,
    pages 287–303, Cham, 2020. Springer International Publishing.