The First Version of the OAEI Complex Alignment Benchmark

Elodie Thiéblin¹, Michelle Cheatham², Cassia Trojahn¹, Ondřej Zamazal³ and Lu Zhou²

¹ IRIT & Université de Toulouse 2 Jean Jaurès, Toulouse, France
² Wright State University, Dayton, USA
³ University of Economics, Prague, Czech Republic

elodie.thieblin@irit.fr, michelle.cheatham@gmail.com,
cassia.trojahn@irit.fr, ondrej.zamazal@vse.cz, zhou.34@wright.edu


          Abstract. We present the first version of the complex benchmark of the
          Ontology Alignment Evaluation Initiative campaigns. This benchmark is
          composed of four datasets from different domains (conference, hydrology,
          geoscience and agronomy) and covers different evaluation strategies.


Keywords: complex ontology alignments, evaluation dataset, OAEI


1       Introduction

Complex correspondences involve transformation functions of literal values or
logical constructors (e.g., ∀x, ekaw:AcceptedPaper(x) ≡ ∃y, cmt:acceptedBy(x,y)),
which make them more expressive than simple correspondences. Complex alignments,
which contain at least one complex correspondence, therefore complement simple
alignments. Different approaches for complex matching have emerged in the
literature [2,4,5,8]. Most of them, however, have been evaluated on tailored
datasets (e.g., targeting a specific correspondence pattern). Most efforts on
systematic evaluation, in the context of the OAEI campaigns¹, are still
dedicated to simple matchers.
    This paper presents the first version of the OAEI complex track, composed of
four datasets (Table 1) from different domains. This variety of domains and
correspondences gives better coverage of the different kinds of heterogeneity
between ontologies, and the different evaluation strategies assess complex
matchers from different perspectives. The evaluation will be supported by the
SEALS platform, and the output alignments must be in EDOAL. The details of each
dataset and evaluation process can be found on the OAEI 2018 complex track
webpage² and are introduced in the following sections.
    ¹ http://oaei.ontologymatching.org/
    ² http://oaei.ontologymatching.org/2018/complex/index.html

Dataset               Ontologies  (1:1)  (1:n)  (m:n)
Conference consensus           3     78     79      0
Hydrography                    4    113     69     15
GeoLink                        2     24     15     72
Taxon                          4      6     17      3
Table 1. Number of ontologies and correspondences by kind in each dataset.
(1:1) are simple correspondences, (1:n) and (m:n) are complex correspondences.


2   Conference consensual dataset

This dataset is based on the OntoFarm dataset [9], which is composed of 16
ontologies on the conference organisation domain and simple reference alignments
between 7 of them. Here, we consider 3 of these 7 ontologies (cmt, conference
and ekaw), resulting in 3 alignment pairs. The alignments involve both logical
constructors (76 correspondences) and transformations (3 correspondences).
Examples are given below:
 1. ∀x, ekaw:AcceptedPaper(x) ≡ ∃y, cmt:acceptedBy(x,y) is a correspondence
    with the existential constructor.
 2. ∀x,y, cmt:name(x,y) ≡ ∃y1,y2, conference:has_the_first_name(x,y1) ∧
    conference:has_the_last_name(x,y2) ∧ concatenation(y,y1," ",y2), where
    concatenation(a,b1,b2,...) is a predicate ensuring that its first parameter
    a is equal to the string concatenation of the others {b1,b2,...}. It uses a
    transformation function of the literal values.
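
The concatenation predicate used in the second example can be illustrated with a minimal Python sketch. The function name mirrors the predicate, but the implementation and the sample literal values are illustrative assumptions and are not part of the benchmark.

```python
def concatenation(a, *parts):
    """Return True when `a` equals the string concatenation of `parts`."""
    return a == "".join(parts)


# Hypothetical literal values attached to one individual x.
first_name = "Ada"          # y1 in conference:has_the_first_name(x, y1)
last_name = "Lovelace"      # y2 in conference:has_the_last_name(x, y2)
full_name = "Ada Lovelace"  # y in cmt:name(x, y)

# The correspondence holds for x when the predicate is satisfied.
holds = concatenation(full_name, first_name, " ", last_name)
```

Under these assumed values the predicate is satisfied, so the cmt:name value is consistent with the two conference name properties.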
    The alignments were manually created by three experts in the domain,
following the methodology in [7]. Four experts then assessed the generated
correspondences to reach a consensus. The output alignments of the systems will
be manually evaluated to produce precision and recall scores. Only the complex
equivalence correspondences will be assessed. The systems can use a simple
reference alignment as input. Confidence scores of correspondences will not be
taken into account in the evaluation.


3   Hydrography dataset

The hydrography dataset is composed of 4 source ontologies (Hydro3,
hydrOntology_native, hydrOntology_translated and Cree), each of which should be
aligned to a single target ontology, the Surface Water Ontology (swo). The
source ontologies vary in their similarity to the target ontology: Hydro3 is
similar in both language and structure, hydrOntology_native and
hydrOntology_translated are similar in structure but hydrOntology_translated is
in Spanish rather than English, and Cree is very different in terms of both
language and structure. The alignments were created by a geologist and an
ontologist, in consultation with a native Spanish speaker regarding
hydrOntology_translated, and consist of logical relations such as the one shown
below.
 1. ∀x, hydrOntology_translated:Aguas_Corrientes(x) ≤ swo:SurfaceFeature(x)
     ∧ swo:Waterbody(x) ∧ ∃y, swo:hasFlow(x,y) ∧ swo:Flow(y)
Performance on this dataset will be evaluated on three sub-tasks: 1) identifying
the atoms (classes and properties) from the target ontology involved in the
relations (e.g., swo:SurfaceFeature, swo:Waterbody, swo:hasFlow and swo:Flow
from the correspondence above), 2) when given the atoms, identifying the logical
relations that hold between them, and 3) the full complex alignment task.
Evaluation of the first sub-task will use the traditional F-measure, while the
remaining two sub-tasks will be evaluated with the semantic F-measure [1].
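
As a rough illustration of the first sub-task, the sketch below computes precision, recall and F-measure over sets of atoms. It assumes that the "traditional F-measure" is the balanced harmonic mean of precision and recall (F1), which the track description does not state explicitly. The predicted atoms, including swo:River, are hypothetical system output.

```python
def precision_recall_f1(predicted, reference):
    """Set-based precision, recall and balanced F-measure (F1)."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Reference atoms taken from the correspondence shown above.
reference_atoms = {"swo:SurfaceFeature", "swo:Waterbody", "swo:hasFlow", "swo:Flow"}
# Hypothetical output of a matcher; swo:River is an assumed, incorrect atom.
predicted_atoms = {"swo:SurfaceFeature", "swo:Waterbody", "swo:River"}

precision, recall, f1 = precision_recall_f1(predicted_atoms, reference_atoms)
```

Here two of the three predicted atoms are correct and two of the four reference atoms are found, giving a precision of 2/3, a recall of 1/2 and an F-measure of 4/7. The semantic F-measure of [1] used for the other two sub-tasks relies on logical entailment and is not covered by this sketch.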


4       GeoLink dataset
This dataset is from the GeoLink project³, which was funded under the U.S.
National Science Foundation’s EarthCube initiative. It is composed of 2 populated
ontologies: the GeoLink base ontology (gbo) and the GeoLink modular ontology
(gmo). The GeoLink project is a real-world use case of ontologies. The alignment
between the ontologies was developed in consultation with domain experts from
several geoscience research institutions. The complex correspondences include
not only class and property subsumption and property chains (described in [5]),
but also some that involve typecasting (cf. [3]), for example:
 1. Property Chain: ∀x,z, gbo:Award(x) ∧ gbo:hasSponsor(x,z) ≡
    ∃y, gmo:FundingAward(x) ∧ gmo:providesAgentRole(x,y) ∧
    gmo:SponsorRole(y) ∧ gmo:performedBy(y,z)
 2. Class Typecasting: ∀x, gbo:PlaceType(x) ≡ rdfs:subClassOf(x, gmo:Place)
More information about this dataset can be found in [10], and the benchmark
and alignment can be downloaded online⁴. The performance of alignment systems
on this dataset will be evaluated in the same way as for the hydrography dataset.


5       Taxon dataset
This dataset is composed of 4 populated ontologies whose common scope is plant
taxonomy: AgronomicTaxon (agtx), Agrovoc (agv and agronto), DBpedia (dbo)
and TaxRef-LD (txr). This dataset extends the one proposed in [6] by adding
the TaxRef-LD ontology. The alignments were manually created with the help
of one expert and involve only logical constructors, for example:
 1. ∀x, agtx:GenusRank(x) ≡ agronto:hasTaxonomicRank(x,agv:c_11125)
 2. ∀x, agtx:GenusRank(x) ≡ ∃y, dbo:Species(y) ∧ dbo:genus(y,x) ∧ dbo:Species(x)
The evaluation of this dataset is task-oriented. We will evaluate the generated
correspondences using a SPARQL query rewriting system and manually measure
their ability to answer a set of queries over each dataset. For example, a
competency question could be “Retrieve all the genus taxa”. With AgronomicTaxon
as source ontology, the corresponding SPARQL query is SELECT ?x
WHERE {?x a agtx:GenusRank.} and the correspondences output by the systems
with Agrovoc as target ontology should be able to translate the query
into: SELECT ?x WHERE {?x agronto:hasTaxonomicRank agv:c_11125.}
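
The rewriting step of this example can be sketched in Python as a lookup over triple patterns. This is a toy illustration under strong assumptions (one correspondence, exact matching of the source pattern, shared variable name ?x) and is not the query rewriting system used in the track. All function names are hypothetical.

```python
# A correspondence is modelled as a mapping from a source triple pattern
# to a target triple pattern; both are (subject, predicate, object) tuples.
CORRESPONDENCES = {
    ("?x", "a", "agtx:GenusRank"):
        ("?x", "agronto:hasTaxonomicRank", "agv:c_11125"),
}


def rewrite_pattern(pattern, correspondences):
    """Return the target pattern for `pattern`, or `pattern` if none matches."""
    return correspondences.get(pattern, pattern)


def build_select(variable, patterns):
    """Serialise triple patterns into a single-line SPARQL SELECT query."""
    body = " ".join(f"{s} {p} {o}." for s, p, o in patterns)
    return f"SELECT {variable} WHERE {{{body}}}"


source_patterns = [("?x", "a", "agtx:GenusRank")]
target_patterns = [rewrite_pattern(p, CORRESPONDENCES) for p in source_patterns]

source_query = build_select("?x", source_patterns)
target_query = build_select("?x", target_patterns)
```

With the single correspondence above, source_query and target_query reproduce the two queries given in the text. A realistic rewriter would also need to unify variables and handle correspondences whose target is a graph pattern with several triples.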
    ³ https://www.geolink.org/
    ⁴ http://doi.org/10.6084/m9.figshare.5907172

6    Conclusions

This paper has presented the first OAEI complex evaluation track, covering
different kinds of complex correspondences, domains and evaluation strategies.
For most datasets, the evaluation is still performed manually, which leaves open
the question of how complex alignments can be automatically generated and
evaluated.

Acknowledgements. We thank Catherine Roussey (IRSTEA) and Nathalie
Hernandez (IRIT) for their help on the Taxon dataset and Dalia Varanka (US
Geological Survey) for her work on the hydrography dataset. Ondřej Zamazal
has been partially supported by the CSF grant no. 18-23964S. Creation of the
GeoLink dataset was funded by NSF 1440202.


References
 1. Euzenat, J.: Semantic precision and recall for ontology alignment evaluation. In:
    IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial
    Intelligence, Hyderabad, India, January 6-12, 2007. pp. 348–353 (2007)
 2. Jiang, S., Lowd, D., Kafle, S., Dou, D.: Ontology matching with knowledge rules.
    In: Transactions on Large-Scale Data- and Knowledge-Centered Systems, pp. 75–95
    (2016)
 3. Krisnadhi, A.A., Hitzler, P., Janowicz, K.: On the capabilities and limitations
    of OWL regarding typecasting and ontology design pattern views. In: Ontology
    Engineering - 12th International Experiences and Directions Workshop on OWL,
    OWLED 2015, co-located with ISWC 2015, Bethlehem, PA, USA, October 9-10,
    2015, Revised Selected Papers. pp. 105–116 (2015)
 4. Parundekar, R., Knoblock, C.A., Ambite, J.L.: Discovering concept coverings in
    ontologies of linked data sources. In: ISWC. pp. 427–443 (2012)
 5. Ritze, D., Meilicke, C., Šváb Zamazal, O., Stuckenschmidt, H.: A pattern-based
    ontology matching approach for detecting complex correspondences. In: 4th OM
    workshop. pp. 25–36 (2009)
 6. Thiéblin, E., Amarger, F., Hernandez, N., Roussey, C., Trojahn, C.: Cross-querying
    LOD datasets using complex alignments: an application to agronomic taxa. In:
    MTSR. pp. 25–37 (2017)
 7. Thiéblin, E., Haemmerlé, O., Hernandez, N., Trojahn, C.: Task-oriented complex
    ontology alignment – two alignment evaluation sets. In: ESWC (2018), (to appear)
 8. Walshe, B., Brennan, R., O’Sullivan, D.: Bayes-ReCCE: A Bayesian model for
    detecting restriction class correspondences in linked open data knowledge bases.
    International Journal on Semantic Web and Information Systems 12(2), 25–52 (2016)
 9. Zamazal, O., Svátek, V.: The Ten-Year OntoFarm and its Fertilization within the
    Onto-Sphere. Journal of Web Semantics 43, 46–53 (2017)
10. Zhou, L., Cheatham, M., Krisnadhi, A., Hitzler, P.: A complex alignment
    benchmark: GeoLink dataset. In: ISWC. Springer (2018), (to appear)