=Paper= {{Paper |id=Vol-3063/om2021_Tpaper5 |storemode=property |title=Challenges of evaluating complex alignments |pdfUrl=https://ceur-ws.org/Vol-3063/om2021_LTpaper5.pdf |volume=Vol-3063 |authors=Beatriz Lima,Daniel Faria,Catia Pesquita |dblpUrl=https://dblp.org/rec/conf/semweb/LimaFP21a }} ==Challenges of evaluating complex alignments== https://ceur-ws.org/Vol-3063/om2021_LTpaper5.pdf
     Challenges of evaluating complex alignments

                  Beatriz Lima, Daniel Faria, and Catia Pesquita

      LASIGE, Dep. Informática, Fac. Ciências, Universidade de Lisboa, Portugal




        Abstract. The evaluation of complex ontology alignments is an open
        challenge, as the traditional syntactic evaluation employed for simple
        alignments is too unforgiving given the difficulty of accurately finding
        complex mappings and the usefulness of approximate solutions.
        In this work we compare and discuss two simple evaluation strategies:
        the entity-based evaluation strategy employed in the complex track of
        the OAEI 2020, and a novel element-overlap–based evaluation approach
        we propose.
        While it is clear that both strategies only provide a gross approximation
        of usefulness, our element-overlap strategy is the more accurate of the
        two, by taking semantic constructs into account. It is also more interpretable,
        as the final metrics are based on the total number of mappings rather than an
        arbitrary number of entities. Given that complex mappings often fall outside
        the DL spectrum, which makes reasoning over them undecidable, a significantly
        more accurate measure of usefulness is not trivial.

        Keywords: Ontology Matching · Ontology Alignment · Complex On-
        tology Matching · Evaluation


1     Introduction
Ontology alignment (or matching) emerged to overcome the semantic hetero-
geneity problem, by providing mappings interrelating the concepts of related
ontologies [2]. While the field is well-established, most ontology alignment sys-
tems and algorithms focus exclusively on finding simple mappings connecting in-
dividual ontology entities directly through equivalence or subsumption relations
[5]. However, conceptual differences between ontologies are often so profound
that such simple mappings are insufficient to capture all the data transforma-
tions required for interoperability between them. Moreover, ontologies may be
semantically irreconcilable through only simple mappings [4].
    A complex ontology mapping is one where at least one of the mapped enti-
ties is an expression that involves multiple entities and/or logical operators or
restrictions (e.g. Accepted contribution = min 1 acceptedBy). Complex map-
pings thus enable us to express rich semantic relations between entities of two
ontologies, and precisely capture the rules for converting instance data between
them.
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).

    The inclusion of a complex ontology alignment track in the Ontology Align-
ment Evaluation Initiative1 (OAEI) of 2018 [8] is an acknowledgement of the
importance of complex matching by the ontology alignment community. However,
it also brought to the forefront the challenge of providing an accurate
but fair evaluation of complex ontology alignments [12].
    For simple mappings, the traditional evaluation employed in most OAEI
tracks—computing precision, recall and F-measure through exact match against
a reference alignment—is fairly adequate. One could argue that even in this
context, some mappings should be considered semi-correct, such as a subsump-
tion mapping between two classes that are in fact equivalent, or an equivalence
mapping where one of the classes is a superclass of the correct class [1]. But in
practice, such cases tend to be relatively rare, and have little impact on the
evaluation of matching systems on simple mappings.
    For complex mappings the outlook is very different. First, building a com-
plete complex reference alignment that contains all non-trivial complex corre-
spondences is extremely laborious, which typically results in either a manual
evaluation of produced mappings [6] or the use of partial reference alignments.
Second, the intricacy of the mappings and the unbounded search space (due to the
nesting of expressions) mean that cases where alignment systems predict complex
mappings that approximate but do not exactly match those in the reference
alignment are the norm rather than the exception. Furthermore, two complex
mappings can be syntactically different but semantically equivalent. Thus, the
traditional evaluation approach is too unforgiving for complex mappings, and
does not accurately reflect their usefulness [12].
    A number of potential evaluation approaches have been overviewed by Zhou
et al. [12]. They argue that an evaluation approach that reflected the expected
human effort in validating a mapping, such as an edit-distance approach, would
be the most suitable strategy for ontology integration tasks, predicated on the
fact that manual validation would be imperative in such cases. However, to
date no evaluation approach that is simultaneously automated, comprehensive
and able to accurately reflect the usefulness of complex alignments has been
proposed.
    As of the 2020 edition, the OAEI’s complex track employs two different eval-
uation approaches in addition to the traditional exact match approach. In the
Hydrography, Geolink and Enslaved datasets, the evaluation is an entity-based
relaxed precision and recall approach which is comprehensive but neither fully
automated (as the transformation of complex mappings into mapped entities is
done manually) nor entirely accurate (as it doesn’t account for the semantic con-
structs in the complex mappings, only the entities). In the Conference and Taxon
datasets, the evaluation approach is based on query answering, automatically in
the case of Conference [7], but manually in the case of Taxon. This approach is
able to gauge the accuracy of data transformations, but is less comprehensive
than evaluation based on a full reference alignment, and requires the ability to
rewrite SPARQL queries, which is still an open challenge for more expressive
mappings.

1 http://oaei.ontologymatching.org
    In this work, we propose a novel automated element-overlap evaluation strat-
egy for complex ontology alignments, as well as a fully automated implementa-
tion of the entity-based evaluation strategy employed in the OAEI. We assess
these two strategies by using them to reevaluate the OAEI 2020 complex results,
and discuss their strengths and limitations.


2     Related Work

The evaluation of complex ontology alignments has been comprehensively over-
viewed by Zhou et al. [12]. They detail the general framework for evaluation with
a reference alignment, which consists of: anchoring, (mapping) comparison,
scoring, and aggregation. Additionally, they enumerate the challenges (C)
that should be addressed by an evaluation strategy:

Anchoring
 C1: avoid the necessity of a full pairwise comparison of reference and system
     mappings.
Comparison
 C2: determine the relation between a candidate mapping and a reference
     mapping.
 C3: handle mapping decomposition (as two separate mappings can be equiv-
     alent to a single other mapping).
 C4: factor the mapping relation.
Scoring
  C5: accurately reflect the quality/usefulness of each mapping.
Aggregation
 C6: factor partially correct mappings.
 C7: factor cases of mapping decomposition.
 C8: handle the occurrence of (redundant) multiple candidate mappings that
     are implied by a single reference mapping.

These authors also discuss and present the challenges for evaluation without
a reference alignment, namely for query answering approaches such as those
employed in the OAEI. However, as these approaches are less comprehensive
than evaluation with a reference alignment, we focus only on the latter.
    The entity-based relaxed precision and recall approach employed in the complex
track of the OAEI (unpublished; code provided by the OAEI complex track organisers)
begins with a manual pre-processing step, where reference and candidate mappings
are converted into a list of key-value pairs of related entities plus their
mapping relation. The key is a source ontology entity (or combination of entities)
belonging to the mappings and manually chosen to represent them (several mappings
can share the same key if they have the same source entity). The value is the set
of all remaining source and target ontology entities for the mapping(s) that have
the key. Consider the following mappings from the cmt-conference task of the OAEI
(including reference and hypothetical candidate mappings) as a running example:


Reference mappings:
(A) [hasDecision some Acceptance] or [min 1 acceptedBy] =
    Accepted contribution
(B) ExternalReviewer = min 1 inverseOf (invited by)
(C) Reviewer or ExternalReviewer = Reviewer

Candidate mappings:
(A’) hasDecision some Acceptance > Accepted contribution
(B’) ExternalReviewer = min 1 invited by
(C’) Reviewer = Reviewer

The pre-processing step would result in the following key-value pairs:
Reference mappings:
(A) hasDecision : {Accepted contribution, Acceptance, acceptedBy}, =
(B) ExternalReviewer : {invited by}, =
(C) Reviewer : {Reviewer, ExternalReviewer}, =

Candidate mappings:
(A’) hasDecision : {Accepted contribution, Acceptance}, >
(B’) ExternalReviewer : {invited by}, =
(C’) Reviewer : {Reviewer}, =

This pre-processing step is followed by an evaluation step, where each candidate
mapping is compared with the reference mapping that has the same key-entity.
This comparison is done by computing the entity-precision and entity-recall of
the value-entities in the candidate mapping against those in the reference map-
ping, and multiplying these with a relation similarity score according to the
following criteria:
1.0 if the candidate and reference mapping have the same relation;
0.8 if the candidate mapping has a narrower relation (i.e. < vs. =, = vs. >);
0.6 if the candidate mapping has a broader relation (i.e. > vs. =, = vs. <);
0.3 otherwise (e.g. < vs. >, > vs. <).

The final score of an alignment is the average of the entity scores. Applying this
evaluation algorithm to the example above would result in the scores listed in
Table 1.

Table 1. Scores obtained for the running example under the entity-based evaluation
strategy.

Alignment  TP  FP  FN  Entity     Entity  Relation  Relaxed        Relaxed
                       Precision  Recall  score     Precision (%)  Recall (%)
A×A’       2   0   1   1          2/3     0.6       60             40
B×B’       1   0   0   1          1       1.0       100            100
C×C’       1   0   1   1          1/2     1.0       100            50
Final      -   -   -   -          -       -         86.7           63.3

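The scoring step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the OAEI organisers' code: the function names `relation_score` and `relaxed_scores` are hypothetical, the key-value pairs are typed in by hand from the running example, and returning 0 for an empty entity set is an assumption.

```python
def relation_score(candidate_rel, reference_rel):
    """Relation similarity score, following the four criteria listed in the text."""
    if candidate_rel == reference_rel:
        return 1.0
    pair = (candidate_rel, reference_rel)
    if pair in {("<", "="), ("=", ">")}:  # candidate relation is narrower
        return 0.8
    if pair in {(">", "="), ("=", "<")}:  # candidate relation is broader
        return 0.6
    return 0.3


def relaxed_scores(candidate, reference):
    """Return (relaxed precision, relaxed recall) for two pairs sharing a key.

    Each argument is a (value_entities, relation) tuple.
    """
    cand_entities, cand_rel = candidate
    ref_entities, ref_rel = reference
    tp = len(cand_entities & ref_entities)
    # Assumption: an empty entity set yields a score of 0.
    precision = tp / len(cand_entities) if cand_entities else 0.0
    recall = tp / len(ref_entities) if ref_entities else 0.0
    weight = relation_score(cand_rel, ref_rel)
    return precision * weight, recall * weight


# Values and relations of the running example, aligned by key (A, B, C).
reference = [
    ({"Accepted contribution", "Acceptance", "acceptedBy"}, "="),
    ({"invited by"}, "="),
    ({"Reviewer", "ExternalReviewer"}, "="),
]
candidate = [
    ({"Accepted contribution", "Acceptance"}, ">"),
    ({"invited by"}, "="),
    ({"Reviewer"}, "="),
]

scores = [relaxed_scores(c, r) for c, r in zip(candidate, reference)]
final_precision = sum(p for p, _ in scores) / len(scores)
final_recall = sum(r for _, r in scores) / len(scores)
print(round(100 * final_precision, 1), round(100 * final_recall, 1))
```

Under these assumptions the sketch yields the same final values as Table 1 (86.7 and 63.3).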

3      Algorithms
3.1     Element-overlap–based evaluation
The element-overlap–based evaluation strategy we propose aims at gauging the
expected effort to manually correct the alignment. It is based on a weighted
Jaccard index between all elements of the mappings being compared (both on-
tology entities, and semantic constructs of the expressions) for scoring. It is not
an edit-distance in the strict sense, as it captures similarity rather than dissimi-
larity. However, it allows us to quantify manual correction effort while reflecting
mapping correctness.
    Given candidate and reference complex alignments, Ac and Aref , stored in
data structures where each mapping is indexed by each of the ontology entities
it contains, our algorithm begins with a pre-processing step, where all candidate
and reference mappings are decomposed into lists of elements. For our running
example, the decomposition would result in the following lists:
Reference mappings:
(A) {hasDecision, Acceptance, or, min, 1, acceptedBy, =,
     Accepted contribution}
(B) {ExternalReviewer, =, min, 1, InverseOf, invited by}
(C) {Reviewer, or, ExternalReviewer, =, Reviewer}
Candidate mappings:
(A’) {hasDecision, Acceptance, >, Accepted contribution}
(B’) {ExternalReviewer, =, min, 1, invited by}
(C’) {Reviewer, =, Reviewer}
    We then iterate through all candidate mappings and perform anchoring by
finding related reference mappings (i.e., those that share at least one entity
from both ontologies with the candidate mapping). For each related reference
mapping, we compute the weighted Jaccard score between its list of elements
and that of the candidate mapping. The weighted Jaccard score between two
lists L_c and L_ref is given by:

\[
\mathrm{WJaccard}(L_c, L_{ref}) = \frac{\sum_{k \in L_c \cup L_{ref}} \min(\mathrm{count}(k, L_c), \mathrm{count}(k, L_{ref}))}{\sum_{k \in L_c \cup L_{ref}} \max(\mathrm{count}(k, L_c), \mathrm{count}(k, L_{ref}))}
\]

This is an adaptation of the traditional Jaccard score between sets, taking into
account that the same element can occur multiple times in a list (as is the case
in a complex mapping).
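As a minimal sketch of the weighted Jaccard score defined above, the following Python function counts element occurrences with `collections.Counter`. The function name `w_jaccard` and the choice to return 0 for two empty lists are assumptions made for illustration. The element lists are those of mappings B, B’, C and C’ from the running example.

```python
from collections import Counter


def w_jaccard(list_c, list_ref):
    """Weighted Jaccard score between two lists of mapping elements."""
    count_c, count_ref = Counter(list_c), Counter(list_ref)
    keys = set(count_c) | set(count_ref)
    # Counter returns 0 for elements absent from one of the lists.
    numerator = sum(min(count_c[k], count_ref[k]) for k in keys)
    denominator = sum(max(count_c[k], count_ref[k]) for k in keys)
    # Assumption: two empty lists are given a score of 0.
    return numerator / denominator if denominator else 0.0


b_ref = ["ExternalReviewer", "=", "min", "1", "InverseOf", "invited by"]
b_cand = ["ExternalReviewer", "=", "min", "1", "invited by"]
c_ref = ["Reviewer", "or", "ExternalReviewer", "=", "Reviewer"]
c_cand = ["Reviewer", "=", "Reviewer"]

score_b = w_jaccard(b_cand, b_ref)  # 5 shared occurrences out of 6
score_c = w_jaccard(c_cand, c_ref)  # "Reviewer" counts twice: 3 out of 5
```

Because occurrences are counted rather than merely detected, the repeated Reviewer entity in C contributes two to both the numerator and the denominator.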
    We store the maximum Jaccard score found for each candidate mapping as
well as for each reference mapping, which will be aggregated to compute the
precision and recall respectively. Precision is computed as the average of the
best scores obtained for each mapping in the candidate alignment (Ac ), whereas
Recall is the average of the best scores obtained for each mapping in the reference
alignment (Aref ). The scores for our running example are listed in Table 2.
    The detailed description of our algorithm is provided in Algorithm 1.

       Table 2. Element-overlap scoring for the running example mappings.

                           Example Precision Recall
                             A×A’       3/8       3/8
                             B×B’       5/6       5/6
                             C×C’       3/5       3/5
                             Final     60.3%    60.3%
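The anchoring, scoring and aggregation steps can be sketched in Python as follows. This is a hedged illustration rather than our actual implementation: the `Mapping` class, its field names and the `evaluate` function are hypothetical, and the source and target entity sets are assigned by hand. Only the B and C pairs of the running example are used.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    sources: frozenset  # entities from the source ontology
    targets: frozenset  # entities from the target ontology
    elements: tuple     # all entities and semantic constructs, with repeats


def w_jaccard(list_c, list_ref):
    count_c, count_ref = Counter(list_c), Counter(list_ref)
    keys = set(count_c) | set(count_ref)
    num = sum(min(count_c[k], count_ref[k]) for k in keys)
    den = sum(max(count_c[k], count_ref[k]) for k in keys)
    return num / den if den else 0.0


def evaluate(candidates, references):
    """Return (precision, recall) as averages of the best scores per mapping."""
    by_source, by_target = defaultdict(set), defaultdict(set)
    for ref in references:
        for entity in ref.sources:
            by_source[entity].add(ref)
        for entity in ref.targets:
            by_target[entity].add(ref)
    best_c = {c: 0.0 for c in candidates}
    best_ref = {r: 0.0 for r in references}
    for cand in candidates:
        # Anchoring: references sharing a source AND a target entity.
        src = set().union(*(by_source.get(e, set()) for e in cand.sources))
        tgt = set().union(*(by_target.get(e, set()) for e in cand.targets))
        for ref in src & tgt:
            sim = w_jaccard(cand.elements, ref.elements)
            best_c[cand] = max(best_c[cand], sim)
            best_ref[ref] = max(best_ref[ref], sim)
    precision = sum(best_c.values()) / len(best_c) if best_c else 0.0
    recall = sum(best_ref.values()) / len(best_ref) if best_ref else 0.0
    return precision, recall


refs = [
    Mapping(frozenset({"ExternalReviewer"}), frozenset({"invited by"}),
            ("ExternalReviewer", "=", "min", "1", "InverseOf", "invited by")),
    Mapping(frozenset({"Reviewer", "ExternalReviewer"}), frozenset({"Reviewer"}),
            ("Reviewer", "or", "ExternalReviewer", "=", "Reviewer")),
]
cands = [
    Mapping(frozenset({"ExternalReviewer"}), frozenset({"invited by"}),
            ("ExternalReviewer", "=", "min", "1", "invited by")),
    Mapping(frozenset({"Reviewer"}), frozenset({"Reviewer"}),
            ("Reviewer", "=", "Reviewer")),
]
precision, recall = evaluate(cands, refs)
```

Here B’ shares its source entity with both reference mappings but its target entity only with B, so anchoring compares it with B alone. Precision and recall are both the mean of 5/6 and 3/5.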



3.2   Automation of the OAEI entity-based pre-processing

As detailed in Section 2, the OAEI’s entity-based evaluation strategy includes
a manual pre-processing step whereby reference and candidate mappings are
converted into key-value pairs of related entities plus the mapping relation. The
fact that this step is manual hinders scalability and reproducibility.
    Our proposed algorithm to automate the pre-processing step of this evalu-
ation strategy aims at emulating the manual process of identifying key-entities
while operating under a set of rules that ensure an objective solution, to enable
reproducibility. First, the reference alignment is converted into key-value pairs
under the following rules:

 1. All mappings that have a single source entity will be identified by that entity
    as key, and have the set of target entities as value. If more than one mapping
    has the same key, the values will be merged.
 2. All mappings that have multiple source entities will be identified by each of
    the source entities that is not already the key of a single-source mapping.
    (a) If there are multiple such source entities, the mapping will be decom-
        posed into a key-value pair with each of those source entities as key, and
        the set of all target entities and all other source entities as value.
    (b) If there are no such source entities and the mapping contains exactly
        two source entities, it will be identified by the set of those two source
        entities as key.
    (c) If there are no such source entities and the mapping contains more than
        two source entities, it will be identified by all pairwise combinations of
        source entities that are not keys of two-entity mappings.
          i. If there are multiple such pairs of source entities, the mapping will
             be decomposed into a key-value pair with each of those pairs as key.
         ii. If there is no such pair, the mapping will be identified by the set of
             all source entities.

Algorithm 1 Element-overlap evaluation algorithm

[Pre-processing]
Function convert(A)
      init: HashTable lists
      for mapping_i in A:
            for element_j in mapping_i:
                  lists.add(mapping_i, element_j)
      return lists
End Function
init: HashTable lists_ref = convert(A_ref),
      lists_c = convert(A_c), Scores_ref, Scores_c
init: double Precision = 0
for mapping_i in A_c:
      [Anchoring]
      init: Ar_sources, Ar_targets
      for source_entity_j ∈ mapping_i:
            Ar_sources.addAll(A_ref.get(source_entity_j))
      for target_entity_j ∈ mapping_i:
            Ar_targets.addAll(A_ref.get(target_entity_j))
      A_related = Ar_sources.retainAll(Ar_targets)
      [Comparison & Scoring]
      for mapping_j in A_related:
            sim = WJaccard(lists_c.get(mapping_i),
                  lists_ref.get(mapping_j))
            if sim > Scores_ref.get(mapping_j):
                  Scores_ref.add(mapping_j, sim)
            if sim > Scores_c.get(mapping_i):
                  Scores_c.add(mapping_i, sim)
      [Aggregation]
      Precision += Scores_c.get(mapping_i)
Precision /= A_c.size
init: double Recall = 0
for mapping_i in A_ref:
      Recall += Scores_ref.get(mapping_i)
Recall /= A_ref.size


Then, the candidate alignment is converted into key-value pairs using analogous
rules, except that the reference alignment is used as anchor. For example, rule
2 becomes:

2’. All mappings that have multiple source entities will be identified by each
    of the source entities that is not the key of a single-source mapping in the
    reference alignment.

The same logic is applied to all rules, as the goal is to establish a parallel be-
tween the candidate alignment and the reference alignment so as to enable the
evaluation of the former.
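To make the rules for the reference alignment more concrete, the following Python sketch covers rules 1, 2, 2(a) and 2(b) only. It is a simplified illustration under several assumptions: the function name `to_key_value` is hypothetical, mappings are given as (source entities, target entities) pairs of sets, the `cmt:` and `conference:` prefixes are added only to keep same-named entities apart, the value under rule 2(b) is assumed to be the set of target entities, and rule 2(c) is replaced by a fallback that uses the whole set of source entities as key.

```python
def to_key_value(mappings):
    """Convert (sources, targets) mappings into key-value pairs (simplified)."""
    pairs = {}
    # Rule 1: single-source mappings are keyed by their source entity,
    # and values of mappings with the same key are merged.
    for sources, targets in mappings:
        if len(sources) == 1:
            (key,) = sources
            pairs.setdefault(key, set()).update(targets)
    single_keys = set(pairs)
    # Rule 2: multi-source mappings are keyed by each source entity that is
    # not already the key of a single-source mapping.
    for sources, targets in mappings:
        if len(sources) < 2:
            continue
        free = [s for s in sources if s not in single_keys]
        if free:
            # Rule 2(a): one pair per free source entity; the value holds the
            # target entities plus all other source entities.
            for key in free:
                pairs.setdefault(key, set()).update(targets | (sources - {key}))
        else:
            # Rule 2(b); also a simplified stand-in for rule 2(c) (assumption).
            pairs.setdefault(frozenset(sources), set()).update(targets)
    return pairs


# Reference mappings B and C of the running example.
reference = [
    ({"cmt:ExternalReviewer"}, {"conference:invited by"}),
    ({"cmt:Reviewer", "cmt:ExternalReviewer"}, {"conference:Reviewer"}),
]
key_values = to_key_value(reference)
```

For these two mappings the sketch gives ExternalReviewer the value {invited by} and Reviewer the value {Reviewer, ExternalReviewer}, in line with pairs (B) and (C) in Section 2.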


4     Evaluation

4.1     Datasets

The datasets we employed to compare the evaluation strategies were the Con-
ference, Geolink and Hydrography datasets from the OAEI 2020 Complex track,
which are detailed in Table 3. For the Geolink and Hydrography datasets we use
the reference alignment provided by the OAEI, and for the Conference dataset
we employ the reference alignment provided by Thiéblin et al. [9], as the OAEI
evaluation is query-based. We did not use the Enslaved or Taxon datasets from
the OAEI 2020, because we encountered errors in the reference alignment of
the former3, and no reference alignment was available for the latter. We used the
alignments produced by the matching systems competing in the OAEI 2020 that
were able to generate complex mappings: AMLC [3], AROA [11] and CANARD
[10] in the case of Geolink; AMLC and CANARD in Conference; and only AMLC
in Hydrography.

Table 3. Description of the evaluation datasets.

Dataset      Ontologies  Alignment Tasks  Simple Mappings  Complex Mappings
Conference   5           20               111              184
GeoLink      2           1                19               48
Hydrography  4           3                113              84


4.2     Results and Discussion

The results of the two evaluation strategies applied to the OAEI alignments are
presented in Table 4. Element-overlap is our proposed algorithm, OAEI auto. is
the OAEI evaluation algorithm using our automated implementation of the
pre-processing step, and OAEI man. is the OAEI evaluation algorithm with manual
pre-processing, as published on the OAEI website. For the Conference dataset,
the OAEI evaluation was based on query answering, which is not comparable
with the two evaluation strategies, and is therefore omitted from the table.

3 Entities in the reference alignment that were not in the ontologies.

Table 4. Evaluation of OAEI participating systems in the several complex datasets
using our element-overlap evaluation, the OAEI entity-based evaluation using the
manual pre-processing step (OAEI man.) or our automated implementation (OAEI auto.).

Alignment  Evaluation        Precision  Recall  F-measure
system     strategy          (%)        (%)     (%)
Conference
AMLC       Element-overlap   38±18      37±10   36±13
AMLC       OAEI auto.        49±14      38±12   42±12
CANARD     Element-overlap   24±13      43±8    29±11
CANARD     OAEI auto.        32±11      43±9    36±10
Geolink
AMLC       Element-overlap   47         21      29
AMLC       OAEI man.         50         23      32
AMLC       OAEI auto.        50         23      32
AROA       Element-overlap   72         44      55
AROA       OAEI man.         87         46      60
AROA       OAEI auto.        88         46      60
CANARD     Element-overlap   54         33      41
CANARD     OAEI man.         89         39      54
CANARD     OAEI auto.        84         37      51
Hydrography
AMLC       Element-overlap   43±15      8±10    12±14
AMLC       OAEI man.         48±17      7±8     12±13
AMLC       OAEI auto.        47±19      8±10    12±14

4.2.1 Entity-based evaluation
The results show that the OAEI entity-based evaluation with automated
pre-processing closely approximates the evaluation with manual pre-processing in
most cases, with the only substantial difference being observed for CANARD
in the Geolink dataset. Nevertheless, it must be noted that the two variants
produced exactly the same results in only one case, for AMLC on the Geolink
dataset. This means that our automated implementation did not replicate all
the rules that went into the manual pre-processing of the alignments, although
it provided a reasonable approximation. There were likely additional criteria of
a different nature (e.g. favouring classes over properties as key-entities of
mappings) which we failed to identify in our analysis of the pre-processed
alignments from the OAEI.
    We must also note that both the OAEI entity-based pre-processing and our
attempt to automate it are unnecessarily complex. Representing mappings by
only key-entities, instead of simply considering all the entities in a mapping,
seems rather arbitrary, and could conceivably lead to a candidate mapping being
represented by a key entity that would result in it being compared with a
reference mapping that is not the most similar to it. Moreover, the result of this
approach is that the precision and recall scores are neither based on the number
of mappings (as in a traditional evaluation and our element-overlap approach)
nor based on the total number of mapped entities (as in a pure entity-based ap-
proach), but somewhere in between, making them hard to interpret or compare.
On the whole, a complete decomposition of the complex alignment into key-value
pairs that encompass all mapped entities would be both more straightforward
to implement and more intuitive to interpret.

4.2.2 Element-overlap vs. entity-based
We can observe from the results that the entity-based evaluation is consistently
more generous in terms of precision than the element-overlap-based evaluation,
while recall tends to be similar for both strategies. This can be attributed to the
fact that the element-overlap approach factors both the ontology entities and the
semantic constructs of the expressions in its scoring, whereas the entity-based
evaluation factors only the entities. Since it is generally easier to automatically
find related entities than to infer the exact semantic relations between them,
matching systems would tend to score higher in precision under an entity-based
evaluation.
    An alignment accurately capturing related entities is the most critical as-
pect for a human reviewer, as finding which entities are related is a more time-
consuming task than assessing how they are related. However, there is still a
cost to the latter, which should be factored into scoring the usefulness of a
mapping. As an example, consider the two reference mappings (R1, R2) from the
conference-confOf task and the two corresponding hypothetical candidate
mappings (S1, S2):
(R1) Reviewed Contribution = min 1 InverseOf(reviews)
(R2) Reviewer = min 1 reviews
(S1) Reviewed Contribution = min 1 reviews
(S2) Reviewer = min 1 InverseOf(reviews)
Essentially, the candidate mappings have inverted the intended usage of the
reviews property, which would require analysis of the definition of the property to
correct. Yet, under an entity-based evaluation, both candidate mappings would
score 100% in precision and recall, as the presence of the InverseOf construct
is invisible to this evaluation strategy. With our element-overlap strategy, on
the other hand, the construct would be factored into the score (5/6 for each
candidate, by the same computation as for B×B’ in Table 2), providing a more
accurate measure of the usefulness of the mappings.
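The contrast between the two strategies on this example can be checked with a short Python sketch. The helper names `entity_scores` and `w_jaccard` are hypothetical, and the entity-based side is reduced to plain set overlap of the value entities, since the key (Reviewed Contribution) and the relation (=) are identical in R1 and S1.

```python
from collections import Counter


def entity_scores(cand_entities, ref_entities):
    """Entity precision and recall over value entities only (no constructs)."""
    tp = len(cand_entities & ref_entities)
    return tp / len(cand_entities), tp / len(ref_entities)


def w_jaccard(list_c, list_ref):
    count_c, count_ref = Counter(list_c), Counter(list_ref)
    keys = set(count_c) | set(count_ref)
    num = sum(min(count_c[k], count_ref[k]) for k in keys)
    den = sum(max(count_c[k], count_ref[k]) for k in keys)
    return num / den if den else 0.0


# R1 and S1: same key (Reviewed Contribution), same value entities.
r1_value, s1_value = {"reviews"}, {"reviews"}
r1_elements = ["Reviewed Contribution", "=", "min", "1", "InverseOf", "reviews"]
s1_elements = ["Reviewed Contribution", "=", "min", "1", "reviews"]

entity_based = entity_scores(s1_value, r1_value)    # blind to InverseOf
element_based = w_jaccard(s1_elements, r1_elements)  # penalises its absence
```

The entity-based scores are both 1.0, whereas the element-overlap score is 5/6 because the missing InverseOf construct enlarges the denominator.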
    Table 5 summarises how the two strategies address the challenges listed in
Section 2. There are several challenges not addressed by our element-overlap
approach, as we based it on a simple Jaccard index, knowingly sacrificing ac-
curacy for scalability. However, there is no challenge that it addresses worse
than the entity-based approach. In assessing the relation between mappings and
reflecting their usefulness, it is more accurate because it takes the semantic
constructs of the mappings into account. It also accounts for cases of mapping
decomposition, if not very accurately, as it allows multiple candidate mappings
to be compared against the same reference mapping. Since the two approaches
have a similar computational cost (the pre-processing cost is much lower for
the element-overlap, but the comparison cost is higher because each mapping
can be compared with several other mappings), the element-overlap should be
preferred.

Table 5. Challenges addressed by our element-overlap evaluation and the OAEI
entity-based evaluation.

Challenge                          Element-overlap  Entity-based
(C1) Avoid full pairwise           X                X
(C2) Relation between mappings     X-               X--
(C3|C7) Mapping decomposition      X--              -
(C4) Mapping relation              X                X
(C5) Reflect usefulness            X-               X--
(C6) Partially correct mappings    X-               X-
(C8) Redundant mappings            -                -

4.2.3 Jaccard vs. other similarity metrics
One limitation of our element-overlap strategy is that it does not factor in the
order in which the elements appear in a mapping, and thus would not be able to
distinguish between cases such as the following two hypothetical mappings that
have opposite meanings:
    – (Reviewer or ExternalReviewer) and (not Author) = Reviewer
    – not (Reviewer or ExternalReviewer) and Author = Reviewer
To accurately capture such cases, we could use a canonical edit-distance ap-
proach, such as Levenshtein, but this would produce erroneous results in cases
where the order of elements is irrelevant, such as within disjunction or conjunc-
tion (e.g. if Reviewer were swapped with ExternalReviewer), and would also
have a higher computational cost. Another limitation of our strategy is that it
cannot capture semantically related mappings (e.g. where one has a subclass or
subproperty of the other) or semantically equivalent mappings that are syntactically
distinct, such as those produced by CANARD for the conference-confOf
task:
    – [contributes and domain(Reviewer)] and
       [reviews and domain(Review)] = reviews
    – [InverseOf(has authors) and domain(Reviewer)] and
       [InverseOf(has a review) and domain(Review)] = reviews
These mappings are semantically equivalent because has authors and contributes
are inverse properties in the conference ontology, and so are has a review and
reviews. To detect such mappings, an evaluation strategy would have to employ
an OWL reasoner, which would have an even greater computational cost. Fur-
thermore, there would be hurdles to overcome in that: (a) complex alignments
are often expressed in the EDOAL format4 which includes semantic constructs
that aren’t expressible in OWL; and (b) even OWL compliant mappings can be
beyond DL semantics and therefore compromise the decidability of the reasoning
problem.
    Thus, while our element-overlap approach only provides a gross estimate of
the usefulness of mappings, providing a significantly more accurate estimate in
a scalable manner is not trivial.
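The trade-off between the weighted Jaccard score and an edit distance, discussed at the start of this subsection, can be illustrated with the Python sketch below. The token lists are typed in by hand and omit parentheses, the entity Author is spelled identically in both mappings, and the `levenshtein` function is a generic textbook implementation over token sequences; all of these are assumptions made for the example.

```python
from collections import Counter


def w_jaccard(list_c, list_ref):
    count_c, count_ref = Counter(list_c), Counter(list_ref)
    keys = set(count_c) | set(count_ref)
    num = sum(min(count_c[k], count_ref[k]) for k in keys)
    den = sum(max(count_c[k], count_ref[k]) for k in keys)
    return num / den if den else 0.0


def levenshtein(a, b):
    """Edit distance between two token sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


# Two mappings with opposite meanings but the same elements.
m1 = ["Reviewer", "or", "ExternalReviewer", "and", "not", "Author", "=", "Reviewer"]
m2 = ["not", "Reviewer", "or", "ExternalReviewer", "and", "Author", "=", "Reviewer"]

# Two equivalent mappings that differ only in the order of a disjunction.
d1 = ["Reviewer", "or", "ExternalReviewer", "=", "Reviewer"]
d2 = ["ExternalReviewer", "or", "Reviewer", "=", "Reviewer"]

jaccard_opposite = w_jaccard(m1, m2)
jaccard_swapped = w_jaccard(d1, d2)
distance_opposite = levenshtein(m1, m2)
distance_swapped = levenshtein(d1, d2)
```

The weighted Jaccard score is 1.0 for both pairs, so it cannot tell the meaning-changing reordering from the harmless one. In this sketch the edit distance is 2 for both pairs as well, so it penalises the harmless swap exactly as much as the change in meaning.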

4 https://moex.gitlabpages.inria.fr/alignapi/edoal.html

5     Conclusion
We have proposed a novel element-overlap–based evaluation strategy for complex
ontology alignments, as well as an automated pre-processing algorithm that
approximates the manual pre-processing of the entity-based evaluation employed
in the OAEI 2020.
    We conclude that the entity-based evaluation employed in the OAEI is unnecessarily
complex, and falls further short of addressing the challenges identified for the
evaluation of complex alignments [12] than our element-overlap strategy. Moreover,
while our strategy knowingly sacrifices accuracy for scalability, we argue
that a significant gain in accuracy is not trivial, due to complex mappings often
falling outside DL semantics and thereby leading to an undecidable reasoning
problem.
    Nevertheless, in future work we will explore simple rule-based approaches
for semantic comparison that can provide a more accurate evaluation without
sacrificing scalability.

Acknowledgements The authors would like to thank Lu Zhou for kindly
providing the source code used in the OAEI evaluation. This work was sup-
ported by FCT through the LASIGE Research Unit (UIDB/00408/2020 and
UIDP/00408/2020). It was also partially supported by the KATY project which
has received funding from the European Union’s Horizon 2020 research and in-
novation program under grant agreement No 101017453.


References
 1. Ehrig, M., Euzenat, J.: Relaxed Precision and Recall for Ontology Matching. In:
    Proc. K-Cap 2005 workshop on Integrating ontology. Banff, Canada (2005)
 2. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer-Verlag, Heidelberg (DE),
    2nd edn. (2013)
 3. Faria, D., Pesquita, C., Balasubramani, B.S., Tervo, T., Carriço, D., Garrilha,
    R., Couto, F.M., Cruz, I.F.: Results of AML participation in OAEI 2018. CEUR
    Workshop Proceedings 2288, 125–131 (2018)
 4. Pesquita, C., Faria, D., Santos, E., Couto, F.M.: To repair or not to repair: reconcil-
    ing correctness and coherence in ontology reference alignments. Ontology Matching
    (2013)
 5. Pour, N., Algergawy, A., Amini, R., Faria, D., Fundulaki, I., Harrow, I., Hertling,
    S., Jimenez-Ruiz, E., Jonquet, C., Karam, N., et al.: Results of the Ontology Align-
    ment Evaluation Initiative 2020. In: Proceedings of the 15th International Work-
    shop on Ontology Matching (OM 2020). vol. 2788, pp. 92–138. CEUR-WS (2020)
 6. Ritze, D., Meilicke, C., Šváb-Zamazal, O., Stuckenschmidt, H.: A pattern-based
    ontology matching approach for detecting complex correspondences. In: ISWC
    Workshop on Ontology Matching, Chantilly (VA US). pp. 25–36 (2009)
 7. Thiéblin, E.: Do competency questions for alignment help fostering complex corre-
    spondences? In: International Conference on Knowledge Engineering and Knowl-
    edge Management (EKAW 2018). pp. 1–8. Nancy, France (Nov 2018)
 8. Thiéblin, É., Cheatham, M., Santos, C., Sváb-Zamazal, O., Zhou, L.: The First
    Version of the OAEI Complex Alignment Benchmark. In: International Semantic
    Web Conference (2018)
 9. Thiéblin, É., Haemmerlé, O., Hernandez, N., Trojahn, C.: Task-oriented complex
    ontology alignment: Two alignment evaluation sets. In: Gangemi, A., Navigli, R.,
    Vidal, M.E., Hitzler, P., Troncy, R., Hollink, L., Tordai, A., Alam, M. (eds.) The
    Semantic Web. pp. 655–670. Springer International Publishing, Cham (2018)
10. Thiéblin, E., Haemmerlé, O., Trojahn, C.: CANARD complex matching system:
    results of the 2019 OAEI evaluation campaign. vol. 2536, pp. 114–122 (2019)
11. Zhou, L., Hitzler, P.: AROA Results for 2020 OAEI. vol. 2788, pp. 161–167 (2020)
12. Zhou, L., Thiéblin, E., Cheatham, M., Faria, D., Pesquita, C., Trojahn, C., Za-
    mazal, O.: Towards evaluating complex ontology alignments. The Knowledge En-
    gineering Review 35 (2020)