Is my ontology matching system similar to yours? ⋆

             Ernesto Jiménez-Ruiz, Bernardo Cuenca Grau, Ian Horrocks

             Department of Computer Science, University of Oxford, Oxford UK


       Abstract. In this paper we extend the evaluation of the OAEI 2012 Large BioMed
       track, which involves the matching of the semantically rich ontologies FMA, NCI
       and SNOMED CT. Concretely, we report on the differences and similarities
       among the mappings computed by the participating ontology matching systems.


1 Introduction
The quality of the mappings computed by an ontology matching system in the Ontology
Alignment Evaluation Initiative (OAEI) [2, 1] is typically measured in terms of
precision and recall with respect to a reference set of mappings. The OAEI also
evaluates the coherence of the computed mappings [1].
    However, the differences and similarities among the mappings computed by different
systems have often been neglected in the OAEI.¹ In this paper we provide a more
fine-grained comparison among the matching systems participating in the OAEI 2012
Large BioMed track;² concretely, (i) we have harmonised (i.e. voted) the computed
mapping sets, and (ii) we provide a graphical representation of the similarity of these sets.

2 Mapping harmonisation
We have considered all the mappings voted (i.e. included in the output) by at least one
ontology matching system. Figure 1 shows the harmonisation (i.e. voting) results for the
FMA-NCI and FMA-SNOMED matching problems. A mapping can receive at most 11 votes
in FMA-NCI and at most 8 votes in FMA-SNOMED, i.e. the number of participating
systems in each problem.³ For example, in the FMA-NCI matching problem, 3,719
mappings have been voted by at least 2 systems.
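The voting step can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: a mapping is represented as a hypothetical (source entity, target entity) pair, and each system contributes exactly one output set, i.e. variants of the same system are assumed to have already been reduced to a single output (footnote 3 does not say how). The system names and entity identifiers in the example are hypothetical.

```python
from collections import Counter

# Hypothetical representation: a mapping is a (source_entity, target_entity) pair.
MappingPair = tuple[str, str]


def count_votes(outputs: dict[str, set[MappingPair]]) -> Counter:
    """Return, for each mapping, the number of systems whose output includes it.

    Assumes one output set per system (variants already reduced to one entry).
    """
    votes: Counter = Counter()
    for mappings in outputs.values():
        # The output is a set, so each system votes at most once per mapping.
        votes.update(mappings)
    return votes


def harmonised_set(votes: Counter, min_votes: int) -> set[MappingPair]:
    """Return the mappings voted by at least `min_votes` systems."""
    return {m for m, v in votes.items() if v >= min_votes}


if __name__ == "__main__":
    outputs = {
        "SysA": {("fma:Heart", "nci:Heart"), ("fma:Lung", "nci:Lung")},
        "SysB": {("fma:Heart", "nci:Heart")},
        "SysC": {("fma:Heart", "nci:Heart"), ("fma:Lung", "nci:Lung"),
                 ("fma:Liver", "nci:Kidney")},
    }
    votes = count_votes(outputs)
    for k in range(1, len(outputs) + 1):
        print(k, len(harmonised_set(votes, k)))
    # prints "1 3", "2 2" and "3 1" on separate lines
```

Because a mapping with at least k+1 votes also has at least k votes, the harmonised sets are nested and their sizes can only shrink as the threshold grows, which is the pattern of the bars in Figure 1.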
    Figure 1 also shows the evolution of F-measure, precision and recall for the different
harmonised mapping sets. As expected, the maximum recall (respectively precision) is
reached with the minimum (respectively maximum) number of votes. For example, the
maximum recall in the FMA-SNOMED problem is only 0.81: even the union of the outputs
of all systems (i.e. the set with at least 1 vote) misses 19% of the reference mappings,
which shows the difficulty of identifying correct mappings in this matching problem.
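The curves in Figure 1 rest on the standard set-based definitions of precision, recall and F-measure. The sketch below applies them to each harmonised set; it assumes the number of votes per mapping is already known and that the reference is a plain set of mappings. Any special treatment of particular reference mappings in the actual OAEI evaluation is not modelled here, and the toy votes and reference are hypothetical.

```python
def precision_recall_f(found: set, reference: set) -> tuple[float, float, float]:
    """Standard set-based precision, recall and F-measure (harmonic mean of P and R)."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def scores_by_threshold(votes: dict, reference: set, max_votes: int) -> list[tuple]:
    """(k, size, P, R, F) for the set of mappings with at least k votes, k = 1..max_votes."""
    rows = []
    for k in range(1, max_votes + 1):
        found = {m for m, v in votes.items() if v >= k}
        rows.append((k, len(found), *precision_recall_f(found, reference)))
    return rows


if __name__ == "__main__":
    votes = {("a", "x"): 3, ("b", "y"): 2, ("c", "z"): 1}
    reference = {("a", "x"), ("b", "y"), ("d", "w")}
    for k, size, p, r, f in scores_by_threshold(votes, reference, max_votes=3):
        print(f"{k} {size} P={p:.2f} R={r:.2f} F={f:.2f}")
    # prints:
    # 1 3 P=0.67 R=0.67 F=0.67
    # 2 2 P=1.00 R=0.67 F=0.80
    # 3 1 P=1.00 R=0.33 F=0.50
```

On this toy input recall falls and precision rises as the vote threshold increases, the same trend described above; the reference mapping ("d", "w") is found by no system, so even the 1-vote set cannot reach full recall.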
    The harmonised mapping sets with the best trade-off between precision and recall
have been selected as the representative mapping sets of the participating ontology
matching systems. For the FMA-NCI matching problem we have selected the mapping
sets with at least 3, 4 and 5 votes, while for the FMA-SNOMED matching problem we
have selected the sets with at least 2 and 3 votes (see the dark-grey bars in Figure 1).
⋆ This research was financed by the Optique project under grant agreement FP7-318338.
¹ As far as we know, only the 2007 Anatomy track made some effort in this direction:
  http://oaei.ontologymatching.org/2007/results/anatomy/
² Results available at: http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/
³ Systems with several variants have only been considered once in the voting process.
[Figure 1: for each problem, bars show the number of mappings voted by at least n systems
(x axis: number of votes; secondary axis: number of mappings), with Precision, Recall and
F-measure curves on the score axis. Bar values for n = 1, 2, ...:
FMA-NCI: 11,160; 3,719; 3,074; 2,862; 2,739; 2,563; 2,420; 2,292; 2,077; 1,630; 812.
FMA-SNOMED: 17,020; 7,997; 6,592; 4,218; 2,846; 2,052; 1,610; 1,002.]

                   Fig. 1: Harmonisation in the FMA-NCI (left) and FMA-SNOMED (right) problems
[Figure 2: two scatterplots, both axes labelled "Jaccard distance", placing the mapping sets
of LogMap, LogMap-noe, LogMapLt, GOMMA, GOMMA-Bk, ServOMap, ServOMapL and YAM++, the
harmonised sets (Vote3, Vote4 and Vote5 for FMA-NCI; Vote2 and Vote3 for FMA-SNOMED) and
the UMLS reference mapping sets (UMLS, UMLSA, UMLSL).]

Fig. 2: Mapping similarity in the FMA-NCI (left) and FMA-SNOMED (right) problems


3 Mapping similarity among systems
We have compared the similarity among (i) the representative mapping sets from the
harmonisation (see Section 2), (ii) the UMLS-based reference mappings of the track,
and (iii) the mapping sets computed by the top-8 ontology matching systems in the
FMA-NCI and FMA-SNOMED matching problems [1]. To this end, we have calculated the
Jaccard distance between each pair $M_A$, $M_B$ of the mapping sets from (i)-(iii),

$$d_J(M_A, M_B) = \frac{|M_A \cup M_B| - |M_A \cap M_B|}{|M_A \cup M_B|},$$

which ranges from 0 (identical sets) to 1 (disjoint sets), and we have represented these
distances in a two-dimensional scatterplot (see Figure 2). System names that lie far
from each other in the plot indicate that their computed mappings differ to a large
degree. For example, in Figure 2 (right), the mappings computed by LogMapLt and GOMMA
are very different from the mappings computed by the other systems, as well as from
the harmonised and reference mapping sets.
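A minimal sketch of the pairwise Jaccard distance computation follows, assuming each mapping set is a Python set of hashable mappings. The treatment of two empty sets (distance 0) is our own convention, since the formula is undefined there. The text does not state how the distance matrix is projected onto two dimensions for Figure 2, so that step is deliberately left out; the system names and mapping identifiers in the example are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """(|A u B| - |A n B|) / |A u B|: 0 for identical sets, 1 for disjoint sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty sets are treated as identical
    return (union - len(a & b)) / union


def pairwise_distances(sets: dict[str, set]) -> dict[tuple[str, str], float]:
    """Jaccard distance between every unordered pair of named mapping sets."""
    return {
        (x, y): jaccard_distance(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


if __name__ == "__main__":
    # Toy mapping sets with hypothetical system names and mapping identifiers.
    sets = {
        "SysA": {"m1", "m2", "m3"},
        "SysB": {"m2", "m3", "m4"},
        "SysC": {"m5"},
    }
    for pair, d in pairwise_distances(sets).items():
        print(pair, d)
    # prints:
    # ('SysA', 'SysB') 0.5
    # ('SysA', 'SysC') 1.0
    # ('SysB', 'SysC') 1.0
```

Here SysA and SysB share two of the four distinct mappings they jointly produce, giving (4 - 2)/4 = 0.5, while SysC shares nothing with either and sits at the maximum distance of 1.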


References
1. Aguirre, J., et al.: Results of the Ontology Alignment Evaluation Initiative 2012. In: Ontology
   Matching Workshop. CEUR Workshop Proceedings, vol. 946 (2012)
2. Euzenat, J., Meilicke, C., Stuckenschmidt, H., Shvaiko, P., Trojahn, C.: Ontology alignment
   evaluation initiative: Six years of experience. J. Data Sem. 15, 158–192 (2011)