Search Space Reduction for Post-Matching
           Correspondence Provisioning

                                        Thomas Kowark and Hasso Plattner

                               Hasso Plattner Institute, Potsdam, Germany
                                     {firstname.lastname}@hpi.de,
                                           http://www.hpi.de

    If users participate in ontology matching, the goal is always to minimize
the number of necessary interactions while maximizing the gains in alignment
quality [2]. Interaction can happen pre-matching (selection of matching
systems or parameter tuning), during the matching process (judging intermediate
results or providing sample correspondences), or post-matching (detecting
incorrect correspondences and providing missing ones). In this paper, we
evaluate an approach that aims to reduce post-matching interactions by exploiting
concept proximity within ontologies. An initial analysis of reference alignments
available for OAEI revealed that, if a correspondence for one element (class or
property) of an ontology exists, the probability that a correspondence also exists
for a closely connected element is higher than for unconnected elements. Based
on this finding, we extracted the closeness criteria depicted in Figure 1. For
evaluation, we applied the criteria to candidate alignments that were created by
top-performing systems of OAEI 2014 for the anatomy, library, and conference
tracks. For each criterion, we determined which elements it would add to the task
set, i.e., the selection of ontology elements a user should provide correspondences
for. Based on these task sets (UT), we calculated the expected number of
interactions (IE) it would take on average to provide all included correspondences
(IC) if elements were presented to the user at random. To assess whether our
selection technique is viable, we further compared this value to the number of
interactions it would take users on average to provide the same number of missing
correspondences if tasks were randomly selected from the entirety of elements
that are not included in correspondences after initial, automatic matching.
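
The text does not state how IE is computed. The following is a minimal sketch,
assuming IE is the expected position of the last of the IC relevant elements
when the |UT| task-set elements are shown in uniformly random order, which
equals IC(|UT| + 1)/(IC + 1). The function name is hypothetical. After
rounding, the formula is consistent with Table 1 entries such as
cmt-conference (12, 4, 10) and sigkdd-cmt (7, 1, 4). Treating IC = 0 as |UT|
interactions is a further assumption, taken from the cmt-iasted row.

```python
from fractions import Fraction


def expected_interactions(task_set_size: int, included: int) -> Fraction:
    """Expected number of elements a user inspects, in uniformly random order,
    until all `included` elements that have a correspondence were seen.

    Assumption (not stated in the paper): expected position of the last of
    k marked items among n shuffled items, i.e. k * (n + 1) / (k + 1).
    """
    if not 0 <= included <= task_set_size:
        raise ValueError("included must lie between 0 and task_set_size")
    if included == 0:
        # Assumption: with nothing to find, the user works through the
        # whole task set (Table 1 lists IE = |UT| for such rows).
        return Fraction(task_set_size)
    return Fraction(included * (task_set_size + 1), included + 1)


# (|UT|, IC) pairs taken from Table 1
for name, n, k in [("cmt-conference", 12, 4), ("sigkdd-cmt", 7, 1), ("cmt-iasted", 36, 0)]:
    print(name, float(expected_interactions(n, k)))
```

Exact fractions are used so that the values can be compared with the rounded
table entries without floating-point noise.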


[Figure 1: VOWL diagram of the closeness criteria, numbered 1 to 17. It links
matched classes and properties to neighboring elements: super-, sub-, sibling,
and equivalent classes; domain and range classes; datatype and object
properties; inverse, equivalent, sub-, and superproperties.
Composite criteria: 13 := 6 + 7, 14 := 7 + 8, 15 := 5 + 8.]


Figure 1. Connections considered for element proximity. Matched elements are
depicted in bold, ignored entities in italics. Visual Notation for OWL
Ontologies (VOWL) [1].
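
How a set of closeness criteria turns a candidate alignment into a task set
can be sketched as follows. This is an illustration under assumptions, not the
authors' implementation: the `Connection` triple, the function
`build_task_set`, and the toy elements are hypothetical, and the criterion
numbers in the toy data are placeholders that do not assert which connection
type a number denotes in Figure 1.

```python
from typing import Iterable, NamedTuple, Set


class Connection(NamedTuple):
    source: str     # element that may already have a correspondence
    criterion: int  # criterion number in the style of Figure 1
    target: str     # neighbouring element reached via that connection


def build_task_set(
    connections: Iterable[Connection],
    matched: Set[str],
    criteria: Iterable[int],
) -> Set[str]:
    """Collect unmatched elements that a selected criterion connects
    to an element already covered by the candidate alignment."""
    selected = set(criteria)
    return {
        c.target
        for c in connections
        if c.criterion in selected
        and c.source in matched
        and c.target not in matched
    }


# Hypothetical toy ontology fragment
connections = [
    Connection("Paper", 1, "Document"),
    Connection("Paper", 2, "Abstract"),
    Connection("Author", 3, "Reviewer"),
    Connection("Paper", 3, "Author"),  # target already matched, so skipped
]
matched = {"Paper", "Author"}

print(sorted(build_task_set(connections, matched, [1, 2])))
print(sorted(build_task_set(connections, matched, [3])))
```

Representing each criterion as a label on a connection keeps the selection of
a criteria set a simple filter, so different sets can be compared on the same
data.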
    The ratio between the expected number of interactions for the task set and
the one for random selection is called task set compression. Minimal criteria
sets denote the smallest sets of closeness criteria that yield the corresponding
task sets. Since we strive for minimization of user tasks, only the smaller
ontology in terms of concept count was considered. As shown in Table 1, an
average task set reduction of 60% was achieved for the conference ontologies of
OAEI, while the recall increased from 0,62 to 0,956. For taxonomy-like
ontologies, such as the ones used in the library and anatomy tracks, only
marginal compression or even an increase in interaction expectancy was
observed. Future work will therefore
focus on such cases by finding other, more suitable task selection criteria and
adapting existing ones, e.g., by limiting the depth of hierarchy traversal for
class relationships. Furthermore, correspondences generated through different
matcher settings (high precision vs. high recall) could be explored in addition
to criteria based solely on ontology structures in order to yield smaller task sets
with an increased potential success ratio for user interactions.
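
Task set compression can be sketched as a ratio of two expected draw counts.
This is a reading under assumptions, not the authors' stated procedure: both
expectations are modelled as draws without replacement (negative
hypergeometric), the random baseline stops once as many correspondences were
found as the task set contains, and all names and example numbers below are
hypothetical. Under this reading, values above 1 arise when the unmatched
elements hold more missing correspondences than the task set covers.

```python
from fractions import Fraction


def expected_draws(pool: int, marked: int, needed: int) -> Fraction:
    """Expected draws without replacement from `pool` elements, `marked` of
    which have a correspondence, until `needed` marked ones were drawn."""
    if not 0 <= needed <= marked <= pool:
        raise ValueError("require 0 <= needed <= marked <= pool")
    if needed == 0:
        # Assumption: with nothing to find, every element is inspected.
        return Fraction(pool)
    return Fraction(needed * (pool + 1), marked + 1)


def task_set_compression(
    task_set_size: int, included: int, unmatched_total: int, missing_total: int
) -> Fraction:
    """Ratio of guided to random expected interactions (below 1 is a gain)."""
    guided = expected_draws(task_set_size, included, included)
    baseline = expected_draws(unmatched_total, missing_total, included)
    return guided / baseline


# Hypothetical numbers, not taken from Table 1
print(task_set_compression(12, 4, 25, 4))  # task set covers all missing ones
print(task_set_compression(24, 4, 39, 9))  # task set misses 5 of 9
```

In the first call the task set halves the expected effort; in the second the
large task set costs more than random selection, the kind of outcome reported
for the taxonomy-like ontologies.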

ontologies         |UT|   IC    IE     Compression  Minimal Criteria Sets   Rcand  Rcomp
cmt-conference     12     4     10     0,2          [9, 17]                 0,6    0,87
confof-conference  11     3     9      0,19         [13]                    0,73   0,93
conference-edas    36     5     31     0,39         [1, 2]                  0,65   0,94
ekaw-conference    38     8     35     0,51         [3, 6, 16]              0,6    0,92
conference-iasted  51     7     46     0,57         [1, 2, 5]               0,36   0,86
sigkdd-conference  13     4     11     0,25         [4, 9, 12]              0,6    0,87
confof-cmt         31     9     29     0,48         [2, 4, 6, {7, 8, 14}]   0,38   1
cmt-edas           27     4     22     0,35         [6]                     0,69   1
cmt-ekaw           38     5     33     0,49         [5, 9, 17]              0,55   1
cmt-iasted         36     0     36     0,44         []                      1      1
sigkdd-cmt         7      1     4      0,13         [2]                     0,92   1
confof-edas        26     8     24     0,43         [1, 3, 5, 9]            0,58   1
confof-ekaw        18     4     15     0,33         [1, 13, 15]             0,8    1
confof-iasted      30     5     26     0,45         [1, 3, 15]              0,44   1
confof-sigkdd      16     4     13     0,25         [4, 17]                 0,57   1
ekaw-edas          63     11    59     0,71         [9, 13, 17]             0,52   1
edas-iasted        35     8     32     0,28         [2]                     0,53   0,95
sigkdd-edas        20     5     18     0,37         [2, 4]                  0,6    0,93
ekaw-iasted        73     3     56     0,76         [2, 15], [2, 13]        0,7    1
sigkdd-ekaw        22     4     18     0,32         [3, 15]                 0,64   1
sigkdd-iasted      36     0     36     0,58         []                      0,87   0,87
avg(conference)    30,4   4,8   26,8   0,404                                0,635  0,956
mouse-human        683    57    672    1,09         [2, 3, 16]              0,9    0,94
stw-thesoz         3604   169   3584   0,97         [2, 3]                  0,78   0,84
fma-nci            1011   216   1007   1,08         [2, 3, 16]              0,85   0,93
fma-snomed         3485   1997  3484   0,99         [2, 3]                  0,71   0,95
nci-snomed         10008  2281  10005  0,94         [1, 2, 3, 12]           0,67   0,82
Table 1. Overview of the maximal task reduction that could be achieved using
minimal criteria sets. The criteria used are numbered according to Figure 1. Rcand is
the recall achieved by the automatic matcher, Rcomp the recall after user interaction.


References
1. Lohmann, S., Negru, S., Haag, F., Ertl, T.: VOWL 2: User-Oriented Visualization
   of Ontologies. In: Proceedings of the 19th International Conference on Knowledge
   Engineering and Knowledge Management. pp. 266–281. EKAW ’14 (2014)
2. Shvaiko, P., Euzenat, J.: Ontology matching: State of the art and future challenges.
   IEEE Trans. on Knowl. and Data Eng. 25(1), 158–176 (Jan 2013)