ALIN Results for OAEI 2017

        Jomar da Silva1 , Fernanda Araujo Baião1 , and Kate Revoredo1

       Graduate Program in Informatics, Department of Applied Informatics
        Federal University of the State of Rio de Janeiro (UNIRIO), Brazil
          {jomar.silva, fernanda.baiao,katerevoredo}@uniriotec.br


       Abstract. ALIN is an ontology alignment system specialized in the
       interactive alignment of ontologies. Its main characteristic is the
       selection of the correspondences to be shown to the expert, which
       depends on the feedback the expert has already given. This selection
       is based on semantic and structural characteristics. ALIN obtained the
       alignment with the highest f-measure in the interactive track for the
       Conference data set when the expert makes no errors. This paper
       describes its configuration for the OAEI 2017 competition and
       discusses its results.

       Keywords: ontology matching, WordNet, interactive ontology matching,
       ontology alignment, interactive ontology alignment


1     Presentation of the system
A large number of data repositories have become available due to advances in
information and communication technologies. Those repositories, however, are
highly semantically heterogeneous, which hinders their integration. Ontology
alignment has been successfully applied to solve this problem by discovering
correspondences between two distinct ontologies, each of which conceptually
defines the data stored in one repository. Among the ontology alignment
approaches in the literature, interactive ontology alignment includes the
participation of domain experts to improve the quality of the final alignment.
This approach has proven more effective than non-interactive ontology
alignment [1]. ALIN is an ontology alignment system specialized in interactive
alignment.

1.1   State, purpose, general statement
ALIN is an ontology alignment system specialized in interactive ontology
alignment. It is based primarily on linguistic matching techniques and uses
WordNet as an external resource. ALIN first generates an initial set of
correspondences, called the set of candidate correspondences, which are the
correspondences selected to receive feedback from the expert. It then
interacts with the expert, and at each interaction the set of candidate
correspondences is modified through structural analysis of the ontologies and
the use of correspondence anti-patterns. The interactions continue until there
are no candidate correspondences left. ALIN was built with a special focus on
the interactive matching track of OAEI 2017.

1.2   Specific techniques used

The ALIN algorithm is shown in Algorithm 1.


Algorithm 1 ALIN algorithm
Input: Two ontologies to be aligned
Output: Alignment between the two ontologies
 1: Load the ontologies
 2: Generate the initial set of candidate correspondences
 3: Automatically classify correspondences
 4: Remove correspondences with a low semantic similarity value
 5: while the set of candidate correspondences is not empty do
 6:     Choose correspondences to show to the expert
 7:     Receive expert feedback on the chosen correspondences and remove them
    from the set of candidate correspondences
 8:     Remove correspondences in a correspondence anti-pattern from the set of
    candidate correspondences
 9:     Insert some data property and object property correspondences into the
    set of candidate correspondences
10:     Insert some correspondences from the backup set into the set of
    candidate correspondences
11: end while


    The steps of the ALIN algorithm are the following:
    1. The ontologies are loaded, including their classes, object properties
and data properties, through the Align API1. For each entity, data such as its
name and label are stored. For classes, their superclasses and disjoint classes
are saved. For object properties, the properties that are their hypernyms
(super-properties) and their associated classes are saved. The classes
associated with data properties are saved, too. ALIN does not use instances.
ALIN can only work with ontologies whose entity names are in English.

1
  Alignment API. Available at http://alignapi.gforge.inria.fr/ Last accessed on
  Oct 10, 2017.
2
  String Similarity Metrics for Information Integration. Available at
  http://www.coli.uni-saarland.de/courses/LT1/2011/slides/stringmetrics.pdf Last
  accessed on Oct 10, 2017.

    2. The initial set of candidate correspondences is generated by a stable
marriage algorithm with incomplete preference lists [2], in which the maximum
size of each list is 1 and linguistic metrics are used to sort each preference
list in decreasing order. With this algorithm, a correspondence is selected
only if its first entity is in the list of its second entity and vice versa.
The linguistic metrics used are Jaccard, Jaro-Winkler and n-Gram [3], provided
by the Simmetrics API2, and Resnik, Jiang-Conrath and Lin [3], provided by the
HESML API3, which uses WordNet. Using WordNet requires the canonical form of
the entity names, so the Stanford CoreNLP API4 was used. The most frequent
synsets of the words are used to calculate semantic similarities; the WS4J
API5 is used to find these synsets. The algorithm is run six times, once for
each metric, and the result set is the union of the results for each metric.
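
    With preference lists of size 1, the selection in step 2 amounts to keeping
the pairs in which each entity is the other's single best match. The following
Python sketch illustrates this reading under stated assumptions: the function
names are hypothetical, the token-based Jaccard function is a toy stand-in for
the Simmetrics and HESML metrics, and ties are broken by input order, which the
paper does not specify.

```python
from typing import Callable, List, Set, Tuple

Metric = Callable[[str, str], float]
Pair = Tuple[str, str]


def jaccard_tokens(a: str, b: str) -> float:
    """Toy Jaccard similarity over lower-cased, underscore-separated tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def best_match(entity: str, others: List[str], metric: Metric) -> str:
    """Preference list of size 1: the single most similar entity."""
    return max(others, key=lambda other: metric(entity, other))


def mutual_best_pairs(src: List[str], tgt: List[str], metric: Metric) -> Set[Pair]:
    """Keep (s, t) only if t is the best match of s and s is the best match of t."""
    pairs = set()
    for s in src:
        t = best_match(s, tgt, metric)
        if best_match(t, src, metric) == s:
            pairs.add((s, t))
    return pairs


def initial_candidates(src: List[str], tgt: List[str], metrics: List[Metric]) -> Set[Pair]:
    """Run the selection once per metric and return the union of the results."""
    candidates: Set[Pair] = set()
    for metric in metrics:
        candidates |= mutual_best_pairs(src, tgt, metric)
    return candidates
```

    For example, with source entities Author and Conference_Member and target
entities Member and Author, the toy metric selects (Author, Author) and
(Conference_Member, Member).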
    3. The values of the similarity metrics (Resnik, Jiang-Conrath, Lin,
Jaccard, Jaro-Winkler and n-Gram) range from 0 to 1, where 1 is the maximum.
When all six metrics of a correspondence in the set of candidate
correspondences have the maximum value, the correspondence is added to the
final alignment and removed from the set of candidate correspondences. There
are exceptions to this rule: correspondences that fall into certain structural
patterns are neither added to the final alignment nor removed from the set of
candidate correspondences.
    4. Correspondences for which at least one linguistic metric is below a
given threshold are removed from the set of candidate correspondences. These
correspondences are put into a backup set and can return to the set of
candidate correspondences through structural analysis. This technique is
described in more detail in [4], with the difference that [4], instead of
applying a threshold, removed the correspondences whose classes were not in
the same WordNet synset.
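
    Steps 3 and 4 together partition the initial correspondences into three
sets. The sketch below is one possible reading, not ALIN's actual code: the
function name and the data layout are hypothetical, the threshold is a free
parameter because the paper does not give its value, and the structural-pattern
exceptions of step 3 are left out.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
# Six metric values per correspondence, each in [0, 1] (assumed layout).
Scores = Dict[Pair, List[float]]


def classify_and_filter(
    scores: Scores, threshold: float
) -> Tuple[Set[Pair], Set[Pair], Set[Pair]]:
    """Split correspondences into (alignment, candidates, backup)."""
    alignment: Set[Pair] = set()
    candidates: Set[Pair] = set()
    backup: Set[Pair] = set()
    for pair, values in scores.items():
        if all(v == 1.0 for v in values):
            # Step 3: every metric is at its maximum, accept automatically.
            alignment.add(pair)
        elif min(values) < threshold:
            # Step 4: at least one metric is below the threshold, so the
            # pair moves to the backup set and may return later.
            backup.add(pair)
        else:
            # Everything else waits for expert feedback.
            candidates.add(pair)
    return alignment, candidates, backup
```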
    5-11. At this point the interactions with the expert begin. The
correspondences in the set of candidate correspondences are sorted by the sum
of their similarity metric values, with the greatest sum first, and are shown
to the expert in that order. At first, the set of candidate correspondences
contains only class correspondences. When the expert answers a question, the
set of candidate correspondences is modified: depending on the answer,
correspondences other than the one answered can be removed from the set, and
new correspondences can be included in it. If the expert rejects the
correspondence, it is removed from the set of candidate correspondences. If
the expert accepts it, it is removed from the set of candidate correspondences
and put in the final alignment.

3
  HESML. Available at https://www.researchgate.net/publication/313881253 HESML A
   scalable ontologybased semantic similarity measures library with a set of reproducible
   experiments and a replication dataset Last accessed on Oct 10, 2017.
4
  Stanford CoreNLP. Available at http://stanfordnlp.github.io/CoreNLP/ Last
  accessed on Oct 10, 2017.
5
  WS4J. Available at https://github.com/Sciss/ws4j Last accessed on Nov 8, 2017.

    At each interaction with the expert:
    - We remove from the set of candidate correspondences, and disregard, all
the correspondences that are in a correspondence anti-pattern [5] with the
correspondences accepted by the expert;
    - We insert into the set of candidate correspondences the data property
and object property correspondences related to the class correspondences
accepted by the expert;
    - We insert into the set of candidate correspondences the correspondences
of the backup set (step 4) whose entities are both subclasses of the classes
of a correspondence accepted by the expert.
    This step continues until the set of candidate correspondences is empty.
    Detailed information about the ALIN system can be found in the master's
thesis of Jomar da Silva6.
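
    The interactive phase (steps 5-11) can be sketched in Python as a loop over
a changing candidate set. This is a simplified illustration under several
assumptions: it asks one question at a time, and the anti-pattern check, the
lookup of related property correspondences and the subclass test are passed in
as callables with hypothetical names, because their details are outside the
scope of this paper.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]


def interactive_phase(
    candidates: Set[Pair],
    backup: Set[Pair],
    score_sum: Dict[Pair, float],
    ask_expert: Callable[[Pair], bool],
    in_anti_pattern: Callable[[Pair, Pair], bool],
    property_pairs_for: Callable[[Pair], Set[Pair]],
    is_sub_pair: Callable[[Pair, Pair], bool],
) -> Set[Pair]:
    """Return the correspondences accepted by the expert."""
    candidates, backup = set(candidates), set(backup)
    alignment: Set[Pair] = set()
    asked: Set[Pair] = set()
    while candidates:
        # Show the candidate with the greatest sum of metric values first.
        pair = max(candidates, key=lambda p: score_sum.get(p, 0.0))
        candidates.remove(pair)
        asked.add(pair)
        if not ask_expert(pair):
            continue  # rejected: simply dropped from the candidate set
        alignment.add(pair)
        # Drop candidates in an anti-pattern with the accepted correspondence.
        candidates = {c for c in candidates if not in_anti_pattern(c, pair)}
        # Add property correspondences related to the accepted one.
        candidates |= property_pairs_for(pair) - asked
        # Bring back backup correspondences between subclasses of the pair.
        restored = {b for b in backup if is_sub_pair(b, pair)}
        backup -= restored
        candidates |= restored - asked
    return alignment
```

    Keeping track of the correspondences already asked guarantees that no
question is repeated, so the loop terminates once the finite pool of
correspondences is exhausted.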

1.3   Link to the system and parameters file
ALIN is available through Google Drive
(https://drive.google.com/open?id=1myVtcRoKKdUDHQTKNKsomna8AFbukanf)
as a package to be run through the SEALS client.

2     Results
ALIN was developed with a focus on interactive ontology alignment. The
approach performs better when the number of data and object properties is
proportionately large. ALIN considers the properties associated with
corresponding classes when selecting entities for user feedback, which
increases recall. When the number of properties in the ontologies is small,
the system still generates a very precise alignment, but its recall tends to
decrease.
    Another characteristic of ALIN is its reliance on the interactive phase.
The non-interactive phase of the system is quite simple: it is based mainly on
maximum string similarity and aims to maintain high precision without regard
for recall, so it initially produces a low f-measure. The recall increases in
the interactive phase. Finally, ALIN is not robust to user errors. The system
uses a number of techniques that take advantage of the expert feedback to
reach further conclusions, so when the expert gives a wrong answer, the error
is propagated and generates other errors, thereby decreasing the f-measure.
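
    The error rates reported in the interactive-track tables below can be read
as the probability that a simulated expert gives the wrong answer. The
following Python sketch shows one way to model such an expert; the class name,
the seed parameter and the use of a reference alignment as ground truth are
assumptions made for illustration, not a description of the OAEI evaluation
code.

```python
import random
from typing import Set, Tuple

Pair = Tuple[str, str]


class SimulatedExpert:
    """Answers from a reference alignment, flipping answers at a fixed rate."""

    def __init__(self, reference: Set[Pair], error_rate: float, seed: int = 0):
        self.reference = reference
        self.error_rate = error_rate
        self.rng = random.Random(seed)  # seeded for reproducible runs

    def answer(self, pair: Pair) -> bool:
        truth = pair in self.reference
        # With probability error_rate the expert gives the wrong answer.
        if self.rng.random() < self.error_rate:
            return not truth
        return truth
```

    With an error rate of 0.0 the expert always agrees with the reference
alignment; any accepted wrong answer would then feed the anti-pattern, property
and backup techniques described in Section 1.2, which is how a single error can
spread.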

2.1   Comments on the participation of ALIN in non-interactive
      tracks
As expected, ALIN's participation in the non-interactive tracks showed high
precision and lower recall, as can be seen for the Anatomy track7 in Table 1.
There, the Recall+ field refers to the non-trivial correspondences found, and
a + in the Coherent field indicates that the generated alignment is
consistent.
6
  INTERACTIVE ONTOLOGY ALIGNMENT: AN APPROACH BASED ON
  THE INTERACTIVE MODIFICATION OF THE SET OF CANDIDATE
  CORRESPONDENCES. Available at
  http://www2.uniriotec.br/ppgi/banco-de-dissertacoes-ppgi-unirio/ano-2017/interactive-ontology-alignment-an-approach-based-on-the-interactive-modification-of-the-set-of-candidate-correspondences/view
  Last accessed on Nov 12, 2017.
7
  Results for OAEI 2017 - Anatomy track. Available at
  http://oaei.ontologymatching.org/2017/results/anatomy/index.html Last accessed
  on Nov 12, 2017.
    Regarding the Conference track8, ALIN evaluates only the properties
associated with classes already evaluated as belonging to the alignment. As a
result, the alignments of type M2 (which take into account only the properties
of the ontologies) had an f-measure of 0, as can be seen in Table 2. Because
ALIN evaluates properties only in the interactive phase, alignments of type M1
(only classes) had a higher recall than those of type M3 (classes and
properties), as can also be seen in Table 2: the reference alignments of type
M3 contain properties in addition to classes.

           Table 1. Participation of ALIN in Anatomy non-interactive track

            Runtime Size Precision F-Measure Recall Recall+ Coherent
               836      516   0.996      0.506     0.339   0.0           +


           Table 2. Participation of ALIN in Conference non-interactive track

              Threshold Precision Recall F1-Measure F2-Measure F0.5-Measure
    ra1+m1        0.0         0.89    0.32       0.47       0.37             0.66
    ra1+m2        0.0          0.0     0.0        0.0        0.0              0.0
    ra1+m3        0.0         0.89    0.27       0.41       0.31             0.61


2.2     Comments on the participation of ALIN in interactive
        tracks


      Table 3. Participation of ALIN in Anatomy interactive track - Error rate 0.0

    Tool    Run Time (sec) Precision Recall F-measure Total Requests Distinct Mappings
 ALIN             1074           0.993   0.794     0.882           939               1472
 AML               45            0.968   0.948     0.958           241                240
LogMap             23            0.982   0.846     0.909           388               1164
 XMap              43            0.927   0.865     0.895            35                 35


8
    Results of Evaluation for the Conference track within OAEI 2017. Available at
    http://oaei.ontologymatching.org/2017/conference/eval.html Last accessed on
    Nov 12, 2017.
     Table 4. Participation of ALIN in Anatomy interactive track - Error rate 0.1

    Tool   Run Time (sec) Precision Recall F-measure Total Requests Distinct Mappings
  ALIN           1000          0.94     0.745    0.831           905                  1352
  AML             45           0.956    0.946     0.95           266                   264
 LogMap           23           0.962    0.83     0.891           388                  1164
  XMap            44           0.927    0.865    0.895            35                    35

    Table 5. Participation of ALIN in Conference interactive track - Error rate 0.0

    Tool   Run Time (sec) Precision Recall F-measure Total Requests Distinct Mappings
  ALIN            35           0.957    0.731    0.829           329                  571
  AML             30           0.912    0.711    0.799           271                  270
 LogMap           35           0.886    0.61     0.723            82                  246
  XMap            21           0.837    0.57     0.678            4                    4

    Table 6. Participation of ALIN in Conference interactive track - Error rate 0.1

    Tool   Run Time (sec) Precision Recall F-measure Total Requests Distinct Mappings
  ALIN            35           0.804    0.669     0.73           321                  549
  AML             30           0.841    0.701    0.765           282                  275
 LogMap           35           0.851    0.598    0.702            82                  246
  XMap            21           0.837    0.57     0.678            4                    4


Anatomy track In this track ALIN showed the highest precision among the four
evaluated tools when the error rate is zero, as can be seen in Table 3. When
the error rate increases, both precision and recall fall, reducing the
f-measure, as can be seen in Table 4. This is expected, since ALIN propagates
wrong expert answers, as explained at the beginning of Section 2.
    As the ontologies of the Anatomy track contain almost no properties, some
interactive techniques used in ALIN cannot be applied, such as the selection
of properties associated with classes that received positive feedback. This
has limited the increase in recall, which in turn affected the f-measure.


Conference track In this track ALIN stood out, showing the greatest f-measure
among the four tools when the error rate is zero, as can be seen in Table 5,
but with a loss of f-measure when the error rate increases, as can be seen in
Table 6.
    Other results, including results with other error rates, can be seen on
the OAEI 20179 page.
9
    Results for OAEI 2017 - Interactive Track. Available at
    http://oaei.ontologymatching.org/2017/results/interactive/index.html Last
    accessed on Nov 11, 2017.

2.3     Comparison of ALIN's participation in OAEI 2017 with
        its participation in OAEI 2016
The difference between ALIN's participation in OAEI 2016 and its participation
in OAEI 2017 was the use of the HESML API in 2017 instead of the WS4J API for
calculating semantic similarities, which greatly increased the efficiency of
these calculations. In OAEI 2016 [6], ALIN used three semantic similarity
metrics: Wu-Palmer, Jiang-Conrath and Lin. In OAEI 2017, the metrics Resnik,
Jiang-Conrath and Lin were used. Wu-Palmer was replaced by Resnik because the
Wu-Palmer metric in the HESML API took longer to execute than the same metric
in the WS4J API. The Resnik metric proved to be much faster than the Wu-Palmer
metric in the HESML API and, according to [7], is as good, so it was chosen to
take Wu-Palmer's place in the implementation of ALIN for OAEI 2017. More
information about the HESML API can be found in [8]. Table 7 shows that, in
the Conference interactive track, the ALIN runtime decreased considerably
(from 101 to 35 seconds) with the use of the HESML API instead of the WS4J
API. In the Anatomy interactive track of OAEI 2016, ALIN used only the string
metrics and not the semantic metrics, since the semantic metrics took so long
that execution became infeasible. In OAEI 2017, the HESML API made it possible
to use semantic metrics, which increased the quality of the generated
alignment (f-measure from 0.854 to 0.882) but also the expert's participation
(total requests from 803 to 939). The execution time also increased with the
inclusion of semantic metrics, as can be seen in Table 8.

Table 7. Participation of ALIN in Conference interactive track - OAEI 2016/2017-
Error rate 0.0

 Year Run Time (sec) Precision Recall F-measure Total Requests Distinct Mappings
 2016        101         0.957    0.735    0.831          326                574
 2017         35         0.957    0.731    0.829          329                571


Table 8. Participation of ALIN in Anatomy interactive track - OAEI 2016/2017- Error
rate 0.0

 Year Run Time (sec) Precision Recall F-measure Total Requests Distinct Mappings
 2016        505         0.993    0.749    0.854          803               1221
 2017       1074         0.993    0.794    0.882          939               1472


3     General Comments
The results show that the system can be improved by:
    (a) handling user errors;
    (b) generating a higher-quality initial alignment (especially w.r.t.
recall) in its non-interactive phase;
    (c) reducing the number of interactions with the expert; and
    (d) optimizing the process to reduce its execution time, especially in
alignments with large numbers of correspondences, such as Anatomy.


3.1   Conclusions
Under certain conditions, the ALIN system stands out in the ontology alignment
process in interactive application scenarios, especially when the number of
data and object properties is relatively large and when the expert does not
make mistakes. Under these conditions, the generated alignment has relatively
high precision and recall.
    The third author was partially funded by project PQ-UNIRIO N01/2017
("Aprendendo, adaptando e alinhando ontologias: metodologias e algoritmos.")
and CAPES/PROAP.


References
1. H. Paulheim, S. Hertling, and D. Ritze, Towards Evaluating Interactive Ontology
   Matching Tools, Lect. Notes Comput. Sci., vol. 7882, pp. 31-45, 2013.
2. R. W. Irving, D. F. Manlove, and G. O'Malley, Stable marriage with ties and
   bounded length preference lists, J. Discret. Algorithms, vol. 7, no. 2,
   pp. 213-219, 2009.
3. J. Euzenat and P. Shvaiko, Ontology Matching, 2nd ed. Springer-Verlag, 2013.
4. J. Silva, F. A. Baião, K. Revoredo, and J. Euzenat, Semantic Interactive
   Ontology Matching: Synergistic Combination of Techniques to Improve the Set of
   Candidate Correspondences, n.d.
5. A. Guedes, F. Baião, and K. Revoredo, Digging Ontology Correspondence
   Antipatterns, Proc. 5th Int. Conf. Ontol. Semant. Web Patterns (WOP14),
   vol. 1302, pp. 38-48, 2014.
6. J. Silva, F. A. Baião, and K. Revoredo, ALIN Results for OAEI 2016, CEUR
   Workshop Proc., vol. 1766, 2016.
7. E. G. M. Petrakis, G. Varelas, A. Hliaoutakis, and P. Raftopoulou, Design and
   Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same
   or Different Ontologies, Proc. 4th Work. Multimed. Semant., vol. 4,
   pp. 233-237, 2006.
8. J. J. Lastra-Díaz, A. García-Serrano, M. Batet, M. Fernández, and F. Chirigati,
   HESML: A scalable ontology-based semantic similarity measures library with a
   set of reproducible experiments and a replication dataset, Information
   Systems, vol. 66, pp. 97-118, 2017. http://doi.org/10.1016/j.is.2017.02.002