=Paper= {{Paper |id=Vol-1766/oaei16_paper1 |storemode=property |title=ALIN results for OAEI 2016 |pdfUrl=https://ceur-ws.org/Vol-1766/oaei16_paper1.pdf |volume=Vol-1766 |authors=Jomar da Silva,Fernanda Baião,Kate Revoredo |dblpUrl=https://dblp.org/rec/conf/semweb/SilvaBR16 }} ==ALIN results for OAEI 2016== https://ceur-ws.org/Vol-1766/oaei16_paper1.pdf
                      ALIN Results for OAEI 2016

            Jomar da Silva, Fernanda Araujo Baião and Kate Revoredo
                              Department of Applied Informatics
      Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Brazil
       {jomar.silva,fernanda.baiao,katerevoredo}@uniriotec.br

       Abstract. ALIN is an ontology alignment system specialized in the interactive
       alignment of ontologies. Its main characteristic is the selection of correspondences
       to be shown to the expert, depending on the previous feedback given by the
       expert. This selection is based on semantic and structural characteristics. ALIN
       obtained the alignment with the highest quality in the interactive track for the
       Conference data set. This paper describes its configuration for the OAEI 2016
       competition and discusses its results.


       Keywords: Interactive Ontology Matching; Anti-patterns




1      Presentation of the system

A large number of data repositories have become available due to advances in
information and communication technologies. Those repositories, however, are
highly semantically heterogeneous, which hinders their integration. Ontology
alignment has been successfully applied to solve this problem, by discovering
correspondences between two distinct ontologies which, in turn, conceptually define
the data stored in each repository. Among the various ontology alignment
approaches that exist in the literature, interactive ontology alignment includes the
participation of experts of the domain to improve the quality of the final alignment.
This approach has proven more effective than non-interactive ontology alignment
[1]. ALIN is an ontology alignment system specialized in interactive alignment. This
is the first version of the system.


1.1    State, purpose, general statement

ALIN is an ontology alignment system specialized in interactive ontology
alignment. It is based primarily on linguistic matching techniques and uses WordNet as
an external resource. After an initial set of correspondences is generated (called the
set of candidate correspondences, that is, the correspondences selected to receive
feedback from the expert), interactions are made with the expert, and at each
interaction the set of candidate correspondences is modified. The set is modified
through structural analysis of the ontologies and through the use of alignment
anti-patterns. The interactions continue until there are no more candidate
correspondences left. ALIN was built with a special focus on the interactive matching
track of OAEI 2016.


1.2    Specific techniques used
The ALIN workflow is shown in Figure 1.

                           Fig. 1. – Workflow of ALIN

The steps of the ALIN workflow are the following:
       1. The ontologies are loaded, together with their classes, object properties and
          data properties, through the Align API¹. For each entity some data are stored,
          such as its name and label. For classes, their superclasses and disjunctions
          are saved. For object properties, the properties that are their hypernyms and
          their associated classes are saved. The classes associated with data properties
          are saved, too. ALIN does not use instances. After loading, the matching problem
          is profiled taking into account the size of the ontologies. ALIN can only work
          with ontologies whose entity names are in English.
       2. The initial set of candidate correspondences is generated by a stable marriage
          algorithm with incomplete preference lists [2], with the maximum size of each
          list equal to 1 and with linguistic metrics used to sort the preference list.
          The list is sorted in decreasing order. With this algorithm, a correspondence
          is selected only when its first entity is in the list of the second entity and
          vice versa. The linguistic metrics used are Jaccard, Jaro-Winkler and n-Gram [3],
          provided by the Simmetrics API², and Wu-Palmer, Jiang-Conrath and Lin [3],
          provided by the ws4j API³, which uses WordNet. Because WordNet requires the
          canonical form of a word, the Stanford CoreNLP API⁴ is used to obtain it. The
          algorithm is run six times, once for each metric, and the result set is the
          union of the results of all metrics.
       3. The values of the similarity metrics (Wu-Palmer, Jiang-Conrath, Lin, Jaccard,
          Jaro-Winkler and n-Gram) vary from 0 to 1 (1 is the maximum value). When a
          correspondence in the set of candidate correspondences has the maximum value
          for all six metrics, it is added to the final alignment and removed from the
          set of candidate correspondences. There are exceptions to this rule:
          correspondences that fall into certain structural patterns are not put into the
          final alignment and are not removed from the set of candidate correspondences.
       4. The correspondences whose entities are not in the same WordNet synset are
          removed from the set of candidate correspondences. These correspondences are
          put into a backup set and can return to the set of candidate correspondences
          through structural analysis.
       5. At this point the interactions with the expert begin. The correspondences in
          the set of candidate correspondences are sorted by the sum of their similarity
          metric values, with the greatest sum first. They are shown one by one to the
          expert: the first correspondence is shown, and it is removed from the list
          after the expert answers. At first, the set of candidate correspondences
          contains only correspondences between classes. Whenever the expert answers a
          question, the set of candidate correspondences changes: correspondences other
          than the one just answered can be removed or included, depending on the answer.
          If the expert does not accept the correspondence, it is removed from the set of
          candidate correspondences. If the expert accepts the correspondence, it is
          removed from the set of candidate correspondences and put into the final
          alignment.

1   “Alignment API”. Available at http://alignapi.gforge.inria.fr/ Last accessed on Apr 11, 2016.

2   “String Similarity Metrics for Information Integration”. Available at
    http://www.coli.uni-saarland.de/courses/LT1/2011/slides/stringmetrics.pdf. Last accessed on Apr 19, 2016.

3   “WS4J”. Available at https://code.google.com/archive/p/ws4j/ Last accessed on Apr 11, 2016.

4   “Stanford CoreNLP”. Available at http://stanfordnlp.github.io/CoreNLP/ Last accessed on Sept 15, 2016.


At each interaction with the expert we also:

- remove from the set of candidate correspondences, and disregard, all the
correspondences that are in an alignment anti-pattern [4] with the correspondence
accepted by the expert;
- insert into the set of candidate correspondences the data property and object
property correspondences related to the class correspondence accepted by the expert;
- insert into the set of candidate correspondences the correspondences of the backup
set (step 4) whose entities are both subclasses of the classes of a correspondence
accepted by the expert.

This step continues until the set of candidate correspondences is empty.
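
The interactive step above can be sketched as a loop over a shrinking and growing candidate set. The following Python sketch is only an illustration under stated assumptions, not ALIN's actual implementation: the function name and all callables passed in (ask_expert, in_antipattern, related_properties, under_accepted) are hypothetical hooks standing in for the expert, the anti-pattern check [4], the lookup of property correspondences and the subclass test.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Corr = Tuple[str, str]  # (entity of ontology 1, entity of ontology 2)


def interactive_alignment(
    candidates: Dict[Corr, float],   # candidate -> sum of its similarity values
    backup: Dict[Corr, float],       # backup set from step 4, same layout
    ask_expert: Callable[[Corr], bool],
    in_antipattern: Callable[[Corr, Corr], bool],
    related_properties: Callable[[Corr], Dict[Corr, float]],
    under_accepted: Callable[[Corr, Corr], bool],
    alignment: Optional[Set[Corr]] = None,
) -> Set[Corr]:
    """Ask the expert about candidates until the candidate set is empty."""
    candidates, backup = dict(candidates), dict(backup)
    alignment = set(alignment or ())
    # Assumption: a correspondence that was answered or discarded is never asked again.
    settled: Set[Corr] = set(alignment)

    while candidates:
        # The candidate with the greatest similarity sum is shown first.
        corr = max(candidates, key=candidates.get)
        del candidates[corr]
        settled.add(corr)
        if not ask_expert(corr):
            continue  # rejected: removed from the candidates, nothing else changes
        alignment.add(corr)

        # Discard candidates that are in an anti-pattern with the accepted one.
        for other in list(candidates):
            if in_antipattern(corr, other):
                del candidates[other]
                settled.add(other)

        # Add property correspondences related to the accepted class correspondence.
        for prop, score in related_properties(corr).items():
            if prop not in settled:
                candidates.setdefault(prop, score)

        # Bring back backup correspondences whose entities are both subclasses
        # of the classes of the accepted correspondence.
        for other in list(backup):
            if other not in settled and under_accepted(other, corr):
                candidates[other] = backup.pop(other)

    return alignment
```

In this sketch the candidate set is updated only after an accepted answer, which is how the three rules above are worded; the loop ends because every correspondence is asked at most once.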


1.3    Link to the system and parameters file
ALIN is available through Mediafire (https://www.mediafire.com/folder/726zo-
hj792kod/ALIN) as a package for running through the SEALS client.


2      Results

ALIN has been developed with a focus on interactive ontology alignment. The
approach performs better when the number of data and object properties is
proportionately large. When selecting entities for user feedback, ALIN considers the
properties associated with corresponding classes, which allows for increased recall.
When the number of properties in the ontologies is small, the system still generates
a very precise alignment, but its recall tends to decrease.

Another characteristic of ALIN is its reliance on the interactive phase. The
non-interactive phase of the system is quite simple: it is based mainly on maximum
string similarity and aims at maintaining a high precision without worrying about
recall, which initially yields a low f-measure. The recall increases in the
interactive phase. Finally, ALIN is not robust to user errors. The system uses a
number of techniques that take advantage of the expert's responses to reach other
conclusions, so when the expert gives a wrong answer, the error is propagated and
generates other errors, thereby diminishing the f-measure.
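
To make the non-interactive phase concrete, steps 2 and 3 of Section 1.2 can be sketched as follows. This is a minimal Python sketch under assumptions, not ALIN's code: with preference lists of length 1, the selection reduces to keeping pairs in which each entity is the other's best match; the two metrics shown are simple stand-ins for the six metrics used by ALIN; zero-similarity pairs are ignored by assumption; and the structural exceptions of step 3 and the synset filter of step 4 are left out.

```python
from typing import Callable, List, Set, Tuple

Corr = Tuple[str, str]
Metric = Callable[[str, str], float]  # returns a similarity in [0, 1]


def mutual_best_matches(source: List[str], target: List[str], sim: Metric) -> Set[Corr]:
    """Keep (s, t) only when t is the best match of s and s is the best match of t."""
    if not source or not target:
        return set()
    best_t = {s: max(target, key=lambda t: sim(s, t)) for s in source}
    best_s = {t: max(source, key=lambda s: sim(s, t)) for t in target}
    # Assumption: pairs with zero similarity are not treated as matches.
    return {(s, t) for s, t in best_t.items() if best_s[t] == s and sim(s, t) > 0}


def non_interactive_phase(source: List[str], target: List[str],
                          metrics: List[Metric]) -> Tuple[Set[Corr], Set[Corr]]:
    """Return (automatically accepted correspondences, remaining candidates)."""
    candidates: Set[Corr] = set()
    for sim in metrics:  # one run per metric; the result is the union
        candidates |= mutual_best_matches(source, target, sim)
    # A candidate is accepted without the expert only if every metric is maximal.
    accepted = {(s, t) for s, t in candidates
                if all(sim(s, t) == 1.0 for sim in metrics)}
    return accepted, candidates - accepted


def token_jaccard(a: str, b: str) -> float:
    """Stand-in metric: Jaccard over lowercase, underscore-separated tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def bigram_jaccard(a: str, b: str) -> float:
    """Stand-in metric: Jaccard over lowercase character bigrams."""
    a, b = a.lower(), b.lower()
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0
```

Requiring the maximum value on every metric keeps precision high, and everything else is left for the expert, which is consistent with the low initial recall described above.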
2.1    Comments on the participation of ALIN in non-interactive tracks
As expected, the participation of ALIN in the non-interactive tracks showed high
precision and a lower recall, as can be seen in Table 1, where the recall+ field refers
to the non-trivial correspondences found and a + in the Coherent field indicates that
the generated alignment is consistent.



Table 1. - Participation of ALIN in Anatomy track

Table 2. - Participation of ALIN in Conference track taking into account only the classes
(M1) and the reference alignment publicly available (r1).

Table 3. - Participation of ALIN in Conference track taking into account only the
properties (M2) and the reference alignment publicly available (r1).

Table 4. - Participation of ALIN in Conference track taking into account the classes and
properties (M3) and the reference alignment publicly available (r1).


Regarding the Conference track, because ALIN evaluates only the properties
associated with classes already evaluated as belonging to the alignment, the
alignments of type M2 (which take into account only the properties of the ontologies)
had an f-measure of 0, as can be seen in Table 3. Because ALIN evaluates properties
only in the interactive phase, alignments of type M1 (only classes) had a higher
recall than those of type M3 (classes and properties), as can be seen in Tables 2 and
4, since the reference alignments of type M3 contain properties in addition to classes.
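
The effect described above follows directly from the standard definitions of precision, recall and f-measure. The Python sketch below is illustrative only: the correspondences are made-up toy data, not taken from the OAEI reference alignments, and scoring an empty alignment as precision 0 is an assumed convention.

```python
from typing import Set, Tuple

Corr = Tuple[str, str]


def evaluate(alignment: Set[Corr], reference: Set[Corr]) -> Tuple[float, float, float]:
    """Return (precision, recall, f-measure) of an alignment against a reference."""
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0  # assumed convention
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy data: four reference class correspondences, two for properties.
ref_classes = {("Paper", "Paper"), ("Author", "Writer"),
               ("Review", "Review"), ("Chair", "Chairman")}
ref_properties = {("hasAuthor", "writtenBy"), ("hasReview", "reviewedIn")}

# A non-interactive run that finds three class correspondences and no properties.
found = {("Paper", "Paper"), ("Author", "Writer"), ("Review", "Review")}

m1 = evaluate(found, ref_classes)                   # classes only
m2 = evaluate(set(), ref_properties)                # properties only: nothing found
m3 = evaluate(found, ref_classes | ref_properties)  # classes and properties
```

With this toy data the recall is 0.75 for M1 and 0.5 for M3, and every M2 score is 0, which mirrors the pattern reported for Tables 2 to 4.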


2.2    Comments on the participation of ALIN in interactive tracks

Anatomy track.
In this track ALIN showed the highest precision among the four evaluated tools when
the error rate is zero. When the error rate increases, both precision and recall
fall, reducing the f-measure. This is expected and was explained earlier.
Because the ontologies of the Anatomy track contain almost no properties, one of the
techniques used in ALIN, the selection of properties associated with classes assessed
as belonging to the alignment, cannot be applied. This limited the increase in recall,
which influenced the f-measure, as can be seen in Table 5.

Table 5. - Participation of ALIN in interactive alignment - Anatomy track.

Conference track.
In this track ALIN stood out, showing the greatest f-measure among the four tools
when the error rate is zero. As in the Anatomy track, the f-measure falls when the
error rate increases, as can be seen in Table 6.

Table 6. - Participation of ALIN in interactive alignment - Conference track.


3      General Comments

The results show that the system can be improved in the following directions:
(a) handling user errors;
(b) generating an initial alignment of higher quality (especially w.r.t. recall) in its
non-interactive phase;
(c) reducing the number of interactions with the expert; and
(d) optimizing the process to reduce its execution time.


4      Conclusions

Under certain conditions, the ALIN system stands out in the ontology alignment
process in interactive application scenarios, especially when data and object
properties are also subject to the alignment and when the expert does not make
mistakes. Under these conditions, the generated alignment has relatively high
precision and recall.


References
[1]    H. Paulheim, S. Hertling, and D. Ritze, “Towards Evaluating Interactive Ontology
Matching Tools”, Lect. Notes Comput. Sci., vol. 7882, pp. 31–45, 2013.

[2]    R. W. Irving, D. F. Manlove, and G. O’Malley, “Stable marriage with ties and
bounded length preference lists”, J. Discret. Algorithms, vol. 7, no. 2, pp. 213–219, 2009.

[3]    J. Euzenat and P. Shvaiko, Ontology Matching, 2nd ed. Springer-Verlag, 2013.

[4]    A. Guedes, F. Baião, and K. Revoredo, “Digging Ontology Correspondence
Antipatterns”, Proc. 5th Int. Conf. Ontol. Semant. Web Patterns (WOP ’14), vol. 1302,
pp. 38–48, 2014.