ALIN Results for OAEI 2017

Jomar da Silva

Fernanda Araujo Baião

Kate Revoredo

katerevoredo@uniriotec.br

Graduate Program in Informatics, Department of Applied Informatics, Federal University of the State of Rio de Janeiro (UNIRIO), Brazil

ALIN is an ontology alignment system specialized in the interactive alignment of ontologies. Its main characteristic is the selection of the correspondences to be shown to the expert, which depends on the feedback the expert has already given. This selection is based on semantic and structural characteristics. ALIN obtained the alignment with the highest quality in the interactive track for the Conference data set. This paper describes its configuration for the OAEI 2017 competition and discusses its results.

Keywords: ontology matching, WordNet, interactive ontology matching, ontology alignment, interactive ontology alignment

1.1 State, purpose, general statement

ALIN is an ontology alignment system specialized in interactive ontology alignment. It is based primarily on linguistic matching techniques and uses WordNet as an external resource. After an initial set of correspondences is generated (the set of candidate correspondences, that is, the correspondences selected to receive feedback from the expert), interactions are made with the expert, and at each interaction the set of candidate correspondences is modified. This modification relies on the structural analysis of the ontologies and on correspondence anti-patterns. The interactions continue until no candidate correspondences are left. ALIN was built with a special focus on the interactive matching track of OAEI 2017.

1.2 Specific techniques used

The ALIN algorithm is shown in Algorithm 1.

Algorithm 1 ALIN algorithm

Input: Two ontologies to be aligned
Output: Alignment between the two ontologies
1: Load the ontologies
2: Generate the initial set of candidate correspondences
3: Classify correspondences automatically
4: Remove correspondences with a low semantic similarity value
5: while the set of candidate correspondences is not empty do
6:   Choose correspondences to show to the expert
7:   Receive expert feedback on the chosen correspondences and remove them from the set of candidate correspondences
8:   Remove correspondences that are in a correspondence anti-pattern from the set of candidate correspondences
9:   Insert some data property and object property correspondences into the set of candidate correspondences
10:   Insert some correspondences from the backup set into the set of candidate correspondences
11: end while
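The control flow of the interactive part of Algorithm 1 (steps 5 to 11) can be sketched in Python as follows. This is a minimal illustration and not ALIN's implementation: the function name `interactive_loop`, the three callbacks, the choice of showing one correspondence per interaction, and the rule that an answered or discarded correspondence is never re-inserted are all assumptions made here. The anti-pattern test in the usage example (two correspondences sharing an entity) is only a simplified stand-in.

```python
from typing import Callable, Dict, Set, Tuple

Corr = Tuple[str, str]  # (entity of ontology 1, entity of ontology 2)


def interactive_loop(
    candidates: Dict[Corr, float],
    ask_expert: Callable[[Corr], bool],
    in_anti_pattern: Callable[[Corr, Corr], bool],
    new_candidates: Callable[[Corr], Dict[Corr, float]],
) -> Set[Corr]:
    """Run steps 5-11 of Algorithm 1 and return the accepted correspondences.

    `candidates` maps each candidate correspondence to the sum of its
    similarity values. The three callbacks are hypothetical stand-ins.
    """
    pending = dict(candidates)   # the set of candidate correspondences
    closed: Set[Corr] = set()    # answered or discarded (assumed final)
    alignment: Set[Corr] = set()

    while pending:                                  # step 5
        chosen = max(pending, key=pending.get)      # step 6: greatest sum first
        accepted = ask_expert(chosen)               # step 7
        del pending[chosen]
        closed.add(chosen)
        if not accepted:
            continue
        alignment.add(chosen)
        # Step 8: discard candidates in an anti-pattern with the accepted one.
        for corr in [c for c in pending if in_anti_pattern(c, chosen)]:
            del pending[corr]
            closed.add(corr)
        # Steps 9-10: insert property and backup correspondences.
        for corr, score in new_candidates(chosen).items():
            if corr not in closed:
                pending.setdefault(corr, score)
    return alignment


# Usage with toy data (all names below are made up for the example).
answers = {("Paper", "Article"): True, ("hasAuthor", "writtenBy"): True}
result = interactive_loop(
    {("Paper", "Article"): 5.0, ("Paper", "Review"): 3.0},
    ask_expert=lambda c: answers.get(c, False),
    in_anti_pattern=lambda c, acc: c[0] == acc[0] or c[1] == acc[1],
    new_candidates=lambda acc: (
        {("hasAuthor", "writtenBy"): 2.0} if acc == ("Paper", "Article") else {}
    ),
)
print(sorted(result))
```

In this toy run the expert is asked two questions: accepting ("Paper", "Article") discards ("Paper", "Review") without asking and brings the property correspondence into the candidate set.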

The steps of the ALIN algorithm are the following:

1. The ontologies are loaded, including their classes, object properties and data properties, through the Align API¹. For each entity, data such as its name and label are stored. For classes, their superclasses and disjunctions are saved. For object properties, the properties that are their hypernyms and their associated classes are saved. The classes of data properties are saved, too. ALIN does not use instances, and it can only work with ontologies whose entity names are in English.

2. The initial set of candidate correspondences is generated with a stable marriage algorithm with incomplete preference lists [2], in which the maximum size of each list is 1 and linguistic metrics are used to sort the preference lists. Each list is sorted in decreasing order. With this algorithm, a correspondence is selected only if its first entity is in the list of its second entity and vice versa. The linguistic metrics used are Jaccard, Jaro-Winkler and n-Gram [3], provided by the Simmetrics API², and Resnik, Jiang-Conrath and Lin [3], provided by the HESML API³, which uses WordNet. Using WordNet requires the canonical form of the entity names, so the Stanford CoreNLP API⁴ was used. The most frequent synsets of the words are used to calculate the semantic similarities, and the WS4J API⁵ is used to find these synsets. The algorithm is run six times, once for each metric, and the result set is the union of the results of each metric.

¹ Alignment API. Available at http://alignapi.gforge.inria.fr/. Last accessed on Oct 10, 2017.
² String Similarity Metrics for Information Integration. Available at http://www.coli.uni-saarland.de/courses/LT1/2011/slides/stringmetrics.pdf. Last accessed on Oct 10, 2017.

3. The values of the similarity metrics (Resnik, Jiang-Conrath, Lin, Jaccard, Jaro-Winkler and n-Gram) vary from 0 to 1, where 1 is the maximum value. When all six metrics of a correspondence in the set of candidate correspondences have the maximum value, the correspondence is added to the final alignment and removed from the set of candidate correspondences. There are exceptions to this rule: correspondences that fall into certain structural patterns are neither put in the final alignment nor removed from the set of candidate correspondences.

4. The correspondences for which one of the linguistic metrics is below a given threshold are removed from the set of candidate correspondences. These correspondences are put into a backup set and can return to the set of candidate correspondences through structural analysis. This technique is described in more detail in [4], with the difference that, in [4], instead of a threshold being applied, the correspondences whose classes were not in the same WordNet synset were removed.

5-11. At this point the interactions with the expert begin. The correspondences in the set of candidate correspondences are sorted by the sum of their similarity metric values, with the greatest sum first, and are shown to the expert. At first, the set of candidate correspondences contains only class correspondences. When the expert answers a question, the set of candidate correspondences is modified: depending on the answer, correspondences other than the one answered can be removed from it, and new correspondences can be included in it. If the expert rejects the correspondence, it is removed from the set of candidate correspondences. If the expert accepts it, it is removed from the set of candidate correspondences and put in the final alignment.

At each interaction with the expert:

- We remove from the set of candidate correspondences, and disregard, all the correspondences that are in a correspondence anti-pattern [5] with the correspondences accepted by the expert;

- We insert into the set of candidate correspondences the data property and object property correspondences related to the class correspondences accepted by the expert;

- We insert into the set of candidate correspondences the correspondences of the backup set (step 4) whose entities are both subclasses of the classes of a correspondence accepted by the expert.

This step continues until the set of candidate correspondences is empty.

³ HESML. Available at https://www.researchgate.net/publication/313881253_HESML_A_scalable_ontology-based_semantic_similarity_measures_library_with_a_set_of_reproducible_experiments_and_a_replication_dataset. Last accessed on Oct 10, 2017.
⁴ Stanford CoreNLP. Available at http://stanfordnlp.github.io/CoreNLP/. Last accessed on Oct 10, 2017.
⁵ WS4J. Available at https://github.com/Sciss/ws4j. Last accessed on Nov 8, 2017.
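Steps 2 to 4 can be sketched in Python as follows. With preference lists of size 1, the stable marriage selection amounts to keeping the pairs in which each entity is the other's best match under a given metric. The sketch is an illustration under assumptions, not ALIN's code: the toy `jaccard` and `prefix` metrics stand in for the six library metrics, the function names and the threshold value 0.3 are hypothetical, the threshold is read as "any metric below the threshold", pairs with zero similarity are ignored, ties are resolved by list order, and the structural exceptions of step 3 are omitted.

```python
from typing import Callable, List, Set, Tuple

Corr = Tuple[str, str]
Metric = Callable[[str, str], float]  # similarity in [0, 1]


def jaccard(a: str, b: str) -> float:
    """Toy token-level Jaccard similarity; names are split on underscores."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def prefix(a: str, b: str) -> float:
    """Toy metric: common-prefix length over the length of the longer name."""
    a, b = a.lower(), b.lower()
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b), 1)


def mutual_best(src: List[str], tgt: List[str], metric: Metric) -> Set[Corr]:
    """Step 2, one metric: keep (s, t) when each is the other's top choice."""
    if not src or not tgt:
        return set()
    best_t = {s: max(tgt, key=lambda t: metric(s, t)) for s in src}
    best_s = {t: max(src, key=lambda s: metric(s, t)) for t in tgt}
    return {(s, t) for s, t in best_t.items()
            if best_s[t] == s and metric(s, t) > 0}


def initial_candidates(src: List[str], tgt: List[str],
                       metrics: List[Metric]) -> Set[Corr]:
    """Step 2: run the selection once per metric and take the union."""
    result: Set[Corr] = set()
    for metric in metrics:
        result |= mutual_best(src, tgt, metric)
    return result


def split_candidates(cands: Set[Corr], metrics: List[Metric], threshold: float):
    """Steps 3 and 4: accept automatically, or move to the backup set."""
    alignment, candidates, backup = set(), set(), set()
    for s, t in cands:
        values = [m(s, t) for m in metrics]
        if all(v == 1.0 for v in values):
            alignment.add((s, t))    # step 3: every metric at its maximum
        elif min(values) < threshold:
            backup.add((s, t))       # step 4: some metric below the threshold
        else:
            candidates.add((s, t))   # left for the expert
    return alignment, candidates, backup


# Usage with made-up entity names.
src = ["Paper", "Conference_Member", "Program_Committee"]
tgt = ["Paper", "Member", "Program_Committee_Chair"]
metrics = [jaccard, prefix]
cands = initial_candidates(src, tgt, metrics)
alignment, candidates, backup = split_candidates(cands, metrics, threshold=0.3)
print(alignment, candidates, backup)
```

Here ("Paper", "Paper") is accepted automatically, ("Conference_Member", "Member") goes to the backup set because one toy metric is 0, and ("Program_Committee", "Program_Committee_Chair") remains a candidate for the expert.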

Detailed information about the ALIN system can be found in the master's thesis of Jomar da Silva⁶.

1.3 Link to the system and parameters file

ALIN is available through Google Drive (https://drive.google.com/open?id=1myVtcRoKKdUDHQTKNKsomna8AFbukanf) as a package to be run through the SEALS client.

2 Results

The ALIN system has been developed with a focus on interactive ontology alignment. The approach performs better when the number of data and object properties is proportionately large. When selecting entities for user feedback, ALIN considers the properties associated with corresponding classes, which allows for increased recall. When the number of properties in the ontologies is small, the system still generates a very precise alignment, but its recall tends to decrease.

Another characteristic of ALIN is its reliance on an interactive phase. The non-interactive phase of the system is quite simple: it is mainly based on maximum string similarity and aims at maintaining a high precision without regard for recall, so it initially generates a low f-measure. The recall increases in the interactive phase. Finally, ALIN is not robust to user errors. The system uses a number of techniques that take advantage of the expert feedback to reach further conclusions, so when the expert gives a wrong answer, it is propagated and generates other errors, thereby decreasing the f-measure.

2.1 Comments on the participation of ALIN in non-interactive tracks

As expected, the participation of ALIN in non-interactive alignment processes showed high precision and not so high recall, as can be seen for the Anatomy track⁷ in Table 1, where the recall+ field refers to the non-trivial correspondences found and a Coherent field filled with + indicates that the generated alignment is consistent.

⁶ INTERACTIVE ONTOLOGY ALIGNMENT: AN APPROACH BASED ON THE INTERACTIVE MODIFICATION OF THE SET OF CANDIDATE CORRESPONDENCES. Available at http://www2.uniriotec.br/ppgi/banco-de-dissertacoes-ppgi-unirio/ano-2017/interactive-ontology-alignment-an-approach-based-on-the-interactive-modification-of-the-set-of-candidate-correspondences/view. Last accessed on Nov 12, 2017.
⁷ Results for OAEI 2017 - Anatomy track. Available at http://oaei.ontologymatching.org/2017/results/anatomy/index.html. Last accessed on Nov 12, 2017.
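For readers less familiar with these measures, precision, recall and f-measure compare a generated alignment with a reference alignment. The sketch below uses the standard definitions and assumes that alignments are represented as sets of entity pairs; the function name `evaluate` and the example pairs are hypothetical and are not taken from the OAEI evaluation code.

```python
from typing import Set, Tuple

Corr = Tuple[str, str]


def evaluate(alignment: Set[Corr], reference: Set[Corr]):
    """Return (precision, recall, f-measure) of an alignment."""
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# A precise but incomplete alignment: 2 correct pairs out of 4 in the reference.
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found = {("a1", "b1"), ("a2", "b2")}
p, r, f = evaluate(found, reference)
print(p, r, round(f, 3))
```

With these toy sets the precision is 1.0 and the recall is 0.5, which gives an f-measure of about 0.667. An empty alignment yields an f-measure of 0.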

Regarding the Conference track⁸, ALIN evaluates only the properties associated with classes already evaluated as belonging to the alignment. As a result, the alignments of type M2 (which take into account only the properties of the ontologies) had an f-measure of 0, as can be seen in Table 2. Because properties are evaluated only in the interactive phase of ALIN, the alignments of type M1 (only classes) had a higher recall than those of type M3 (classes and properties), as can be seen in Table 2, since the reference alignments of type M3 contain properties besides classes.

⁸ Results of Evaluation for the Conference track within OAEI 2017. Available at http://oaei.ontologymatching.org/2017/conference/eval.html. Last accessed on Nov 12, 2017.

2.2 Comments on the participation of ALIN in interactive tracks

Anatomy track. In this track ALIN showed the highest precision among the four evaluated tools (ALIN, AML, LogMap and XMap) when the error rate is zero, as can be seen in Table 3. When the error rate increases, both precision and recall fall, reducing the f-measure, as can be seen in Table 4. This is expected, as explained earlier.

As the ontologies of the Anatomy track contain almost no properties, some interactive techniques used in ALIN cannot be applied, such as the selection of properties associated with classes that received positive feedback. This limited the increase in recall, which influenced the f-measure.

Conference track. In this track ALIN stood out, showing the greatest f-measure among the four tools when the error rate is zero, as can be seen in Table 5, but with a loss of f-measure when the error rate increases, as can be seen in Table 6.

Other results, including results with other error rates, can be seen on the OAEI 2017⁹ page.

⁹ Results for OAEI 2017 - Interactive Track. Available at http://oaei.ontologymatching.org/2017/results/interactive/index.html. Last accessed on Nov 11, 2017.

2.3 Comparison of the participation of ALIN in OAEI 2017 with its participation in OAEI 2016

The difference between the participation of ALIN in OAEI 2016 and its participation in OAEI 2017 was the use of the HESML API in 2017, instead of the WS4J API, to calculate semantic similarities, which greatly increased the efficiency of these calculations. In OAEI 2016 [6], ALIN used three semantic similarity metrics: Wu-Palmer, Jiang-Conrath and Lin. In OAEI 2017, the metrics Resnik, Jiang-Conrath and Lin were used. Wu-Palmer was replaced by Resnik because the Wu-Palmer metric took longer to execute in the HESML API than in the WS4J API. The Resnik metric proved to be much faster than the Wu-Palmer metric in the HESML API and, according to [7], is as good, so it was chosen to take Wu-Palmer's place in the implementation of ALIN for OAEI 2017. More information about the HESML API can be found in [8].

Table 7 shows that the ALIN run time decreased considerably with the use of the HESML API instead of the WS4J API. In the Anatomy interactive track of OAEI 2016, ALIN used only the string metrics and not the semantic metrics, because the semantic metrics took so long that running them was infeasible. In OAEI 2017, the HESML API made it possible to use the semantic metrics, which led to an increase in the quality of the generated alignment, but also to an increase in the expert's participation. The execution time also increased with the inclusion of the semantic metrics, as can be seen in Table 8.

Tables 7 and 8 have the columns Year, Run Time (sec), Precision, Recall, F-measure, Total Requests and Distinct Mappings.

Evaluating the results, it can be seen that the system can be improved towards:

Under certain conditions, the ALIN system stands out in the ontology alignment process in interactive application scenarios, especially when the number of data and object properties is relatively large and when the expert does not make mistakes. Under these conditions, the generated alignment has relatively high precision and recall.

The third author was partially funded by project PQ-UNIRIO N01/2017 ("Aprendendo, adaptando e alinhando ontologias: metodologias e algoritmos.") and CAPES/PROAP.

1. H. Paulheim, S. Hertling, and D. Ritze, Towards Evaluating Interactive Ontology Matching Tools, Lect. Notes Comput. Sci., vol. 7882, pp. 31-45, 2013.

2. R. W. Irving, D. F. Manlove, and G. O'Malley, Stable marriage with ties and bounded length preference lists, J. Discret. Algorithms, vol. 7, no. 2, pp. 213-219, 2009.

3. J. Euzenat and P. Shvaiko, Ontology Matching, 2nd ed. Springer-Verlag, 2013.

4. J. Silva, F. A. Baião, K. Revoredo, and J. Euzenat, Semantic Interactive Ontology Matching: Synergistic Combination of Techniques to Improve the Set of Candidate Correspondences, n.d.

5. A. Guedes, F. Baião, and K. Revoredo, Digging Ontology Correspondence Antipatterns, Proc. 5th Int. Conf. Ontol. Semant. Web Patterns (WOP14), vol. 1302, pp. 38-48, 2014.

6. J. Silva, F. A. Baião, and K. Revoredo, ALIN Results for OAEI 2016, CEUR Workshop Proc., vol. 1766, 2016.

7. E. G. M. Petrakis, G. Varelas, A. Hliaoutakis, and P. Raftopoulou, Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies, Proc. 4th Work. Multimed. Semant., vol. 4, pp. 233-237, 2006.

8. J. J. Lastra-Díaz, A. García-Serrano, M. Batet, M. Fernandez, and F. Chirigati, HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset, Information Systems, vol. 66, pp. 97-118, 2017. http://doi.org/10.1016/j.is.2017.02.002