Holontology: results of the 2018 OAEI evaluation campaign
Philippe Roussille, Imen Megdiche, Olivier Teste, Cassia Trojahn
Institut de Recherche en Informatique de Toulouse, Toulouse, France
{firstname.name}@irit.fr

Abstract. This paper presents the results obtained by the Holontology (Holistic ontology matcher) system in the OAEI 2018 evaluation campaign. We describe the results in the Anatomy, Conference, Taxon and Knowledge Graph tracks, and discuss them together with future improvements of the system.

1 Presentation of the system

1.1 State, purpose, general statement
Holontology is a modular system based on the LPHOM system (Linear Program for Holistic Ontology Matching) [1]. Like its predecessor LPHOM, it is a holistic ontology matching system, i.e., it matches multiple ontologies simultaneously. Although it was designed for holistic matching, it can also perform pairwise ontology matching, which we treat as a particular case of the holistic task, as described here.
The system treats schema-level ontology matching as a combinatorial optimization problem. The problem is modelled as a linear program that extends the maximum-weighted graph matching problem with linear constraints (matching cardinality, structural, and coherence constraints).

1.2 Specific techniques used
The system works in the following steps:
1. The first step loads the ontologies and translates them into an internal structure that can be annotated and edited, so that later processing is efficient. Each ontology is loaded independently, and its hierarchical structure is preserved in a format based on the three basic building blocks of ontologies: classes, object properties and data properties. Each of them is represented as a Node which, depending on its subtype, is subclassed into a ClassNode, ObjectPropertyNode or DataPropertyNode. These nodes are then enriched with AnnotationData.
2. The second step is a pre-processing step, in which camel-case and title-case identifiers are split into separate words. Since the ontologies are loaded only once, we rely on Java object references, so there is no need to translate and cut matrices: the same hierarchies are used for both storage and computation.
3. The third step computes the Cartesian product of all entities of the same type (classes and properties) in order to build a linear program within the same framework as LPHOM, as described below.
4. The fourth step performs a second pre-processing pass that creates new relations. For example, this is where we compute tense similarity, i.e., whether an object property is active or passive with respect to its classes. More precisely, for each property, we compute its polarity using tense analysis, e.g., Author reviews Paper (active) vs. Paper reviewed by Author (passive). This allows reviews to be matched with reviewed by, with the direction reversed.
5. The fifth step applies a combination of similarity measures (exact match, Levenshtein, Jaccard, and Lin), drawing both on the AnnotationData built in the previous steps and on constraints derived from the ontologies themselves. We tested combinations of these measures; the results reported here are based on exact match over the pre-processed entity names and annotations (i.e., exact match between tokens). As expected, this yields good precision, as discussed below.
6. The sixth step converts the constraints into a form that can be used by a solver. We chose CPLEX for its optimization capabilities, and thresholds can be handled at this stage if needed. Unlike LPHOM, we pass all the constraints to the solver and do not apply any cutting at this step.
7. In the seventh step, the result returned by the solver is converted into an alignment that can be exported in RDF.
8. In the (optional) eighth step, the obtained alignment is reused and steps five to seven are run again, if needed, to reinforce the results.

1.3 Adaptations made for the evaluation
Due to a major incompatibility between some libraries used by the SEALS client and ours, we had to create a fully executable jar in the conf/ directory and call this jar from the SEALS bridge. The bridge takes the URIs of the two ontologies passed as arguments, writes them to a file called “bridge-ontologies.url”, then calls holontology.jar in the conf directory and waits for it to finish. holontology.jar opens the created “ontologies.url” file, reads the two URIs, and performs the matching. The result is written to the “result.alignment” file, which the bridge then reads and returns to the SEALS client.

1.4 Link to the system and parameters file
Like its predecessor LPHOM, Holontology is not available as open source. The version presented here is meant to be more modular than LPHOM and handles the problem differently, relying on annotated data and structures instead of matrices. The .zip containing the SEALS jar can be retrieved at https://cloud.irit.fr/index.php/s/gReZo8yaRDqdmjk

1.5 Link to the set of provided alignments
The generated alignments are available at https://cloud.irit.fr/index.php/s/hv3oALXN6fHuZWi.

2 Results
The reader can refer to the OAEI web pages for the results of Holontology in the Anatomy, Conference, Taxon and Knowledge Graph tracks, as well as a comparison with other participants. Here, we provide a first discussion of, and comments on, our results.

2.1 Anatomy track
Our results for the Anatomy track are summarised in Table 1.
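Since the discussion below rests on the trade-off between precision P and recall R, it may help to recall the standard measures used in the tables: F_β weights recall β times as much as precision (β = 1 for the F-measure of the Anatomy track; β = 0.5 and β = 2 in the Conference track). Plugging in the Anatomy precision and recall (0.976 and 0.294):

```latex
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R},
\qquad
F_1 = \frac{2PR}{P+R}
    = \frac{2 \times 0.976 \times 0.294}{0.976 + 0.294}
    = \frac{0.5739}{1.270}
    \approx 0.452
```

This is close to the reported F-measure of 0.451 (the small gap presumably comes from the rounding of P and R), and it shows that the low recall, not the precision, is what pulls the F-measure down.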
Compared to the evaluation of LPHOM in OAEI 2016 (http://oaei.ontologymatching.org/2016/results/anatomy/index.html), we observe that the overall quality of the results decreases; for instance, the F-measure loses 0.3 points. This can be explained by two choices in Holontology. First, we only use exact match: Holontology returns only 456 correspondences, compared to 1555 for LPHOM. Second, we do not filter the results (i.e., we do not apply any threshold). However, Holontology is 8 times faster than LPHOM.

Matcher      Runtime (s)  Size  Precision  F-Measure  Recall  Recall+  Coherent
Holontology  265          456   0.976      0.451      0.294   0.005    -
Table 1. Results for the Anatomy track.

2.2 Conference track
Our results for the Conference track are summarised in Table 2. Contrary to the Anatomy track, Holontology obtains better results than LPHOM in the Conference track. Overall, Holontology gains in recall across the different tasks of this track, which we attribute to the different pre-processing strategies implemented in Holontology compared to LPHOM. A comparison of the tasks shows that the tool needs further work to handle the data and object properties that occur in the M2 tasks.

Task     Rank   Prec.  F0.5-measure  F1-measure  F2-measure  Recall
ra1-M1   8/13   0.88   0.78          0.67        0.59        0.54
ra1-M2   11/13  0.22   0.17          0.13        0.1         0.09
ra1-M3   10/13  0.78   0.69          0.59        0.52        0.48
ra2-M1   6/13   0.81   0.72          0.62        0.54        0.5
ra2-M2   11/13  0.07   0.05          0.03        0.02        0.09
ra2-M3   7/13   0.74   0.65          0.55        0.48        0.44
rar2-M1  9/13   0.8    0.72          0.63        0.56        0.52
rar2-M2  11/13  0.22   0.17          0.13        0.1         0.09
rar2-M3  9/13   0.73   0.65          0.56        0.49        0.45
Table 2. Results for the Conference track.

2.3 Complex track (Taxon task)
Holontology is not able to deal with complex matching and was not initially registered for this track. However, given that only 3 systems registered for the Complex track, the organisers also ran the systems registered for Anatomy and Conference on the complex datasets. Hence, the results reported for Holontology are based on the simple alignments it generated.
Our results for the Taxon task are summarised in Table 3. We obtained intermediate results, with a global precision of 0.23 and an average precision of 0.22. However, for the set of queries to be rewritten with the help of the generated alignments, our alignments were not useful, as shown by a Query Well Rewritten (QWR) measure of 0.

Time (s)  Output corres.  Eval. corres.  Correct corres.  Global Prec.  Average Prec.  (1:1)  (1:n)  (m:n)  QWR
965       44              13             3                0.23          0.22           44     0      0      0.00
Table 3. Results for the Taxon task in the Complex track.

2.4 Knowledge graph track
Our results for the Knowledge graph track are summarised in Tables 4 and 5. For this track, Holontology ran faster than the other systems (including the baseline). However, it was not able to deal with properties, probably, as the track organisers explained, because all properties are typed as rdf:Property and not subdivided into owl:DatatypeProperty and owl:ObjectProperty.

Time: 318    #tracks: 9
          Size  Prec.        F-m.         Rec.
Class     16.8  0.80 (0.80)  0.83 (0.83)  0.87 (0.87)
Overall   18.8  0.80 (0.80)  0.17 (0.17)  0.10 (0.10)
Table 4. Global results for the Knowledge graph track.

Track                          Time  Size  Prec.  F-m.  Rec.
darkscape∼oldschoolrunescape   23    42    0.75   0.22  0.13
runescape∼darkscape            38    62    0.93   0.25  0.14
runescape∼oldschoolrunescape   35    47    0.62   0.22  0.13
heykidscomics∼dc               67    4     1.00   0.10  0.05
marvel∼dc                      26    4     1.00   0.21  0.12
marvel∼heykidscomics           61    4     1.00   0.11  0.06
memory-alpha∼memory-beta       26    2     0.00   0.00  0.00
memory-alpha∼stexpanded        21    2     0.00   0.00  0.00
memory-beta∼stexpanded         21    2     0.00   0.00  0.00
Table 5. Track results for the Knowledge graph track.

3 General comments
Although Holontology is an extended version of LPHOM, which participated in OAEI 2016, its results in this first participation are intermediate. Table 6 summarises the ranking of Holontology among the participating systems in each track.

Track             Rank
Anatomy           14/14
Taxon             3/7
Knowledge graph   6/8 (overall)
Knowledge graph   1/8 (classes)
Conference        9.1/13 (average)
Table 6. Global rankings for Holontology per track.

Compared to LPHOM, Holontology is a modular system that optimises how ontology structures are stored in memory. As in LPHOM, we model the matching problem through a set of constraints (cardinality, structural, and coherence constraints) applied to the output of the pre-processing and exact matching steps. We did not apply any threshold to the generated alignments. As expected, using exact match on pre-processed entity names and comments favours precision at the expense of recall. Accordingly, we plan to improve the criteria for selecting similarity measures and thresholds in our future participations, in particular to improve recall.
Another point is that Holontology is designed for holistic ontology matching at schema level. Hence, it was not able to generate alignments for the tasks involving instance matching. We plan to implement instance matching strategies in future versions of the system. Furthermore, despite our optimisations over LPHOM, our system was not able to deal at all with the large ontologies of the LargeBio and Phenotype tasks. We note, however, that we could handle the volume of the Complex Taxon task because we do not process instances. We plan to address these points in the future.
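To illustrate why exact matching on pre-processed names favours precision over recall, here is a minimal Java sketch. It is not Holontology's code: the splitting rules of the hypothetical tokens helper, the assumption that two names match only when their token sets are identical, and the example entity names are all ours. In the actual system, such 0/1 scores would feed the linear program solved by CPLEX rather than being output directly.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class ExactMatchSketch {

    // Hypothetical pre-processing: split camelCase, TitleCase and snake_case
    // labels into lower-case tokens, e.g. "hasFirstName" -> [first, has, name].
    static Set<String> tokens(String label) {
        String spaced = label
                .replaceAll("([a-z0-9])([A-Z])", "$1 $2")    // lower/digit followed by upper
                .replaceAll("([A-Z]+)([A-Z][a-z])", "$1 $2") // acronym followed by a capitalised word
                .replaceAll("[_\\-]+", " ");                 // underscores and hyphens become spaces
        Set<String> result = new TreeSet<>();
        for (String t : spaced.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Assumed exact-match score: 1 if both names yield the same token set, 0 otherwise.
    static int exactMatch(String a, String b) {
        return tokens(a).equals(tokens(b)) ? 1 : 0;
    }

    public static void main(String[] args) {
        List<String> source = List.of("ConferenceMember", "hasAuthor", "PaperReview");
        List<String> target = List.of("Conference_member", "has_author", "Review");
        // Cartesian product of the entities; only exact matches are reported.
        for (String s : source) {
            for (String t : target) {
                if (exactMatch(s, t) == 1) {
                    System.out.println(s + " = " + t);
                }
            }
        }
    }
}
```

Running main prints ConferenceMember = Conference_member and hasAuthor = has_author, while the related pair PaperReview / Review is missed. Every reported pair is a strict token-level identity, hence the high precision; any pair differing by a single token is lost, hence the low recall.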
With respect to the OAEI procedure, we focused on the tracks based on SEALS. However, as stated above, we encountered problems due to incompatible package versions in the SEALS dependencies. We implemented a non-ideal workaround and hope that this kind of issue will be fixed for the next evaluation.
Finally, Holontology was initially designed for holistic ontology matching. However, no track in the campaign evaluates this kind of matching approach. In the future, it could be interesting to have a dedicated holistic track.

4 Conclusions
This paper has introduced the Holontology system and discussed the main points of the results of its first participation in the OAEI campaigns. We have also pointed out some directions for future improvements.

Acknowledgements
Cassia Trojahn has been partially supported by the French CIMI Labex project IBLiD (Integration of Big and Linked Data for On-Line Analytics).

References
1. I. Megdiche, O. Teste, and C. Trojahn. An extensible linear approach for holistic ontology matching. In Proceedings of the 15th International Semantic Web Conference (ISWC 2016), Kobe, Japan, October 18, 2016, pages 393–410. Springer International Publishing, 2016.