Holontology : results of the 2018 OAEI evaluation
                          campaign

           Philippe Roussille, Imen Megdiche, Olivier Teste, Cassia Trojahn

             Institut de Recherche en Informatique de Toulouse, Toulouse, France
                                   {firstname.name}@irit.fr


        Abstract. This paper presents the results obtained by the Holontology (Holistic
        ontology matcher) system in the OAEI 2018 evaluation campaign. We describe
        here the results in the Anatomy, Conference, Taxon and Knowledge Graph tracks.
        We report a general discussion on the results and on the future improvements of
        the system.


1     Presentation of the system
1.1   State, purpose, general statement
Holontology is a modular system based on the LPHOM system (Linear Program for
Holistic Ontology Matching) [1]. Like its predecessor LPHOM, it is a holistic
ontology matching system, i.e., it matches multiple ontologies simultaneously.
Although the system has been designed for holistic matching, it can also perform
pairwise ontology matching, which we treat as a particular case of the holistic
one, as described below.
     The system treats the ontology matching problem, at schema-level, as a combinato-
rial optimization problem. The problem is modelled through a linear program extending
the maximum-weighted graph matching problem with linear constraints (matching car-
dinality, structural, and coherence constraints).
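
As a rough sketch of this kind of model (not the exact LPHOM or Holontology program, whose structural and coherence constraints are detailed in [1]), a maximum-weighted matching between two entity sets with a 1:1 cardinality constraint can be written as below. The symbols $E_1$, $E_2$, $\mathit{sim}_{ij}$ and $x_{ij}$ are notation introduced here for illustration only: $\mathit{sim}_{ij}$ is the similarity between entity $i$ of the first ontology and entity $j$ of the second, and $x_{ij}$ is a binary variable that equals 1 when the pair is kept in the alignment.

```latex
\begin{align*}
\max \quad & \sum_{i \in E_1} \sum_{j \in E_2} \mathit{sim}_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j \in E_2} x_{ij} \le 1 \quad \forall i \in E_1, \\
& \sum_{i \in E_1} x_{ij} \le 1 \quad \forall j \in E_2, \\
& x_{ij} \in \{0, 1\} \quad \forall i \in E_1,\ j \in E_2.
\end{align*}
```

In the holistic setting, the same pattern would be repeated for every pair of ontologies, with the additional linear constraints added on top.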

1.2   Specific techniques used
The way the system works is detailed in the following steps:
 1. The first step of Holontology is to load the ontologies and translate them into an
    internal structure, which can be annotated and edited, so that later processing is
    efficient. Each ontology is loaded independently, and its hierarchical structure
    is preserved in a format based on the three basic building blocks of ontologies:
    classes, object properties and data properties. Each of them is represented as a
    Node and, depending on its subtype, is subclassed into a ClassNode,
    ObjectPropertyNode or DataPropertyNode. These nodes are then enriched with
    AnnotationData.
 2. The second step is a pre-processing step. Here, we split camel-case and
    title-case identifiers into separate words. As the ontologies are loaded only once,
    we take advantage of Java object references, so we have no need to translate
    and cut matrices. We can use our hierarchies both for storage and computation.
 3. The third step computes a Cartesian product between all the entities of the same
    type (classes and properties) in order to build a linear program similar to that
    of LPHOM, using the same framework, as described below.
 4. The fourth step performs a second round of pre-processing, in which new relations
    are created. For example, this is the step where we compute tense similarity
    (determining whether an object property is passive or active towards its classes).
    More precisely, for each property, we compute its polarity using tense analysis, e.g.
    Author reviews Paper (active) vs. Paper reviewed by Author (passive). This allows
    reviews to be matched with reviewed by, with domain and range reversed.
 5. The fifth step applies a combination of different similarity measures (exact match,
    Levenshtein, Jaccard, and Lin), drawing both from the AnnotationData built in the
    previous steps and from the constraints based on the ontologies themselves. We have
    tested combinations of similarity measures; the results reported here use only
    ‘exact match’ over the pre-processed entity names and annotations (i.e., an
    exact match between tokens). As expected, we obtain good values in terms of
    precision, as discussed below.
 6. The sixth step converts the given constraints into a form that can be used by a solver.
    We chose CPLEX for its ability to perform optimizations, and we handle possible
    thresholds if needed. Unlike LPHOM, we provide all the constraints, and we
    do not apply any cutting in this step.
 7. In the seventh step, we take the result returned by the solver and convert it into an
    alignment that can be exported in RDF.
 8. In the (optional) eighth step, we combine the alignment result and rerun steps five
    to seven if needed to reinforce the obtained results.
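
The steps above can be illustrated with a minimal Python sketch (the system itself is written in Java). It covers only name expansion (step 2), the same-type Cartesian product (step 3) and token-level similarity (step 5). All function names and the toy ontologies are hypothetical, and the greedy 1:1 selection is a plain stand-in for the linear program solved with CPLEX, not the method used by Holontology.

```python
import re
from itertools import product


def expand_name(name):
    """Split camel case, title case and underscores into lowercase tokens."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def exact_match(tokens_a, tokens_b):
    """Return 1.0 when both token lists are identical, else 0.0."""
    return 1.0 if tokens_a == tokens_b else 0.0


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between the two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def candidate_pairs(onto_a, onto_b):
    """Cartesian product restricted to entities of the same kind."""
    for (name_a, kind_a), (name_b, kind_b) in product(onto_a, onto_b):
        if kind_a == kind_b:
            yield name_a, name_b


def greedy_one_to_one(scored):
    """Stand-in for the solver: keep the best-scoring pairs under a 1:1 constraint."""
    used_a, used_b, kept = set(), set(), []
    for a, b, score in sorted(scored, key=lambda t: -t[2]):
        if score > 0 and a not in used_a and b not in used_b:
            kept.append((a, b, score))
            used_a.add(a)
            used_b.add(b)
    return kept


# Toy ontologies: (entity name, entity kind). Purely illustrative.
onto_a = [("ConferencePaper", "class"), ("hasAuthor", "object_property")]
onto_b = [("conference_paper", "class"), ("Author", "class"),
          ("has_author", "object_property")]

scored = [(a, b, exact_match(expand_name(a), expand_name(b)))
          for a, b in candidate_pairs(onto_a, onto_b)]
alignment = greedy_one_to_one(scored)
print(alignment)
```

Swapping `exact_match` for `jaccard` in the scoring line would give partial credit to pairs that share only some tokens, which is the kind of trade-off between precision and recall discussed in the results.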


1.3   Adaptations made for the evaluation

Due to a major incompatibility between some libraries used in the SEALS client and
ours, we had to create a fully executable jar in the conf/ directory, and call this jar from
the SEALS bridge. The bridge takes the URIs of the two ontologies passed as arguments,
writes them to a file called “bridge-ontologies.url”, then calls holontology.jar
in the conf directory and waits for its full execution. holontology.jar opens the
created “ontologies.url”, reads the two URIs, and does the processing. The result is
written to the “result.alignment” file. The bridge then reads the result file and returns
it to the SEALS client.
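
The hand-off described above can be sketched as follows. This is a Python illustration of the file-based protocol only; the actual bridge is part of a Java SEALS package. The function name, the default file names, the jar path and the injectable `runner` parameter are all assumptions made for this example.

```python
import subprocess
from pathlib import Path


def run_matcher(source_uri, target_uri, workdir,
                jar="conf/holontology.jar",
                url_file="ontologies.url",
                result_file="result.alignment",
                runner=subprocess.run):
    """Write the two URIs to a file, run the jar, and return the alignment text."""
    workdir = Path(workdir)
    # One URI per line, read back by the external process.
    (workdir / url_file).write_text(f"{source_uri}\n{target_uri}\n", encoding="utf-8")
    # Block until the external process has finished; check=True raises on failure.
    runner(["java", "-jar", jar], cwd=workdir, check=True)
    return (workdir / result_file).read_text(encoding="utf-8")
```

Running the matcher as a separate process keeps its dependencies on a separate classpath from those of the client, which is the point of the workaround.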


1.4   Link to the system and parameters file

Holontology, like its predecessor LPHOM, is not available as open source. The version
we present here is meant to be more modular than LPHOM, and handles the problem
differently, by emphasising data annotation and using structures instead of matrices. The
.zip for the SEALS jar can be retrieved at
https://cloud.irit.fr/index.php/s/gReZo8yaRDqdmjk
1.5     Link to the set of provided alignments
The generated alignments are available at
https://cloud.irit.fr/index.php/s/hv3oALXN6fHuZWi.


2      Results
The reader can refer to the OAEI web pages for the results of Holontology in the
Anatomy, Conference, Taxon and Knowledge Graph tracks as well as a comparison with
other participants. Here, we provide a first discussion and comments on our results.

2.1     Anatomy track
Our results for the Anatomy track are summarised in Table 1. Compared to the
evaluation of LPHOM in OAEI 2016 (see footnote 1), we observe that the overall quality
of the results decreases; for instance, the F-measure loses 0.3 points. These results can
be explained by two choices in Holontology. First, we only use exact match (we observe
that Holontology returns only 456 alignments compared to LPHOM, which returns 1555
alignments). Second, we do not filter the results by similarity score (i.e., we do not
apply any threshold). However, we observe that Holontology is 8 times faster than LPHOM.


          Matcher      Runtime  Size  Precision  F-Measure  Recall  Recall+  Coherent
          Holontology  265      456   0.976      0.451      0.294   0.005    -
                            Table 1. Results for Anatomy track.


2.2     Conference track
Our results for the Conference track are summarised in Table 2. Contrary to the
Anatomy track, the results of Holontology are better than the results of LPHOM for
the Conference track. Overall, Holontology gains in recall for the different tasks in
this track. We can explain that by the different pre-processing strategies that have been
implemented in Holontology compared to LPHOM. Comparing the different tasks shows that
the tool needs further work to handle the data and object properties that occur in the
M2 tasks.
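
The three F-measure columns of Table 2 weight precision and recall differently. As a quick check, the standard F-beta formula can be applied to the precision and recall of the ra1-M1 row; the helper name `f_beta` is ours, and the small differences one may see on other rows come from the rounding of the published precision and recall values.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta > 1 favours recall)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# ra1-M1 row of Table 2: precision 0.88, recall 0.54
p, r = 0.88, 0.54
print(round(f_beta(p, r, 0.5), 2), round(f_beta(p, r, 1), 2), round(f_beta(p, r, 2), 2))
# prints: 0.78 0.67 0.59
```

Because precision is higher than recall on this row, the score drops as beta grows, which matches the decreasing values from the F0.5 to the F2 column.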

2.3     Complex track (Taxon task)
Holontology is not able to deal with complex matching and was not initially
registered for this track. However, given that only 3 systems registered for the
complex track, the organisers also ran the systems registered for Anatomy and
Conference on the complex datasets. Hence, the results reported for Holontology are in
terms of simple generated alignments.
 1
     http://oaei.ontologymatching.org/2016/results/anatomy/index.html
            Track    Rank   Prec.  F0.5-measure  F1-measure  F2-measure  Recall
            ra1-M1   8/13   0.88   0.78          0.67        0.59        0.54
            ra1-M2   11/13  0.22   0.17          0.13        0.1         0.09
            ra1-M3   10/13  0.78   0.69          0.59        0.52        0.48
            ra2-M1   6/13   0.81   0.72          0.62        0.54        0.5
            ra2-M2   11/13  0.07   0.05          0.03        0.02        0.09
            ra2-M3   7/13   0.74   0.65          0.55        0.48        0.44
            rar2-M1  9/13   0.8    0.72          0.63        0.56        0.52
            rar2-M2  11/13  0.22   0.17          0.13        0.1         0.09
            rar2-M3  9/13   0.73   0.65          0.56        0.49        0.45
                        Table 2. Results for the Conference track.


    Our results for the Taxon task are summarised in Table 3. We obtained
intermediate results, with a precision up to 0.22. However, for the set of given queries to
be translated with the help of the generated alignments, our alignments were not useful:
the QWR (Query Well Rewritten) measure is 0.


Time (s)  output corres.  eval. corres.  correct corres.  Global Prec.  Average Prec.  (1:1)  (1:n)  (m:n)  QWR
965       44              13             3                0.23          0.22           44     0      0      0.00
                    Table 3. Results for Taxon task in the Complex track.
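
The global precision in Table 3 is consistent with the ratio of correct to evaluated correspondences. This reading of the column headers is our assumption, since the measure is defined by the track organisers and not in this paper:

```latex
\[
\text{Global Prec.} \;=\; \frac{\text{correct corres.}}{\text{eval. corres.}}
\;=\; \frac{3}{13} \;\approx\; 0.23
\]
```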


2.4   Knowledge graph track

Our results for the Knowledge graph track are summarised in Tables 4 and 5. For
this track, Holontology ran faster than the other systems (including the baseline).
However, it was not able to deal with properties, probably, as the track organisers
explained, because all properties are typed as rdf:Property and not subdivided into
owl:DatatypeProperty and owl:ObjectProperty.
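
The effect of this typing can be illustrated with a small Python sketch. It assumes, as a simplification of the loading step, that a matcher only builds property candidates from entities declared as owl:ObjectProperty or owl:DatatypeProperty; the triples, prefixes and function name are hypothetical.

```python
RDF_TYPE = "rdf:type"


def properties_by_kind(triples):
    """Group property identifiers by their declared rdf:type."""
    kinds = {"owl:ObjectProperty": [], "owl:DatatypeProperty": [], "rdf:Property": []}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in kinds:
            kinds[obj].append(subject)
    return kinds


# Toy knowledge graph schema where every property is a plain rdf:Property.
triples = [
    ("ex:hasAuthor", RDF_TYPE, "rdf:Property"),
    ("ex:title", RDF_TYPE, "rdf:Property"),
    ("ex:Paper", RDF_TYPE, "owl:Class"),
]

kinds = properties_by_kind(triples)
# A matcher that only looks at the two OWL property kinds finds nothing to match.
matchable = kinds["owl:ObjectProperty"] + kinds["owl:DatatypeProperty"]
print(matchable, kinds["rdf:Property"])
# prints: [] ['ex:hasAuthor', 'ex:title']
```

Under this assumption, no property pair ever reaches the similarity step, which would explain the gap between the class-level and overall recall reported for this track.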


               class                                        overall
Time  #tracks  Size  Prec.        F-m.         Rec.         Size  Prec.        F-m.         Rec.
318   9        16.8  0.80 (0.80)  0.83 (0.83)  0.87 (0.87)  18.8  0.80 (0.80)  0.17 (0.17)  0.10 (0.10)
                     Table 4. Global results for Knowledge graph track.


3     General comments

Although Holontology is an extended version of LPHOM, which participated
in OAEI 2016, it obtained intermediate results in its first participation. Table 6
summarises the ranking of Holontology among the participating systems
in each task.
                Track                         Time  Size  Prec.  F-m.  Rec.
                darkscape∼oldschoolrunescape  23    42    0.75   0.22  0.13
                runescape∼darkscape           38    62    0.93   0.25  0.14
                runescape∼oldschoolrunescape  35    47    0.62   0.22  0.13
                heykidscomics∼dc              67    4     1.00   0.10  0.05
                marvel∼dc                     26    4     1.00   0.21  0.12
                marvel∼heykidscomics          61    4     1.00   0.11  0.06
                memory-alpha∼memory-beta      26    2     0.00   0.00  0.00
                memory-alpha∼stexpanded       21    2     0.00   0.00  0.00
                memory-beta∼stexpanded        21    2     0.00   0.00  0.00
                     Table 5. Track results for Knowledge graph track.


                            Track            Rank
                            Anatomy          14/14
                            Taxon            3/7
                            Knowledge graph  6/8 (overall)
                            Knowledge graph  1/8 (classes)
                            Conference       9.1/13 (average)
                    Table 6. Global rankings for Holontology per track.


     Compared to LPHOM, Holontology is a modular system that optimises the ontology
structures in memory. As in LPHOM, we model and express the matching problem
through a set of constraints (cardinality, structural, and coherence constraints) applied
to the results of the pre-processing and exact matching steps. We did not apply
any threshold to the generated alignments. As expected, using an exact match on
pre-processed entity names and comments tends to improve precision to the detriment of
recall. With respect to these aspects, we plan to improve the criteria for selecting
similarity measures and thresholds for our future participation, in particular with the
aim of improving recall.
     Other points include the fact that Holontology is a system designed to deal with
holistic ontology matching at schema-level. Hence, it was not able to generate alignments
for the tasks involving instance matching. We plan to implement instance matching
strategies in future versions of the system. Furthermore, despite our optimisation over
LPHOM, our system was not able to deal at all with the large ontologies in the LargeBio
and Phenotype tasks. We note, however, that we could deal with the Complex Taxon
task in terms of volume because we do not treat the instances. We plan to address these
points in the future.
     With respect to the OAEI procedure, we focused on the tracks based on SEALS.
However, as stated above, we encountered problems with incompatible package
versions in the SEALS dependencies. We implemented a non-ideal
solution and hope that this kind of issue will be fixed for the next evaluation.
     Finally, Holontology was initially designed to deal with holistic ontology matching.
However, no track in the campaign evaluates this kind of matching approach. In the
future, it would be interesting to have a dedicated holistic track.


4   Conclusions

This paper has introduced the Holontology system and discussed the main points of the
results of its first participation in the OAEI campaigns. We have also pointed out
some directions for future improvements.


Acknowledgements
Cassia Trojahn has been partially supported by the French CIMI Labex project IBLiD
(Integration of Big and Linked Data for On-Line Analytics).


References
1. I. Megdiche, O. Teste, and C. Trojahn. An extensible linear approach for holistic ontology
   matching. In Proceedings of the 15th International Semantic Web Conference (ISWC 2016),
   Kobe, Japan, October 18, 2016, pages 393–410. Springer International Publishing, 2016.