=Paper=
{{Paper
|id=Vol-156/paper-15
|storemode=property
|title=OLA in the OAEI 2005 Alignment Contest
|pdfUrl=https://ceur-ws.org/Vol-156/paper15.pdf
|volume=Vol-156
|dblpUrl=https://dblp.org/rec/conf/kcap/EuzenatGV05
}}
==OLA in the OAEI 2005 Alignment Contest==
OLA in the OAEI 2005 alignment contest
Jérôme Euzenat Philippe Guégan Petko Valtchev
INRIA Rhône-Alpes DIRO, Université de Montréal DIRO, Université de Montréal
Montbonnot, France Montr eal (Qc), Canada Montr eal (Qc), Canada
euzenat-at-inrialpes.fr Guegan-at- Petko.Valtchev-at-
iro.umontreal.ca umontreal.ca
ABSTRACT vironment for alignment manipulation. Indeed, in its
Among the variety of alignment approaches (e.g., us- current version, the system offers, via its GUI compo-
ing machine learning, subsumption computation, for- nent VisOn, the following services:
mal concept analysis, etc.) similarity-based ones rely
on a quantitative assessment of pair-wise likeness be-
• parsing and visualization of OWL-Lite and OWL-
tween entities. Our own alignment tool, OLA, features
DL ontologies,
a similarity model rooted in principles such as: com-
• computation of similarities between entities from
pleteness on the ontology language features, weighting
two ontologies,
of different feature contributions and mutual influence
• extraction of alignments from a pair of ontologies,
between related ontology entities. The resulting similar-
provided with a set of similarity matrices, one per
ities are recursively defined hence their values are cal-
category of ontology entities (see below),
culated by a step-wise, fixed-point-bound approximation
• manual construction of alignments by composing
process. For the OAEI 2005 contest, OLA was provided
entity pairs from two ontologies,
with an additional mechanism for weight determination
• use of an existing (partial) alignment as a seed
that increases the autonomy of the system.
for automated alignment construction (alignment
completion),
1. PRESENTATION OF THE SYSTEM • alignment visualization,
OLA (for OWL-Lite Alignment) is an open-source tool • comparison of two alignments.
jointly developed by teams at University of Montréal
and INRIA Rhône Alpes. It features similarity-based
alignment and a set of auxiliary services supporting the In the remainder, the focus will be limited to the auto-
manipulation of alignment results [5, 6]. mated alignment construction with OLA.
1.1 General purpose statement 1.1.2 Principles of the alignment in OLA
The primary goal behind the OLA tool design is to per- The following fundamental principles underly the de-
form alignment of ontologies expressed in OWL, with a sign of the three key mechanisms in OLA, internal rep-
short-term emphasis on OWL-Lite and long-term one resentation of the ontology, similarity computation and
on OWL-DL. However, its GUI component, VisOn1 alignment extraction, that are involved in the global
allows for many other services involving alignments (in ontology alignment process:
the sense of [4]) to be accessed.
All-encompassing comparison : We tend to believe
1.1.1 Functional specifications that all the available knowledge about a pair of on-
From a mere algorithm for automated alignment con- tology entities should be taken into account when
struction, OLA has grown for the last year to an en- aligning. This does not exclude the possibility of
1
see http://www.iro.umontreal.ca/∼owlola/ ignoring particular aspects, i.g., OWL instances
in case of OWL class comparison. However such
a choice should be deliberately made by the tool
user, here through appropriate weight assignment,
or, if performed by an automated mechanisms,
should reflect some particularity, either of the en-
tire ontology (e.g., global absence of instances in
both ontologies) or of the pair of entities at hand
(e.g., local absence of instances in the pair of classes
to be compared).
97
Highest automation level : Although we recognize OLA features an alignment process that splits into three
that the entire alignment process often needs to basic steps: constructing the intermediate representa-
be set on a semi-automated basis, we nevertheless tion of the compared ontologies as labeled graphs, com-
argue in favor of a completely automated process puting the similarity of each pair of same-category en-
for ”draft” alignment generation. Thus, we see the tities from the respective ontology graphs, extracting
OLA user providing a minimal set of parameters an alignment from the similarity matrices for each cat-
at the initial steps of the process whereas the tool egory.
will suggest one or more candidate alignments at
the end, without any other human intervention. 1.2.1 OL-Graph construction
Category-dependent comparison : Following the syn- OL-Graphs are graph structures that provide an easy-
tactic structure of the OWL language, entities are to-process inner representation of OWL ontologies. An
divided into categories, e.g., classes, objects, prop- OL-Graph is a labeled graph where vertices correspond
erties, relations, and only entities of the same cat- to OWL entities and edges to inter-entity relationships.
egory are compared. Moreover, the entities of a As described in [6], the set of different vertex categories
category are compared using similarity functions is: class (C), object (O), relation (R), property (P ),
of the same basic shape. The respective category property instance (A), datatype (D), datavalue (V ),
functions comprise the same factors and the same property restriction labels (L). Furthermore, we dis-
weights. They are further customized for each pair tinguish between datatype relations (Rdt ) and object
of category entities by projecting them over the ac- relations (Ro ), and between datatype properties (Pdt )
tual feature space of the entities (which may be far and object ones (Po ).
smaller than the complete space of the category).
Comparability of similarity results : To enable com- The OL-Graph model allows the following relationships
parison of similarity scores between different align- among entities to be expressed:
ment tasks but also for some computational rea-
sons, a set of useful properties is insured for the
• specialization between classes or relations (denoted
similarity functions: normalization, positiveness,
S),
maximalness 2 , and symmetry 3 .
• instanciation (denoted I) between objects and classes,
property instances and properties, values and datatypes,
1.1.3 Current limitations • attribution (denoted A) between classes and prop-
• Although it would be of certain value for align- erties, objects and property instances;
ment, OLA currently offers no inference mecha- • restriction (denoted R) expressing the restriction
nisms that could help complete the entity descrip- on a property in a class,
tions. In particular, inheritance is not used to ex- • valuation (denoted U) of a property in an object.
pand entities, mostly out of efficiency considera-
tions.
The OL-Graph of an ontology is built after the ontology
• Although neighborhoods play crucial role in the is parsed4 . The process of OL-Graph construction is
similarity definition, two neighbor entities are not described in [8].
necessarily affecting each other’s respective simi-
larities to a pair of other entities. As only descrip- 1.2.2 Similarity model
tive knowledge is taken into account, given two The similarity functions used in OLA are designed in
such entities, say e1 and e2 , for e2 to appear in a category-specific manner and cover all the available
a similarity expression for e1 , it should be consid- descriptive knowledge about an entity pair. Thus, given
ered as part of the description of the latter. For a category X of OL-Graph nodes, the similarity of two
instance, a data type is not seen as being described nodes from X depends on:
by a property whose range the datatype repre-
sents. Consequently, datatypes are compared in
an ontology-independent manner. • the similarities of the terms used to designate them,
i.e., URIs, labels, names, etc.,
• Category borders are not similarity-permeable: Only • the similarity of the pairs of neighbor nodes in the
entities from the same category are compared for respective OL-Graphs that are linked by edges ex-
similarity and hence for alignment. pressing the same relationships (e.g., class node
similarity depends on similarity of superclasses, of
1.2 Specific techniques used property restrictions and of member objects),
• the similarity of other local descriptive features de-
2
With normalization, this amounts to forcing scores of 1 for pending on the specific category (e.g., cardinality
identical entities within identical ontologies intervals, property types)
3
The price to pay for symmetry is the impossibility of de-
4
tecting subsumption by this purely numerical procedure. So far, we use the OWL API [1].
98
Funct. Node Factor Measure where N + (x, x0 ) is the set of all relationships F for
SimO o∈O λ(o) simL
a ∈ A, (o, a) ∈ A M SimA which F(x) ∪ F(x0 ) 6= ∅ 5 .
SimA a∈A r ∈ R, (a, r) ∈ R SimR
o ∈ O, (a, o) ∈ U M SimO OLA relies on various functions for identifiers compar-
v ∈ V , (a, v) ∈ U M SimV
SimV v∈V value literal type dependent ison. Both string distances and lexical distances are
SimC c∈C λ(c) simL used. Lexical distances rely on an exploration of Word-
p ∈ P , (c, p) ∈ A M SimP
c0 ∈ C, (c, c0 ) ∈ S M SimC
Net 2.0 [7] with a quantitative assessment of the “relat-
simD d∈D λ(r) XML-Schema edness” between two, possibly multi-word, terms. More
SimR r∈R λ(r) simL specifically, the degree of relatedness between two Word-
c ∈ C, (r, domain, c) ∈ R M SimC
c ∈ C, (r, range, c) ∈ R M SimC Net entries is computed as the ratio between the depth,
d ∈ D, (r, range, d) ∈ R SimD in graph-theoretic sense, of the most specific common
r 0 ∈ R, (r, r 0 ) ∈ S M SimR
SimP p∈P r ∈ R, (p, r 0 ) ∈ S SimR
hypernym and the average of both term depths. The
c ∈ C, (p, all, c) ∈ R M SimC computation of multi-word term similarity consists in
n ∈ {0, 1, ∞}, (p, card, n) ∈ R equality first splitting the terms into a set of tokens each and
Table 1: Similarity function decomposition then comparing all possible pairs of tokens from oppo-
(card = cardinality and all = allValuesFrom). site sets using the above depth-based principle. The
global term similarity is then computed as a similarity-
based matching between both sets (see above).
Datatype and datavalue similarities are external to our As circular dependencies are impossible to avoid with
model and therefore they are either user-provided or the above definitions, the computing of the similarity
measured by a standard function (e.g., string identity values requires non-standard mechanisms. Following [2,
of values and datatype names/URIs). 9], an equation system is composed out of the similar-
ity definitions where variables correspond to similari-
Formally, given a category X together with the set of ties of node pairs while coefficients come from weights.
relationships it is involved in, N (X), the similarity mea- The process of iterative, fixed-point-bound resolution
sure SimX : X 2 → [0, 1] is defined as follows: of that system, as well as the related convergence and
determinism issues are described in [6].
X
SimX (x, x0 ) = X
πF M SimY (F(x), F(x0 )).
F ∈N (X)
1.3 Implementation
OLA is implemented in Java. Its architecture follows
X the one of the Alignment API and the recent implemen-
The function
P is normalized, i.e., the weights πF sum to tation that was described in [4]. OLA relies on the OWL
X
a unit, F ∈N (X) πF = 1. for the computability The API [1] for parsing OWL files. An entire subsystem is
set functions M SimY compare two sets of nodes of the dedicated to the onstruction of OL-Graphs on top of the
same category (see [6] for details). Table 1 illustrates parsed ontologies. A set of further components that of-
the set of similarities in our model. fer similarity computation services: substring distances,
Following the lessons learned with our participation in
the EON 2004 alignment contest [?], we have adapted edit distances, Hamming distance, WordNet interface
the above measure to fit cases where particular pair (via the JWNL library [3]), etc., that were originally
of entities is described only by a small subset of the designed for OLA are now part of the Alignment API.
entire set of category descriptors. Thus, a descriptive The VisOn GUI component offers a uniform interface
factor is ignored for similarity computation whenever to all services provided by Alignment API and OLA.
neither of the compared entities possesses a neighbor In particular, it visualizes both the input data, i.e., the
with the underlying link label (e.g., no instances for a OL-Graphs, and the final result, i.e., the alignment file,
pair of compared classes). In this case, not only its of the global process.
weight is set to 0, but also the weights of the remaining
”active” factors are increased correspondingly. To scale 1.4 Adaptations made for the contest
that principle up to the entire set of descriptive factors, Along the preparation of the AOEI 2005 contest, a row
the following simple mechanism has been realized in of changes have been made to the system in order to
OLA: In order to keep both normalization and equity make it fit the complexity of the alignment discovery
in similarity values, the weights of all non-null factors task. The most striking one is the introduction of a
for a given entity pair are divided through their sum. weight-computing mechanism that eliminates the ne-
Thus, for a category X, the similarity measure Sim+ X : cessity for the tool user to provide initial weights and
X 2 → [0, 1] becomes: hence makes a significant step towards full automation
of the alignment process.
0 SimX (x, x0 ) 5
That is, there exists at least one y such that (x, y) ∈ F or
Sim+
X (x, x ) = P
F ∈N + (x,x0 ) πF at least one y 0 such that (x0 , y 0 ) ∈ F.
99
1.4.1 Weight computing mechanism 1.4.3 Minor adaptations
As it is far from obvious for novice users how to weigh Following experiences from EON 2004, a set of simple
the different similarity factors, we initiated work on in- but decisive modifications have been applied in order
corporating a weight computing mechanism within the to prevent the precision leak in the tests. First, the
system. The intended mechanism is both intuitive and instances have been excluded from the alignments by
effective so that alignment practitioners with various default, although the possibility is given to the user
skill levels could find a match for their knowledge and to reverse this choice. Then, entities external to the
experience. So far, we used a simple heuristic method ontologies at hand have also been excluded from the
that, according to the obtained results, performs rea- alignment (but not from the similarity computation).
sonably well. The basic idea of the method consists Finally, one-to-one alignment production has been en-
in distributing the weights among similarity factors in forced in OLA to increase the potential recall of the
the generic similarity function of a node category ac- resulting alignment.
cording to the relative importance of the corresponding
category in the entire ontology. That is to say we use 2. RESULTS
the average number of links of the corresponding type The comments are grouped by test categories.
per entity of the category at hand. For instance, the
greater the number of super-class links in the ontology, 2.101 Tests 10X
the higher the weight of the super-class factor in the OLA performed very well on the tests of this group.
class similarity formula. This seems to be due to the fact that while the lan-
guage varies along the individual tests of the group, the
1.4.2 Similarity measure for entity names basic ontology entities involved in the similarity compu-
tation remain unchanged with respect to the reference
OLA uses two alternative modes of comparison for en-
ontology.
tity names (URIs, labels, etc.): a string measure6 (a
default) and a lexical similarity measure that relies on
WordNet 2.0 (see above).
2.102 Tests 2XX
The performances of the algorithm seem to suggest that
The highly sophisticated lexical similarity measure that three sub-groups of tests can be distinguished. The first
was used in OLA for the EON competition has been re- one comprises the tests 21X, 22X, 23X and 24X, with a
placed by a simpler but more purposeful one. Indeed, small number of exceptions where the performance have
the initial function compared multi-word terms on three been:
separate axes: nouns, verbs and adjectives, as provided
by WordNet 2,0. Such comparison seemed appropriate • Quite good: This is the case of tests 201, 202,
for cases where the meanings of a word fall in more with random class names. The random names
than one part-of-speech category. The inter-word simi- were putting a strain on the ability of the algo-
larities on each axis were aggregated by an independent rithm to propagate similarity along the network of
best-match computations while the three resulting val- node pairs. Obviously, our technique needs some
ues were further combined to a single one via a weighted improvements on that point.
sum.
• Satisfactory: In the case of tests 248, 249, there
The new measure trades separate matchings on speech- is a combination of missing (or random) names
part-wise basis to a single global matching along entry with one other missing factor. For tests 248, 249,
similarities that aggregate all three possible aspects of a the missing factors are hierarchy (sub-class links)
word. Thus, the words are compared to each other with and instances, respectively. Both play important
all possible meanings and the highest similarity over a role in similarity computation of classes, whenever
single pair of meanings is taken for the words. these are stripped of their names as is the case
with these two ontologies. Hence the sharp drop in
For the OAEI competition, as we had to rely on a fixed precision and recall with respect to the preceding
parameter set for the entire collection of tests, we have tests.
chosen to force the use of the string distance. Indeed, • Weak: The notorious failure here have been the
it showed better performances while being much more tests 205, 209, which are the only ones to use of
efficient than the WordNet-based computation. synonymous names in the ontology entities (with
respect to the intial ontology). As WordNet has
Nevertheless, the improved lexical similarity was not been plugged-out of the similarity computation,
completely discarded: it is currently used as a pre- these results are not surprising.
processing tool that helps decide automatically the dis-
tribution of weights among similarity factors.
The second groups is made of the tests 25X. Here OLA
6
subString distance provided by the Alignment API performances varied substantially: from extremely poor
100
(254) to satisfactory (252, 259). choice could easily generate a chain of wrong alignment
decisions and thus negatively impact the performances
The last five ontologies of the group, the 26X ones, have of the tool.
proven to represent a serious obstacle for OLA. The
performances of the system here were poor to very poor.
2.103 Tests 30X
The real-world ontologies of the group 30X made OLA 3.3 Comments on the experiment
perform in an unimpressive way. We believe that this Two months during summer period is definitely too
is due to the fact that string similarity was systemati- short to run shuch an experiment.
cally used as identifier comparison means. Indeed, ten-
tative runs with WordNet as basis for name similarity
yielded way more precise alignments on that group. Un-
fortunately, they also brought down the overall statis-
tics from the entire test set such as mean precision and
mean recall. Hence the choice of the WordNet-based 4. CONCLUSION
lexical similarity for a default name comparison means In its latest version, OLA has proven a more robust
has been dropped. tool for alignment than it was a year before. While
the difficulties with real-world ontologies persist, the
progress on noisy ones has been substantial.
3. GENERAL COMMENTS
3.1 Comments on the results The next key topic of the research around OLA will be
The results show a substantial progress has been made the automation of the weight computation for a specific
since the EON 2004 alignment contest. With respect to pair of ontologies.
the performances of OLA at that forum, we made a big
leap amounting to about 25% in both mean precision
and mean recall.
Nevertheless, we see that a vast space for improvement
lays ahead of our project. The weaknesses of the current
5. RAW RESULTS
similarity mechanisms can be summarized as follows.
First, the tuning of the algorithm is still a rigid pro-
cess. Indeed, while the weights can now be computed
following a specific footprint of the ontology, a mecha- 5.1 Link to the set of provided alignments
nism for the choice of a particular name similarity on A .zip archive of all the contest results is available at
the same basis has yet to be defined. the following URL:
Second, although we take into account the biggest possi-
ble amount of knowledge about entities, there are sources
of similarity that have been ignored so far, in particular http://www.iro.umontreal.ca/∼owlola/OAEI.html
entity comments.
3.2 Discussions on the way to improve the pro-
posed system
Besides expanding the lexical processing to comments 5.2 Link to the system and parameters file
in entities and providing a flexible decision mechanism A similar archive with the parameters and the .jar files
for the choice of the default name similarity, a possible used in the contest-related experiments is available at
improvement of the system will be the integration of a the following URL:
learning module for weight estimation. As for similarity,
the biggest challenge here is to define the representation
of the input data, i.e., the descriptors of the entries for
the learning algorithm.
http://www.iro.umontreal.ca/∼owlola/OAEI.html
Another research track would be the definition of an
optimal matching algorithm. In fact, the current proce-
dures are sub-optimal in the sense that they only chose
local optima for each aligned entity. Consequently, as
strict 1:1 matchings are to be produced, a single bad 5.3 Matrix of results
101
[2] Gilles Bisson. Learning in FOL with similarity
# Name Prec. Rec. Time measure. In Proc. 10th AAAI, San-Jose (CA US),
101 Reference alignment 1 1 pages 82–87, 1992.
103 Language generalization 1 1
104 Language restriction 1 1 [3] John Didion. The Java WordNet Library,
201 No names 0.71 0.62 http://jwordnet.sourceforge.net/, 2004.
202 No names & no comments 0.66 0.56 [4] Jérôme Euzenat. An API for ontology alignment.
203 No comments 1 1 In Proc. 3rd ISWC, pages 698–712, Hiroshima
204 Naming conventions 0.94 0.94 (JP), 2004.
205 Synonyms 0.43 0.42
206 Translation 0.94 0.93 [5] Jérôme Euzenat and Petko Valtchev. An
207 0.95 0.94 integrative proximity measure for ontology
208 0.94 0.94 alignment. In Proc. ISWC-2003 workshop on
209 0.43 0.42 semantic information integration, Sanibel Island
210 0.95 0.94 (FL US), pages 33–38, 2003.
221 No specialisation 1 1
[6] Jérôme Euzenat and Petko Valtchev.
222 Flatenned hierarchy 1 1
Similarity-based ontology alignment in OWL-lite.
223 Expanded hierarchy 1 1
In Proc. 15th ECAI, pages 333–337, Valencia (ES),
224 No instance 1 1
2004.
225 No restrictions 1 1
228 No properties 1 1 [7] A.G. Miller. Wordnet: A lexical database for
230 Flattened classes 0.95 0.97 english. Communications of the ACM,
231 1 1 38(11):39–41, 1995.
232 1 1
233 1 1 [8] Mohamed Touzani. Alignement d’ontologies dans
236 1 1 OWL. Master’s thesis, University of Montréal, in
237 0.97 0.98 preparation.
238 0.99 0.99 [9] Petko Valtchev. Construction automatique de
239 0.97 1 taxonomies pour l’aide à la représentation de
240 0.97 1 connaissances par objets. Thèse d’informatique,
241 1 1 Université Grenoble 1, 1999.
246 0.97 1
247 0.97 1
248 0.59 0.46
249 0.59 0.46
250 0.3 0.24
251 0.42 0.3
252 0.59 0.52
253 0.56 0.41
254 0.04 0.03
257 0.25 0.21
258 0.49 0.35
259 0.58 0.47
260 0.26 0.17
261 0.14 0.09
262 0.2 0.06
265 0.22 0.14
266 0.14 0.09
301 Real: BibTeX/MIT 0.42 0.38
302 Real: BibTeX/UMBC 0.37 0.33
303 Real: Karlsruhe 0.41 0.49
304 Real: INRIA 0.74 0.66
6. REFERENCES
[1] Sean Bechhofer, Raphael Voltz, and Phillip Lord.
Cooking the semantic web with the OWL API. In
Proc. 2nd International Semantic Web Conference
(ISWC), Sanibel Island (FL US), 2003.
102