=Paper= {{Paper |id=None |storemode=property |title=TaxoMap alignment and refinement modules: results for OAEI 2010 |pdfUrl=https://ceur-ws.org/Vol-689/oaei10_paper13.pdf |volume=Vol-689 |dblpUrl=https://dblp.org/rec/conf/semweb/HamdiSNR10 }} ==TaxoMap alignment and refinement modules: results for OAEI 2010== https://ceur-ws.org/Vol-689/oaei10_paper13.pdf
    TaxoMap alignment and refinement modules: Results
                    for OAEI 2010

      Fayçal Hamdi¹, Brigitte Safar¹, Nobal B. Niraula², and Chantal Reynaud¹

          ¹ LRI CNRS UMR 8623, Université Paris-Sud 11, Bat. G, INRIA Saclay
            2-4 rue Jacques Monod, F-91893 Orsay, France
            firstname.lastname@lri.fr
          ² The University of Memphis
            Memphis, TN, USA
            nb.niraula@gmail.com



       Abstract. TaxoMap is an alignment tool which aims to discover rich
       correspondences between concepts: equivalence relations (isEq), subsumption
       relations (isA) and their inverse (isMoreGnl), and proximity relations
       (isClose). It performs an oriented alignment (from a source to a target
       ontology) and takes into account labels and sub-class descriptions. This new
       implementation of TaxoMap uses a pattern-based approach, implemented in the
       TaxoMap Framework, which helps an engineer refine mappings to take into
       account specific conventions used in ontologies.


1    Introduction
TaxoMap was designed to retrieve useful alignments for information integration
between different sources. The alignment process is therefore oriented from
ontologies that describe external resources (the source ontology) to the ontology
of a web portal (the target ontology). The target ontology is assumed to be well
structured, whereas the source ontology can be a flat list of concepts.
    TaxoMap makes the assumption that most semantic resources are based essentially
on classification structures. This assumption is confirmed by large-scale
ontologies, which contain rich lexical information and hierarchical specifications
without describing specific properties or instances. Therefore, to find mappings we
use the following available elements: labels of concepts and hierarchical
structures.
    The new implementation of TaxoMap introduces a mapping refinement step, which
extends and completes the alignment process.
    We take part in three tests. We hope the new refinement step helps us perform
better in terms of the precision of the generated mappings.


2    Presentation of the System
Our system is composed of two elements: TaxoMap, the alignment tool, and the
TaxoMap Framework, an environment for specifying and performing refinement
treatments applied to previously obtained mappings.

2.1   State, Purpose and General Statement

TaxoMap has been designed to align OWL ontologies O = (C, H). C is a set of concepts
characterized by a set of labels and H is a subsumption hierarchy which contains a
set of isA relationships between nodes corresponding to concepts. The alignment
process is an oriented process which tries to connect the concepts of a source
ontology O_S to the concepts of a target ontology O_T. The correspondences found are
equivalence relations (isEq), subsumption relations (isA) and their inverse
(isMoreGnl), or proximity relations (isClose).
     To identify these correspondences, TaxoMap implements techniques which exploit
the labels of the concepts and the subsumption links that connect the concepts in
the hierarchy. The morpho-syntactic analysis tool TreeTagger [1] is used to classify
the words of the labels of the concepts and to divide them into two classes, full
words and complementary words, according to their category and their position in
the labels. This division between full and complementary words is first used by a
similarity measure that compares the trigrams of the labels of the concepts [2] and
gives more weight to the common full words. It is then also used by the alignment
techniques. For example, one technique named LabelInclusion generates an isA mapping
between c_s and c_tmax if (1) the concept c_tmax is the concept of O_T having the
highest similarity value with the concept c_s of O_S, (2) one of the labels of
c_tmax is included in one of the labels of c_s, and (3) all the words of the
included label of c_tmax are classified as full words by TreeTagger.
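
    Conditions (2) and (3) of LabelInclusion can be sketched in Python as follows.
This is a minimal illustration, not TaxoMap's code. It assumes that "included" means
a contiguous run of words, and that the full-word classification is supplied by the
caller. The function names and the small stop-word set standing in for the
TreeTagger-based classification are hypothetical.

```python
def words(label):
    """Lower-case a label and split it into words."""
    return label.lower().split()


def contains_sequence(haystack, needle):
    """Return True if the word list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    if n == 0:
        return False
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def label_inclusion(source_labels, tmax_labels, is_full_word):
    """Check conditions (2) and (3) of LabelInclusion for c_s and c_tmax.

    Condition (1), that c_tmax is the most similar target concept, is assumed
    to have been established by the caller.
    """
    for t_label in tmax_labels:
        t_words = words(t_label)
        # Condition (3): every word of the candidate included label is a full word.
        if not all(is_full_word(w) for w in t_words):
            continue
        # Condition (2): the label of c_tmax is included in a label of c_s.
        if any(contains_sequence(words(s), t_words) for s in source_labels):
            return True
    return False


# Hypothetical stand-in for the TreeTagger-based full/complementary split.
COMPLEMENTARY_WORDS = {"of", "the", "and", "in"}


def is_full(word):
    return word not in COMPLEMENTARY_WORDS
```

With these assumptions, `label_inclusion(["left cardiac ventricle"], ["cardiac ventricle"], is_full)`
holds, so an isA mapping from the source concept to c_tmax would be proposed.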

    Given a concept c_s of the source ontology O_S, our similarity measure
identifies the concept c_tmax of the target ontology O_T which has the highest
similarity with c_s. The alignment techniques are then used to decide whether the
concept c_s can effectively be aligned with this concept c_tmax and which relation
should be established between the two concepts, or whether another concept of O_T
must be chosen. A proposed mapping is generated by a single technique, and a concept
of O_S can be aligned with at most one concept of O_T. In contrast, the concepts of
O_T may be involved in several proposed alignments.

   The main methods used to extract mappings between a concept c_s in O_S and a
concept c_t in O_T are:

 – Label equivalence: An equivalence relationship, isEq, is generated if the
   similarity between one label of c_tmax and one label of c_s is greater than a
   threshold (Equiv.threshold).
 – Label inclusion (and its inverse): If one of the labels of c_tmax is included in
   one of the labels of c_s, and if all words of the included label are full words,
   we propose a subclass relationship <c_s isA c_tmax>. Inversely, if one of the
   labels of c_s is included in one of the labels of c_tmax, we propose the
   relationship <c_s isMoreGnl c_tmax>.
 – High lexical similarity: If the similarity measure of c_tmax is greater than a
   threshold (HighSim.threshold) and if one of its labels shares at least two full
   words with one of the labels of c_s, without being included in the labels of c_s,
   the heuristic generates the relationship <c_s isClose c_tmax>.
 – Reasoning on similarity values: Let c_tmax and c_t2 be the two concepts in O_T
   with the highest similarity with c_s. The relative similarity is the ratio of the
   similarity of c_t2 to the similarity of c_tmax. If the relative similarity is
   lower than a threshold (isA.threshold), one of the two following techniques can
   be used:
     • the relationship <c_s isClose c_tmax> is generated if the similarity of
       c_tmax is greater than a threshold (isClose.thresholdMax).
     • an isA relationship is generated between c_s and the father of c_tmax if the
       similarity of c_tmax is greater than a second threshold (isA.thresholdMax).
 – Best similarity: If none of the above techniques is applicable, the relationship
   <c_s isClose c_tmax> is generated if the similarity of c_tmax is greater than a
   threshold (Better.thresholdMax).
 – Property similarity: Two classes c_s and c_t are likely to be aligned if they
   share the same properties. Our property similarity is computed using the Degree
   of Commonality Coefficient presented in [8].
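
    The "Reasoning on similarity values" technique in the list above can be sketched
in Python as follows. This is only an illustration under stated assumptions: the
threshold values are passed in by the caller (the paper does not give them), the
function name and return labels are hypothetical, and the order in which the two
sub-rules are tried is an assumption, since the text only says that one of them can
be used.

```python
def reason_on_similarity(sim_tmax, sim_t2, is_a_threshold,
                         is_close_threshold_max, is_a_threshold_max):
    """Return 'isClose', 'isA_father', or None for a source concept c_s.

    sim_tmax and sim_t2 are the similarities of c_s with its best and
    second-best target concepts, c_tmax and c_t2.
    """
    if sim_tmax <= 0:
        return None  # no usable best candidate
    relative_similarity = sim_t2 / sim_tmax
    if relative_similarity >= is_a_threshold:
        return None  # the technique does not apply
    # Assumption: the isClose rule is tried before the isA rule.
    if sim_tmax > is_close_threshold_max:
        return "isClose"      # <c_s isClose c_tmax>
    if sim_tmax > is_a_threshold_max:
        return "isA_father"   # <c_s isA father(c_tmax)>
    return None
```

For example, with placeholder thresholds 0.8, 0.6 and 0.4, a best similarity of 0.5
and a second-best similarity of 0.2 give a relative similarity of 0.4, and the sketch
proposes an isA relationship with the father of c_tmax.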
    Mappings identified by TaxoMap are generated in the Alignment format used as a
standard in the OAEI campaign. We added to this format the names of the techniques
that generated the mappings. The aim is to facilitate the specification of
treatments exploiting the mappings generated by those techniques. All this
information is stored in a relational mappings database which can then be queried
using SQL. This makes it possible, in particular, to present the generated mappings
to the expert technique by technique during the validation phase.
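
    To make the idea of a queryable mappings database concrete, the following Python
sketch stores a few mappings with the name of the technique that generated them and
retrieves them technique by technique. The table name, column names, concept
identifiers and technique names are all hypothetical. The paper does not describe the
actual database schema.

```python
import sqlite3

# In-memory database standing in for the relational mappings database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mappings (source TEXT, target TEXT, relation TEXT, technique TEXT)"
)
conn.executemany(
    "INSERT INTO mappings VALUES (?, ?, ?, ?)",
    [
        ("s:Heart", "t:Heart", "isEq", "LabelEquivalence"),
        ("s:LeftVentricle", "t:Ventricle", "isA", "LabelInclusion"),
        ("s:CardiacMuscle", "t:Heart", "isClose", "HighLexicalSimilarity"),
    ],
)


def mappings_by_technique(connection, technique):
    """Return the (source, target, relation) rows produced by one technique."""
    rows = connection.execute(
        "SELECT source, target, relation FROM mappings "
        "WHERE technique = ? ORDER BY source",
        (technique,),
    )
    return rows.fetchall()


inclusion_rows = mappings_by_technique(conn, "LabelInclusion")
```

A validation interface could call `mappings_by_technique` once per technique to show
the expert one group of mappings at a time.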

   In the OAEI campaigns, only equivalence relations are evaluated in the alignment
contest. This has important implications for our results:
 1. None of the mappings generated by the label inclusion techniques that lead to a
    subsumption relation isA is evaluated as such. Most of them are wrong if they
    are converted into equivalence relations.
 2. Moreover, while it is natural for several different concepts to be linked by a
    subsumption relation to the same concept, converting subsumption relations into
    equivalence relations creates multiple equivalence relations for the same
    concept.
 3. As a concept of O_T must have only one equivalent concept in O_S in OAEI
    campaigns, if we consider the mappings leading to subsumption or proximity
    relations, all mappings which connect a concept of O_S to a concept of O_T which
    is already involved in an equivalence relation are false.
    We will see in the next section how the TaxoMap refinement module [7] allows us
to remove these incorrect mappings.

2.2   TaxoMap Framework
We propose an environment for specifying and performing treatments applied to
previously obtained mappings. At first, this environment will be used to improve the
quality of an alignment provided by TaxoMap. Subsequently, it will be used for other
treatments based on mappings, such as enriching, restructuring or merging
ontologies.
     An important feature of the approach is that it allows a declarative
specification of treatments based on particular alignment results, concerning
particular ontologies and using a predefined vocabulary. The treatments which can be
specified depend on the characteristics of the ontologies concerned and on the task
to be performed (at first mapping refinement and subsequently ontology merging,
restructuring, enriching). These treatments are thus associated with independent
specification modules, one for each task, each having its own vocabulary. The
approach is extensible and a priori applicable to any treatment based on alignment
results.

    We present the Mapping Refinement Pattern Language (MRPL) used to specify
mapping refinement patterns. This language differs from the one defined in [3]
especially because it includes patterns which test the existence of mappings
generated by alignment techniques.

The vocabulary of MRPL contains:

 – a set of predicate constants. We distinguish three categories of predicate
   constants: those relating to the type of techniques applied in the identification
   of a mapping by TaxoMap, those expressing structural relations between concepts
   of the same ontology, and those expressing terminological relations between
   labels of concepts.
 – a set of individual constants: {a, b, c, ...}
 – a set of variables: {x, y, z, ..., _} where _ is an unnamed variable used to
   represent parameters which do not need to be specified.
 – a set of built-in predicates: {Add_Mapping, Delete_Mapping}
 – a set of logical symbols: {∃, ∧, ¬}

   MRPL allows the definition of a context part, which must be satisfied for a
pattern to be executed, and of a solution part, which expresses the process to carry
out when the context part is satisfied.

Context part of pattern

    The context part tests (1) the technique used to identify the considered
mapping, (2) the structural constraints on mapped elements, for example, the fact
that they are related by a subsumption relation to concepts that do or do not
verify some properties, or (3) the terminological constraints, for example, the
fact that the labels of a concept are included in the labels of other concepts.
These conditions are represented using formulae built from predicate symbols. We
thus distinguish three kinds of formulae according to the kind of predicate symbols
used.

     The formulae related to the type of techniques applied in the identification
of a mapping by TaxoMap. By testing the existence in the mappings database of a
particular relation generated by a given technique, we build formulae that
implicitly test the conditions for the application of this technique. For example,
the formula isAStrictInclusion(x, y) tests the existence of an isA mapping generated
between two concepts x and y using the LabelInclusion technique, t_2. It implicitly
validates at the same time all the conditions for the application of t_2, i.e. (1)
the concept y is the concept of O_T having the highest similarity value with the
concept x of O_S, (2) one of the labels of y is included in one of the labels of x,
and (3) all the words of the included label of y are classified as full words by
TreeTagger. TaxoMap includes several alignment techniques. Thus, several predicate
symbols leading to formulae of that kind are needed. More formally, let:
    R_M = {isEq, isA, isMoreGnl, isClose} be the set of correspondence relations
used by TaxoMap,
    T = {t_1, t_2, ...} be the set of techniques,
    T_M be the table storing generated mappings in the form of 4-tuples (x, y, r, t)
where x ∈ C_S, y ∈ C_T, r ∈ R_M, t ∈ T.
The pairs of variables (x, y) which can instantiate these formulae take their values
in the set {(x, y) | (x, y, r, t) ∈ T_M}. The predicate symbols necessary for the
refinement task presented in this paper are isEquivalent, isAStrictInclusion and
mapping, whose semantics are the following:
 – isEquivalent(x, y) is true iff ∃(x, y, isEq, t_1) ∈ T_M
 – isAStrictInclusion(x, y) is true iff ∃(x, y, isA, t_2) ∈ T_M
 – mapping(x, y) is true iff ∃(x, y, _, _) ∈ T_M
    The formulae expressing structural relations between concepts x and y of the
same ontology O = (C, H). Since the aim of TaxoMap is the alignment of taxonomies,
the structural relations considered here are subsumption relations. If the approach
were used with another alignment tool, other relations could be considered. Note
that the instances of variables in these formulae will be constrained, either
directly because they instantiate the previous formulae, related to the type of the
applied techniques, or indirectly by having to be in relation with other instances.
 – isSubClassOf(x, y, O) is true ⇔ isA(x, y) ∈ H
 – isParentOf(x, y, O) is true ⇔ isA(y, x) ∈ H
 – conceptsDifferent(x, y) is true ⇔ ID(x) ≠ ID(y), where ID(x) is the identifier
   of the concept x.
   The formulae expressing terminological relations between the labels of the
concepts are not detailed here because they are not used in the examples of this
paper.
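
    The semantics of the technique-related and structural predicates can be
illustrated with a small Python sketch. It is a simplified reading of the
definitions above rather than the Framework's implementation: the mapping table and
the hierarchy are modelled as in-memory sets, concept identifiers are plain strings,
and all names and sample tuples are hypothetical.

```python
# T_M: generated mappings as 4-tuples (x, y, r, t).
TM = {
    ("s1", "t1", "isEq", "t_1"),
    ("s2", "t1", "isA", "t_2"),
    ("s3", "t2", "isClose", "t_3"),
}
# H: subsumption hierarchy of one ontology as isA pairs (child, parent).
H = {("t1", "t0"), ("t2", "t0")}


def is_equivalent(x, y, tm):
    """True iff (x, y, isEq, t_1) is in the mapping table."""
    return (x, y, "isEq", "t_1") in tm


def is_a_strict_inclusion(x, y, tm):
    """True iff (x, y, isA, t_2) is in the mapping table."""
    return (x, y, "isA", "t_2") in tm


def mapping(x, y, tm):
    """True iff some tuple (x, y, _, _) is in the mapping table."""
    return any((mx, my) == (x, y) for mx, my, _, _ in tm)


def is_sub_class_of(x, y, h):
    """True iff isA(x, y) is in the hierarchy."""
    return (x, y) in h


def is_parent_of(x, y, h):
    """True iff isA(y, x) is in the hierarchy."""
    return (y, x) in h


def concepts_different(x, y):
    """True iff the two concept identifiers differ."""
    return x != y
```

On this sample data, `mapping("s2", "t1", TM)` and `is_a_strict_inclusion("s2", "t1", TM)`
both hold, while `is_equivalent("s2", "t1", TM)` does not.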

Solution part of pattern

    A context part is associated with a solution part, which is a set of actions to
be performed. This set of actions is modeled by a conjunction of built-in predicates
executed in a database. The built-in predicates are defined as follows:
 – Add_Mapping(x, y, r) has the effect of adding a tuple to the table T_M, which
   becomes T_M ∪ {(x, y, r, t)}, where r and t are fixed in the treatment condition
   by instantiating the predicate corresponding to the type of technique associated
   with the considered mapping.
 – Delete_Mapping(x, y, _) has the effect of removing a tuple from the table T_M,
   which becomes T_M − {(x, y, _, _)}.
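
    The two built-in predicates amount to set operations on the mapping table. The
Python sketch below illustrates this under simplifying assumptions: the table is an
in-memory set instead of a database, the technique t is passed explicitly instead of
being fixed by the context part, and the function names are hypothetical.

```python
def add_mapping(tm, x, y, r, t):
    """Return T_M ∪ {(x, y, r, t)}."""
    return tm | {(x, y, r, t)}


def delete_mapping(tm, x, y):
    """Return T_M − {(x, y, _, _)}: every tuple on the pair (x, y) is removed."""
    return {m for m in tm if (m[0], m[1]) != (x, y)}


table = {("s1", "t1", "isEq", "t_1")}
table = add_mapping(table, "s2", "t1", "isA", "t_2")
table_after_delete = delete_mapping(table, "s2", "t1")
```

Both functions return a new set and leave their argument unchanged, which keeps the
sketch free of side effects. A database-backed version would issue INSERT and DELETE
statements instead.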

Mapping Refinement Patterns used in OAEI

    Pattern-1: This pattern concerns mappings generated by the technique t_1,
connecting by an equivalence relationship a concept y of the target ontology O_T
with a concept x of the source ontology O_S. Because a concept y of the target
ontology O_T must be involved in at most one equivalence relation, mappings
involving y and obtained from techniques other than t_1 should be removed.

Context part of Pattern-1:
   ∃x∃y (isEquivalent(x, y)
   ∧ ∃z (mapping(z, y) ∧ conceptsDifferent(z, x)))
Solution part of Pattern-1:
   Delete_Mapping(z, y, _)




                            Fig. 1. Illustration of Pattern-1
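
    Read operationally, Pattern-1 keeps each equivalence mapping found by t_1 and
deletes every other mapping that reaches the same target concept from a different
source concept. The Python sketch below is one way to express this over an in-memory
mapping table. The function name, the technique label "t_1" and the sample tuples are
hypothetical.

```python
def apply_pattern_1(tm, equivalence_technique="t_1"):
    """Apply Pattern-1 to a set of (x, y, r, t) tuples and return the refined set."""
    # Context part: pairs (x, y) such that isEquivalent(x, y) holds.
    equivalences = {
        (x, y) for x, y, r, t in tm
        if r == "isEq" and t == equivalence_technique
    }
    # Context part, continued: mappings (z, y, _, _) with z different from x.
    to_delete = {
        (z, target, r, t)
        for z, target, r, t in tm
        for x, y in equivalences
        if target == y and z != x
    }
    # Solution part: Delete_Mapping(z, y, _).
    return tm - to_delete


generated = {
    ("s1", "t1", "isEq", "t_1"),
    ("s2", "t1", "isA", "t_2"),
    ("s3", "t2", "isClose", "t_3"),
}
refined = apply_pattern_1(generated)
```

In this example the isA mapping from s2 to t1 is removed because t1 is already the
target of an equivalence mapping from s1, while the mapping on t2 is untouched.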

    Pattern-2: For the anatomy subtask 4, if we know a set of reference mappings,
and we know that the pair (x, y) belongs to this set, we can express a new
refinement pattern to remove generated mappings that link concepts z of the source
ontology, other than x, to the concept y of the target ontology.

We define the new predicate referenceMapping(x, y) as follows:
  referenceMapping(x, y) is true iff ∃(x, y) ∈ Set_Reference_Mapping.

Context part of Pattern-2:
   ∃x∃y (referenceMapping(x, y)
   ∧ ∃z (mapping(z, y) ∧ conceptsDifferent(z, x)))
Solution part of Pattern-2:
   Delete_Mapping(z, y, _)
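
    Pattern-2 has the same shape as Pattern-1, with the reference set playing the
role of the equivalence mappings. A minimal Python sketch, assuming the reference
mappings are given as a set of (x, y) pairs and using hypothetical names and data:

```python
def apply_pattern_2(tm, reference):
    """Remove mappings (z, y, _, _) when the reference pairs y with some x != z."""
    return {
        (z, y, r, t)
        for z, y, r, t in tm
        if not any(ref_y == y and ref_x != z for ref_x, ref_y in reference)
    }


generated = {
    ("s1", "t1", "isClose", "t_3"),
    ("s2", "t1", "isA", "t_2"),
    ("s3", "t2", "isClose", "t_3"),
}
reference_pairs = {("s1", "t1")}
refined = apply_pattern_2(generated, reference_pairs)
```

Here the mapping from s2 to t1 is deleted because the reference set states that t1
corresponds to s1. The mapping from s1 to t1 is kept whatever technique produced it.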


2.3     Link to the system and parameters file

TaxoMap requires:

 – Java (version 1.6 or above)³

The version of TaxoMap used in the 2010 contest can be downloaded from:

 – http://www.lri.fr/~hamdi/TaxoMap/TaxoMap.html

 ³ http://java.sun.com

2.4   Link to the Set of Provided Alignments

The alignments produced by TaxoMap are available at the following URL:
http://www.lri.fr/~hamdi/OAEI10/


3     Results

3.1   Benchmark Tests

Since our algorithm only provides mappings for concepts, the recall is low even for
the reference alignment. The overall results show a slight improvement over last
year's.


3.2   Anatomy Test

The real-world anatomy case consists in matching the Adult Mouse Anatomy (denoted
by Mouse) and the NCI Thesaurus describing the human anatomy (tagged as Human).
Mouse has 2,744 classes, while Human has 3,304 classes. We considered Human as the
target ontology as it is well structured and larger than Mouse. TaxoMap performs the
alignment in about 12 minutes.

                       Table 1. Results of TaxoMap in the different tasks

                                      Task#1             Task#2             Task#3
# Computed Mappings                   1 449              1 127              2 357
Precision without Refinement          0.779              0.929              0.477
# Removed Mappings                    226                32                 945
# Submitted Mappings                  1 223              1 095              1 412
Precision                             0.924              0.956              0.838
∆ Precision with Refinement           + 0.145            + 0.027            + 0.361
Recall                                0.743              0.689              0.774
F-measure                             0.824              0.801              0.802
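
    The derived rows of Table 1 can be recomputed from the other rows. The Python
sketch below assumes that submitted mappings are computed mappings minus removed
mappings, that ∆ Precision is the difference between the two precision rows, and
that F-measure is the usual harmonic mean of precision and recall. The paper does
not state these formulas explicitly, and the variable names are illustrative.

```python
# Values copied from Table 1.
table_1 = {
    "Task#1": {"computed": 1449, "removed": 226, "precision_before": 0.779,
               "precision": 0.924, "recall": 0.743},
    "Task#2": {"computed": 1127, "removed": 32, "precision_before": 0.929,
               "precision": 0.956, "recall": 0.689},
    "Task#3": {"computed": 2357, "removed": 945, "precision_before": 0.477,
               "precision": 0.838, "recall": 0.774},
}


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


summary = {
    task: {
        "submitted": row["computed"] - row["removed"],
        "delta_precision": round(row["precision"] - row["precision_before"], 3),
        "f_measure": round(f_measure(row["precision"], row["recall"]), 3),
    }
    for task, row in table_1.items()
}
```

Under these assumptions the submitted counts and the precision gains match the table
for all three tasks, and the F-measure matches for Task#1 and Task#2. For Task#3 the
recomputed value is 0.805 against the reported 0.802. The table is left as published.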



    As only equivalence relationships are evaluated in the alignment contest, this
year we did not use the techniques which generate isA relationships (except in
subtask 3) and we converted isClose mappings into equivalences. In addition, we used
the refinement pattern described above to delete mappings involving a concept of the
target ontology that was already aligned by an equivalence mapping. As a result, we
found fewer mappings than last year, but the precision is better [4].


3.3   Directory Test

The directory task is built from Web site directories such as Google, Yahoo! or
Looksmart. Two modalities are proposed this year:
    – Small tasks: includes 4,639 tests represented by pairs of OWL ontologies.
    – Single task: includes only one matching task. The source and the target
      ontologies to be matched contain 2,854 and 6,555 concepts respectively.
TaxoMap takes about 40 minutes to complete each modality.

4     General Comments
4.1    Results
The new version of TaxoMap significantly improves on the results of the previous
version in terms of runtime and precision of generated mappings. The new
implementation makes the code extensible and modular. TaxoMap can be parameterized
by the language used in the ontologies, the choice of techniques to apply and the
different thresholds.

4.2    Future Improvements
The following improvements can be made to obtain better results:
    – To take into account all concept properties instead of only the hierarchical
      ones.
    – To use WordNet as a dictionary of synonyms. The synsets can enrich the
      terminological alignment process if an a priori disambiguation is made.
    – To develop the remaining structural techniques, which proved to be efficient
      in earlier experiments [5] [6].

5     Conclusion
This paper reports our participation in the OAEI campaign with the new
implementation of TaxoMap. Our participation in the campaign allows us to test the
robustness of TaxoMap, the newly implemented techniques and the effectiveness of
the new refinement module in the TaxoMap Framework.

References
[1] Schmid, H.: Probabilistic Part-of-Speech Tagging Using Decision Trees.
  International Conference on New Methods in Language Processing (1994)
[2] Lin, D.: An Information-Theoretic Definition of Similarity. ICML, Madison (1998)
  296–304
[3] Scharffe, F.: Correspondence Patterns Representations. PhD thesis, Univ. of
  Innsbruck (2009)
[4] Hamdi, F., Safar, B., Niraula, N.B., Reynaud, C.: TaxoMap in the OAEI 2009
  alignment contest. Proceedings of the ISWC'09 Workshop on Ontology Matching,
  OM-09 (2009)
[5] Reynaud, C., Safar, B.: When usual structural alignment techniques don't apply.
  The ISWC'06 Workshop on Ontology Matching (OM-06) (2006)
[6] Reynaud, C., Safar, B.: Exploiting WordNet as Background Knowledge. The ISWC'07
  Workshop on Ontology Matching (OM-07) (2007)
[7] Hamdi, F., Reynaud, C., Safar, B.: Pattern-based Mapping Refinement. 17th
  International Conference on Knowledge Engineering and Knowledge Management,
  EKAW (2010)
[8] Reul, Q., Pan, J.Z.: KOSIMap: Ontology alignments results for OAEI 2009.
  Proceedings of the ISWC'09 Workshop on Ontology Matching, OM-09 (2009)