=Paper=
{{Paper
|id=None
|storemode=property
|title=Semantic Enrichment of Ontology Mappings: Detecting Relation Types and Complex Correspondences
|pdfUrl=https://ceur-ws.org/Vol-1020/paper_06.pdf
|volume=Vol-1020
|dblpUrl=https://dblp.org/rec/conf/gvd/Arnold13
}}
==Semantic Enrichment of Ontology Mappings: Detecting Relation Types and Complex Correspondences==
<pdf width="1500px">https://ceur-ws.org/Vol-1020/paper_06.pdf</pdf>
<pre>
     Semantic Enrichment of Ontology Mappings: Detecting
        Relation Types and Complex Correspondences

                                                                           ∗
                                                          Patrick Arnold
                                                          Universität Leipzig
                                              arnold@informatik.uni-leipzig.de


25th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken),
28.05.2013 - 31.05.2013, Ilmenau, Germany.
Copyright is held by the author/owner(s).

ABSTRACT
While there are numerous tools for ontology matching, most approaches provide only
little information about the true nature of the correspondences they discover,
restricting themselves to the mere links between matching concepts. However, many
disciplines, such as ontology merging, ontology evolution or data transformation,
require more detailed information, such as the concrete relation type of matches or
information about the cardinality of a correspondence (one-to-one or one-to-many). In
this study we present a new approach in which we add semantic information to an
initial ontology mapping carried out by a state-of-the-art matching tool. The enriched
mapping contains the relation type (like equal, is-a, part-of) of the correspondences
as well as complex correspondences. We present different linguistic, structural and
background knowledge strategies that allow semi-automatic mapping enrichment, and
according to our first internal tests we are already able to add valuable semantic
information to an existing ontology mapping.

Keywords
ontology matching, relation type detection, complex correspondences, semantic
enrichment

1.   INTRODUCTION
   Ontology matching plays a key role in data integration and ontology management.
With ontologies getting increasingly larger and more complex, as in the medical or
biological domain, efficient matching tools are an important prerequisite for ontology
matching, merging and evolution. There are already various approaches and tools for
ontology matching, which exploit a wide range of techniques such as lexicographic,
linguistic or structural methods in order to identify the corresponding concepts
between two ontologies [16], [2]. The determined correspondences build a so-called
alignment or ontology mapping, with each correspondence being a triple (s, t, c), where
s is a concept in the source ontology, t a concept in the target ontology and c the
confidence (similarity).
   These tools are able to greatly reduce the effort of manual ontology mapping, but
most approaches solely focus on detecting the matching pairs between two ontologies,
without giving any specific information about the true nature of these matches. Thus, a
correspondence is commonly regarded as an equivalence relation, which is correct for a
correspondence like (zip code, postal code), but incorrect for correspondences like
(car, vehicle) or (tree trunk, tree), where is-a and part-of, respectively, would be
the correct relation type. This restriction is an obvious shortcoming, because in many
cases a mapping should also include further kinds of correspondences, such as is-a,
part-of or related. Adding this information to a mapping is generally beneficial and
has been shown to considerably improve ontology merging [13]. It provides more precise
mappings and is also a crucial aspect in related areas, such as data transformation,
entity resolution and linked data.
   An example is given in Fig. 1, which depicts the basic idea of our approach. While
we get a simple alignment as input, with the mere links between concepts (upper
picture), we return an enriched alignment with the relation type annotated to each
correspondence (lower picture). As we will point out in the course of this study, we
use different linguistic methods and background knowledge in order to find the relevant
relation type. Besides this, we have to distinguish between simple concepts (such as
"Office Software") and complex concepts, which contain itemizations like "Monitors and
Displays" and need special treatment for relation type detection.
   Another issue of present ontology matchers is their restriction to
(1:1)-correspondences, where exactly one source concept matches exactly one target
concept. However, this can occasionally lead to inaccurate mappings, because complex
correspondences may occur where more than one source element corresponds to a target
element or vice versa. For example, the two concepts first name and last name
correspond to a concept name, leading to a (2:1)-correspondence. We will show in
section 5 that distinguishing between one-to-one and one-to-many correspondences plays
an important role in data transformation, and that we can exploit the results from the
relation type detection to discover such complex matches in a set of (1:1)-matches to
add further knowledge to a mapping.
   In this study we present different strategies to assign the
relation types to an existing mapping and demonstrate how
complex correspondences can be discovered. Our approach, which we refer to as the
Enrichment Engine, takes an ontology mapping generated by a state-of-the-art matching
tool as input and returns a more expressive mapping with the relation type added to
each correspondence and complex correspondences revealed. Our first internal tests
indicate that even simple strategies already add valuable information to an initial
mapping and may be a notable gain for current ontology matching tools.

Figure 1: Input (above) and output (below) of the Enrichment Engine

   Our paper is structured as follows: We discuss related work in section 2 and present
the architecture and basic procedure of our approach in section 3. In section 4 we
present different strategies to determine the relation types in a mapping, while we
discuss the problem of complex correspondence detection in section 5. We finally
conclude in section 6.

2.   RELATED WORK
   Only a few tools and studies regard different kinds of correspondences or
relationships for ontology matching. S-Match [6][7] is one of the first such tools for
"semantic ontology matching". It distinguishes between equivalence, subset (is-a),
overlap and mismatch correspondences and tries to provide a relationship for any pair
of concepts of two ontologies by utilizing standard match techniques and background
knowledge from WordNet. Unfortunately, the resulting mappings tend to become very
voluminous with many correspondences per concept, while users are normally interested
only in the most relevant ones.
   Taxomap [11] is an alignment tool developed for the geographic domain. It regards
the correspondence types equivalence, less/more-general (is-a / inverse is-a) and
is-close ("related") and exploits linguistic techniques and background sources such as
WordNet. The linguistic strategies seem rather simple: if a term appears as a part in
another term, a more-general relation is assumed, which is not always the case. For
example, in Figure 1 the mentioned rule holds for the correspondence between Games and
Action Games, but not between Monitors and Monitors and Displays. In [14], the authors
evaluated Taxomap for a mapping scenario with 162 correspondences and achieved a recall
of 23 % and a precision of 89 %.
   The LogMap tool [9] distinguishes between equivalence and so-called weak
(subsumption / is-a) correspondences. It is based on Horn logic, where first
lexicographic and structural knowledge from the ontologies is accumulated to build an
initial mapping and subsequently an iterative process is carried out to first enhance
the mapping and then to verify the enhancement. This tool is the least precise one with
regard to relation type detection, and in evaluations the relation types were not
further regarded.
   Several further studies deal with the identification of semantic correspondence
types without providing a complete tool or framework. An approach utilizing current
search engines is introduced in [10]. For two concepts A, B the authors generate
different search queries like "A, such as B" or "A, which is a B" and submit them to a
search engine (e.g., Google). They then analyze the snippets of the search engine
results, if any, to verify or reject the tested relationship. The approach in [15] uses
the Swoogle search engine to detect correspondences and relationship types between
concepts of many crawled ontologies. The approach supports equal, subset or mismatch
relationships. [17] exploits reasoning and machine learning to determine the relation
type of a correspondence, where several structural patterns between ontologies are used
as training data.
   Unlike relation type determination, the complex correspondence detection problem has
hardly been discussed so far. It was addressed in [5], which concludes that there is
hardly any approach for complex correspondence detection because of the vast number of
required comparisons in contrast to (1:1)-matching, as well as the many possible
operators needed for the mapping function. One key observation for efficient complex
correspondence detection has been the need for large amounts of domain knowledge, but
to date there is no available tool that is able to semi-automatically detect complex
matches.
   One remarkable approach is iMAP [4], where complex matches between two schemas could
be discovered and even several transformation functions calculated, such as
RoomPrice = RoomPrice * (1 + TaxPrice). For this, iMAP first calculates (1:1)-matches
and then runs an iterative process to gradually combine them into more complex
correspondences. To justify complex correspondences, instance data is analyzed and
several heuristics are used. In [8] complex correspondences were also regarded for
matching web query interfaces, mainly exploiting co-occurrences. However, in order to
derive common co-occurrences, the approach requires a large number of schemas as input,
and thus does not appear appropriate for matching two or few schemas.
   While the approaches presented in this section try to achieve both matching and
semantic annotation in one step, thus often tending to neglect the latter part, we will
demonstrate a two-step architecture in which we first perform a
schema mapping and then concentrate directly on the enrichment of the mapping (semantic
part). Additionally, we want to analyze several linguistic features to provide mappings
of higher quality than those obtained by the existing tools, and finally develop an
independent system that is not restricted to schema and ontology matching, but can be
exploited in different ways in the wide field of data integration and data analysis.

3.   ARCHITECTURE
   As illustrated in Fig. 2, our approach uses a 2-step architecture in which we first
calculate an ontology mapping (match result) using our state-of-the-art matching tool
COMA 3.0 (step 1) [12] and then perform an enrichment on this mapping (step 2).
   Our 2-step approach for semantic ontology matching offers different advantages.
First of all, we reduce complexity compared to 1-step approaches that try to directly
determine the correspondence type when comparing concepts in O1 with concepts in O2.
For large ontologies, such a direct matching is already time-consuming and error-prone
for standard matching. The proposed approaches for semantic matching are even more
complex and could not yet demonstrate their general effectiveness.
   Secondly, our approach is generic as it can be used for different domains and in
combination with different matching tools for the first step. We can even re-use the
tool in different fields, such as entity resolution or text mining. On the other hand,
this can also be a disadvantage, since the enrichment step depends on the completeness
and quality of the initially determined match result. Therefore, it is important to use
powerful tools for the initial matching and possibly to fine-tune their configuration.

Figure 2: Basic Workflow for Mapping Enrichment

   The basics of the relation type detection, on which we focus in this study, can be
seen in the right part of Fig. 2. We provide 4 strategies so far (Compound, Background
Knowledge, Itemization, Structure), where each strategy returns the relation type of a
given correspondence, or "undecided" in case no specific type can be determined. In the
Enrichment step we thus iterate through each correspondence in the mapping and pass it
to each strategy. We eventually annotate the type that was most frequently returned by
the strategies (type computation). In this study, we regard 4 distinct relation types:
equal, is-a and inv. is-a (composition), part-of and has-a (aggregation), as well as
related.
   There are two problems we may encounter when computing the correspondence type.
First, all strategies may return "undecided". In this case we assign the relation type
"equal", because it is the default type in the initial match result and possibly the
most likely one to hold. Secondly, there might be different outcomes from the
strategies, e.g., one returns is-a, one equal and the others undecided. There are
different ways to solve this problem, e.g., by prioritizing strategies or relation
types. However, we have hardly encountered such cases so far, so we currently return
"undecided" and request the user to manually specify the correct type.
   At present, our approach is already able to fully assign relation types to an input
mapping using the 4 strategies, which we will describe in detail in the next section.
We have not implemented strategies to create complex matches from the match result, but
will address a couple of conceivable techniques in section 5.

4.    IMPLEMENTED STRATEGIES
   We have implemented 4 strategies to determine the type of a given correspondence.
Table 1 gives an overview of the strategies and the relation types they are able to
detect. It can be seen that the Background Knowledge approach is especially valuable,
as it can help to detect all relationship types. Besides, all strategies are able to
identify is-a correspondences.

Strategy          equal    is-a    part-of    related
Compounding                 X
Background K.       X       X         X          X
Itemization         X       X
Structure                   X         X

Table 1: Supported correspondence types by the strategies

   In the following let O1, O2 be two ontologies with c1, c2 being two concepts from O1
and O2, respectively. Further, let C = (c1, c2) be a correspondence between two
concepts (we do not regard the confidence value in this study).

4.1    Compound Strategy
   In linguistics, a compound is a special word W that consists of a head W_H carrying
the basic meaning of W, and a modifier W_M that specifies W_H [3]. In many cases, a
compound thus expresses something more specific than its head, and is therefore a
perfect candidate to discover an is-a relationship. For instance, a blackboard is a
board or an apple tree is a tree. Such compounds are called endocentric compounds,
while exocentric compounds are not related to their head, such as buttercup, which is
not a cup, or saw tooth, which is not a tooth. These compounds have a figurative
meaning (metaphors) or changed their spelling as the language evolved, and thus do not
hold the is-a relation, or only to a very limited extent (like airport, which is a port
only in a broad sense). There is a third form of compounds, called appositional or
copulative compounds, where the two words are at the same level, and the relation is
rather more-general (inverse is-a) than more-specific, as in Bosnia-Herzegovina, which
means both Bosnia and Herzegovina, or bitter-sweet, which means both bitter and sweet
(not necessarily a "specific bitter" or a "specific sweet"). However, this type is
quite rare.
   In the following, let A, B be the literals of two concepts of a correspondence. The
Compound Strategy analyzes whether B ends with A. If so, it seems likely that B
is a compound with head A, so that the relationship B is-a A (or A inv. is-a B) is
likely to hold. The Compound approach allows us to identify the three is-a
correspondences shown in Figure 1 (below).
   We added an additional rule to this simple approach: B is only considered a compound
to A if length(B) − length(A) ≥ 3, where length(X) is the length of a string X. Thus,
we expect the supposed compound to be at least 3 characters longer than the head it
matches. This way, we are able to eliminate obviously wrong compound conclusions, like
stable is a table, which we call pseudo compounds. The value of 3 is motivated by the
observation that typical nouns or adjectives consist of at least 3 letters.

4.2   Background Knowledge
   Background knowledge is commonly of great help in ontology matching to detect more
difficult correspondences, especially in special domains. In our approach, we intend to
use it for relation type detection. So far, we use WordNet 3.0 to determine the
relation that holds between two words (or two concepts). WordNet is a powerful
dictionary and thesaurus that contains synonym relations (equivalence), hypernym
relations (is-a) and holonym relations (part-of) between words [22]. Using the Java API
for WordNet Search (JAWS), we built an interface that allows us to answer questions
like "Is X a synonym of Y?", or "Is X a direct hypernym of Y?". The interface is also
able to detect cohyponyms, which are two words X, Y that have a common direct hypernym
Z. We call a correspondence between two cohyponyms X and Y related, because both
concepts are connected to the same father element. For example, the relation between
apple tree and pear tree is related, because of the common father concept tree.
   Although WordNet has a limited vocabulary, especially with regard to specific
domains, it is a valuable source to detect the relation type that holds between
concepts. It allows an excellent precision, because the links in WordNet are manually
defined, and contains all relation types we intend to detect, which the other
strategies are not able to achieve.

4.3   Itemization
   In several taxonomies we recognized that itemizations appear very often and cannot
be processed with the previously presented strategies. Consider the correspondence
("books and newspapers", "newspapers"). The compound strategy would be misled and
consider the source concept a compound, resulting in the type "is-a", although the
opposite is the case (inv. is-a). WordNet would not know the word "books and
newspapers" and return "undecided".
   Itemizations thus deserve special treatment. We first split each itemization into
its atomic items, where we define an item as a string that does not contain commas,
slashes or the words "and" and "or".
   We now show how our approach determines the correspondence types between two
concepts C1, C2 where at least one of the two concepts is an itemization with more than
one item. Let I1 be the item set of C1 and I2 the item set of C2. Let w1, w2 be two
words, with w1 ≠ w2. Our approach works as follows:

   1. In each set I, remove each w1 ∈ I which is a hyponym of w2 ∈ I.
   2. In each set I, replace a synonym pair (w1 ∈ I, w2 ∈ I) by w1.
   3. Remove each w1 ∈ I1, w2 ∈ I2 if there is a synonym pair (w1, w2).
   4. Remove each w2 ∈ I2 which is a hyponym of w1 ∈ I1.
   5. Determine the relation type:
      (a) If I1 = ∅, I2 = ∅: equal
      (b) If I1 = ∅, |I2| ≥ 1: is-a
          If I2 = ∅, |I1| ≥ 1: inverse is-a
      (c) If |I1| ≥ 1, |I2| ≥ 1: undecided

The rationale behind this algorithm is that we remove items from the item sets as long
as no information gets lost. Then we compare what is left in the two sets and come to
the conclusions presented in step 5.
   Let us consider the concept pair C1 = "books, ebooks, movies, films, cds" and
C2 = "novels, cds". Our item sets are I1 = {books, ebooks, movies, films, cds},
I2 = {novels, cds}. First, we remove synonyms and hyponyms within each set, because
this would cause no loss of information (steps 1+2). We remove films in I1 (because of
the synonym movies) and ebooks in I1, because it is a hyponym of books. We have
I1 = {books, movies, cds}, I2 = {novels, cds}. Now we remove synonym pairs between the
two item sets, so we remove cds in either set (step 3). Lastly, we remove a hyponym in
I2 if there is a hypernym in I1 (step 4). We remove novels in I2, because a novel is a
book. We have I1 = {books, movies}, I2 = ∅. Since I1 still contains items, while I2 is
empty, we conclude that I1 specifies something more general, i.e., C1 inverse is-a C2
holds.
   If neither item set is empty, we return "undecided" because we cannot derive an
equal or is-a relationship in this case.

4.4    Structure Strategy
   The structure strategy takes the structure of the ontologies into account. For a
correspondence between concepts Y and Z we check whether we can derive a semantic
relationship between a father concept X of Y and Z (or vice versa). For an is-a
relationship between Y and X we draw the following conclusions:

   • X equiv Z → Y is-a Z
   • X is-a Z → Y is-a Z

For a part-of relationship between Y and X we can analogously derive:

   • X equiv Z → Y part-of Z
   • X part-of Z → Y part-of Z

The approach utilizes the semantics of the intra-ontology relationships to determine
the correspondence types for pairs of concepts for which the semantic relationship
cannot directly be determined.

4.5    Comparison
   We tested our strategies and overall system on 3 user-generated mappings in which
each correspondence was tagged with its supposed type. After running the scenarios, we
checked how many of the non-trivial relations were detected by the program. The 3
scenarios consisted of about 350 to 750 correspondences. We had a German-language
scenario (product catalogs from online shops), a health scenario (diseases) and a text
annotation catalog scenario (everyday speech).
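
The five steps of the itemization procedure from section 4.3 can be sketched in Python
as follows. This is only a minimal illustration under stated assumptions: the synonym
and hyponym lookups use small hand-written sets (SYNONYMS, HYPONYM_OF) as a stand-in
for WordNet, and all function names are hypothetical and not taken from the actual
implementation.

```python
import re

# Toy background knowledge standing in for WordNet (hypothetical example data).
SYNONYMS = {frozenset({"movies", "films"})}
HYPONYM_OF = {("ebooks", "books"), ("novels", "books")}  # (hyponym, hypernym)


def is_synonym(a, b):
    """True if a and b are identical or listed as a synonym pair."""
    return a == b or frozenset({a, b}) in SYNONYMS


def is_hyponym(a, b):
    """True if a is a hyponym (a more specific term) of b."""
    return (a, b) in HYPONYM_OF


def split_items(concept):
    """Split an itemization at commas, slashes and the words 'and' / 'or'."""
    parts = re.split(r",|/|\band\b|\bor\b", concept.lower())
    return [p.strip() for p in parts if p.strip()]


def reduce_within(items):
    """Steps 1 and 2: remove redundant items inside a single item set."""
    # Step 1: drop items that are hyponyms of another item in the same set.
    kept = [w for w in items
            if not any(is_hyponym(w, v) for v in items if v != w)]
    # Step 2: of each synonym pair, keep only the first member.
    result = []
    for w in kept:
        if not any(is_synonym(w, v) for v in result):
            result.append(w)
    return result


def itemization_type(c1, c2):
    """Return the relation type of the correspondence (c1, c2)."""
    i1 = reduce_within(split_items(c1))
    i2 = reduce_within(split_items(c2))
    # Step 3: remove synonym pairs between the two sets.
    rest1 = [w for w in i1 if not any(is_synonym(w, v) for v in i2)]
    rest2 = [v for v in i2 if not any(is_synonym(w, v) for w in i1)]
    # Step 4: remove items of I2 that are hyponyms of an item left in I1.
    rest2 = [v for v in rest2 if not any(is_hyponym(v, w) for w in rest1)]
    # Step 5: derive the relation type from what is left.
    if not rest1 and not rest2:
        return "equal"
    if not rest1:
        return "is-a"
    if not rest2:
        return "inverse is-a"
    return "undecided"
```

With this toy data, itemization_type("books, ebooks, movies, films, cds", "novels, cds")
returns "inverse is-a", which matches the worked example of section 4.3.
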
   Compounding and Background Knowledge are two independent strategies that separately
try to determine the relation type of a correspondence. In our tests we saw that
Compounding offers a good precision (72 to 97 %), despite the many exocentric and
pseudo compounds that exist. By contrast, we observed only moderate recall, ranging
from 12 to 43 %. Compounding is only able to determine is-a relations; however, it is
the only strategy that invariably works.
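
The compound rule of section 4.1 (including the length constraint of 3 characters) and
the type computation of section 3 can be sketched in Python as follows. This is a
minimal sketch, not the actual implementation: the function names are hypothetical, and
the handling of conflicting strategy results (return "undecided" when no single type is
returned most often) is one possible reading of the description in section 3.

```python
from collections import Counter

UNDECIDED = "undecided"


def compound_strategy(a, b, min_extra=3):
    """Relation type of the correspondence (a, b) according to the compound rule."""
    a, b = a.strip().lower(), b.strip().lower()
    # b ends with a and is at least min_extra characters longer:
    # b is taken as a compound with head a, so b is-a a, i.e. a inverse is-a b.
    if b.endswith(a) and len(b) - len(a) >= min_extra:
        return "inverse is-a"
    # Symmetric case: a is taken as a compound with head b, so a is-a b.
    if a.endswith(b) and len(a) - len(b) >= min_extra:
        return "is-a"
    return UNDECIDED


def compute_type(votes):
    """Combine the results of all strategies for one correspondence."""
    decided = [v for v in votes if v != UNDECIDED]
    if not decided:
        # All strategies undecided: fall back to the default type "equal".
        return "equal"
    counts = Counter(decided).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        # No single most frequent type: leave the decision to the user.
        return UNDECIDED
    return counts[0][0]
```

For example, compound_strategy("Action Games", "Games") yields "is-a", while the pseudo
compound in compound_strategy("stable", "table") is rejected and yields "undecided".
Note that compound_strategy("books and newspapers", "newspapers") also yields "is-a",
which is exactly the misleading result that motivates the Itemization strategy.
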
   Background Knowledge has a low or moderate recall (10 to 50 %), depending on the
scenario at hand. However, it offers an excellent precision, being very close to 100 %,
and is the only strategy that is able to determine all relation types we regard. As a
matter of fact, it did not work on our German-language example and only poorly in our
health scenario.
   The Structure and Itemization strategies depend heavily on the given schemas and are
thus very specific strategies for handling individual cases. They exploit the Compound
and Background Knowledge strategies and are thus not independent. Still, they were able
to boost the recall to some degree.
   We realized that the best result is gained by exploiting all strategies. Currently,
we do not weight the strategies; however, we may do so in order to optimize our system.
We finally achieved an overall recall between 46 and 65 % and precision between 69 and
97 %.

Figure 3: Match result containing two complex correspondences (name and address)

structure of the schemas to transform several (1:1)-correspondences into a complex
correspondence, although these approaches will fail in more intricate scenarios. We
used the structure of the schemas and the already existing (1:1)-matches to derive
complex correspondences. Fig. 3 demonstrates this approach. There are two complex
correspondences in the mapping, ((First Name, Last Name), (Name)) and ((Street, City,
Zip Code, Country), Address), represented by simple (1:1)-correspondences. Our approach
was able to detect both complex correspondences. The first one (name) was detected,
because first name and last name cannot be mapped to one element at the same time,
since the name element can only store either of the two values. The
5.   COMPLEX CORRESPONDENCES                                    second example (address) is detected since schema data is
   Schema and ontology matching tools generally calculate       located in the leaf nodes, not in inner nodes. In database
(1:1)-correspondences, where exactly one source element         schemas we always expect data to reside in the leaf nodes,
matches exactly one target element. Naturally, either el-       so that the match (Address, Address) is considered unrea-
ement may take part in different correspondences, as in         sonable.
(name, first name) and (name, last name), however, having          In the first case, our approach would apply the concatena-
these two separate correspondences is very imprecise and the    tion function, because two values have to be concatenated to
correct mapping would rather be the single correspondence       match the target value, and in the second case the split func-
( (first name, last name), (name)). These kind of matches       tion would be applied, because the Address values have to
are called complex correspondences or one-to-many corre-        be split into the address components (street, city, zip code,
spondences.                                                     country). The user needs to adjust these functions, e.g., in
   The disambiguation between a complex correspondence          order to tell the program where in the address string the
or 2 (or more) one-to-one correspondences is an inevitable      split operations have to be performed.
premise for data transformation where data from a source           This approach was mostly based on heuristics and would
database is to be transformed into a target database, which     only work in simple cases. Now that we are able to de-
we could show in [1]. Moreover, we could prove that each        termine the relation types of (1:1)-matches, we can enhance
complex correspondence needs a transformation function in       this original approach. If a node takes part in more than one
order to correctly map data. If elements are of the type        composition relation (part-of / has-a), we can conclude that
string, the transformation function is normally concatena-      it is a complex correspondence and can derive it from the
tion in (n:1)-matches and split in (1:n)-matches. If the el-    (1:1)-correspondences. For instance, if we have the 3 corre-
ements are of a numerical type, as in the correspondence        spondences (day part-of date), (month part-of date), (year
( (costs), ((operational costs), (material costs), (personnel   part-of date) we could create the complex correspondence (
costs))), a set of numerical operations is normally required.   (day, month, year), date).
   There are proprietary solutions that allow to manually          We have not implemented this approach so far, and we as-
create transformation mappings including complex corre-         sume that detecting complex correspondences and the cor-
spondences, such as Microsoft Biztalk Server [19], Altova       rect transformation function will still remain a very challeng-
MapForce [18] or Stylus Studio [20], however, to the best       ing issue, so that we intend to investigate additional methods
of our knowledge there is no matching tool that is able to      like using instance data to allow more effectiveness. How-
detect complex correspondences automatically. Next to rela-     ever, adding these techniques to our existing Enrichment
tion type detection, we therefore intend to discover complex    Engine, we are able to present a first solution that semi-
correspondences in the initial mapping, which is a second       automatically determines complex correspondences, which
important step of mapping enrichment.                           is another step towards more precise ontology matching, and
   We already developed simple methods that exploit the         an important condition for data transformation.
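
   The part-of idea described in this section can be sketched
in a few lines. This is only an illustration under stated
assumptions, not the Enrichment Engine itself: correspondences
are modeled as hypothetical (source, relation, target) triples,
the function names are invented for this example, and only the
part-of direction is handled (has-a would be the mirror case).
The concatenation helper stands in for the transformation
function of an (n:1)-match on string values.

```python
from collections import defaultdict


def derive_complex(correspondences):
    """Merge several part-of correspondences that share a target.

    correspondences: iterable of (source, relation, target) triples.
    Returns a list of ((source_1, ..., source_n), target) tuples.
    """
    parts = defaultdict(list)
    for source, relation, target in correspondences:
        if relation == "part-of":
            parts[target].append(source)
    # A target composed of more than one source element is treated
    # as the target of a single complex (n:1) correspondence.
    return [
        (tuple(sources), target)
        for target, sources in parts.items()
        if len(sources) > 1
    ]


def concatenate(values, separator=" "):
    """Transformation function for string-typed (n:1)-matches."""
    return separator.join(values)


# The example from the text plus one unrelated equivalence match.
mapping = [
    ("day", "part-of", "date"),
    ("month", "part-of", "date"),
    ("year", "part-of", "date"),
    ("zip code", "equal", "postal code"),
]

complex_matches = derive_complex(mapping)
print(complex_matches)
print(concatenate(["12", "05", "2013"], separator="-"))
```

   As noted in the text, the user would still have to adjust
the transformation function, here for instance by choosing the
separator and the order of the values.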

6.   OUTLOOK AND CONCLUSION
   We presented a new approach to semantically enrich ontology
mappings by determining the concrete relation type of a
correspondence and detecting complex correspondences. For
this, we developed a 2-step architecture in which the actual
ontology matching and the semantic enrichment are strictly
separated. This makes the Enrichment Engine highly generic, so
that it is not designed for any specific ontology matching
tool and, moreover, can be used independently in various
fields different from ontology matching, such as data
transformation, entity resolution and text mining.
   In our approach we developed new linguistic strategies to
determine the relation type, and in our first internal tests
even the rather simple strategies already added much useful
information to the input mapping. We also discovered that some
strategies (Compounding, and to a lesser degree Itemization
and Structure) are rather independent of the language of the
ontologies, so that our approach provided remarkable results
for both German- and English-language ontologies.
   One important obstacle is the strong dependency on the
initial mapping. We observed that matching tools tend to
discover equivalence relations, so that various
non-equivalence correspondences are not contained in the
initial mapping and thus cannot be detected. It is future work
to adjust our tool COMA 3.0 to provide a more convenient
input, e.g., by using relaxed configurations. A particular
issue we are going to investigate is the use of instance data
connected with the concepts to derive the correct relation
type if the other strategies (which operate on the meta level)
fail. This will also result in a time-complexity problem,
which we will have to consider in our ongoing research.
   Our approach is still at a rather early stage, and there is
still much room for improvement, since the implemented
strategies have various restrictions so far. For this reason,
we will extend and fine-tune our tool in order to increase
effectiveness and precision. Among other aspects, we intend to
improve the structure strategy by considering the entire
concept path rather than only the parent concept, to add
further background knowledge to the system, especially in
specific domains, and to investigate further linguistic
strategies, for instance, in which way compounds also indicate
the part-of relation. In addition to relation type detection,
we will also concentrate on complex correspondence detection
in data transformation to provide further semantic information
to ontology mappings.

7.   ACKNOWLEDGMENT
   This study was partly funded by the European Commission
through Project "LinkedDesign" (No. 284613 FoF-ICT-2011.7.4).

8.   REFERENCES
[1] Arnold, P.: The Basics of Complex Correspondences
    and Functions and their Implementation and
    Semi-automatic Detection in COMA++ (Master’s
    thesis), University of Leipzig, 2011.
[2] Bellahsene, Z., Bonifati, A., Rahm, E. (eds.): Schema
    Matching and Mapping, Springer (2011)
[3] Bisetto, A., Scalise, S.: Classification of Compounds.
    University of Bologna, 2009. In: The Oxford Handbook
    of Compounding, Oxford University Press, pp. 49–82.
[4] Dhamankar, R., Yoonkyong, L., Doan, A., Halevy, A.,
    Domingos, P.: iMAP: Discovering Complex Semantic
    Matches between Database Schemas. In: SIGMOD ’04,
    pp. 383–394
[5] Doan, A., Halevy, A. Y.: Semantic Integration
    Research in the Database Community: A Brief Survey.
    In: AI Mag. (2005), pp. 83–94
[6] Giunchiglia, F., Shvaiko, P., Yatskevich, M.: S-Match:
    An Algorithm and an Implementation of Semantic
    Matching. Proceedings of the European Semantic Web
    Symposium (2004), LNCS 3053, pp. 61–75
[7] Giunchiglia, F., Autayeu, A., Pane, J.: S-Match: an
    open source framework for matching lightweight
    ontologies. In: Semantic Web, vol. 3-3 (2012), pp.
    307–317
[8] He, B., Chen-Chuan Chang, H., Han, J.: Discovering
    complex matchings across web query interfaces: A
    correlation mining approach. In: KDD ’04, pp. 148–157
[9] Jiménez-Ruiz, E., Grau, B. C.: LogMap: Logic-Based
    and Scalable Ontology Matching. In: International
    Semantic Web Conference (2011), LNCS 7031, pp.
    273–288
[10] van Hage, W. R., Katrenko, S., Schreiber, G.: A
    Method to Combine Linguistic Ontology-Mapping
    Techniques. In: International Semantic Web Conference
    (2005), LNCS 3729, pp. 732–744
[11] Hamdi, F., Safar, B., Niraula, N. B., Reynaud, C.:
    TaxoMap alignment and refinement modules: Results
    for OAEI 2010. Proceedings of the ISWC Workshop
    (2010), pp. 212–219
[12] Massmann, S., Raunich, S., Aumueller, D., Arnold, P.,
    Rahm, E.: Evolution of the COMA Match System. Proc.
    Sixth Intern. Workshop on Ontology Matching (2011)
[13] Raunich, S., Rahm, E.: ATOM: Automatic
    Target-driven Ontology Merging. Proc. Int. Conf. on
    Data Engineering (2011)
[14] Reynaud, C., Safar, B.: Exploiting WordNet as
    Background Knowledge. Proc. Intern. ISWC’07
    Ontology Matching (OM-07) Workshop
[15] Sabou, M., d’Aquin, M., Motta, E.: Using the
    semantic web as background knowledge for ontology
    mapping. Proc. 1st Intern. Workshop on Ontology
    Matching (2006).
[16] Shvaiko, P., Euzenat, J.: A Survey of Schema-based
    Matching Approaches. J. Data Semantics IV (2005),
    pp. 146–171
[17] Spiliopoulos, V., Vouros, G., Karkaletsis, V.: On the
    discovery of subsumption relations for the alignment of
    ontologies. Web Semantics: Science, Services and
    Agents on the World Wide Web 8 (2010), pp. 69–88
[18] Altova MapForce - Graphical Data Mapping,
    Conversion, and Integration Tool.
    http://www.altova.com/mapforce.html
[19] Microsoft BizTalk Server.
    http://www.microsoft.com/biztalk
[20] XML Editor, XML Tools, and XQuery - Stylus
    Studio. http://www.stylusstudio.com/
[21] Java API for WordNet Searching (JAWS),
    http://lyle.smu.edu/~tspell/jaws/index.html
[22] WordNet - A lexical database for English,
    http://wordnet.princeton.edu/wordnet/

</pre>