<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linguistic Analysis for Complex Ontology Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dominique Ritze</string-name>
          <email>dritze@mail.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johanna Völker</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Meilicke</string-name>
          <email>christiang@informatik.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ondřej Šváb-Zamazal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Economics</institution>
          ,
          <addr-line>Prague</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Mannheim</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Current ontology matching techniques focus on detecting correspondences between atomic concepts and properties. However, it is both necessary and possible to detect correspondences between complex concept or property descriptions. In this paper, we demonstrate how complex matching can benefit from natural language processing techniques, and propose an enriched set of correspondence patterns leveraging linguistic matching conditions. After elaborating on the integration of methods for the linguistic analysis of textual labels with an existing framework for detecting complex correspondences, we present the results of an experimental evaluation on an OAEI dataset. The results of our experiments indicate a large increase in precision compared to the original approach, which was based on similarity measures and thresholds.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Ontology matching can be considered one of the key technologies for efficient
knowledge exchange and the successful realization of the Semantic Web. Bridging the gap
between different terminological representations is an indispensable requirement for a
large variety of tools and technologies including, for example, distributed reasoning,
instance migration, and query rewriting in distributed environments.</p>
      <p>
        In the past, ontology matching was commonly considered the task of detecting
similar or equivalent concepts and properties in two ontologies. However, this view on
ontology matching seems too narrow for many application scenarios, and new
requirements have motivated several extensions to the original task definition. Among the
challenges of the last Ontology Alignment Evaluation Initiative (OAEI) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], for example,
we find the task of instance matching as well as a track that aims at the generation of
correspondences expressing subsumption (instead of equivalence) between concepts.
      </p>
      <p>In our work, we propose to extend the classical notion of ontology matching in a
different direction: the generation of correspondences between complex concept and
property descriptions. We refer to these correspondences as complex correspondences
and call the process of generating them complex ontology matching. Our work is
motivated by the insight that equivalence or even subsumption correspondences between
atomic entities are often not applicable or not expressive enough to capture important
dependencies between ontological entities.</p>
      <p>
        This paper is based on our previous approach [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which we found to have several
disadvantages, including problems related to the precision of the patterns and
the need for a reference alignment as additional input. To address these
issues, we modified the original approach as follows:
– A partial reference alignment is no longer required as input. We generate the
alignment in a preprocessing step and use it as an anchor alignment later on.
– The complete logic required to express the conditions for generating
correspondences is now described declaratively by means of XML. The XML-based
specification of the matching conditions is interpreted and executed by our tool, while the
concrete implementation remains transparent to the user.
– We changed the output format of our system so that it adheres to the
Expressive Declarative Ontology Alignment Language (EDOAL) for complex
correspondences, which is supported by the Alignment API [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
– We extended our approach by various methods for the linguistic analysis of
concept or property labels. According to our experiments, the appropriate use of these
methods results in a significantly increased precision.
      </p>
      <p>
        The last point is the main contribution of this paper. In our previous
approach, many matching conditions included similarity thresholds. Thanks to
linguistic methods, we can now avoid the need for finding appropriate thresholds. In the
remainder of this paper, we show that the use of these methods significantly improves the
quality of complex matching. In particular, we find that the linguistic analysis enables
us to achieve a significantly higher precision with respect to the previously detected [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
pattern instantiations. Moreover, we present a new correspondence pattern leveraging
linguistic matching conditions and illustrate the advantages of the new approach by
means of concrete examples.
      </p>
      <p>Our paper is structured as follows. In Section 2 we describe our approach to
complex matching. We introduce the terminology we adopt, and sketch the core elements of
our algorithm. Section 3 discusses the related work, whereas in Section 4, we describe
the linguistic methods that we apply to detect non-trivial semantic relations between
concepts and properties. The pattern-specific matching conditions, which constitute the
heart of our approach, are presented in Section 5. In Section 6, we report on our
evaluation experiments, before concluding in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Approach</title>
      <p>
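        In this section, we introduce the basics of our approach and explain the terminology used in the
subsequent sections. First, we adopt and slightly simplify the generic terminology
defined in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Accordingly, we understand an alignment between two ontologies <inline-formula><tex-math>O_1</tex-math></inline-formula> and <inline-formula><tex-math>O_2</tex-math></inline-formula>
as a set of correspondences. A correspondence is a triple <inline-formula><tex-math>\langle X, Y, R \rangle</tex-math></inline-formula>, where X is an
entity from <inline-formula><tex-math>O_1</tex-math></inline-formula>, Y is an entity from <inline-formula><tex-math>O_2</tex-math></inline-formula>, and R is a relation such as equivalence or
subsumption. Whenever we need to refer to the origin of a specific concept, we
write C#i to indicate that C belongs to <inline-formula><tex-math>O_i</tex-math></inline-formula>.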
      </p>
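      <p>The terminology can be made concrete with a small data structure. The Python sketch below is our own illustration, not part of the authors' tool: it models a correspondence as an immutable triple of a source entity, a target entity and a relation, and an alignment as a set of such triples. The names <monospace>Correspondence</monospace>, <monospace>Relation</monospace> and <monospace>entity</monospace> are hypothetical, and entities are encoded as plain strings in the C#i notation.</p>
      <preformat>
```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relations a correspondence may express (illustrative subset)."""
    EQUIVALENCE = "≡"
    SUBSUMED_BY = "⊑"   # X is subsumed by Y
    SUBSUMES = "⊒"      # X subsumes Y


@dataclass(frozen=True)
class Correspondence:
    """A correspondence (X, Y, R): X from O1, Y from O2, R a relation."""
    source: str          # entity of O1, written C#1
    target: str          # entity of O2, written C#2
    relation: Relation


def entity(name: str, ontology: int) -> str:
    """Encode the origin of an entity using the C#i notation."""
    return f"{name}#{ontology}"


# An alignment is a set of correspondences (frozen dataclasses are hashable).
alignment = {
    Correspondence(entity("Paper", 1), entity("Article", 2), Relation.EQUIVALENCE),
    Correspondence(entity("writes", 1), entity("writesPaper", 2), Relation.EQUIVALENCE),
}

for c in sorted(alignment, key=lambda c: c.source):
    print(c.source, c.relation.value, c.target)
```
      </preformat>
      <p>Running the sketch prints the two atomic correspondences, one per line, in the form “Paper#1 ≡ Article#2”.</p>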
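      <p>State-of-the-art ontology matching techniques are limited to detecting correspondences
such as <inline-formula><tex-math>\langle \mathit{Paper}, \mathit{Article}, \equiv \rangle</tex-math></inline-formula> or <inline-formula><tex-math>\langle \mathit{writes}, \mathit{writesPaper}, \equiv \rangle</tex-math></inline-formula>. In the following, we describe an
approach that allows us to detect correspondences where X and Y are complex concept or
property descriptions.</p>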
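      <p>Recall that the expressive power of description logics originates from the ability to build
complex concept and property descriptions from atomic ones, i.e., from concept and
property names. The following listing shows some of the ways to construct
complex descriptions in description logics, using <inline-formula><tex-math>\mathcal{SHOIN}</tex-math></inline-formula> as an example:
– <inline-formula><tex-math>\{o_1, \ldots, o_n\}</tex-math></inline-formula> (one of)
– <inline-formula><tex-math>\neg C</tex-math></inline-formula> (negation)
– <inline-formula><tex-math>B \sqcap C</tex-math></inline-formula> (conjunction)
– <inline-formula><tex-math>B \sqcup C</tex-math></inline-formula> (disjunction)
– <inline-formula><tex-math>\exists P.C</tex-math></inline-formula> (exists restriction)
– <inline-formula><tex-math>\forall P.C</tex-math></inline-formula> (value restriction)
– <inline-formula><tex-math>\geq n P</tex-math></inline-formula> (at least restriction)
– <inline-formula><tex-math>\leq n P</tex-math></inline-formula> (at most restriction)
– <inline-formula><tex-math>P^{-1}</tex-math></inline-formula> (inverse property)</p>
      <sec id="sec-2-6">
        <title>Complex correspondences and correspondence patterns</title>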
        <p>In this listing, C refers to an arbitrary concept description and P refers to a property
name. We call a correspondence <inline-formula><tex-math>\langle X, Y, R \rangle</tex-math></inline-formula> in which X or Y is built according to one
of these rules a complex correspondence, and an alignment that contains at least one complex
correspondence a complex alignment. Note also that several of these rules can
be applied in combination, as is usual in standard description logics.</p>
        <p>In the following, we discuss matching conditions and correspondence
patterns. A correspondence pattern describes a special type of complex
correspondence. Suppose, for example, that our algorithm detects the correspondence
<inline-formula><tex-math>\langle \exists \mathit{earlyRegistered}.\{\mathit{true}\}, \mathit{EarlyRegisteredParticipant}, \equiv \rangle</tex-math></inline-formula>. This correspondence is a
concrete instantiation of the general correspondence pattern <inline-formula><tex-math>\langle \exists P.\{\mathit{true}\}, C, \equiv \rangle</tex-math></inline-formula>, where
P and C denote variables. A correspondence pattern can coincide with one of the
construction rules listed above, but it can also be composed of several rules and might
contain constants, as shown in the example.</p>
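        <p>One way to think of a correspondence pattern is as a term with variables that is matched against concrete correspondences. The sketch below assumes a hypothetical encoding of descriptions as nested Python tuples (<monospace>("some", P, C)</monospace> for an exists restriction, <monospace>("oneof", ...)</monospace> for a nominal) and treats strings starting with <monospace>?</monospace> as variables. It is a plain syntactic matcher written for illustration, not the authors' implementation.</p>
        <preformat>
```python
from typing import Optional

# Illustrative encoding (our assumption): complex descriptions are nested tuples,
#   ("some", P, C)      an exists restriction, i.e. EXISTS P.C
#   ("oneof", v1, ...)  a nominal {v1, ...}
# Atomic names are strings; pattern variables are strings starting with "?".


def match(pattern, term, bindings: Optional[dict] = None) -> Optional[dict]:
    """Return the variable bindings if `term` instantiates `pattern`, else None."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings and bindings[pattern] != term:
            return None                      # variable already bound differently
        bindings[pattern] = term
        return bindings
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None   # constants must coincide


# Pattern (EXISTS ?P.{true}, ?C, equivalence) and one concrete correspondence.
pattern = (("some", "?P", ("oneof", "true")), "?C", "equiv")
corr = (("some", "earlyRegistered", ("oneof", "true")), "EarlyRegisteredParticipant", "equiv")

print(match(pattern, corr))
# {'?P': 'earlyRegistered', '?C': 'EarlyRegisteredParticipant'}
```
        </preformat>
        <p>The constant <monospace>"true"</monospace> must coincide exactly, so a correspondence built on <monospace>{false}</monospace> would not instantiate this pattern.</p>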
        <p>Figure 1: Overview of the approach. A non-complex alignment between <inline-formula><tex-math>O_1</tex-math></inline-formula> and <inline-formula><tex-math>O_2</tex-math></inline-formula> serves as an anchor; pattern-specific matching conditions (e.g., “x is subclass of y OR ...”, “head of z is hyponym of y”, “range of p is ...”) are evaluated to generate complex correspondences with instantiations of different correspondence patterns.</p>
        <sec id="sec-2-6-6">
          <title>The matching process</title>
          <p>In our approach, we define a set of matching conditions for each correspondence pattern.
If these conditions are fulfilled, we generate a concrete instantiation of the pattern
and output a complex correspondence. The overall process is depicted in
Figure 1. First, we use state-of-the-art methods to generate a non-complex alignment.
This alignment serves as an anchor that allows us to check whether certain relations hold
between entities of <inline-formula><tex-math>O_1</tex-math></inline-formula> and <inline-formula><tex-math>O_2</tex-math></inline-formula>, such as “Is concept C in <inline-formula><tex-math>O_1</tex-math></inline-formula> a subconcept of concept
D in <inline-formula><tex-math>O_2</tex-math></inline-formula>?”. The matching conditions used to detect a certain pattern comprise structural
conditions as well as linguistic conditions, which rely on the methods presented in Section 4.</p>
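          <p>The overall loop can be sketched as follows. This is a simplified illustration under our own assumptions: matching conditions are written as Python predicates (the tool itself declares them in XML), a hand-written subclass closure over toy data stands in for the reasoner, and anchor correspondences are treated as equivalences that can be followed in both directions. The domain and range assigned to hasDecision#2 are hypothetical.</p>
          <preformat>
```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Context:
    """Background knowledge available to the matching conditions (toy version)."""
    subclass_of: set     # asserted (sub, super) pairs from both ontologies
    anchor: set          # non-complex equivalences (e#1, e#2) from the anchor alignment

    def supers(self, cls: str) -> set:
        """Reflexive-transitive superclasses; anchor equivalences followed both ways."""
        edges = self.subclass_of | self.anchor | {(b, a) for a, b in self.anchor}
        seen, todo = {cls}, [cls]
        while todo:
            current = todo.pop()
            for sub, sup in edges:
                if sub == current and sup not in seen:
                    seen.add(sup)
                    todo.append(sup)
        return seen

    def is_subclass(self, sub: str, sup: str) -> bool:
        return sup in self.supers(sub)


@dataclass
class Pattern:
    """A correspondence pattern together with its matching conditions."""
    name: str
    conditions: list = field(default_factory=list)

    def instantiate(self, candidates: Iterable[dict], ctx: Context):
        """Yield an instantiation for every candidate that fulfils all conditions."""
        for cand in candidates:
            if all(cond(cand, ctx) for cond in self.conditions):
                yield self.name, cand


# Hypothetical toy data; in the real system a reasoner answers these queries.
ctx = Context(subclass_of={("AcceptedPaper#1", "Paper#1"), ("Acceptance#2", "Decision#2")},
              anchor={("Paper#1", "Paper#2")})
DOMAIN = {"hasDecision#2": "Paper#2"}
RANGE = {"hasDecision#2": "Decision#2"}

structural_cat = Pattern("CAT (structural conditions only)", [
    lambda c, x: x.is_subclass(c["B"], RANGE[c["R"]]),
    lambda c, x: x.is_subclass(c["A"], DOMAIN[c["R"]]),
])
candidates = [{"A": "AcceptedPaper#1", "R": "hasDecision#2", "B": "Acceptance#2"},
              {"A": "AcceptedPaper#1", "R": "hasDecision#2", "B": "Paper#2"}]
for name, inst in structural_cat.instantiate(candidates, ctx):
    print(name, inst)
```
          </preformat>
          <p>Only the first candidate survives, because Paper#2 is not a subclass of Decision#2 in the toy hierarchy; the anchor correspondence is what lets AcceptedPaper#1 reach Paper#2.</p>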
          <p>The following section reviews related work in the field of complex ontology
matching as well as recent approaches that leverage linguistic tools and resources in ontology
matching.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Related Work</title>
      <p>
        Considering the state-of-the-art in complex ontology matching, we find recent
approaches to be distinguished by three key dimensions: the design, the representation and
discovery of complex correspondences. In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], complex correspondences are mainly
considered in terms of design and representational aspects. The author proposes
alignment patterns<sup>1</sup> as a solution for recurring mismatches that arise during the alignment of
two ontologies.<sup>2</sup> According to Scharffe [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], complex matching is a task that has to
be performed by a human user (e.g., a domain expert), who can be supported by
templates for capturing complex correspondences. However, similar patterns can also be
exploited by automated matching approaches, as demonstrated in this paper. The
alignment patterns from [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] are expressed in terms of EDOAL<sup>3</sup> [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], an extension of the
alignment format proposed by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It covers concept and property descriptions, concept
restrictions, property value transformations, comparators for restriction over entities,
and variables for representing ontology entities in patterns. In this paper, we adhere to
this expressive language for capturing our correspondence patterns.
      </p>
      <p>
        Šváb-Zamazal et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] consider complex ontology matching as a use case for
ontology transformation. An ontology is transformed in such a way that it can be more
easily matched with other ontologies. Each transformation is performed by means of a
transformation pattern containing several source and target ontology patterns as well as
an appropriate pattern transformation, which captures the relationships between them.
Each source ontology pattern specifies detection conditions such as structural and
naming conditions. The authors argue that successful non-complex matching applied to the
transformed ontology can be used for finding complex correspondences by tracking
changes back to the original ontology. However, this approach has not yet been evaluated experimentally.
      </p>
      <p>
        Further work related to the discovery of complex correspondences relies on
machine learning techniques such as Inductive Logic Programming, for example [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This
type of approach takes correspondences with more than two atomic terms into account,
but requires the ontologies to include matchable instances – a prerequisite that is not
fulfilled in many application scenarios. The approach proposed in this paper does not
require the existence of instances. Moreover, we can find related work in the field of
database schema matching. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] the authors describe complex matching as the task of
finding corresponding composite attributes (e.g., a name is equivalent to the
concatenation of a first name and a last name). There are several systems dealing with this kind
of database schema matching (e.g., [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]).
      </p>
      <p>
1 In the ODP taxonomy of patterns, the notion of an alignment pattern is used instead of the
notion of a correspondence pattern. In particular, correspondence patterns are considered
more general, having alignment patterns and reengineering patterns as subcategories. However,
we stick to the terminology introduced in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where an alignment is a set of correspondences.
2 These patterns are now being included in OntologyDesignPatterns.org (ODP).
3 http://alignapi.gforge.inria.fr/edoal.html
      </p>
      <p>
        According to Euzenat and Shvaiko [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] linguistic approaches to ontology matching
can be divided into language-based methods and methods which are based on
linguistic resources, whereas the more general class of terminological approaches also
includes string-based methods. The latter type of approach, i.e., similarity measures on
the lexical layer of ontologies, is part of almost every state-of-the-art matcher. There
is also a large body of work acknowledging the benefits of linguistic resources such
as WordNet when it comes to detecting lexical-semantic relations between concept or
property labels (see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for an overview). In addition, the low coverage of WordNet in
certain application domains has motivated the development of methods which leverage
more implicit evidence for those relationships [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and of methods based on
distributional similarities which can be computed from textual information associated with
ontology entities [
        <xref ref-type="bibr" rid="ref10 ref8">10, 8</xref>
        ]. Only very few matchers, however, make use of natural
language processing techniques that go beyond tokenization and lemmatization [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. This
paper highlights the potential of linguistic, and in particular language-based,
methods for ontology alignment.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4 Linguistic Analysis</title>
      <p>In order to facilitate the integration of state-of-the-art tools and resources for natural
language processing into the matcher, we developed LiLA<sup>4</sup> (Linguistic Label Analysis),
a Java-based framework for the linguistic analysis of class and property labels, which
provides a single uniform interface to the following open-source tools:
– JWNL (version 1.4.1)<sup>5</sup> is a programming interface for accessing the WordNet dictionary
(version 3.0)<sup>6</sup>, which contains information about more than 200,000 English words
and their lexical semantic relationships.
– OpenNLP (version 1.3.0)<sup>7</sup> is a framework for linguistic analysis including, for
instance, components for determining the lexical categories of words (e.g., adjective).
– MorphAdorner (version 1.0)<sup>8</sup> is a text processing framework which, amongst other
components, provides means for morphological analysis and generation, i.e.,
inflection of words.
– LexParser (version 1.6.3)<sup>9</sup>, also known as the Stanford Parser, is a syntactic parser
which can be used to determine the grammatical structure of phrases (e.g., noun
phrases such as “accepted paper”) or sentences.</p>
      <p>4 http://code.google.com/p/lila-project/
5 http://sourceforge.net/projects/jwordnet/
6 http://wordnet.princeton.edu
7 http://opennlp.sourceforge.net
8 http://morphadorner.northwestern.edu
9 http://nlp.stanford.edu/software/lex-parser.shtml</p>
      <p>In addition, LiLA features a simple word sense disambiguation component and a
spell checker. The remainder of this section illustrates the core functionalities of LiLA
by means of the following noun phrase, which serves as a running example.</p>
      <p>paper written by clever students</p>
      <p>Part-of-Speech Tagging. Each word in natural language belongs to a syntactic
category (or part-of-speech), which defines its basic syntactic behavior. Accordingly, a
part-of-speech tagger is a linguistic processing component for assigning appropriate
syntactic categories to a given set of words. While in principle each POS tagger may
use its own set of category labels (tags), tag sets such as the Penn Treebank Tag Set<sup>10</sup>
for English are widely used, and certain naming conventions have emerged as
de facto standards. In the example below, NN and NNS denote common nouns (singular or plural, respectively),
IN stands for a preposition, JJ indicates an adjective, and VBN is the tag for a past
participle.</p>
      <p>paper [NN] written [VBN] by [IN] clever [JJ] students [NNS]</p>
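      <p>LiLA obtains these tags from OpenNLP. The sketch below does not implement a tagger; it assumes tagger output in the bracketed notation shown above and illustrates how the tags can be consumed downstream, for instance to select the nouns of a label. The glosses in <monospace>COARSE</monospace> are our own.</p>
      <preformat>
```python
import re

# Coarse readings of the Penn Treebank tags used in the example (our own glosses).
COARSE = {"NN": "noun, singular", "NNS": "noun, plural", "IN": "preposition",
          "JJ": "adjective", "VBN": "verb, past participle"}


def parse_tagged(text: str) -> list:
    """Split 'word [TAG] word [TAG] ...' into (word, tag) pairs."""
    return re.findall(r"(\S+)\s+\[([A-Z$]+)\]", text)


tagged = parse_tagged("paper [NN] written [VBN] by [IN] clever [JJ] students [NNS]")
for word, tag in tagged:
    print(f"{word:10}{tag:5}{COARSE.get(tag, 'other')}")

# All noun tags share the prefix NN, which makes simple filters easy to write.
nouns = [word for word, tag in tagged if tag.startswith("NN")]
print(nouns)  # ['paper', 'students']
```
      </preformat>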
      <p>Morphological Analysis. The field of morphology is concerned with the internal
structure of words, more precisely the morphological rules for inflection and
word formation that enable humans to build a rich vocabulary from a basic inventory of
morphemes – the smallest units in natural language that carry meaning. Each word consists
of one or more morphemes. Words that are built from more than one morpheme can
be split into a stem and an affix, i.e., a morph attached to a stem like “student”+“s”, for
example. In this case, the plural “s” is an inflectional morpheme, which alters the base
form (also called lexeme) without changing its syntactic category. A component which
reduces each word to its lemma (i.e., the canonical form of a lexeme which is typically
included in the lexicon of a language) is called a lemmatizer.</p>
      <p>LiLA relies upon the MorphAdorner framework for performing both lemmatization
and morphological synthesis, i.e., the generation of specific word forms (e.g.,
“students”) from lexemes (e.g., “student”). This also works well for irregular verbs such as
“write” for which we are able to generate, for instance, the past participle (“written”)
by means of conjugation. This way, LiLA can convert between the singular and plural of
the same noun (declension), as well as between active and passive voice or different
tenses of a given verb. Derivations such as nominalizations of verbs
(e.g., “acceptance” from “accept”) are obtained via JWNL.</p>
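      <p>The following sketch illustrates lemmatization, morphological synthesis and nominalization on the running example. It does not use the MorphAdorner or JWNL APIs: a tiny hand-made lexicon (<monospace>IRREGULAR_PARTICIPLES</monospace>, <monospace>NOMINALIZATIONS</monospace>) and naive suffix rules stand in for those resources.</p>
      <preformat>
```python
from typing import Optional

# Toy lexicon standing in for MorphAdorner (inflection, lemmatization) and for the
# WordNet derivational links accessed via JWNL (nominalization). Illustrative only.
IRREGULAR_PARTICIPLES = {"write": "written"}
NOMINALIZATIONS = {"accept": "acceptance", "register": "registration"}


def lemmatize(word: str) -> str:
    """Reduce a plural noun or past participle to its lemma (naive rules)."""
    for lemma, participle in IRREGULAR_PARTICIPLES.items():
        if word == participle:
            return lemma
    if word.endswith("ed"):
        return word[:-2]              # accepted -> accept, registered -> register
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]              # students -> student
    return word


def past_participle(verb: str) -> str:
    """Morphological synthesis: generate the past participle of a verb."""
    return IRREGULAR_PARTICIPLES.get(verb, verb + "ed")


def plural(noun: str) -> str:
    """Naive declension: singular -> plural."""
    return noun + "s"


def nominalize(verb: str) -> Optional[str]:
    """Look up a nominalization such as accept -> acceptance."""
    return NOMINALIZATIONS.get(verb)


print(lemmatize("students"), plural("student"))       # student students
print(past_participle("write"), lemmatize("written"))  # written write
print(nominalize(lemmatize("accepted")))               # acceptance
```
      </preformat>
      <p>The suffix rules are deliberately naive (they would, for example, reduce “created” to “creat”), which is why a dedicated morphological component is needed in practice.</p>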
      <p>Lexical Semantic Analysis. Lexical semantics is the branch of linguistics that
studies the meaning of words and their relationships. The popular lexical database
WordNet, which covers a wide range of such lexical semantic relations, is queried by LiLA
through the API of JWNL. Thus, given a word such as “clever” or “student”, LiLA
can get access to detailed information about the possible meanings of the word (see
homonymy), as well as about its synonyms, hyponyms, antonyms and otherwise related
senses (e.g., meronyms).</p>
      <p>Synonymy, at least true synonymy, is rarely found in natural language. However, there
are many so-called near-synonyms (or plesionyms), i.e., words that share a common
meaning in a given context. Hence, two words are considered synonyms (or
near-synonyms) if they can be exchanged for one another in a sentence without altering
its truth conditions (e.g., “student” and “scholar”).</p>
      <p>10 http://www.cis.upenn.edu/~treebank/</p>
      <p>Homonymy and polysemy are types of semantic ambiguity. Two words are
considered homonymous if they are spelled (homograph) and pronounced (homophone) in
the same way, while having distinct meanings (or senses). Homonyms with related
meanings are called (regular) polysemes (e.g., “paper” as a substance or a writing
sheet made thereof). If a query posed to the WordNet API returns multiple
senses for a given concept or property label, LiLA’s word sense disambiguation
component selects the most likely sense based on a vector-based representation of
the current lexical context.<sup>11</sup></p>
      <p>Hyponymy is a kind of subordination relating one lexical unit to another one with a
more general sense. The former is then called a hyponym, whereas the latter, i.e.,
the superordinate, represents the hypernym. Like meronymy, hyponymy is only
transitive within one and the same category (e.g., functional). A verb which is more
specific than another verb is sometimes called a troponym (e.g., “write” and
“create”).</p>
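      <p>The sketch below combines the two steps just described: it disambiguates a label against its ontology context and then tests hyponymy by following hypernym links transitively. The sense inventory is a hypothetical mini-WordNet, and scoring each sense by the cosine similarity between a bag-of-words vector of its gloss and a vector of the ontology's entity labels is our assumption; the text only states that LiLA uses a vector-based representation of the lexical context.</p>
      <preformat>
```python
import math
import re
from collections import Counter

# Hypothetical mini-WordNet: sense id -> (gloss, hypernym sense ids).
SENSES = {
    "paper/substance": ("a material made of cellulose pulp", []),
    "paper/article": ("a scholarly article describing research results", ["document"]),
    "document": ("writing that provides information", ["communication"]),
    "communication": ("something that is communicated", []),
}
WORD_SENSES = {"paper": ["paper/substance", "paper/article"]}


def bag(text: str) -> Counter:
    """Bag-of-words vector of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(word: str, context_labels: list) -> str:
    """Choose the sense whose gloss is most similar to the lexical context."""
    context = bag(" ".join(context_labels))
    return max(WORD_SENSES[word], key=lambda s: cosine(bag(SENSES[s][0]), context))


def is_hyponym(sense: str, hypernym: str) -> bool:
    """Follow hypernym links transitively (within one lexical category)."""
    todo, seen = [sense], set()
    while todo:
        current = todo.pop()
        if current == hypernym:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(SENSES[current][1])
    return False


# Context vector built from entity labels of the ontology (cf. footnote 11).
labels = ["Accepted Paper", "Research Article", "Author", "Review Results"]
sense = disambiguate("paper", labels)
print(sense, is_hyponym(sense, "communication"))   # paper/article True
```
      </preformat>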
      <p>Antonymy is a kind of oppositeness that mostly holds between adjectives, but also
some verbs and even nouns can be considered antonyms if they exhibit opposite
semantic traits. One can distinguish between different types of antonyms such as
gradable (“early” and “late”), complementary (“acceptable” and “unacceptable”)
and relational (“student” and “professor”) antonyms.</p>
      <p>Syntactic Parsing in computational linguistics typically refers to the analysis of
syntactic structures. Each parsing algorithm relies upon a certain grammar, i.e., a
formalism developed to describe the syntactically well-formed structures in a given
language. Essentially, two types of grammars, dependency grammars and phrase
structure grammars, have emerged as the most widespread means to analyze and generate
syntactically well-formed utterances. The phrase structure shown below (first part of the
listing) has been generated by the Stanford Parser.<sup>12</sup> NP and VP are phrasal categories
denoting a noun phrase or verb phrase, respectively.</p>
      <preformat>(S
  (NP (NN paper))
  (VP (VBN written)
    (PP (IN by)
      (NP (JJ clever) (NNS students)))))

nsubj(written-2, paper-1)
pobj(by-3, students-5)
amod(students-5, clever-4)
prep(written-2, by-3)</preformat>
      <p>Given such a syntactic analysis and an appropriate set of rules for the English
language, we can determine (e.g., by means of OpenNLP or the Stanford Parser) the head
of a phrase.<sup>13</sup> In this case the head, i.e., the word which determines the category of a
phrase and carries the most essential semantic information, is the noun “paper”. Note
that the widely used heuristic of considering the right-most word as the head of a phrase
(right-hand head rule) only works well for morphological units such as compounds (e.g.,
“student paper”); applied to the running example, it would wrongly select “students”.</p>
      <p>11 For the experiments reported in this paper, we initialized this vector by adding all of the entity
labels found in the respective ontology.
12 The phrasal category of the top-most node in the syntax tree should be NP (noun phrase) rather
than S, which denotes a sentence.
13 We omit the distinction between syntactic and semantic heads.</p>
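      <p>The difference between a phrase-level head rule and the right-hand head rule can be illustrated with a small sketch. The tree parser and the simplified NP head rule below are our own; they are not the head-finding rules used by OpenNLP or the Stanford Parser. The top node of the parse is relabelled NP, as suggested in footnote 12.</p>
      <preformat>
```python
import re


def parse(tree: str):
    """Parse a bracketed phrase structure into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]          # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                               # consume ")"
        return label, children

    return node()


def leaves(tree) -> list:
    """Words of a tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


def np_head(tree) -> str:
    """Simplified NP head rule: descend into the first NP child if there is one,
    otherwise return the right-most noun (NN*) child."""
    _, children = tree
    subtrees = [c for c in children if not isinstance(c, str)]
    for child in subtrees:
        if child[0] == "NP":
            return np_head(child)
    nouns = [c for c in subtrees if c[0].startswith("NN")]
    return nouns[-1][1][0] if nouns else leaves(tree)[-1]


# Parser output from the text, with the top node relabelled NP (cf. footnote 12).
tree = parse("(NP (NP (NN paper)) (VP (VBN written) (PP (IN by) "
             "(NP (JJ clever) (NNS students)))))")
print("right-hand head rule:", leaves(tree)[-1])                    # students
print("NP head rule:", np_head(tree))                                # paper
print("compound:", np_head(parse("(NP (NN student) (NN paper))")))  # paper
```
      </preformat>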
      <p>In addition, the Stanford Parser provides us with a list of syntactic dependencies
between the individual words of the phrase (second part of the listing above). For example, it identifies
“paper” as the subject of “written” and “clever” as an adjectival modifier of “students”.</p>
      <p>In the following, we explain how a linguistic analysis along the dimensions
outlined in this section can improve the results of a complex matching approach.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Matching Conditions</title>
      <p>In the following, we show how to use linguistic analysis combined with a set of
simple structural techniques to detect complex correspondences. In particular, we
specify four correspondence patterns and define a set of matching
conditions for each of them. If all of these conditions are fulfilled, we generate a correspondence as an
instance of the pattern.</p>
      <p>Class by Attribute Type (CAT). A correspondence <inline-formula><tex-math>A\#1 \equiv \exists R\#2.B\#2</tex-math></inline-formula> of the CAT
pattern is generated by our algorithm if the following conditions hold:
1. The label of B#2 is the nominalization of the modifier of the label of A#1.
2. The class B#2 is a subclass of the range of R#2.
3. One of the following two conditions holds:
(a) The class A#1 is a subclass of the domain of R#2 due to the anchor alignment.
(b) The label of A#1 is a hyponym of the label of the domain of R#2.</p>
      <p>Figure 2: Matching conditions for the CAT example discussed below: “accepted” is the modifier of Accepted_Paper (A), Acceptance (B) is its nominalization, and Acceptance is a subclass of Decision, the range of the property.</p>
      <sec id="sec-5-1">
        <title>Example of the CAT pattern</title>
        <p>A typical example is <inline-formula><tex-math>\mathit{AcceptedPaper}\#1 \equiv \exists \mathit{hasDecision}\#2.\mathit{Acceptance}\#2</tex-math></inline-formula>. In
Figure 2, we depict the three matching conditions relevant for this example. 1) The
linguistic analysis reveals that “Acceptance” is the nominalization of the active form of
“Accepted”, which is in turn the modifier of “AcceptedPaper”. This morphological
analysis shows that the first condition is fulfilled. 2) We use a reasoner to check whether
Acceptance#2 is a subclass of the range of hasDecision#2, and find that the second
condition is fulfilled, too. 3) In this concrete case,
the anchor alignment contains the correspondence <inline-formula><tex-math>\mathit{Paper}\#1 \equiv \mathit{Paper}\#2</tex-math></inline-formula>, which allows
us to conclude that the third condition is fulfilled. We formulated the third condition as a disjunction
because, on the one hand, the anchor alignment might not contain the
required correspondence while the linguistic analysis detects the lexical-semantic relation
of hyponymy. On the other hand, the linguistic analysis might fail while the information
encoded in the anchor alignment is sufficient. We defined similar disjunctions
for some of the other patterns.</p>
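        <p>The three CAT conditions can be checked on this example with the sketch below. Everything except the entity names is a hypothetical stand-in: the nominalization lexicon replaces LiLA, the subclass pairs and the anchor alignment replace the reasoner, the domain of hasDecision#2 (Paper#2) is assumed, and the modifier is taken to be every word of the label except the right-most one, which is adequate for compounds such as AcceptedPaper.</p>
        <preformat>
```python
import re

# Hypothetical toy knowledge for the running example. The domain and range of
# hasDecision#2 are assumptions; the real system queries a reasoner and LiLA.
NOMINALIZATION = {"accept": "acceptance"}
SUBCLASS = {("AcceptedPaper#1", "Paper#1"), ("Acceptance#2", "Decision#2")}
ANCHOR = {("Paper#1", "Paper#2")}
DOMAIN = {"hasDecision#2": "Paper#2"}
RANGE = {"hasDecision#2": "Decision#2"}
HYPONYMS = set()   # (hyponym label, hypernym label) pairs from WordNet; empty here


def words(entity: str) -> list:
    """Split a camel-case entity name, e.g. 'AcceptedPaper#1' -> ['accepted', 'paper']."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+", entity.split("#")[0])]


def supers(cls: str) -> set:
    """Superclasses of cls, crossing ontologies through anchor equivalences."""
    edges = SUBCLASS | ANCHOR | {(b, a) for a, b in ANCHOR}
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for sub, sup in edges:
            if sub == current and sup not in seen:
                seen.add(sup)
                todo.append(sup)
    return seen


def cat_holds(a: str, r: str, b: str) -> bool:
    """Check the three CAT conditions for the candidate A#1 = EXISTS R#2.B#2."""
    modifier = " ".join(words(a)[:-1])                                # "accepted"
    lemma = modifier[:-2] if modifier.endswith("ed") else modifier  # "accept"
    cond1 = NOMINALIZATION.get(lemma) == " ".join(words(b))
    cond2 = RANGE[r] in supers(b)
    label_a, label_dom = " ".join(words(a)), " ".join(words(DOMAIN[r]))
    cond3 = DOMAIN[r] in supers(a) or (label_a, label_dom) in HYPONYMS
    return cond1 and cond2 and cond3


print(cat_holds("AcceptedPaper#1", "hasDecision#2", "Acceptance#2"))  # True
print(cat_holds("AcceptedPaper#1", "hasDecision#2", "Decision#2"))    # False
```
        </preformat>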
        <p>In our previous approach, we computed, e.g., the edit distance between “Acceptance”
and “Accepted” to detect a relation between AcceptedPaper#1 and Acceptance#2. If
the resulting similarity exceeded a certain threshold, the counterpart of the first condition was fulfilled.
Similarity-based conditions are now replaced by conditions based on linguistic analysis.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Class by Inverse Attribute Type (CAT<sup>-1</sup>)</title>
      <p>A correspondence <inline-formula><tex-math>A\#1 \equiv \exists R\#2^{-1}.\top</tex-math></inline-formula> of the
CAT<sup>-1</sup> type is generated if the following conditions hold:
1. The label of A#1 is the nominalization of the active form of the label of R#2.
2. There exists a class B#2 which is a proper subclass of the range of R#2.
3. One of the following two conditions holds:
(a) A#1 is, due to the anchor alignment, a subclass of B#2.
(b) The label of A#1 is a hyponym of the label of B#2.</p>
      <p>This pattern and the conditions to detect it are similar to the CAT pattern and its
conditions. Due to lack of space, we omit a detailed description.</p>
      <p>Class by Attribute Value (CAV). Here, we restrict ourselves to detecting the Boolean
variant of the general CAV pattern, where the attribute values are true and false. In the following, let
adjm(X) refer to the adjective modifier of the phrase X, let advm(X)
be the adverbial modifier in X, and let vp(X) refer to a verb phrase contained in X. A
correspondence <inline-formula><tex-math>A\#1 \equiv \exists R\#2.\{\mathit{false}\}</tex-math></inline-formula> is generated by our algorithm if the following
conditions hold:
1. The range of the datatype property R#2 is Boolean.
2. One of the following two conditions holds:
(a) The class A#1 is a subclass of the domain of R#2 due to the anchor alignment.
(b) The label of A#1 is a hyponym of the label of the domain of R#2.
3. advm(label(R#2)) is the antonym of advm(adjm(label(A#1))).
4. The head of label(R#2) is the nominalization of vp(adjm(label(A#1))).</p>
      <p>Figure: Matching conditions for the CAV example in which Late-Registered_Participant corresponds to <inline-formula><tex-math>\exists \mathit{earlyRegistration}.\{\mathit{false}\}</tex-math></inline-formula>: the modifiers “late” and “early” are antonyms, the head “registration” is the nominalization of the verb phrase “registered”, and Late-Registered_Participant is a subclass (or hyponym) of Participant, the domain of earlyRegistration, whose range is {true, false}.</p>
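      <p>The following sketch checks conditions 1, 2(a), 3 and 4 for the CAV example with the labels Late-Registered_Participant#1 and earlyRegistration#2. The antonym and nominalization tables, the property signature and the anchor correspondence are hypothetical stand-ins for WordNet and the reasoner, and the label analysis (splitting at “_” and “-”, camel-case tokenization) is a simplification of the parsing described in Section 4.</p>
      <preformat>
```python
import re

# Hypothetical toy data for the CAV example (signatures and anchor are assumed).
ANTONYMS = {frozenset({"early", "late"})}
NOMINALIZATION = {"register": "registration"}
RANGE = {"earlyRegistration#2": "xsd:boolean"}
DOMAIN = {"earlyRegistration#2": "Participant#2"}
SUBCLASS = {("Late-Registered_Participant#1", "Participant#1")}
ANCHOR = {("Participant#1", "Participant#2")}


def supers(cls: str) -> set:
    """Superclasses of cls, crossing ontologies through anchor equivalences."""
    edges = SUBCLASS | ANCHOR | {(b, a) for a, b in ANCHOR}
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for sub, sup in edges:
            if sub == current and sup not in seen:
                seen.add(sup)
                todo.append(sup)
    return seen


def cav_false(a: str, r: str) -> bool:
    """Check CAV conditions 1, 2(a), 3 and 4 for the candidate A#1 = EXISTS R#2.{false}."""
    name_a, name_r = a.split("#")[0], r.split("#")[0]
    adjm = name_a.split("_")[0].lower()             # adjective modifier: "late-registered"
    advm_a, _, vp = adjm.partition("-")             # "late", "registered"
    words_r = [w.lower() for w in re.findall(r"[A-Z]?[a-z]+", name_r)]
    advm_r, head_r = words_r[0], words_r[-1]        # "early", "registration"
    cond1 = RANGE.get(r) == "xsd:boolean"
    cond2 = DOMAIN.get(r) in supers(a)              # alternative (b) omitted here
    cond3 = frozenset({advm_r, advm_a}) in ANTONYMS
    lemma = vp[:-2] if vp.endswith("ed") else vp    # registered -> register
    cond4 = NOMINALIZATION.get(lemma) == head_r
    return cond1 and cond2 and cond3 and cond4


for prop in ["earlyRegistration#2", "lateRegistration#2"]:
    print(prop, cav_false("Late-Registered_Participant#1", prop))
```
      </preformat>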
      <p>Regarding this pattern, we use the linguistic analysis to detect antonyms. We
expect that modifiers which are antonyms will be used to describe a pair of disjoint
classes. Complex expressions such as <inline-formula><tex-math>\exists R\#2.\{\mathit{true}\}</tex-math></inline-formula> and <inline-formula><tex-math>\exists R\#2.\{\mathit{false}\}</tex-math></inline-formula>, given that R#2 is a
functional property, also refer, for logical reasons, to a pair of disjoint classes.</p>
      <p>Inverse Property (IP). A correspondence <inline-formula><tex-math>R\#1^{-1} \equiv P\#2</tex-math></inline-formula> of type IP is generated if all of the
following conditions hold:
1. The verb phrase of the label of R#1 is the active voice of the verb phrase of the
label of P#2.
2. One of the following two conditions holds:
(a) The domain of R#1 is a subclass of the range of P#2.
(b) The label of the domain of R#1 is a hyponym of the label of the range of P#2.
3. One of the following two conditions holds:
(a) The range of R#1 is a subclass of the domain of P#2.
(b) The label of the range of R#1 is a hyponym of the label of the domain of P#2.</p>
      <p>The IP pattern is the simplest pattern with respect to its set of conditions. The first
condition is based on the observation that two properties are more likely to be inverse
if both contain the same verb phrase in different voices (active and passive).
The other two conditions ensure that there is a subsumption (possibly
equivalence) relation between the domain of each property and the range of the other.</p>
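      <p>A sketch of the IP check on the properties from the example discussed next. Following the explanation above, condition 1 is implemented as “same verb, different voice”, and the passive form is recognized by a trailing “by”. The domains, ranges, anchor alignment and inverse declaration are hypothetical toy data, and subsumption is reduced to identity or anchor equivalence.</p>
      <preformat>
```python
import re

# Hypothetical toy data: property signatures and the anchor alignment are assumed.
PARTICIPLE_TO_VERB = {"written": "write"}
DOMAIN = {"writtenBy#1": "Paper#1", "authorOf#1": "Author#1", "writePaper#2": "Author#2"}
RANGE = {"writtenBy#1": "Author#1", "authorOf#1": "Paper#1", "writePaper#2": "Paper#2"}
ANCHOR = {("Paper#1", "Paper#2"), ("Author#1", "Author#2")}
INVERSE_OF = {"writtenBy#1": "authorOf#1"}   # declared in ontology 1 (assumed)


def words(entity: str) -> list:
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+", entity.split("#")[0])]


def verb_and_voice(prop: str) -> tuple:
    """'writtenBy' -> ('write', 'passive'); 'writePaper' -> ('write', 'active')."""
    w = words(prop)
    if len(w) > 1 and w[1] == "by":
        return PARTICIPLE_TO_VERB.get(w[0], w[0]), "passive"
    return w[0], "active"


def equivalent_or_sub(c1: str, c2: str) -> bool:
    """Toy subsumption test: identical or linked by the anchor alignment."""
    return c1 == c2 or (c1, c2) in ANCHOR or (c2, c1) in ANCHOR


def ip_holds(r: str, p: str) -> bool:
    """Check the IP conditions for the candidate R#1^-1 = P#2."""
    (verb_r, voice_r), (verb_p, voice_p) = verb_and_voice(r), verb_and_voice(p)
    cond1 = verb_r == verb_p and voice_r != voice_p   # same verb, different voice
    cond2 = equivalent_or_sub(DOMAIN[r], RANGE[p])
    cond3 = equivalent_or_sub(RANGE[r], DOMAIN[p])
    return cond1 and cond2 and cond3


print(ip_holds("writtenBy#1", "writePaper#2"))  # True
print(ip_holds("authorOf#1", "writePaper#2"))   # False: no shared verb
if ip_holds("writtenBy#1", "writePaper#2"):
    # writtenBy#1^-1 = writePaper#2 and writtenBy#1^-1 = authorOf#1 yield:
    print(INVERSE_OF["writtenBy#1"], "≡", "writePaper#2")
```
      </preformat>
      <p>The last lines show how, once the IP correspondence has been found, the equivalence for authorOf#1 follows from the declared inverse.</p>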
      <p>Surprisingly, it is sometimes harder to detect a simple property equivalence or
subsumption than an instance of the IP pattern. Consider the following example from our experiments.
In one ontology, we have the property writtenBy#1 and its inverse
authorOf#1, while in the other ontology we have the property writePaper#2.
For these properties, there are two correct correspondences, namely <inline-formula><tex-math>\mathit{authorOf}\#1 \equiv \mathit{writePaper}\#2</tex-math></inline-formula>
and <inline-formula><tex-math>\mathit{writtenBy}\#1^{-1} \equiv \mathit{writePaper}\#2</tex-math></inline-formula>. While the first one is hard to
detect, the second one fulfills all of the conditions listed above and is thus detected by our
approach. Since authorOf#1 is the inverse of writtenBy#1, the first correspondence can then be derived from the second one.
6</p>
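      <p>The derivation of authorOf#1 ≡ write paper#2 amounts to rewriting a property expression with a declared inverse. The following Python sketch assumes that property expressions are encoded as (name, inverted) pairs and that the inverse declarations of ontology 1 are given as pairs of names; this encoding and the function derive are hypothetical.</p>

```python
# A property expression is a pair (name, inverted):
# ("writtenBy#1", True) stands for writtenBy#1⁻¹.
Expr = tuple[str, bool]


def show(e: Expr) -> str:
    return e[0] + ("⁻¹" if e[1] else "")


def derive(corrs: set[tuple[Expr, Expr]],
           inverse_pairs: set[tuple[str, str]]) -> set[tuple[Expr, Expr]]:
    """Add correspondences obtained by rewriting the left-hand side with declared inverses.

    If A and B are declared inverses of each other, then A⁻¹ ≡ B and A ≡ B⁻¹,
    so (A, inverted) may be replaced by (B, not inverted)."""
    partner: dict[str, str] = {}
    for a, b in inverse_pairs:
        partner[a], partner[b] = b, a
    derived = set(corrs)
    for (name, inverted), rhs in corrs:
        if name in partner:
            derived.add(((partner[name], not inverted), rhs))
    return derived


ip_correspondence = {(("writtenBy#1", True), ("write paper#2", False))}
for lhs, rhs in sorted(derive(ip_correspondence, {("authorOf#1", "writtenBy#1")})):
    print(show(lhs), "≡", show(rhs))
# authorOf#1 ≡ write paper#2
# writtenBy#1⁻¹ ≡ write paper#2
```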
    </sec>
    <sec id="sec-7">
      <title>Experiments</title>
      <p>
        For our experiments we used the dataset of the OAEI conference track [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This dataset
consists of several relatively expressive ontologies that describe the domain of
organizing conferences from different perspectives. We ran our experiments on the full set of
ontologies, except for LINKLINGS, COCUS, and CONFIOUS, which we could not process with the
Pellet reasoner and therefore excluded. Since we want to compare our approach to its
predecessor, we also present results restricted to the subset of 9 ontologies that we
used in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. To decide whether the structural conditions hold, we used the Pellet
reasoner [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. To check the linguistic conditions, we rely on LiLA. Structural
conditions that compare entities from different ontologies require an anchor alignment of
non-complex correspondences. We generated this alignment by computing a similarity
measure based on the Levenshtein distance and thresholding the results at a high value.
      </p>
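      <p>The anchor alignment step can be sketched in Python as follows. The text states only that the similarity is based on the Levenshtein distance and thresholded at a high value; lowercasing the labels, normalizing the distance by the length of the longer label, and the default threshold of 0.9 are assumptions made for illustration.</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer label, after lowercasing (assumed normalization)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def anchor_alignment(labels1: list[str], labels2: list[str],
                     threshold: float = 0.9) -> set[tuple[str, str]]:
    """All label pairs whose similarity reaches the (hypothetical) threshold."""
    return {(x, y) for x in labels1 for y in labels2 if similarity(x, y) >= threshold}


print(sorted(anchor_alignment(["Paper", "Author"], ["paper", "Autor", "Review"], threshold=0.8)))
# [('Author', 'Autor'), ('Paper', 'paper')]
```

      <p>Comparing all label pairs is quadratic in the number of entities; a high threshold keeps the anchor small but precise, which is the kind of incomplete, partially noisy input the subsequent matching has to tolerate.</p>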
      <p>Table 1: columns Dataset (subset, subset, full set), Approach, Precision.</p>
      <p>In Table 1, the approach described in this paper is referred to as the linguistic approach,
and its predecessor as the similarity approach. For each of the four patterns, we show
the number of true and false positives as well as the precision of the approach. The use of
linguistic methods increased the precision of the overall approach substantially, from 45% to 94%,
while the number of correctly detected correspondences remained nearly stable. Note that
the similarity approach did not allow us to define matching conditions for the IP pattern. If we
include the IP pattern in our analysis, recall therefore increases significantly as well.</p>
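      <p>An overall precision such as the 45% and 94% reported above is obtained from the per-pattern counts of true and false positives. The Python sketch below pools the counts over all patterns (micro-averaging); that Table 1 aggregates the counts this way is an assumption, and the counts in the example are toy values, not those of Table 1.</p>

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); 0.0 by convention if nothing was generated."""
    return tp / (tp + fp) if tp + fp else 0.0


def overall_precision(counts: dict[str, tuple[int, int]]) -> float:
    """Micro-averaged precision: pool (TP, FP) counts over all patterns first."""
    tp = sum(t for t, _ in counts.values())
    fp = sum(f for _, f in counts.values())
    return precision(tp, fp)


toy_counts = {"pattern A": (9, 1), "pattern B": (1, 4)}  # toy values, not Table 1
print(round(overall_precision(toy_counts), 3))  # 0.667
```

      <p>Averaging the per-pattern precisions instead (macro-averaging) would give 0.55 for the same toy counts, so the choice matters when the patterns differ strongly in the number of generated correspondences.</p>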
      <p>
        With the similarity approach, we could additionally define conditions for a pattern
called property chain (not shown in Table 1). We omitted this pattern here because we found its
underlying rationale hard to justify and the chosen conditions partially overfitted certain
aspects of the dataset. Note that the values given for the similarity approach are based on the
optimal threshold: raising (lowering) the threshold results in a clear loss of recall (precision),
while the corresponding gain in precision (recall) is rather limited, as shown in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The linguistic methods differ in that they involve no threshold whose value
is crucial for the overall performance of the approach.
      </p>
      <p>Our previous approach has been criticized for requiring a correct and
complete input alignment. However, the new results indicate that such an input alignment is
not necessary: it is sufficient to generate an incomplete and partially incorrect alignment
in a prior step and to use it as an anchor in the subsequent matching process. This suggests
that the approach is robust against the noise introduced by an imperfect anchor alignment.</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>
        In this paper, we have described how to integrate linguistic techniques into a
pattern-based approach for detecting complex correspondences. In particular, we have
presented correspondence patterns and defined a set of matching conditions for each of them.
While in a previous approach [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], many of these conditions were based on
computing a simple string-based similarity value, we have argued here that it is more
appropriate to replace them with conditions that make use of a linguistic
analysis. Our experiments showed that the new approach yields a significantly
higher precision. The tool used to conduct the experiments is open source and available
online.<sup>15</sup> Due to its modular structure, matching conditions for new correspondence
patterns can easily be specified in a generic XML syntax.
      </p>
      <p>Acknowledgments. Johanna Völker is financed by a Margarete-von-Wrangell
scholarship of the European Social Fund (ESF) and the Ministry of Science, Research and
the Arts Baden-Württemberg. Ondřej Šváb-Zamazal is partly supported by grant no.
P202/10/1825 of the Grant Agency of the Czech Republic.</p>
      <p><sup>15</sup> http://code.google.com/p/generatingcomplexalignments/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Robin</given-names>
            <surname>Dhamankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yoonkyong</given-names>
            <surname>Lee</surname>
          </string-name>
          , Anhai Doan, Alon Halevy, and
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>iMAP: discovering complex semantic matches between database schemas</article-title>
          .
          <source>In Proceedings of the ACM SIGMOD International Conference on Management of Data</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>AnHai</given-names>
            <surname>Doan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alon Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          .
          <article-title>Semantic-integration research in the database community</article-title>
          .
          <source>AI Magazine</source>
          ,
          <volume>26</volume>
          :
          <fpage>83</fpage>
          -
          <lpage>94</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Jérôme Euzenat.
          <article-title>An API for ontology alignment</article-title>
          .
          <source>In Proceedings of the 3rd International Semantic Web Conference (ISWC)</source>
          , pages
          <fpage>698</fpage>
          -
          <lpage>712</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Jérôme Euzenat, Alfio Ferrara, Laura Hollink, Antoine Isaac, Cliff Joslyn, Véronique Malaisé,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Meilicke</surname>
          </string-name>
          , Andriy Nikolov, Juan Pane, Marta Sabou, François Scharffe, Pavel Shvaiko, Vassilis Spiliopoulos, Heiner Stuckenschmidt, Ondřej Šváb-Zamazal, Vojtěch Svátek, Cássia Trojahn dos Santos, George A. Vouros, and Shenghui Wang
          .
          <article-title>Results of the ontology alignment evaluation initiative 2009</article-title>
          .
          <source>In Proceedings of the 4th International Workshop on Ontology Matching (OM-2009)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Jérôme Euzenat, François Scharffe, and
          <string-name>
            <given-names>Antoine</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          .
          <article-title>Expressive alignment language and implementation</article-title>
          .
          <source>Deliverable 2.2.10</source>
          , Knowledge Web
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Jérôme Euzenat and
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          .
          <source>Ontology Matching</source>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Feiyu</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kurt</given-names>
            <surname>Sandkuhl</surname>
          </string-name>
          .
          <article-title>A survey of exploiting WordNet in ontology matching</article-title>
          . In Max Bramer, editor,
          <source>IFIP AI</source>
          , volume
          <volume>276</volume>
          <source>of IFIP</source>
          , pages
          <fpage>341</fpage>
          -
          <lpage>350</lpage>
          . Springer,
          <year>September 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Pirrò</surname>
          </string-name>
          and
          <string-name>
            <given-names>Domenico</given-names>
            <surname>Talia</surname>
          </string-name>
          .
          <article-title>LOM: a linguistic ontology matcher based on information retrieval</article-title>
          .
          <source>Journal on Information Science</source>
          ,
          <volume>34</volume>
          (
          <issue>6</issue>
          ):
          <fpage>845</fpage>
          -
          <lpage>860</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Han</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dejing</given-names>
            <surname>Dou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Paea</given-names>
            <surname>LePendu</surname>
          </string-name>
          .
          <article-title>Discovering Executable Semantic Mappings Between Ontologies</article-title>
          . In
          <source>On the Move to Meaningful Internet Systems 2007: CoopIS, DOA, ODBASE, GADA, and IS</source>
          , pages
          <fpage>832</fpage>
          -
          <lpage>849</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Yuzhong</given-names>
            <surname>Qu</surname>
          </string-name>
          , Wei Hu, and Gong Cheng.
          <article-title>Constructing virtual documents for ontology matching</article-title>
          .
          <source>In Proceedings of the 15th International Conference on World Wide Web (WWW)</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>31</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Dominique</given-names>
            <surname>Ritze</surname>
          </string-name>
          , Christian Meilicke, Ondřej Šváb-Zamazal, and
          <string-name>
            <given-names>Heiner</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          .
          <article-title>A pattern-based ontology matching approach for detecting complex correspondences</article-title>
          .
          <source>In Proceedings of the ISWC workshop on ontology matching</source>
          , Washington DC, USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>François</given-names>
            <surname>Scharffe</surname>
          </string-name>
          .
          <article-title>Correspondence Patterns Representation</article-title>
          .
          <source>PhD thesis</source>
          , University of Innsbruck,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Evren</given-names>
            <surname>Sirin</surname>
          </string-name>
          , Bijan Parsia, Bernardo Cuenca Grau, Aditya Kalyanpur, and
          <string-name>
            <given-names>Yarden</given-names>
            <surname>Katz</surname>
          </string-name>
          .
          <article-title>Pellet: a practical OWL-DL reasoner</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>5</volume>
          :
          <fpage>51</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Ondřej</given-names>
            <surname>Šváb-Zamazal</surname>
          </string-name>
          , Vojtěch Svátek, and Luigi Iannone.
          <article-title>Pattern-based ontology transformation service exploiting OPPL and OWL-API</article-title>
          . In
          <source>Knowledge Engineering and Knowledge Management by the Masses (EKAW-2010)</source>
          ,
          <year>2010</year>
          . Accepted.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. Willem Robert van Hage,
          <string-name>
            <given-names>Sophia</given-names>
            <surname>Katrenko</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Guus</given-names>
            <surname>Schreiber</surname>
          </string-name>
          .
          <article-title>A method to combine linguistic ontology-mapping techniques</article-title>
          . In Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen, editors,
          <source>International Semantic Web Conference (ISWC)</source>
          , volume
          <volume>3729</volume>
          <source>of LNCS</source>
          , pages
          <fpage>732</fpage>
          -
          <lpage>744</lpage>
          . Springer, November
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Ondřej Šváb, Vojtěch Svátek, Petr Berka, Dušan Rak, and Petr Tomášek.
          <article-title>OntoFarm: Towards an experimental collection of parallel ontologies</article-title>
          .
          <source>In Poster Track of the International Semantic Web Conference (ISWC)</source>
          , Galway, Ireland,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Pinar</given-names>
            <surname>Wennerberg</surname>
          </string-name>
          , Manuel Möller, and
          <string-name>
            <given-names>Sonja</given-names>
            <surname>Zillner</surname>
          </string-name>
          .
          <article-title>A linguistic approach to aligning representations of human anatomy and radiology</article-title>
          .
          <source>In Proceedings of the International Conference on Biomedical Ontologies (ICBO)</source>
          ,
          <year>July 2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>