<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A New Paradigm for Alignment Extraction</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Research Group Data and Web Science University of Mannheim</institution>
          ,
          <addr-line>68163 Mannheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontology matching techniques that are based on the analysis of names usually create first a set of matching hypotheses annotated with similarity weights followed by the extraction or selection of a set of correspondences. We propose to model this last step as an optimization problem. Our proposal differs fundamentally from other approaches since both logical and linguistic entities appear as first class citizens in the optimization problem. The extraction step will not only result in a set of correspondences but will also entail assumptions related to the meaning of the tokens that appeared in the involved labels. We discuss examples that illustrate the benefits of our approach and present a Markov Logic formalization. We conduct an experimental evaluation and present first results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Ontology Matching has become a vivid field of research over the last decade. Hundreds
of papers propose and discuss ontology matching techniques, introduce improvements,
or present complete matching systems. Especially the system papers illustrate a general
paradigm common to probably all systems using name-based alignment methods. This
paradigm is the understanding of ontology matching as a sequential process that starts
with analyzing different types of evidence, in most cases with a focus on the involved
labels, and generates as an intermediate result a set of weighted matching hypotheses.
From the intermediate result a subset of the generated hypotheses is chosen as final
output. The first phase is typically dominated by the computation, aggregation,
propagation, and any other method for refining similarity scores. The techniques applied in
the second phase range from thresholds to the selection of coherent subsets [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ] that
might be optimal with respect to an objective function. Most approaches model the
intermediate result as a set of correspondences annotated with confidence scores. These
confidence scores are aggregated values derived from an analysis of the tokens that
appear in the labels of the ontological entities. With the help of several examples we argue
that the extraction problem should be modeled differently such that both tokens and
logical entities (classes and properties) appear as first class citizens. Otherwise it will
not be possible to exploit that the acceptance or rejection of a correspondence follows
from the assumption that two tokens have (or do not have) the same meaning. However,
any reasonable extraction should be consistent with its underlying assumptions. This
can only be ensured if the assumptions themselves can be modeled explicitly.
      </p>
      <p>
        We presented a first sketch of this approach in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Now we extend and concretize
the approach including a first implementation. We present foundations in Section 2. In
Section 3 we discuss two scenarios where a classic approach makes a selection
decision in an non-reasonable way. In Section 4 we present our approach and explain how
to deal with the issues mentioned before. Experimental results of a first prototypical
implementation are presented in Section 5 before concluding in Section 6.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Foundations</title>
      <p>2.1</p>
      <sec id="sec-2-1">
        <title>Nomenclature</title>
        <p>We introduce some technical terms (Section 2.1), describe state of the art methods for
extracting an alignment (Section 2.2), and take a closer look at one them (Section 2.3).
Let O1 and O2 be ontologies that have to be matched. A correspondence is a quadruple
he1; e2; r; ci where e and e0 are entities defined in O1 and O2. r is a semantic relation
between e1 and e2. Within this paper the semantic relation will always be equivalence
and e1 and e2 will always be classes or (data or object) properties. The numerical value
c is referred to as confidence value. The higher the value, the higher is the probability
that r(e1; e2) holds. The confidence value is an optional element and will sometimes
be omitted. The outcome of a matching system is a set of correspondences between O1
and O2. Such a set is called an alignment A between O1 and O2.</p>
        <p>In the following we distinguish between linguistic entities (labels and tokens) and
ontological entities (classes and properties) using the following naming convention.
n#ClassOrProperty - Refers to a class or property in On (with n 2 1; 2).
n:Label - Refers to a label used in On as a class or property description.
n:Tokent - Refers to a token that appears as a part of a label in On.</p>
        <p>We will later, e.g., treat 1#AcceptedPaper and 1:AcceptedPaper as two
different entities. The first entity appears in logical axioms and the second might be a
description of the first entity. The label consists of the tokens 1:Acceptedt and 1:Papert.
We need three types of entities (logical entities, labels, tokens) because a logical entity
can be described by several labels and a label can be decomposed in several tokens.
2.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>Alignment Extraction</title>
        <p>The easiest way for selecting a final alignment A from a set of matching hypotheses H
is the application of a threshold. However, a threshold does not take into account any
dependencies between correspondences in H. Thus, it might happen that an entity 1#e
is mapped on 2#e’ and 2#e’’ even though 2#e’ and 2#e’’ are located in different
branches of the concept hierarchy.</p>
        <p>
          This can be solved easily. We first sort H by confidence scores. Starting with an
empty alignment A, we iterate over H and add each he1; e2; =; ci 2 H to A if A does
not yet contain a correspondence that links one of e1 or e2 to some other entity. This
ensures that A is finally a one-to-one alignment. Similar algorithms can be applied
to ensure that certain anti-pattern (e.g., Asmov [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]) are avoided when adding
correspondences to A. It is also possible to use reasoning to guarantee the coherence of the
generated alignment (e.g., Logmap [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]). Checking a set of patterns is then replaced by
calling a reasoning engine.
        </p>
        <p>
          Such an approach needs to decide upon the order in which correspondences are
iterated over because different orders can lead to different results. Global methods try to
overcome this problem. Similarity flooding [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], for example, is based on the
following assumption: The similarity between two entities linked by a correspondence in H
must depend on the similarity of their adjacent nodes for which an initial similarity is
specified in H. The algorithm does not select a subset of H as final outcome but
generates a refined similarity distribution over H. Other global methods explicitly define an
optimization problem in which a subset from H needs to be chosen that maximizes an
objective function. This is detailed in the following section.
2.3
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Global Optimization with Markov Logic</title>
        <p>
          In [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] Markov Logic has been proposed to solve the alignment extraction
problem. The authors have argued that the solution to a given matching problem can
be obtained by solving the maximum a-posteriori (MAP) problem of a ground Markov
logic network. In such a formalization the MAP state, which is the solution of an
optimization problem, corresponds to the most probable subset A of H. In the following
we explain the basic idea of the approach proposed in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Due to the lack of space we
omit a theoretical introduction to Markov Logic and refer the reader to [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] the authors have defined, due to the fact that Markov Logic is a log linear
probabilistic model, the objective function as the confidence total of A H.
Without any further constraints and given that all confidences are positive it follows that
A = H. However, some of the constraints that have been mentioned above can easily
be encoded as first-order formulae in Markov Logic. We can postulate that a pair of
correspondences violating the 1:1 constraint is not allowed in the final solution. This
can be expressed as follows.
        </p>
        <p>map(e1; e2) ^ map(e01; e02) ^ e1 = e01 ! e2 = e02</p>
        <p>Similarly, coherence constraints can be added to avoid certain patterns of incoherent
mappings. An example is the constraint that the classes e1 and e01 where e01 is a subclass
of e1 cannot be mapped on e2 and e02 where e2 and e02 are disjoint:</p>
        <p>sub(e1; e01) ^ dis(e2; e02) ! :(map(e1; e2) ^ map(e01; e02))</p>
        <p>
          Due to the lack of space, we cannot specify all constraints of the complete
formalization. Additional constraints are required to take into account that properties can
also be involved in logical inconsistencies (see [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]). Moreover, there are some soft
constraints that reward homomorphism introduced by the selected correspondences.
        </p>
        <p>Given such a formalization, a reasoning engine for Markov Logic can be used to
compute the MAP state which corresponds to the most probable consistent mapping. In
our terminology we call this mapping a global optimal solution. Note that the entities
that appear in such a formalization are logical entities (classes and properties) only,
while labels or token are completely ignored. The have only been used to compute
weights for the matching hypotheses, which are the weights attached to the map-atoms.
In Section 3.1 and 3.2 we analyze examples that illustrate problems of the classical
approaches described in the previous section. In Section 3.3 we discuss the possibility
to cope with these problems without introducing a new modeling style.
3.1</p>
      </sec>
      <sec id="sec-2-4">
        <title>Multiple Token Occurrences</title>
        <p>For most matching problems some of the tokens used in the labels will appear in more
than one label. This is in particular the case for compound labels that can be
decomposed into modifier and head noun. Figure 1 shows a typical example.</p>
        <p>O2
2#Fee</p>
        <p>Let us first discuss a simplified version of the example where we ignore the branch in
O2 rooted at the 2#Fee class. Note that a matching problem very similar to the
simplified example can be found in the OAEI conference dataset (testcase conference-ekaw).
For this small excerpt there are four correspondences (solid arrows) in the reference
alignment. Probably, most systems would generate h1#Document; 2#Document; =i
due to the usage of the same label. The same does not hold for the other three
correspondences. For two of them the labels can be decomposed into modifier and headnoun.
For all of these correspondences it is crucial to answer the question whether the words
1:Contribution and 2:Paper have the same meaning. How would a standard
approach deal with this example? In such an approach a similarity metric would be used
to compute a similarity for all relevant pairs of words. This would probably also result
in a (numerical) similarity for the pair h1:Contribution; 2:Paperi, for example
sim(1:Contribution; 2:Paper) = 0:3. This similarity would then be aggregated
into a score that might result into a set of weighted hypotheses H.</p>
        <p>c1 = h1#Document; 2#Document; =; 1:0i
c2 = h1#Contribution; 2#Paper; = 0:3i
c3 = h1#ReviewedContribution; 2#ReviewedPaper; = 0:65i
c4 = h1#AcceptedContribution; 2#AcceptedPaper; = 0:65i
At this stage we have lost the dependency between our final decision and the question
whether or not the words 1:Contribution and 2:Paper have the same meaning.
Without being aware of this dependency it might happen that c1, c3, c4 and not c2 are
selected. This would, obviously, be an inconsistent decision, because the selection of c3
and c4 should always result in the selection of c2.</p>
        <p>One might criticize that we are making (invalid) assumptions. Above we used the
average for aggregating confidences. One might also use, for example, the minimum.
This results in the same confidences for c2, c3 and c4. Nevertheless, the distance
between 1:Contribution = 2:Paper is taken into account not once but several
times. Thus, the decision related to c2 will not be affected by the possibility of
generating c3 and c4, while a human expert would take c3 and c4 into account.</p>
        <p>Let us now analyze the extended example where we have the additional branch
that deals with fees and (monetary) contributions. Now we have another (incorrect)
matching candidate.</p>
        <p>c5 = h1#AcceptedContribution; 2#AcceptedContribution; =; 1:0i
Obviously, c5 is in a 1:1 conflict with c4. A consistent 1:1 mapping might thus consist
of c1; c2; c3 and c4 or (exclusive!) c5. However, taking the involved tokens and their
possible meanings into account, we should not generate an alignment that contains c2
and c5 at the same time. Such an alignment will only be correct, if the tokens in O1 are
used in an inconsistent way.</p>
        <p>The classical approach cannot handle such cases in the appropriate way. As long as
the tokens themselves are not explicitly modeled as entities in the extraction phase,
unreasonable and inconsistent decisions, inconsistent with respect to assumptions related
to the use of words, are made.</p>
      </sec>
      <sec id="sec-2-5">
        <title>3.2 Ignoring Modifiers</title>
        <p>We illustrate another pattern by an example taken from the OAEI conference dataset,
namely the confof-ekaw testcase. The reference alignment for this testcase contains 20
correspondences, here we are interested in the following three correspondences.</p>
        <p>h1#Banquet; 2#ConferenceBanquet; =i
h1#Participant; 2#ConferenceParticipant; =i</p>
        <p>h1#Trip; 2#ConferenceTrip; =i
The developer of O2 was more verbose than the developer of O1. In O2 some of the
labels have been extended by adding the prefix modifier 2:Conference. This modifier
has been omitted in O1 because each of the participants, trips and banquets is implicitly
always associated to a conference. We are not interested in pros and cons of both styles.
Both exist and a matching system should be able to cope with them.</p>
        <p>Let us again think how we, as reasonable agents, would deal with this issue. After
studying the O1 ontology, we would come to the decision, that it might make sense
to ignore the token 1:Conferencet whenever it appears as modifier. Maybe we
would first try to match both ontologies without ignoring the modifier, then we would
match both ontologies while ignoring 1:Conferencet when it appears as modifier.
In both cases we ensure the coherency of the generated alignment. For our example
the outcome would be that the second approach allows to generate three additional
correspondences that do not introduce any logical conflicts. Thus, ignoring the modifier
1:Conference seems to be a good choice.</p>
        <p>Again, we can see that a first class citizen in such considerations are linguistic
entities. We make certain decisions about the role of tokens and their implications result in
the acceptance of correspondences, while logical constraints that deal with ontological
entities have also an impact on our interpretation of tokens.
3.3</p>
      </sec>
      <sec id="sec-2-6">
        <title>Work Around</title>
        <p>
          In [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] the authors have proposed a measure called extended Tversky similarity that
copes with the situation described in Section 3.2. Their idea is to weigh each token by its
information content. A token like 2:Conference that appears very often has a very
low weight. It follows that a relatively high confidence score is assigned to a
correspondence like h1#Banquet; 2#ConferenceBanquet; =i because 2:Conference
has only a limited discriminative power. Note that this approach is still based on the
principle to assign confidences to correspondences. Once this assignment has been
made, the tokens that have been involved are no longer taken into account.
        </p>
        <p>
          This technique has been implemented in the YAM++ matcher. This matcher achieved
very good results the OAEI 2012 campaign [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] (see also the results table in Section 5).
However, not the number of token-occurrences is important, but the maximal
number of additional coherent correspondences that would result from ignoring a
modifier. While these numbers are often correlated, this is not necessarily the case. Suppose
that we have an ontology that contains the class 1#PaperAuthor and the property
1#paperTitle, as well as some other labels that contain the token 1:papert. Let
the other ontology contain a class 2#Author (including authors of reviews) and a
property 2#title (to describe the title of a conference). In O1 we have a relatively
high number of 1:papert-token occurrences, however, the word 1:papert is in most
cases a feature that needs to be taken into account. This can be derived from the fact
that h1#PaperAuthor; 2#Author; =i and h1#paperTitle; 2#title; =i
cannot be added without introducing logical conflicts given a meaningful axiomatization
in O1 and O2. In our approach we will be able to take such cases into account.
4
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>We first present our approach and it formalization in Section 4.1 followed by an analysis
of its impact in Section 4.2 where we revisit the examples of the previous section.
4.1</p>
      <sec id="sec-3-1">
        <title>Formalization</title>
        <p>In the following we distinguish explicitly between entities from two different layers.
The first layer is the layer of labels and tokens; the entities that appear in the second
layer are classes and properties. In our approach we treat entities from both layers as first
class citizens of an optimization problem. Thus, we can define the objective function
of our optimization problem on top of token similarities (first layer) instead of using
confidence values attached to correspondences (second layer).</p>
        <p>Hidden predicates
map(e1; e2)
equivt(t1; t2)
equivl(l1; l2)
ignore(t)
Logical predicates
sub(e1; e2)
dis(e1; e2)
dom(e1; e2)
ran(e1; e2)
Linguistic predicates
pos1(l; t)
pos2(l; t)
pos3(l; t)
has1T oken(l)
has2T oken(l)
has3T oken(l)
hasLabel(e; l)
e1 is mapped on e2, i.e. he1; e2; =i 2 A
t1 and t2 have the same meaning
l1 and l2 have the same meaning
token t can be ignored if it appears as a modifier
class/property e1 is subsumed by class/property e2
e1 and e2 are disjoint classes
class e1 is the domain of property e2
class e1 is the range of property e2
label l has token t at first position
label l has token t at second position
label l has token t at third position
label l is composed of one token
label l is composed of two tokens
label l is composed of three tokens
entity e is described by label l</p>
        <p>We extend the approach described in Section 2.3, i.e., we use Markov Logic and
most of the constraints presented above. However, we also need a rich set of (new)
predicates listed in Table 1 to support our modeling style. The first four predicates in
the listing are hidden predicates. This means that we do not know in advance if the
ground atoms for these predicates are true or wrong. We attach a weight in the range
[ 1:0; 0:0] to the atoms instantiating the equivt predicate, if we have some evidence
that the respective tokens have a similar meaning. We explicitly negate the atom if there
is no such evidence. As a result we have a fragment as input that might look like this.</p>
        <p>equivt(1:Acceptedt; 2:Acceptedt) ; 0:0
equivt(1:Organizationt; 2:Organisationt) ; 0:084
equivt(1:Papert; 2:Contributiont) ; 0:9
:equivt(1:Acceptedt; 2:Rejectedt) unweighted
We do not add any (weighted or unweighted) groundings of the map, equivl, and
ignore predicates to the input. Our solution will finally consist of a set of atoms that are
groundings of the four hidden predicates. While we are mainly interested in the
mapatoms (each atom refers to a correspondence), the groundings of the other predicates
can be seen as additional explanations for the finally generated alignment. These atoms
inform us which tokens and labels are assumed to be equivalent and which tokens have
been ignored.</p>
        <p>The other predicates in the table are used to describe observations relevant for the
matching problem. We describe the relations between tokens and labels and the relation
between labels and logical entities.</p>
        <p>pos1(1:AcceptedPaper; 1:Acceptedt)
pos2(1:AcceptedPaper; 1:Papert)</p>
        <p>has2T oken(1:AcceptedPaper)
hasLabel(1#AcceptedPaper, 1:AcceptedPaper)
We postulate that a label is matched if and only if all of its tokens are matched. We
specify this explicitly for labels of different size.1 The 2-token case is shown here.</p>
        <p>has2T oken(l1) ^ has2T oken(l2) ^ pos1(l1; t11) ^ pos2(l1; t12) ^
pos1(l2; t21) ^ pos2(l2; t22) ! (equivl(l1; l2) $ equivt(t11; t21) ^ equivt(t12; t22))
Next, we have to establish the connection between label and logical entity. A logical
entity is matched if and only if at least one of its labels is matched.</p>
        <p>map(e1; e2) $ 9l1 9l2 (hasLabel(e1; l1) ^ hasLabel(e2; l2) ^ equivl(l1; l2))</p>
        <p>
          We follow the classic approach and translate (a subset of) the ontological axioms
to our formalism by using the logical predicates. We add several constraints as
restrictions of the map-predicate ensuring that the generated alignment is a 1:1
mapping and that this mapping is coherent taking the ontological axioms into account.
These constraints have already been explained in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and we can integrate them
easily in our approach as constraints on the second layer. In addition to the 1:1
constraint for the map predicate, we also add a 1:1 constraint for the equivt-predicate
on the token layer. This ensures that equiv(1:Papert; 2:Contributiont) and
equiv(1:Contributiont; 2:Contributiont) cannot be true at the same time.
        </p>
        <p>Computing the MAP state for the modeling described so far will always yield an
empty result, because the summands in the objective function are only the weights
attached to the equivt-atoms. All of them are 0, thus, the best objective will be
0, which is the objective of an empty mapping. We have to add a weighted rule that
rewards each correspondence, i.e., a rule that rewards each instantiation of the map
predicate. We have set the reward to 0.5.</p>
        <p>map(e1; e2); +0:5
Now each correspondence added to the solution increases the score of the objective by
0.5. At the same time each instantiation of the map predicate forces to instantiate at least
one equivl-atom, which again forces to instantiate the related equivt-atoms weighted
with values lower or equal to zero. Thus, we have defined a non trivial optimization
1 We have not included labels with more than three tokens in our first implementation. For larger
labels, we decided to match these labels directly if they are the same after normalization.
problem in which the idea of generating a comprehensive alignment conflicts with our
assumptions related to the meaning of words.</p>
        <p>Finally, we need to explain the role of the ignore predicate. We want to match a
1-token label to a 2-token label if and only if we are allowed to ignore the modifier of
the 2-token label and if the remaining token is equivalent to the token of the 1-token
label. This can be expressed as follows.</p>
        <p>has1T oken(l1) ^ has2T oken(l2) ^ pos1(l1; t11) ^ pos1(l2; t21) ^</p>
        <p>pos2(l2; t22) ! (equivl(l1; l2) $ equivt(t11; t22) ^ ignore(t21))
However, a modifier should not be ignored be default. For that reason we have to add
again a simple weighted rule.</p>
        <p>ignore(t); 0:95
Together, with the previous constraint this rule assigns a punishment to ignoring a token
that is used as modifier. Note that the weight is set to a value lower than -0.5. By setting
the value to -0.95 it will only pay off to ignore a token if it will result in at least two
additional correspondences (n 0:5 0:95 &gt; 0:0 for n 2).</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2 Impact</title>
        <p>For the small fragment depicted in Figure 1 (from Section 3.1), we present the weighted
input atoms (marked with an I) and the resulting output atoms (marked with an O) in
the following listing. We omit the atoms describing the relations between tokens, labels,
and logical entities, as well as those that model the logical axioms.</p>
        <p>I O
I O
I O
I
I O</p>
        <p>O
O
O
O
O
O
O
O
equivt(1:Documentt; 2:Documentt)
equivt(1:Reviewedt; 2:Reviewedt)
equivt(1:Acceptedt; 2:Acceptedt)
equivt(1:Contributiont; 2:Contributiont)
equivt(1:Contributiont; 2:Papert)
equivl(1:Document; 2:Document)
equivl(1:Contribution; 2:Paper)
equivl(1:ReviewedContribution; 2:ReviewedPaper)
equivl(1:AcceptedContribution; 2:AcceptedPaper)
c1 map(1#Document; 2#Document)
c2 map(1#Contribution; 2#Paper)
c3 map(1#ReviewedContribution; 2#ReviewedPaper)
c4 map(1#AcceptedContribution; 2#AcceptedPaper)
input weight 0.0
input weight 0.0
input weight 0.0
input weight 0.0
input weight -0.9
The generated solution consists of four equivt-atom, four equivl-atoms, and four
mapatoms. The four map-atoms are converted to the four correspondences of the output
alignment fc1; c2; c3; c4g. The objective of this solution is 1:1 = 4 0:5 + 0:0 + 0:0 +
0:0 + 0:0 0:9. The example shows that the low similarity between 1:Papert and
2:Contributiont atom is compensated by the possibility to generate four
correspondences. The same result would not have been achieved by attaching aggregated
weights directly to the map-atoms.</p>
        <p>Let us compare this solution to other possible and impossible solutions. Thus, let
c5 map(1#AcceptedContribution; 2#AcceptedContribution) and let
c6 map(1#Contribution; 2#AcceptedContribution). .</p>
        <p>objective for fc1; c2; c3; c4g = 4 0:5 0:9 = 1:1
objective for fc1; c5g = 2 0:5 = 1:0</p>
        <p>fc1; c2; c3; c4; c5g is invalid against 1:1 constraint on the token layer
objective for fc1g or fc5g = 1 0:5 = 0:5</p>
        <p>objective for fc1; c6g = 2 0:5 0:95 = 0:05
The alignment fc1; c5g is listed with a relatively high objective. Note that fc1; c5g
would be invalid, if we there would be a disjointness statement between 2#Fee and
2#Document due a constraint on the layer of ontological entities. We have also added
fc1; c6g to our listing. It illustrates the possibility to ignore a modifier. However, this
solution has a low objective and there are other solutions with a better objective.
5</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Preliminary Evaluation Results</title>
      <p>In the following we report about experiments with a prototypical implementation based
on the formalization presented above. The formalization is extended as follows.
– We added the constraint that if a property p is matched on a property p0, then the
domain (range) of p has to be matched to the domain of p0 or to a direct super or
subclass of the domain (range) of p0. In the latter case a small negative weight is
added to the objective.
– We derived alternative labels from the directly specified labels by ignoring
certain parts. For example, we added the label 1:writes to a property labeled with
1:writesPaper, if 1:Paper was the label of that properties domain.
– We derived alternative labels by adding 1:ConferenceMember as alternative
label given a label like 1:MemberOfConference.
– We added rules that allow to match two-token labels on three-token labels in case
that all tokens from the two-token label are matched, however, such a case was
punished with a negative weight.</p>
      <p>
        We use the following basic techniques for computing the input similarity scores. First
we normalize and split the labels into tokens. Given two tokens t1 and t2, we compute
the maximum of the values returned by the following five techniques. (1) We assign a
score of 0.0, if t1 = t2. (2) If t1 and t2 appear in the same synset in WordNet [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], we
assign a score of -0.01. (3) We compute the Levenshtein distance [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], multiply it with
-1 and assign any score higher than -0.2 to detect spelling variants. (4) If t1 or t2 is a
single letter token and t1 starts with t2 or vice versa, we assign a score of -0.3. (5) We
check if t1 and t2 have been modified at least two times by the same modifier. If this is
the case, we assign a (very low) score of -0.9.
      </p>
      <p>
        We have used the RockIt [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] Markov Logic engine to solve the optimization
problem. RockIt does not support all logical constructs of our formalization directly. Thus,
we had to rewrite existential quantification in terms of a comprehensive grounded
representation. We applied our approach to the OAEI conference track. The results are
depicted in Table 2.
      </p>
      <p>
        Pre
2014
* .80
AML [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] .80
LogMap [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] .76
XMAP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] .82
      </p>
      <p>F</p>
      <p>We have listed the top-3 participants of the OAEI 2012, 2013, and 2014 conference
track. The results are presented in term of precision (Pre), recall (Rec), and F-measure
(F) using the the ra2 reference alignment.2 For each year the results are ordered by
the F-measure that has been achieved. We inserted the results of our system, marked
as *, at the appropriate row. Note that the vast majority of participating systems, which
perform worse, is not depicted in the table. It can be seen that our approach is on the
first position in 2014 and on the second in 2013 and 2012. This is a very good result,
because we spent only a limited amount of work in the computation of the ingoing
similarity scores. On the contrary, we presented above a complete description in less
then 10 lines. This indicates that the quality of the generated alignments is mainly based
our new approach for modeling the task of selecting the final alignment from the given
similarity scores.</p>
      <p>The OAEI conference dataset can processed in less than 20 minutes on a standard
laptop. While slightly larger matching tasks are still feasible, significantly larger tasks
cannot be solved anymore. Scalability is indeed an open challenge for the proposed
approach. Currently we are working on a robust version of our approach in order to
participate in the OAEI 2015 campaign.3
6</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We presented a new approach for extracting a final alignment from an initial set of
matching hypotheses. We have argued by a detailed discussion of several examples that
our approach makes reasonable choices in situations where classical approaches are
doomed to fail. Moreover, our approach generates results in a transparent and
comprehensible manner. It can, for example, be proven that any other solution with a better
objective must be invalid. Moreover, the objective for any other possible solution can be
2 The ra2 reference alignment is not available for the public. We thank Ondrˇej Sˇva´b-Zamazal,
one of the track organizers, for conducting an evaluation run outside an OAEI campaign.
3 A first implementation is available at http://web.informatik.uni-mannheim.de/mamba/
computed to understand why the generated alignment was preferred over an alternative.
A preliminary evaluation has shown that our approach can compete with the top systems
participating in previous OAEI campaigns even though we put only limited effort in the
optimal choice and design of the similarity measures we used in our evaluation. While
the evaluation revealed that scalability is a crucial issue for the proposed approach, the
positive results observed so far as well as the elegant nature of the approach engages us
to improve the approach and to analyze it future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Jose´-Luis
          <string-name>
            <surname>Aguirre</surname>
          </string-name>
          , Kai Eckert, Je´roˆ me Euzenat, Alfio Ferrara, Willem Robert van Hage,
          <string-name>
            <surname>Laura Hollink</surname>
          </string-name>
          , Christian Meilicke, Andriy Nikolov, Dominique Ritze, Franc¸ois Scharffe, Pavel Shvaiko, Ondrej Sva´
          <fpage>b</fpage>
          -Zamazal,
          <article-title>Ca´ssia Trojahn dos Santos, Ernesto Jime´nez-</article-title>
          <string-name>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <article-title>Bernardo Cuenca Grau, and Benjamin Zapilko. Results of the ontology alignment evaluation initiative 2012</article-title>
          .
          <source>In Proceedings of the 7th International Workshop on Ontology Matching</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Sivan</given-names>
            <surname>Albagli</surname>
          </string-name>
          ,
          <article-title>Rachel Ben-Eliyahu-</article-title>
          <string-name>
            <surname>Zohary</surname>
            , and
            <given-names>Solomon E.</given-names>
          </string-name>
          <string-name>
            <surname>Shimony</surname>
          </string-name>
          .
          <article-title>Markov network based ontology matching</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          ,
          <volume>78</volume>
          (
          <issue>1</issue>
          ):
          <fpage>105</fpage>
          -
          <lpage>118</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Warith</given-names>
            <surname>Eddine</surname>
          </string-name>
          <article-title>Djeddi and Mohamed Tarek Khadir</article-title>
          . XMap++:
          <article-title>Results for oaei 2014</article-title>
          .
          <source>In Proceedings of the 9th International Workshop on Ontology Matching co-located with the 13th International Semantic Web Conference</source>
          , pages
          <fpage>163</fpage>
          -
          <lpage>169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Faria</surname>
          </string-name>
          , Catia Pesquita, Emanuel Santos, Matteo Palmonari, Isabel Cruz, and Francisco Couto.
          <article-title>The agreementmakerlight ontology matching system</article-title>
          .
          <source>In On the Move to Meaningful Internet Systems: OTM 2013 Conferences</source>
          , pages
          <fpage>527</fpage>
          -
          <lpage>541</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Yves</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Jean-Mary</surname>
            ,
            <given-names>E. Patrick</given-names>
          </string-name>
          <string-name>
            <surname>Shironoshita</surname>
          </string-name>
          , and
          <string-name>
            <surname>Mansur</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Kabuka</surname>
          </string-name>
          .
          <article-title>Ontology matching with semantic verification</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>235</fpage>
          -
          <lpage>251</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Ernesto</given-names>
            <surname>Jime</surname>
          </string-name>
          <article-title>´nez-Ruiz and Bernardo Cuenca Grau</article-title>
          . Logmap:
          <article-title>Logic-based and scalable ontology matching</article-title>
          .
          <source>In The Semantic Web-ISWC</source>
          <year>2011</year>
          , pages
          <fpage>273</fpage>
          -
          <lpage>288</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Vladimir</surname>
            <given-names>I</given-names>
          </string-name>
          <string-name>
            <surname>Levenshtein</surname>
          </string-name>
          .
          <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>
          .
          <source>In Soviet physics doklady</source>
          , volume
          <volume>10</volume>
          , pages
          <fpage>707</fpage>
          -
          <lpage>710</lpage>
          ,
          <year>1966</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Christian</given-names>
            <surname>Meilicke</surname>
          </string-name>
          .
          <article-title>Alignment incoherence in ontology matching</article-title>
          .
          <source>PhD thesis</source>
          , University Mannheim,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Christian</given-names>
            <surname>Meilicke</surname>
          </string-name>
          , Jan Noessner, and
          <string-name>
            <given-names>Heiner</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          .
          <article-title>Towards joint inference for complex ontology matching</article-title>
          .
          <source>In AAAI (Late-Breaking Developments)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sergey</surname>
            <given-names>Melnik</given-names>
          </string-name>
          , Hector Garcia-Molina, and
          <string-name>
            <given-names>Erhard</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>Similarity flooding: A versatile graph matching algorithm and its application to schema matching</article-title>
          .
          <source>In Data Engineering</source>
          ,
          <year>2002</year>
          . Proceedings. 18th International Conference on, pages
          <fpage>117</fpage>
          -
          <lpage>128</lpage>
          . IEEE,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>George</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>Wordnet: a lexical database for english</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. DuyHoa Ngo, Zohra Bellahsene, and
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Todorov</surname>
          </string-name>
          .
          <article-title>Extended tversky similarity for resolving terminological heterogeneities across ontologies</article-title>
          .
          <source>In On the Move to Meaningful Internet Systems: OTM 2013 Conferences</source>
          , pages
          <fpage>711</fpage>
          -
          <lpage>718</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mathias</surname>
            <given-names>Niepert</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Meilicke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Heiner</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          .
          <article-title>A probabilistic-logical framework for ontology matching</article-title>
          .
          <source>In AAAI</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jan</surname>
            <given-names>Noessner</given-names>
          </string-name>
          , Mathias Niepert, and Heiner Stuckenschmidt.
          <article-title>RockIt: Exploiting parallelism and symmetry for map inference in statistical relational models</article-title>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Richardson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>Markov logic networks</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>62</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>107</fpage>
          -
          <lpage>136</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>