<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Computing minimal mappings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fausto Giunchiglia</string-name>
          <email>fausto@disi.unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Maltese</string-name>
          <email>maltese@disi.unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aliaksandr Autayeu</string-name>
          <email>autayeu@disi.unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Ingegneria e Scienza dell'Informazione (DISI) - Università di Trento</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Given two classifications, or lightweight ontologies, we compute the minimal mapping, namely the subset of all possible correspondences, called mapping elements, between them such that i) all the others can be computed from them in time linear in the size of the input ontologies, and ii) none of them can be dropped without losing property i). In this paper we provide a formal definition of minimal mappings and define a time-efficient computation algorithm which minimizes the number of comparisons between the nodes of the two input ontologies. The experimental results show a substantial improvement both in computation time and in the number of mapping elements which need to be handled.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology matching</kwd>
        <kwd>lightweight ontologies</kwd>
        <kwd>minimal mappings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Given any two graph-like structures, e.g., database and XML schemas,
classifications, thesauri and ontologies, matching is usually identified as the problem of finding
those nodes in the two structures which semantically correspond to one another. Any
such pair of nodes, along with the semantic relationship holding between the two, is
what we informally call a mapping element. In the last few years a lot of work has
been done on this topic both in the digital libraries [
        <xref ref-type="bibr" rid="ref15 ref16 ref17 ref21">15, 16, 17, 21</xref>
        ] and the computer
science [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6 ref8 ref9">2, 3, 4, 5, 6, 8, 9</xref>
        ] communities. In this paper we concentrate on lightweight
ontologies (or formal classifications), as formally defined in [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ], and we focus on
the problem of finding minimal mappings, that is, the subset of all possible
correspondences, called mapping elements, such that i) all the others can be computed from
them in time linear in the size of the input graphs, and ii) none of them can be
dropped without losing property i). This must not be seen as a limitation: there are
plenty of schemas in the world which can be translated, with almost no loss of
information, into lightweight ontologies, for instance thesauri, library classifications, file
systems, email folder structures, web directories and business catalogues.
Lightweight ontologies are well defined and pervasive. The main advantage of
minimal mappings is that they are the minimal amount of information that needs to be
dealt with. Notice that this is a rather important feature as the number of possible
mapping elements can grow up to n*m with n and m being the size of the two input
ontologies. Minimal mappings also provide clear usability advantages. Many systems and
corresponding interfaces, mostly graphical, have been developed for the management
of mappings, but they hardly scale as the number of nodes increases, and the
resulting visualizations become rather cluttered [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Furthermore, the maintenance of smaller
sets makes the work of the user much easier, faster and less error prone [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        The main contributions of this paper are a formal definition of minimal and, dually,
redundant mappings, a proof that the minimal mapping always exists and is unique,
and an algorithm for computing it. This algorithm has the following main
features:
1. It can be proved to be correct and complete, in the sense that it always
computes the minimal mapping;
2. It minimizes the number of calls to the node matching function which
computes the relation between two nodes. Notice that node matching in the general
case amounts to logical reasoning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and it may require exponential time;
3. It computes the mapping of maximum size (including the maximum number of
redundant elements) as it maximally exploits the information codified in the
graph of the lightweight ontologies in input. This, in turn, avoids missing
mapping elements due to pitfalls in the node matching functions, e.g. because
of missing background knowledge [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        As far as we know very little work has been done on the issue of computing
minimal mappings. In general the computation of minimal mappings can be seen as a
specific instance of the mapping inference problem [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Closer to our work, in [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9, 10, 11</xref>
        ]
the authors use Distributed Description Logics (DDL) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to represent and reason
about existing ontology mappings. They introduce a few debugging heuristics which
remove, from a given set, the mapping elements which are redundant or generate
inconsistencies [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The main problem of this approach, as also recognized by the authors,
is the complexity of DDL reasoning [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In our approach, instead of pruning
redundant elements, we directly compute the minimal set. Among other things, our
approach allows us to minimize the number of calls to node matching.
      </p>
      <p>The rest of the paper is organized as follows. Section 2 provides a motivating
example. Section 3 provides the definitions of redundant and minimal mappings and
shows that the minimal set always exists and is unique. Section 4 describes the
algorithm, while Section 5 evaluates it. Finally, Section 6 draws some conclusions and
outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>A motivating example</title>
      <p>Classifications are perhaps the most natural tool humans use to organize
information content. Information items are hierarchically arranged under topic nodes, moving
from general ones to more specific ones as we go deeper in the hierarchy.</p>
      <p>Classification (1): A (journals), B (development and programming languages), C (java). Classification (2): D (programming and development), E (languages), F (java), G (magazines).</p>
      <p>Fig. 1. Two classifications</p>
      <p>
        This attitude is well known in Knowledge Organization as the principle of
organizing from the general to the specific [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], concisely called the get-specific principle
in [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ]. Consider the two fragments of classifications depicted in Fig. 1. They are
designed to arrange more or less the same content, but from different perspectives.
The second is a fragment taken from the Yahoo web directory1 (category Computers
and Internet).
      </p>
      <p>
        Following the approach described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and exploiting dedicated NLP techniques
tuned to short phrases (for instance, as described in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]), classifications can be
converted, exactly or with a certain degree of approximation, into their formal alter-ego,
namely into lightweight ontologies. Lightweight ontologies [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ] are acyclic graph
structures where each natural language node label is translated into a propositional
Description Logic (DL) formula codifying the meaning of the node. Notice that the
formula associated with each node contains the formula of the node above, to capture the
fact that the meaning of each node is contextualized by the meaning of its ancestor
nodes. As a consequence, the backbone structure of the resulting lightweight
ontologies is represented by subsumption relations between nodes. The resulting formulas
are reported in Fig. 2.
      </p>
      <p>Fig. 2 shows the two lightweight ontologies resulting from the classifications of Fig. 1, together with the mapping elements between them. The formulas are: A is journals#1; B is (development#1 ⊔ programming#2) ⊓ languages#3 ⊓ journals#1; C is java#3 ⊓ (development#1 ⊔ programming#2) ⊓ languages#3 ⊓ journals#1; D is programming#2 ⊔ development#1; E is languages#3 ⊓ (programming#2 ⊔ development#1); F is java#3 ⊓ languages#3 ⊓ (programming#2 ⊔ development#1); G is magazines#1 ⊓ java#3 ⊓ languages#3 ⊓ (programming#2 ⊔ development#1).</p>
      <p>
        M’ = {&lt;A, G, ⊒&gt;, &lt;B, D, ⊑&gt;, &lt;B, E, ⊑&gt;, &lt;B, G, ⊒&gt;, &lt;C, D, ⊑&gt;, &lt;C, E, ⊑&gt;, &lt;C, F, ⊑&gt;, &lt;C, G, ≡&gt;}
M = {&lt;B, E, ⊑&gt;, &lt;C, G, ≡&gt;}
Here each string denotes a concept (e.g., journals#1), and the number at the end of
each string denotes a specific concept constructed from a WordNet sense. Fig. 2 also
reports the resulting mapping elements. We assume that each mapping element is
associated with one of the following semantic relations: disjointness (⊥), equivalence
(≡), more specific (⊑) and less specific (⊒), as computed for instance by semantic
matching [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Notice however that not all the mapping elements have the same
semantic valence. For instance, B⊑D is a trivial logical consequence of B⊑E and E⊑D,
and similarly for C⊑F and C≡G. We represent the elements in the minimal mapping
using solid lines and redundant elements using dashed lines. M’ is the set of
maximum size (including the maximum number of redundant elements), while M is the
minimal set. The problem is how to compute the minimal set in the most efficient way.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Redundant and minimal mappings</title>
      <p>
        Adapting the definition in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] we define a lightweight ontology as follows:
Definition 1 (Lightweight ontology). A lightweight ontology O is a rooted tree
&lt;N,E,LF&gt; where:
a) N is a finite set of nodes;
b) E is a set of edges on N;
c) LF is a finite set of labels expressed in a Propositional DL language such that for
any node ni ∈ N there is one and only one label liF ∈ LF;
d) li+1F ⊑ liF, with ni being the parent of ni+1.
      </p>
      <p>The superscript F is used to emphasize that labels are in a formal language. Fig. 2
above provides an example of (a fragment of) two lightweight ontologies.</p>
      <p>We then define mapping elements as follows:
Definition 2 (Mapping element). Given two lightweight ontologies O1 and O2, a
mapping element m between them is a triple &lt;n1, n2, R&gt;, where:
a) n1∈N1 is a node in O1, called the source node;
b) n2∈N2 is a node in O2, called the target node;
c) R ∈ {≡, ⊑, ⊒, ⊥} is the strongest semantic relation holding between n1 and n2.</p>
      <p>The partial order is such that disjointness is stronger than equivalence which, in
turn, is stronger than subsumption (in both directions), and the two
subsumption symbols are unordered. This ordering ensures that subsumption is returned only when
equivalence does not hold or when one of the two nodes is inconsistent (this latter case
generating at the same time both a disjointness and a subsumption relation), and
similarly for the order between disjointness and equivalence. Notice that, under this
ordering, there can be at most one mapping element between two nodes.</p>
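      <p>The partial order above can be sketched as follows (a toy encoding of ours; the relation names are our own, not the paper's notation):</p>
      <preformat>
```python
# Strength ranking implementing the partial order described in the text:
# disjointness is strongest, then equivalence, then the two (unordered)
# subsumption directions.

STRENGTH = {"disjoint": 2, "equivalent": 1, "subsumed": 0, "subsumes": 0}

def strongest(holding):
    """Pick the relation to return among all relations found to hold."""
    if not holding:
        return None
    return max(holding, key=lambda r: STRENGTH[r])

# When both equivalence and a subsumption hold, equivalence is returned:
print(strongest(["subsumed", "equivalent"]))
# An inconsistent node yields both disjointness and subsumption;
# only disjointness is returned:
print(strongest(["disjoint", "subsumed"]))
```
      </preformat>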
      <p>The next step is to define the notion of redundancy. The key idea is that, given a
mapping element &lt;n1, n2, R&gt;, a new mapping element &lt;n1’, n2’, R’&gt; is redundant with
respect to the first if the existence of the second can be asserted simply by looking at
the relative positions of n1 with n1’, and n2 with n2’. In algorithmic terms, this means
that the second can be computed without running the time-expensive node matching
functions. We have identified four basic redundancy patterns, shown in Fig. 3.</p>
      <p>Fig. 3. Redundancy detection patterns</p>
      <p>In Fig. 3, the blue dashed mappings are redundant w.r.t. the solid blue ones. The
bold red solid lines show how a semantic relation propagates. Let us discuss the
rationale for each of the patterns:
• Pattern (1): each mapping element &lt;C, D, ⊑&gt; is redundant w.r.t. &lt;A, B, ⊑&gt;. In
fact, C is more specific than A which is more specific than B which is more
specific than D. As a consequence, by transitivity C is more specific than D.
• Pattern (2): dual argument as in pattern (1).
• Pattern (3): each mapping element &lt;C, D, ⊥&gt; is redundant w.r.t. &lt;A, B, ⊥&gt;. In
fact, we know that A and B are disjoint, that C is more specific than A and that
D is more specific than B. This implies that C and D are also disjoint.
• Pattern (4): the combination of patterns (1) and (2).</p>
      <p>In other words, the patterns capture logical inference from structural
information, namely just by looking at the positions of the nodes in the two trees. As
we will show, this in turn allows computing the redundant elements in linear time
(w.r.t. the size of the two ontologies) from the ones in the minimal set. Notice that
patterns (1) and (2) are still valid in case we substitute subsumption with equivalence.
However, in this case we cannot exclude the possibility that a stronger relation holds
between C and D. A trivial example is provided in Fig. 4
(a).</p>
      <p>Fig. 4. (a) An example (Car, Automobile, Auto) in which the relation holding between the nodes below an equivalence is stronger than the subsumption predicted by the patterns; (b) an example of non-redundancy: Mammal (A) ⊑ Animal (B), Canine (C) ⊑ Mammal (A) and Dog (D) ⊑ Animal (B), but no relation holds between Canine (C) and Dog (D).</p>
      <sec id="sec-3-4">
        <p>Definition 3 (Redundant mapping element). Given a mapping M between two
lightweight ontologies O1 and O2, a mapping element m’ = &lt;C, D, R’&gt; ∈ M is
redundant iff one of the following holds:
(1) If R’ is ⊑, ∃m∈M with m = &lt;A, B, ⊑&gt; and m ≠ m’ such that A ∈ path(C)
and D ∈ path(B);
(2) If R’ is ⊒, ∃m∈M with m = &lt;A, B, ⊒&gt; and m ≠ m’ such that C ∈ path(A)
and B ∈ path(D);
(3) If R’ is ⊥, ∃m∈M with m = &lt;A, B, ⊥&gt; and m ≠ m’ such that A ∈ path(C)
and B ∈ path(D);
(4) If R’ is ≡, conditions (1) and (2) must be satisfied.
Here path(n) denotes the set of nodes on the path from the root to n, n included.</p>
        <p>Note how the conditions of Definition 3 map to the four patterns in Fig. 3. Fig. 2 in Section 2
provides examples of redundant elements. Definition 3 can be proved to capture all and
only the cases of redundancy.</p>
        <p>Theorem 1 (Redundancy, soundness and completeness). Given a mapping M between
two lightweight ontologies O1 and O2, a mapping element m’ ∈ M is redundant if and
only if it satisfies one of the conditions of Definition 3.</p>
        <p>
          The soundness argument is the rationale described for the patterns above.
Completeness can be shown by proving that we cannot have
redundancy in the remaining cases. We can proceed by enumeration, negating each of
the patterns, encoded one by one in the conditions of Definition 3. The
complete proof is given in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Fig. 4 (b) provides an example of non-redundancy
which is based on pattern (1). It tells us that the existence of a link between two nodes
does not necessarily propagate to the two nodes below. For example, we cannot derive
Canine ⊑ Dog from the set of axioms {Canine ⊑ Mammal, Mammal ⊑ Animal,
Dog ⊑ Animal}, and it would be wrong to do so.
        </p>
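        <p>The non-redundancy argument can be checked mechanically: closing the three axioms under the transitivity of subsumption never yields Canine ⊑ Dog. The following is a small sketch of ours, not the authors' code:</p>
        <preformat>
```python
# Transitive closure of a set of (sub, super) subsumption axioms; anything
# not in the closure is not entailed by transitivity alone.

def closure(axioms):
    closed = set(axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

axioms = {("Canine", "Mammal"), ("Mammal", "Animal"), ("Dog", "Animal")}
derived = closure(axioms)
print(("Canine", "Animal") in derived)  # follows by transitivity
print(("Canine", "Dog") in derived)     # does not follow
```
        </preformat>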
        <p>The notion of redundancy allows us to formalize the notion of minimal mapping as
follows:
Definition 4 (Minimal mapping). Given two lightweight ontologies O1 and O2, we
say that a mapping M between them is minimal iff:
a) ∄m∈M such that m is redundant (minimality condition);
b) ∄M’⊃M satisfying condition a) above (maximality condition).</p>
        <p>A mapping element is minimal if it belongs to the minimal mapping.</p>
        <p>Note that conditions (a) and (b) ensure that the minimal set is the set of maximum
size with no redundant elements. As an example, the set M in Fig. 2 is minimal.
Comparing this mapping with M’ we can observe that all elements in the set M’ - M
are redundant and that, therefore, there are no other supersets of M with the same
properties. In fact, &lt;A, G, ⊒&gt; and &lt;B, G, ⊒&gt; are redundant w.r.t. &lt;C, G, ≡&gt; by
pattern (2); &lt;C, D, ⊑&gt;, &lt;C, E, ⊑&gt; and &lt;C, F, ⊑&gt; are redundant w.r.t. &lt;C, G, ≡&gt; by
pattern (1); and &lt;B, D, ⊑&gt; is redundant w.r.t. &lt;B, E, ⊑&gt; by pattern (1). Note that M contains
far fewer mapping elements than M’.</p>
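        <p>Using the sets of Fig. 2, the redundancy patterns can be applied mechanically to recover M from M’. The sketch below is our own encoding (relation names and helper functions are ours); equivalence elements are simply kept, since condition (4) requires both patterns (1) and (2) to apply:</p>
        <preformat>
```python
# Filter the maximum-size mapping M' down to the minimal mapping M by
# dropping every element made redundant by another one (patterns 1-3;
# an equivalence in the existing element acts as both subsumptions).

def path(n, parents):
    """n together with all its ancestors; parents maps child to parent."""
    out = {n}
    while n in parents:
        n = parents[n]
        out.add(n)
    return out

def covered(cand, ex, p1, p2):
    """True if cand = (C, D, R') is made redundant by ex = (A, B, R)."""
    (C, D, rc), (A, B, rx) = cand, ex
    if cand == ex:
        return False
    if rc == "subsumed":   # pattern (1): A above C, D above B
        return rx in ("subsumed", "equivalent") and \
            A in path(C, p1) and D in path(B, p2)
    if rc == "subsumes":   # pattern (2), the dual case
        return rx in ("subsumes", "equivalent") and \
            C in path(A, p1) and B in path(D, p2)
    return False

p1 = {"C": "B", "B": "A"}                      # source tree of Fig. 2
p2 = {"G": "F", "F": "E", "E": "D"}            # target tree of Fig. 2
Mp = [("A", "G", "subsumes"), ("B", "D", "subsumed"), ("B", "E", "subsumed"),
      ("B", "G", "subsumes"), ("C", "D", "subsumed"), ("C", "E", "subsumed"),
      ("C", "F", "subsumed"), ("C", "G", "equivalent")]
M = [m for m in Mp if not any(covered(m, ex, p1, p2) for ex in Mp)]
print(M)
```
        </preformat>
        <p>Running it drops the six redundant elements and leaves exactly M = {&lt;B, E, ⊑&gt;, &lt;C, G, ≡&gt;}, as reported in Fig. 2.</p>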
        <p>As a last observation, for any two given lightweight ontologies, the minimal
mapping always exists and is unique.</p>
        <p>Theorem 2 (Minimal mapping, existence and uniqueness). Given two lightweight
ontologies O1 and O2, there is always one and only one minimal mapping between
them.</p>
        <p>
          A proof is given in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
        <p>Computing minimal and redundant mappings</p>
        <p>The patterns described in the previous section suggest how to significantly reduce
the number of calls to the node matchers. Looking for instance at pattern (2) in Fig.
3, given a mapping element m = &lt;A, B, ⊒&gt; we know that it is not necessary to
compute the semantic relation holding between A and any descendant C in the sub-tree of
B, since we know in advance that it is ⊒. At the top level the algorithm is organized as
follows:
• Step 1, computing the minimal mapping modulo equivalence: compute the
set of disjointness and subsumption mapping elements which are minimal
modulo equivalence. By this we mean that they are minimal modulo
collapsing, whenever possible, two subsumption relations of opposite direction into a
single equivalence mapping element;
• Step 2, computing the minimal mapping: eliminate the redundant
subsumption mapping elements. In particular, collapse all the pairs of subsumption
elements (of opposite direction) between the same two nodes into a single
equivalence element. This results in the minimal mapping;
• Step 3, computing the mapping of maximum size: compute the mapping of
maximum size (including minimal and redundant mapping elements). During
this step the existence of a (redundant) element is computed as the result of the
propagation of the elements in the minimal mapping.</p>
        <p>
          The first two steps are performed at matching time, while the third is activated
whenever the user wants to exploit the pre-computed mapping elements, for instance
for their visualization. For lack of space, in the following we give only the
pseudo-code for the first step. The interested reader can look at [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] for the pseudo-code of
the other two steps.
        </p>
        <p>The minimal mapping is computed by a function TreeMatch whose pseudo-code
is given in Fig. 5. M is the minimal set while T1 and T2 are the input lightweight
ontologies.
10 node: struct of {cnode: wff; children: node[];}
20 T1,T2: tree of (node);
30 relation in {⊑, ⊒, ≡, ⊥};
40 element: struct of {source: node; target: node; rel: relation;};
50 M: list of (element);
60 boolean direction;
70 function TreeMatch(tree T1, tree T2)
80 {TreeDisjoint(root(T1),root(T2));
90 direction := true;
100 TreeSubsumedBy(root(T1),root(T2));
110 direction := false;
120 TreeSubsumedBy(root(T2),root(T1));
130 TreeEquiv();
140 };</p>
        <p>
          TreeMatch is crucially dependent on the node matching functions NodeDisjoint
(given in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]) and NodeSubsumedBy (Fig. 6) which take two nodes n1 and n2 and
return a positive answer respectively in case of disjointness or subsumption, or a
negative answer if it is not the case or they are not able to establish it. Notice that
these two functions hide the heaviest computational costs; in particular, their
computation time is exponential when the relation holds, but possibly much faster when the
relation does not hold. The main motivation for this is that the node matching
problem, in the general case, is translated into a disjointness or subsumption
problem in propositional DL (see [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] for a detailed description). The goal, therefore, is to
compute the minimal mapping by minimizing the calls to the node matching functions
and, in particular, the calls where the relation turns out to hold. We
achieve this purpose by processing both trees top down. To maximize the
performance of the system, TreeMatch has therefore been built as the sequence of three
function calls: the first call to TreeDisjoint (line 80) computes the minimal set of
disjointness mapping elements, while the second and the third call to TreeSubsumedBy
compute the minimal set of subsumption mapping elements in the two directions
modulo equivalence (lines 90-120). Notice that in the second call, TreeSubsumedBy
is called with the input ontologies with swapped roles. These three calls correspond to
Step 1 above. Line 130 in the pseudo-code of TreeMatch implements Step 2.
        </p>
        <p>
          Given two sub-trees in input, rooted in n1 and n2, the TreeDisjoint function
searches for the first disjointness element along any pair of paths in them. See
[
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] for the corresponding pseudo-code and a complete description.
        </p>
        <p>TreeSubsumedBy (Fig. 6) recursively finds all minimal mapping elements where
the strongest relation between the nodes is ⊑ (or, dually, ⊒ in the second call in
TreeMatch, line 120; in the following we concentrate only on the first call).
10 function boolean TreeSubsumedBy(node n1, node n2)
20 {c1,c2: node; LastNodeFound: boolean;
30 if (&lt;n1,n2,⊥&gt; ∈ M) then return false;
40 if (!NodeSubsumedBy(n1, n2)) then
50 foreach c1 in GetChildren(n1) do TreeSubsumedBy(c1,n2);
60 else
70 {LastNodeFound := false;
80 foreach c2 in GetChildren(n2) do
90 if (TreeSubsumedBy(n1,c2)) then LastNodeFound := true;
100 if (!LastNodeFound) then AddSubsumptionMappingElement(n1,n2);
120 return true;
140 };
150 return false;
160 };
170 function boolean NodeSubsumedBy(node n1, node n2)
180 {if (Unsatisfiable(mkConjunction(n1.cnode,negate(n2.cnode)))) then
return true;
190 else return false; };
200 function AddSubsumptionMappingElement(node n1, node n2)
210 {if (direction) then AddMappingElement(&lt;n1,n2,⊑&gt;);
220 else AddMappingElement(&lt;n2,n1,⊒&gt;); };</p>
        <p>Notice that TreeSubsumedBy assumes that the minimal disjointness elements have
already been computed. As a consequence, at line 30 it checks whether a disjointness
element between the nodes n1 and n2 is already in the minimal set. If this is the case it
stops the recursion. This allows computing the stronger disjointness relation rather
than subsumption when both hold (namely, in the presence of an inconsistent node).
Given n2, lines 40-50 implement a depth-first recursion in the first tree until a
subsumption is found. The test for subsumption is performed by the NodeSubsumedBy
function, which checks whether the formula obtained by conjoining the formula
associated with the node n1 and the negation of the formula for n2 is unsatisfiable (lines
170-190). Lines 60-140 implement what happens after the first subsumption is found.
The key idea is that, after finding the first subsumption, TreeSubsumedBy keeps
recursing down the second tree until it finds the last subsumption. When this happens, the
resulting mapping element is added to the minimal set (line 100). Notice that both
NodeDisjoint and NodeSubsumedBy call the function Unsatisfiable, which embeds
a call to a SAT solver.</p>
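        <p>The control flow of TreeSubsumedBy can be rendered in Python as follows. This is a sketch of ours: the SAT-based NodeSubsumedBy is replaced by a stub based on containment of atomic concepts, and only the ⊑ direction is handled; names follow the pseudo-code where possible.</p>
        <preformat>
```python
# Executable sketch of TreeSubsumedBy: keep recursing down the target tree
# after the first subsumption is found, and record only the deepest one.

class N:
    def __init__(self, concept, children=()):
        self.concept = frozenset(concept)  # stands for the node formula
        self.children = list(children)

M = []            # the minimal mapping under construction
DISJOINT = set()  # pairs already found disjoint by TreeDisjoint

def node_subsumed_by(n1, n2):
    # Stub for the SAT check: n1 subsumed by n2 iff n2's atoms occur in n1.
    return n2.concept.issubset(n1.concept)

def tree_subsumed_by(n1, n2):
    if (id(n1), id(n2)) in DISJOINT:          # line 30 of the pseudo-code
        return False
    if not node_subsumed_by(n1, n2):          # lines 40-50
        for c1 in n1.children:
            tree_subsumed_by(c1, n2)
        return False
    last_node_found = False                   # lines 60-140
    for c2 in n2.children:
        if tree_subsumed_by(n1, c2):
            last_node_found = True
    if not last_node_found:
        M.append((n1, n2, "subsumed"))        # line 100
    return True

# Case (b) of Fig. 7: A is subsumed by C and by C's child D.
d = N({"p", "q"})
c = N({"p"}, [d])
a = N({"p", "q", "r"})
tree_subsumed_by(a, c)
print(len(M))
```
        </preformat>
        <p>On these trees only the deeper element &lt;A, D, ⊑&gt; ends up in M, while &lt;A, C, ⊑&gt; is recognized as redundant, mirroring case (b) of Fig. 7.</p>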
        <p>To fully understand TreeSubsumedBy, the reader should check what happens in
the four situations in Fig. 7. In case (a) the first iteration of the TreeSubsumedBy
finds a subsumption between A and C. Since C has no children, it skips lines 80-90
and directly adds the mapping element &lt;A, C, ⊑&gt; to the minimal set (line 100). In
case (b), since there is a child D of C, the algorithm iterates on the pair A-D (lines
80-90), finding a subsumption between them. Since there are no other nodes under D, it
adds the mapping element &lt;A, D, ⊑&gt; to the minimal set and returns true. Therefore
LastNodeFound is set to true (line 90) and the mapping element between the pair A-C
is recognized as redundant. Case (c) is similar. The difference is that
TreeSubsumedBy returns false when checking the pair A-D (line 30), thanks to the previous
computation of the minimal disjointness mapping elements, and therefore the mapping
element &lt;A, C, ⊑&gt; is recognized as minimal. In case (d) the algorithm iterates after the
second subsumption mapping element is identified. It first checks the pair A-C and
iterates on A-D concluding that subsumption does not hold between them (line 40).
Therefore, it recursively calls TreeSubsumedBy between B and D. In fact, since &lt;A,
C, ⊑&gt; will be recognized as minimal, it is not worth checking &lt;B, C, ⊑&gt; for pattern
(1). As a consequence &lt;B, D, ⊑&gt; is recognized as minimal together with &lt;A, C, ⊑&gt;.
</p>
        <p>Fig. 7. The four situations (a), (b), (c) and (d) handled by TreeSubsumedBy.</p>
        <p>Five observations. The first is that, even if, overall, TreeMatch implements three
loops instead of one, the wasted (linear) time is largely counterbalanced by the
exponential time saved by avoiding many useless calls to the SAT solver. The second is
that, when the input trees T1 and T2 consist of a single node each, TreeMatch behaves as a node
matching function which returns the semantic relation holding between the input
nodes. The third is that calling TreeDisjoint before the two calls to
TreeSubsumedBy allows us to implement the partial order on relations defined in the previous
section; in particular, it allows returning only a disjointness mapping element when
both disjointness and subsumption hold. The fourth is that skipping (in the
body of TreeDisjoint) the two sub-trees where disjointness holds not only
implements the partial order (see the previous observation) but also saves
a lot of useless calls to the node matching functions. The fifth and last observation is
that the implementation of TreeMatch crucially depends on the fact that the minimal
elements of the two directions of subsumption and disjointness can be computed
independently (modulo inconsistencies).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
        The algorithm presented in the previous section, let us call it MinSMatch, has been
implemented by taking the node matching routines of the state of the art matcher
SMatch [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and by changing the way the tree structure is matched. The evaluation has
been performed by directly comparing the results of MinSMatch and SMatch on
several real-world datasets. All tests have been performed on a Pentium D 3.40GHz with
2GB of RAM running Windows XP SP3 operating system with no additional
applications running except the matching system. Both systems were limited to allocating no
more than 1GB of RAM. The tuning parameters were set to the default values. The
selected datasets had already been used in previous evaluations; see [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Some of
these datasets can be found at OAEI web site2. The first two datasets describe courses
and will be called Cornell and Washington, respectively. The second two come from
the arts domain and will be referred to as Topia and Icon, respectively. The third two
datasets have been extracted from the Looksmart, Google and Yahoo! directories and
will be referred to as Source and Target. The fourth two datasets contain portions of
the two business directories eCl@ss3 and UNSPSC4 and will be referred to as Eclass
and Unspsc. Table 1 describes some indicators of the complexity of these datasets.
      </p>
      <p>Consider Table 2. The reduction in the last column is calculated as (1 - m/t), where
m is the number of elements in the minimal set and t is the total number of elements
in the mapping of maximum size, as computed by MinSMatch. As can easily be
noticed, we have a significant reduction, in the range 68-96%.</p>
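      <p>As a quick sanity check (our own arithmetic, not from the paper), recomputing (1 - m/t) from the MinSMatch columns of Table 2 gives reductions of 84%, 96%, 89% and 68%, consistent with the reported 68-96% range:</p>
      <preformat>
```python
# Reduction (1 - m/t) recomputed from the (m, t) pairs reported in Table 2.
pairs = [(36, 223), (243, 5491), (30956, 282648), (12754, 39818)]
reductions = [round(100 * (1 - m / t)) for m, t in pairs]
print(reductions)  # [84, 96, 89, 68]
```
      </preformat>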
      <p>
        The second interesting observation is that, in the last two experiments in Table 2,
the number of total mapping elements computed by MinSMatch is slightly higher
(compare the second and the third column). This is due to the fact that, in the presence
of one of the patterns, MinSMatch directly infers the existence of a mapping element
without testing it. This allows MinSMatch, differently from SMatch, to avoid missing
elements because of failures of the node matching functions (because of lack of
background knowledge [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). One such example from our experiments is reported below
(directories Source and Target):
\Top\Computers\Internet\Broadcasting\Video Shows
\Top\Computing\Internet\Fun &amp; Games\Audio &amp; Video\Movies
2 http://oaei.ontologymatching.org/2006/directory/
3 http://www.eclass-online.com/
4 http://www.unspsc.org/
      </p>
      <p>We have a minimal mapping element which states that Video Shows ⊒ Movies.
The element generated by this minimal one, which is captured by MinSMatch and
missed by SMatch (because of the lack of background knowledge about the relation
between ‘Broadcasting’ and ‘Movies’) states that Broadcasting ⊒ Movies.</p>
      <table-wrap id="tab2">
        <label>Table 2.</label>
        <caption>
          <p>Mapping sizes.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>S-Match: total mapping elements (t)</th>
              <th>MinSMatch: total mapping elements (t)</th>
              <th>MinSMatch: minimal mapping elements (m)</th>
              <th>Reduction, %</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>223</td><td>223</td><td>36</td><td>84</td></tr>
            <tr><td>5491</td><td>5491</td><td>243</td><td>96</td></tr>
            <tr><td>282638</td><td>282648</td><td>30956</td><td>89</td></tr>
            <tr><td>39590</td><td>39818</td><td>12754</td><td>68</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        To conclude our analysis, Table 3 shows the reduction in computation time and
in the number of calls to SAT. As can be noticed, the time reductions are substantial, in the range
16% - 59%, with the smallest savings occurring for very small ontologies. In principle,
the deeper the ontologies, the more we should save. The interested reader can refer to
[
        <xref ref-type="bibr" rid="ref14 ref5">5, 14</xref>
        ] for a detailed qualitative and performance evaluation of SMatch w.r.t. other
state-of-the-art matching algorithms.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marchese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Zaihrayeu</surname>
          </string-name>
          ,
          <year>2006</year>
          .
          <article-title>Encoding Classifications into Lightweight Ontologies</article-title>
          .
          <source>Journal of Data Semantics 8</source>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <year>2007</year>
          .
          <source>Ontology Matching</source>
          . Springer-Verlag New York, Inc., Secaucus, NJ, USA.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <year>2008</year>
          .
          <article-title>Ten Challenges for Ontology Matching</article-title>
          .
          <source>In Proceedings of the 7th International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE</source>
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <year>2002</year>
          .
          <article-title>Representing and Reasoning about Mappings between Domain Models</article-title>
          .
          <source>At the 18th National Conference on Artificial Intelligence (AAAI</source>
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yatskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <year>2007</year>
          .
          <article-title>Semantic Matching: algorithms and implementation</article-title>
          .
          <source>Journal on Data Semantics</source>
          , IX,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>C.</given-names>
            <surname>Caracciolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ichise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Malaisé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meilicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <year>2008</year>
          .
          <article-title>First results of the Ontology Alignment Evaluation Initiative 2008</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Zaihrayeu</surname>
          </string-name>
          ,
          <year>2007</year>
          .
          <article-title>Lightweight Ontologies</article-title>
          .
          <source>In The Encyclopedia of Database Systems</source>
          , to appear. Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yatskevich</surname>
          </string-name>
          ,
          <year>2006</year>
          .
          <article-title>Discovering missing background knowledge in ontology matching</article-title>
          .
          <source>Proceedings of the 17th European Conference on Artificial Intelligence (ECAI</source>
          <year>2006</year>
          ), pp.
          <fpage>382</fpage>
          -
          <lpage>386</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>H.</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wache</surname>
          </string-name>
          ,
          <year>2006</year>
          .
          <article-title>Reasoning about Ontology Mappings</article-title>
          .
          <source>Proceedings of the ECAI-06 Workshop on Contextual Representation and Reasoning.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>C.</given-names>
            <surname>Meilicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tamilin</surname>
          </string-name>
          ,
          <year>2006</year>
          .
          <article-title>Improving automatically created mappings using logical reasoning</article-title>
          .
          <source>In the proceedings of the 1st International Workshop on Ontology Matching OM-2006, CEUR Workshop Proceedings</source>
          Vol.
          <volume>225</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>C.</given-names>
            <surname>Meilicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tamilin</surname>
          </string-name>
          ,
          <year>2008</year>
          .
          <article-title>Reasoning support for mapping revision</article-title>
          .
          <source>Journal of Logic and Computation</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Borgida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          .
          <article-title>Distributed Description Logics: Assimilating Information from Peer Sources</article-title>
          .
          <source>Journal on Data Semantics</source>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>I.</given-names>
            <surname>Zaihrayeu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chi</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <year>2007</year>
          .
          <article-title>From web directories to ontologies: Natural language processing challenges</article-title>
          .
          <source>In 6th International Semantic Web Conference (ISWC</source>
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>P.</given-names>
            <surname>Avesani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Yatskevich</surname>
          </string-name>
          ,
          <year>2005</year>
          .
          <article-title>A Large Scale Taxonomy Mapping Evaluation</article-title>
          .
          <source>In Proceedings of International Semantic Web Conference (ISWC</source>
          <year>2005</year>
          ), pp.
          <fpage>67</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <year>2004</year>
          .
          <article-title>Trends and Issues in Establishing Interoperability Among Knowledge Organization Systems</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>55</volume>
          (
          <issue>5</issue>
          ) pp.
          <fpage>377</fpage>
          -
          <lpage>395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovács</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micsik</surname>
          </string-name>
          ,
          <year>2007</year>
          .
          <article-title>Extending Semantic Matching Towards Digital Library Contexts</article-title>
          .
          <source>Proceedings of the 11th European Conference on Digital Libraries (ECDL</source>
          <year>2007</year>
          ), pp.
          <fpage>285</fpage>
          -
          <lpage>296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>B.</given-names>
            <surname>Marshall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Madhusudan</surname>
          </string-name>
          ,
          <year>2004</year>
          .
          <article-title>Element matching in concept maps</article-title>
          .
          <source>Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL</source>
          <year>2004</year>
          ), pp.
          <fpage>186</fpage>
          -
          <lpage>187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>B.</given-names>
            <surname>Hjørland</surname>
          </string-name>
          ,
          <year>2008</year>
          .
          <article-title>What is Knowledge Organization (KO)?</article-title>
          .
          <source>Knowledge Organization. International Journal devoted to Concept Theory, Classification, Indexing and Knowledge Representation</source>
          <volume>35</volume>
          (
          <issue>2</issue>
          /3) pp.
          <fpage>86</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>D.</given-names>
            <surname>Soergel</surname>
          </string-name>
          ,
          <year>1972</year>
          .
          <article-title>A Universal Source Thesaurus as a Classification Generator</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          <volume>23</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>299</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>D.</given-names>
            <surname>Vizine-Goetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hickey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Houghton</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Thompson</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Vocabulary Mapping for Terminology Services</article-title>
          .
          <source>Journal of Digital Information</source>
          , Volume
          <volume>4</volume>
          , Issue 4.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>M.</given-names>
            <surname>Doerr</surname>
          </string-name>
          ,
          <year>2001</year>
          .
          <article-title>Semantic Problems of Thesaurus Mapping</article-title>
          .
          <source>Journal of Digital Information</source>
          , Volume
          <volume>1</volume>
          , Issue 8.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Maltese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Autayeu</surname>
          </string-name>
          ,
          <year>2008</year>
          .
          <article-title>Computing minimal mappings</article-title>
          . University of Trento,
          <source>DISI Technical Report</source>
          : http://eprints.biblio.unitn.it/archive/00001525/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>