<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Relaxed Precision and Recall for Ontology Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marc Ehrig</string-name>
          <email>ehrig@aifb.uni-karlsruhe.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Jérôme Euzenat</string-name>
          <email>Jerome.Euzenat@inrialpes.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>INRIA Rhône-Alpes</institution>
          ,
<addr-line>655 avenue de l'Europe, Montbonnot</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute AIFB, University of Karlsruhe</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>25</fpage>
      <lpage>32</lpage>
      <abstract>
<p>In order to evaluate the performance of ontology matching algorithms it is necessary to confront them with test ontologies and to compare the results. The most prominent criteria are precision and recall, which originate from information retrieval. However, one alignment may be very close to the expected result while another is quite remote from it, and yet both share the same precision and recall. This is due to the inability of precision and recall to measure the closeness of the results. To overcome this problem, we present a framework for generalizing precision and recall. This framework is instantiated by three different measures, and we show on a motivating example that the proposed measures overcome the rigidity of classical precision and recall.</p>
      </abstract>
      <kwd-group>
<kwd>Ontology alignment</kwd>
        <kwd>evaluation measures</kwd>
        <kwd>precision</kwd>
        <kwd>recall</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Ontology matching is an important problem for which many
algorithms (e.g., PROMPT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], GLUE [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Ontrapro [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], OLA [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], FOAM [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) have been provided. In order to
evaluate the performance of these algorithms it is necessary to
confront them with test ontologies and to compare the
results. The most prominent criteria are precision and
recall, originating from information retrieval and adapted to
the ontology matching task. Precision and recall are based
on the comparison of the resulting alignment with a reference
alignment, effectively determining which
correspondences are found and which are not. These criteria are well
understood and widely accepted.
      </p>
      <p>Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. K-CAP’05, Workshop Integrating Ontologies, October 2, 2005, Banff,
Alberta, Canada.</p>
      <p>
        However, as we have experienced in last year’s Ontology
Alignment Contest [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], they have the drawback of being
all-or-nothing measures. An alignment may be very close to the
expected result and another quite remote from it, and yet both
return the same precision and recall. The reason for this is that
the criteria only compare two sets of correspondences
without considering whether these are close or remote to each other:
if they are not the same exact correspondences, they score
zero. Both then score identically low, despite their
different quality. It may be helpful for users to know whether the
found alignments are close to the expected one, and hence easily
repairable, or not. It is thus necessary to measure the proximity
between alignments instead of their strict equality.
      </p>
      <p>In this paper we investigate measures that generalize
precision and recall in order to overcome the problems
presented above. We first provide the basic definitions of
alignments, precision and recall, as well as a motivating example
(§2). We then present a framework for generalizing
precision and recall (§3). This framework is instantiated by four
different measures, including classical precision and recall
(§4), and we show on the motivating example that the
proposed measures do not exhibit the rigidity of classical
precision and recall (§5).</p>
    </sec>
    <sec id="sec-2">
      <title>2. FOUNDATIONS</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Alignment</title>
      <p>DEFINITION 1 (ALIGNMENT, CORRESPONDENCE).
Given two ontologies O and O′, an alignment between
O and O′ is a set of correspondences (i.e., 4-tuples)
⟨e, e′, r, n⟩ with e ∈ O and e′ ∈ O′ being the two matched
entities, r being a relationship holding between e and
e′, and n expressing the level of confidence [0..1] in this
correspondence.</p>
      <p>A matching algorithm returns an alignment A which is
compared with a reference alignment R.
Let us illustrate this through a simple example. Figure 1
presents two ontologies together with two alignments A1
and R. In this example, for the sake of simplification, the
relation is always ‘=’ and the confidence is always 1.0.</p>
      <sec id="sec-3-1">
        <title>The alignment A1 is defined as follows:</title>
        <p>&lt;o1:Vehicle,o2:Thing,=,1.0&gt;
&lt;o1:Car,o2:Porsche,=,1.0&gt;
&lt;o1:hasSpeed,o2:hasProperty,=,1.0&gt;
&lt;o1:MotorKA1,o2:MarcsPorsche,=,1.0&gt;
&lt;o1:250kmh,o2:fast,=,1.0&gt;
We present another reasonable alignment A2:
&lt;o1:Car,o2:Thing,=,1.0&gt;
&lt;o1:hasSpeed,o2:hasProperty,=,1.0&gt;
&lt;o1:MotorKA1,o2:MarcsPorsche,=,1.0&gt;
&lt;o1:250kmh,o2:fast,=,1.0&gt;
and an obviously wrong alignment A3:
&lt;o1:Object,o2:Thing,=,1.0&gt;
&lt;o1:Owner,o2:Volkswagen,=,1.0&gt;
&lt;o1:Boat,o2:Porsche,=,1.0&gt;
&lt;o1:hasOwner,o2:hasMotor,=,1.0&gt;
&lt;o1:Marc,o2:fast,=,1.0&gt;
Further, we have the following reference alignment (R):
&lt;o1:Object,o2:Thing,=,1.0&gt;
&lt;o1:Car,o2:Automobile,=,1.0&gt;
&lt;o1:Speed,o2:Characteristic,=,1.0&gt;
&lt;o1:250kmh,o2:fast,=,1.0&gt;
&lt;o1:PorscheKA123,o2:MarcsPorsche,=,1.0&gt;</p>
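<p>To make the running example concrete, alignments can be written down as plain sets of 4-tuples (Definition 1); a minimal, purely illustrative Python sketch of A1 and R, with entity names as strings:</p>

```python
# Correspondences as (entity1, entity2, relation, confidence) 4-tuples;
# an alignment is simply a set of such tuples.
A1 = {
    ("o1:Vehicle", "o2:Thing", "=", 1.0),
    ("o1:Car", "o2:Porsche", "=", 1.0),
    ("o1:hasSpeed", "o2:hasProperty", "=", 1.0),
    ("o1:MotorKA1", "o2:MarcsPorsche", "=", 1.0),
    ("o1:250kmh", "o2:fast", "=", 1.0),
}
R = {
    ("o1:Object", "o2:Thing", "=", 1.0),
    ("o1:Car", "o2:Automobile", "=", 1.0),
    ("o1:Speed", "o2:Characteristic", "=", 1.0),
    ("o1:250kmh", "o2:fast", "=", 1.0),
    ("o1:PorscheKA123", "o2:MarcsPorsche", "=", 1.0),
}
# With this encoding, set intersection yields the shared correspondences:
common = A1 & R  # only <o1:250kmh, o2:fast, =, 1.0>
```

<p>Representing alignments as sets makes the set-based definitions of the next section directly computable.</p>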
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2.2 Precision and Recall</title>
      <p>
        The usual approach for evaluating the returned alignments is
to consider them as sets of correspondences and check for
the overlap of the two sets. This is naturally obtained by
applying the classical measures of precision and recall [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
which are the ratio of the number of true positives (|R ∩ A|)
over the number of retrieved correspondences (|A|) or the
number of those to be retrieved (|R|), respectively.
      </p>
      <p>DEFINITION 2 (PRECISION, RECALL). Given a
reference alignment R, the precision of some alignment A is
given by</p>
      <p>P(A, R) = |R ∩ A| / |A|</p>
      <p>and recall is given by</p>
      <p>R(A, R) = |R ∩ A| / |R|.</p>
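<p>Definition 2 translates directly into code; a short Python sketch, keeping correspondences as opaque set elements:</p>

```python
def precision(A, R):
    # P(A, R) = |R ∩ A| / |A|
    return len(R & A) / len(A)

def recall(A, R):
    # R(A, R) = |R ∩ A| / |R|
    return len(R & A) / len(R)

# Two 5-element alignments sharing a single correspondence, as in the
# running example (tokens "c1"... stand in for correspondences):
A = {"c1", "c2", "c3", "c4", "c5"}
R = {"c1", "c6", "c7", "c8", "c9"}
```

<p>Here both values are 1/5 = 0.2, matching the scores discussed for the example alignments in §2.3.</p>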
    </sec>
    <sec id="sec-5">
      <title>2.3 Problems with Current Measures</title>
      <p>However, even if the above measurements are easily
understandable and widespread, they are often criticized for
two reasons: Neither do they discriminate between a totally
wrong and an almost correct alignment, nor do they measure
user effort to adapt the alignment.</p>
      <p>Indeed, it often makes sense not only to decide
whether a particular correspondence has been found or not,
but to measure the proximity of the found alignments. This
implies that “near misses” are also taken into consideration
instead of only the exact matches.</p>
      <p>As an example, it will be clear to anybody that,
among the alignments presented above, A3 is not a very
good alignment while A1 and A2 are better alignments.
However, they all score almost exactly the same in terms of precision
(.2) and recall (.2).</p>
      <p>Moreover, the alignments will have to go through user
scrutiny and correction before being used. It is worth
measuring the effort required from the user for correcting the
provided alignment, instead of only whether some correction is
needed. This also calls for a relaxation of precision and recall.</p>
    </sec>
    <sec id="sec-7">
      <title>3. GENERALIZING PRECISION AND RECALL</title>
      <p>Because precision and recall are well-known and easily
explained measures, it is good to adhere to them and extend
them. It also brings the benefit that measures derived from
precision and recall, such as f-measure, can still be
computed. For these reasons, we propose to generalize these
measures.</p>
      <p>If we want to generalize precision and recall, we should be
able to measure the proximity of correspondence sets rather
than their strict overlap. Instead of taking the cardinality of
the intersection of the two sets (|R ∩ A|), we measure their
proximity (ω).</p>
      <p>DEFINITION 3 (GENERALIZED PRECISION AND RECALL).
Given a reference alignment R and an overlap function
ω between alignments, the precision of an alignment A is
given by</p>
      <p>Pω(A, R) = ω(A, R) / |A|</p>
      <p>and recall is given by</p>
      <p>Rω(A, R) = ω(A, R) / |R|.</p>
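<p>The generalized measures only replace |R ∩ A| by a pluggable overlap function; a sketch:</p>

```python
def precision_w(A, R, omega):
    # P_ω(A, R) = ω(A, R) / |A|
    return omega(A, R) / len(A)

def recall_w(A, R, omega):
    # R_ω(A, R) = ω(A, R) / |R|
    return omega(A, R) / len(R)

def strict_overlap(A, R):
    # choosing ω(A, R) = |A ∩ R| recovers classical precision and recall
    return len(A & R)
```

<p>Any ω satisfying the constraints of §3.1 can be plugged in without changing the surrounding formulas, which is why derived measures such as f-measure remain computable.</p>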
    </sec>
    <sec id="sec-8">
      <title>3.1 Basic properties</title>
      <p>In order for these new measures to be true generalizations,
we would like ω to share some properties with |R ∩ A|. In
particular, the measure should be positive:
∀A, B, ω(A, B) ≥ 0 (positiveness)
and not exceed the minimal size of both sets:
∀A, B, ω(A, B) ≤ min(|A|, |B|) (maximality)</p>
      <p>[Figure 1. The two ontologies (Ontology 1 and Ontology 2) with their concepts, relations, and instances, together with the correct alignment R and the found alignment A1.]</p>
      <p>If we want to preserve precision and recall results, ω should
only add more flexibility to the usual precision and recall.
So its values cannot be worse than the initial evaluation:
∀A, B, ω(A, B) ≥ |A ∩ B| (boundedness)
Hence, the main constraint faced by the proximity is the
following:</p>
      <p>|A ∩ R| ≤ ω(A, R) ≤ min(|A|, |R|)
This is indeed a true generalization because |A ∩ R| satisfies
all these properties. One more property satisfied by precision
and recall, which guarantees that they are true
normalized similarities, is symmetry:</p>
      <p>∀A, B, ω(A, B) = ω(B, A) (symmetry)
We will not require symmetry, especially since A and R are
not in symmetrical positions.</p>
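<p>These constraints are easy to check mechanically for a candidate ω; a small Python sketch:</p>

```python
def satisfies_constraints(omega, A, B):
    """Check positiveness, maximality, and boundedness of ω on A, B (§3.1)."""
    v = omega(A, B)
    return (v >= 0                        # positiveness
            and v <= min(len(A), len(B))  # maximality
            and v >= len(A & B))          # boundedness

def strict_overlap(A, B):
    # the classical overlap, which satisfies all three constraints
    return len(A & B)
```

<p>Such a check is useful when experimenting with new proximity functions, since a ω violating maximality could yield precision or recall above 1.</p>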
    </sec>
    <sec id="sec-9">
      <title>3.2 Designing Overlap Proximity</title>
      <p>There are many different ways to design such a proximity
between two sets. We retain here the most obvious one, which
consists of finding correspondences matching each other and
computing the sum of their proximities. This can be defined
as an overlap proximity:</p>
      <p>DEFINITION 4 (OVERLAP PROXIMITY). A measure
that would generalize precision and recall is:
ω(A, R) = Σ⟨a,r⟩∈M(A,R) σ(a, r)
in which M(A, R) is a matching between the
correspondences of A and R and σ(a, r) a proximity function between
two correspondences.</p>
      <p>Again, the standard overlap |A ∩ R| used in precision and
recall is such an overlap proximity.</p>
      <p>There are two tasks to fulfill when designing such an overlap
proximity function:
– the first one consists of finding the correspondences to
be compared (M);
– the second one is to define a proximity measure on
correspondences (σ).</p>
      <p>We consider these two issues below.</p>
    </sec>
    <sec id="sec-10">
      <title>3.3 Matching Correspondences</title>
      <p>A matching between alignments is a set of correspondence
pairs, i.e., M (A, R) ⊆ A × R. However, if we want to keep
the analogy with precision and recall, it will be necessary to
restrict ourselves to the matchings in which an entity from
the ontology does not appear twice. This is compatible with
precision and recall for two reasons: (i) in these measures,
any correspondence is identified only with itself, and (ii)
appearing more than once in the matching would not guarantee
an overlap proximity below min(|A|, |R|) .</p>
      <p>There are |A|!/(|A| − |R|)! candidate matchings (if |A| ≥ |R|). The
natural choice is to select the best match because this
guarantees that the function generalizes precision and recall.</p>
      <p>DEFINITION 5 (BEST MATCH). The best match
M(A, R) between two sets of correspondences A and R is
the subset of A × R which maximizes the overall proximity
and in which each element of A (resp. R) belongs to only
one pair:</p>
      <p>M(A, R) ∈ Maxω(A,R){M ⊆ A × R}
As defined here, this best match may not be unique. This
is not a problem, because we only want to find the highest
value for ω and any of the best matches will yield the same
value.</p>
      <p>Of course, the definitions of M and ω depend on each
other, but this does not prevent us from computing them.
They are usually computed together, but it is clearer to present
them separately.</p>
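<p>For tiny alignments, the best match of Definition 5 can be found by brute force over all injective matchings; an illustrative (exponential-time) Python sketch, not an efficient algorithm:</p>

```python
from itertools import permutations

def best_match_omega(A, R, sigma):
    """ω(A, R): the largest total proximity over one-to-one matchings
    of the correspondences of A with those of R (Definition 5)."""
    A, R = list(A), list(R)
    if len(A) >= len(R):
        # every injective assignment of |R| correspondences from A to R
        return max(sum(sigma(a, r) for a, r in zip(sub, R))
                   for sub in permutations(A, len(R)))
    return max(sum(sigma(a, r) for a, r in zip(A, sub))
               for sub in permutations(R, len(A)))
```

<p>With the strict-equality proximity (σ(a, r) = 1 iff a = r), this reduces to |A ∩ R|. In practice one would solve the underlying assignment problem with, e.g., the Hungarian algorithm rather than enumerating permutations.</p>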
    </sec>
    <sec id="sec-11">
      <title>3.4 Correspondence Proximity</title>
      <p>In order to compute ω(A, R), we need to measure the
proximity between two matched correspondences (i.e., ⟨a, r⟩ ∈
M(A, R)) on the basis of how close the found result is to the
ideal one. Each element in the tuple a = ⟨ea, e′a, ra, na⟩
will be compared with its counterpart in r = ⟨er, e′r, rr, nr⟩.
For any two correspondences (the found a and the reference
r) we compute three similarities: σpair, σrel, and σconf. If
elements are identical, their proximity has to be one
(maximality). If they differ, the proximity is lower, according to
the chosen strategy. In contrast to the standard definition of
similarity, the mentioned proximity measures do not
necessarily have to be symmetric. We will only consider
normalized proximities, i.e., measures whose values are within the
unit interval [0..1], because this guarantees that</p>
      <p>
        ω(A, R) ≤ min(|A|, |R|)
The component proximity measures are defined in the
following way:
σpair(⟨ea, er⟩, ⟨e′a, e′r⟩): How similar is one entity pair to
another entity pair? In ontologies we can in principle
follow any relation which exists (e.g., subsumption,
instantiation), or which can be derived in a meaningful
way. The most important parameters are the relations
to follow and their effect on the proximity.
σrel(ra, rr): Often the alignment relations are more
complex, e.g., they represent subsumption, instantiation, or
compositions. Again, one has to assess the similarity
between these relations. The two relations of the
alignment cell can be compared based on their distance in a
conceptual neighborhood structure [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ].
σconf(na, nr): Finally, one has to decide what to do with
different levels of confidence. The similarity could
simply be the difference. Unfortunately, none of the
current alignment approaches attaches an explicit meaning
to confidence values, which makes it rather
difficult to define an adequate proximity.
      </p>
      <p>Once these proximities are established, they have to be
aggregated. The constraints on the aggregation function
(Aggr) are:
– normalization preservation (if ∀i, 0 ≤ ci ≤ 1 then 0 ≤ Aggr_i ci ≤ 1);
– maximality (if ∀i, ci = 1 then Aggr_i ci = 1);
– local monotonicity (if ∀i ≠ j, ci = c′i = c″i and cj ≤ c′j ≤ c″j
then Aggr_i ci ≤ Aggr_i c′i ≤ Aggr_i c″i).</p>
      <p>Here, we consider aggregating them through
multiplication without further justification. Other aggregations (e.g.,
weighted sum) are also possible.</p>
      <p>DEFINITION 6 (CORRESPONDENCE PROXIMITY).
Given two correspondences ⟨ea, e′a, ra, na⟩ and
⟨er, e′r, rr, nr⟩, their proximity is:
σ(⟨ea, e′a, ra, na⟩, ⟨er, e′r, rr, nr⟩) =
σpair(⟨ea, er⟩, ⟨e′a, e′r⟩) × σrel(ra, rr) × σconf(na, nr)</p>
      <p>We have provided constraints and definitions for M, ω, and
σ. We now turn to concrete measures.</p>
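<p>Definition 6 is a one-liner once the three component proximities are given; a sketch with correspondences as 4-tuples:</p>

```python
def corr_proximity(a, r, sigma_pair, sigma_rel, sigma_conf):
    """σ(a, r) as the product of the pair, relation, and confidence
    proximities (Definition 6)."""
    ea, ea2, ra, na = a   # found correspondence  <e, e', rel, conf>
    er, er2, rr, nr = r   # reference correspondence
    return (sigma_pair((ea, er), (ea2, er2))
            * sigma_rel(ra, rr)
            * sigma_conf(na, nr))
```

<p>With multiplicative aggregation, any component proximity of 0 zeroes out the whole correspondence, which is consistent with the maximality and normalization constraints above.</p>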
    </sec>
    <sec id="sec-12">
      <title>4. CONCRETE MEASURES</title>
      <p>We consider four cases of relaxed precision and recall
measures based on the above definitions. We first give the
definition of usual precision and recall within this framework.</p>
    </sec>
    <sec id="sec-13">
      <title>4.1 Standard Precision and Recall</title>
      <p>For standard precision and recall, the value of ω is |A ∩ R|.
This is indeed an instance of this framework, if the
proximity used is based on the strict equality of the components of
correspondences.</p>
      <p>DEFINITION 7 (EQUALITY PROXIMITY). The equality
proximity is characterized by:
σpair(⟨ea, e′a⟩, ⟨er, e′r⟩) = 1 if ea = er and e′a = e′r, and 0 otherwise;
σrel(ra, rr) = 1 if ra = rr, and 0 otherwise;
σconf(na, nr) = 1 if na = nr, and 0 otherwise.</p>
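<p>The equality proximities can be sketched as indicator functions; with them, σ is 1 exactly on identical correspondences, so ω collapses to |A ∩ R|:</p>

```python
def eq_pair(first, second):
    # first = (ea, er), second = (e'a, e'r): 1 iff both entities match exactly
    return 1.0 if first[0] == first[1] and second[0] == second[1] else 0.0

def eq_rel(ra, rr):
    # 1 iff the alignment relations are identical
    return 1.0 if ra == rr else 0.0

def eq_conf(na, nr):
    # 1 iff the confidence values are identical
    return 1.0 if na == nr else 0.0
```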
    </sec>
    <sec id="sec-14">
      <title>4.2 Symmetric Proximity</title>
      <p>The easiest way to relax precision and recall is to have some
distance δ on the elements of the ontologies and to weight the
proximity with the help of this distance: the higher the
distance between two entities in the matched correspondences,
the lower their proximity. This can be defined as:
δ(ea, er) ≤ δ(eb, er) and δ(e′a, e′r) ≤ δ(e′b, e′r)
=⇒ σ(⟨ea, e′a⟩, ⟨er, e′r⟩) ≥ σ(⟨eb, e′b⟩, ⟨er, e′r⟩)</p>
      <p>As a simple example of such a symmetric similarity, we use
a distance in which a class is at distance 0 from itself, at
distance 0.5 from its direct sub- and superclasses, and at distance
1 from any other class. This could be further refined by having
a similarity inversely proportional to the distance in the
subsumption tree. Likewise, this similarity may also be applied
to properties and instances (through part-of relationships in
the latter case). The similarity between entity pairs is the
complement of these distances. The result is displayed in Table 1.
We always mention the assumed alignment and the actual
correct alignment.</p>
      <p>For the confidence distance we simply take the complement
of the difference. The final precision is calculated according
to the formula presented in the previous section.</p>
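<p>The 0 / 0.5 / 1 subsumption distance and its complement can be sketched as follows; the `parents` map (class → direct superclasses) is an assumed input, not something defined in the paper:</p>

```python
def subs_distance(c1, c2, parents):
    """0 for identical classes, 0.5 for direct sub-/superclasses, 1 otherwise."""
    if c1 == c2:
        return 0.0
    if c2 in parents.get(c1, set()) or c1 in parents.get(c2, set()):
        return 0.5
    return 1.0

def sym_pair_proximity(found, reference, parents):
    # proximity of an entity pair = product of the complements of the
    # distances on each side (one illustrative choice of combination)
    d1 = subs_distance(found[0], reference[0], parents)
    d2 = subs_distance(found[1], reference[1], parents)
    return (1.0 - d1) * (1.0 - d2)
```

<p>Combining the two sides by product matches the multiplicative aggregation of §3.4, but it is an illustrative choice, not the paper's prescribed one.</p>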
    </sec>
    <sec id="sec-15">
      <title>4.3 Measuring Correction Effort</title>
      <p>
        If users have to check and correct alignments, the quality of
alignment algorithms can be measured through the effort
required for transforming the obtained alignment into the
(correct) reference one [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        This measure can be implemented as an edit distance [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]:
an edit distance defines a number of operations by which an
object can be corrected (here, the authorized operations on
correspondences) and assigns a cost to each of these
operations (here, the effort required to identify and repair
some mistake). The cost of a sequence of operations is the
sum of their costs, and the distance between two objects is
the cost of the least costly sequence of operations that
transforms one object into the other. The result can always
be normalized as a function of the size of the largest object.
Such a distance can be turned into a proximity by taking its
complement with regard to 1.
      </p>
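<p>As a sketch of such an effort-based proximity, one can assign each correction operation a normalized cost and take its complement; the costs below are illustrative stand-ins, not the actual weights of Table 3:</p>

```python
# Hypothetical correction costs (NOT the paper's Table 3): moving a class
# to its direct superclass is assumed cheaper than to a subclass, making
# the resulting proximity dissymmetric, as discussed in the text.
COST = {
    "identical": 0.0,
    "to_superclass": 0.2,   # few direct superclasses: easy to pick
    "to_subclass": 0.4,     # usually more subclasses to choose from
    "unrelated": 1.0,
}

def effort_proximity(operation):
    """Proximity = 1 - normalized correction cost."""
    return 1.0 - COST[operation]
```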
      <p>Table 3 provides such plausible weights. Usually classes
are organized in a taxonomy in which they have fewer direct
super- than subclasses. It is thus easier to correct a class to
(one of) its superclasses than to one of its subclasses. As a
consequence, the proximity is dissymmetric. Such a measure
should also add some effort when classes are not directly
related, but this has not been considered here.</p>
      <p>The edit distance between relations is relatively easy to
design since, generally, changing from one relation to another
can be done with just one click. Thus, the relational
similarity equals 1 if the relations are the same and 0.5 otherwise.
In this correction effort measure, the confidence factor does
not play an important role: ordering the correspondences can
only help the user know that after some point she will have
to discard many correspondences. We thus decided not to
take confidence into account, so the confidence proximity will
always be 1.</p>
      <p>[Table: for each found correspondence ⟨e, e′⟩, the closest correct correspondence (identical entities, direct super-/subclasses sup(c)/sub(c), super-/sub-relations sup(r)/sub(r), or super-/sub-instances super(i)/sub(i)) and the associated similarity.]</p>
      <p>To be accurate, such an effort proximity would have been
better aggregated with an additive and normalized
aggregation function rather than multiplication.</p>
    </sec>
    <sec id="sec-16">
      <title>4.4 Precision- and Recall-oriented Measures</title>
      <p>One can also decide to use two different similarities
depending on their application for evaluating either precision or
recall. We here provide two such measures and justify the
given weights. Precision is normally a measure of accuracy,
i.e., the returned results need to be correct. Every wrong
result will therefore entail a penalty. We assume the user poses
a query to the system as follows: “return me all instances of
e”. The system then returns any instance corresponding to
the alignment, i.e., of e′. Vice versa, for the relaxed recall we
want to avoid missing any correct result. This affects the
similarity relations and weights.</p>
      <p>4.4.1 Relaxed Precision. In Tables 4 and 5 we present the precision similarity for pairs
and relations. The comments in each line explain the
decision for the weights.</p>
      <p>For the distance within the confidence we again use the
complement of the difference.</p>
      <sec id="sec-16-1">
        <title>DEFINITION 10 (PRECISION-ORIENTED PROXIMITY).</title>
        <p>The precision-oriented proximity is characterized by:
σpair(⟨ea, e′a⟩, ⟨er, e′r⟩) as defined in Table 4,</p>
        <p>σrel(ra, rr) as defined in Table 5, and
σconf(na, nr) = 1 − |na − nr|.</p>
      </sec>
    </sec>
    <sec id="sec-17">
      <title>5. EXAMPLE</title>
      <p>In the introduction of this paper we have presented a pair of
ontologies, the reference alignment, and three different
identified alignments. We will now apply the different proposed
precision and recall measures to these example alignments.
Please note that they mainly illustrate entity pair similarities,
as relations and confidences are always identical. Table 8
provides the results. For the oriented measures we assume
that the query is given in ontology 1 and the answer has to
be retrieved in ontology 2. As the oriented measure is
dissymmetric, one has to define this direction beforehand.</p>
      <p>[Table 8: precision (P) and recall (R) for each measure (standard, symmetric, edit, oriented) and each alignment; comparing R with itself yields P = 1.0 and R = 1.0 under all four measures.]</p>
      <p>The measures which have been introduced address the
problems raised in the introduction and fulfill the requirements:
– They keep precision and recall untouched for the best
alignment (R);
– They help discriminate between irrelevant
alignments (A3) and near-target ones (A1 and A2);
– Specialized measures are able to emphasize particular
characteristics of alignments: ease of modification,
correctness, or completeness. For instance, consider the
oriented measures: in our example A1 has two very
near misses, which leads to a relatively high
precision, while in A2 the miss is bigger, but by aligning
one concept to its superconcept, recall rises relative
to precision.</p>
      <p>These results are based on only one example. They have to
be systematized in order to be extensively validated. Our
goal is to implement these measures within the Alignment
API and to use them on the forthcoming results of the
Ontology Alignment Evaluation 2005 (http://oaei.inrialpes.fr/2005/)
in order to have real
data on which the relevance of the proposed measures can
be more openly debated.</p>
    </sec>
    <sec id="sec-18">
      <title>6. RELATED WORK</title>
      <p>
        The naturally relevant work is [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] which has considered
precisely the evaluation of schema matching. However, the
authors only note the other mentioned problem (having two
measures instead of one) and use classical aggregation
(overall and F-measure) of precision and recall.
      </p>
      <p>
        In computational linguistics, and more precisely in
multilingual text alignment, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has considered extending precision
and recall. Their goal is the same as ours: increasing the
discriminating power of the measures. In this work, the
mathematical formulation is not changed, but the granularity
of the compared sets changes: instead of comparing sentences
in a text, they compare words in sentences in a text. This
lets the measures receive some contribution when most
of the words are correctly aligned even though the sentences are
not strictly aligned.
      </p>
      <p>
        In the Alignment API [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], there is another evaluation
measure which directly computes a distance based on a weighted
symmetric difference (the weights being the confidences of each
correspondence in the alignment). This measure could be
used in the generalization proposed here (the distance would
then be based on confidence differences and would generally
satisfy P′(A, R) ≤ P(A, R) and R′(A, R) ≤ R(A, R)).
      </p>
      <p>
        The deeper proposal for extending precision and recall
comes from hierarchical text categorization, in which texts
are attached to some category in a taxonomy [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Usually,
texts are attached to the leaves, but when algorithms attach
them to intermediate categories, it is useful to
discriminate between a category which is irrelevant and a category
which is an immediate super-category of the expected one.
For that purpose, they introduce an extension of precision
(recall is redefined similarly) such that:
      </p>
      <p>PCS = max(0, |A ∩ R| + FpCon + FnCon) / (|A| + FnCon)</p>
      <p>in which FpCon (resp. FnCon) is the contribution of false
positives (resp. false negatives), i.e., the way incorrectly
classified documents could contribute to their incorrect category
anyway. The maximization is necessary to prevent the result
from being negative (because the contribution is defined with
respect to the average such contribution). The contribution
is measured in two ways. The first one is a category
similarity that is computed on the features of categories (categories
and documents are represented by a vector of features, and
membership in some category is based on a distance between
these vectors). The second one is based on the
distance between categories in the taxonomy.</p>
      <p>This measure does not seem to be a generalization of
standard precision and recall as the one presented here. In
particular, because the contributions can be negative, this measure
can be lower than standard precision and recall. The idea
of retracting the contribution from wrongly classified
documents is not far from the idea developed here. However, the
computation of this contribution with regard to some
average and the addition of some contribution to the divisor do
not seem justified.</p>
    </sec>
    <sec id="sec-19">
      <title>7. DISCUSSION</title>
      <p>Evaluation of matching results is often made on the basis
of the well-known and well-understood precision and recall
measures. However, these measures do not discriminate
accurately between methods which do not provide the exact
expected results. In contexts where the resulting alignments have to
be screened by humans, this is an important need.</p>
      <p>We have proposed a framework for generalizing
precision and recall when comparing ontology alignments. It
keeps the advantages of usual precision and recall but helps
discriminate between alignments by identifying near
misses instead of treating them as completely wrong correspondences.
The framework has been instantiated in three different
measures, each one aiming at favoring some particular aspect
of alignment utility. We have shown that these measures indeed
avoid the shortcomings of standard evaluation criteria. They
should, however, be further investigated in order to find
better formulations: more discrepancies need to be considered,
more progressive distances (e.g., beyond direct subclasses), and
a rationalized design of weights.</p>
      <p>This generalization framework is not the only possible one
since we have made a number of choices:
– on the form of the alignment similarity (Definition 4);
– on the kind of alignment matching (Definition 5);
– on the form of the correspondence similarity
(Definition 6).</p>
      <p>More work has to be done in order to assess the potential of
other choices in these functions.</p>
      <p>
        The most important work is to consider these proposed
measures in real evaluation of alignment systems and to identify
good measures for further evaluations. We plan to
implement these measures within the Alignment API [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
process the results of the Ontology Alignment Evaluation 2005.
      </p>
    </sec>
    <sec id="sec-20">
      <title>Acknowledgements</title>
      <p>This work has been partially supported by the Knowledge
Web European network of excellence (IST-2004-507482).
The authors would like to thank Diana Maynard who pointed
out the problem addressed here.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ashpole</surname>
          </string-name>
          .
          <article-title>Ontology translation protocol (ontrapro)</article-title>
          . In E. Messina and
          <string-name>
            <surname>A</surname>
          </string-name>
          . Meystel, editors,
          <source>Proceedings of Performance Metrics for Intelligent Systems (PerMIS '04)</source>
          , Gaithersburg,
          <string-name>
            <surname>MD</surname>
          </string-name>
          , USA,
          <year>August 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Melnik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>Comparison of schema matching evaluations</article-title>
          .
          <source>In Proc. GI-Workshop ”Web and Databases”</source>
          , Erfurt (DE),
          <year>2002</year>
          . http://dol.uni-leipzig.de/pub/2002-28.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.-H.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dhamankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          .
          <article-title>Learning to map ontologies on the semantic web</article-title>
          .
          <source>VLDB journal</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ehrig</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          .
          <article-title>QOM - quick ontology mapping</article-title>
          .
          <source>In Proc. 3rd ISWC</source>
          , Hiroshima (JP),
          <year>November 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>An API for ontology alignment</article-title>
          .
          <source>In Proc. 3rd international semantic web conference</source>
          ,
          <source>Hiroshima (JP)</source>
          , pages
          <fpage>698</fpage>
          -
          <lpage>712</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Layaïda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Dias</surname>
          </string-name>
          .
          <article-title>A semantic framework for multimedia document adaptation</article-title>
          .
          <source>In Proc. 18th International Joint Conference on Artificial Intelligence (IJCAI)</source>
          ,
          <source>Acapulco (MX)</source>
          , pages
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Valtchev</surname>
          </string-name>
          .
          <article-title>Similarity-based ontology alignment in OWL-lite</article-title>
          .
          <source>In Proc. 15th ECAI</source>
          , pages
          <fpage>333</fpage>
          -
          <lpage>337</lpage>
          , Valencia (ES),
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Freksa</surname>
          </string-name>
          .
          <article-title>Temporal reasoning based on semi-intervals</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>54</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>199</fpage>
          -
          <lpage>227</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Langlais</surname>
          </string-name>
          , J. Véronis, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Simard</surname>
          </string-name>
          .
          <article-title>Methods and practical issues in evaluating alignment techniques</article-title>
          .
          <source>In Proc. 17th international conference on Computational linguistics</source>
          , Montréal (CA), pages
          <fpage>711</fpage>
          -
          <lpage>717</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V. I.</given-names>
            <surname>Levenshtein</surname>
          </string-name>
          .
          <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>
          .
          <source>Cybernetics and Control Theory</source>
          ,
          <year>1966</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Noy</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Musen</surname>
          </string-name>
          .
          <article-title>SMART: automated support for ontology merging and alignment</article-title>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sun</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.-P.</given-names>
            <surname>Lim</surname>
          </string-name>
          .
          <article-title>Hierarchical text classification and evaluation</article-title>
          .
          <source>In Proc. IEEE international conference on data mining</source>
          , pages
          <fpage>521</fpage>
          -
          <lpage>528</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          , and T. Hughes, editors.
          <source>Proceedings of the 3rd Evaluation of Ontology-based tools (EON)</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>van Rijsbergen</surname>
          </string-name>
          .
          <source>Information retrieval</source>
          . Butterworths, London (UK)
          ,
          <year>1975</year>
          . http://www.dcs.gla.ac.uk/Keith/Preface.html.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>