=Paper=
{{Paper
|id=Vol-156/paper-5
|storemode=property
|title=Relaxed Precision and Recall for Ontology Matching
|pdfUrl=https://ceur-ws.org/Vol-156/paper5.pdf
|volume=Vol-156
|dblpUrl=https://dblp.org/rec/conf/kcap/EhrigE05
}}
==Relaxed Precision and Recall for Ontology Matching==
Marc Ehrig (Institute AIFB, University of Karlsruhe, Karlsruhe, Germany), ehrig@aifb.uni-karlsruhe.de
Jérôme Euzenat (INRIA Rhône-Alpes, 655 avenue de l'Europe, Montbonnot, France), Jerome.Euzenat@inrialpes.fr
ABSTRACT

In order to evaluate the performance of ontology matching algorithms it is necessary to confront them with test ontologies and to compare the results. The most prominent criteria are precision and recall, originating from information retrieval. However, one alignment may be very close to the expected result and another quite remote from it, and yet both share the same precision and recall. This is due to the inability of precision and recall to measure the closeness of the results. To overcome this problem, we present a framework for generalizing precision and recall. This framework is instantiated by three different measures, and we show on a motivating example that the proposed measures can solve the problem of rigidity of classical precision and recall.

Categories and Subject Descriptors

D.2.12 [Software]: Interoperability; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods; D.2.8 [Software Engineering]: Metrics

General Terms

Measurement, Performance, Experimentation

Keywords

Ontology alignment, evaluation measures, precision, recall

K-CAP'05, Workshop Integrating Ontologies, October 2, 2005, Banff, Alberta, Canada.

1. INTRODUCTION

Ontology matching is an important problem for which many algorithms (e.g., PROMPT [11], GLUE [3], Ontrapro [1], OLA [7], FOAM [4]) have been provided. In order to evaluate the performance of these algorithms it is necessary to confront them with test ontologies and to compare the results. The most prominent criteria are precision and recall, originating from information retrieval and adapted to the ontology matching task. Precision and recall are based on the comparison of the resulting alignment with a standard alignment, effectively comparing which correspondences are found and which are not. These criteria are well understood and widely accepted.

However, as we have experienced in last year's Ontology Alignment Contest [13], they have the drawback of being of the all-or-nothing kind. An alignment may be very close to the expected result and another quite remote from it, and both return the same precision and recall. The reason for this is that the criteria only compare two sets of correspondences without considering whether these are close or remote to each other: if they are not exactly the same correspondences, they score zero. Both then score identically low, despite their different quality. It may be helpful for users to know whether the found alignments are close to the expected one and easily repairable or not. It is thus necessary to measure the proximity between alignments instead of their strict equality.

In this paper we investigate some measures that generalize precision and recall in order to overcome the problems presented above. We first provide the basic definitions of alignments, precision and recall as well as a motivating example (§2). We then present a framework for generalizing precision and recall (§3). This framework is instantiated by four different measures (including classical precision and recall) (§4), and we show on the motivating example that the proposed measures do not exhibit the rigidity of classical precision and recall (§5).

2. FOUNDATIONS

2.1 Alignment

DEFINITION 1 (ALIGNMENT, CORRESPONDENCE). Given two ontologies O and O′, an alignment between O and O′ is a set of correspondences (i.e., 4-tuples) ⟨e, e′, r, n⟩ with e ∈ O and e′ ∈ O′ being the two matched entities, r being a relationship holding between e and e′, and n expressing the level of confidence [0..1] in this correspondence.

A matching algorithm returns an alignment A which is compared with a reference alignment R.
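Definition 1 can be rendered as a tiny data model. The following is our own minimal Python encoding for illustration (the entity names are made up); it is not the actual Alignment API:

```python
from typing import NamedTuple

class Correspondence(NamedTuple):
    """A correspondence ⟨e, e′, r, n⟩ from Definition 1."""
    e: str        # entity from ontology O
    e2: str       # entity from ontology O′
    r: str        # relation holding between e and e2, e.g. '='
    n: float      # confidence in [0, 1]

# An alignment is simply a set of correspondences; a matcher returns A,
# which is then compared with a reference alignment R.
A = {Correspondence("Car", "Automobile", "=", 1.0)}
R = {Correspondence("Car", "Automobile", "=", 1.0),
     Correspondence("Wheel", "Tyre", "=", 1.0)}
```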
Let us illustrate this through a simple example. Figure 1 presents two ontologies together with two alignments A1 and R. In this example, for the sake of simplification, the relation is always '=' and the confidence is always 1.0.

The alignment A1 is defined as follows:

We present another reasonable alignment A2:

and an obviously wrong alignment A3:

Further, we have the following reference alignment (R):

2.2 Precision and Recall

The usual approach for evaluating the returned alignments is to consider them as sets of correspondences and to check the overlap of the two sets. This is naturally obtained by applying the classical measures of precision and recall [14], which are the ratio of the number of true positives (|R ∩ A|) to the retrieved correspondences (|A|) or to those to be retrieved (|R|), respectively.

DEFINITION 2 (PRECISION, RECALL). Given a reference alignment R, the precision of some alignment A is given by

P(A, R) = |R ∩ A| / |A|

and recall is given by

R(A, R) = |R ∩ A| / |R|.

2.3 Problems with Current Measures

However, even if the above measures are easily understandable and widespread, they are often criticized for two reasons: neither do they discriminate between a totally wrong and an almost correct alignment, nor do they measure the user effort needed to adapt the alignment.

Indeed, it often makes sense not only to have a decision on whether a particular correspondence has been found or not, but to measure the proximity of the found alignments. This implies that "near misses" are also taken into consideration instead of only the exact matches.

As a matter of example, it will be clear to anybody that, among the alignments presented above, A3 is not a very good alignment while A1 and A2 are better alignments. However, they score almost exactly the same in terms of precision (.2) and recall (.2).

Moreover, the alignments will have to go through user scrutiny and correction before being used. It is worth measuring the effort required from the user for correcting the provided alignment instead of only whether some correction is needed. This also calls for a relaxation of precision and recall.

3. GENERALIZING PRECISION AND RECALL

Because precision and recall are well-known and easily explained measures, it is good to adhere to them and extend them. This also brings the benefit that measures derived from precision and recall, such as F-measure, can still be computed. For these reasons, we propose to generalize these measures.

If we want to generalize precision and recall, we should be able to measure the proximity of correspondence sets rather than their strict overlap. Instead of taking the cardinality of the intersection of the two sets (|R ∩ A|), we measure their proximity (ω).

DEFINITION 3 (GENERALIZED PRECISION AND RECALL). Given a reference alignment R and an overlap function ω between alignments, the precision of an alignment A is given by

Pω(A, R) = ω(A, R) / |A|

and recall is given by

Rω(A, R) = ω(A, R) / |R|.

3.1 Basic Properties

In order for these new measures to be true generalizations, we would like ω to share some properties with |R ∩ A|. In particular, the measure should be positive:

∀A, B, ω(A, B) ≥ 0 (positiveness)

and should not exceed the minimal size of both sets:

∀A, B, ω(A, B) ≤ min(|A|, |B|) (maximality)
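Definitions 2 and 3 and the properties above can be sketched in a few lines. This is a minimal Python illustration with a tuple encoding of our own (toy data, made-up names); it shows that the strict overlap is one admissible ω and recovers the classical measures:

```python
def precision_w(A, R, omega):
    """Generalized precision P_ω(A, R) = ω(A, R) / |A| (Definition 3)."""
    return omega(A, R) / len(A)

def recall_w(A, R, omega):
    """Generalized recall R_ω(A, R) = ω(A, R) / |R| (Definition 3)."""
    return omega(A, R) / len(R)

def strict_overlap(A, R):
    """ω(A, R) = |A ∩ R| recovers classical precision and recall."""
    return len(A & R)

# Correspondences as (e, e2, relation, confidence) tuples, toy data.
R = {("a", "x", "=", 1.0), ("b", "y", "=", 1.0)}
A = {("a", "x", "=", 1.0), ("c", "z", "=", 1.0)}

# The properties of §3.1 hold for the strict overlap:
assert strict_overlap(A, R) >= 0                    # positiveness
assert strict_overlap(A, R) <= min(len(A), len(R))  # maximality
assert precision_w(A, R, strict_overlap) == 0.5     # classical precision
assert recall_w(A, R, strict_overlap) == 0.5        # classical recall
```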
Figure 1: Two Aligned Ontologies. (The figure, not reproduced here, shows Ontology 1 and Ontology 2 with a legend for concepts, relations, and instances, the correct alignment R, and the found alignment A1.)
If we want to preserve precision and recall results, ω should only add more flexibility to the usual precision and recall. So their values cannot be worse than the initial evaluation:

∀A, B, ω(A, B) ≥ |A ∩ B| (boundedness)

Hence, the main constraint faced by the proximity is the following:

|A ∩ R| ≤ ω(A, R) ≤ min(|A|, |R|)

This is indeed a true generalization because |A ∩ R| satisfies all these properties. One more property satisfied by precision and recall that we will not enforce here is symmetry, which guarantees that the precision and recall measures are true normalized similarities:

∀A, B, ω(A, B) = ω(B, A) (symmetry)

We will not require symmetry, especially since A and R are not in symmetrical positions.

3.2 Designing Overlap Proximity

There are many different ways to design such a proximity given two sets. We retain here the most obvious one, which consists of finding correspondences matching each other and computing the sum of their proximities. This can be defined as an overlap proximity:

DEFINITION 4 (OVERLAP PROXIMITY). A measure that generalizes precision and recall is:

ω(A, R) = Σ_{⟨a,r⟩ ∈ M(A,R)} σ(a, r)

in which M(A, R) is a matching between the correspondences of A and R, and σ(a, r) is a proximity function between two correspondences.

Again, the standard overlap |A ∩ R| used in precision and recall is such an overlap proximity.

There are two tasks to fulfill when designing such an overlap proximity function:

– the first one consists of finding the correspondences to be compared, M;
– the second one is to define a proximity measure σ on correspondences.

We consider these two issues below.

3.3 Matching Correspondences

A matching between alignments is a set of correspondence pairs, i.e., M(A, R) ⊆ A × R. However, if we want to keep the analogy with precision and recall, it will be necessary to restrict ourselves to the matchings in which an entity from
the ontology does not appear twice. This is compatible with precision and recall for two reasons: (i) in these measures, any correspondence is identified only with itself, and (ii) appearing more than once in the matching would not guarantee an overlap proximity below min(|A|, |R|).

There are |A|! / (|A| − |R|)! candidate matchings (if |A| ≥ |R|). The natural choice is to select the best match because this guarantees that the function generalizes precision and recall.

DEFINITION 5 (BEST MATCH). The best match M(A, R) between two sets of correspondences A and R is the subset of A × R which maximizes the overall proximity and in which each element of A (resp. R) belongs to only one pair:

M(A, R) ∈ Max_{ω(A,R)} {M ⊆ A × R}

As defined here, this best match may not be unique. This is not a problem, because we only want to find the highest value for ω, and any of the best matches will yield the same value.

Of course, the definitions of M and ω are dependent on each other, but this does not prevent us from computing them. They are usually computed together, but it is better to present them separately.

3.4 Correspondence Proximity

In order to compute ω(A, R), we need to measure the proximity between two matched correspondences (i.e., ⟨a, r⟩ ∈ M(A, R)) on the basis of how close the result is to the ideal one. Each element in the tuple a = ⟨ea, e′a, ra, na⟩ will be compared with its counterpart in r = ⟨er, e′r, rr, nr⟩. For any two correspondences (the found a and the reference r) we compute three similarities: σpair, σrel, and σconf. If elements are identical, the proximity has to be one (maximality). If they differ, the proximity is lower, always according to the chosen strategy. In contrast to the standard definition of similarity, the mentioned proximity measures do not necessarily have to be symmetric. We will only consider normalized proximities, i.e., measures whose values are within the unit interval [0..1], because this guarantees that

ω(A, R) ≤ min(|A|, |R|)

The component proximity measures are defined in the following way:

σpair(⟨ea, er⟩, ⟨e′a, e′r⟩): How similar is one entity pair to another entity pair? In ontologies we can in principle follow any relation which exists (e.g., subsumption, instantiation), or which can be derived in a meaningful way. The most important parameters are the relations to follow and their effect on the proximity.

σrel(ra, rr): Often the alignment relations are more complex, e.g., they represent subsumption, instantiation, or compositions. Again, one has to assess the similarity between these relations. The two relations of the alignment cell can be compared based on their distance in a conceptual neighborhood structure [6, 8].

σconf(na, nr): Finally, one has to decide what to do with different levels of confidence. The similarity could simply be based on the difference. Unfortunately, none of the current alignment approaches have an explicit meaning attached to confidence values, which makes it rather difficult to define an adequate proximity.

Once these proximities are established, they have to be aggregated. The constraints on the aggregation function (Aggr) are:

– normalization preservation: if ∀i, 0 ≤ ci ≤ 1, then 0 ≤ Aggr_i ci ≤ 1;
– maximality: if ∀i, ci = 1, then Aggr_i ci = 1;
– local monotonicity: if ∀i ≠ j, ci = c′i = c″i and cj ≤ c′j ≤ c″j, then Aggr_i ci ≤ Aggr_i c′i ≤ Aggr_i c″i.

Here, we consider aggregating them through multiplication, without further justification. Other aggregations (e.g., weighted sum) are also possible.

DEFINITION 6 (CORRESPONDENCE PROXIMITY). Given two correspondences ⟨ea, e′a, ra, na⟩ and ⟨er, e′r, rr, nr⟩, their proximity is:

σ(⟨ea, e′a, ra, na⟩, ⟨er, e′r, rr, nr⟩) = σpair(⟨ea, er⟩, ⟨e′a, e′r⟩) × σrel(ra, rr) × σconf(na, nr)

We have provided constraints and definitions for M, ω, and σ. We now turn to concrete measures.

4. CONCRETE MEASURES

We consider four cases of relaxed precision and recall measures based on the above definitions. We first give the definition of usual precision and recall within this framework.

4.1 Standard Precision and Recall

For standard precision and recall, the value of ω is |A ∩ R|. This is indeed an instance of this framework, if the proximity used is based on the strict equality of the components of correspondences.

DEFINITION 7 (EQUALITY PROXIMITY). The equality
proximity is characterized by:

σpair(⟨ea, e′a⟩, ⟨er, e′r⟩) = 1 if ⟨ea, e′a⟩ = ⟨er, e′r⟩, 0 otherwise
σrel(ra, rr) = 1 if ra = rr, 0 otherwise
σconf(na, nr) = 1 if na = nr, 0 otherwise

4.2 Symmetric Proximity

The easiest way to relax precision and recall is to have some distance δ on the elements in ontologies and to weight the proximity with the help of this distance: the higher the distance between two entities in the matched correspondences, the lower their proximity. This can be defined as:

δ(ea, er) ≤ δ(eb, er) and δ(e′a, e′r) ≤ δ(e′b, e′r) ⟹ σ(⟨ea, e′a⟩, ⟨er, e′r⟩) ≥ σ(⟨eb, e′b⟩, ⟨er, e′r⟩)

As a simple example of such a symmetric similarity, we use a distance in which a class is at distance 0 from itself, at distance 0.5 from its direct sub- and superclasses, and at distance 1 from any other class. This could be further refined by having a similarity inversely proportional to the distance in the subsumption tree. Likewise, this similarity may also be applied to properties and instances (through part-of relationships in the latter case). The similarity between pairs is the complement of these distances. The result is displayed in Table 1. We always mention the assumed alignment and the actual correct alignment.

found | closest correct | similarity σpair | comment
e,e′ | e,e′ | 1 | correct correspondence
c,c′ | c,sup(c′) | 0.5 | returns more specialized instances
c,c′ | sup(c),c′ | 0.5 | returns more general instances
c,c′ | c,sub(c′) | 0.5 | returns more general instances
c,c′ | sub(c),c′ | 0.5 | returns more specialized instances
r,r′ | r,sup(r′) | 0.5 | returns more spec. relation instances
r,r′ | sup(r),r′ | 0.5 | returns more gen. relation instances
r,r′ | r,sub(r′) | 0.5 | returns more gen. relation instances
r,r′ | sub(r),r′ | 0.5 | returns more spec. relation instances
i,i′ | i,super(i′) | 0.5 | returns a more restricted instance
i,i′ | super(i),i′ | 0.5 | returns a too broad instance
i,i′ | i,sub(i′) | 0.5 | returns a too broad instance
i,i′ | sub(i),i′ | 0.5 | returns a more restricted instance

Table 1: Similarities based on Entity Pairs

Table 2 considers the proximity between relations. It only presents the similarity between equality (=) and other relations.

found relation | correct relation | similarity σrel | comment
e = e′ | e = e′ | 1 | correct relation
c = c′ | c ⊂ c′ | 0.5 | returns more instances than correct
c = c′ | c ⊃ c′ | 0.5 | returns fewer instances than possible, but these are correct
r = r′ | r ⊂ r′ | 0.5 |
r = r′ | r ⊃ r′ | 0.5 |
i = i′ | i partOf i′ | 0.5 |
i = i′ | i consistsOf i′ | 0.5 |

Table 2: Similarities based on Relations

For the confidence distance we simply take the complement of the difference. The final precision is calculated according to the formula presented in the previous section:

DEFINITION 8 (SYMMETRIC PROXIMITY). The symmetric proximity is characterized by:

σpair(⟨ea, e′a⟩, ⟨er, e′r⟩) as defined in Table 1
σrel(ra, rr) as defined in Table 2
σconf(na, nr) = 1 − |na − nr|.

4.3 Measuring Correction Effort

If users have to check and correct alignments, the quality of alignment algorithms can be measured through the effort required for transforming the obtained alignment into the (correct) reference one [2].

This measure can be implemented as an edit distance [10]: an edit distance defines a number of operations by which an object can be corrected (here, the authorized operations on correspondences) and assigns a cost to each of these operations (here, the effort required to identify and repair some mistake). The cost of a sequence of operations is the sum of their costs, and the distance between two objects is the cost of the least costly sequence of operations that transforms one object into the other. The result can always be normalized by the size of the largest object. Such a distance can be turned into a proximity by taking its complement with regard to 1.

Table 3 provides such plausible weights. Usually classes are organized in a taxonomy in which they have fewer direct super- than subclasses. It is thus easier to correct a class to (one of) its superclasses than to one of its subclasses. As a consequence, the proximity is dissymmetric. Such a measure should also add some effort when classes are not directly related, but this has not been considered here.

The edit distance between relations is relatively easy to design since, generally, changing from one relation to another can be done with just one click. Thus, the relational similarity equals 1 if the relations are the same and 0.5 otherwise.

In this correction effort measure, the confidence factor does not play an important role: ordering the correspondences can only help the user to know that after some point she will have to discard many correspondences. We thus decided not to take confidence into account; their proximity will therefore always be 1.
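Definitions 4 and 5 together with the symmetric proximity of Definition 8 can be sketched computationally. This is our own illustration, not the Alignment API: the toy taxonomy and names are made up, the matching is brute force, σrel is a crude stand-in for Table 2, and the way the two sides of σpair are combined (product of complements; the paper's tables list only single-side deviations) is an assumption:

```python
from itertools import permutations

def omega_best_match(A, R, sigma):
    """Overlap proximity ω(A, R) (Definitions 4 and 5): maximize the summed
    correspondence proximity over injective matchings. Brute force over
    permutations, so only suitable for small alignments."""
    A, R = list(A), list(R)
    if len(A) < len(R):
        # sigma(found, reference) may be asymmetric: preserve argument order.
        A, R = R, A
        sig = lambda x, y: sigma(y, x)
    else:
        sig = sigma
    best = 0.0
    # Since sigma >= 0, it suffices to consider matchings covering all of R.
    for perm in permutations(A, len(R)):
        best = max(best, sum(sig(a, r) for a, r in zip(perm, R)))
    return best

def sigma_symmetric(a, r, dist):
    """Symmetric proximity (Definition 8) for correspondences encoded as
    (e, e2, relation, confidence) tuples, given a class distance dist:
    0 for equal entities, 0.5 for direct sub/superclasses, 1 otherwise."""
    ea, ea2, ra, na = a
    er, er2, rr, nr = r
    s_pair = (1 - dist(ea, er)) * (1 - dist(ea2, er2))  # assumed combination
    s_rel = 1.0 if ra == rr else 0.5                    # crude stand-in for Table 2
    s_conf = 1 - abs(na - nr)                           # complement of the difference
    return s_pair * s_rel * s_conf

# Hypothetical toy taxonomy: "SUV" is a direct subclass of "Car".
def toy_dist(x, y):
    if x == y:
        return 0.0
    if {x, y} == {"SUV", "Car"}:
        return 0.5
    return 1.0

a = ("SUV", "Auto", "=", 1.0)  # found correspondence: a near miss
r = ("Car", "Auto", "=", 1.0)  # reference correspondence
sigma = lambda x, y: sigma_symmetric(x, y, toy_dist)
print(omega_best_match({a}, {r}, sigma))  # 0.5, instead of the strict overlap's 0
```

With the strict equality proximity of Definition 7 plugged in as sigma, this reduces to |A ∩ R|; substituting the effort-based or oriented tables yields the other measures below.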
found | closest correct | effort | similarity σpair | comment
e,e′ | e,e′ | 0 | 1 | correct alignment
c,c′ | c,sup(c′) | 0.4 | 0.6 | returns more spec. instances
c,c′ | sup(c),c′ | 0.4 | 0.6 | returns more gen. instances
c,c′ | c,sub(c′) | 0.6 | 0.4 | returns more gen. instances
c,c′ | sub(c),c′ | 0.6 | 0.4 | returns more spec. instances
r,r′ | r,sup(r′) | 0.4 | 0.6 |
r,r′ | sup(r),r′ | 0.4 | 0.6 |
r,r′ | r,sub(r′) | 0.6 | 0.4 |
r,r′ | sub(r),r′ | 0.6 | 0.4 |
i,i′ | i,super(i′) | 0.4 | 0.6 | returns a more restricted inst.
i,i′ | super(i),i′ | 0.4 | 0.6 | returns a too broad inst.
i,i′ | i,sub(i′) | 0.6 | 0.4 | returns a too broad inst.
i,i′ | sub(i),i′ | 0.6 | 0.4 | returns a more restricted inst.

Table 3: Effort-based proximity between Entity Pairs

DEFINITION 9 (EFFORT-BASED PROXIMITY). The effort-based proximity is characterized by:

σpair(⟨ea, e′a⟩, ⟨er, e′r⟩) as defined in Table 3
σrel(ra, rr) = 1 if ra = rr, 0.5 otherwise
σconf(na, nr) = 1 if na ≠ 0 and nr ≠ 0, 0 otherwise

To be accurate, such an effort proximity would have been better aggregated with an additive and normalized aggregation function rather than multiplication.

4.4 Precision- and Recall-oriented Measures

One can also decide to use two different similarities depending on their application for evaluating either precision or recall. We here provide two such measures and justify the given weights. Precision is normally a measure of accuracy, i.e., the returned results need to be correct. Every wrong result will therefore entail a penalty. We assume the user poses a query to the system as follows: "return me all instances of e". The system then returns any instance corresponding to the alignment, i.e., of e′. Vice versa, for the relaxed recall we want to avoid missing any correct result. This affects the similarity relations and weights.

4.4.1 Relaxed Precision

In Tables 4 and 5 we present the precision similarity for pairs and relations. The comments in each line explain the decision for the weights.

found | closest correct | similarity σpair | comment
e,e′ | e,e′ | 1 | correct correspondence
c,c′ | c,sup(c′) | 1 | returns more specialized instances, these are correct
c,c′ | sup(c),c′ | 0.5 | returns more general instances, includes some correct results
c,c′ | c,sub(c′) | 0.5 | returns more general instances, includes some correct results
c,c′ | sub(c),c′ | 1 | returns more specialized instances, these are correct
r,r′ | r,sup(r′) | 1 |
r,r′ | sup(r),r′ | 0.5 |
r,r′ | r,sub(r′) | 0.5 |
r,r′ | sub(r),r′ | 1 |
i,i′ | i,super(i′) | 0.5 | returns a more restricted instance
i,i′ | super(i),i′ | 0 | returns a too broad instance
i,i′ | i,sub(i′) | 0 | returns a too broad instance
i,i′ | sub(i),i′ | 0.5 | returns a more restricted instance

Table 4: Similarities for Relaxed Precision based on Entity Pairs

found relation | correct relation | similarity σrel | comment
e = e′ | e = e′ | 1 | correct relation
c = c′ | c ⊂ c′ | 0.5 | returns more instances than correct
c = c′ | c ⊃ c′ | 1 | returns fewer instances than possible, but these are correct
r = r′ | r ⊂ r′ | 0.5 |
r = r′ | r ⊃ r′ | 1 |
i = i′ | i partOf i′ | 0.5 |
i = i′ | i consistsOf i′ | 1 |

Table 5: Similarities for Relaxed Precision based on Relations

For the distance within the confidence we again use the complement of the difference.

DEFINITION 10 (PRECISION-ORIENTED PROXIMITY). The precision-oriented proximity is characterized by:

σpair(⟨ea, e′a⟩, ⟨er, e′r⟩) as defined in Table 4
σrel(ra, rr) as defined in Table 5
σconf(na, nr) = 1 − |na − nr|.

4.4.2 Relaxed Recall

In Tables 6 and 7 we present the recall similarity for pairs and relations. Basically many distances are just mirrored compared to the precision case.

found | closest correct | similarity σpair | comment
e,e′ | e,e′ | 1 | correct correspondence
c,c′ | c,sup(c′) | 0.5 | returns more specialized instances, misses some
c,c′ | sup(c),c′ | 1 | returns more general instances, includes the correct results
c,c′ | c,sub(c′) | 1 | returns more general instances, includes the correct results
c,c′ | sub(c),c′ | 0.5 | returns more specialized instances, misses some
r,r′ | r,sup(r′) | 0.5 |
r,r′ | sup(r),r′ | 1 |
r,r′ | r,sub(r′) | 1 |
r,r′ | sub(r),r′ | 0.5 |
i,i′ | i,super(i′) | 0 | returns a more restricted instance, misses correct
i,i′ | super(i),i′ | 0.5 | returns a broader instance
i,i′ | i,sub(i′) | 0.5 | returns a broader instance
i,i′ | sub(i),i′ | 0 | returns a more restricted instance, misses correct

Table 6: Similarities for Relaxed Recall based on Entity Pairs
found relation | correct relation | similarity σrel | comment
e = e′ | e = e′ | 1 | correct relation
c = c′ | c ⊂ c′ | 0 | returns more instances than correct
c = c′ | c ⊃ c′ | 0.5 | returns fewer instances than possible, misses some
r = r′ | r ⊂ r′ | 0 |
r = r′ | r ⊃ r′ | 0.5 |
i = i′ | i partOf i′ | 0 |
i = i′ | i consistsOf i′ | 0.5 |

Table 7: Similarities for Relaxed Recall based on Relations

The final recall is computed as usual:

DEFINITION 11 (RECALL-ORIENTED PROXIMITY). The recall-oriented proximity is characterized by:

σpair(⟨ea, e′a⟩, ⟨er, e′r⟩) as defined in Table 6
σrel(ra, rr) as defined in Table 7
σconf(na, nr) = 1 − |na − nr|.

5. EXAMPLE

In the introduction of this paper we have presented a pair of ontologies, the reference alignment, and three different identified alignments. We now apply the different proposed precision and recall measures to these example alignments. Please note that they mainly illustrate entity pair similarities, as relations and confidences are always identical. Table 8 provides the results. For the oriented measures we assume that the query is given in ontology 1 and the answer has to be retrieved in ontology 2. As the oriented measures are dissymmetric, one has to define this direction beforehand.

ω | (R,R): P, R | (R,A1): P, R | (R,A2): P, R | (R,A3): P, R
standard | 1.0, 1.0 | 0.2, 0.2 | 0.25, 0.2 | 0.2, 0.2
symmetric | 1.0, 1.0 | 0.4, 0.4 | 0.375, 0.3 | 0.2, 0.2
edit | 1.0, 1.0 | 0.44, 0.44 | 0.35, 0.28 | 0.2, 0.2
oriented | 1.0, 1.0 | 0.5, 0.5 | 0.375, 0.4 | 0.2, 0.2

Table 8: Precision and recall results on the alignments of Figure 1

The measures which have been introduced address the problems raised in the introduction and fulfill the requirements:

– They keep precision and recall untouched for the best alignment (R);
– They help discriminating between irrelevant alignments (A3) and near-target ones (A1 and A2);
– Specialized measures are able to emphasize some characteristics of alignments: ease of modification, correctness, or completeness. For instance, consider the oriented measures. In our example A1 has two very near misses, which leads to a relatively high precision. In A2, however, the miss is bigger, but by aligning one concept to its superconcept, recall rises relative to precision.

These results are based on only one example. They have to be systematized in order to be extensively validated. Our goal is to implement these measures within the Alignment API and to use them on the forthcoming results of the Ontology Alignment Evaluation 2005 (http://oaei.inrialpes.fr/2005/) in order to have real data on which the relevance of the proposed measures can be more openly debated.

6. RELATED WORK

The most naturally relevant work is [2], which has considered precisely the evaluation of schema matching. However, the authors only note the other mentioned problem (having two measures instead of one) and use classical aggregations (overall and F-measure) of precision and recall.

In computational linguistics, and more precisely multilingual text alignment, [9] has considered extending precision and recall. Their goal is the same as ours: increasing the discriminating power of the measures. In this work, the mathematical formulation is not changed but the granularity of the compared sets changes: instead of comparing sentences in a text, they compare words in sentences in a text. This lets the measures register some contribution when most of the words are correctly aligned even though the sentences are not strictly aligned.

In the Alignment API [5], there is another evaluation measure which directly computes a distance based on a weighted symmetric difference (the weights are the confidences of each correspondence in the alignment). This measure could be used in the generalization proposed here (the distance would then be based on confidence difference) and would generally satisfy P′(A, R) ≤ P(A, R) and R′(A, R) ≤ R(A, R).

The deeper proposal for extending precision and recall comes from hierarchical text categorization, in which texts are attached to some category in a taxonomy [12]. Usually, texts are attached to the leaves, but when algorithms attach them to intermediate categories, it is useful to discriminate between a category which is irrelevant and a category which is an immediate supercategory of the expected one. For that purpose, they introduce an extension of precision (recall is redefined similarly) such that:

PCS = max(0, |A ∩ R| + FpCon + FnCon) / (|A| + FnCon)

in which FpCon (resp. FnCon) is the contribution of false positives (resp. false negatives), i.e., the way incorrectly classified documents could contribute to their incorrect category anyway. The maximization is necessary to prevent the result from being negative (because the contribution is defined with respect to the average such contribution). The contribution is measured in two ways. The first one is a category similarity that is computed on the features of categories (categories and documents are represented by a vector of features and the membership to some category is based on a distance between these vectors). The second one is based on the distance between categories in the taxonomy.

This measure does not seem to be a generalization of standard precision and recall like the one presented here. In particular, because the contributions can be negative, this measure can be lower than standard precision and recall. The idea of retracting the contribution of wrongly classified documents is not far from the idea developed here. However, the computation of this contribution with regard to some average and the addition of some contribution to the divisor do not seem justified.

7. DISCUSSION

Evaluation of matching results is often made on the basis of the well-known and well-understood precision and recall measures. However, these measures do not discriminate accurately between methods which do not provide the exact results. In contexts where the results of alignments have to be screened by humans, this is an important need.

We have proposed a framework for generalizing precision and recall when comparing ontology alignments. It keeps the advantages of usual precision and recall but helps discriminating between alignments by identifying near misses instead of treating them as completely wrong correspondences.

The framework has been instantiated in three different measures, each one aiming at favoring some particular aspect of alignment utility. We show that these measures indeed avoid the shortcomings of standard evaluation criteria. They should, however, be further investigated in order to find better formulations: more discrepancies need to be considered, more progressive distances (e.g., beyond direct subclasses), and a rationalized design of weights.

This generalization framework is not the only possible one, since we have made a number of choices:

– on the form of the alignment similarity (Definition 4);
– on the kind of alignment matching (Definition 5);
– on the form of the correspondence similarity (Definition 6).

More work has to be done in order to assess the potential of other choices for these functions.

The most important work is to consider these proposed measures in real evaluations of alignment systems and to identify good measures for further evaluations. We plan to implement these measures within the Alignment API [5] and process the results of the Ontology Alignment Evaluation 2005.

Acknowledgements

This work has been partially supported by the Knowledge Web European network of excellence (IST-2004-507482). The authors would like to thank Diana Maynard, who pointed out the problem addressed here.

8. REFERENCES

[1] B. Ashpole. Ontology translation protocol (Ontrapro). In E. Messina and A. Meystel, editors, Proceedings of Performance Metrics for Intelligent Systems (PerMIS '04), Gaithersburg, MD, USA, August 2004.

[2] H.-H. Do, S. Melnik, and E. Rahm. Comparison of schema matching evaluations. In Proc. GI-Workshop "Web and Databases", Erfurt (DE), 2002. http://dol.uni-leipzig.de/pub/2002-28.

[3] A.-H. Doan, J. Madhavan, R. Dhamankar, P. Domingos, and A. Halevy. Learning to map ontologies on the semantic web. VLDB Journal, 2003.

[4] M. Ehrig and S. Staab. QOM - quick ontology mapping. In Proc. 3rd ISWC, Hiroshima (JP), November 2004.

[5] J. Euzenat. An API for ontology alignment. In Proc. 3rd International Semantic Web Conference, Hiroshima (JP), pages 698–712, 2004.

[6] J. Euzenat, N. Layaïda, and V. Dias. A semantic framework for multimedia document adaptation. In Proc. 18th International Joint Conference on Artificial Intelligence (IJCAI), Acapulco (MX), pages 31–36, 2003.

[7] J. Euzenat and P. Valtchev. Similarity-based ontology alignment in OWL-Lite. In Proc. 15th ECAI, pages 333–337, Valencia (ES), 2004.

[8] C. Freksa. Temporal reasoning based on semi-intervals. Artificial Intelligence, 54(1–2):199–227, 1992.

[9] P. Langlais, J. Véronis, and M. Simard. Methods and practical issues in evaluating alignment techniques. In Proc. 17th International Conference on Computational Linguistics, Montréal (CA), pages 711–717, 1998.

[10] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 1966.

[11] N. Noy and M. Musen. SMART: Automated support for ontology merging and alignment, 1999.

[12] A. Sun and E.-P. Lim. Hierarchical text classification and evaluation. In Proc. IEEE International Conference on Data Mining, pages 521–528, 2001.

[13] Y. Sure, O. Corcho, J. Euzenat, and T. Hughes, editors. Proceedings of the 3rd Evaluation of Ontology-based Tools (EON), 2004.

[14] C. J. van Rijsbergen. Information Retrieval. Butterworths, London (UK), 1975. http://www.dcs.gla.ac.uk/Keith/Preface.html.