Ontology Alignment Through Instance Negotiation: a Machine Learning Approach

Ignazio Palmisano†, Luigi Iannone†, Domenico Redavid∗, Giovanni Semeraro∗
∗ Dipartimento di Informatica, Università degli Studi di Bari, Via Orabona 4, 70125 Bari, Italy. Email: {redavid, semeraro}@di.uniba.it
† Computer Science Department, Liverpool University, Ashton Building, Ashton Street, L69 BX Liverpool, UK. Email: {ignazio, luigi}@csc.liv.ac.uk

Abstract— The Semantic Web needs methodologies to accomplish actual commitment on shared ontologies among the different actors in play. In this paper, we propose a machine learning approach to this issue, relying on classified instance exchange and inductive reasoning. The approach is based on the idea that, whenever two (or more) software entities need to align their ontologies (which amounts, from the point of view of each entity, to adding one or more new concept definitions to its own ontology), it is possible to learn the new concept definitions starting from shared individuals, i.e. individuals already described in terms of both ontologies, for which the entities have statements about classes and related properties. These individuals, arranged in two sets of positive and negative examples for the target definition, are used to solve a learning problem whose solution gives the definition of the target concept in terms of the ontology used for the learning process. The method has been applied in a preliminary prototype for a small multi-agent scenario (where the two entities cited above are instantiated as two software agents). Following the prototype presentation, we report on the experimental results we obtained and then draw some conclusions.

I. MOTIVATION

The Semantic Web (SW), being an evolution of the World Wide Web, will inherit the Web's decentralized architecture. Decentralization, in its turn, was one of the success factors of the Internet, granting its structure scalability, no single point of failure, and so on. However, in order to implement the SW scenario envisioned in [1], semantic convergence is crucial. This point has always been identified with the employment of ontologies that, even before the rise of the SW, were considered as "shared formalizations of conceptualizations" [2].

In the SW vision, different actors that want to take advantage from interoperating should be able to converge onto shared ontologies in order to communicate. Such a problem turned out to be crucial and very difficult to work out. Indeed, each party involved has its peculiar view of the domain it shares with other parties. Very often, different applications are interested in different aspects of the same world state, e.g. in a typical B2B scenario in which suppliers and final customers are usually interested in different aspects of the goods that are dealt with. Both quantitative and, more likely, qualitative concepts often turn out to be fuzzy, depending on the actors interested in assessing them in order to interoperate. Suppliers may likely consider features like materials, provenance, and standard processes, whereas customer satisfaction may also depend on other factors, e.g. warranties, comfort, and support, which may turn out to be only marginal for the suppliers, unexpressed or simply difficult to acquire.

In such cases, partially overlapping visions of the same world should be integrated in value chains in order to provide goods. However, the wide range of possible B2B scenarios hinders a centralized approach to ontology sharing for semantic convergence, which also contrasts with the architectural paradigm of the SW. Indeed, such a scenario requires flexibility, since it basically builds up open situations where the interacting actors cannot be precisely individuated in advance. Hence, a central ontology would have to foresee a huge number of particular situations, and it could therefore be detailed enough, and therefore useful, only for restricted cases and toy domains. Moreover, single application ontologies are supposed to vary over time, an issue that is not easily manageable in centralized approaches.
We agree on the fact that semantic convergence has to be addressed at the ontology level, but we argue that the approaches for solving such issues should fulfil the following properties to be effectively applied in the mentioned scenarios:
• they have to be automatizable, so that applications can integrate with each other without human intervention (or with as little human intervention as possible);
• they have to be flexible: it should be possible to apply the same convergence schema to any domain on which the actors have to communicate their own (partially overlapping) conceptualizations;
• they have to be on-demand and limited in scope. The aim is not to fully map the actors' conceptualizations, which is often impossible or even useless, but to reconcile only those parts that are necessary for the dialogue.

Developing platforms enabling SW systems to communicate with each other, without committing to a superimposed ontology, ensures a very loose coupling among them, which allows for independent evolution and maintenance of the applications. In our opinion, other SW technologies, spanning from Web Service discovery, orchestration, and choreography up to software agents immersed in the SW, could profit from the solutions that research will find to these issues.

In this paper, we suggest a Machine Learning approach to accomplish semantic integration, taking into account the requirements listed above. The remainder of the paper is organized as follows: the next section briefly summarizes the state of the art (to our knowledge) on these problems. Sect. III illustrates our approach to Ontology Alignment together with a proposed algorithm. Sect. IV explains in detail the Machine Learning techniques used to implement the algorithm, while Sect. V presents the implemented prototype employed to carry out the preliminary empirical evaluation reported in the same section. Finally, in Sect. VI, some conclusions are drawn, outlining future work directions.

II. RELATED WORK

In this section, we present a short discussion of some recent relevant research describing different approaches to ontology alignment. Ontology Alignment is a broad umbrella for a variety of subtasks that try to solve some of the issues pointed out in the previous section. In order to keep the subject simple and general, we might define alignment as: any process that enables communication among two or more systems adopting two or more different ontologies. Following Noy [3], there are (at least) two types of alignment processes, classified according to their output:
• Ontology Merging: the outcome of the alignment of n ontologies is a single new ontology, embedding the knowledge coming from the n sources;
• Ontology Mapping: the result of the alignment among n different ontologies consists of the starting ones plus the mappings between similar entities of the input sources.
In the following, we briefly discuss a non-exhaustive list of some remarkable research approaches pursuing such alignment strategies.

A. GLUE.

GLUE [4], [5] exploits Machine Learning techniques to find semantic mappings between concepts stored in distinct and autonomous ontologies. Basically, its strategy falls into the Ontology Merging category, as its output is a mediator ontology (or database schema, in its earlier version). The process relies on an initial manual bootstrapping phase, in which the user provides some sample mappings between some entities within the ontologies to be merged. GLUE then tries to infer a set of rules (or classifiers) that can synthesize the mapping strategy followed by the users for the initial mappings, and applies what it has learnt to find other semantic mappings. One of its strong points is the possibility of using any kind of probabilistic similarity measure (measure neutrality); furthermore, the weighting of the single similarity assessments is also learnt by the system. In [4] the authors underline that there are many kinds of mappings that can be found when comparing entities from different ontologies; however, GLUE focuses on finding 1-1 mappings between concepts belonging to two taxonomies.
B. PROMPT Suite.

The PROMPT Suite by Stanford Medical Informatics [6], [3] provides two tools for alignment: iPROMPT and ANCHORPROMPT. iPROMPT is an interactive tool performing ontology merging. It starts from initial mappings among two entities (provided either by users or by an automatic process based on linguistic similarity) and generates suggestions for further mappings between other entities. After the user triggers one of the proposed suggestions (or performs some changes on the resulting ontology), iPROMPT applies the required changes, computes the possible cases of inconsistency, trying to suggest fixes for them, and produces new suggestions for further mappings.

ANCHORPROMPT, on the contrary, is an ontology mapping tool. It can be used to support iPROMPT in individuating new related concepts during the iterations. Unlike iPROMPT, it also exploits non-local contexts for the concept pairs whose similarity has to be assessed. Indeed, it represents ontologies as graphs and compares the paths in which concepts are involved, both within the taxonomy and in slot dependencies and domain/range constraints.

C. APFEL.

APFEL is really a whole framework rather than a single alignment technique. In [7], Ehrig et al. propose a very interesting classification of alignment strategies, dividing them into manually predefined automatic methods and methods that learn from instance representations. The former are very complex to design, since all the possible peculiarities of the domains they will be applied to must be considered; the latter, on the contrary, can be used only in the presence of instances (and sometimes this is not the case), though they show more flexibility as the underlying ontology domain changes. The proposed framework (PAM - Parameterizable Alignment Methodology) is fully parameterizable, to the extent of being able to reproduce the strategies used in other methodologies by just adjusting some parameters. APFEL tries to learn, from pre-validated example mappings, the right parameters that would instantiate the fittest PAM for the particular learning problem. Such parameters are:
• Features, i.e. the smallest set of features on which to compute similarity among entities;
• Selection criteria for the next pair of entities to compare;
• Similarity measures, aggregation, and interpretation strategies;
• Iteration, that is, the extent to which neighbor entity pairs are influenced by the current pair (whose similarity has been evaluated).
III. ONTOLOGY ALIGNMENT ALGORITHM

According to Noy's classification cited in the previous section, the approach described in this work falls into the ontology mapping methodology. Indeed, without loss of generality, we assume to work on two ontologies dealing with the same domain. Besides, we also assume that some of the concepts within the input ontologies may partially overlap in their semantics. We intentionally will not define formally what this semantic overlap means, due to the large variety of practical situations in which it may occur in real-world applications. We might have concepts defined with different names, structures, etc. that share exactly the same meaning. On the other hand, we can have concepts that, though sharing their name, might be different in their actual meaning, and it could be necessary that an application understands what its peer exactly means by such a concept, possibly in terms of its own ontological primitives.

A couple of examples can better explain these situations. The former case can be exemplified as follows. Suppose that there are two applications dealing with two ontologies about cars, differing just in the language: one uses Italian names and the other English names. The concept of, say, Wheel is equivalent to the concept Ruota in the Italian version of the car ontology. The latter case, instead, may happen when two ontologies describe a domain from two different perspectives. Suppose that the ontology FoodRestaurant deals with food from the typical point of view of a restaurant and that the ontology FoodNutritionist deals with the same subject from the side of a dietician. The concept of HighQualityFood depends on different aspects of food according to the adopted viewpoint. Indeed, when defining this concept, a dietician would care about the nutritional values of the ingredients, whereas a chef would take into account the provenance of the ingredients, their rarity, and so on. In such a case, we have two legitimate HighQualityFood concepts; an agent for dinner arrangements dealing with restaurants and people could be more effective if it knows both senses.

The intuition behind our approach is that a useful mapping is one in which at least one of the two parties can understand what the other means in terms of its own ontology. Indeed, if an application is to use information coming from another system, it should be able to frame this information in its own knowledge model; hence, in our opinion, it has to grasp some meaning using primitive concepts and properties from its own ontology. Though this may seem straightforward, many approaches limit themselves to finding a correspondence between equivalent (or very similar) entities, provided that such an equivalence exists. Yet such approaches disregard the importance of maintaining different definitions for the same entity (the viewpoints discussed in the previous example) and of having one's own definitions, in order to communicate with different parties.

Therefore, a way is needed for allowing a system to build a representation of a concept belonging to another ontology using its own terminology. We argue that the inductive reasoning techniques of Machine Learning are a suitable way to accomplish this objective. In particular, we employ the Concept Learning paradigm [8] for treating this problem. Indeed, Concept Learning aims at inducing an intensional description of a concept starting from its examples and (possibly) counterexamples.

In particular, we intend to implement the following scenario. Suppose there are two applications A and B with their respective ontologies OA and OB, and suppose that application A wants to communicate with B about some concept, say a:C (we indicate concepts with the usual URI form, i.e. namespace:conceptLocalName), which does not appear in OB. Then, in our scenario, B could ask A to provide some instances (examples) of a:C and, possibly, also some counterexamples (i.e. instances of the complement class of a:C). If such examples occur as instances within OB, then B, by means of a Concept Learning algorithm, can infer a definition of a:C in terms of the descriptions of those instances contained in OB. In this way B obtains a definition of a:C using its own terminology; in other words, B understands a:C according to what it knows (its ontology). In the following section (Sect. IV) the particular Concept Learning algorithm employed is described; however, for the purposes of this section the particular algorithm is irrelevant, since the only requirement is that it solves a generic Concept Learning problem, described formally in the following definition:

Concept Learning problem
Given a knowledge base K and a set of instances, divided into examples and counterexamples E = E⁺ ∪ E⁻ and described in K,
learn the definition of a concept C such that all elements of E⁺ and none of E⁻ are instances of C.
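To make the data flow behind this definition concrete, the following Java sketch shows how the learner side might assemble E⁺ and E⁻ from the individual names received from its peer, keeping only the individuals it can already describe in its own ontology (the shared individuals). The class LearningProblemBuilder and its plain-string view of the local assertions are hypothetical simplifications introduced here for illustration; they are not part of the actual prototype.

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch: B builds a Concept Learning problem from the
    // individual names sent by A. Only individuals that B can describe in
    // its own ontology (here, a plain map of assertions) are retained.
    public class LearningProblemBuilder {

        // B's local knowledge about individuals, keyed by URI
        // (a placeholder for a real A-box).
        private final Map<String, String> localAssertions = new HashMap<>();

        public void addLocalIndividual(String uri, String description) {
            localAssertions.put(uri, description);
        }

        /** Keeps only the URIs B already knows, i.e. the shared individuals. */
        private List<String> knownSubset(Collection<String> uris) {
            List<String> known = new ArrayList<>();
            for (String uri : uris) {
                if (localAssertions.containsKey(uri)) {
                    known.add(uri);
                }
            }
            return known;
        }

        /** Arranges the shared individuals into E+ and E- for the target concept a:C. */
        public Map<String, List<String>> buildProblem(Collection<String> examplesOfC,
                                                      Collection<String> counterexamplesOfC) {
            Map<String, List<String>> problem = new HashMap<>();
            problem.put("positive", knownSubset(examplesOfC));       // E+
            problem.put("negative", knownSubset(counterexamplesOfC)); // E-
            return problem;
        }

        public static void main(String[] args) {
            LearningProblemBuilder b = new LearningProblemBuilder();
            b.addLocalIndividual("ex:meal1", "Meal with three courses");
            b.addLocalIndividual("ex:meal2", "Meal with one course");
            Map<String, List<String>> p = b.buildProblem(
                    List.of("ex:meal1", "ex:unknownMeal"), List.of("ex:meal2"));
            System.out.println("E+ = " + p.get("positive") + ", E- = " + p.get("negative"));
        }
    }

Note that, as in the protocol described later, only individual names travel between the peers; the descriptions used for learning are always taken from the learner's own ontology.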
The result of the learning problem can be further revised in the following way. B identifies a suitable subset of instances (different from those provided by A as a training set), classifies them as belonging or not to the learnt concept, and passes them to A, asking the peer to do the same. If A and B disagree on the classification of some individuals, B refines the learnt definition in order to fix the discrepancy with the classification provided by A.

After learning a definition of a:C (let us call it b:C, supposing there are no clashes with pre-existing names), B has to compare it with the concepts in OB, evaluating the similarity of b:C to the pre-existing concepts. This can be accomplished using suitable similarity measures, such as the one proposed in [9], that score the semantic affinity/diversity of a pair of concepts within an ontology. If the similarity between b:C and another concept in OB exceeds a given threshold or, better, if the equivalence can be proved by a reasoner, then an equivalence axiom can be added; otherwise the definition of b:C is left unchanged within OB.
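As an illustration of this last decision step, the sketch below compares a learnt concept with each concept already in OB and reports an equivalence only when a reasoner proves it or a similarity score exceeds the threshold. The SimilarityMeasure and Reasoner interfaces are placeholders standing for a measure such as the one in [9] and for a DL reasoner; they are not an existing API.

    import java.util.List;

    // Hypothetical sketch of the post-learning decision described above.
    public class EquivalenceDecision {

        interface SimilarityMeasure {          // placeholder for a measure such as [9]
            double similarity(String conceptA, String conceptB);
        }

        interface Reasoner {                   // placeholder for a DL reasoner
            boolean isEquivalent(String conceptA, String conceptB);
        }

        /** Returns the concept of OB to declare equivalent to learntC, or null. */
        static String findEquivalent(String learntC, List<String> conceptsOfOB,
                                     SimilarityMeasure measure, Reasoner reasoner,
                                     double threshold) {
            for (String existing : conceptsOfOB) {
                // A proved equivalence is preferred over a similarity estimate.
                if (reasoner.isEquivalent(learntC, existing)
                        || measure.similarity(learntC, existing) >= threshold) {
                    return existing;
                }
            }
            return null; // keep b:C as a new, stand-alone definition in OB
        }

        public static void main(String[] args) {
            // Toy stand-ins: compare local names only, never prove equivalence.
            SimilarityMeasure m = (a, b) ->
                    a.substring(a.indexOf(':') + 1).equals(b.substring(b.indexOf(':') + 1)) ? 1.0 : 0.0;
            Reasoner r = (a, b) -> false;
            String match = findEquivalent("b:HighQuality",
                    List.of("ob:HighQuality", "ob:Meal"), m, r, 0.9);
            System.out.println("Equivalent concept: " + match);
        }
    }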
Careful readers will have already noticed that there is an assumption on which the whole process relies. The assumption is that there exists a subset of individual URIs in common between OA and OB. This corresponds to the initial knowledge that is provided to other systems during their bootstrapping phase. To cite only the related work, GLUE needs a complete example of mapping to start the learning phase, whereas iPROMPT relies on initial suggestions based on linguistic similarity, assuming that concept names are expressed in their natural language form (or very close to it). ANCHORPROMPT, on the contrary, exploits the graph structure, assuming that, for ontologies partially overlapping on the same domain, the graph structures should be similar. Our assumption seems at least as reasonable as these, also because it can be relaxed: instead of having some identical individual URIs across ontologies, we can have mechanisms to map some URIs into some others (either manually or automatically). Consider, for instance, the ontologies about paper indexing systems, e.g. DBLP and the ACM Digital Library, and suppose there exist their OWL (or RDF)¹ versions. Each system has automatic strategies for generating identifiers, and there is a large subset of research works that are indexed by both of them. It would then be easy to provide a URI translator between the two systems. Actually, ACM and DBLP bibliographic information is used in one of the test cases we devised for our system; more details are given in Sect. V-C.

¹ http://www.w3.org/2004/OWL/, http://www.w3.org/RDF/

IV. MACHINE LEARNING IN DESCRIPTION LOGIC

In this section, we describe in more detail the learning algorithm we developed for solving a Concept Learning problem in Description Logic. In particular, we focus on concept descriptions in the ALC [10] Description Logic, as it is the least expressive subset of OWL-DL allowing for disjunction and ensuring a reasonable expressiveness without features (like role hierarchies) that would make learning intractable.

First we cast the generic Concept Learning problem reported in Sect. III to the Description Logic setting.

learning/tuning problem
In a search space of concept definitions (S, ⊑) ordered by subsumption,
given a knowledge base K = ⟨T, A⟩ and a set of positive and negative assertions AC = AC⁺ ∪ AC⁻ regarding the membership (or non-membership) of some individuals to a concept C, such that A ⊭T AC,
find a new T-box² T′ = (T \ {C ≡ D}) ∪ {C ≡ D′} such that A ⊨T′ AC.

² T-box (terminological box) is the portion of the knowledge base containing the theory (T), while the A-box (assertional box) is the portion containing the ground statements.

A Concept Tuning (or Revision) Problem is similar, but for the fact that the learner already knows an incorrect definition for C which has to be revised. This means that there are some assertions within AC that are not logical consequences of the knowledge base and the current definition of C. Such an incorrect definition can be incomplete, i.e. there are some individuals that are said to belong to C in AC but this cannot be entailed, and/or it can be incorrect, meaning that there are individuals that are deduced to belong to C from the definition of C while they actually do not belong to it.

In general, Concept Learning relies on the intuition that the solution of any problem can be found by traversing a proper search space. A learning algorithm, then, is cast as a search that moves across such a space with two kinds of possible movements, called refinements:
• Specific towards general (generalization);
• General towards specific (specialization).
We provide here the formal (though generic) definitions for both kinds of refinement.

downward refinement operator - ρ Let (S, ⊑) be a search space of concept descriptions ordered by the subsumption relation ⊑. We define the function ρ : S → 2^S such that ρ(C) ⊆ {E ∈ S | E ⊑ C}.

upward refinement operator - δ Let (S, ⊑) be a search space of concept descriptions ordered by the subsumption relation ⊑. We define the function δ : S → 2^S such that δ(C) ⊆ {E ∈ S | C ⊑ E}.

Notice that, in general, it does not hold that E ⊑ C → E ∈ ρ(C), nor that C ⊑ E → E ∈ δ(C).
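These definitions translate directly into a pair of operator signatures. The following Java fragment is only an illustrative rendering of them (a concept maps to a set of more specific, respectively more general, concepts); the ConceptDescription type and the operator interfaces are hypothetical, and a real implementation would delegate subsumption checks to a DL reasoner.

    import java.util.Set;

    // Illustrative signatures for the refinement operators defined above.
    // ConceptDescription stands for an ALC concept description.
    public interface Refinement {

        interface ConceptDescription { }

        /** Downward refinement rho: returns concepts E subsumed by C. */
        interface DownwardOperator {
            Set<ConceptDescription> refine(ConceptDescription c);
        }

        /** Upward refinement delta: returns concepts E that subsume C. */
        interface UpwardOperator {
            Set<ConceptDescription> refine(ConceptDescription c);
        }
    }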
Refinement has been thoroughly studied in the literature (see [11], [12]), also specifically for Description Logic knowledge representations [13]. For the sake of brevity we do not expose here the whole theoretical framework for refinement. It suffices to say that, unlike many other approaches, we try to use examples and counterexamples of the concept to be learnt to bias the search and, hence, the refinement strategies, rather than using them just for testing intermediate candidate refinements. Indeed, during learning, as the algorithm searches for a solution of the problem, it may find various intermediate hypotheses that satisfy the required constraints³ but not all of them. Hence refinement operators must be applied to fulfil such constraints. The problem is that at each refinement step there are many alternative directions that can be chosen. Examples come into play in this choice, guiding the refinement process towards the most promising candidates, thus pruning the search space and preventing the learner from backtracking, for the sake of efficiency.

³ It should cover a subset of the examples and not cover part of the counterexamples, but there may remain some uncovered examples and/or some covered counterexamples.

Our solution to the learning problem is the algorithm implemented in the learning system named YINYANG, which is an evolution of the system described in [14]. In YINYANG, examples are not processed at the individual description level; for each of them, a concept level representative is computed. This is very frequent in Machine Learning when the example and hypothesis languages do not coincide, and is called the Single Representation Trick. Concept level representatives should be as adherent as possible to the examples they stand for. In Description Logic there exists a non-standard inference called Most Specific Concept (denoted msc) that fulfils such a requirement but, unfortunately, it need not exist for many Description Logics (such as ALC). Hence it is approximated, provided that it remains possible to distinguish between the concept level representatives of positive and negative examples; in other words, a unique concept level representative cannot stand for both a positive and a negative example.

The algorithm starts by choosing a seed, that is, a concept level representative of an example. Then, it keeps generalizing it while there are residual positives to be covered, or until the generalization covers some negative example (overgeneralization). In the former case, the algorithm exits, returning a complete and correct generalization. In the latter, specialization is necessary in order to correct the overly general hypothesis (concept definition).
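The control flow just outlined (choose a seed representative, generalize over the uncovered positives, fall back to specialization whenever a candidate covers a negative, and collect the resulting disjuncts) can be summarized by the hedged Java sketch below. It is a reading of the description given in this section, not the actual YINYANG code; all helper methods are placeholders for reasoner-backed operations.

    import java.util.ArrayList;
    import java.util.List;

    // Hedged sketch of the generalization loop described above. Concepts and
    // examples are opaque type parameters; coverage, refinement and disjunction
    // are abstract placeholders.
    public abstract class ConceptInductionSketch<Concept, Example> {

        protected abstract Concept representativeOf(Example e);            // approximated msc
        protected abstract Concept generalize(Concept current, Example e); // upward step
        protected abstract Concept specialize(Concept overlyGeneral,
                                              List<Example> positives,
                                              List<Example> negatives);    // see the next paragraphs
        protected abstract boolean covers(Concept c, Example e);
        protected abstract Concept disjunction(Concept a, Concept b);

        public Concept learn(List<Example> positives, List<Example> negatives) {
            Concept solution = null;
            List<Example> residual = new ArrayList<>(positives);
            while (!residual.isEmpty()) {
                // Choose a seed: the representative of one still-uncovered positive.
                Concept hypothesis = representativeOf(residual.get(0));
                for (Example pos : residual) {
                    Concept candidate = generalize(hypothesis, pos);
                    if (negatives.stream().anyMatch(n -> covers(candidate, n))) {
                        // Overgeneralization: repair the candidate before going on.
                        hypothesis = specialize(candidate, residual, negatives);
                    } else {
                        hypothesis = candidate;
                    }
                }
                solution = (solution == null) ? hypothesis : disjunction(solution, hypothesis);
                // Positives covered by this disjunct are dropped from the residual set.
                Concept current = hypothesis;
                int before = residual.size();
                residual.removeIf(p -> covers(current, p));
                if (residual.size() == before) {
                    break; // safeguard against a disjunct that covers nothing new
                }
            }
            return solution;
        }
    }

The specialize step invoked above corresponds to the two strategies described next.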
Specialization can take place in two different ways. The first one is based on the notion of counterfactuals [15]. The idea behind it is that, if we have a concept that covers some negative examples, one can single out the parts of such a definition that are responsible for the coverage and rule them out by means of negation. There is a well-known non-standard inference in DLs, called concept difference or residual [16], used for the aforementioned identification of the parts of a definition which are responsible for the incorrect coverage (also known as blame assignment). For eliminating in one single negation all the different parts that are responsible for the coverage of each negative, a generalization of the blamed parts is needed in order to obtain a single description to be negated (i.e. a counterfactual). Thus, generalization is needed again, this time for the negative residuals with respect to the overly general hypothesis. Such negative residuals now become positive examples in a new nested learning problem, and the solution of such a problem will be our counterfactual. Since its negation has to be conjoined to the starting incorrect hypothesis in order to specialize it, one does not want to incur the dual error, i.e. overspecialization. Indeed, the refined concept definition must keep covering the positive examples it covered before specialization. Hence the counterfactual should not cover the residuals of the covered positive example representatives w.r.t. the covering hypothesis to be specialized. This means that they represent negative examples for the new nested learning problem solved by the recursive call.

Specialization by means of counterfactuals can be proved not to be complete, in the sense that it sometimes fails in finding a correct counterfactual for properly specializing overly general hypotheses. That is why another specialization strategy was introduced, named Add Conjunct, which is applied whenever the Counterfactual routine fails. Such a strategy aims at finding a conjunct per positive example that does not appear yet in the hypothesis to be specialized nor in any of the negatives that are currently covered. In order to minimize the number of conjuncts, each positive provides such a conjunct only if there are no conjuncts (already provided by some other positives) that cover it. After having computed such conjuncts, they are generalized into their least common subsumer (lcs) [10], and such lcs is conjoined to the overly general hypothesis. Since the requirements for a conjunct to be chosen for each positive are very tight, it may sometimes happen that this specialization strategy fails as well as the counterfactual one before it. In such cases, the overly general hypothesis is simply replaced by the lcs of its covered positive example representatives, such lcs is added as a disjunct to the main generalization, and the algorithm chooses another seed, starting the inner loop again.
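The two specialization strategies and the final fallback can be pictured as the following dispatch, again only as a hedged sketch of the behaviour described above; counterfactual computation, conjunct selection and the lcs are abstract placeholders rather than an existing API.

    import java.util.List;
    import java.util.Optional;

    // Hedged sketch of the specialization step: try counterfactuals first,
    // then the Add Conjunct strategy, and finally fall back to the lcs of the
    // covered positive representatives.
    public abstract class SpecializationSketch<Concept, Example> {

        protected abstract Optional<Concept> counterfactual(Concept overlyGeneral,
                                                            List<Example> positives,
                                                            List<Example> coveredNegatives);
        protected abstract Optional<Concept> addConjunct(Concept overlyGeneral,
                                                         List<Example> positives,
                                                         List<Example> coveredNegatives);
        protected abstract Concept conjunction(Concept a, Concept b);
        protected abstract Concept negation(Concept c);
        protected abstract Concept lcsOfRepresentatives(List<Example> positives);

        public Concept specialize(Concept overlyGeneral,
                                  List<Example> coveredPositives,
                                  List<Example> coveredNegatives) {
            // 1) Counterfactual: conjoin the negation of the blamed description.
            Optional<Concept> cf = counterfactual(overlyGeneral, coveredPositives, coveredNegatives);
            if (cf.isPresent()) {
                return conjunction(overlyGeneral, negation(cf.get()));
            }
            // 2) Add Conjunct: conjoin the lcs of one discriminating conjunct per positive.
            Optional<Concept> conjunct = addConjunct(overlyGeneral, coveredPositives, coveredNegatives);
            if (conjunct.isPresent()) {
                return conjunction(overlyGeneral, conjunct.get());
            }
            // 3) Fallback: replace the hypothesis with the lcs of the covered positives;
            //    the caller then restarts from a new seed.
            return lcsOfRepresentatives(coveredPositives);
        }
    }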
V. PROTOTYPE DESCRIPTION AND PRELIMINARY EVALUATION

We developed a preliminary prototype that implements the methodology in Sect. III. We thought that one of the most suitable settings could be a Multi-Agent System. Indeed, it seems natural to think of communicating actors as autonomous agents that have their own knowledge bases and wish to dialogue with each other. In such environments, it is very likely that there is no common ontology to commit to; this offers plenty of possible scenarios in which ontology alignment can be evaluated.

Here we present the prototype in terms of its architecture, and we propose an evaluation on some simple alignment problems that are characteristic (in our opinion) of typical situations that can happen in such settings. We conducted such simple tests in order to single out the direction towards which a mature system should move, as well as the practical issues that arise. Therefore, these tests are not to be considered a thorough empirical evaluation, which will follow once robust solutions for the issues described in the following have been devised.

Our prototype is called SELA (Self Explaining and Learning Agents) and consists of a software agent (namely SELAAgent) that has two main tasks: it can either explain concepts, providing examples and validating classifications, or it can learn concepts from examples acquired from some other SELAAgent and ask for validation of what it learned. The agent has been developed within the framework provided by the Java Agent Development Environment (JADE, http://jade.tilab.com), and the alignment methodology has been designed as a protocol, as sketched in Fig. 1.

Fig. 1. SELA Alignment Protocol

We can describe it briefly as follows. A SELAAgent A (the teacher) initiates a conversation with another SELAAgent B (the learner), asking whether B knows the concept a:C, which is what A wants to speak about. B can confirm it knows a:C or disconfirm, stating it does not. In case of disconfirmation, A provides some individual names that are instances of a:C and some names of counterexamples of a:C (individuals in the extension of the complement of a:C). After receiving the names of the instances (both examples and counterexamples), B takes into account only the subset of individuals it owns in its ontology. Then B starts learning as described in the previous section. Notice that instance descriptions are not provided by A: they are built from the assertions that B has about such individuals. When learning terminates, B has a definition of a:C in terms of its ontology vocabulary. As we shall see in the following, this need not correspond exactly to the definition of a:C that is actually stored in the ontology of A. B then has somehow to make sure that its idea of a:C corresponds to the one of A. This can be accomplished, to some extent, by means of the cyclic remainder of the protocol. B chooses some individuals and classifies them employing the definition of a:C it has learned. Then, it sends the classification results to A, which validates them. A answers with the validation results, and then B can fine-tune (see the tuning problem in Sect. IV) its definition of a:C. The loop stops when the number of discordances between B's classifications and A's validations is below a given threshold.
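A minimal JADE-style sketch of the opening steps of this protocol is shown below: the teacher queries whether the learner knows a:C and, on a DISCONFIRM reply, sends the example and counterexample names. The class TeacherOpening, the plain-string message contents and the blocking receive are simplifications for illustration; they are not the SELA source code, where the exchange is organized in dedicated behaviours.

    import jade.core.AID;
    import jade.core.Agent;
    import jade.core.behaviours.OneShotBehaviour;
    import jade.lang.acl.ACLMessage;

    // Illustrative teacher-side behaviour for the first steps of the protocol
    // in Fig. 1 (QUERY_IF on the concept, then INFORM with example names).
    public class TeacherOpening extends OneShotBehaviour {

        private final AID learner;
        private final String concept;            // e.g. "a:HighQuality"
        private final String exampleNames;       // e.g. "ex:meal1 ex:meal3"
        private final String counterexampleNames;

        public TeacherOpening(Agent teacher, AID learner, String concept,
                              String exampleNames, String counterexampleNames) {
            super(teacher);
            this.learner = learner;
            this.concept = concept;
            this.exampleNames = exampleNames;
            this.counterexampleNames = counterexampleNames;
        }

        @Override
        public void action() {
            // "Do you know concept a:C?"
            ACLMessage query = new ACLMessage(ACLMessage.QUERY_IF);
            query.addReceiver(learner);
            query.setContent(concept);
            myAgent.send(query);

            // Simplified: a real agent would use non-blocking behaviours here.
            ACLMessage reply = myAgent.blockingReceive();
            if (reply != null && reply.getPerformative() == ACLMessage.DISCONFIRM) {
                // Learner does not know a:C: send instance names of a:C and of its complement.
                ACLMessage examples = new ACLMessage(ACLMessage.INFORM);
                examples.addReceiver(learner);
                examples.setContent("positives=" + exampleNames
                        + ";negatives=" + counterexampleNames);
                myAgent.send(examples);
            }
        }
    }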
A. Artificial Ontologies

We prepared two sample pairs of ontologies. The first pair is made of two ontologies that are structurally identical but for concept and property names. In fact, we prepared an ontology in the academic domain with an English nomenclature and then its translation into Italian (see Fig. 2 for its layout in Protégé).

Fig. 2. English and Italian Academy ontologies

The second pair presents a different situation. We started from a common ontology in a fancy restaurants domain that deals with food, meals, and courses. Actually, it is a simplified version of the famous food ontology sample on the W3C website.⁴ Then we derived two ontologies, both containing a concept called HighQuality, defined, however, in two different namespaces (in order to represent two different points of view on it). One deals with HighQuality from the perspective of a particular restaurant, defining it as:

HighQuality ≡ Meal ⊓ ((∃hasCourse.Starter ⊓ ∃hasCourse.MainCourse ⊓ ∃hasCourse.Dessert) ⊔ ∃hasCourse.(∃hasDrink.Alcoholic))

The other one defines its notion of HighQuality from the hypothetical perspective of a dietician as:

HighQuality ≡ Meal ⊓ (∀hasCourse.(∀hasDrink.(¬Alcoholic) ⊓ ∃hasFood.(LowCaloric ⊔ NegligibleCaloric ⊔ ModerateCaloric)))

⁴ http://www.w3.org/TR/2002/WD-owl-guide-20021104/food.owl

It is obvious that there are two different alignment purposes and situations. Indeed, in the former, the process should detect that there is a complete correspondence among the entities of the two ontologies. In the latter, there is not a 1-1 mapping between the two conceptualizations of HighQuality, but it could be interesting if each party (restaurant owner and dietician) could build up, in their respective knowledge base, the point of view of their counterpart during the dialogue.

As concerns the mapping between the two academic ontologies, we report briefly that for each concept the learning SELAAgent was able to build a definition that was equivalent to the corresponding translated concept in its own ontology, even when only a few instances were exchanged. Such a good result obviously depends on the structural similarity of the ontologies: the individuals in common between the two ontologies have the same structure but for the names of the relations; moreover, the A-box contains all the relevant information for the individuals, which in general may not be the case (see Sect. V-C for further details).

In the restaurant domain the results were not as satisfactory as in the previous case. Consider the more likely scenario in which the agent with the restaurant owner's version of HighQuality tries to learn the dietician's concept of HighQuality. It learns a definition that is more specific than the desired one, though it never required tuning in all the iterations we ran through. This is likely due to the limited number and, especially, the limited variety of the examples. It is indeed well known in inductive learning that too few examples, not sufficiently different from each other, may yield a phenomenon called overfitting. Overfitting occurs when the learned definition is too adherent to the training set used for building it, and hence suffers from poor predictive accuracy on the classification of future unseen examples. Future experiments will also aim to devise suitable strategies for choosing examples and, above all, counterexamples. As a matter of fact, a careful selection of counterexamples could improve the effectiveness of learning. Winston [17] claimed the usefulness of near-miss counterexamples, that is, instances that are very close to the concept to be learnt but do not belong to its extension. In our case, we could exploit the taxonomy to individuate the most likely near-misses: e.g. the instances of the superconcepts of the concept to be learned (or those of its sibling concepts) that do not belong to it.
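As a sketch of the near-miss selection suggested above, the fragment below picks as counterexamples the individuals asserted under a superconcept or a sibling of the target concept but not under the target itself. The Taxonomy interface is a placeholder for whatever class-hierarchy and instance access the ontology layer provides; it is not an existing API.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    // Hedged sketch of near-miss counterexample selection: instances of the
    // superconcepts (or siblings) of the target concept that are not instances
    // of the target itself.
    public class NearMissSelector {

        interface Taxonomy {                       // placeholder ontology access
            Set<String> superConcepts(String concept);
            Set<String> siblings(String concept);
            Set<String> instancesOf(String concept);
        }

        static List<String> nearMisses(Taxonomy tax, String target, int max) {
            Set<String> targetInstances = tax.instancesOf(target);
            List<String> candidates = new ArrayList<>();
            List<String> neighbours = new ArrayList<>();
            neighbours.addAll(tax.superConcepts(target));
            neighbours.addAll(tax.siblings(target));
            for (String concept : neighbours) {
                for (String individual : tax.instancesOf(concept)) {
                    if (!targetInstances.contains(individual) && !candidates.contains(individual)) {
                        candidates.add(individual);
                        if (candidates.size() == max) {
                            return candidates;
                        }
                    }
                }
            }
            return candidates;
        }
    }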
B. Ontology Alignment Evaluation Initiative Ontologies

Our second test is based on two ontologies taken from the Ontology Alignment Evaluation Initiative test set for 2005, namely onto101 and onto206.⁵ onto101 is the reference ontology for the contest, which consists of a set of ontologies that are modifications of onto101 (e.g., hierarchy flattening, deletion of concepts, translation of names, URIs, comments, or deletion of comments). In particular, onto206 is the French translation of onto101, where all local names of the defined classes and properties have been translated from English to French. These ontologies consist of 39 named classes, 24 properties and over 70 individuals, of which about 50 are identified by a URI; these individuals have the same URI in both onto101 and onto206, and this enables us to apply our algorithm to them. In particular, we assigned onto101 to one of our agents (the teacher) and onto206 to the other agent; then we made the teacher agent try to teach the definition of the http://oaei.inrialpes.fr/2005/benchmarks/101/onto.rdf#Institution concept.⁶ However, the resulting definition:

∃adresse.Adresse ⊓ Éditeur

is poor w.r.t. the original definition (Fig. 3), and moreover it is overfitting (being an Éditeur is not necessary for being an Institution). While the second issue only depends on the available individuals (only two individuals of class Éditeur were available in the ontology), the first issue depends on the targeted DL, since the expressivity of the ontologies used here is greater than that of the language our learner uses (specifically, cardinality restrictions are not handled). This example clearly shows that at least cardinality restrictions should be added to the representation language of the algorithm in order for it to be useful in real world scenarios.

Fig. 3. inria206:Institution Definition

⁵ Full test set at http://oaei.ontologymatching.org/2005/; as a side note, it is necessary to operate some corrections on the ontologies, since there are some small errors: the literals for year, month and day should be typed with the corresponding XSD types, but they are plain literals in both ontologies, and the DIG reasoner we use (Pellet, http://www.mindswap.org/2003/pellet/) finds the ontology inconsistent.
⁶ Namespaces will be shortened in the following.
since it was not completely conformant RDF; in particular, many URIs in org/2003/pellet/) finds the ontology inconsistent the data contained spaces, that needed handling before being submitted to the 6 Namespaces will be shortened in the following reasoner are defined as datatype, even when they in fact represent [2] T. R. Gruber, “A translation approach to portable ontology specifica- entities (e.g. dblp:author is a datatype property, while usually tions,” 1993. [3] N. F. Noy, “Tools for mapping and merging ontologies.” in Handbook an author of a paper is a person, and so is typically modeled on Ontologies, ser. International Handbooks on Information Systems, as an individual) raises the idea that building more structured S. Staab and R. Studer, Eds. Springer, 2004, pp. 365–384. datasets is necessary before attempting a complete empirical [4] A. Doan, J. Madhavan, P. Domingos, and A. Y. Halevy, “Learning to map between ontologies on the semantic web.” in WWW, 2002, pp. evaluation of the system. 662–673. From the computational complexity point of view, it is well [5] A. Doan, P. Domingos, and A. Y. Halevy, “Learning to match the known that OWL DL reasoning algorithms have exponential schemas of data sources: A multistrategy approach.” Machine Learning, vol. 50, no. 3, pp. 279–301, 2003. worst-case complexity10 ; however, most real world ontologies [6] M. A. M. Natalya F. Noy, “The prompt suite: interactive tools for do not exploit all DL constructors, and therefore reasoning ontology merging and mapping,” International Journal of Human- with these ontologies can be done in reasonable time. We con- Computer Studies, vol. 59, pp. 983–1024, 2003. [7] M. Ehrig, S. Staab, and Y. Sure, “Bootstrapping ontology alignment ducted a very preliminary test on the DBLP dataset presented, methods with apfel.” in International Semantic Web Conference, ser. where we sliced the available examples in 10 disjoint sets and Lecture Notes in Computer Science, Y. Gil, E. Motta, V. R. Benjamins, ran the learning algorithm separately; each learning problem and M. A. Musen, Eds., vol. 3729. Springer, 2005, pp. 186–200. [8] T. M. Mitchell, Machine Learning. New York: McGraw-Hill, 1997, was made up of 50 positive examples (randomly chosen) and ch. Concept Learning and General to Specific Ordering, pp. 20–51. 5 negative examples, and the elapsed time for building the [9] C. dAmato, N. Fanizzi, and F. Esposito, “A semantic similarity measure definition is around one minute for each one of the ten trials. for expressive description logics,” in Proceedings of CILC 2005, 2005. [10] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel- The last trial was made using all individuals at once, so that Schneider, Eds., The Description Logic Handbook. Cambridge Uni- there were 500 positive examples and 50 negative examples; versity Press, 2003. in this case, the required time is 80 seconds. So far, then, the [11] P. R. J. van der Laag and S.-H. Nienhuys-Cheng, “Existence and nonexistence of complete refinement operators.” in ECML, ser. Lecture number of examples seems not to hamper the practical use of Notes in Computer Science, F. Bergadano and L. D. Raedt, Eds., vol. the algorithm.11 784. Springer, 1994, pp. 307–322. [12] S. Nienhuys-Cheng and R. de Wolf, Foundations of Inductive Logic VI. C ONCLUSIONS AND F UTURE W ORK Programming, ser. LNAI. Springer, 1997, vol. 1228. [13] L. Badea and S.-H. 
The learning problem here consisted in finding the definition for the concept Article in the ACM SIGMOD ontology; the definition we obtain for Article in the original ACM SIGMOD ontology is:

sigmod:Article ⊓ ∃sigmod:author.⊤

while the definition w.r.t. the DBLP ontology is:

dblp:Article ⊓ ∃dblp:url.⊤ ⊓ ∃dblp:ee.⊤

The definitions appear to be very short; in fact, this depends on the two chosen ontologies being very small and having mainly datatype properties, which are not useful in the learning algorithm.

Fig. 4. ACM SIGMOD and DBLP Ontologies

At the time of writing, no quantitative tests have been run on this dataset; the reason is that, since all the individuals in DBLP and ACM SIGMOD are almost equal from the structural point of view, a very small number of examples is enough to converge on the presented definitions. These definitions score 100% precision on the presented datasets, but the reason for this high performance lies in the regularity of the dataset (the actual number of examples used in the tests is 10 positive examples and 5 negative examples out of 700 available individuals). Also, the fact that so many properties in real ontologies are defined as datatype properties, even when they in fact represent entities (e.g. dblp:author is a datatype property, while usually an author of a paper is a person, and so is typically modeled as an individual), suggests that building more structured datasets is necessary before attempting a complete empirical evaluation of the system.

From the computational complexity point of view, it is well known that OWL DL reasoning algorithms have exponential worst-case complexity¹⁰; however, most real world ontologies do not exploit all DL constructors, and therefore reasoning with these ontologies can be done in reasonable time. We conducted a very preliminary test on the presented DBLP dataset, where we sliced the available examples into 10 disjoint sets and ran the learning algorithm separately on each; every learning problem was made up of 50 positive examples (randomly chosen) and 5 negative examples, and the elapsed time for building the definition was around one minute for each of the ten trials. A last trial was made using all individuals at once, so that there were 500 positive examples and 50 negative examples; in this case, the required time was 80 seconds. So far, then, the number of examples does not seem to hamper the practical use of the algorithm.¹¹

¹⁰ http://www.cs.man.ac.uk/~ezolin/logic/complexity.html
¹¹ Note that this last test was designed as a ten-fold cross validation experiment; however, the results seem too influenced by the specific dataset to be taken into account to measure the performance of the algorithm in terms of precision and recall, and that is the reason why they are not presented here in detail.

VI. CONCLUSIONS AND FUTURE WORK

Giving solutions to the ontology alignment problem is one of the most compelling issues in the roadmap to realize the Semantic Web as a real-world infrastructure. We recalled the motivations for the investigation of this subject and underlined our own viewpoint on alignment. In such a vision, we prefer to stray slightly from the most widespread idea of a rigid centralized (top level) ontology and to adopt on-demand partial mappings among the ontologies owned by the parties in play. We have proposed a concept learning algorithm to accomplish this, which works under assumptions justified throughout the paper. We have illustrated the prototype implementing these ideas, together with some preliminary results using limited datasets.

We plan to improve our work along the following directions. First, we should evaluate suitable concept similarity measures for assessing the closeness of learned concept definitions to the pre-existing conceptualizations in the ontologies to be aligned. We will also investigate methodologies for aligning properties, and not only concepts, using techniques that are very similar to those employed in tools like GLUE (see Sect. II). Last, but not least, we will evaluate the impact of different strategies for example selection in terms of the quality (effectiveness and/or efficiency) of the induced definitions and, therefore, of the computed alignments themselves.

REFERENCES

[1] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web," Scientific American, May 2001.
[2] T. R. Gruber, "A translation approach to portable ontology specifications," 1993.
[3] N. F. Noy, "Tools for mapping and merging ontologies," in Handbook on Ontologies, ser. International Handbooks on Information Systems, S. Staab and R. Studer, Eds. Springer, 2004, pp. 365–384.
[4] A. Doan, J. Madhavan, P. Domingos, and A. Y. Halevy, "Learning to map between ontologies on the semantic web," in WWW, 2002, pp. 662–673.
[5] A. Doan, P. Domingos, and A. Y. Halevy, "Learning to match the schemas of data sources: A multistrategy approach," Machine Learning, vol. 50, no. 3, pp. 279–301, 2003.
[6] N. F. Noy and M. A. Musen, "The PROMPT suite: interactive tools for ontology merging and mapping," International Journal of Human-Computer Studies, vol. 59, pp. 983–1024, 2003.
[7] M. Ehrig, S. Staab, and Y. Sure, "Bootstrapping ontology alignment methods with APFEL," in International Semantic Web Conference, ser. Lecture Notes in Computer Science, Y. Gil, E. Motta, V. R. Benjamins, and M. A. Musen, Eds., vol. 3729. Springer, 2005, pp. 186–200.
[8] T. M. Mitchell, Machine Learning. New York: McGraw-Hill, 1997, ch. Concept Learning and General to Specific Ordering, pp. 20–51.
[9] C. d'Amato, N. Fanizzi, and F. Esposito, "A semantic similarity measure for expressive description logics," in Proceedings of CILC 2005, 2005.
[10] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, Eds., The Description Logic Handbook. Cambridge University Press, 2003.
[11] P. R. J. van der Laag and S.-H. Nienhuys-Cheng, "Existence and nonexistence of complete refinement operators," in ECML, ser. Lecture Notes in Computer Science, F. Bergadano and L. De Raedt, Eds., vol. 784. Springer, 1994, pp. 307–322.
[12] S.-H. Nienhuys-Cheng and R. de Wolf, Foundations of Inductive Logic Programming, ser. LNAI. Springer, 1997, vol. 1228.
[13] L. Badea and S.-H. Nienhuys-Cheng, "A refinement operator for description logics," in Proceedings of the 10th International Conference on Inductive Logic Programming, ser. LNAI, J. Cussens and A. Frisch, Eds., vol. 1866. Springer, 2000, pp. 40–59.
[14] F. Esposito, N. Fanizzi, L. Iannone, I. Palmisano, and G. Semeraro, "Knowledge-intensive induction of terminologies from metadata," in Proceedings of the 3rd International Semantic Web Conference, ISWC 2004, ser. LNCS, S. A. McIlraith, D. Plexousakis, and F. van Harmelen, Eds., vol. 3298. Springer, 2004, pp. 411–426.
[15] S. Vere, "Multilevel counterfactuals for generalizations of relational concepts and productions," Artificial Intelligence, vol. 14, pp. 139–164, 1980.
[16] G. Teege, "A subtraction operation for description logics," in Proceedings of the 4th International Conference on Principles of Knowledge Representation and Reasoning, P. Torasso, J. Doyle, and E. Sandewall, Eds. Morgan Kaufmann, 1994, pp. 540–550.
[17] P. Winston, Learning Structural Descriptions from Examples, Ph.D. dissertation, MIT, 1970.