Discovering and Merging Keyword Senses using
              Ontology Matching?

       Mauricio Espinoza?? , Raquel Trillo, Jorge Gracia, and Eduardo Mena

      IIS Department, Univ. of Zaragoza, Marı́a de Luna 1, 50018 Zaragoza, Spain
                 {mespinoz,raqueltl,jogracia,emena}@unizar.es
                   WWW home page: http://sid.cps.unizar.es


         Abstract. During the last years we are witnessing how the use of key-
         words has become the standard input when searching the Web. As op-
         posite to the syntactic searches performed by traditional web search en-
         gines, the current research challenge is a semantics-guided information
         retrieval. The increasing pools of ontologies available on the Web can
         help to discover the semantics of user keywords and this information is
         priceless for many tasks, including new semantic search engines.
         In this paper we propose a system that takes as input a list of keywords
         provided by the user and discovers their possible meanings by consult-
         ing the knowledge represented by many (heterogeneous and distributed)
         ontologies. These keyword senses are semantically enriched with the syn-
         onym terms found during the ontology matching process: A synonymy
         measure based on statistics techniques and ontological similarity is used
         to integrate senses that are similar enough.


      Keywords: Ontology matching for information integration


1      Introduction
Although keyword-based search is a widely used technique for information re-
trieval, traditional techniques do not consider the specific semantics assigned by
the user: the same keywords can be used by different users with the purpose
of accessing to different information. Furthermore, the syntactic-based search
engines are very influenced by the enormous amount of information about pop-
ular issues on the Web, i.e., the keyword “java”: Java as programming language
eclipses the rest of possible senses (the Indonesian island, a coffee plant, different
US cities, etc). However, ontologies (which offer a formal, explicit specification
of a shared conceptualization [5]) can be used to make the semantics of user
keywords explicit without ambiguity. The more ontologies consulted, the more
chances to find the semantics assigned to keywords by the user.
    In this paper, we propose a system that takes as input a list of plain key-
words provided by the user, discovers their semantics in run-time and obtains a
?
     This work is supported by the CICYT project TIN2004-07999-C02-02.
??
     Work supported by a grant of Santander Central Hispano & University of Zaragoza.
list of senses extracted from different ontology pools; it deals with the possible
overlapping among senses. The main steps of our approach are summarized in
the following:
 1. Extraction of Keyword Senses. First, the user keywords are normalized by
    a preprocessing step (e.g., rewriting them in lowercase, removing hyphens,
    etc.), and in order to discover the semantics of the user keywords, the system
    accesses to the shared knowledge stored in different ontology pools available
    on the Web. The extracted senses are semantically enriched with the on-
    tological senses of their synonyms (which are obtained from the ontology
    pool), whenever the system evaluates that the synonym senses matches to
    the semantics of the corresponding keyword sense.
 2. Alignment of Senses. This process uses an incremental algorithm for the
    alignment of the different keyword senses in order to remove the possible
    semantics redundancy among them. Senses are merged when the estimated
    synonymy probability between them is above a certain threshold. The syn-
    onymy measure combines a standard string distance metric with a structural
    similarity measure that is based on vector space techniques. Thus the result
    is a set of different possible senses for each user keyword.
    For efficiency purposes, the system uses sampling and other statistic tech-
niques, as well as parallel processing, whenever possible. The output of our sys-
tem can be the input for a disambiguation process across keywords [4] or used
to retrieve data once the keyword semantics is known.
    The rest of this paper is as follows. In Section 2 we show how the possible
senses of each keyword are obtained and semantically enriched with their syn-
onym senses. In Section 3 we describe the algorithm that computes the synonymy
probability in order to integrate senses when a certain threshold is achieved. Fi-
nally, conclusions and future work appear in Section 4.


2     Extraction of Keyword Senses
In this section we provide the details that show the contribution of this paper in
the task of automatically retrieving the possible senses for a set of user keywords.
In order to find the ontological terms that match those keywords, the system
accesses to Swoogle [2], other remote lexical resources as WordNet [8] and other
ontologies not indexed by Swoogle are used as well. We advocate using a pool of
ontologies instead of just a single one, like WordNet (as many works do [6, 7]),
because many technical or subject-specific senses cannot be found in WordNet.
    The system builds a sense for each URI obtained with the information re-
trieved from matching terms in the ontology pool [1]. In our approach, a sense
of a keyword k, denoted by sk , is a tuple sk = <s, grph, descr, pop, syndgr>,
where s is the list of synonym names1 of keyword k, grph describes the sense
1
    To extract from an ontology the synonyms of a class, property or individual, the
    primitives equivalentClass, equivalentProperty and sameIndividualAs are used, re-
    spectively.
sk by means of the hierarchical graph of hypernyms and hyponyms of synonym
terms found in one or more ontologies, descr is a description in natural language
of such a sense, and pop and syndgr measure the degree of popularity of this
sense (pop is the number of times it appears in the ontology pool and syndgr is
the integrated percentage of synonymy degree). Thus, senses are built with the
information retrieved from matching terms in the ontology pool [1].
    As matching terms could be ontology classes, properties or individuals, three
lists of possible senses are associated with each keyword k: Skclass , Skprop and
Skindv . In Figure 1 we show an example of some senses found in the ontology
pool for the user keyword “star”. The system finds in WordNet two matchings
of keyword “star” as concept/class (s1 and s2), and one matching in the Travel
Ontology2 as property of class “hotel” (s3). Notice that each sense is initialized
with a popularity=1 and a synonymy degree=1.


                       celestial body
     class
    s1star = < {WN#star},   star "(astronomy) a celestial body of hot gases that...", 1, 1 >

                 binary star ... supernova
                                                      actor
       class
    s2star = < {WN#star, WN#lead, WN#principal},     star , "an actor who plays a principal role", 1, 1 >
                                               co−star ... film star

                                    domain (hotel)
      prop
    s3star = < {TravelOntology#star}, star , "", 1, 1 >


             Fig. 1. Some senses of keyword “star” extracted from the ontology pool


     Each keyword sense is enhanced incrementally with the synonyms terms ex-
tracted from the ontology pool. Therefore our system takes advantage of the
shared ontologies available on the Web and semantically enriches the keyword
senses with senses extracted from their synonyms. The synonym names are stored
in the sense structure shown before, which gets upgraded everytime the sense is
integrated with a (very similar) sense coming from other ontology. In order to
evaluate the semantic similarity between the sense of a keyword and their syn-
onyms, the system performs a sense alignment process (detailed in Section 3)
which determines whether the semantics of the keyword sense and each synonym
sense found represent the same semantic or not. After discarding the synonym
senses that do not enrich the corresponding keyword sense, the result is a list of
different possible senses for each keyword.
     This process can be limited in time; obtaining the senses is executed in paral-
lel for each keyword; within that task, the semantic enrichment of each keyword
sense with its synonym senses is performed in parallel too.
2
     http://learn.tsinghua.edu.cn:8080/2003214945/travelontology.owl
3     Alignment of Senses

We explain in this section the sense alignment process, which is used in two
situations by our system: 1) to check which synonym senses represent the same
semantics as their keyword senses, and 2) to avoid redundancy in the list of
possible senses of each user keyword. However both tasks share a common goal:
to find when two given senses represent very similar semantics; in that case they
will be considered synonyms and both senses will be integrated3 .
    In order to decide if two senses must be integrated (as a single sense) or
not, the system computes their synonymy probability. Thus the system avoid
redundancy among the possible senses of a keyword. At present, several solutions
to determine the matching among ontological terms of distinct ontologies have
been proposed, see [9] for recent surveys. Our approach computes coefficients
of synonymy degree in the [0,1] range, however other approaches as semantic
matching [3] can be used as well.
    The synonymy measure used relies on both linguistic and structural charac-
teristics of ontologies. The following steps are performed: 1) an initial compu-
tation using linguistic similarity, which consider labels as strings, 2) a recursive
computation using structural similarity, which exploits the semantics of terms
(ontological context) until a certain depth, and 3) the above values are combined
to obtain the resultant synonymy measure.
    Our proposal for sense alignment is not just a comparison between two senses
but an iterative process, which improves the quality and efficiency of ontology
matching and enables the reuse of new discovered senses. In other words, each
new integrated sense must be considered as candidate to integrate with the
rest. For the same reason, new senses that do not integrate are stored because
they could become the missing semantic gap between two senses. Although this
method is costly (we limit its execution time), it performs a much better ontology
alignment among senses. Due to space limitations, we do not detail this process.
    In a variety of approaches, the similarity measure is only calculated among
ontological terms that plays as classes. However, unlike another works, we pro-
pose a way to obtain the synonymy probability according to the type of senses
that we compare. Details about this process is available in [1] as it not the main
goal of this paper.


4     Conclusions

In this paper we have presented a semantics-guided approach to discover the pos-
sible senses for a set of keywords, by searching and extracting relevant knowledge
from different ontology pools; ontology matching and synonymy estimation tech-
niques are used to merge senses considered similar enough. The main features
of our proposal are the following:
3
    The integration process that we propose can be found in [1].
 1. It uses an iterative approach to retrieve from different knowledge repositories
    the possible senses of each user keyword, in a parallel manner. A sense is
    represented basically as the (multi)ontological context of a term, and the
    system is able to deal with senses corresponding to different kind of ontology
    terms (classes, properties, and individuals).
 2. It considers not only the senses corresponding to ontology terms syntactically
    matching the user keywords but also the senses of ontology terms matching
    the synonyms of the user keywords, recursively, in order to semantically
    enrich the keyword senses retrieved within a certain synonymy threshold.
 3. It measures the synonymy degree between two senses by considering their
    linguistic and structural similarity. Statistical techniques like sampling and
    parallel processing are used to improve the performance of this process.

   We believe that this technique to find out the semantic different between
senses (subsets of ontologies) can be applied to many fields. As example, we are
currently working on using the retrieved senses to generate queries expressed in
a knowledge representation language to retrieve data corresponding to the user
keywords.


References
1. M. Espinoza, J. Gracia, R. Trillo, and E. Mena. Discovering the semantics of key-
   words: An ontology-based approach. In The 2006 International Conference on Se-
   mantic Web and Web Services (SWWS’06), Las Vegas, Nevada (USA). CSREA
   Press, June 2006.
2. T. Finin, L. Ding, R. Pan, A. Joshi, P. Kolari, A. Java, and Y. Peng. Swoogle:
   Searching for knowledge on the Semantic Web. In AAAI 05 (intelligent systems
   demo), July 2005.
3. F. Giunchiglia, P. Shvaiko, and M. Yatskevich. Discovering missing background
   knowledge in ontology matching. In Technical Report DIT-06-005, Informatica e
   Telecomunicazioni, University of Trento, February 2006.
4. J. Gracia, R. Trillo, M. Espinoza, and E. Mena. Querying the web: A multiontology
   disambiguation method. In Sixth International Conference on Web Engineering
   (ICWE’06), Palo Alto (California, USA). ACM, July 2006.
5. T. R. Gruber. Towards principles for the design of ontologies used for knowledge
   sharing. In N. Guarino and R. Poli, editors, Formal Ontology in Conceptual Analysis
   and Knowledge Representation. Kluwer Academic Publishers, 1993.
6. B. Le, R. Dieng-Kuntz, and F. Gandon. On ontology matching problems - for
   building a corporate semantic web in a multi-communities organization. In ICEIS
   (4), 2004.
7. V. Lopez, E. Motta, and V. Uren. Poweraqua: Fishing the semantic web. In 3rd
   European Sematic Web Conference, Budva, Montenegro, June 2006.
8. G. Miller. WordNet: A Lexical Database for English. Communications of the ACM,
   38(11), nov 1995.
9. P.Shvaiko and J. Euzenat. A survey of schema-based matching approaches. In
   Journal on Data Semantics, 2005.