Using semantic relatedness and word sense
              disambiguation for (CL)IR

                     Eneko Agirre1 , Arantxa Otegi1 , Hugo Zaragoza2
       1
         IXA NLP Group, University of the Basque Country. Donostia, Basque Country
                          {e.agirre,arantza.otegi}@ehu.es
                         2
                           Yahoo! Researech, Barcelona, Spain
                                  hugoz@yahoo-inc.com


                                             Abstract
     In this paper we report the experiments for the CLEF 2009 Robust-WSD task, both
     for the monolingual (English) and the bilingual (Spanish to English) subtasks. Our
     main experimentation strategy consisted on expanding and translating the documents,
     based on the related concepts of the documents. For that purpose we applied a state-
     of-the art semantic relatedness method based on WordNet. The relatedness measure
     was used with and without WSD information. Even if we obtained positive results in
     our training and development datasets, we did not manage to improve over the baseline
     in the monolingual case. The improvement over the baseline in the bilingual case is
     marginal. We plan to further work on this technique, which has attained positive
     results in the passage retrieval for question answering task at CLEF (ResPubliQA).
.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Infor-
mation Search and Retrieval; I.2 [Artificial Intelligence]: I.2.7 Natural Language Processing

Keywords
Robust Retrieval, CLIR, Word Sense Disambiguation, Lexical Relatedness, Document Expansion


1    Introduction
Our goal is to test whether Word Sense Disambiguation (WSD) information can be beneficial for
Cross Lingual Information Retrieval (CLIR) or monolingual Information Retrieval (IR). WordNet
has been previously used to expand the terms in the query with some success [3, 4, 5, 7]. WordNet-
based approaches need to deal with ambiguity, which proves difficult given the little context
available to disambiguate the word in the query effectively. In our experience document expansion
works better than topic expansion (see our results of the last edition in [6]). Bearing this in mind,
this edition we have mainly focused on documents, using a more elaborate expansion strategy.
We have applied a state-of-the-art semantic relatedness method based on WordNet [1] in order to
select the best terms to expand the documents. The relatedness method can optionally use the
WSD information provided by the organizers.
    The remainder of this paper is organized as follows. Section 2 describes the experiments carried
out. Section 3 presents the results obtained. Finally, Section 4 draws the conclusions and mentions
future work.
2      Experiments
Our main experimentation strategy consisted on expanding the documents, based on the related
concepts of the documents. The steps of our retrieval system are the following. We first expand
translate the topics. In a second step we extract the related concepts of the documents, and expand
the documents with the words linked to these concepts in WordNet. Then we index these new
expanded documents, and finally, we search for the queries in the indexes in various combinations.
All steps are described sequentially.

2.1      Expansion and translation strategies of the topics
WSD data provided to the participants was based on WordNet version 1.6. Each word sense has
a WordNet synset assigned with a score. Using those synset codes and the English and Spanish
wordnets, we expanded the topics. In this way, we generated different topic collections using
different approaches of expansion and translation, as follows:

     • Full expansion of English topics: expansion to all synonyms of all senses.
     • Best expansion of English topics: expansion to the synonyms of the sense with highest
       WSD score for each word, using either UBC or NUS disambiguation data (as provided by
       organizers).
     • Translation of Spanish topics: translation from Spanish to English of the first sense for each
       word, taking the English variants from WordNet.

     In both cases we used the Spanish and English wordnet versions provided by the organizers.

2.2      Query construction
We constructed queries using the title and description topic fields. Based on the training topics, we
excluded some words and phrases from the queries, such as find, describing, discussing, document,
report for English and encontrar, describir, documentos, noticias, ejemplos for Spanish.
    After excluding those words and taking only nouns, adjectives, verbs and numbers, we con-
structed several queries for each topic using the different expansions of the topics (see Section 2.1)
as follows:
     • Original words.
     • Both original words and expansions for the best sense of each word.
     • Both original words and all expansions for each word.
     • Translated words, using translations for the best sense of each word. If a word had no
       translation, the original word was included in the query.
   The first three cases are for the monolingual runs, and the last one for the bilingual run which
translated the query.

2.3      Expansion and translation strategies of the documents
Our document expansion strategy was based on semantic relatedness. For that purpose we used
UKB1 , a collection of programs for performing graph-based Word Sense Disambiguation and lexical
similarity/relatedness using a pre-existing knowledge base, in this case WordNet 1.6.
    Given a document, UKB returns a vector of scores for each concept in WordNet. The higher
the score, the more related is the concept to the given document. In our experiments we used
different approaches to represent each document:
    1 The algorithm is publicly available at http://ixa2.si.ehu.es/ukb/
    • using all the synsets of each word of the document.
    • using only the synset with highest WSD score for each word, as given by the UBC disam-
      biguation data (provided by the organizers).
   In both cases, UKB was initialized using the WSD weights: each synset was weighted with the
score returned by the disambiguation system, that is, each concept was weighted according to the
WSD weight of the corresponding sense of the target word.
   Once UKB outputs the list of related concepts, we took the highest-scoring 100 or 500 concepts
and expanded them to all variants (words in the concept) as given by WordNet. For the bilingual
run, we took the Spanish variants. In both cases we used the Spanish and English wordnet versions
provided by the organizers.
   The variants for those expanded concepts were included in two new fields of the document
representation; 100 concepts in the first field and 400 concepts in the second field. This way, we
were able to use the original words only, or also the most related 100 concepts, or the original
words and the most related 500 concepts. We will get back to this in Section 2.4 and Section 2.5.

2.4    Indexing
We indexed the new expanded documents using the MG4J search-engine [2]. MG4J makes it
possible to combine several indices over the same document collection. We created one index for
each field: one for the original words, one for the expansion of the top 100 concepts, and another
one for the expansion of the following 400 concepts. Porter stemming was used as per usual.

2.5    Retrieval
We carried out several retrieval experiments combining different kind of queries with different kind
of indices. We used the training data to perform extensive experimentation, and choose the ones
with best MAP results in order to produce the test topic runs.
    The different kind of queries that we had prepared are those explained in Section 2.2. Our
experiments showed that original words were getting good results, so in the test runs we used only
the queries with original words.
    MG4J allows multi-index queries, where one can specify which of the indices one wants to search
in, and assign different weights to each index. We conducted different experiments, by using the
original words alone (the index made of original words) and also by using one or both indices
with the expansion of concepts, giving different weight to the original words and the expanded
concepts. The best weights were then used in the test set, as explained in the following Section.
    We used the BM25 ranking function with the following parameters: 1.0 for k1 and 0.6 for b.
We did not tune these parameters.
    The submitted runs are described in Section 3.


3     Results
Table 1 summarizes the results of our submitted runs. The IR process is the same for all the runs
and the main differences between them is the expansion strategy. The characteristics of each run
are as follows:
    • monolingual without WSD:
        – EnEnNowsd: original terms in topics; original terms in documents.
    • monolingual with WSD:
        – EnEnAllSenses100Docs: original terms in topics; both original and expanded terms
          of 100 concepts, using all senses for initializing the semantic graph. The weight of the
          index that included the expanded terms: 0.25.
        – EnEnBestSense100Docs: original terms in topics; both original and expanded terms
          of 100 concepts, using best sense for initializing the semantic graph. The weight of the
          index that included the expanded terms: 0.25.
        – EnEnBestSense500Docs: original terms in topics; both original and expanded terms
          of 500 concepts, using best sense for initializing the semantic graph. The weight of the
          index that included the expanded terms: 0.25.
   • bilingual without WSD:
        – EsEnNowsd: translated terms in topics (from Spanish to English); original terms in
          documents (in English).
   • bilingual with WSD:
        – EsEn1stTopsAllSenses100Docs: translated terms in topics (from Spanish to En-
          glish); both original and expanded terms of 100 concepts, using all senses for initializing
          the semantic graph. The weight of the index that included the expanded terms: 0.15.
        – EsEn1stTopsBestSense500Docs: translated terms in topics (from Spanish to En-
          glish); both original and expanded terms of 100 concepts, using best sense for initializing
          the semantic graph. The weight of the index that included the expanded terms: 0.15.
        – EsEnAllSenses100Docs: original terms in topics (in Spanish); both original terms (in
          English) and translated terms (in Spanish) in documents, using all senses for initializing
          the semantic graph. The weight of the index that included the expanded terms: 1.00.
        – EsEnBestSense500Docs: original terms in topics (in Spanish); both original terms
          (in English) and translated terms (in Spanish) in documents, using best sense for ini-
          tializing the semantic graph. The weight of the index that included the expanded terms:
          1.60.

    The weight of the index which was created using the original terms of the documents was 1.00
for all the runs.

                                Table 1: Results for submitted runs
                                                     runId                 map      gmap
         monolingual     no WSD        EnEnNowsd                          0.3826    0.1707
                         with WSD      EnEnAllSenses100Docs               0.3654    0.1573
                                       EnEnBestSense100Docs               0.3668    0.1589
                                       EnEnBestSense500Docs               0.3805    0.1657
         bilingual       no WSD        EsEnNowsd                          0.1805    0.0190
                         with WSD      EsEn1stTopsAllSenses100Docs        0.1827    0.0193
                                       EsEn1stTopsBestSense500Docs        0.1838    0.0198
                                       EsEnAllSenses100Docs               0.1402    0.0086
                                       EsEnBestSense500Docs               0.1772    0.0132


    Regarding monolingual results, we can see that using the best sense for representing the
document when initializing the semantic graph achieves slightly higher results with respect to
using all senses. Besides, we obtained better results when we expanded the documents using
500 concepts than using only 100 (compare the results of the runs EnEnBestSense100Docs and
EnEnBestSense500Docs). However, we did not achieve any improvement over the baseline with
neither WSD or semantic relatedness information. We have to mention that we did achieve im-
provement in the training data, but the difference was not significant2 .
  2 We used paired Randomization Tests over MAPs with α=0.05
    With respect to the bilingual results, EsEn1stTopsBestSense500Docs obtains the best result,
although the difference with respect to the baseline run is not statistically significant. This is dif-
ferent to the results obtained using the training data, where the improvements using the semantic
expansion were remarkable. It is not very clear whether translating the topics from Spanish to
English or translating the documents from English to Spain is better, since we got better results
in the first case in the testing phase (see runs called ...1stTops... in the Table 1), but not in
the training phase.
    In our experiments we did not make any effort to deal with hard topics, and we only paid
attention to improvements in Mean Average Precision (MAP) metric. In fact, we applied the
settings which proved best in training data according to MAP. Another option could have been to
optimize the parameters and settings according to Geometric Mean Average Precision (GMAP)
values.


4    Conclusions and future work
We have described our experiments and the results obtained in both monolingual and bilingual
tasks at Robust-WSD Track at CLEF 2009. Our main experimentation strategy consisted on
expanding the documents based on a semantic relatedness algorithm.
   The objective of carrying out different expansion strategies was to study if WSD information
and semantic relatedness could be used in an effective way in (CL)IR. After analyzing the results,
we have found that those expansion strategies were not very helpful, especially in the monolingual
task.
   For the future, we want to analyze why we have not achieved higher gains using the semantic
expansion, as the same strategy obtained remarkable improvements in the passage retrieval task
(ResPubliQA).


Acknowledgments
This work has been supported by KNOW (TIN2006-15049-C03-01) and KYOTO (ICT-2007-
211423). Arantxa Otegi’s work is funded by a PhD grant from the Basque Government.


References
[1] E. Agirre, A. Soroa, E. Alfonseca, K. Hall, J. Kravalova, and M. Pasca. A study on similarity
    and relatedness using distributional and WordNet-based approaches. In Proceedings of an-
    nual meeting of the North American Chapter of the Association of Computational Linguistics
    (NAACL), Boulder, USA, June 2009.
[2] P. Boldi and S. Vigna. MG4J at TREC 2005. In Ellen M. Voorhees and Lori P. Buckland,
    editors, The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings, number SP
    500-266 in Special Publications. NIST, 2005. http://mg4j.dsi.unimi.it/.
[3] S. Kim, H. Seo, and H. Rim. Information retrieval using word senses: Root sense tagging
    approach. In Proceedings of SIGIR, 2004.
[4] S. Liu, F. Liu, C. Yu, and W. Meng. An effective approach to document retrieval via utilizing
    wordnet and recognizing phrases. In Proceedings of SIGIR, 2004.
[5] S. Liu, C. Yu, and W. Meng. Word sense disambiguation in queries. In Proceedings of ACM
    Conference on Information and Knowledge Management (CIKM), 2005.
[6] A. Otegi, E. Agirre, and G. Rigau. IXA at CLEF 2008 Robust-WSD task: using Word Sense
    Disambiguation for (Cross Lingual) Information Retrieval. In Working Notes of the Cross-
    Lingual Evaluation Forum, Aarhus, Denmark, 2008. ISBN 2-912335-43-4, ISSN 1818-8044.
[7] J.R. Pérez-Agüera and H. Zaragoza. UCM-Y!R at CLEF2008 Robust and WSD tasks. In
    Working Notes of the Cross-Lingual Evaluation Forum, Aarhus, Denmark, 2008.