Semantic Vectors: an Information Retrieval scenario

Pierpaolo Basile, Annalina Caputo, Giovanni Semeraro
Dept. of Computer Science, University of Bari
Via E. Orabona, 4 - 70125 Bari (ITALY)
basilepp@di.uniba.it, acaputo@di.uniba.it, semeraro@di.uniba.it

ABSTRACT
In this paper we exploit Semantic Vectors to develop an IR system. The idea is to use semantic spaces built on terms and documents to overcome the problem of word ambiguity. Word ambiguity is a key issue for those systems which have access to textual information. Semantic Vectors are able to divide the usages of a word into different meanings, discriminating among word meanings based on information found in unannotated corpora. We provide an in vivo evaluation in an Information Retrieval scenario and we compare the proposed method with another one which exploits Word Sense Disambiguation (WSD). Contrary to sense discrimination, which is the task of discriminating among different meanings (not necessarily known a priori), WSD is the task of selecting a sense for a word from a set of predefined possibilities. The goal of the evaluation is to establish how Semantic Vectors affect the retrieval performance.

Categories and Subject Descriptors
H.3.1 [Content Analysis and Indexing]: Indexing methods, Linguistic processing; H.3.3 [Information Search and Retrieval]: Retrieval models, Search process

Keywords
Semantic Vectors, Information Retrieval, Word Sense Discrimination

1. BACKGROUND AND MOTIVATIONS
Ranked keyword search has been quite successful in the past, in spite of its obvious limits basically due to polysemy, the presence of multiple meanings for one word, and synonymy, multiple words having the same meaning. The result is that, due to synonymy, relevant documents can be missed if they do not contain the exact query keywords, while, due to polysemy, wrong documents could be deemed as relevant. These problems call for alternative methods that work not only at the lexical level of the documents, but also at the meaning level.

In the field of computational linguistics, a number of important research problems still remain unresolved. A specific challenge for computational linguistics is ambiguity: a word can be interpreted in more than one way, since it has more than one meaning. Ambiguity is usually not a problem for humans, hence it is not perceived as such. Conversely, for a computer ambiguity is one of the main problems encountered in the analysis and generation of natural languages. Two main strategies have been proposed to cope with ambiguity:

1. Word Sense Disambiguation: the task of selecting a sense for a word from a set of predefined possibilities; usually the so-called sense inventory (which provides, for each word, a list of all its possible meanings) comes from a dictionary or thesaurus.

2. Word Sense Discrimination: the task of dividing the usages of a word into different meanings, ignoring any particular existing sense inventory. The goal is to discriminate among word meanings based on information found in unannotated corpora.

The main difference between the two strategies is that disambiguation relies on a sense inventory, while discrimination exploits unannotated corpora. In the past years, several attempts were made to include sense disambiguation and discrimination techniques in IR systems. This is possible because discrimination and disambiguation are not an end in themselves, but rather "intermediate tasks" which contribute to more complex tasks such as information retrieval. This opens the possibility of an in vivo evaluation, where, rather than being evaluated in isolation, results are evaluated in terms of their contribution to the overall performance of a system designed for a particular application (e.g. Information Retrieval).

The goal of this paper is to present an IR system which exploits semantic spaces built on words and documents to overcome the problem of word ambiguity. Then we compare this system with another one which uses a Word Sense Disambiguation strategy. We evaluated the proposed system in the context of the CLEF 2009 Ad-Hoc Robust WSD task [2].

The paper is organized as follows: Section 2 presents the IR model involved in the evaluation, which embodies semantic vectors strategies. The evaluation and the results are reported in Section 3, while a brief discussion of the main works related to our research is in Section 4. Conclusions and future work close the paper.

Appears in the Proceedings of the 1st Italian Information Retrieval Workshop (IIR'10), January 27-28, 2010, Padova, Italy. http://ims.dei.unipd.it/websites/iir10/index.html Copyright owned by the authors.

2. AN IR SYSTEM BASED ON SEMANTIC VECTORS
Semantic Vectors are based on the WordSpace model [15]. This model relies on a vector space in which points are used to represent semantic concepts, such as words and documents. Using this strategy it is possible to build a vector space on both words and documents; these vector spaces can be exploited to develop an IR model, as described in the following. The main idea behind Semantic Vectors is that words are represented by points in a mathematical space, and words or documents with similar or related meanings are represented close to one another in that space. This provides us with an approach to perform sense discrimination. We adopt the Semantic Vectors package [18], which relies on a technique called Random Indexing (RI), introduced by Kanerva [13].

[Figure 1: Word vectors in word-space]
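To make the word-space intuition concrete, the following toy sketch measures closeness between word vectors with cosine similarity. It is our illustration only: the two labeled dimensions (LEGAL, SPORT) and all vector values are invented, whereas the dimensions of a real word space are built automatically from corpus co-occurrences and carry no labels.

```python
import math

# Toy word-space with two hand-picked context dimensions: [LEGAL, SPORT].
# All values are invented for illustration.
word_space = {
    "soccer": [0.1, 0.9],
    "law":    [0.9, 0.2],
    "court":  [0.7, 0.5],  # ambiguous: courts of law and tennis courts
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Semantically related words end up closer (higher cosine) in the space.
print(cosine(word_space["soccer"], word_space["law"]))   # small angle cosine: low
print(cosine(word_space["court"], word_space["law"]))    # higher
```

In the same spirit as Figure 1, the angle between two word vectors expresses their degree of similarity; the ambiguous word sits between the two contexts.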
This makes it possible to build semantic vectors with no need to factorize the document-term or term-term matrix, because vectors are inferred using an incremental strategy. This method efficiently solves the problem of reducing dimensions, which is one of the key features used to uncover the "latent semantic dimensions" of a word distribution.

RI is based on the concept of Random Projection: the idea is that high-dimensional vectors chosen at random are "nearly orthogonal". This yields a result comparable to that of orthogonalization methods, such as Singular Value Decomposition, while saving computational resources. Specifically, RI creates semantic vectors in three steps:

1. a context vector is assigned to each document. This vector is sparse, high-dimensional and ternary, which means that its elements can take values in {-1, 0, 1}. The index vector contains a small number of randomly distributed non-zero elements, and the structure of this vector follows the hypothesis behind the concept of Random Projection;

2. context vectors are accumulated by analyzing the terms and the documents in which the terms occur. In particular, the semantic vector of each term is the sum of the context vectors of the documents which contain the term;

3. in the same way, the semantic vector of a document is the sum of the semantic vectors of the terms (created in step 2) which occur in the document.

The two spaces built on terms and documents have the same dimension. We can use vectors built on the word-space as query vectors and vectors built on the document-space as search vectors. Then, we can compute the similarity between word-space vectors and document-space vectors by means of the classical cosine similarity measure. In this way we implement an information retrieval model based on semantic vectors.

Figure 1 shows a word-space with only two dimensions. If those two dimensions refer respectively to LEGAL and SPORT contexts, we can note that the vector of the word soccer is closer to the SPORT context than to the LEGAL one; vice versa, the word law is closer to the LEGAL context. The angle between soccer and law represents the degree of similarity between the two words. It is important to emphasize that contexts in WordSpace have no tag: we know that each dimension is a context, but we cannot know the kind of context. If we consider the document-space rather than the word-space, documents semantically related will be represented closer in that space.

The Semantic Vectors package supplies tools for indexing a collection of documents and for their retrieval adopting the Random Indexing strategy. The package relies on Apache Lucene (http://lucene.apache.org/) to create a basic term-document matrix, then it uses the Lucene API to create both a word-space and a document-space from the term-document matrix, using Random Projection to perform dimensionality reduction without matrix factorization. In order to evaluate the Semantic Vectors model, we had to modify the standard Semantic Vectors package by adding some ad-hoc features to support our evaluation. In particular, documents are split in two fields, headline and title, and are not tokenized using the standard text analyzer in Lucene.

An important factor to take into account in a semantic-space model is the number of contexts, which sets the dimension of the context vectors. We evaluated Semantic Vectors using several values of reduced dimensions. Results of the evaluation are reported in Section 3.

3. EVALUATION
The goal of the evaluation was to establish how Semantic Vectors influence the retrieval performance. The system is evaluated in the context of an Information Retrieval (IR) task. We adopted the dataset used for the CLEF 2009 Ad-Hoc Robust WSD task [2]. The task organizers make available document collections (from the news domain) and topics which have been automatically tagged with word senses (synsets) from WordNet using several state-of-the-art disambiguation systems. Considering our goal, we exploit only the monolingual part of the task.

In particular, the Ad-Hoc WSD Robust task used existing CLEF news collections, but with WSD added. The dataset comprises corpora from the "Los Angeles Times" and the "Glasgow Herald", amounting to 169,477 documents, 160 test topics and 150 training topics. The WSD data were automatically added by systems from two leading research laboratories, UBC [1] and NUS [9]. Both systems returned word senses from the English WordNet, version 1.6. We used only the senses provided by NUS. Each term in the document is annotated with its senses and their respective scores, as assigned by the automatic WSD system. This kind of dataset supplies WordNet synsets that are useful for the development of search engines that rely on disambiguation.

Table 1: Semantic Vectors: results of the performed experiments

  Topic fields                   MAP
  TITLE                          0.0892
  TITLE+DESCRIPTION              0.2141
  TITLE+DESCRIPTION+NARRATIVE    0.2041

Table 2: Results of the performed experiments

  System     MAP      Imp.
  KEYWORD    0.3962   -
  MEANING    0.2930   -26.04%
  SENSE      0.4222   +6.56%
  SVbest     0.2141   -45.96%

In order to compare the IR system based on Semantic Vectors to other systems which cope with word ambiguity by means of methods based on Word Sense Disambiguation, we provide a baseline based on SENSE. SENSE (SEmantic N-levels Search Engine) is an IR system which relies on Word Sense Disambiguation. SENSE is based on the N-Levels model [5]. This model tries to overcome the limitations of the ranked keyword approach by introducing semantic levels, which integrate (and not simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary or other semantic resources. SENSE is able to manage documents indexed at separate levels (keywords, word meanings, and so on), as well as to combine keyword search with the semantic information provided by the other indexing levels.
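The three-step Random Indexing construction described in Section 2 can be sketched as follows. This is a toy illustration under our own assumptions (three invented documents, small illustrative values for the dimension and the number of non-zero seeds); the actual Semantic Vectors package is a Java library built on Lucene, not this code.

```python
import math
import random

random.seed(0)
DIM, SEEDS = 100, 10  # dimensionality and non-zero entries per context vector (illustrative values)

def random_context_vector(dim=DIM, seeds=SEEDS):
    """Step 1: sparse ternary vector with a few randomly placed +1/-1 entries."""
    v = [0.0] * dim
    for i in random.sample(range(dim), seeds):
        v[i] = random.choice((-1.0, 1.0))
    return v

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented mini-collection.
docs = {
    "d1": "soccer player scored a goal".split(),
    "d2": "the court ruled on the new law".split(),
    "d3": "the soccer match went to court appeal".split(),
}

# Step 1: one random context vector per document.
context = {d: random_context_vector() for d in docs}

# Step 2: the semantic vector of a term is the sum of the context
# vectors of the documents in which the term occurs.
terms = {}
for d, words in docs.items():
    for w in set(words):
        terms[w] = add(terms.get(w, [0.0] * DIM), context[d])

# Step 3: the semantic vector of a document is the sum of the
# semantic vectors of its terms.
doc_vecs = {d: [0.0] * DIM for d in docs}
for d, words in docs.items():
    for w in words:
        doc_vecs[d] = add(doc_vecs[d], terms[w])

# Retrieval model: rank documents by cosine similarity between a
# word-space query vector and the document-space vectors.
ranking = sorted(docs, key=lambda d: cosine(terms["soccer"], doc_vecs[d]), reverse=True)
print(ranking)
```

Because random sparse ternary vectors are nearly orthogonal, documents that share terms with the query word accumulate overlapping context vectors and rank higher, without any matrix factorization.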
In particular, for each level:

1. a local scoring function is used in order to weigh the elements belonging to that level according to their informative power;

2. a local similarity function is used in order to compute document relevance by exploiting the above-mentioned scores.

Finally, a global ranking function is defined in order to combine the document relevance computed at each level. The SENSE search engine is described in [4], while the setup of SENSE in the context of CLEF 2009 is thoroughly described in [7].

In CLEF, queries are represented by topics, which are structured statements representing information needs. Each topic typically consists of three parts: a brief TITLE statement, a one-sentence DESCRIPTION, and a more complex NARRATIVE specifying the criteria for assessing relevance. All topics are available with and without WSD. Topics in English are disambiguated by both the UBC and NUS systems, yielding word senses from WordNet version 1.6.

We adopted as baseline the system which exploits only keywords during the indexing, identified by KEYWORD. Regarding disambiguation, we used the SENSE system adopting two strategies: the former, called MEANING, exploits only word meanings; the latter, called SENSE, uses two levels of document representation: keywords and word meanings combined.

The query for the KEYWORD system is built using the word stems in the TITLE and DESCRIPTION fields of the topics. All query terms are joined adopting the OR boolean clause. Regarding the MEANING system, each word in the TITLE and DESCRIPTION fields is expanded using the WordNet synsets provided by the WSD algorithm. More details regarding the evaluation of SENSE in CLEF 2009 are in [7]. The query for the SENSE system is built combining the strategies adopted for the KEYWORD and the MEANING systems. For all the runs we remove the stop words from both the index and the topics. In particular, we build a different stop word list for topics, in order to remove non-informative words such as find, reports and describe, which occur with high frequency in topics and are poorly discriminating.

In order to make results comparable, we use the same index built for the KEYWORD system to infer semantic vectors using the Semantic Vectors package, as described in Section 2. We need to tune two parameters in Semantic Vectors: the number of dimensions (the number of contexts) and the frequency threshold Tf (in this instance word frequency refers to word occurrences). The latter value is used to discard terms that have a frequency below Tf. After a tuning step, we set the dimension to 2000 and Tf to 10. Tuning is performed using the training topics provided by the CLEF organizers.

Queries for the Semantic Vectors model are built using several combinations of topic fields. Table 1 reports the results of the experiments using Semantic Vectors and different combinations of topic fields.

To compare the systems we use a single measure of performance: the Mean Average Precision (MAP), due to its good stability and discrimination capabilities. Given the Average Precision [8], that is the mean of the precision scores obtained after retrieving each relevant document, the MAP is computed as the sample mean of the Average Precision scores over all topics. Zero precision is assigned to unretrieved relevant documents.

Table 2 reports the results of each system involved in the experiment. The column Imp. shows the improvement with respect to the baseline KEYWORD. The system SVbest refers to the best result obtained by Semantic Vectors, reported in boldface in Table 1.

The main result of the evaluation is that MEANING works better than SVbest; in other words, disambiguation wins over discrimination. Another important observation is that the combination of keywords and word meanings, the SENSE system, obtains the best result. It is important to note that SVbest obtains a performance below the KEYWORD system, about 46% under the baseline. It is also important to underline that the keyword level implemented in SENSE uses a modified version of Apache Lucene which implements the Okapi BM25 model [14].

In the previous experiments we compared the performance of the Semantic Vectors-based IR system to SENSE. In the following, we describe a new kind of experiment in which we integrate Semantic Vectors as a new level in SENSE. The idea is to combine the results produced by Semantic Vectors with the results which come from both the keyword level and the word meaning level. Table 3 shows that the combination of the keyword level with Semantic Vectors outperforms the keyword level alone. Moreover, the combination of Semantic Vectors with the word meaning level achieves an interesting result: the combination is able to outperform the word meaning level alone. Finally, the combination of Semantic Vectors with SENSE (keyword level + word meaning level) obtains the best MAP, with an increase of about 6% with respect to KEYWORD. However, SV does not contribute to improve the effectiveness of SENSE; in fact, SENSE without SV (see Table 2) outperforms SV+SENSE.

Table 3: Results of the experiments: combination of Semantic Vectors with other levels

  System        MAP      Imp.
  SV+KEYWORD    0.4150   +4.74%
  SV+MEANING    0.3238   -18.27%
  SV+SENSE      0.4216   +6.41%

Analyzing the results query by query, we discovered that for some queries the Semantic Vectors-based IR system achieves a high improvement with respect to keyword search. This happens mainly when few relevant documents exist for a query. For example, query "10.2452/155-AH" has only three relevant documents. Both keyword and Semantic Vectors are able to retrieve all relevant documents for that query, but keyword achieves 0.1484 MAP, while for Semantic Vectors MAP grows to 0.7051. This suggests that Semantic Vectors are more accurate than keywords when few relevant documents exist for a query.

4. RELATED WORKS
The main motivation for focusing our attention on the evaluation of disambiguation or discrimination systems is the idea that ambiguity resolution can improve the performance of IR systems. Many strategies have been used to incorporate semantic information coming from electronic dictionaries into search paradigms.

Query expansion with WordNet has shown the potential to improve recall, as it allows matching relevant documents even if they do not contain the exact keywords in the query [17]. On the other hand, semantic similarity measures have the potential to redefine the similarity between a document and a user query [10]. The semantic similarity between concepts is useful to understand how similar the meanings of the concepts are. However, computing the degree of relevance of a document with respect to a query means computing the similarity among all the synsets of the document and all the synsets of the user query, thus the matching process could have very high computational costs.

In [12] the authors performed a shift of representation from a lexical space, where each dimension is represented by a term, towards a semantic space, where each dimension is represented by a concept expressed using WordNet synsets. Then, they applied the Vector Space Model to WordNet synsets. The realization of the semantic tf-idf model was rather simple, because it was sufficient to index the documents and the user query by using strings representing synsets. The retrieval phase is similar to the classic tf-idf model, with the only difference that matching is carried out between synsets.

Concerning discrimination methods, in [11] some experiments in an IR context adopting the LSI technique are reported. In particular, this method performs better than the canonical vector space model when queries and relevant documents do not share many words. In this case LSI takes advantage of the implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of the terms found in queries.

In order to show that the WordSpace model is an approach to ambiguity resolution that is beneficial in information retrieval, we summarize the experiment presented in [16]. This experiment evaluates sense-based retrieval, a modification of the standard vector space model in information retrieval. In word-based retrieval, documents and queries are represented as vectors in a multidimensional space in which each dimension corresponds to a word. In sense-based retrieval, documents and queries are also represented in a multidimensional space, but its dimensions are senses, not words. The evaluation shows that sense-based retrieval improved average precision by 7.4% when compared to word-based retrieval.

Regarding the evaluation of word sense disambiguation systems in the context of IR, it is important to cite SemEval-2007 task 1 [3]. This task is an application-driven one, where the application is a given cross-lingual information retrieval system. Participants disambiguate text by assigning WordNet synsets; then the system has to perform the expansion to other languages, the indexing of the expanded documents, and the retrieval for all the languages in batch. The retrieval results are taken as a measure of the effectiveness of the disambiguation. The CLEF 2009 Ad-Hoc Robust WSD task [2] is inspired by SemEval-2007 task 1.

Finally, this work is strongly related to [6], in which a first attempt to integrate Semantic Vectors in an IR system was performed.

5. CONCLUSIONS AND FUTURE WORK
We have evaluated Semantic Vectors in an information retrieval scenario. The IR system which we propose relies on semantic vectors to induce a WordSpace model exploited during the retrieval process. Moreover, we compared the proposed IR system with another one which exploits word sense disambiguation. The main outcome of this comparison is that disambiguation works better than discrimination. This is a counterintuitive result: one would expect discrimination to be better than disambiguation, since the former is able to infer the usages of a word directly from the documents, while disambiguation works on a fixed distinction of word meanings encoded in a sense inventory such as WordNet.

It is important to note that the dataset used for the evaluation depends on the method adopted to compute document relevance, in this case the pooling technique. This means that the results submitted by the groups participating in the previous ad hoc tasks are used to form a pool of documents for each topic by collecting the highly ranked documents. What we want to underline here is that the systems taken into account generally rely on keywords. This can produce relevance judgements that do not take into account evidence provided by other features, such as word meanings or context vectors. Moreover, distributional semantics methods, such as Semantic Vectors, do not provide a formal description of why two terms or documents are similar: the semantic associations derived by Semantic Vectors resemble the way humans estimate similarity between terms or documents. It is not clear whether current evaluation methods are able to detect these cognitive aspects typical of human thinking. More investigation on the strategy adopted for the evaluation is needed.
As future work we intend to exploit several discrimination methods, such as Latent Semantic Indexing and Hyperspace Analogue to Language.

6. REFERENCES
[1] E. Agirre and O. L. de Lacalle. UBC-ALM: Combining k-NN with SVD for WSD. In Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval 2007), Prague, Czech Republic, pages 341-345, 2007.

[2] E. Agirre, G. M. Di Nunzio, T. Mandl, and A. Otegi. CLEF 2009 Ad Hoc Track Overview: Robust-WSD Task. In Working Notes for the CLEF 2009 Workshop, 2009. http://clef-campaign.org/2009/working notes/agirre-robustWSDtask-paperCLEF2009.pdf.

[3] E. Agirre, B. Magnini, O. L. de Lacalle, A. Otegi, G. Rigau, and P. Vossen. SemEval-2007 Task 1: Evaluating WSD on Cross-Language Information Retrieval. In Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval 2007), Prague, Czech Republic, pages 7-12. ACL, 2007.

[4] P. Basile, A. Caputo, M. de Gemmis, A. L. Gentile, P. Lops, and G. Semeraro. Improving Ranked Keyword Search with SENSE: SEmantic N-levels Search Engine. Communications of SIWN (formerly: System and Information Sciences Notes), special issue on DART 2008, 5:39-45, August 2008. SIWN: The Systemics and Informatics World Network.

[5] P. Basile, A. Caputo, A. L. Gentile, M. Degemmis, P. Lops, and G. Semeraro. Enhancing Semantic Search using N-Levels Document Representation. In S. Bloehdorn, M. Grobelnik, P. Mika, and D. T. Tran, editors, Proceedings of the Workshop on Semantic Search (SemSearch 2008) at the 5th European Semantic Web Conference (ESWC 2008), Tenerife, Spain, June 2nd, 2008, volume 334 of CEUR Workshop Proceedings, pages 29-43. CEUR-WS.org, 2008.

[6] P. Basile, A. Caputo, and G. Semeraro. Exploiting Disambiguation and Discrimination in Information Retrieval Systems. In Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence and International Conference on Intelligent Agent Technology - Workshops, Milan, Italy, 15-18 September 2009, pages 539-542. IEEE, 2009.

[7] P. Basile, A. Caputo, and G. Semeraro. UNIBA-SENSE @ CLEF 2009: Robust WSD task. In Working Notes for the CLEF 2009 Workshop, 2009. http://clef-campaign.org/2009/working notes/basile-paperCLEF2009.pdf.

[8] C. Buckley and E. M. Voorhees. Evaluating evaluation measure stability. In SIGIR '00: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 33-40, New York, NY, USA, 2000. ACM.

[9] Y. S. Chan, H. T. Ng, and Z. Zhong. NUS-PT: Exploiting Parallel Texts for Word Sense Disambiguation in the English All-Words Tasks. In Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval 2007), Prague, Czech Republic, pages 253-256, 2007.

[10] C. Corley and R. Mihalcea. Measuring the semantic similarity of texts. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 13-18, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics.

[11] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391-407, 1990.

[12] J. Gonzalo, F. Verdejo, I. Chugur, and J. Cigarran. Indexing with WordNet synsets can improve text retrieval. In Proceedings of the COLING/ACL, pages 38-44, 1998.

[13] P. Kanerva. Sparse Distributed Memory. MIT Press, 1988.

[14] S. Robertson, H. Zaragoza, and M. Taylor. Simple BM25 extension to multiple weighted fields. In CIKM '04: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pages 42-49, New York, NY, USA, 2004. ACM.

[15] M. Sahlgren. The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. PhD thesis, Stockholm University, Faculty of Humanities, Department of Linguistics, 2006.

[16] H. Schütze and J. O. Pedersen. Information retrieval based on word senses. In Proceedings of the 4th Annual Symposium on Document Analysis and Information Retrieval, pages 161-175, 1995.

[17] E. M. Voorhees. Using WordNet for text retrieval. In WordNet: An Electronic Lexical Database, pages 285-304. The MIT Press, Cambridge (Mass.), 1998.

[18] D. Widdows and K. Ferraro. Semantic Vectors: A Scalable Open Source Package and Online Technology Management Application. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), 2008.