UniGe at CLEF 2009 Robust WSD Task

Jacques Guyot, Gilles Falquet, Saïd Radhouani
Computer Science Center, University of Geneva - Route de Drize 7, 1227 Carouge, Switzerland
{Jacques.Guyot, Gilles.Falquet, Said.Radhouani}@unige.ch

For our second participation in the Robust Word Sense Disambiguation (WSD) Task, we focused on a deeper analysis of the ambiguity issue in Information Retrieval. During the 2008 edition, we noted that although the WSD corpus made it possible to resolve lexical ambiguities, our results based on the corpus' WSD were not clearly better than those based on words only. We showed that lexical ambiguity was an issue only when a query contained one or possibly two words; whenever the query was longer, its words created a context that implicitly reduced lexical ambiguity. We therefore suspected a domain ambiguity problem, i.e. the retrieved documents did contain some of the query’s words but turned out to be irrelevant. To test this, we tried to expand the query’s vocabulary in the following way:

1) On the basis of the queries’ titles, we queried the Web (using Google’s search engine) and selected the top 50 retrieved documents;
2) We downloaded those documents and kept only the text;
3) We then trained a supervised classifier, using the query number as the class of each downloaded document;
4) Finally, we extracted the 50 most discriminative words for each query.
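Steps 3 and 4 can be illustrated with a minimal sketch. The abstract does not say which classifier or word-scoring method was used, so the smoothed log-odds score below is a stand-in chosen for simplicity, not the authors' method. The function names, the tokenizer and the toy documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; the original
    # preprocessing is not described in the abstract.
    return re.findall(r"[a-z]+", text.lower())


def top_discriminative_words(docs_by_query, k=50):
    """Return, for each query id, the k words that best separate its
    documents from the documents of all other queries.

    Score = log of the smoothed ratio between a word's document
    frequency inside the class and outside it (a stand-in for the
    weights of a trained classifier).
    """
    # Document frequency of each word within each query class.
    df = {q: Counter(w for d in docs for w in set(tokenize(d)))
          for q, docs in docs_by_query.items()}
    n_docs = {q: len(docs) for q, docs in docs_by_query.items()}
    total_docs = sum(n_docs.values())

    # Document frequency of each word over the whole collection.
    total_df = Counter()
    for counts in df.values():
        total_df.update(counts)

    result = {}
    for q in docs_by_query:
        n_in = n_docs[q]
        n_out = total_docs - n_in
        scores = {}
        for word, f_in in df[q].items():
            f_out = total_df[word] - f_in
            p_in = (f_in + 0.5) / (n_in + 1.0)
            p_out = (f_out + 0.5) / (n_out + 1.0)
            scores[word] = math.log(p_in / p_out)
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[q] = ranked[:k]
    return result


if __name__ == "__main__":
    # Toy stand-ins for the downloaded Web pages, keyed by query number.
    toy = {
        "148": ["ozone layer hole over antarctica",
                "cfcs deplete the ozone layer"],
        "149": ["the tour de france race",
                "indurain wins the tour"],
    }
    for query_id, words in top_discriminative_words(toy, k=3).items():
        print(query_id, words)
```

Words that occur in the documents of every query, such as "the", receive a low score and drop out of the list, which is the behaviour expected from any discriminative selection.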
Thus, for each query, we produced a list of related words; for example, for query 148, which deals with the hole in the ozone layer, we obtained the following words: “ozone layer stratosphere cfcs ultraviolet depleting chlorofluorocarbons depletion atmosphere antarctic chlorine montreal chemicals cataracts stratospheric atmospheric rays gases hole deplete molecules damaging harmful compounds substances protocol phased cfc aerosols antarctica troposphere climate hcfc nitrogen molecule protective volcanic bromide halons temperatures arctic hydrochlorofluorocar warming bromine conditioners dioxide thinning refrigerants wmo volcanoes.”

This word list was used (after removing the words in the query’s title) to formulate a new query (with an OR operator between the words) on the CLEF corpus. We did not apply any relevance feedback method to the results. The answers (called DOM) contained about 60% of correct documents, but the average precision fell to a level slightly above 10%. We then ranked the documents again by running the CLEF query directly on the corpus: the documents included in the DOM answers were promoted to the top of the list while the others were pushed down. This new ranking was meant to eliminate the documents outside the domain, thus improving precision. The results showed that this process had virtually no impact, as almost all the documents at the top of the list already belonged to the query domain. Therefore the hypothesis of a domain ambiguity must probably be rejected.

In another experiment (WEB), we used the list of words that define the domain to expand the CLEF query. This had a negative impact on precision: the additional words seem to "dilute" the original question.

For easier analysis of the answers, we converted the results into hypertext, which gave quick access to the text of the referenced documents. A detailed analysis of the answers showed that the ambiguity was in fact semantic in nature.
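The DOM experiment described above can be sketched as two small helpers. This is one possible reading of the procedure, under stated assumptions: the `OR` query syntax, the document identifiers and both function names are hypothetical, and the search engine actually used is not named in the abstract.

```python
def build_or_query(domain_words, title_words):
    """Join the domain words with OR, leaving out words that already
    appear in the query title (compared case-insensitively)."""
    title = {w.lower() for w in title_words}
    return " OR ".join(w for w in domain_words if w.lower() not in title)


def promote_domain_docs(ranked_doc_ids, dom_doc_ids):
    """Re-rank the result list of the original query: documents that
    also appear in the DOM answers move to the top, the others are
    pushed down. The relative order inside each group is preserved."""
    dom = set(dom_doc_ids)
    in_domain = [d for d in ranked_doc_ids if d in dom]
    out_of_domain = [d for d in ranked_doc_ids if d not in dom]
    return in_domain + out_of_domain


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    print(build_or_query(["ozone", "layer", "stratosphere", "cfcs"],
                         ["Ozone", "layer", "hole"]))
    print(promote_domain_docs(["d3", "d1", "d4", "d2"], {"d1", "d2"}))
```

If the top of the original ranking already consists of in-domain documents, `promote_domain_docs` returns that part of the list unchanged, which matches the observation that the re-ranking had virtually no impact.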
The "right" and "wrong" documents were not differentiated by the words they contained: both included words from the query and from the domain. Thus the WHAT aspect (the topic) was equivalent. However, the HOW aspect (how people talked about the topic) was different and required a semantic “understanding” of the text. For instance, for the query dealing with Indurain's fourth victory in the Tour de France (we were looking for documents about the reactions to this victory), all the answers were linked to the victory, but some of them anticipated it while others referred to it after it had occurred. A human reader can easily identify the "right" answers thanks to their experience of reactions to a victory in a cycling race. Therefore, in order to improve performance significantly, we believe the problem should be addressed with methods that introduce semantic elements.