<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>QA with a Disambiguated Document Collection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Buscaldi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prosso@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Natural Language Engineering Lab, Universidad Politécnica de Valencia</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This report describes our approach to the Question Answering - Word Sense Disambiguation (QA-WSD) task. In our approach, disambiguated documents are used to improve the retrieval phase: this has been implemented by adding a WordNet expanded index to the document collection. This index contains synonyms, hypernyms and holonyms of the document words. Question words are searched for in both the expanded WordNet index and the default index. The obtained results do not show any improvement over the system that does not use the disambiguated collection. However, an analysis of the results shows that the average number of passages that contain the answer for each question is too small to detect any significant difference between the two systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Answering</kwd>
        <kwd>Word Sense Disambiguation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Our system consists of a modified version of the QUASAR system described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], whose
search engine (JIRS) has been replaced by Lucene. In the configuration that uses semantics, the
search index also contains terms that have been extracted from the hypernyms, holonyms, and
synonyms of the document words by means of WordNet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>In the following section, we describe the system. In Section 3 we describe the characteristics of
our submissions and discuss the obtained results.</p>
    </sec>
    <sec id="sec-2">
      <title>Our QA-WSD System</title>
      <p>
        The system has some limitations because it was developed for a past edition with
different guidelines: it does not include an anaphora resolution system and it cannot answer list
questions. We refer the reader to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for a detailed description of the default
system. In the following, we describe only the WSD-based system.
      </p>
      <p>Before the indexing phase, all documents are split into sentences. These are used later
to form the passages. In the indexing phase, we create two indices: the first one (text) contains all
the terms of the sentence; the second one (expanded index, or wn index) contains all the synonyms
of the disambiguated words; in the case of nouns and verbs, it also contains their hypernyms. For
nouns, the holonyms (if available) are also added to the index. For instance, let us consider the
following sentence from document GH951115-000080-03:</p>
      <p>Splitting the left from the Labour Party would weaken the battle for progressive policies
inside the Labour Party.</p>
      <p>The underlined words are those that have been disambiguated in the collection. For these
words we can find their synonyms and related concepts in WordNet, as listed in Table 1.</p>
      <p>Passages are formed from the retrieved sentences
by appending to them the previous and next sentences in the collection. For instance, if the above
example were a retrieved sentence, the resulting passage would be composed of the following
sentences:</p>
      <p>GH951115-000080-2: "The real question is how these policies are best defeated and how
the great mass of Labour voters can be won to see the need for a socialist alternative."
GH951115-000080-3: "Splitting the left from the Labour Party would weaken the battle
for progressive policies inside the Labour Party."
GH951115-000080-4: "It would also make it easier for Tony Blair to cut the crucial links
that remain with the trade-union movement."</p>
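      <p>The way a passage is assembled from a retrieved sentence and its neighbours can be sketched as follows. This is a minimal Python illustration, not the code of the actual system: the function name build_passage, the in-memory dictionary keyed by (document id, sentence position), and the abbreviated sentence texts are assumptions introduced only for this example.</p>
      <preformat>
```python
def build_passage(sentences, doc_id, position):
    """Return the retrieved sentence together with its neighbours.

    sentences maps (doc_id, position) to the sentence text. Positions that
    do not exist (start or end of a document) are skipped.
    """
    passage = []
    for pos in (position - 1, position, position + 1):
        text = sentences.get((doc_id, pos))
        if text is not None:
            passage.append((pos, text))
    return passage


# Toy collection holding the three example sentences (texts abbreviated here).
sentences = {
    ("GH951115-000080", 2): "The real question is how these policies are best defeated ...",
    ("GH951115-000080", 3): "Splitting the left from the Labour Party would weaken the battle ...",
    ("GH951115-000080", 4): "It would also make it easier for Tony Blair to cut the crucial links ...",
}

# Sentence 3 is assumed to be the one returned by the search engine.
for pos, text in build_passage(sentences, "GH951115-000080", 3):
    print(pos, text)
```
      </preformat>
      <p>In this sketch a sentence at the beginning or end of a document simply yields a shorter passage; how the actual system treats such boundaries is not stated here.</p>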
      <p>One noteworthy feature is that sentences retrieved with the expanded WordNet index are
shorter, which allows for more precise results.</p>
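      <p>The two-index scheme described in this section can be sketched as follows. This is a hedged Python illustration under several assumptions: the small TOY_LEXICON stands in for WordNet and its entries are invented placeholders, the choice of disambiguated words for the example sentence is assumed, and the count_matches function is only a stand-in for searching both fields, not Lucene's scoring. The names expand, index_sentence and count_matches are hypothetical.</p>
      <preformat>
```python
# Invented placeholder entries standing in for WordNet lookups; each key is
# a (lemma, part of speech) pair for the sense chosen by the disambiguator.
TOY_LEXICON = {
    ("weaken", "v"): {"synonyms": [], "hypernyms": ["change"], "holonyms": []},
    ("battle", "n"): {"synonyms": ["conflict", "struggle"],
                      "hypernyms": ["group_action"], "holonyms": ["war"]},
    ("progressive", "a"): {"synonyms": ["liberal"], "hypernyms": [], "holonyms": []},
}


def expand(lemma, pos, lexicon):
    """Return the related terms to index for one disambiguated word."""
    entry = lexicon.get((lemma, pos))
    if entry is None:
        return []
    terms = list(entry["synonyms"])          # synonyms: every part of speech
    if pos in ("n", "v"):
        terms.extend(entry["hypernyms"])     # hypernyms: nouns and verbs
    if pos == "n":
        terms.extend(entry["holonyms"])      # holonyms: nouns only
    return terms


def index_sentence(tokens, disambiguated, lexicon):
    """Build the two fields for one sentence: plain terms and expansions."""
    wn_terms = []
    for lemma, pos in disambiguated:
        wn_terms.extend(expand(lemma, pos, lexicon))
    return {"text": list(tokens), "wn": wn_terms}


def count_matches(doc, question_words):
    """Count question words found in either field (not Lucene's scoring)."""
    terms = set(doc["text"]) | set(doc["wn"])
    return sum(1 for word in question_words if word in terms)


tokens = ("splitting the left from the labour party would weaken the battle "
          "for progressive policies inside the labour party").split()
# Assumed disambiguated words; the original underlining is not available.
disambiguated = [("weaken", "v"), ("battle", "n"), ("progressive", "a")]
doc = index_sentence(tokens, disambiguated, TOY_LEXICON)
print(doc["wn"])
print(count_matches(doc, ["labour", "struggle", "election"]))
```
      </preformat>
      <p>In the sketch, a question word such as "struggle" matches the sentence only through the wn field, which is the effect the expanded index is meant to provide.</p>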
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>We submitted the two mandatory runs, one with the basic system (id: nlel081enen) that does not
use semantic information, and one with the system described above (id: nlel082enen), which used
the NUS-disambiguated collection. Of the 200 questions in the test set, only 49 had an
answer in the disambiguated collection according to the organizers (the other questions had an answer in Wikipedia, which was
not included in the QA-WSD track). However, we manually checked
the data and found that it was possible to find an answer to 25 of the Wikipedia questions, bringing
the number of questions with an answer in the collection to 74.</p>
      <p>In Table 2 we show the results obtained by the two mandatory runs and another run that used
the UBC-disambiguated collection (id: nlel083enen). The results of this last run are not official.</p>
      <p>From the results we can say that the base system performed poorly overall, although better
than the system that included semantics. We calculated the number of questions that could be
answered by our system, discarding the questions that do not have an answer in the collection and
the questions whose answer type cannot be handled by our system, resulting in a maximum of 40
questions. Even with this reduced set of questions, the system did not perform well, obtaining
25% accuracy over this set.</p>
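      <p>The question counts reported in this section can be put side by side with a few lines of arithmetic. The Python sketch below uses only figures stated in the text (200, 49, 25, 40 and 25%); the variable names are illustrative, and the number of correct answers is derived by taking the reported 25% at face value rather than read from the results table.</p>
      <preformat>
```python
total_questions = 200
answerable_official = 49                     # answer in the collection, per the organizers
answerable_manual = answerable_official + 25 # after the manual check
handled_by_system = 40                       # answer present and answer type supported
accuracy_on_handled = 0.25                   # reported accuracy over the 40 questions

# Derived quantities (assumption: 25% of 40 is an exact count of questions).
correct_answers = round(accuracy_on_handled * handled_by_system)
share_official = answerable_official / total_questions
share_manual = answerable_manual / total_questions

print(answerable_manual)   # 74
print(correct_answers)     # 10
print(share_official)      # 0.245
print(share_manual)        # 0.37
```
      </preformat>
      <p>Under the organizers' count, about 24.5% of the test questions had an answer in the collection, which corresponds to more than 75% of the questions having none.</p>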
      <p>Regarding the lower precision of the WSD-based system, we carried out an analysis of the average
number of passages that contained the answer for each of the questions. Of the 49 questions,
only three had answers present in more than nine passages. The average number of passages
containing the answer for each of the remaining 46 questions is 2.04. This number explains
both the poor overall performance of our system, which is based on redundancy, and the poor results
obtained by the WSD-based system: it could not find better passages simply because very few
relevant passages could be retrieved.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We presented a simple approach to take advantage of the disambiguated collection provided by
the organizers of the QA-WSD track. It is based on an extended index that includes synonyms,
hypernyms and holonyms extracted by means of WordNet. However, the test set provided was
not particularly well suited to the task, with more than 75% of the questions not having an answer in
the collection. Moreover, the answers to the questions that could be answered are contained in few
passages, with the result that it cannot be determined whether the use of semantic information
was useful or not.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>We would like to thank the TIN2006-15265-C06-04 research project for partially supporting this
work.</p>
      <p>[4] Yee Seng Chan, Hwee Tou Ng, and Zhi Zhong. NUS-PT: Exploiting Parallel Texts for Word
Sense Disambiguation in the English All-Words Tasks. In Proceedings of the 4th International
Workshop on Semantic Evaluations (SemEval 2007), pages 253–256, Prague, Czech Republic.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
          and
          <string-name>
            <given-names>Oier</given-names>
            <surname>Lopez de Lacalle</surname>
          </string-name>
          .
          <article-title>UBC-ALM: Combining k-NN with SVD for WSD</article-title>
          . In
          <source>Proceedings of the 4th International Workshop on Semantic Evaluations</source>
          (SemEval
          <year>2007</year>
          ), pages
          <fpage>341</fpage>
          –
          <lpage>345</lpage>
          , Prague, Czech Republic.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Davide</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          , Jose Manuel Gomez, Paolo Rosso, and
          <string-name>
            <given-names>Emilio</given-names>
            <surname>Sanchis</surname>
          </string-name>
          .
          <article-title>N-gram vs. keyword-based passage retrieval for question answering</article-title>
          . In Carol Peters, Paul Clough, Fredric C. Gey, Jussi Karlgren, Bernardo Magnini, Douglas W. Oard, Maarten de Rijke, and Maximilian Stempfhuber, editors,
          <source>Evaluation of Multilingual and Multi-modal Information Retrieval</source>
          , volume
          <volume>4730</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>377</fpage>
          –
          <lpage>384</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>George A.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>WordNet: A lexical database for English</article-title>
          .
          <source>Communications of the ACM</source>
          , volume
          <volume>38</volume>
          , pages
          <fpage>39</fpage>
          –
          <lpage>41</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>