<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MIRACLE Question Answering System for Spanish at CLEF 2007</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>César de Pablo-Sánchez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Luis Martínez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana García-Ledesma</string-name>
          <email>ana@maria.lllf.uam.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Doaa Samy</string-name>
          <email>dsamy@inf.uc3m.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paloma Martínez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>General Terms</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Antonio Moreno-Sandoval, Harith Al-Jumaily</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DAEDALUS</institution>
          ,
          <addr-line>Data, Decisions and Systems S.A.</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Autónoma de Madrid</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the system developed by the MIRACLE group to participate in the Spanish monolingual question answering task at QA@CLEF 2007. A basic subsystem, similar to the one used in our participation last year, was used separately for the EFE and Wikipedia collections. Answers from the two subsystems are combined using temporal information from the questions and the collections. The system is also enhanced with a coreference module that processes question series based on a few simple heuristics that constrain the structure of the dialogue. The analysis of the results shows that the reuse of strategies for factoids is feasible, but definitions would benefit from adaptation. Regarding question series, our heuristics have good coverage, but we should find alternatives to avoid chaining errors from previous questions.</p>
      </abstract>
      <kwd-group>
        <kwd>H.3 [Information Storage and Retrieval]</kwd>
        <kwd>H.3.1 Content Analysis and Indexing</kwd>
        <kwd>H.3.3 Information Search and Retrieval</kwd>
        <kwd>H.3.4 Systems and Software</kwd>
        <kwd>H.3.7 Digital Libraries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        QA@CLEF 2007 has introduced two innovations over last year's evaluation. The first change consists
of topic-related questions: a series of ordered questions about a common topic that simulates
the dialogue between a user and the system to obtain information related to the topic. In this
dialogue the user may introduce anaphoric expressions to refer to mentioned entities or events
that appear in previous answers or questions. The second innovation is the inclusion of
the November 2006 dump of Wikipedia [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] as a source of answers in addition to the classic
newspaper collections.
      </p>
      <p>
        MIRACLE submitted a run for the Spanish monolingual task using a system that was enhanced
to answer questions from the Wikipedia and EFE collections. The results of each subsystem were combined
into a unified ranked list. The system also included a module that handles topic-related questions:
it identifies the topic and uses it to enhance the representation of the following questions. The
basic QA system was based on the architecture of our last year's submission [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] although almost
all components have evolved since then. The system is based on the use of filters for semantic
information and is therefore tailored to factual questions. Last year we also tried to improve it
for temporally restricted questions. An additional requirement has been to develop a fast system
that could be competitive in real time with a classic information retrieval system.
      </p>
      <p>This paper is structured as follows: the next section describes the system architecture with special
attention to the new modules. Section 3 presents the results and a preliminary analysis of the
kinds of errors that the system made. Conclusions and directions for future work to address the main
problems follow in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>System Overview</title>
      <p>
        The architecture of the system used this year is presented in Figure 1. It is composed of two
streams, one for each of the sources, similar to [
        <xref ref-type="bibr" rid="ref3 ref4">4, 3</xref>
        ]. The first stream uses the EFE newswire collection
as a source of answers while the second uses Wikipedia. Each stream produces a ranked list of
answers that are merged and combined by the Answer Source Mixer component described below.
The two QA streams share a similar basic pipeline architecture and each works as an independent QA
system, with different configuration parameters, collections, etc. Question analysis is performed
in the same way for both streams and therefore, when the two streams are composed, this
module is shared; a sketch of this composition follows this paragraph. Another new common module
complements question analysis by managing context and anaphora resolution in topic-related question series.
      </p>
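      <p>A minimal sketch of the two-stream composition described above (our reconstruction for illustration; the function names are assumptions, not the actual component names in the code):</p>
      <preformat>
# Python sketch of the two-stream composition (hypothetical names).
def answer(question_text, analyze, efe_stream, wiki_stream, mix):
    """analyze: shared question analysis; *_stream: per-collection QA pipelines;
    mix: the Answer Source Mixer that merges the two ranked answer lists."""
    question = analyze(question_text)      # shared between both streams
    efe_answers = efe_stream(question)     # ranked answers from the EFE newswire
    wiki_answers = wiki_stream(question)   # ranked answers from Wikipedia
    return mix(efe_answers, wiki_answers, question)
      </preformat>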
      <sec id="sec-2-1">
        <title>Basic system architecture</title>
        <p>
          The basic system follows the classic pipeline architecture used by many other QA systems [
          <xref ref-type="bibr" rid="ref11 ref8">8, 11</xref>
          ].
The different operations are split between those that are performed online and offline.
        </p>
        <sec id="sec-2-1-1">
          <title>Offline operations</title>
          <p>These operations are performed during the preparation of the collection to speed up online
document retrieval and answer extraction.</p>
          <p>
            • Collection indexing. Collections are indexed at the word level using Lucene [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. They are processed to extract the text that will be indexed. In the case of Wikipedia we remove formatting and links, and therefore do not use this information for relevance. Documents are indexed after tokenization, stopword removal and stemming based on Snowball [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. A sketch of this normalization follows this list.
            • Collection processing. In order to speed up the process of answer extraction we can enable the use of preprocessed collections. In this case, collections are analyzed using the output of language tools or services. For Spanish we use the DAEDALUS STILUS [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] analyzers, which provide tokenization, sentence detection and token analysis. Token analysis includes detailed part-of-speech analysis, lemmatization and semantic information. STILUS has been improved since last year to support part-of-speech tagging. Regarding the semantic information, we have used STILUS Named Entity tagging, which is based on linguistic resources organized in a classification inspired by Sekine's typology [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]. The processed collection is stored on disk. Without compression the collection is about 10 times larger than the corresponding original.
          </p>
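          <p>A minimal sketch of the indexing-time normalization described above (our reconstruction for illustration; the actual system performs this inside Lucene, and the stopword list and function name are assumptions):</p>
          <preformat>
# Python sketch of indexing-time normalization (hypothetical; the system itself uses Lucene).
from nltk.stem.snowball import SnowballStemmer

SPANISH_STOPWORDS = {"el", "la", "los", "las", "de", "del", "en", "y", "que", "se"}  # abridged
stemmer = SnowballStemmer("spanish")

def normalize_for_index(text):
    """Tokenize, remove stopwords and stem, mirroring the preparation before indexing."""
    tokens = [t.lower() for t in text.split() if t.isalpha()]
    return [stemmer.stem(t) for t in tokens if t not in SPANISH_STOPWORDS]

# Prints the stemmed content words of the sentence.
print(normalize_for_index("Las organizaciones fundadas en Madrid"))
          </preformat>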
        </sec>
        <sec id="sec-2-1-2">
          <title>Online operations</title>
          <p>
            Online operations have been described in detail in previous participations [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. We outline the three main modules here and remark only on the main changes.
            • Question analysis. Questions are analyzed using STILUS online and the module produces a custom representation of the question that includes focus, theme, relevant and query terms, question type and expected answer type (EAT).
            • Passage retrieval. We have changed this module to use Lucene as the information retrieval engine. The question representation is used to build a query using Lucene syntax and relevant documents are retrieved. Lucene uses a vector model for document representation and cosine similarity for ranking. Documents are analyzed and only sentences that contain a number of relevant terms are selected for the next step.
            • Answer selection. This module uses the question class and the expected answer type to select an appropriate filter to locate candidate answers. Extracted candidates are grouped if they are similar and ranked using a score that combines redundancy, sentence score and document score. A sketch of this ranking follows this list.
          </p>
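          <p>A minimal sketch of the candidate grouping and ranking described above (our reconstruction; the grouping criterion and the weights are illustrative assumptions, not the tuned values used in the run):</p>
          <preformat>
# Python sketch of answer selection: group similar candidates, then rank them
# by a combination of redundancy, sentence score and document score.
from collections import defaultdict

def rank_candidates(candidates, w_red=1.0, w_sent=0.5, w_doc=0.5):
    """candidates: list of (answer_string, sentence_score, document_score) tuples."""
    groups = defaultdict(list)
    for answer, s_score, d_score in candidates:
        groups[answer.lower().strip()].append((answer, s_score, d_score))
    ranked = []
    for members in groups.values():
        redundancy = len(members)                    # how many times the answer was extracted
        best_sent = max(s for _, s, _ in members)    # best supporting sentence score
        best_doc = max(d for _, _, d in members)     # best supporting document score
        score = w_red * redundancy + w_sent * best_sent + w_doc * best_doc
        ranked.append((score, members[0][0]))
    return [answer for _, answer in sorted(ranked, reverse=True)]
          </preformat>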
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>New modules</title>
        <sec id="sec-2-2-1">
          <title>Group topic identification in question series</title>
          <p>
            With the inclusion of topic-related questions, the system needed a method to resolve referential expressions that appear between questions and answers in the same series. The system processes the first question and generates a set of candidates containing the topic, the focus and the future answer. We have implemented a few rules that we believed cover the most common cases to select the best topic for the whole group. The rules use the information available after question analysis and simplified assumptions about the syntactic structure of the questions. The rules to locate the topic for the question series are applied only to the first question (a code sketch of these rules follows this list):
            • Answers of subtype NUMEX (numbers and quantities) are ignored as topics for question series. We believe it is improbable for a number, if it does not represent any other type of entity, to be a natural topic. We use the subject, the topic of this question, as the topic of the question series.
            • Answers of subtype TIMEX (dates, years, etc.) are also ignored as topics, but in this case it is because we believed that they fall outside the guidelines for this year. For now we use the same rule as for NUMEX. Using a temporal expression as a topic is very natural; in fact, most temporally restricted questions, and especially those that have an event as a restriction, are naturally split into two questions. This strategy has already been used by [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] to answer this kind of question. We would need to recognize referential expressions like ese año (that year), durante ese período (during that period), etc. and resolve the reference correctly in a second step.
            • A question asking for a definition, like ¿Quién es George Bush? (Who is George Bush?), will add the topic (the Named Entity) and the answer (presidente de los Estados Unidos) to the group topic. We have a similar case with questions like ¿Quién es el presidente de los Estados Unidos? (Who is the president of the United States?).
            • A question following the pattern ¿Qué NP ... ? (What NP ... ?), like ¿Qué organización se fundó en 1995? (Which organization was created in 1995?). In these cases the noun group that follows the interrogative article is the focus of the question. Both the answer and the focus are added to the group topic. We should remark that this case is different from a question beginning with a preposition, like ¿En qué lugar... ? (In which place...?).
            • For the rest of the classes we use the answer as the topic for the rest of the group.
          </p>
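          <p>A minimal sketch of the topic-selection rules described above (our reconstruction; the field names of the question representation are assumptions for illustration):</p>
          <preformat>
# Python sketch of the group-topic heuristics (hypothetical field names).
def select_group_topic(question):
    """question: dict with fields produced by question analysis (assumed names):
    'eat' (expected answer type), 'subject', 'focus', 'answer',
    'is_definition' and 'starts_with_que_np'."""
    if question["eat"] in ("NUMEX", "TIMEX"):
        # Rules 1-2: numbers and dates are ignored as group topics;
        # fall back to the subject of the first question.
        return [question["subject"]]
    if question["is_definition"]:
        # Rule 3: "¿Quién es George Bush?" -> add the entity and its definition (the answer).
        return [question["subject"], question["answer"]]
    if question["starts_with_que_np"]:
        # Rule 4: "¿Qué organización se fundó en 1995?" -> focus plus answer.
        return [question["focus"], question["answer"]]
    # Rule 5: default to the answer of the first question.
    return [question["answer"]]
          </preformat>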
          <p>Once the topic for the group is identified, the rest of the questions use it as an additional
relevant term in order to locate documents and filter relevant sentences. This is obviously a
problem when the topic is the answer to the first question but the system was not able to find the right one.</p>
          <p>These rules are based on the structure of the information-seeking dialogue and on how we introduce
new topics in conversation. Our rules select the most promising candidate using the first question
and ignoring the rest. Nevertheless, it is possible to shift the topic by introducing a longer referential
expression like the examples mentioned for TIMEX. We plan to investigate how to modify our
procedure to work in two steps, generating a ranked list of candidates and selecting the best
candidate depending on constraints expressed in the referential expression.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Combining EFE and Wikipedia answers</title>
          <p>As already mentioned, this year there are two possible collections to find the right answer to a
question, one is Wikipedia and the other is the EFE newswire collection for years 1994 and 1995.
This means that there should be some automatic method to decide which of these sources should
be more relevant to extract candidate answers to a given question. This is the role of the Source
Mixer component, based on very simple heuristics:
• If the verb of the question appears in the present tense, preference is given to answers appearing
in the Wikipedia collection.
• If the verb is in the past tense and the question makes reference to the period covered by the
EFE news collection, i.e., the years 1994 or 1995 appear in the question, then preference is
given to answers from the EFE news collection.</p>
          <p>The preference given to each answer is measured by two parameters, one referring to the verb
tense factor and the other referring to the time period factor. In this way, no answer is really
dropped from the candidate list; rather, the list is reordered according to these clues. At the output
of the Source Mixer component there is a list of candidates ordered according to their source and
the information present in the question. A sketch of this reordering follows.</p>
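          <p>A minimal sketch of the Source Mixer reordering (our reconstruction; the factor values and field names are assumptions, not the tuned parameters of the system):</p>
          <preformat>
# Python sketch of source mixing by verb tense and time period (hypothetical factors).
TENSE_FACTOR = 1.2    # boost applied when the verb tense matches the source
PERIOD_FACTOR = 1.2   # extra boost when the question mentions 1994 or 1995

def mix_sources(candidates, question):
    """candidates: list of dicts with 'score' and 'source' ('wiki' or 'efe')."""
    mentions_period = any(y in question["text"] for y in ("1994", "1995"))
    for c in candidates:
        if question["tense"] == "present" and c["source"] == "wiki":
            c["score"] *= TENSE_FACTOR
        elif question["tense"] == "past" and c["source"] == "efe" and mentions_period:
            c["score"] *= TENSE_FACTOR * PERIOD_FACTOR
    return sorted(candidates, key=lambda c: c["score"], reverse=True)
          </preformat>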
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Run description</title>
        <p>Using the system described above we have submitted one monolingual run for the Spanish subtask
of this edition. Evaluation results and combined measures are outlined in Tables 1 and 2.</p>
        <p>Results are disappointing in general, as they are lower than previous years' results for almost
all types, despite the inclusion of new sources like Wikipedia. Even though almost all modules
have been improved, the overall accuracy is not improving. We believe that either the question
set is more difficult or the system was not tuned as much as needed. We do not show results for
the ten list questions because we did not implement a specific strategy for them. The case of definitions is
analyzed in more detail below. We believe that question series introduce additional complexity
in the task and this is reflected in the results. If we ignore question series we obtain an accuracy
of around 18% for the rest of the questions, which supports this idea but also reflects that even the base
results are not good.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Analysis of errors</title>
        <p>We are analysing our results in order to estimate which parts of the system need further
improvement. For the moment, we are only able to present a preliminary analysis of the contents of our
submission, which uses one answer per question with its supporting sentence and document. The
document returned is correct in 44% of the results, which gives a lower bound on the
performance of the document retrieval system; it also includes errors caused by an incorrect selection
of a document collection. Given a correct document, 31% of the sentences selected with the first
answer do not really contain a correct answer. Even if the sentence is correct, the error rate in selecting
a correct string is about 47%. This kind of error accumulates incorrect selection of a candidate,
incorrect extraction and incorrect identification of the expected answer type. Finally, at least 8
questions received an unsupported answer, which signals that in those cases where the same
answer string has been extracted from several sentences, the score used to select the most representative
answer is not working as well as expected. A more detailed analysis isolating the contribution of
the main modules will be presented in the final version of this article.</p>
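        <p>As a rough consistency check (our own back-of-the-envelope computation), multiplying the stage-wise success rates gives 0.44 × (1 - 0.31) × (1 - 0.47) ≈ 0.16, which is in line with the accuracy of around 18% observed when question series are ignored.</p>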
        <p>Analyzing the behaviour of the system for different types, we have detected that the accuracy of
definition questions has dropped dramatically. We believe that the heuristics that we defined
to extract definitions in EFE do not work for Wikipedia. The system used to signal appositions,
nominal phrases before Named Entities and expansions of acronyms as valid definitions of persons
and organizations. In contrast, definitional sentences in Wikipedia are usually copulative, have
a longer distance between the defined object and a valid definition, and are usually placed
at the beginning of the document. Unfortunately, we did not implement any special strategy for
these questions. If we consider the rest of the types, the behaviour is similar to previous years:
most of the factual questions with well-defined Named Entities achieve a reasonable accuracy.
Questions with EAT OTHER are, in contrast, much harder.</p>
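        <p>To illustrate the mismatch (our reconstruction; the patterns are simplified assumptions, not the actual filters of the system), an apposition pattern tuned for newswire fails on the copulative style of Wikipedia first sentences:</p>
        <preformat>
# Python sketch contrasting definition patterns in newswire vs. Wikipedia (simplified).
import re

# Newswire style: "el presidente francés, Jacques Chirac" (apposition before the entity).
APPOSITION = re.compile(r"([a-záéíóúñ ]+), ([A-Z]\w+(?: [A-Z]\w+)*)")

# Wikipedia style: "Jacques Chirac es un político francés ..." (copulative sentence).
COPULATIVE = re.compile(r"([A-Z]\w+(?: [A-Z]\w+)*) (?:es|fue) (.+)")

print(bool(APPOSITION.search("el presidente francés, Jacques Chirac")))  # True: apposition found
print(bool(APPOSITION.search("Jacques Chirac es un político francés")))  # False: no apposition
print(bool(COPULATIVE.search("Jacques Chirac es un político francés")))  # True: copulative match
        </preformat>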
        <p>There were 20 topic-related groups of questions, making a total of 50 questions. The system
correctly answered 5 questions of this kind, from three different groups. This behaviour was
expected as errors usually chain, especially when the answer to the first question is not
correct and happens to be the topic. We have analyzed the main sources of errors in order to
evaluate the coreference component. Rules for topic detection are the source of errors in only
three of the cases, and therefore seem to work reasonably well, although there is room for some
simple improvements. In one of these cases, the error is due to a question not being correctly identified
as a NUMEX type. The rest of the errors are due to incorrect identification of the first answer
and the chaining of errors.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>We have presented the architecture of the MIRACLE QA system and the main modifications
introduced to cope with the new challenges in the QA@CLEF task. Using this system we have
submitted one run for the monolingual Spanish subtask, where results were lower than expected.</p>
      <p>
        The analysis of the performance across types has signalled that for some types of questions
the style of the text is an important issue. This is especially acute in the case of definitions.
We have employed the same subsystem and strategies for the EFE and the Wikipedia collections,
with disappointing results. The analysis of the errors has shown that the methods for document
retrieval and candidate extraction could be adapted to improve accuracy. We plan to use type-specific
approaches like the ones used by [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ] and collection-specific approaches to improve results
for these types.
      </p>
      <p>We have found that the module for coreference resolution is effective even though it uses a limited
amount of knowledge. In contrast, the greatest contribution of errors is due to the low accuracy on
the first answer. This is even more acute since a great deal of the questions that set the initial topic
are definitional. Besides type-specific approaches, we need to improve the coreference method to
consider more than one candidate answer and cope with uncertainty.</p>
      <p>
        Another question that we have not yet evaluated thoroughly is the way we combine results. We
use a semantic kind of combination that exploits the different time spans of the two collections.
It is still unclear whether this method is appropriate or whether techniques adapted from information
retrieval over heterogeneous collections as in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] would work better.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>Lucene webpage</article-title>
          . http://lucene.apache.org/,
          <year>August 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>Rita M.</given-names> <surname>Aceves-Pérez</surname></string-name>
          , Manuel Montes-y-Gómez, and Luis Villaseñor-Pineda.
          <article-title>Fusión de respuestas en la búsqueda de respuestas multilingüe</article-title>
          .
          <source>SEPLN, Sociedad Española para el Procesamiento del Lenguaje Natural</source>
          ,
          <volume>38</volume>
          :
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>David</given-names> <surname>Ahn</surname></string-name>
          , Valentin Jijkoun, Karin Müller, Maarten de Rijke, Stefan Schlobach, and Gilad Mishne.
          <article-title>Making Stone Soup: Evaluating a Recall-Oriented Multi-stream Question Answering System for Dutch</article-title>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Jennifer</given-names> <surname>Chu-Carroll</surname></string-name>
          , John M. Prager, Christopher A. Welty, Krzysztof Czuba, and David A. Ferrucci.
          <article-title>A multi-strategy and multi-source approach to question answering</article-title>
          . In
          <source>TREC</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>DAEDALUS.</surname>
          </string-name>
          <article-title>STILUS website</article-title>
          . Online: http://www.daedalus.es,
          <year>July 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          César de Pablo-Sánchez, Ana González-Ledesma, Antonio Moreno-Sandoval, and María Teresa Vicente-Díez.
          <article-title>MIRACLE experiments in QA@CLEF 2006 in Spanish: main task, real-time QA and exploratory QA using Wikipedia (WiQA)</article-title>
          .
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>Abdessamad</given-names> <surname>Echihabi</surname></string-name>
          , Eduard Hovy, and Michael Fleischman.
          <article-title>Offline strategies for online question answering: answering questions before they are asked</article-title>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>Dan I.</given-names> <surname>Moldovan</surname></string-name>
          , Sanda M. Harabagiu, Marius Pasca, Rada Mihalcea, Roxana Girju, Richard Goodrum, and Vasile Rus.
          <article-title>The structure and performance of an open-domain question answering system</article-title>
          . In
          <source>ACL</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>Manuel</given-names> <surname>Montes-y-Gómez</surname></string-name>
          , Luis Villaseñor Pineda, Manuel Pérez-Coutiño, José Manuel Gómez-Soriano, Emilio Sanchís-Arnal, and Paolo Rosso.
          <article-title>A full data-driven system for multiple language question answering</article-title>
          . In
          <source>Accessing Multilingual Information Repositories</source>
          , pages
          <fpage>420</fpage>
          -
          <lpage>428</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Porter</surname>
          </string-name>
          .
          <article-title>Snowball stemmers and resources website</article-title>
          . http://www.snowball.tartarus.org,
          <year>July 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>John</surname>
            <given-names>Prager</given-names>
          </string-name>
          , Eric Brown, Anni Coden, and
          <string-name>
            <given-names>Dragomir</given-names>
            <surname>Radev</surname>
          </string-name>
          .
          <article-title>Question-answering by predictive annotation</article-title>
          .
          <source>In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Question Answering</source>
          , pages
          <fpage>184</fpage>
          -
          <lpage>191</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>E.</given-names> <surname>Saquete</surname></string-name>
          ,
          <string-name><given-names>J.L.</given-names> <surname>Vicedo</surname></string-name>
          ,
          <string-name><given-names>P.</given-names> <surname>Martínez-Barco</surname></string-name>
          , R. Muñoz, and
          <string-name><given-names>F.</given-names> <surname>Llopis</surname></string-name>
          .
          <article-title>Evaluation of complex temporal questions in CLEF-QA</article-title>
          . In
          <source>Multilingual Information Access for Text, Speech and Images</source>
          , pages
          <fpage>591</fpage>
          -
          <lpage>596</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Satoshi</given-names>
            <surname>Sekine</surname>
          </string-name>
          .
          <article-title>Sekine's extended named entity hierarchy</article-title>
          . Online: http://nlp.cs.nyu.edu/ene/,
          <year>August 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <article-title>November 2006 dump of Wikipedia</article-title>
          . http://download.wikimedia.org/images/archive/eswiki/20061202/pages-articles.xml.bz2
          ,
          <year>July 2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>