=Paper=
{{Paper
|id=Vol-1169/CLEF2003wn-QACLEF-VicedoEt2003
|storemode=property
|title=Question Answering in Spanish
|pdfUrl=https://ceur-ws.org/Vol-1169/CLEF2003wn-QACLEF-VicedoEt2003.pdf
|volume=Vol-1169
|dblpUrl=https://dblp.org/rec/conf/clef/GonzalezILM03a
}}
==Question Answering in Spanish==
Jose L. Vicedo, Ruben Izquierdo, Fernando Llopis, and Rafael Muñoz
Departamento de Lenguajes y Sistemas Informáticos, University of Alicante, Spain
{vicedo, rib1, llopis, rafael}@dlsi.ua.es

Abstract. This paper describes the architecture, operation and results obtained with the Question Answering prototype for Spanish developed in the Department of Language Processing and Information Systems at the University of Alicante for the CLEF-2003 Spanish monolingual QA evaluation task. Our system has been fully developed from scratch and combines shallow natural language processing tools with statistical data redundancy techniques. The system is able to perform QA independently from static corpora or from Web documents. Moreover, the World Wide Web can be used as an external resource to obtain evidence that supports and complements the CLEF Spanish corpora.

===1 Introduction===

Open-domain QA systems are defined as tools capable of extracting the answer to user queries directly from unrestricted-domain documents. Research in question answering has traditionally focused on the English language, mainly fostered by the TREC (Text REtrieval Conference) evaluations. However, the development of QA systems for languages other than English was identified by the QA Roadmap Committee as one of the main lines of future research in this field [2]. Moreover, the Committee considered it essential to obtain systems that perform QA from sources of information written in different languages. As a result of this interest, the Cross-Language Evaluation Forum (CLEF 2003, http://clef-qa.itc.it/) has organised a new task (Multiple Language Question Answering) aimed at the evaluation of QA systems in several languages. This evaluation comprises several subtasks: monolingual Spanish, Italian and Dutch QA, and bilingual QA. The bilingual subtask is designed to measure a system's ability to find answers in a collection of English texts when questions are posed in Spanish, Italian, Dutch, German or French.

The main characteristics of this first evaluation are similar to those of past TREC conferences. For each subtask, the organisation provides 200 questions requiring short, factual answers, whose answer is not guaranteed to occur in the document collection. Systems should return up to three responses per question, ordered by confidence. Each response has to be associated with the document it is found in. A response can be either an [answer-string, docid] pair or the string "NIL" when the system does not find a correct answer in the document collection. The "NIL" string is considered correct if no answer is known to exist in the document collection; otherwise it is judged as incorrect. Two different kinds of answers are accepted: the exact answer, or a 50-byte-long string that should contain the exact answer.

Our participation has been restricted to the Spanish monolingual task in the exact-answer category. Although we have experience from past TREC competitions [4-6], we decided to build a new system, mainly because of the large differences between the English and Spanish languages. Moreover, we designed a very simple approach (one person-month of effort) that will facilitate later error analysis and will allow us to detect the basic language-dependent characteristics that make Spanish QA different from English QA.

This paper is organised as follows: Section 2 describes the structure and operation of our Spanish QA system. Afterwards, we present and analyse the results obtained in the CLEF QA Spanish monolingual task. Finally, we draw initial conclusions and discuss directions for future work.
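As an aside on the evaluation protocol described in this introduction, the sketch below shows one plausible way to represent a system's ranked responses and to compute the mean reciprocal rank (MRR) figures reported in Section 3. The data structures and field names are illustrative assumptions; the official CLEF run format and judgment files are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Response:
    """One of the up to three responses returned per question.
    A NIL response carries no answer string and no document id."""
    answer: Optional[str]   # exact answer string, or None for "NIL"
    docid: Optional[str]    # id of the supporting document, or None for "NIL"

def reciprocal_rank(responses: List[Response],
                    is_correct: Callable[[Response], bool]) -> float:
    """1 / rank of the first correct response, 0.0 if none is correct."""
    for rank, resp in enumerate(responses, start=1):
        if is_correct(resp):
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(run: Dict[str, List[Response]],
                         judge: Callable[[str, Response], bool]) -> float:
    """MRR over all questions; responses are assumed ordered by confidence."""
    if not run:
        return 0.0
    total = 0.0
    for qid, responses in run.items():
        total += reciprocal_rank(responses, lambda r, q=qid: judge(q, r))
    return total / len(run)
```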
===2 System Description===

Our QA system is structured into the three main modules of a general QA system architecture:

1. Question analysis.
2. Passage retrieval.
3. Answer extraction.

Question analysis is the first stage in the QA process. This module processes the questions formulated to the system in order to detect and extract the useful information they contain. This information is represented in a form that can be easily processed by the remaining modules. The passage retrieval module accomplishes a first selection of relevant passages. This process is carried out in parallel, retrieving relevant passages from the Spanish EFE document collection and from the Spanish pages on the World Wide Web. Finally, the answer selection module processes the relevant passages in order to locate and extract the final answer. Figure 1 shows the system architecture.

Fig. 1. System architecture: question analysis, parallel passage retrieval (IR-n over the EFE collection and Google over the Web), and answer extraction.

===2.1 Question analysis===

The question analysis module carries out two main processes: answer type classification and keyword selection. The former detects the type of information that the question expects as an answer (a date, a quantity, etc.) and the latter selects those question terms (keywords) that will allow locating the documents that are likely to contain the answer.

These processes are performed using a simple, manually developed set of lexical patterns. Each pattern is associated with its corresponding expected answer type. Once a pattern matches the question posed to the system, this process returns both the list of keywords associated with the question and the expected answer type associated with the matched pattern. As our system lacks a named-entity tagger, it currently only copes with three possible answer types: NUMBER, DATE and OTHER. Figure 2 shows examples of the patterns and the output generated at the question analysis stage for test questions 002, 006 and 103 (a simplified sketch of this process follows Figure 2).

Question 002: ¿Qué país invadió Kuwait en 1990?
  Pattern: (qué|Qué)\s+([a-z|áéíóúñ]+)
  Answer type: OTHER
  Keywords: país invadió Kuwait 1990
  Lemmas: país invadir Kuwait 1990

Question 006: ¿Cuándo decidió Naciones Unidas imponer el embargo sobre Irak?
  Pattern: (cuándo|Cuándo)\s+
  Answer type: DATE
  Keywords: decidió Naciones Unidas imponer embargo Irak
  Lemmas: decidir Naciones Unidas imponer embargo Irak

Question 103: ¿De cuántas muertes son responsables los Jemeres Rojos?
  Pattern: (Cuántos|cuántos|Cuántas|cuántas)\s+([a-z|áéíóúñ]+)
  Answer type: NUMBER
  Keywords: muertes responsables Jemeres Rojos
  Lemmas: muerte responsable Jemeres Rojos

Fig. 2. Question analysis example
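A minimal sketch of this pattern-based question analysis is shown below. The patterns are those of Figure 2; the stopword list and the fallback behaviour when no pattern matches are illustrative assumptions, and the MACO lemmatisation used by the real system is omitted.

```python
import re

# Expected-answer-type patterns, as in Figure 2 (a small excerpt).
PATTERNS = [
    (re.compile(r"(cuándo|Cuándo)\s+"), "DATE"),
    (re.compile(r"(Cuántos|cuántos|Cuántas|cuántas)\s+([a-z|áéíóúñ]+)"), "NUMBER"),
    (re.compile(r"(qué|Qué)\s+([a-z|áéíóúñ]+)"), "OTHER"),
]

# Illustrative Spanish stopword list (the actual list is not given in the paper).
STOPWORDS = {"de", "el", "la", "los", "las", "en", "son", "sobre", "un", "una",
             "qué", "cuándo", "cuántas", "cuántos", "por", "del", "al", "y"}

def analyse_question(question: str):
    """Return (expected answer type, keyword list) for a question."""
    answer_type = "OTHER"                 # fallback when no pattern matches
    for pattern, atype in PATTERNS:
        if pattern.search(question):
            answer_type = atype
            break
    tokens = re.findall(r"[\wáéíóúñÁÉÍÓÚÑüÜ]+", question)
    keywords = [t for t in tokens if t.lower() not in STOPWORDS]
    return answer_type, keywords

# Example, question 103:
# -> ('NUMBER', ['muertes', 'responsables', 'Jemeres', 'Rojos'])
print(analyse_question("¿De cuántas muertes son responsables los Jemeres Rojos?"))
```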
===2.2 Passage retrieval===

The passage retrieval stage is carried out in parallel using two different search engines: IR-n [3] and Google (http://www.google.com/).

IR-n is a passage retrieval system that uses groups of contiguous sentences as its unit of information. From a QA perspective, this passage extraction model allows us to benefit from the advantages of discourse-based passage retrieval models, since self-contained units of text, such as sentences, are used for building the passages. First, IR-n performs passage retrieval over the entire Spanish EFE document collection. In this case, the keywords detected at the question analysis stage are processed with the MACO Spanish lemmatiser [1] and their corresponding lemmas are used for retrieving the 50 most relevant passages from the EFE document database. These passages are made up of text snippets two sentences long (a simplified sketch of this passage model is given after Figure 3). Second, the same keyword list (without lemmatisation) is posed to the Google Internet search engine. Relevant documents are not downloaded: for efficiency reasons, the system only selects the 50 best short summaries returned on Google's main retrieval pages. Figure 3 shows examples of retrieved passages for question 103; the question keywords found in the relevant passages are underlined in the original figure.

Question 103: ¿De cuántas muertes son responsables los Jemeres Rojos?

First retrieved passage from the EFE collection (EFE19940913-06889):
"... explotan los Jemeres Rojos, quienes no les preocupa que sus ideas no sean respetadas por la comunidad internacional, que los acusa de ser los responsables de la muerte de más de un millón de camboyanos durante el genocidio de 1975-1978."

First retrieved passage from the World Wide Web (Google, rank 1):
"Los Jemeres Rojos fueron responsables de más de un millón de muertes, mataron al menos a 20.000 presos políticos y torturaron a cientos de miles de personas."

Fig. 3. Passages retrieved for question 103
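The sketch below illustrates the two-sentence passage model on the EFE side only: documents are split into overlapping windows of two consecutive sentences and ranked by how many question lemmas they contain. This is a deliberately simplified stand-in for IR-n, whose actual ranking model is described in [3]; the sentence splitter and the plain overlap score are assumptions, and the Google side is omitted.

```python
import re
from typing import Dict, List, Tuple

def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (IR-n uses its own sentence detection)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def two_sentence_passages(text: str) -> List[str]:
    """Overlapping passages made of two consecutive sentences."""
    sents = split_sentences(text)
    if len(sents) < 2:
        return sents
    return [" ".join(sents[i:i + 2]) for i in range(len(sents) - 1)]

def retrieve_passages(docs: Dict[str, str], lemmas: List[str],
                      top_n: int = 50) -> List[Tuple[int, str, str]]:
    """Rank every two-sentence passage by question-lemma overlap.

    Returns (score, docid, passage) tuples, best first. A real index would
    lemmatise the documents as well; here lemmas are matched against surface
    tokens for simplicity.
    """
    wanted = {l.lower() for l in lemmas}
    scored = []
    for docid, text in docs.items():
        for passage in two_sentence_passages(text):
            tokens = {t.lower() for t in re.findall(r"\w+", passage)}
            overlap = len(wanted & tokens)   # simple overlap, not IR-n's weighting
            if overlap > 0:
                scored.append((overlap, docid, passage))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]
```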
===2.3 Answer extraction===

This module processes both sets of passages selected at the passage retrieval stage (IR-n and Google) in order to detect and extract the three most probable answers to the query. The processes involved at this stage are the following (a simplified sketch combining these steps is given after the list):

1. Relevant sentence selection. Sentences in relevant passages are selected and scored.
   (a) Passages are split into sentences.
   (b) Each sentence is scored according to the number of question keywords it contains. Keywords appearing twice or more are only counted once. This value (sentence score) measures the similarity between each relevant sentence and the question.
   (c) Sentences that do not contain any keyword are discarded (sentence score = 0).

2. Candidate answer selection. Candidate answers are selected from relevant sentences.
   (a) Relevant sentences are tagged using the MACO lemmatiser.
   (b) Quantities, dates and proper-noun sequences are detected and merged into single expressions.
   (c) Every term or merged expression in a relevant sentence is considered a candidate answer.
   (d) Candidate answers are filtered. This process discards those candidates that start or finish with a stopword, or that contain a question keyword.
   (e) From the remaining candidate set, only those whose semantic type matches the expected answer type are selected. When the expected answer type is OTHER, only proper-noun phrases are selected as final candidate answers. Figure 3 shows (in boldface in the original figure) the selected answer candidates for question 103.

3. Candidate answer combination. Each answer candidate is assigned a score that measures its probability of being the correct answer (answer frequency). As the same candidate answer can be found in different relevant sentences, the candidate answer set may contain repeated elements. Our system exploits this fact by relating candidate redundancy to answer correctness as follows:
   (a) Repeated candidate answers are merged into a single expression that is scored according to the number of times this candidate appears in the candidate answer set.
   (b) Shorter expressions are preferred as answers over longer ones. Thus, terms in long candidates that appear themselves as answer candidates boost the shorter candidates' scores: the long candidate's score is added to the frequency value obtained by the shorter one.

4. Web evidence addition. All previous processes may optionally be performed in parallel to retrieve answers from web documents. At this point the system therefore has two lists of candidate answers: one obtained from the EFE document set and another from the available Spanish web documents. If web retrieval has been activated, the candidate answer lists are merged. This process consists of increasing the answer frequency of the EFE-list candidates by adding the corresponding frequency values obtained on the web list; candidates appearing only in the web list are discarded.

5. Final answer selection. Answer candidates from the previous steps are given a final score (answer score) that reflects two factors: (1) their redundancy throughout the answer extraction process (answer frequency) and (2) the context they were found in (sentence score). As the same candidate answer may be found in different contexts, an answer keeps the maximum sentence score over all the contexts it appears in. The final answer score is computed as follows:

   answer score = sentence score × answer frequency    (1)

Answers are then ranked according to their answer score and the first three are selected for presentation. Among the candidate answers for question 103 (example in Figure 3), the system selects "un millón" (one million) as the final answer.
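The following sketch combines steps 1-5 above into one simplified pipeline. The regular expressions used to spot quantities, dates and proper-noun sequences, the stopword list and the merging details are illustrative assumptions (the real system relies on MACO tagging); only the overall scoring logic follows the description above: keyword-overlap sentence scores, frequency counting with substring boosting, optional web-frequency merging, and answer score = sentence score × answer frequency.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

STOPWORDS = {"de", "el", "la", "los", "las", "en", "y", "del", "al", "que", "por"}

# Very rough candidate spotters; the real system merges MACO-tagged quantities,
# dates and proper-noun sequences (step 2b).
CANDIDATE_RE = {
    "NUMBER": re.compile(r"\b(?:\d[\d.,]*|un mill[oó]n(?:es)?)\b", re.IGNORECASE),
    "DATE": re.compile(r"\b\d{1,2}\s+de\s+\w+\s+de\s+\d{4}\b|\b\d{4}\b"),
    "OTHER": re.compile(r"\b[A-ZÁÉÍÓÚÑ][\wáéíóúñ]*(?:\s+[A-ZÁÉÍÓÚÑ][\wáéíóúñ]*)*"),
}

def sentence_score(sentence: str, keywords: List[str]) -> int:
    """Number of distinct question keywords occurring in the sentence (step 1b)."""
    tokens = {t.lower() for t in re.findall(r"\w+", sentence)}
    return len({k.lower() for k in keywords} & tokens)

def extract_candidates(sentence: str, answer_type: str,
                       keywords: List[str]) -> List[str]:
    """Spot candidates of the expected type and filter them (steps 2b-2e)."""
    kw = {k.lower() for k in keywords}
    out = []
    for m in CANDIDATE_RE[answer_type].finditer(sentence):
        cand = m.group(0).strip()
        words = cand.lower().split()
        if not words or words[0] in STOPWORDS or words[-1] in STOPWORDS:
            continue   # step 2d: starts or finishes with a stopword
        if any(w in kw for w in words):
            continue   # step 2d: contains a question keyword
        out.append(cand)
    return out

def rank_answers(passages: Iterable[str], keywords: List[str], answer_type: str,
                 web_freq: Optional[Dict[str, int]] = None) -> List[Tuple[str, int]]:
    """Return the three best (candidate, answer score) pairs (lower-cased)."""
    freq: Dict[str, int] = defaultdict(int)          # answer frequency (step 3a)
    best_context: Dict[str, int] = defaultdict(int)  # max sentence score (step 5)
    for passage in passages:
        for sent in re.split(r"(?<=[.!?])\s+", passage):
            score = sentence_score(sent, keywords)
            if score == 0:
                continue                              # step 1c: discard
            for cand in extract_candidates(sent, answer_type, keywords):
                key = cand.lower()
                freq[key] += 1
                best_context[key] = max(best_context[key], score)
    # Step 3b (simplified): shorter candidates absorb the frequency of longer
    # candidates that contain them.
    for short in list(freq):
        for long_ in freq:
            if short != long_ and short in long_:
                freq[short] += freq[long_]
    # Step 4: optional web evidence; candidates found only on the web are ignored.
    if web_freq:
        for key in freq:
            freq[key] += web_freq.get(key, 0)
    # Step 5: answer score = sentence score * answer frequency; keep the top three.
    ranked = sorted(freq, key=lambda k: best_context[k] * freq[k], reverse=True)
    return [(k, best_context[k] * freq[k]) for k in ranked[:3]]
```

Under this sketch, feeding the two passages of Figure 3 with the keywords of question 103 and the expected type NUMBER ranks "un millón" first, in line with the behaviour described above.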
===3 Results===

We submitted two runs in the exact-answer category. The first run (alicex031ms) was obtained by applying the whole system described above, while the second run performed the QA process without activating Web retrieval (alicex032ms). Table 1 shows the results obtained for each run.

Table 1. Spanish monolingual task results

                      Strict                 Lenient
  Run                 MRR      % Correct     MRR      % Correct
  alicex031ms         0.3075   40.0          0.3208   43.5
  alicex032ms         0.2966   35.0          0.3175   38.5

Result analysis may not be as conclusive as we would like, mainly because of the simplicity of our approach. Besides, not having the correct answers for the test questions at this moment prevents us from performing a proper error analysis. In any case, the results obtained show that using the World Wide Web as an external resource increases the percentage of correct answers retrieved by five points. This fact confirms that the performance of QA systems for languages other than English can also benefit from this resource.

===4 Future work===

This work has to be seen as a first and simple attempt to perform QA in Spanish. Consequently, there are several areas of future work to be investigated. Among them, we highlight the following:

- Question analysis. Since the same question can be formulated in very diverse forms (interrogative, affirmative, using different words and structures, ...), we need to study aspects such as recognising equivalent questions regardless of the speech act, the words, the syntactic and semantic inter-relations or the idiomatic forms employed.

- Answer taxonomy. An important part of the question interpretation process resides in the system's ability to relate questions with the characteristics of their respective answers. Consequently, we need to develop a broad answer taxonomy that enables multilingual answer type classification, probably using the EuroWordNet (http://www.dcs.shef.ac.uk/nlp/funded/eurowordnet.html) semantic net structure.

- Passage retrieval. An enhanced question analysis will improve passage retrieval performance by including question expansion techniques that enable retrieving passages containing relevant information expressed with terms that are different from (but equivalent to) those used in the question formulation.

- Answer extraction. Integrating named-entity taggers. Using a broad answer taxonomy involves using tools capable of identifying the entity that a question expects as an answer. Therefore we need to integrate named-entity tagging capabilities that allow narrowing down the number of candidates to be considered for answering a question.

Even though all these lines need to be investigated, it is important to remark that this research needs to be carried out from a multilingual perspective. That is, future work needs to address the detection and combination of language-dependent and language-independent modules, with the main long-term objective of developing a complete system capable of performing multilingual question answering.

===References===

1. Jordi Atserias, Josep Carmona, Irene Castellón, Sergi Cervell, Montse Civit, Lluís Màrquez, M.A. Martí, Lluís Padró, Roser Placer, Horacio Rodríguez, Mariona Taulé, and Jordi Turmo. Morphosyntactic Analysis and Parsing of Unrestricted Spanish Text. In Proceedings of the First International Conference on Language Resources and Evaluation (LREC'98), pages 1267-1272, Granada, Spain, 1998.

2. John Burger, Claire Cardie, Vinay Chaudhri, Robert Gaizauskas, Sanda Harabagiu, David Israel, Christian Jacquemin, Chin-Yew Lin, Steve Maiorano, George Miller, Dan Moldovan, Bill Ogden, John Prager, Ellen Riloff, Amit Singhal, Rohini Srihari, Tomek Strzalkowski, Ellen Voorhees, and Ralph Weischedel. Issues, Tasks and Program Structures to Roadmap Research in Question & Answering (Q&A). http://www-nlpir.nist.gov/projects/duc/papers/qa.Roadmap-paper_v2.doc, 2000.

3. Fernando Llopis, Jose L. Vicedo, and Antonio Ferrández. IR-n system, a passage retrieval system at CLEF 2001. In Workshop of the Cross-Language Evaluation Forum (CLEF 2001), Lecture Notes in Computer Science, Darmstadt, Germany, 2001. Springer-Verlag.

4. Jose Luis Vicedo and Antonio Ferrández. A semantic approach to Question Answering systems. In Ninth Text REtrieval Conference, volume 500-249 of NIST Special Publication, pages 511-516, Gaithersburg, USA, November 2000. National Institute of Standards and Technology.

5. Jose Luis Vicedo, Antonio Ferrández, and Fernando Llopis. University of Alicante at TREC-10. In Tenth Text REtrieval Conference, volume 500-250 of NIST Special Publication, Gaithersburg, USA, November 2001. National Institute of Standards and Technology.

6. Jose Luis Vicedo, Fernando Llopis, and Antonio Ferrández. University of Alicante Experiments at TREC-2002. In Eleventh Text REtrieval Conference, volume 500-251 of NIST Special Publication, Gaithersburg, USA, November 2002. National Institute of Standards and Technology.