                     FIDJI @ ResPubliQA'10


                      Xavier Tannier, Véronique Moriceau


                                  LIMSI-CNRS
                          Univ. Paris-Sud, Orsay, France
                         xtannier, moriceau@limsi.fr



      Abstract. In this paper, we present the results obtained by the system
      FIDJI for both French and English monolingual evaluations at the
      ResPubliQA 2010 campaign. In this campaign, we focused on carrying on
      our evaluations concerning the contribution of our syntactic modules
      with this specific collection.




1    Introduction


FIDJI (Finding In Documents Justifications and Inferences) is an open-domain
question-answering (QA) system for French [1] and, more recently, English. It
combines syntactic information with traditional QA techniques such as named
entity recognition and term weighting in order to validate answers through
different documents.
    This paper focuses on the results obtained by FIDJI at the ResPubliQA 2010
evaluation. It presents first a brief overview of the system and of its adaptation
to English. Then, the specific choices made for the campaign are detailed, and
some results are finally given.




2    FIDJI


Figure 1 presents the architecture of FIDJI. The system relies on a syntactic
analysis and named entity tagging of the question and of a limited number of
documents for each question. This analysis is performed by the parser XIP [2]
enriched with some additional specific rules.

                           Fig. 1. Architecture of FIDJI

    The document collection is indexed by the search engine Lucene¹. The index
contains raw text only. First, the system analyses the question and submits the
keywords of the question to Lucene (module A): the first 15 documents are then
processed (module B). We decided to reduce the number of documents because
they are rather long and their parsing would take too much time. The reason we
perform this analysis online is that we aim at avoiding as much preprocessing
as possible (the system is designed to explore Web collections [1]). Among these
documents, FIDJI looks for sentences containing the highest number of syntactic
relations of the question (module C1). Finally, answers are extracted from these
sentences (module D1) and the answer type, when specified in the question, is
validated (module E).
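
    To make this sentence-selection step more concrete, the following minimal
Python sketch ranks parsed sentences by the number of question dependency
relations they contain, treating the ANSWER placeholder as a wildcard. The
representation and function names are ours and only illustrate the idea behind
module C1; they are not FIDJI's actual code.

# Minimal sketch (our own illustration, not FIDJI's code) of module C1: rank parsed
# sentences by the number of question dependency relations they contain.
# A dependency is a tuple: (label, arg1[, arg2]); "ANSWER" acts as a wildcard.

def dep_matches(q_dep, s_dep):
    """Label and arity must agree; the ANSWER placeholder matches anything."""
    if len(q_dep) != len(s_dep) or q_dep[0] != s_dep[0]:
        return False
    return all(q == s or q == "ANSWER" for q, s in zip(q_dep[1:], s_dep[1:]))

def relation_overlap(question_deps, sentence_deps):
    """Number of question dependencies found in the sentence."""
    return sum(any(dep_matches(q, s) for s in sentence_deps) for q in question_deps)

def select_sentences(question_deps, parsed_sentences, top_n=5):
    """parsed_sentences: list of (text, set_of_dependencies) pairs."""
    ranked = sorted(parsed_sentences,
                    key=lambda pair: relation_overlap(question_deps, pair[1]),
                    reverse=True)
    return ranked[:top_n]

question = {("DATE", "1993"), ("PERSON", "ANSWER"), ("SUBJ", "se suicider", "ANSWER")}
sentences = [
    ("Pierre Bérégovoy s'est suicidé en 1993.",
     {("DATE", "1993"), ("PERSON", "Pierre Bérégovoy"),
      ("SUBJ", "se suicider", "Pierre Bérégovoy")}),
    ("Le budget de 1993 a été adopté.", {("DATE", "1993")}),
]
print(select_sentences(question, sentences, top_n=1)[0][0])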

     The main objective of FIDJI is to produce answers which are fully validated
by a supporting text (or passage) with respect to a given question. The difficulty
is that an answer (or some pieces of information composing an answer) may be
validated by several documents.

     Our approach consists in checking if all the characteristics of a question
(namely the dependency relations and the answer type) may be retrieved in one
or several documents. In this context, FIDJI has to detect syntactic implications
between questions and passages containing the answers and to validate the type
of the potential answer in this passage or in another document.


     Since the last evaluation campaign in 2009, FIDJI has been adapted to En-
glish. Specific rules have been developed for question analysis (module A) and
document processing (module B). The other modules are common to both En-
glish and French.

     The following examples illustrate how FIDJI extracts answers, and more
details concerning the system can be found in [1].


¹ http://lucene.apache.org/
2.1 Example 1
Question analysis provides lemmatisation, POS tagging and dependency rela-
tions, as well as the question type and the expected answer type. For example:


Question: Quel premier ministre s'est suicidé en 1993 ?
("Which Prime Minister committed suicide in 1993?")
Dependencies: DATE(1993)
              PERSON(ANSWER)
              SUBJ(se suicider, ANSWER)
              attribut(ANSWER, ministre)
              attribut(ministre, premier)
Question type: factoid
Expected answer type: person (specific answer type: prime minister)
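
    Purely for illustration, this analysis could be stored in a small structure
such as the following Python sketch; the class and field names are ours and do
not correspond to FIDJI's internal format.

# Our own illustrative representation of the question analysis above
# (not FIDJI's internal format).
from dataclasses import dataclass

@dataclass
class QuestionAnalysis:
    text: str
    dependencies: set          # the ANSWER placeholder marks the slot to be filled
    question_type: str         # e.g. "factoid", "definition", "complex (why)"
    expected_answer_type: str  # e.g. "person", "reason"
    specific_answer_type: str = ""

example_1 = QuestionAnalysis(
    text="Quel premier ministre s'est suicidé en 1993 ?",
    dependencies={("DATE", "1993"),
                  ("PERSON", "ANSWER"),
                  ("SUBJ", "se suicider", "ANSWER"),
                  ("attribut", "ANSWER", "ministre"),
                  ("attribut", "ministre", "premier")},
    question_type="factoid",
    expected_answer_type="person",
    specific_answer_type="prime minister",
)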


      The question is turned into a declarative sentence where the answer is rep-
resented by the `ANSWER' lemma. The following sentence is selected because
it contains the highest number of dependency relations:


Pierre Bérégovoy s'est suicidé en 1993.
(Pierre Bérégovoy committed suicide in 1993.)
Dependencies:
         DATE(1993)
         PERSON(Pierre Bérégovoy)
         SUBJ(se suicider, Pierre Bérégovoy)

      Pierre Bérégovoy instantiates the ANSWER slot of the question dependencies
and becomes a candidate answer. The named entity type (person) and the first
three dependencies of the question are validated in this sentence. In order to fully
validate the candidate answer, the system searches for the missing dependencies
(attribut(Pierre Bérégovoy, ministre) and attribut(ministre, premier)) in
a single sentence of the whole document collection. These dependencies will be
found in any sentence speaking about "le premier ministre Pierre Bérégovoy"
("Prime Minister Pierre Bérégovoy") and the answer will be validated.
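
    The following Python sketch illustrates this validation step under our own
simplified representation (it is not FIDJI's implementation): the candidate fills
the ANSWER slot, and the dependencies missing from the supporting sentence
are searched for in a single sentence of the rest of the collection.

# Our own simplified sketch of the validation step (not FIDJI's implementation).

def instantiate(dep, candidate):
    """Replace the ANSWER placeholder by the candidate answer."""
    return tuple(candidate if arg == "ANSWER" else arg for arg in dep)

def validate_candidate(question_deps, support_deps, candidate, other_sentences):
    """other_sentences: dependency sets of the other sentences of the collection."""
    needed = {instantiate(dep, candidate) for dep in question_deps}
    missing = needed - support_deps
    # The missing dependencies must all appear together in one other sentence.
    return not missing or any(missing <= deps for deps in other_sentences)

question_deps = {("SUBJ", "se suicider", "ANSWER"),
                 ("attribut", "ANSWER", "ministre"),
                 ("attribut", "ministre", "premier")}
support_deps = {("SUBJ", "se suicider", "Pierre Bérégovoy"), ("DATE", "1993")}
# A sentence such as "le premier ministre Pierre Bérégovoy ..." provides the rest:
other = {("attribut", "Pierre Bérégovoy", "ministre"),
         ("attribut", "ministre", "premier")}
print(validate_candidate(question_deps, support_deps, "Pierre Bérégovoy", [other]))  # True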

2.2 Example 2
For complex questions, it is obvious that answers are not always short phrases.
For this reason, FIDJI provides a full passage as an answer. On these kinds
of questions, the system behaves as a classical passage retrieval system, except
that candidate passages are retrieved through syntactic relations and relevant
discourse markers (about 100 nouns, verbs, prepositions and adjectives, manually
compiled) instead of keywords only. Here is an example of a complex question:


Question: Why is the sky blue?
Dependencies: attribut(sky, blue)
Question type: complex (why)
Expected answer type: reason²

² Reason is not a named entity, as person in the first example, but this answer
type points out that a text explicitly explaining a reason should be preferred
(in our case, using discourse markers).


     The following passage is selected because it contains all the dependency re-
lations of the question and a causal marker:

And if the sky is blue, it is because of Rayleigh scattering ...
         attribut(sky, blue)
         VMOD(be, scattering)
         PREPOBJ(scattering, because of)
         ...
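
    As an illustration of this strategy, the sketch below scores a passage by
dependency overlap and adds a bonus when a causal discourse marker is present.
The marker list is a tiny illustrative subset of the roughly 100 manually compiled
markers mentioned above, and the scoring details are our own simplification,
not the system's actual code.

# Our own simplified sketch of passage scoring for complex questions.

CAUSAL_MARKERS = ("because", "because of", "due to", "since", "therefore")

def score_passage(question_deps, passage_deps, passage_text, expected_answer_type):
    score = len(question_deps & passage_deps)
    if expected_answer_type == "reason":
        lowered = passage_text.lower()
        if any(marker in lowered for marker in CAUSAL_MARKERS):
            score += 1  # reward an explicit causal marker
    return score

passage = "And if the sky is blue, it is because of Rayleigh scattering ..."
print(score_passage({("attribut", "sky", "blue")},
                    {("attribut", "sky", "blue"),
                     ("PREPOBJ", "scattering", "because of")},
                    passage,
                    "reason"))  # 2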


3      ResPubliQA'10 experiments

In 2009, the ResPubliQA results taught us a lot about the behavior of our system.
     Other evaluations (former CLEF and Quaero campaigns) had shown that
using syntactic analysis modules for retrieving documents and extracting the
answers significantly improved the results [1]. However, with the ResPubliQA
evaluation set, passage extraction turned out to be much better when replacing
syntax by traditional bag-of-words techniques [3]. This is done by turning off
modules C1 and D1 in Figure 1.
     Passage extraction is then performed by a classical selection of sentences con-
taining a maximum of significant question keywords (module C2), and answer ex-
traction is achieved without slot instantiation within dependencies (module D2).
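
    The following sketch illustrates this non-syntactic fallback (module C2) as
we understand it: sentences are simply ranked by the number of significant
question keywords they contain. The tokenisation and stopword list are our
own illustrative simplifications.

import re

# Our own illustration of the non-syntactic fallback (module C2).

STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "why", "what", "which", "how"}

def keywords(text):
    return {tok for tok in re.findall(r"[a-zà-ÿ]+", text.lower())
            if tok not in STOPWORDS}

def select_passages_by_keywords(question, sentences, top_n=3):
    q_kw = keywords(question)
    return sorted(sentences,
                  key=lambda s: len(q_kw & keywords(s)),
                  reverse=True)[:top_n]

print(select_passages_by_keywords(
    "Why is the sky blue?",
    ["The sky appears blue due to Rayleigh scattering.",
     "The sea is blue."],
    top_n=1))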


     The new guidelines in ResPubliQA 2010 offered us the possibility to carry on
our experiments in this way. Indeed, two different tasks were allowed this year:

 – Paragraph selection (PS), similar to the 2009 task, where only the full
   paragraph containing the exact answer was to be returned. Passages are not
   indefinite parts of texts of limited length, but predefined paragraphs identified
   in the corpus by XML tags.

 – Answer selection (AS), closer to traditional QA tasks, where systems were
   also required to demarcate the exact answer, supported by a full paragraph.
   In this latter task, judged answers can be INEXACT (good support but bad
   boundaries for the short answer), MISSED (good support but wrong short
   answer), RIGHT (good support and good answer) or WRONG.

     Two runs per language were allowed. In order to continue testing our plug/
unplug strategies, and to experiment with them for the first time in English, we
chose the following procedure for our two runs:

 1. PS task, syntactic modules turned off, leading to an approach closer to
    passage retrieval, which had the best results of the system last year.
 2. AS task, syntactic modules turned on, in order to test whether answer ex-
    traction was effective or not on this collection.

     Moreover, by adding answers with INEXACT, MISSED and RIGHT status
from our AS run, we can obtain a PS run with modules turned on, which allows
us to evaluate the modules on the same task.


4    Results

We present the results of 5 experiments for both French and English. The first
three come from official ResPubliQA runs:

 – ➀: AS task with syntactic modules turned on (exact answers judged as RIGHT),
 – ➁: PS task with syntactic modules turned on (exact answers of ➀ judged as
   RIGHT, INEXACT, MISSED),
 – ➂: PS task with syntactic modules turned off.

     To complete the evaluation, we also ran unofficial configurations and per-
formed the assessment ourselves:

 – ➃: AS task with passage retrieval turned off but answer extraction turned on
   (modules C2 and D1, with exact answers judged as RIGHT),
 – ➄: PS task with passage retrieval C1 turned off but answer extraction turned
   on (exact answers of ➃ judged as RIGHT, INEXACT, MISSED).

     In order to evaluate the performance of the question analysis module, we
manually identified the types of questions. As FIDJI cannot process opinion ques-
tions, we decided to consider them as factoid. Although questions in French and
English are translations of each other and their respective answers should be ex-
tracted from the same paragraph, we noticed that, for a given question, its type
is not always the same in English as in French. For example, in English, the type
of question 169 is reason/purpose while in French, it is factoid:

(EN) Why is the trade in ammonium nitrate fertilizers hampered within the Eu-
ropean Economic Community?
(FR) Qu'est-ce qui a entravé le commerce d'engrais à base de nitrate d'ammonium
dans la Communauté Économique Européenne? (What has hampered the trade
in ammonium nitrate fertilizers...?)

     This is not only an issue of syntactic differences due to translation paraphras-
ing; the target of the question is different. Strictly speaking, the French question
might accept a noun phrase like "les réglementations régissant la commercial-
isation des engrais à base de nitrate d'ammonium" (the different regulations
controlling the marketing of ammonium nitrate based fertilizers), while such an
answer would be odd with the English question. We identified 7 questions raising
this issue³.

³ Questions 3, 11, 134, 169, 175, 197, 199.

     Tables 1 and 2 present FIDJI's results for runs ➀, ➁ and ➂, as well as
experiments ➃ and ➄, by type of question (manually identified).
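
    Before discussing the figures, the sketch below illustrates how a PS run can
be derived from the assessed AS run, as described above, and how a c@1 score
can then be computed. We assume the standard ResPubliQA c@1 definition,
with NOA (no answer) counted as unanswered; the judgment values used in the
example are invented.

# Our own sketch of deriving a PS run from the assessed AS run and scoring it.

def ps_correct_from_as(judgments):
    """A PS answer is correct when the AS judgment shows a correct paragraph."""
    return sum(j in ("RIGHT", "INEXACT", "MISSED") for j in judgments)

def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 = (n_R + n_U * n_R / n) / n."""
    return (n_correct + n_unanswered * n_correct / n_total) / n_total

judgments = ["RIGHT", "INEXACT", "WRONG", "NOA", "MISSED"]
n_correct = ps_correct_from_as(judgments)
n_unanswered = judgments.count("NOA")
print(n_correct, round(c_at_1(n_correct, n_unanswered, len(judgments)), 3))  # 3 0.72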
     In French, 86% of question types were correctly identified by FIDJI (we found
9 questions that were ill-formed or contained misspellings and which FIDJI could
not correctly analyse), whereas in English, only 69.5% were correctly identified.
     Concerning our official runs, as we can see in Tables 1 and 2, answer extrac-
tion performance (➀) is very low (0.25 for both English and French). Results are
better for passage selection (➁ and ➂) for every type of question and even better
when the syntactic modules are switched off (➂). Results are globally better for
English than for French, so the performance of the question analysis module
cannot explain these results.
     In both languages, correct answers to definition questions dramatically de-
crease with D1 turned off. This is because we do not have any non-syntactic
way to extract the answer for many of these questions (definitions not expecting
a named entity, as What is maladministration?, can only be answered by
definition patterns in FIDJI). Turning off the syntactic modules necessarily leads
to a NOA answer in these cases.
     We can notice that for both English and French, the results follow the same
trend and that results for passage selection are better for complex questions
(reason/purpose and procedure), probably because FIDJI selects passages con-
taining discourse markers for this type of question. Also, for these questions, we
always returned the full paragraph as the exact short answer, considering that try-
ing to focus even more inside the paragraph was not useful for such questions. As
the assessors did consider that shorter answers can be better, the system often
gets an INEXACT status for them.
     Finally, our additional runs ➃ and ➄ show a small improvement, showing
that the best results are obtained when turning off syntactic passage retrieval,
but turning on syntactic answer extraction (using modules C2 and D1). This is
at least clear concerning non-factoid questions. This finding is important and
will help us in the future to choose our search strategies according to different
corpora and question types.
     Last year, the pure information retrieval baseline [4], which consisted in
querying the indexed collection with the exact text of the question and returning
the paragraph retrieved in the first position, had the best results for French and
ranked 5 out of 14 in English [5]. Even if a subset of the Europarl corpus has
been added to the document collection in 2010, we can see that our c@1 measures
(see Table 3) are still lower than the 2009 baseline (0.53 for English and 0.45
for French).
     In 2009, we noted that our results were due to ACQUIS corpus specificities:
different register of language, more constrained vocabulary, texts having a partic-
ular structure, with an introduction followed by long sentences extending over
several paragraphs, etc.

Type of questions      Factoid      Definition   Reason/Purpose  Procedure    TOTAL
Number of questions    110          29           29              32           200
➀ Correct answers      10 (9.1%)    3 (10.3%)    1 (3.5%)        3 (9.4%)     17 (8.5%)
➁ Correct passages     33 (30%)     10 (34.5%)   10 (34.5%)      14 (43.8%)   67 (33.5%)
➂ Correct passages     51 (46.3%)   3 (10.3%)    18 (62%)        17 (53.1%)   89 (44.5%)
Unofficial runs
➃ Correct answers      13 (11.8%)   3 (10.3%)    2 (6.9%)        4 (12.5%)    22 (11%)
➄ Correct passages     47 (42.7%)   9 (31.0%)    19 (65.5%)      18 (56.3%)   93 (46.5%)

Table 1. Results by question type (English).
Type of questions      Factoid      Definition   Reason/Purpose  Procedure    TOTAL
Number of questions    117          29           26              28           200
➀ Correct answers      11 (9.4%)    2 (6.9%)     0 (0%)          1 (3.6%)     14 (7%)
➁ Correct passages     35 (29.9%)   6 (20.7%)    8 (30.8%)       8 (28.6%)    57 (28.5%)
➂ Correct passages     30 (25.6%)   6 (20.7%)    13 (50%)        13 (46.4%)   62 (31%)
Unofficial runs
➃ Correct answers      12 (10.3%)   3 (10.3%)    0 (0%)          2 (6.3%)     17 (8.5%)
➄ Correct passages     31 (28.2%)   7 (24.1%)    14 (53.8%)      15 (50.0%)   67 (33.5%)

Table 2. Results by question type (French).

     Table 4 shows that FIDJI found correct answers/passages mainly in the
ACQUIS collection. As FIDJI has difficulty with selecting passages in the
ACQUIS collection, FIDJI's low results could be explained if a majority of
correct answers are in the ACQUIS collection.
     The main difference between the FIDJI architecture used for ResPubliQA
and the one used for other evaluation campaigns (CLEF, Quaero) is the number
of documents returned by Lucene: 15 documents for ResPubliQA and 100 for
other campaigns. We have to evaluate whether selecting more documents would
improve the results.

Campaign     FIDJI 2010           FIDJI 2009
Language     English    French    English    French
➀            0.09       0.08      -          -
➁            0.35       0.30      -          0.30
➂            0.48       0.36      -          0.42
➃            0.11       0.08      -          -
➄            0.47       0.34      -          -

Table 3. c@1 measure for French and English.

Language     English              French
Corpus       Europarl   Acquis    Europarl   Acquis
➀            3          14        6          8
➁            24         43        22         36
➂            33         56        21         41

Table 4. Number of correct answers/passages per corpus.


5    Conclusion

We presented in this paper our participation in the ResPubliQA 2010 campaign
in French and English. We evaluated two strategies: plugging or unplugging the
syntactic modules for document selection and answer extraction. As in 2009, the
system got low results, and even lower when the syntactic modules are turned on.
Different experiments on the collection confirmed that the use of syntactic anal-
ysis decreased results, whereas it proved to help when used in other campaigns.
We still have to evaluate whether a higher number of documents selected by the
search engine can improve the results.


6    Acknowledgements

This work has been partially financed by OSEO under the Quaero program.


References

1. Moriceau, V., Tannier, X.: FIDJI: Using Syntax for Validating Answers in Multiple
   Documents. Information Retrieval, Special Issue on Focused Information Retrieval
   (2010)
2. Aït-Mokhtar, S., Chanod, J.P., Roux, C.: Robustness beyond shallowness: Incre-
   mental deep parsing. Natural Language Engineering 8 (2002) 121–144
3. Tannier, X., Moriceau, V.: Studying Syntactic Analysis in a QA System: FIDJI @
   ResPubliQA'09. In: Proceedings of CLEF 2010. Number LNCS 6241 in Lecture
   Notes in Computer Science, Springer-Verlag, New York City, NY, USA (2010)
4. Pérez, J., Garrido, G., Rodrigo, Á., Araujo, L., Peñas, A.: Information Re-
   trieval Baselines for the ResPubliQA Task. In: Working Notes for the CLEF 2009
   Workshop, Corfu, Greece (2009)
5. Peñas, A., Forner, P., Sutcliffe, R., Rodrigo, Á., Forăscu, C., Alegria, I., Giampic-
   colo, D., Moreau, N., Osenova, P.: Overview of ResPubliQA 2009: Question An-
   swering Evaluation over European Legislation. In: Working Notes for the CLEF
   2009 Workshop, Corfu, Greece (2009)