=Paper=
{{Paper
|id=Vol-1176/CLEF2010wn-MLQA10-TannierEt2010
|storemode=property
|title=FIDJI @ ResPubliQA 2010
|pdfUrl=https://ceur-ws.org/Vol-1176/CLEF2010wn-MLQA10-TannierEt2010.pdf
|volume=Vol-1176
}}
==FIDJI @ ResPubliQA 2010==
FIDJI @ ResPubliQA'10

Xavier Tannier, Véronique Moriceau
LIMSI-CNRS, Univ. Paris-Sud, Orsay, France
xtannier, moriceau@limsi.fr

Abstract. In this paper, we present the results obtained by the FIDJI system for both the French and English monolingual evaluations at the ResPubliQA 2010 campaign. In this campaign, we focused on carrying on our evaluations concerning the contribution of our syntactic modules with this specific collection.

1 Introduction

FIDJI (Finding In Documents Justifications and Inferences) is an open-domain question-answering (QA) system for French [1] and, more recently, English. It combines syntactic information with traditional QA techniques such as named entity recognition and term weighting in order to validate answers through different documents. This paper focuses on the results obtained by FIDJI at the ResPubliQA 2010 evaluation. It first presents a brief overview of the system and of its adaptation to English. Then, the specific choices made for the campaign are detailed, and some results are finally given.

2 FIDJI

Figure 1 presents the architecture of FIDJI. The system relies on a syntactic analysis and named entity tagging of the question and of a limited number of documents for each question. This analysis is performed by the parser XIP [2], enriched with some additional specific rules. The document collection is indexed by the search engine Lucene (http://lucene.apache.org/). The index contains raw text only.

First, the system analyses the question and submits its keywords to Lucene (module A); the first 15 documents are then processed (module B). We decided to reduce the number of documents because they are rather long and their parsing would take too much time. The reason we perform this analysis online is that we aim at avoiding as much preprocessing as possible (the system is designed to explore Web collections [1]). Among these documents, FIDJI looks for sentences containing the highest number of syntactic relations of the question (module C1). Finally, answers are extracted from these sentences (module D1) and the answer type, when specified in the question, is validated (module E).

Fig. 1. Architecture of FIDJI.

The main objective of FIDJI is to produce answers which are fully validated by a supporting text (or passage) with respect to a given question. The difficulty is that an answer (or some pieces of information composing an answer) may be validated by several documents. Our approach consists in checking whether all the characteristics of a question (namely the dependency relations and the answer type) can be retrieved in one or several documents. In this context, FIDJI has to detect syntactic implications between questions and passages containing the answers, and to validate the type of the potential answer in this passage or in another document.

Since the last evaluation campaign in 2009, FIDJI has been adapted to English. Specific rules have been developed for question analysis (module A) and document processing (module B). The other modules are common to both English and French. The following examples illustrate how FIDJI extracts answers; more details concerning the system can be found in [1].
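The overall document flow can be pictured as a small retrieval-and-validation pipeline. The sketch below is only an illustration of the modules described above (the letters follow Figure 1); the function and method names, such as analyze_question, instantiate_answer_slot or type_is_attested, are our own assumptions and not FIDJI's actual API, with XIP and Lucene abstracted behind the parser and search_engine objects.

```python
# Minimal sketch of the pipeline described above (modules A to E).
# All names here are illustrative, not FIDJI's actual API: parsing (XIP) and
# retrieval (Lucene) are hidden behind the `parser` and `search_engine`
# objects supplied by the caller.

def answer_question(question, parser, search_engine, n_docs=15):
    # Module A: analyse the question, then query the index with its keywords.
    qa = parser.analyze_question(question)   # keywords, dependencies, expected type
    documents = search_engine.search(qa.keywords, limit=n_docs)

    # Module B: parse the retrieved documents online (no heavy preprocessing).
    sentences = [s for doc in documents for s in parser.parse(doc)]

    # Module C1: prefer sentences sharing the most dependency relations
    # with the question.
    sentences.sort(key=lambda s: len(qa.dependencies & s.dependencies),
                   reverse=True)

    # Module D1: instantiate the ANSWER slot in the best sentences to obtain
    # candidate answers (see the worked example in Section 2.1).
    candidates = [c for s in sentences for c in s.instantiate_answer_slot(qa)]

    # Module E: validate the expected answer type, in this passage or in
    # another document of the collection.
    return [c for c in candidates
            if qa.expected_type is None
            or search_engine.type_is_attested(c, qa.expected_type)]
```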
2.1 Example 1

Question analysis provides lemmatisation, POS tagging and dependency relations, as well as the question type and the expected answer type. For example:

Question: Quel premier ministre s'est suicidé en 1993 ? (Which Prime Minister committed suicide in 1993?)
Dependencies:
  DATE(1993)
  PERSON(ANSWER)
  SUBJ(se suicider, ANSWER)
  attribut(ANSWER, ministre)
  attribut(ministre, premier)
Question type: factoid
Expected answer type: person (specific answer type: prime minister)

The question is turned into a declarative sentence where the answer is represented by the 'ANSWER' lemma. The following sentence is selected because it contains the highest number of dependency relations:

Pierre Bérégovoy s'est suicidé en 1993. (Pierre Bérégovoy committed suicide in 1993.)
Dependencies:
  DATE(1993)
  PERSON(Pierre Bérégovoy)
  SUBJ(se suicider, Pierre Bérégovoy)

Pierre Bérégovoy instantiates the ANSWER slot of the question dependencies and becomes a candidate answer. The named entity type (person) and the first three dependencies of the question are validated in this sentence. In order to fully validate the candidate answer, the system searches for the missing dependencies (attribut(Pierre Bérégovoy, ministre) and attribut(ministre, premier)) in a single sentence of the whole document collection. These dependencies will be found in any sentence mentioning le premier ministre Pierre Bérégovoy (Prime Minister Pierre Bérégovoy) and the answer will be validated.

2.2 Example 2

For complex questions, it is obvious that answers are not always short phrases. For this reason, FIDJI provides a full passage as an answer. On these kinds of questions, the system behaves as a classical passage retrieval system, except that candidate passages are retrieved through syntactic relations and relevant discourse markers (about 100 nouns, verbs, prepositions and adjectives, manually compiled) instead of keywords only. Here is an example of a complex question:

Question: Why is the sky blue?
Dependencies: attribut(sky, blue)
Question type: complex (why)
Expected answer type: reason (reason is not a named entity, as person is in the first example, but this answer type indicates that a text explicitly explaining a reason should be preferred; in our case, by using discourse markers)

The following passage is selected because it contains all the dependency relations of the question and a causal marker:

And if the sky is blue, it is because of Rayleigh scattering...
  attribut(sky, blue)
  VMOD(be, scattering)
  PREPOBJ(scattering, because of)
  ...
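The slot instantiation and cross-document validation illustrated in Section 2.1 amount to matching sets of dependency tuples. The following sketch follows that example; the plain-tuple representation and helper names are ours, chosen only for illustration, and do not reflect FIDJI's internal data structures.

```python
# Sketch of the ANSWER-slot instantiation of Section 2.1, using plain tuples
# for dependencies. The representation and helper names are assumptions made
# for illustration, not FIDJI's internal data structures.

QUESTION_DEPS = {
    ("DATE", "1993"),
    ("PERSON", "ANSWER"),
    ("SUBJ", "se suicider", "ANSWER"),
    ("attribut", "ANSWER", "ministre"),
    ("attribut", "ministre", "premier"),
}

SENTENCE_DEPS = {   # "Pierre Bérégovoy s'est suicidé en 1993."
    ("DATE", "1993"),
    ("PERSON", "Pierre Bérégovoy"),
    ("SUBJ", "se suicider", "Pierre Bérégovoy"),
}

def instantiate(deps, candidate):
    """Replace the ANSWER slot by a concrete candidate answer."""
    return {tuple(candidate if arg == "ANSWER" else arg for arg in d)
            for d in deps}

def missing_dependencies(question_deps, sentence_deps, candidate):
    """Dependencies still to be validated elsewhere in the collection."""
    return instantiate(question_deps, candidate) - sentence_deps

# With "Pierre Bérégovoy" filling the slot, three dependencies match directly;
# the two remaining ones must be found together in one sentence of the
# collection, e.g. "le premier ministre Pierre Bérégovoy ...".
print(missing_dependencies(QUESTION_DEPS, SENTENCE_DEPS, "Pierre Bérégovoy"))
# -> {('attribut', 'Pierre Bérégovoy', 'ministre'), ('attribut', 'ministre', 'premier')}
```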
3 ResPubliQA'10 experiments

In 2009, the ResPubliQA results taught us a lot about the behaviour of our system. Other evaluations (former CLEF and Quaero campaigns) had shown that using the syntactic analysis modules for retrieving documents and extracting the answers significantly improved the results [1]. However, with the ResPubliQA evaluation set, passage extraction turned out to be much better when syntax was replaced by traditional bag-of-words techniques [3]. This is done by turning off modules C1 and D1 in Figure 1. Passage extraction is then performed by a classical selection of sentences containing a maximum of significant question keywords (module C2), and answer extraction is achieved without slot instantiation within dependencies (module D2).

The new guidelines in ResPubliQA 2010 offered us the possibility to carry on our experiments in this direction. Indeed, two different tasks were proposed this year:

- Paragraph selection (PS), similar to the 2009 task, where only the full paragraph containing the exact answer was to be returned. Passages are not arbitrary portions of text of limited length, but predefined paragraphs identified in the corpus by XML tags.
- Answer selection (AS), closer to traditional QA tasks, where systems were also required to demarcate the exact answer, supported by a full paragraph. In this latter task, judged answers can be INEXACT (good support but bad boundaries for the short answer), MISSED (good support but wrong short answer), RIGHT (good support and good answer) or WRONG.

Two runs per language were allowed. In order to continue testing our plug/unplug strategies, and to experiment with them for the first time in English, we chose the following procedure for our two runs:

1. PS task, syntactic modules turned off, leading to an approach closer to passage retrieval, which had given the system its best results last year.
2. AS task, syntactic modules turned on, in order to test whether answer extraction was effective or not on this collection.

Moreover, by adding answers with INEXACT, MISSED and RIGHT status from our AS run, we can obtain a PS run with modules turned on, which allows us to evaluate the modules on the same task.

4 Results

We present the results of 5 experiments for both French and English. The first three come from official ResPubliQA runs:

➀: AS task with syntactic modules turned on (exact answers judged as RIGHT),
➁: PS task with syntactic modules turned on (exact answers of ➀ judged as RIGHT, INEXACT or MISSED),
➂: PS task with syntactic modules turned off.

To complete the evaluation, we also ran unofficial configurations and carried out the assessment ourselves:

➃: AS task with syntactic passage retrieval turned off but answer extraction turned on (modules C2 and D1, with exact answers judged as RIGHT),
➄: PS task with passage retrieval C1 turned off but answer extraction D1 turned on (exact answers of ➃ judged as RIGHT, INEXACT or MISSED).
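The five configurations differ only in which modules of Figure 1 are active and on which task they are scored. The compact summary below is our own illustration of that experimental design; it is not an actual FIDJI configuration format.

```python
# The five experiments expressed as module toggles (module names as in
# Figure 1). This layout is ours, purely for illustration; it is not an
# actual FIDJI configuration format.

RUNS = {
    # run: (task, passage selection module, answer extraction module)
    "➀ AS, syntax on":  ("AS", "C1", "D1"),
    "➁ PS, syntax on":  ("PS", "C1", "D1"),  # derived from ➀ (RIGHT + INEXACT + MISSED)
    "➂ PS, syntax off": ("PS", "C2", "D2"),
    "➃ AS, mixed":      ("AS", "C2", "D1"),  # unofficial: bag-of-words passages, syntactic extraction
    "➄ PS, mixed":      ("PS", "C2", "D1"),  # unofficial, derived from ➃
}

for name, (task, passages, answers) in RUNS.items():
    print(f"{name}: task={task}, passage selection={passages}, answer extraction={answers}")
```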
In order to evaluate the performance of the question analysis module, we manually identified the types of the questions. As FIDJI cannot process opinion questions, we decided to consider them as factoid. Although questions in French and English are translations of each other and their respective answers should be extracted from the same paragraph, we noticed that, for a given question, its type is not always the same in English as in French. For example, in English, the type of question 169 is reason/purpose while in French, it is factoid:

(EN) Why is the trade in ammonium nitrate fertilizers hampered within the European Economic Community?
(FR) Qu'est-ce qui a entravé le commerce d'engrais à base de nitrate d'ammonium dans la Communauté Économique Européenne ? (What has hampered the trade in ammonium nitrate fertilizers...?)

This is not only an issue of syntactic differences due to translation paraphrasing; the target of the question is different. Strictly speaking, the French question might accept a noun phrase like les réglementations régissant la commercialisation des engrais à base de nitrate d'ammonium (the different regulations controlling the marketing of ammonium nitrate based fertilizers), while such an answer would be odd with the English question. We identified 7 questions raising this issue (questions 3, 11, 134, 169, 175, 197 and 199).

Tables 1 and 2 present FIDJI's results for runs ➀, ➁ and ➂, as well as experiments ➃ and ➄, by type of question (manually identified). In French, 86% of the question types were correctly identified by FIDJI (we found 9 questions that were ill-formed or contained misspellings and which FIDJI could not correctly analyse), whereas in English, only 69.5% were correctly identified.

Table 1. Results by question type (English).

Type of question      Factoid      Definition   Reason/Purpose   Procedure    TOTAL
Number of questions   110          29           29               32           200
➀ Correct answers     10 (9.1%)    3 (10.3%)    1 (3.5%)         3 (9.4%)     17 (8.5%)
➁ Correct passages    33 (30%)     10 (34.5%)   10 (34.5%)       14 (43.8%)   67 (33.5%)
➂ Correct passages    51 (46.3%)   3 (10.3%)    18 (62%)         17 (53.1%)   89 (44.5%)
Unofficial runs
➃ Correct answers     13 (11.8%)   3 (10.3%)    2 (6.9%)         4 (12.5%)    22 (11%)
➄ Correct passages    47 (42.7%)   9 (31.0%)    19 (65.5%)       18 (56.3%)   93 (46.5%)

Table 2. Results by question type (French).

Type of question      Factoid      Definition   Reason/Purpose   Procedure    TOTAL
Number of questions   117          29           26               28           200
➀ Correct answers     11 (9.4%)    2 (6.9%)     0 (0%)           1 (3.6%)     14 (7%)
➁ Correct passages    35 (29.9%)   6 (20.7%)    8 (30.8%)        8 (28.6%)    57 (28.5%)
➂ Correct passages    30 (25.6%)   6 (20.7%)    13 (50%)         13 (46.4%)   62 (31%)
Unofficial runs
➃ Correct answers     12 (10.3%)   3 (10.3%)    0 (0%)           2 (6.3%)     17 (8.5%)
➄ Correct passages    31 (28.2%)   7 (24.1%)    14 (53.8%)       15 (50.0%)   67 (33.5%)

Concerning our official runs, as we can see in Tables 1 and 2, answer extraction performance (➀) is very low (0.25 for both English and French). Results are better for passage selection (➁ and ➂) for every type of question, and even better when the syntactic modules are switched off (➂).

Results are globally better for English than for French, so the performance of the question analysis module cannot explain these results.

In both languages, correct answers to definition questions dramatically decrease with D1 turned off. This is because we do not have any non-syntactic way to extract the answer for many of these questions (definitions not expecting a named entity, such as What is maladministration?, can only be answered by definition patterns in FIDJI). Turning off the syntactic modules necessarily leads to a NOA answer in these cases.

We can notice that for both English and French the results follow the same trend, and that results for passage selection are better for complex questions (reason/purpose and procedure), probably because FIDJI selects passages containing discourse markers for this type of question. Also, for these questions, we always returned the full paragraph as the exact short answer, considering that trying to focus even more inside the paragraph was not useful for such questions. As the assessors considered that shorter answers can be better, the system often gets an INEXACT status for these answers.

Finally, our additional runs ➃ and ➄ show a small improvement, indicating that the best results are obtained when turning off syntactic passage retrieval but turning on syntactic answer extraction (i.e. using modules C2 and D1). This is clear at least for non-factoid questions. This finding is important and will help us in the future to choose our search strategies according to different corpora and question types.

Last year, the pure information retrieval baseline [4], which consisted in querying the indexed collection with the exact text of the question and returning the paragraph retrieved in the first position, had the best results for French and ranked 5 out of 14 in English [5]. Even if a subset of the Europarl corpus has been added to the document collection in 2010, we can see that our c@1 measures (see Table 3) are still lower than the 2009 baseline (0.53 for English and 0.45 for French).
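As a reminder, the c@1 values reported in Table 3 follow the official ResPubliQA measure [5], which credits unanswered (NOA) questions in proportion to the accuracy obtained on the answered ones. A minimal computation, assuming the usual definition c@1 = (nR + nU * nR / n) / n with nR correct answers, nU unanswered questions and n questions in total, and using made-up figures:

```python
# c@1 as used in ResPubliQA (see [5]): unanswered questions earn a fraction of
# credit equal to the accuracy reached on the answered ones. The figures in
# the example below are made up for illustration only.

def c_at_1(n_right: int, n_unanswered: int, n_total: int) -> float:
    """c@1 = (n_R + n_U * n_R / n) / n"""
    return (n_right + n_unanswered * n_right / n_total) / n_total

print(c_at_1(50, 20, 100))   # 50 correct, 20 NOA out of 100 questions -> 0.6
```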
In 2009, we noted that our results were due to ACQUIS corpus specificities: a different register of language, a more constrained vocabulary, texts having a particular structure, with an introduction followed by long sentences extending over several paragraphs, etc. Table 4 shows that FIDJI found correct answers/passages mainly in the ACQUIS collection.

As FIDJI has difficulty selecting passages in the ACQUIS collection, its low results could be explained if a majority of the correct answers are located in the ACQUIS collection. The main difference between the FIDJI architecture used for ResPubliQA and the one used for other evaluation campaigns (CLEF, Quaero) is the number of documents returned by Lucene: 15 documents for ResPubliQA and 100 for the other campaigns. We still have to evaluate whether selecting more documents would improve the results.

Table 3. c@1 measure for French and English.

Campaign    FIDJI 2010            FIDJI 2009
Language    English    French     English    French
➀           0.09       0.08       -          -
➁           0.35       0.30       -          0.30
➂           0.48       0.36       -          0.42
➃           0.11       0.08       -          -
➄           0.47       0.34       -          -

Table 4. Number of correct answers/passages per corpus.

Language    English                French
Corpus      Europarl   Acquis      Europarl   Acquis
➀           3          14          6          8
➁           24         43          22         36
➂           33         56          21         41

5 Conclusion

We presented in this paper our participation in the ResPubliQA 2010 campaign in French and English. We evaluated two strategies: plugging or unplugging the syntactic modules for document selection and answer extraction. As in 2009, the system obtained low results, and they were even lower than last year when the syntactic modules were turned off. Different experiments on the collection confirmed that the use of syntactic analysis decreased the results, whereas it proved to help when used in other campaigns. We still have to evaluate whether a higher number of documents selected by the search engine can improve the results.

6 Acknowledgements

This work has been partially financed by OSEO under the Quaero program.

References

1. Moriceau, V., Tannier, X.: FIDJI: Using Syntax for Validating Answers in Multiple Documents. Information Retrieval, Special Issue on Focused Information Retrieval (2010)
2. Aït-Mokhtar, S., Chanod, J.P., Roux, C.: Robustness beyond shallowness: Incremental deep parsing. Natural Language Engineering 8 (2002) 121-144
3. Tannier, X., Moriceau, V.: Studying Syntactic Analysis in a QA System: FIDJI @ ResPubliQA'09. In: Proceedings of CLEF 2010. Number 6241 in Lecture Notes in Computer Science, Springer-Verlag, New York City, NY, USA (2010)
4. Pérez, J., Garrido, G., Rodrigo, Á., Araujo, L., Peñas, A.: Information Retrieval Baselines for the ResPubliQA Task. In: Working Notes for the CLEF 2009 Workshop, Corfu, Greece (2009)
5. Peñas, A., Forner, P., Sutcliffe, R., Rodrigo, Á., Forăscu, C., Alegria, I., Giampiccolo, D., Moreau, N., Osenova, P.: Overview of ResPubliQA 2009: Question Answering Evaluation over European Legislation. In: Working Notes for the CLEF 2009 Workshop, Corfu, Greece (2009)