FIDJI has to detect syntactic implications between questions and passages containing the answers.



Our system relies on syntactic analysis provided by XIP, which is used to parse both the questions and the documents from which answers are extracted. (... analysis and named entity tagging). Among these documents, FIDJI looks for sentences containing the most syntactic relations of the question. Finally, answers are extracted from these sentences and the answer type, when specified in the question, is validated. Figure 1 presents the architecture of FIDJI and more details can be found in [4, 3]. Next sections summarize the way FIDJI extracts answers and focus on ResPubliQA specificities.
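The sentence selection step described above (keeping the sentences that share the most syntactic relations with the question) can be illustrated with a minimal sketch. It assumes that dependencies are plain (relation, head, dependent) triples compared by exact match; the names `Dependency`, `shared_count` and `rank_sentences` are hypothetical, and the toy analyses are not actual XIP output. FIDJI's real matching is more elaborate than this.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    relation: str   # e.g. "SUBJ", "OBJ", "NMOD"
    head: str       # lemma of the governing word
    dependent: str  # lemma of the dependent word


def shared_count(question_deps, sentence_deps):
    """Number of dependency triples present in both analyses (exact match)."""
    return len(set(question_deps) & set(sentence_deps))


def rank_sentences(question_deps, candidates):
    """Order sentence ids by decreasing overlap with the question.

    `candidates` is a list of (sentence_id, dependencies) pairs, assumed to be
    in search engine order. sorted() is stable, so ties keep that order.
    """
    ordered = sorted(
        candidates, key=lambda c: -shared_count(question_deps, c[1])
    )
    return [sentence_id for sentence_id, _ in ordered]


# Toy data (hypothetical analyses, not actual XIP output).
question = [
    Dependency("SUBJ", "cost", "stamp"),
    Dependency("NMOD", "stamp", "French"),
]
candidates = [
    ("s1", [Dependency("SUBJ", "cost", "stamp")]),
    ("s2", [Dependency("SUBJ", "cost", "stamp"),
            Dependency("NMOD", "stamp", "French")]),
    ("s3", [Dependency("OBJ", "buy", "stamp")]),
]
ranking = rank_sentences(question, candidates)
```

Here `s2` shares two triples with the question, `s1` one and `s3` none, so the sketch ranks them in that order.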

XIP [1] is a robust parser for French and English which provides dependency relations and named entity recognition. The dependency relations provided by XIP which are used by FIDJI are mainly: SUBJ (subject), OBJ (object), PREPOBJ (prepositional group), NMOD (noun modifier), VMOD (verb modifier), COORDITEMS (coordinated elements), CONNECT (connector introducing clause).

The named entities (NE) are tagged using a set of 8 types: person, organization, location, date (defined by XIP), as well as nationality, number, duration, age (that we added). XIP's lieu (location) can be made more specific (country, region, continent...). We also added features to allow for more precise types. For example, for number, we added the following features: length, speed, weight, money, physics, so that 0.55 euro in "a French stamp costs 0.55 euro" can be tagged as a NE and extracted as an answer to "What is the price of a French stamp?". Other elements are also tagged, as names introducing persons: functions (leader...), professions (minister...), family indications (father...).
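As an illustration of typed entities refined by features, the following minimal sketch tags a money amount as a number entity carrying a money feature, using the stamp example above. The `NamedEntity` class, the `MONEY` pattern and the `tag_money` helper are hypothetical stand-ins, assumed here for the example only; they do not reproduce XIP's rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedEntity:
    text: str                           # surface string of the entity
    ne_type: str                        # one of the 8 types, e.g. "number"
    features: frozenset = frozenset()   # refinements, e.g. {"money"}


# Hypothetical pattern: an amount followed by a currency name.
MONEY = re.compile(r"\d+(?:[.,]\d+)?\s*(?:euros?|dollars?)", re.IGNORECASE)


def tag_money(sentence):
    """Return number entities with the money feature found in the sentence."""
    return [
        NamedEntity(match.group(0), "number", frozenset({"money"}))
        for match in MONEY.finditer(sentence)
    ]


def fits(entity, expected_type, expected_feature=None):
    """Check an entity against the type (and optional feature) a question expects."""
    if entity.ne_type != expected_type:
        return False
    return expected_feature is None or expected_feature in entity.features


entities = tag_money("A French stamp costs 0.55 euro")
```

With this sketch, a question expecting a number with the money feature accepts the entity "0.55 euro", while a question expecting a length does not.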

Question analysis consists in identifying:

• The syntactic dependencies given by XIP;

Once candidate documents are selected by the search engine and analyzed by the parser, the system compares the document paragraphs with question analysis, in order to:

ResPubliQA answer format is different from traditional QA campaigns. First, answers are not focused, short parts of texts, but full paragraphs that must contain the answer. Second, passages are not indefinite parts of texts of limited length; they must be predefined paragraphs identified in the collection by XML tags.

Although answers to submit to the campaign are full paragraphs, our system is designed to hunt down short answers. For most questions, typically factoid questions, it is still relevant to find short answers, and then to return a paragraph containing the best answer. This is not the case of 'how' or 'why' questions, where no short answer may be retrieved.

FIDJI usually works at sentence level. For the aim of ResPubliQA specific rules, we chose to work at paragraph level. This consisted in specifying that sentence separators were XML tags in the collection, rather than usual end-of-sentence markers.

• Expected type: location (state)
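The switch from sentence level to paragraph level described above amounts to changing the segmentation unit. A minimal sketch follows, assuming the paragraph element is named `p` (the actual tag name is not given here) and using a made-up document; `paragraph_units` is a hypothetical helper, not part of FIDJI.

```python
import xml.etree.ElementTree as ET


def paragraph_units(xml_text, tag="p"):
    """Return one text unit per paragraph element, ignoring sentence punctuation.

    The tag name is an assumption; adapt it to the collection's markup.
    """
    root = ET.fromstring(xml_text)
    units = []
    for element in root.iter(tag):
        # Collect all nested text and normalise whitespace.
        text = " ".join("".join(element.itertext()).split())
        if text:
            units.append(text)
    return units


# Made-up sample document for illustration only.
SAMPLE = """<doc>
  <p n="1">First sentence of the paragraph. Second sentence of the paragraph.</p>
  <p n="2">Another paragraph.</p>
</doc>"""

units = paragraph_units(SAMPLE)
```

The first unit keeps both of its sentences together, which is the behaviour wanted when the answer to return is a whole paragraph.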

2.2.2 Complex questions

Complex questions ('how', 'why', etc.) do not expect any short answer. On these kinds of questions, the system behaves more as a passage retrieval system. The paragraphs containing the most syntactic dependencies in common with the question are selected. Among them, the best-ranked is the one that is returned first by Lucene. For example:

0155 - Pourquoi convient-il de revoir l'architecture du réseau Animo ?
(Why should the structure of an ANIMO network be revised?)

• Question type: complex (why)
• Syntactic dependencies and NE tagging:
VMOD(convenir, revoir)
DEEPOBJ(revoir, architecture)
ATTRIBUT_DE(architecture, réseau)
NMOD(réseau, animo)

2.3 Scoring

FIDJI's scores are not composed of a single value, but of a list of different values and flags. The criteria are listed below, and are presented in decreasing order of importance:

• As we said, a paragraph containing an extracted short answer will be preferred if it exists.

3 Results

We present the results in Table 1 by types of questions. Only one answer per question was allowed, so the values simply correspond to the rate of correct answers for each question type.

Question type: Factoid, "How", "Why", List, Definition, TOTAL
Number of questions: 170 76 37 101 500 116
% Correct answer: 40 % 15.8 % 36.2 % 30.4 % 22.4 % 16.2 %

Table 1: FIDJI results by question types.

Results are lower than former campaigns' scores, especially concerning factoid and definition questions. Also, the JRC-Acquis corpus uses a different register of language than usual corpora such as Web or newspapers. Question as well as document analyses suffered from the specific expressions and structures used by French texts, and especially for definitions. Definitions, quite easy to detect in newspaper corpora, have been poorly recognized for this evaluation.

Looking carefully at the results shows that, in these particular documents, using syntactic dependencies as the main clue to choose paragraph candidates is not always a good way to find a relevant passage. This is especially true for complex questions, but not only. Indeed, the selection of the paragraph containing the most question dependencies often leads to the introduction of the document or to a very general paragraph containing poor information. For example:

0006 - What is the scope of the council directive on the trading of fodder seeds?

is answered by

(66/401/EEC) COUNCIL DIRECTIVE of 14 June 1966 on the marketing of fodder plant seed

containing many dependencies but answering nothing, while a good result was later in the same document, but with an anaphora:

This Directive shall apply to fodder plant seed marketed within the Community, irrespective of the use for which the seed as grown is intended.

Dependency relations are still useful to find the good document, but often fail to point out the correct paragraph.

4 Conclusion

We presented in this article our participation to the campaign ResPubliQA 2009 in French. We adapted our syntactic-based QA system FIDJI in order to produce a single long answer in the form of JRC-Acquis tagged paragraphs. Results showed that syntactic analysis should be used in different manners according to the type of tasks and questions. A careful look at our system's errors should enable improvement of robustness of the search by applying contextual strategies.

References

[1] Salah Aït-Mokhtar and Jean-Pierre Chanod. Incremental finite-state parsing. In Proceedings of the fifth conference on Applied natural language processing, pages 72-79, Washington, DC, USA, 1997. Morgan Kaufmann Publishers Inc., San Francisco, California, USA.

[3] Véronique Moriceau and Xavier Tannier. Étude de l'apport de la syntaxe dans un système de question-réponse. In Actes de la Conférence Traitement Automatique des Langues Naturelles (TALN 2009, poster), Senlis, France, jun 2009.

[4] Véronique Moriceau, Xavier Tannier, and Brigitte Grau. Utilisation de la syntaxe pour valider les réponses à des questions par plusieurs documents. In Proceedings of workshop on COnférence en Recherche d'Information et Applications, CORIA, Presqu'île de Giens, France, 2009.