FIDJI in ResPubliQA 2009

Xavier Tannier, Véronique Moriceau
CNRS-LIMSI, University Paris-Sud 11
xtannier@limsi.fr, moriceau@limsi.fr

Abstract

This paper presents FIDJI's results in ResPubliQA 2009. FIDJI (Finding In Documents Justifications and Inferences) is an open-domain question-answering system for French. Its main goal is to validate answers by checking that all the information given in the question is retrieved in the supporting texts.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages, Query Languages

General Terms

Measurement, Performance, Experimentation

Keywords

Question answering, Questions beyond factoids

1 Introduction

This paper presents FIDJI's results in ResPubliQA 2009 for French. In this task, systems receive 500 independent questions in natural language as input, and must return, from the document collection, one paragraph containing the answer. Neither an exact answer nor multiple responses are required. The document collection is JRC-Acquis, a corpus of EU documentation.

2 FIDJI

FIDJI (Finding In Documents Justifications and Inferences) is an open-domain question-answering system for French (1). Its main goal is to validate answers by checking that all the information given in the question is retrieved in the supporting texts. Our answer validation approach assumes that the different entities of the question can be retrieved, properly connected, either in a sentence, in a passage or in multiple documents. We designed the system so that no particular linguistic-oriented pre-processing is needed.

The document collection is indexed by the search engine Lucene (2) [2]. First, the system submits the keywords of the question to Lucene; the first 100 documents returned are then processed (syntactic analysis and named entity tagging). Among these documents, FIDJI looks for sentences containing the most syntactic relations of the question. Finally, answers are extracted from these sentences and the answer type, when specified in the question, is validated. Figure 1 presents the architecture of FIDJI; more details can be found in [4, 3]. The next sections summarize the way FIDJI extracts answers and focus on ResPubliQA specificities.

Figure 1: Architecture of FIDJI

(1) This work has been partially financed by OSEO under the Quaero program.
(2) http://lucene.apache.org/java/docs/index.html

2.1 Syntactic analysis

FIDJI has to detect syntactic implications between questions and passages containing the answers. Our system relies on the syntactic analysis provided by XIP, which is used to parse both the questions and the documents from which answers are extracted. XIP [1] is a robust parser for French and English which provides dependency relations and named entity recognition.

The dependency relations provided by XIP which are used by FIDJI are mainly: SUBJ (subject), OBJ (object), PREPOBJ (prepositional group), NMOD (noun modifier), VMOD (verb modifier), COORDITEMS (coordinated elements), CONNECT (connector introducing a clause).

Named entities (NE) are tagged using a set of 8 types: person, organization, location, date (defined by XIP), as well as nationality, number, duration, age (which we added). XIP's lieu (location) type can be made more specific (country, region, continent...). We also added features to allow for more precise types. For example, for number, we added the features length, speed, weight, money and physics, so that "0.55 euro" in "a French stamp costs 0.55 euro" can be tagged as a NE and extracted as an answer to "What is the price of a French stamp?". Other elements are also tagged, such as nouns introducing persons: functions (leader...), professions (minister...), family indications (father...).
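To make the added number features concrete, here is a minimal Python sketch of how such subtyping could work. This is our illustration under assumed patterns, not FIDJI's actual code, which relies on XIP's tagger; the regular expressions and feature names below are assumptions.

    import re

    # Illustrative patterns for refining the 'number' NE type with features
    # (assumed for this sketch; the real system adds features to XIP output).
    NUMBER_FEATURES = {
        "money":  r"\b\d+(?:[.,]\d+)?\s*(?:euros?|EUR|francs?)\b",
        "length": r"\b\d+(?:[.,]\d+)?\s*(?:km|m|cm|mm)\b",
        "weight": r"\b\d+(?:[.,]\d+)?\s*(?:kg|g|tonnes?)\b",
    }

    def tag_numbers(sentence):
        """Return (text span, feature) pairs for refined number entities."""
        tags = []
        for feature, pattern in NUMBER_FEATURES.items():
            for match in re.finditer(pattern, sentence, re.IGNORECASE):
                tags.append((match.group(0), feature))
        return tags

    print(tag_numbers("a French stamp costs 0.55 euro"))
    # -> [('0.55 euro', 'money')], a typed candidate for a price question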
Question analysis consists in identifying:

• The syntactic dependencies given by XIP;
• The keywords submitted to Lucene (words tagged as noun, verb, adjective or adverb by XIP);
• The question type: Factoid (concerning a fact, typically who, when and where questions), Definition (What is...), Boolean (expecting a yes/no answer), List (expecting an answer composed of a list of items) or Complex (why and how questions);
• The expected type(s): NE type and/or (specific) answer type.

The answer to be extracted is represented by a variable (ANSWER) introduced in the dependency relations. The slot noted 'ANSWER' is expected to be instantiated by a word that is an argument of some dependencies of the parsed sentences. This word represents the answer to the question (see Section 2.2). The question type is mainly determined on the basis of the dependency relations given by the parser. For example:

0015 - Entre quels pays a été conclu l'accord-cadre de coopération commerciale et économique du 2 avril 1990 ? (Between which countries was the Framework Agreement for trade and economic cooperation of 2 April 1990 concluded?)

• Syntactic dependencies and NE tagging:
ATTRIBUTADJ(coopération, commercial)
ATTRIBUTADJ(coopération, économique)
ATTRIBUT_DE(accord-cadre, coopération)
VMOD(conclure, ANSWER)
PREPOBJ(ANSWER, entre)
ATTRIBUT(conclure, accord-cadre)
DATE(2 avril 1990)
LIEU[PAYS](ANSWER)
• Question type: list
• Expected type: location (state)

0021 - Comment encourage-t-on la production de graines de vers à soie ? (How is interest in producing silkworm eggs increased?)

• Syntactic dependencies and NE tagging:
ATTRIBUT_DE(graine, vers)
ATTRIBUT_DE(production, graine)
DEEPOBJ(encourager, production)
NMOD(vers, soie)
TOPIC(encourager)
• Question type: complex
• Expected type: ∅
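The outcome of this analysis can be pictured as a small record per question. The sketch below is an illustrative reconstruction for question 0015; the class and field names are ours, not FIDJI's actual identifiers.

    from dataclasses import dataclass

    ANSWER = "ANSWER"  # slot to be instantiated by a word of a passage

    @dataclass
    class QuestionAnalysis:
        dependencies: list    # (relation, governor, dependent) triples from XIP
        keywords: list        # nouns, verbs, adjectives, adverbs, for Lucene
        question_type: str    # factoid / definition / boolean / list / complex
        expected_types: list  # NE type and/or specific answer type, may be empty

    q0015 = QuestionAnalysis(
        dependencies=[("ATTRIBUT_DE", "accord-cadre", "coopération"),
                      ("VMOD", "conclure", ANSWER),
                      ("PREPOBJ", ANSWER, "entre")],
        keywords=["pays", "conclure", "accord-cadre", "coopération",
                  "commercial", "économique"],
        question_type="list",
        expected_types=["location (state)"],
    )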
2.2 Extracting candidate paragraphs

The ResPubliQA answer format is different from that of traditional QA campaigns. First, answers are not focused, short parts of texts, but full paragraphs that must contain the answer. Second, passages are not arbitrary parts of texts of limited length; they must be predefined paragraphs identified in the collection by XML tags.

Although the answers to submit to the campaign are full paragraphs, our system is designed to hunt down short answers. For most questions, typically factoid questions, it is still relevant to find short answers, and then to return a paragraph containing the best answer. This is not the case for 'how' or 'why' questions, where no short answer may be retrieved.

FIDJI usually works at the sentence level. To follow the specific rules of ResPubliQA, we chose to work at the paragraph level. This consisted in specifying that sentence separators were the paragraph XML tags of the collection, rather than the usual end-of-sentence markers.
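A minimal sketch of this adaptation, assuming paragraphs are marked with <p> tags (the actual tag names and attributes in JRC-Acquis may differ):

    import re

    def paragraphs(document_xml):
        """Split a document into its predefined paragraphs, using the XML
        paragraph tags as 'sentence' separators instead of the usual
        end-of-sentence punctuation."""
        # Assumption: paragraphs are enclosed in <p ...> ... </p> tags.
        return [m.group(1).strip()
                for m in re.finditer(r"<p[^>]*>(.*?)</p>", document_xml,
                                     re.DOTALL)]

    doc = '<p n="1">Whereas ... ;</p><p n="2">This Directive shall apply ...</p>'
    for number, paragraph in enumerate(paragraphs(doc), start=1):
        print(number, paragraph)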
Once candidate documents are selected by the search engine and analyzed by the parser, the system compares the document paragraphs with the question analysis, in order to:

• Extract candidate answers or select a relevant paragraph;
• Give a score to each answer, so that final answers can be ranked.

2.2.1 Factoid questions

Within selected documents, candidate paragraphs are those containing the most dependencies from the question. Once these paragraphs are selected, two cases can occur:

1. Question dependencies with an 'ANSWER' slot are found in the sentence. In this case, the lemma instantiating this slot is the head of the answer. The full answer is composed of the head and its basic modifiers (for a noun phrase: noun complements, adjectives, determiners and coordinated elements; for a verbal phrase: verb complements, subject and object). The NE type and answer type of this answer, if any, are checked. The answer type can be validated by different syntactic relations in the text: definition ("The French Prime Minister, Pierre Bérégovoy"), attributNN ("Pierre Bérégovoy is the French Prime Minister"), and sometimes attribut_de ("la maladie de Parkinson", Parkinson's disease, literally "the disease of Parkinson").

2. The 'ANSWER' slot does not unify with any word of the passage. In this case, the elements having an appropriate NE type and/or answer type are selected in the sentence. This is done in order to counterbalance the many parsing errors (or paraphrases): often, the sentence contains the answer but syntactic dependencies alone do not lead to it.

If no possible short answer is found, the paragraph is still considered as a candidate answer. But in any case, a paragraph containing an extracted short answer is preferred if it exists.

Example 1.

0015 - Entre quels pays a été conclu l'accord-cadre de coopération commerciale et économique du 2 avril 1990 ? (Between which countries was the Framework Agreement for trade and economic cooperation of 2 April 1990 concluded?)

• Syntactic dependencies and NE tagging:
ATTRIBUTADJ(coopération, commercial)
ATTRIBUTADJ(coopération, économique)
ATTRIBUT_DE(accord-cadre, coopération)
ATTRIBUT(conclure, accord-cadre)
VMOD(conclure, ANSWER)
PREPOBJ(ANSWER, entre)
DATE[DATEABS](2 avril 1990)
LIEU[PAYS](ANSWER)
• Question type: list
• Expected type: location (state)

The following passage is selected because it contains the dependencies of the question:

Passage: un accord-cadre de coopération commerciale et économique entre la Communauté économique européenne et la République argentine (3) a été conclu le 2 avril 1990 ; (Considering the Framework Agreement for trade and economic cooperation between the European Economic Community and the Argentine Republic of 2 April 1990;)

ATTRIBUTADJ(coopération, commercial)
ATTRIBUTADJ(coopération, économique)
ATTRIBUT_DE(accord-cadre, coopération)
ATTRIBUT(conclure, accord-cadre)
NMOD(coopération, communauté économique européen)
PREPOBJ(communauté économique européen, entre)
COORDITEMS(communauté économique européen, république argentin)
LIEU[PAYS](république argentin)
DATE(2 avril 1990)
ORG(communauté économique européen)

The slot 'ANSWER' is instantiated by communauté économique européenne. As the question type is 'list', the elements of the list have to be found in a 'COORDITEMS' dependency: so, the answers are communauté économique européenne and république argentine. Finally, the expected answer type is validated: the selected answer is tagged as a location (state).
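The matching at work in Example 1 can be sketched as follows. This is a deliberately simplified illustration in which dependencies are plain triples; FIDJI's actual unification over XIP output is richer, and the function below is our assumption about its behavior.

    ANSWER = "ANSWER"

    def instantiate(question_deps, passage_deps):
        """Unify question dependencies containing the ANSWER slot with
        passage dependencies sharing the same relation and other argument;
        return the words that can fill the slot."""
        fillers = set()
        for rel_q, gov_q, dep_q in question_deps:
            for rel_p, gov_p, dep_p in passage_deps:
                if rel_q != rel_p:
                    continue
                if gov_q == ANSWER and dep_q == dep_p:
                    fillers.add(gov_p)
                elif dep_q == ANSWER and gov_q == gov_p:
                    fillers.add(dep_p)
        return fillers

    # Simplified dependencies from question 0015 and the selected passage:
    question = [("VMOD", "conclure", ANSWER), ("PREPOBJ", ANSWER, "entre")]
    passage = [("VMOD", "conclure", "communauté économique européen"),
               ("PREPOBJ", "communauté économique européen", "entre"),
               ("COORDITEMS", "communauté économique européen",
                "république argentin")]

    heads = instantiate(question, passage)
    # For a 'list' question, expand each filler with its coordinated items:
    answers = set(heads)
    for rel, gov, dep in passage:
        if rel == "COORDITEMS" and gov in heads:
            answers.add(dep)
    print(answers)  # both coordinated country names are returned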
Example 2.

0026 - Quel est le nom de la monnaie des états membres depuis le 1er janvier 1999 ? (What is the name of the Member States' currency since 1 January 1999?)

• Syntactic dependencies and NE tagging:
ATTRIBUT_DE(monnaie, état)
NMOD(état, membre)
PREPOBJ(1er janvier 1999, depuis)
DEFINITION(ANSWER, monnaie)
DATE(1er janvier 1999)
• Question type: definition
• Expected type: ∅

The following passage is selected because it contains all the dependencies of the question:

Passage: considérant que le règlement (CE) n 974/98 du Conseil du 3 mai 1998 concernant l'introduction de l'euro (3) prévoit à son article 2 que, à compter du 1er janvier 1999, la monnaie des États membres participants est l'euro ; (Whereas Council Regulation (EC) No 974/98 of 3 May 1998 on the introduction of the euro (3) provides in Article 2 that from 1 January 1999 the currency of the participating Member States shall be the euro;)

ATTRIBUTADJ(membre, participant)
ATTRIBUT_DE(monnaie, état)
NMOD(état, membre)
PREPOBJ(1er janvier 1999, à compter de)
DEFINITION(euro, monnaie)
DATE(1er janvier 1999)
...

and the slot 'ANSWER' is instantiated by euro.

2.2.2 Complex questions

Complex questions ('how', 'why', etc.) do not expect any short answer. On these kinds of questions, the system behaves more like a passage retrieval system. The paragraphs containing the most syntactic dependencies in common with the question are selected. Among them, the best-ranked is the one that is returned first by Lucene. For example:

0155 - Pourquoi convient-il de revoir l'architecture du réseau Animo ? (Why should the structure of the ANIMO network be revised?)

• Syntactic dependencies and NE tagging:
VMOD(convenir, revoir)
DEEPOBJ(revoir, architecture)
ATTRIBUT_DE(architecture, réseau)
NMOD(réseau, animo)
• Question type: complex (why)
• Expected type: ∅

The following passage is selected because all the dependencies of the question are found in the passage:

Passage: considérant que, à la suite de différents travaux effectués dans le cadre communautaire, notamment lors d'études et de séminaires, il convient de revoir l'architecture du réseau Animo afin de procéder à la mise en place d'un système vétérinaire intégrant les différentes applications informatisées ; (Whereas, as a result of the work carried out at Community level in the course of studies and seminars, the structure of the ANIMO network should be revised so that a veterinary system integrating the various computer applications can be introduced;)

DEEPSUBJ(convenir, il)
VMOD(convenir, revoir)
DEEPOBJ(revoir, architecture)
ATTRIBUT_DE(architecture, réseau)
NMOD(réseau, animo)
PREPOBJ(procéder, afin de)
VMOD(procéder, mise)
PREPOBJ(mise, à)
NMOD(mise, place)
...

2.3 Scoring

FIDJI's scores are not composed of a single value, but of a list of different values and flags. The criteria are listed below, in decreasing order of importance:

• As said above, a paragraph containing an extracted short answer is preferred if it exists.
• Named entity value (appropriate NE value or not; for factoid questions only).
• Keyword rate (between 0 and 1, the rate of major question keywords present in the passage: proper names, answer type and numbers).
• Answer type value (appropriate answer type or not; for factoid questions only).
• Frequency weighting (number of extracted occurrences of this answer; for factoid questions only).
• Document ranking (best rank of a document containing the answer, as returned by the search engine; in this case, the lower the better).
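Since these criteria are applied in decreasing order of importance, the ranking can be read as a lexicographic comparison of score tuples. The sketch below illustrates that reading; the attribute names are ours, and the real system may combine the values differently.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        has_short_answer: bool  # paragraph contains an extracted short answer
        ne_type_ok: bool        # appropriate NE value (factoid questions only)
        keyword_rate: float     # rate of major question keywords present (0..1)
        answer_type_ok: bool    # appropriate answer type (factoid only)
        frequency: int          # extracted occurrences of this answer (factoid only)
        doc_rank: int           # best search-engine rank (lower is better)
        paragraph: str

    def score_key(c):
        # Tuples compare lexicographically, i.e. in decreasing order of
        # importance; doc_rank is negated so that a lower rank wins.
        return (c.has_short_answer, c.ne_type_ok, c.keyword_rate,
                c.answer_type_ok, c.frequency, -c.doc_rank)

    candidates = [Candidate(True, True, 0.8, True, 2, 3, "paragraph A"),
                  Candidate(True, True, 0.8, True, 2, 1, "paragraph B")]
    print(max(candidates, key=score_key).paragraph)
    # paragraph B wins: equal on all criteria, better document rank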
3 Results

Table 1 presents the results by question type. Only one answer per question was allowed, so the values simply correspond to the rate of correct answers for each question type.

Question type   Number of questions   Correct answers
Factoid         116                   36.2 %
Definition      101                   15.8 %
List            37                    16.2 %
"How"           76                    22.4 %
"Why"           170                   40 %
TOTAL           500                   30.4 %

Table 1: FIDJI results by question type.

Results are lower than our scores in former campaigns, especially concerning factoid and definition questions. Looking carefully at the results shows that, in these particular documents, using syntactic dependencies as the main clue for choosing candidate paragraphs is not always a good way to find a relevant passage. This is especially true for complex questions, but not only for them. Indeed, selecting the paragraph containing the most question dependencies often leads to the introduction of the document or to a very general paragraph containing poor information. For example, question 0006 - What is the scope of the council directive on the trading of fodder seeds? is answered by
COUNCIL DIRECTIVE of 14 June 1966 on the marketing of fodder plant seed (66/401/EEC)
containing many dependencies but answering nothing, while a good answer appeared later in the same document, but introduced by an anaphora:

This Directive shall apply to fodder plant seed marketed within the Community, irrespective of the use for which the seed as grown is intended.
Dependency relations are thus still useful to find the right document, but often fail to point to the correct paragraph. Also, the JRC-Acquis corpus uses a different register of language than usual corpora such as the Web or newspapers. Both question and document analyses suffered from the specific expressions and structures used in these French texts, especially for definitions. Definitions, quite easy to detect in newspaper corpora, were poorly recognized in this evaluation.

4 Conclusion

We presented in this article our participation in the ResPubliQA 2009 campaign for French. We adapted our syntax-based QA system FIDJI in order to produce a single long answer in the form of JRC-Acquis tagged paragraphs. Results showed that syntactic analysis should be used in different manners according to the type of tasks and questions. A careful look at our system's errors should enable us to improve the robustness of the search by applying contextual strategies.

References

[1] Salah Aït-Mokhtar and Jean-Pierre Chanod. Incremental finite-state parsing. In Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 72-79, Washington, DC, USA, 1997. Morgan Kaufmann Publishers Inc., San Francisco, California, USA.

[2] Erik Hatcher and Otis Gospodnetić. Lucene in Action. Manning, 2004.

[3] Véronique Moriceau and Xavier Tannier. Étude de l'apport de la syntaxe dans un système de question-réponse. In Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2009, poster), Senlis, France, June 2009.

[4] Véronique Moriceau, Xavier Tannier, and Brigitte Grau. Utilisation de la syntaxe pour valider les réponses à des questions par plusieurs documents. In Proceedings of the Conférence en Recherche d'Information et Applications (CORIA), Presqu'île de Giens, France, 2009.