<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>G. Bouma, I. Fahmi, J. Mur, G. van Noord, L. van der Plas, and J. Tiedemann. Linguistic
knowledge and question answering. Traitement automatique des langues</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <pub-date>
        <year>2008</year>
      </pub-date>
      <volume>46</volume>
      <issue>2007</issue>
      <abstract>
        <p>Two strategies for validating answers are presented: • A syntax-based strategy, where the system decides whether the supporting text is a reformulation of the question. • A machine learning strategy, where several features are combined in order to validate answers: presence of common words in the question and in the text, word distance, etc. Most question-answering (QA) systems can extract the answer to a factoid question when it is explicitly present in texts, but in the opposite case, they are not able to combine different pieces of information to produce an answer. FIDJI stands for Finding In Documents Justifications and Inferences.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>FIDJI (Finding In Documents Justifications and Inferences), an open-domain QA system for French, aims at going beyond this insufficiency and
focuses on introducing text understanding mechanisms relying on inferences. FIDJI uses syntactic
information, especially dependency relations. The goal is to match the dependency relations
derived from the question and those of the potential answer, as in [8]. Figure 1 presents the
architecture of FIDJI and its adjustments for the AVE task. The system is at its beginning and
should evolve a lot in the future.</p>
      <p>Figure 1: Architecture of FIDJI and answer validation system</p>
      <p>2.1 Processing of supporting texts</p>
      <p>2.1.1 Syntactic analysis</p>
      <p>Our system relies on syntactic analysis provided by Syntex [3], a dependency parser for French.
Syntex is used to parse questions as well as the document collection from which answers are
extracted. Syntex outputs are here given in an easily readable format. The named entities of
documents are also tagged. For the AVE task, supporting texts are considered as documents from
which answers have to be extracted.</p>
      <p>To apply our system to the AVE competition, all supporting texts are syntactically parsed. The
approach is to detect, for a given question (Q)/answer (Aave)/supporting text (T) tuple, if all the
characteristics of the question Q can be retrieved in the text T. Then, the answer proposed by
our system (Afidji) is compared to Aave: if Aave=Afidji, the answer is validated and justified by
T. To determine if the characteristics of the question Q can be retrieved in text T, FIDJI detects
syntactic implications between Q and T. There are mainly two cases:</p>
      <p>1. There is an exact matching between syntactic dependencies of Q and T: the NP which unifies
with the variable of the question representing the answer is extracted:</p>
      <p>Example:
Q141: Qui est Lionel Mathis ? (Who is Lionel Mathis?)
attribut(ANSWER, Mathis)
NNPR(Mathis, Lionel) (proper noun relation)
Text: Lionel Mathis est un footballeur français né le 4 octobre 1981 à
Montreuil-sous-Bois (France) (Lionel Mathis is a French footballer born...)
attribut(footballeur, Mathis)
NNPR(Mathis, Lionel)
attribut(footballeur, français)
attribut(footballeur, né)</p>
      <p>attribut_de(tremblement, terre)
attribut_de(nord, région)
AUX(être, secouer) ⇒
modif_par(secouer, tremblement)
DATE( , 17 janvier)
SUJ(secouer, nord)
attribut_de(région, los angeles)</p>
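      <p>The exact-matching case illustrated with question Q141 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FIDJI's implementation: dependencies are modelled as (relation, head, dependent) triples, the literal string ANSWER stands for the question variable, and the function name unify_answer is hypothetical.</p>
      <preformat>
```python
from typing import List, Optional, Tuple

# Assumption: a dependency is a (relation, head, dependent) triple.
Dep = Tuple[str, str, str]
ANSWER = "ANSWER"  # assumed marker for the question's answer variable


def unify_answer(question_deps: List[Dep], text_deps: List[Dep]) -> Optional[str]:
    """Return the text word that unifies with ANSWER, or None if matching fails."""
    text_set = set(text_deps)
    candidates = None  # None means no dependency with the ANSWER slot was seen yet
    for rel, head, dep in question_deps:
        if head != ANSWER and dep != ANSWER:
            # Plain dependency: it must be found as-is in the text.
            if (rel, head, dep) not in text_set:
                return None
            continue
        # Dependency holding the ANSWER slot: collect the words that could fill it.
        found = []
        for t_rel, t_head, t_dep in text_deps:
            if t_rel == rel and head == ANSWER and t_dep == dep:
                found.append(t_head)
            elif t_rel == rel and dep == ANSWER and t_head == head:
                found.append(t_dep)
        if candidates is None:
            candidates = found
        else:
            candidates = [word for word in candidates if word in found]
    if not candidates:
        return None
    return candidates[0]


# Dependencies copied from the Q141 example of the paper.
question = [("attribut", ANSWER, "Mathis"), ("NNPR", "Mathis", "Lionel")]
text = [
    ("attribut", "footballeur", "Mathis"),
    ("NNPR", "Mathis", "Lionel"),
    ("attribut", "footballeur", "français"),
    ("attribut", "footballeur", "né"),
]
print(unify_answer(question, text))
```
      </preformat>
      <p>On the Q141 dependencies the ANSWER slot is filled by the head of the attribut relation whose dependent is Mathis. When several words could fill the slot, this sketch simply keeps the first one in text order, which is an arbitrary choice.</p>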
    </sec>
    <sec id="sec-2">
      <p>3. The rate of missing dependencies was under a given threshold. This threshold has been
experimentally set to 30% by testing different configurations on the AVE 2006 and AVE 2007
collections.</p>
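      <p>The validation conditions listed in this part of the paper (agreement with the answer suggested by FIDJI, proper NE type, rate of missing dependencies under the 30% threshold) can be combined as in the sketch below. It assumes that all three conditions must hold together and that a question dependency counts as missing when it has no identical counterpart in the supporting text; the function names and the triple representation are hypothetical.</p>
      <preformat>
```python
from typing import List, Tuple

Dep = Tuple[str, str, str]  # assumed (relation, head, dependent) representation


def missing_rate(question_deps: List[Dep], text_deps: List[Dep]) -> float:
    """Share of question dependencies that are not found in the supporting text."""
    if not question_deps:
        return 0.0
    text_set = set(text_deps)
    missing = sum(1 for d in question_deps if d not in text_set)
    return missing / len(question_deps)


def validate(ave_answer: str, fidji_answer: str, ne_type_ok: bool,
             question_deps: List[Dep], text_deps: List[Dep],
             threshold: float = 0.30) -> bool:
    """Assumed conjunction of the three heuristics; 0.30 is the threshold given in the paper."""
    rate = missing_rate(question_deps, text_deps)
    return ave_answer == fidji_answer and ne_type_ok and threshold > rate


# Toy dependencies, invented for illustration only.
q_deps = [("r1", "a", "b"), ("r2", "b", "c"), ("r3", "c", "d"), ("r4", "d", "e")]
t_deps = [("r1", "a", "b"), ("r2", "b", "c"), ("r3", "c", "d")]
print(missing_rate(q_deps, t_deps))
print(validate("x", "x", True, q_deps, t_deps))
```
      </preformat>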
    </sec>
    <sec id="sec-3">
      <title>2. The NE type was the proper one,</title>
      <p>1. It was also an answer suggested by FIDJI,</p>
    </sec>
    <sec id="sec-4">
      <title>2.3 Answer validation for AVE: heuristics</title>
      <p>At the current state of our system, a few heuristics are used to validate an answer. The different
modules described above provide information concerning:</p>
      <p>• If the slot for the answer in the question dependencies is unified in the candidate sentence, then
the corresponding word is extracted (see section 2.1.1).</p>
      <p>• If not, named entities having the expected type (if existing) are selected in the sentence and
the sentence before.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1 Common terms</title>
      <p>3 FRASQUES as an entry of a machine learning system
Results are presented and discussed in section 4.
3.2 Answer verification</p>
    </sec>
    <sec id="sec-6">
      <title>The next sections present the different features.</title>
      <p>The second system follows a machine learning approach and applies the question-answering system
FRASQUES [6] in order to compute some of the learning features. The learning set is extracted
from the data provided by AVE 2006 and contains 75% of the total data.</p>
      <p>The chosen classifier is a combination of decision trees with the bagging method. It is provided
by the WEKA program, which allows testing a lot of classifiers.</p>
      <p>The specific features based on the vocabulary are presented in [1], while [5] shows and evaluates
these features and presents the machine learning method.</p>
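      <p>The classifier used in the paper comes from WEKA; the sketch below is neither WEKA nor the authors' configuration. It is a pure-Python stand-in that shows the bagging principle (bootstrap samples, one tree per sample, majority vote), with one-level decision trees (stumps) in place of full decision trees. All function names and the toy feature vectors are hypothetical.</p>
      <preformat>
```python
import random
from collections import Counter


def train_stump(rows, labels):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if threshold >= row[f]]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or best[0] > errors:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if threshold >= row[f] else right_label


def train_bagging(rows, labels, n_trees=15, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_bagging(forest, row):
    """Majority vote over the stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration: [common-chain ratio, Wikipedia type feature].
rows = [[0.9, 1], [0.8, 1], [0.85, 0], [0.7, 1],
        [0.2, 0], [0.1, 0], [0.3, 1], [0.15, 0]]
labels = ["YES", "YES", "YES", "YES", "NO", "NO", "NO", "NO"]
forest = train_bagging(rows, labels)
print(predict_bagging(forest, [0.95, 1]), predict_bagging(forest, [0.05, 0]))
```
      </preformat>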
    </sec>
    <sec id="sec-7">
      <p>• Focus: The focus is the entity about which the question is asked; either a characteristic
or a definition of this entity has to be searched. In "Which is the political party of Lionel
Jospin?", the focus is Lionel Jospin.</p>
      <p>• Answer type: When the specific answer type is explicit in a question and recognized in
a passage, it allows the system to check that the proposed answer fits the expected type
by applying some syntactic rules. In the previous question, the expected type is political
party.</p>
      <p>1. In order to facilitate the comparison between the text and the hypothesis, the words are
normalized (lemmatization and grouping of synonyms).</p>
      <p>4. The longest chain is selected and the value of the feature is the ratio between the number
of words in the chain and the number of words in the hypothesis.</p>
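      <p>The longest-common-chain feature can be sketched as follows. This is a minimal illustration, not the FRASQUES code: normalization is reduced to lowercasing and punctuation stripping (the paper uses lemmatization and synonym grouping), the chain is computed as the longest run of adjacent words shared by the two token sequences, and the function names are hypothetical.</p>
      <preformat>
```python
from typing import List


def normalize(sentence: str) -> List[str]:
    """Crude stand-in for lemmatization and synonym grouping: lowercase, strip punctuation."""
    words = [w.strip(".,;:?!()\"'") for w in sentence.lower().split()]
    return [w for w in words if w]


def longest_common_chain(a: List[str], b: List[str]) -> List[str]:
    """Longest run of adjacent words that appears in both token lists."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at the previous word of both lists.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def chain_feature(text: str, hypothesis: str) -> float:
    """Ratio between the chain length and the number of words in the hypothesis."""
    t, h = normalize(text), normalize(hypothesis)
    if not h:
        return 0.0
    return len(longest_common_chain(t, h)) / len(h)


# Sentences invented for illustration.
text = "Lionel Mathis is a French footballer born in 1981."
hypothesis = "Lionel Mathis is a footballer"
print(longest_common_chain(normalize(text), normalize(hypothesis)))
print(chain_feature(text, hypothesis))
```
      </preformat>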
    </sec>
    <sec id="sec-8">
      <title>3.4 Checking the answer type with Wikipedia</title>
      <p>3.3 Longest common chain of words
3.5 FIDJI features
Some features coming from FIDJI are also added:
4 Results and comments</p>
    </sec>
    <sec id="sec-9">
      <p>2. The algorithm looks for the longest groups of adjacent words common to the question and
the hypothesis.</p>
    </sec>
    <sec id="sec-10">
      <p>A lot of questions expect an answer of a specified type. For example, the question "What sport did
Zinédine Zidane practice?" expects a kind of sport as answer. To verify the type of the answer,
we use the encyclopaedia Wikipedia.
The hypothesis is that if the type can be found in the Wikipedia page corresponding to the
answer, we consider that the answer corresponds to the expected type.</p>
      <p>The method looks for the type in the Wikipedia page whose title contains the answer. If the
type is found, the value of the feature is 1, else it is 0. For questions without expected type, the
value is -1.</p>
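      <p>The three-valued feature described here can be written as a small function. The sketch assumes that the text of the Wikipedia page has already been retrieved by the caller (page lookup by title is not shown) and that a case-insensitive substring test is enough to say the type "is found"; the function name and the sample page strings are hypothetical.</p>
      <preformat>
```python
from typing import Optional


def wikipedia_type_feature(expected_type: Optional[str], page_text: Optional[str]) -> int:
    """Return 1 if the expected type occurs in the page, 0 if not, -1 if no type is expected."""
    if not expected_type:
        return -1  # question without expected type
    if page_text is not None and expected_type.lower() in page_text.lower():
        return 1
    return 0  # type not found, or no page available for the answer


# Page contents below are made-up strings, not real Wikipedia text.
print(wikipedia_type_feature("sport", "Football is a team sport played with a ball."))
print(wikipedia_type_feature("sport", "A city in the north of France."))
print(wikipedia_type_feature(None, "Any page text."))
```
      </preformat>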
    </sec>
    <sec id="sec-11">
      <title>Table 3: Baseline for French AVE (answering YES to each pair)</title>
      <p>Precision over YES pairs 0.29
Recall over YES pairs 1
F measure 0.45</p>
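      <p>The F measures reported in the result tables are consistent with the usual harmonic mean of precision and recall over YES pairs. The check below recomputes them; the triples are read from the tables of this paper and grouped by the only pairing of values that is arithmetically consistent. The function name f_measure is an illustrative choice.</p>
      <preformat>
```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) as printed in the paper's tables.
reported = [
    (0.29, 1.0, 0.45),   # baseline: YES for each pair
    (0.67, 0.60, 0.63),
    (0.75, 0.52, 0.61),
    (0.88, 0.42, 0.57),
]
recomputed = [round(f_measure(p, r), 2) for p, r, _ in reported]
for (p, r, f), value in zip(reported, recomputed):
    print(p, r, "reported:", f, "recomputed:", value)
```
      </preformat>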
    </sec>
    <sec id="sec-12">
      <title>Precision over YES pairs 0.67</title>
      <p>F measure 0.63
Recall over YES pairs 0.60
qa accuracy 0.23
estimated_qa_performance 0.32
b. Results for ML run with FIDJI
Recall over YES pairs 0.52
F measure 0.61
Precision over YES pairs 0.75
qa accuracy 0.19
Recall over YES pairs 0.42
a. Results for FIDJI alone run
estimated_qa_performance 0.29
F measure 0.57
Precision over YES pairs 0.88
      </p>
      <p>Table 2 shows the results obtained when we do not include characteristics coming from the
FIDJI system. Finally, a baseline for French AVE, provided by the organizers and presented in
Table 3, corresponds to the strategy consisting in answering YES to each pair.</p>
      <p>Now, the question is to know the significance of this test set. This year, the French test set
is made of 199 triples, built from 108 different questions. There are 1.8 triples on average for each
question: 39 questions have 1 answer to justify, 47 questions have 2 answers, and 22 questions
have 3 proposed answers. The ratio of validated pairs is 29%, that is to say that only 52 triples
are correct. These are the results provided by one participant to the monolingual French QA task
and the bilingual tasks with French as target. Thus, the answers result from a single system.
If we compare this test set to the test set provided for French in 2006, there were 5 different
systems that have given 3200 answers to 190 questions: among them, 627 answers were justified.
So, the current test set could only measure the ability of an AVE system to evaluate the results
of one QA system, which is the best system in this language, but cannot allow to measure its
ability in a general exercise of answer validation. Moreover, are the results really significant when
they are calculated over a total of 50? One answer is equivalent to 2 points.</p>
      <p>The goal of an evaluation campaign is generally twofold: to provide resources allowing to
develop systems able to solve a task, and to compare the different approaches developed for this
task. In order to tend towards these goals, the French AVE test set could not be made only of the
results of the current QA tracks. It ought to be completed so that the number of examples will
be significant and the phenomena to treat representative of a task, and not of a system.</p>
      <p>Another important point concerns the definition of what a justification of an answer to a
question is. Which pieces of information must the passage contain? In AVE, it seems that if the correct
answer is in the passage, it is validated, even if the topic is only present with an anaphora, as in
the following pair:</p>
      <p>Q: Combien la ville de Colombo comptait-elle d’habitants en 2001? (How many inhabitants
are there in Colombo in 2001?)
A: 377 396
J: La ville compte 377 396 habitants en 2001 pour 2 234 289 dans l’agglomération et c’est la ville
la plus peuplée du Sri Lanka, ainsi que le cœur de l’activité commerciale de ce pays. (The town has
377 396 inhabitants in 2001 ...)
      </p>
      <p>We have presented in this paper two strategies for deciding if an answer to a question is justified
by a given extract of text.
The first is based on a syntactic approach in order to verify that not only the vocabulary is
similar between the question and the passage, but also that this vocabulary is used in the same
meaning. This is done by verifying the similarity of the relations between the corresponding terms.
Some heuristics are then chosen for deciding if a passage justifies or not an answer.
Such an approach performs well in terms of precision, but the recall remains low,
because of errors made by the syntactic parser, as in all approaches of this kind. So, we also
tested another approach consisting in deciding if a passage is a justification or not according to a
set of features. The decision is the result of a classifier, automatically trained. This last approach
was developed last year, and we have added this year a new feature based on the dependency
relations. We have to test our results on other corpora in order to validate the gain that this
feature seems to bring out.</p>
      <p>[4] F. Elkateb. Extraction d’entités nommées pour la recherche d’informations précise. In 4ème
congrès ISKO-France, L’organisation des connaissances, Grenoble, 2003.
[7] C. Jacquemin. A symbolic and surgical acquisition of terms through variation. In
Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing,
Heidelberg, 1996.
[8] B. Katz and J. Lin. Selectively using relations to improve precision in question answering.
In Proceedings of workshop on Natural Language Processing for Question Answering, EACL,
Budapest, 1999.</p>
    </sec>
    <sec id="sec-13">
      <title>5 Conclusion</title>
      <p>References</p>
    </sec>
    <sec id="sec-14">
      <p>[3] D. Bourigault and C. Fabre. Approche linguistique pour l’analyse syntaxique de corpus.
Cahiers de Grammaire, 25, 2000.</p>
    </sec>
    <sec id="sec-15">
      <p>Without reading the document that contains this passage, it is not possible to assert that La
ville (the town) is Colombo, even if the name of the document is COLOMBO. The name
of a Wikipedia page alone does not allow verifying the reference of the anaphora.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>