<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Spoken Language Processing Group, LIMSI-CNRS, B.P. 133, 91403 Orsay, France</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <pub-date>
        <year>2006</year>
      </pub-date>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2 Analysis of documents and queries</title>
      <p>In the QAst evaluation, four data types are of interest: manually and automatically transcribed lectures, and manually and automatically transcribed meetings.</p>
      <p>2.1 Normalization</p>
      <p>Documents and queries are normalized in four steps:
1. Separating words and numbers from punctuation.
2. Reconstructing correct case for the words.
3. Adding punctuation.
4. Splitting into sentences at period marks.</p>
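      <p>A minimal sketch of these four steps in Python, with toy rules standing in for the system's actual models (the proper-noun list and the final-period heuristic below are illustrative assumptions):</p>
      <preformat>
import re

def normalize(utterance, known_proper_nouns=frozenset({"France", "NIST"})):
    """Toy four-step normalization of a transcribed utterance."""
    # 1. Separate words and numbers from punctuation.
    tokens = re.sub(r"([.,!?;:])", r" \1 ", utterance).split()
    # 2. Reconstruct correct case (here: a simple proper-noun lookup).
    tokens = [t.capitalize() if t.capitalize() in known_proper_nouns else t
              for t in tokens]
    # 3. Add punctuation (here: only a sentence-final period).
    if tokens and tokens[-1] not in {".", "!", "?"}:
        tokens.append(".")
    # 4. Split into sentences at period marks.
    sentences, current = [], []
    for t in tokens:
        current.append(t)
        if t == ".":
            sentences.append(" ".join(current))
            current = []
    return sentences

print(normalize("the capital of france is paris"))
# ['the capital of France is paris .']
      </preformat>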
    </sec>
    <sec id="sec-2">
      <title>2.2 Non contextual analysis module</title>
      <p>The analysis is considered non-contextual because each sentence is processed in isolation. The general objective of this analysis is to find the bits of information that may be of use for search and extraction, which we call pertinent information chunks. These can be of different categories: named entities, linguistic entities (e.g. verbs, prepositions), or specific entities (e.g. scores). All words that do not fall into such chunks are automatically grouped into chunks via a longest-match strategy. Some examples of pertinent information chunks are given in Figure 1. In the following sections, the types of entities handled by the system are described, along with how they are recognized.</p>
      <p>[Figure 1: Examples of pertinent information chunks from the CHIL data collection, covering question markers and linguistic chunks.]</p>
      <p>2.2.1 Definition of Entities</p>
      <p>Following commonly adopted definitions, the named entities are expressions that denote locations, people, companies, times, and monetary amounts. These entities have commonly known and accepted names. For example, if the country France is a named entity, the capital of France is not a named entity. However, our experience is that the information present in the named entities is not sufficient to analyze the wide range of user utterances that can be found in lecture or meeting transcripts. Therefore we defined a set of specific entities in order to collect all observed information expressions contained in a corpus of questions and texts from a variety of sources (proceedings, transcripts of lectures, dialogs, etc.). Figure 2 summarizes the different entity types that are used.</p>
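      <p>The longest-match grouping can be sketched as follows; the lexicon contents and the generic "chunk" label are illustrative assumptions, not the system's actual entity rules:</p>
      <preformat>
def chunk_longest_match(tokens, lexicon):
    """Greedily group tokens into the longest chunk found in the lexicon;
    uncovered tokens become single-word chunks of a generic type."""
    chunks, i = [], 0
    while i != len(tokens):
        # Try the longest span starting at position i first.
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j])
            if span in lexicon or j == i + 1:
                chunks.append((lexicon.get(span, "chunk"), span))
                i = j
                break
    return chunks

# Hypothetical lexicon mapping surface forms to entity types.
lexicon = {"european commission": "org", "error rates": "score"}
print(chunk_longest_match("the european commission reported error rates".split(),
                          lexicon))
# [('chunk', 'the'), ('org', 'european commission'),
#  ('chunk', 'reported'), ('score', 'error rates')]
      </preformat>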
    </sec>
    <sec id="sec-3">
      <title>2.2.2 Automatic detection of typed entities</title>
      <p>The types we need to detect correspond to two levels of analysis: named-entity recognition and chunk-based shallow parsing. Various strategies for named-entity recognition using machine learning techniques have been proposed [12, 13, 14]. In these approaches, a statistically pertinent coverage of all defined types and subtypes induces the need for a large number of occurrences; they therefore rely on the availability of large annotated corpora, which are difficult to build. Rule-based approaches to named-entity recognition (e.g. [15]) rely on morphosyntactic and/or syntactic analysis of the documents. However, in the present work, performing this sort of analysis is not feasible: the speech transcriptions are too noisy to allow for both accurate and robust linguistic analysis based on typical rules, and the processing time of most existing linguistic analyzers is not compatible with the high speed we require.</p>
      <p>We decided to tackle the problem with rules based on regular expressions on words, as in other works [16]: we allow the use of lists for initial detection, and the definition of local contexts and simple categorizations. The tool used to implement the rule-based automatic annotation system is called Wmatch. This engine matches (and substitutes) regular expressions using words as the base unit instead of characters. This property allows for a more readable syntax than traditional regular expressions and enables the use of classes (lists of words) and macros (sub-expressions in-line in a larger expression). Wmatch also includes NLP-oriented features such as strategies for prioritizing rule application, recursive substitution modes, word tagging (for tags like noun, verb...), and word categories (number, acronym, proper name...). It has multiple input and output formats, including an XML-based one for interoperability and to allow chaining of instances of the tool with different rule sets. Rules are pre-analyzed and optimized in several ways, and stored in a compact format in order to speed up the process. Analysis is multi-pass, and subsequent rule applications operate on the results of previous rule applications, which can be enriched or modified. The full analysis comprises some 50 steps and takes roughly 4 ms on a typical user utterance (or document sentence).</p>
      <p>The analysis provides 96 different types of entities. Figure 3 shows an example of the analysis on a query (top) and on a transcription (bottom).</p>
      <p>[Figure 3: Example analysis of a query and of a transcription, e.g. "_prep in _org NIST _NN metadata evaluations _verb reported _NN speaker tracking _score error rates _aux are _prep about _val_score 15 %".]</p>
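      <p>Word-based (rather than character-based) pattern matching with word classes can be emulated in a few lines of Python. This is only a sketch of the idea behind Wmatch; it does not reproduce its actual syntax, macros, substitution modes or multi-pass machinery:</p>
      <preformat>
import re

# Word classes (lists of words), standing in for Wmatch classes.
CLASSES = {"ORG": ["NIST", "NATO", "LIMSI"]}

def word_rule(pattern):
    """Compile a word-based pattern such as '@ORG \\w+ evaluations' into a
    character-level regex, expanding @CLASS names into alternations."""
    parts = []
    for word in pattern.split():
        if word.startswith("@"):
            parts.append("(?:%s)" % "|".join(map(re.escape, CLASSES[word[1:]])))
        else:
            parts.append(word)
    return re.compile(r"\s+".join(parts))

rule = word_rule(r"@ORG \w+ evaluations")
match = rule.search("in NIST metadata evaluations reported speaker tracking")
print(match.group(0))  # NIST metadata evaluations
      </preformat>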
    </sec>
    <sec id="sec-4">
      <title>Figure 2: Examples of the main entity types</title>
      <p>Classical named entities:
pers: Romano Prodi ; Winston Churchill
org: European Commission ; NATO
loc: Cambridge ; England
time: third century ; 1998 ; June 30th
amount: 500 ; two hundred and fifty thousand
measure: year ; mile ; Hertz</p>
      <p>Extended named entities:
prod: Pulp Fiction ; Titanic
event: the 9th conference on speech communication and technology
method: HMM ; Gaussian mixture model
compound: language processing ; information technology
color: red ; spring green
adj_sup: the biggest producer of cocoa of the world
adj_comp: the microphones would be similar to ...
verb: Roberto Martinez now knows the full size of the task</p>
      <p>Question markers:
Qpers: who wrote... ; who directed Titanic
Qloc: where is IBM
Qmeasure: what is the weight of the blue spoon headset</p>
      <p>2. Snippet retrieval: we submit each query, according to its rank, to the indexation server, and stop as soon as we get document snippets (sentences or small groups of consecutive sentences) back.</p>
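      <p>The ranked, stop-at-first-hit query loop can be sketched as follows; query_server is a hypothetical stand-in for the indexation server interface:</p>
      <preformat>
def retrieve_snippets(ranked_queries, query_server):
    """Submit back-off queries from most to least specific and stop as
    soon as one of them returns document snippets."""
    for query in ranked_queries:
        snippets = query_server(query)
        if snippets:
            return snippets
    return []

# Toy stand-in for the indexation server: only the looser query matches.
def query_server(query):
    index = {"capital France": ["Paris is the capital of France ."]}
    return index.get(query, [])

print(retrieve_snippets(["capital France Europe", "capital France"],
                        query_server))
# ['Paris is the capital of France .']
      </preformat>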
    </sec>
    <sec id="sec-5">
      <title>3. Answer extraction and selection</title>
      <p>The detection of the answer type has been extracted beforehand from the question, using Question Marker, Named, Non-specific and Extended Entities co-occurrences (_Qwho → _pers or _pers_def or _org). Therefore, we select the entities in the snippets with the expected type of the answer. At last, a clustering of the candidate answers is done, based on frequencies. The most frequent answer wins, and the distribution of the counts gives an idea of the confidence of the system in the answer.</p>
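      <p>A minimal sketch of this frequency-based selection, assuming that clustering the candidates reduces to normalizing their surface forms (the actual clustering is richer than this):</p>
      <preformat>
from collections import Counter

def select_answer(candidates):
    """Cluster candidates by normalized surface form, return the most
    frequent one and the share of votes it received as a confidence."""
    counts = Counter(c.strip().lower() for c in candidates)
    if not counts:
        return None, 0.0
    answer, votes = counts.most_common(1)[0]
    return answer, votes / sum(counts.values())

print(select_answer(["Paris", "paris", "Lyon", "Paris "]))
# ('paris', 0.75)
      </preformat>
      <p>Candidates with equal counts remain tied under this scheme, which is the first-rank tie problem noted below.</p>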
      <p>This approach has several drawbacks:</p>
      <p>• The back-off query lists require a large amount of maintenance work and will never cover all of the combinations of entities which may be found in the questions.</p>
      <p>• The answer selection uses only frequencies of occurrence, often ending up with lists of first-rank candidate answers with the same score.</p>
      <p>• The system answering speed directly depends on the number of snippets to retrieve, which may sometimes be very large. Limiting the number of snippets is not easy, as they are not ranked according to pertinence.</p>
      <p>Table 2: Results for Passage Retrieval for System 2. "Passage limit = 5": at most 5 passages are returned; "Passage without limit": there is no limit on the number of passages. Acc. is the accuracy, MRR is the Mean Reciprocal Rank and Recall the total number of correct answers in the returned answers.</p>
      <p>Passage without limit (Acc. / MRR / Recall):
30.2% / 0.38 / 68.8%
29.6% / 0.37 / 57.0%
44.9% / 0.53 / 71.4%
18.3% / 0.24 / 51.6%
Passage limit = 5 (Acc. / MRR / Recall):
30.2% / 0.37 / 47.9%
18.3% / 0.22 / 31.2%
44.9% / 0.52 / 67.3%
29.6% / 0.36 / 46.9%</p>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>The different modules we can evaluate are the analysis module, the passage retrieval and the answer extraction. The passage retrieval is easier to evaluate for System 2 because it is a complete separate module, which is not the case in System 1. Table 2 gives the results on the passage retrieval in two conditions: with a limitation of the number of passages at 5 and without limitation. The difference between the Recall on the snippets (how often the answer is present in the selected snippets) and the QA Accuracy shows that the extraction and the scoring of the answer have a reasonable margin for improvement. The difference between the snippet Recall and its Accuracy (from 26 to 38% for the no-limit condition) illustrates that the snippet scoring can be improved. System 2 did not perform better than System 1 on the T2 task; further analysis is needed to understand why.</p>
      <p>[Table 1: General Results. Sys1: System 1; Sys2: System 2; Acc. is the accuracy, MRR is the Mean Reciprocal Rank and Recall the total number of correct answers in the 5 returned answers. One row per system and task (T1 to T4); Recall values: 22.6%, 32.2%, 57.1%, 28.5%, 41.6%, 43.8%, 28.5%, 22.6%.]</p>
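      <p>For reference, the three reported measures can be computed from ranked answer lists as in the sketch below; the variable names and the toy data are illustrative:</p>
      <preformat>
def evaluate(runs, is_correct):
    """Accuracy: first answer correct; MRR: mean reciprocal rank of the
    first correct answer; Recall: share of questions with any correct
    answer among those returned."""
    acc = mrr = rec = 0.0
    for question, answers in runs.items():
        ranks = [r for r, a in enumerate(answers, 1) if is_correct(question, a)]
        if ranks:
            rec += 1.0
            mrr += 1.0 / ranks[0]
            acc += 1.0 if ranks[0] == 1 else 0.0
    n = len(runs)
    return acc / n, mrr / n, rec / n

gold = {"q1": "paris", "q2": "rome"}
runs = {"q1": ["lyon", "paris"], "q2": ["rome"]}
print(evaluate(runs, lambda q, a: gold[q] == a))
# (0.5, 0.75, 1.0)
      </preformat>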
    </sec>
    <sec id="sec-7">
      <title>Routing evaluation</title>
      <p>One of the key uses of the analysis results is routing the question, which is determining a rough class for the type of the answer (language, location, ...). The results of the routing component are given in Table 3 with details by answer category. Two questions of T1/T2 and three of T3/T4 were not routed. Most of the wrongly routed questions have been routed to the generic answer type class. In System 1 this class selects specific entities (method, models, system, language...) over the other entity types for the possible answers. In System 2 no such adaptation to the task has been done and all possible entity types have equal priority.</p>
      <p>We observed large differences with the results obtained on the development data, in particular with the method, color and time categories. The analysis module has been built on corpus observations and it seems to be too dependent on the development data. That can explain the absence of major differences between System 1 and System 2 for the T1/T2 tasks.</p>
      <p>[Table 3: Routing evaluation. All: all questions; LAN: language; LOC: location; MEA: measure; MET: method/system; ORG: organization; PER: person; TIM: time; SHAP: shape; COL: colour. For each category, the table gives the number of questions and the percentage of correctly routed questions, separately for T1/T2 and T3/T4.]</p>
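      <p>Routing can be sketched as a lookup from the question markers produced by the analysis to a rough answer-type class, with a fallback to the generic class for unrouted questions; the routing table below is an illustrative assumption, not the system's actual rules:</p>
      <preformat>
# Hypothetical routing table from question markers to answer-type classes.
ROUTES = {"_Qpers": "PER", "_Qloc": "LOC", "_Qmeasure": "MEA",
          "_Qtime": "TIM", "_Qorg": "ORG"}

def route_question(analyzed_tokens, default="GENERIC"):
    """Map the first question marker found in the analyzed question to a
    rough answer-type class; unrouted questions fall back to a generic
    class in which all entity types have equal priority."""
    for token in analyzed_tokens:
        if token in ROUTES:
            return ROUTES[token]
    return default

print(route_question(["_Qloc", "where", "is", "_org", "IBM"]))  # LOC
      </preformat>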
    </sec>
    <sec id="sec-8">
      <title>6 Conclusion and future work</title>
      <p>We presented the Question Answering systems used for our participation in the QAst evaluation. Two different systems have been used for this participation. The two main changes between System 1 and System 2 are the replacement of the large set of hand-made rules by the automatic generation of a research descriptor, and the addition of an efficient scoring of the candidate answers.</p>
      <p>These systems have been evaluated on different data corresponding to different tasks. On the manually transcribed lectures, the best result is 39% for Accuracy; on manually transcribed meetings, 24% for Accuracy. There was no specific effort done on the automatically transcribed lectures and meetings, so the performances only give an idea of what can be done without trying to handle speech recognition errors. The best result is 18.3% on meetings and 21.3% on lectures.</p>
      <p>The results show that System 2 outperforms System 1. The main reasons are:</p>
      <p>1. The automatic generation of document/snippet queries, which greatly improves the coverage as compared to handcrafted rules.
2. More pertinent answer scoring using proximities, which allows a smoothing of the results.
3. Presence of various tuning parameters, which enable the adaptation of the system to the various question and document types.</p>
      <p>From the analysis presented in the previous section, performance can be improved at every step. For example, the analysis and routing component can be improved in order to better take into account some types of questions, which should improve the answer typing and extraction. The scoring of the snippets and the candidate answers can also be improved. In particular, some tuning parameters (like the weight of the transformations generated in the DDR) have not been optimized yet.</p>
    </sec>
    <sec id="sec-9">
      <title>MET: method/system; ORG: organization; PER: person; TIM: time; SHAP: shape; COL: olour.</title>
      <p>Table 3: Routing evaluation. All: all questions; LAN: language; LOC: lo ation; MEA: measure;
10
TIM
80%
9
89%
LOC
handle spee h re ognition errors. The best result is 18.3% on meeting and 21.3% on le tures. From
meetings, 24% for A ura y. There was no spe i eort done on the automati ally trans ribed
(like the weight of the transformations generated in the DDR) have not been optimized yet.
le tures and meetings, so the performan es only give an idea of what an be done without trying to
some type of questions whi h should improve the answer typing and extra tion. The s oring of the
the analysis presented in the previous se tion, performan e an be improved at every step. For
snippets and the andidate answers an also be improved. In parti ular some tuning parameters
example, the analysis and routing omponent an be improved in order to better take into a ount
These systems have been evaluated on dieren t data orresponding to dieren t tasks. On
the manually trans ribed le tures, the best result is 39% for A ura y, on manually trans ribed
89%
PER
9
COL
% Corre t
# Questions
80%
15
71%
14</p>
    </sec>
    <sec id="sec-10">
      <title>7 Acknowledgments</title>
      <p>This work was partially funded by the European Commission under the FP6 Integrated Project IP 506909 CHIL and the LIMSI AI/ASP Ritel grant.</p>
      <p>References</p>
      <p>[1] E. M. Voorhees, L. P. Buckland. The Fifteenth Text REtrieval Conference Proceedings (TREC 2006). In Voorhees and Buckland, eds., 2006.
[3] C. Ayache, B. Grau, A. Vilnat. Evaluation of question-answering systems: The French EQueR-EVALDA Evaluation Campaign. Proceedings of LREC'06, Genoa, Italy, 2006.
[7] CHIL Project. http://chil.server.de
[9] AMI Project. http://www.amiproject.org
[16] S. Sekine. Definition, dictionaries and tagger of Extended Named Entity hierarchy. Proceedings of LREC'04, Lisbon, Portugal, 2004.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>