<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Spoken Language Processing Group, LIMSI-CNRS, B.P. 133, 91403 Orsay, France</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <pub-date>
        <year>2006</year>
      </pub-date>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2 Analysis of documents and queries</title>
      <p>In the QAst evaluation, four data types are of interest: manually and automatically transcribed lectures, and manually and automatically transcribed meetings.</p>
      <p>2.1 Normalization</p>
      <p>Documents and queries are normalized in four steps:
1. Separating words and numbers from punctuation.
2. Reconstructing correct case for the words.
3. Adding punctuation.
4. Splitting into sentences at period marks.</p>
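      <p>A minimal sketch of these four steps in Python, with toy rules standing in for the system's actual models (the proper-noun list and the final-period heuristic below are illustrative assumptions):</p>
      <preformat>
import re

def normalize(utterance, known_proper_nouns=frozenset({"France", "NIST"})):
    """Toy four-step normalization of a transcribed utterance."""
    # 1. Separate words and numbers from punctuation.
    tokens = re.sub(r"([.,!?;:])", r" \1 ", utterance).split()
    # 2. Reconstruct correct case (here: a simple proper-noun lookup).
    tokens = [t.capitalize() if t.capitalize() in known_proper_nouns else t
              for t in tokens]
    # 3. Add punctuation (here: only a sentence-final period).
    if tokens and tokens[-1] not in {".", "!", "?"}:
        tokens.append(".")
    # 4. Split into sentences at period marks.
    sentences, current = [], []
    for t in tokens:
        current.append(t)
        if t == ".":
            sentences.append(" ".join(current))
            current = []
    return sentences

print(normalize("the capital of france is paris"))
# ['the capital of France is paris .']
      </preformat>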
    </sec>
    <sec id="sec-2">
      <title>2.2 Non contextual analysis module</title>
      <p>The analysis is considered non-contextual because each sentence is processed in isolation. The general objective of this analysis is to find the bits of information that may be of use for search and extraction, which we call pertinent information chunks. These can be of different categories: named entities, linguistic entities (e.g. verbs, prepositions), or specific entities (e.g. scores). All words that do not fall into such chunks are automatically grouped into chunks via a longest-match strategy. Some examples of pertinent information chunks are given in Figure 1. In the following sections, the types of entities handled by the system are described, along with how they are recognized.</p>
      <p>[Figure 1: Examples of pertinent information chunks from the CHIL data collection, covering question markers and linguistic chunks.]</p>
      <p>2.2.1 Definition of Entities</p>
      <p>Following commonly adopted definitions, the named entities are expressions that denote locations, people, companies, times, and monetary amounts. These entities have commonly known and accepted names. For example, if the country France is a named entity, the capital of France is not a named entity. However, our experience is that the information present in the named entities is not sufficient to analyze the wide range of user utterances that can be found in lecture or meeting transcripts. Therefore we defined a set of specific entities in order to collect all observed information expressions contained in a corpus of questions and texts from a variety of sources (proceedings, transcripts of lectures, dialogs, etc.). Figure 2 summarizes the different entity types that are used.</p>
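      <p>The longest-match grouping can be sketched as follows; the lexicon contents and the generic "chunk" label are illustrative assumptions, not the system's actual entity rules:</p>
      <preformat>
def chunk_longest_match(tokens, lexicon):
    """Greedily group tokens into the longest chunk found in the lexicon;
    uncovered tokens become single-word chunks of a generic type."""
    chunks, i = [], 0
    while i != len(tokens):
        # Try the longest span starting at position i first.
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j])
            if span in lexicon or j == i + 1:
                chunks.append((lexicon.get(span, "chunk"), span))
                i = j
                break
    return chunks

# Hypothetical lexicon mapping surface forms to entity types.
lexicon = {"european commission": "org", "error rates": "score"}
print(chunk_longest_match("the european commission reported error rates".split(),
                          lexicon))
# [('chunk', 'the'), ('org', 'european commission'),
#  ('chunk', 'reported'), ('score', 'error rates')]
      </preformat>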
    </sec>
    <sec id="sec-3">
      <title>2.2.2 Automatic detection of typed entities</title>
      <p>The types we need to detect correspond to two levels of analysis: named-entity recognition and chunk-based shallow parsing. Various strategies for named-entity recognition using machine learning techniques have been proposed [12, 13, 14]. In these approaches, a statistically pertinent coverage of all defined types and subtypes induces the need for a large number of occurrences; they therefore rely on the availability of large annotated corpora, which are difficult to build. Rule-based approaches to named-entity recognition (e.g. [15]) rely on morphosyntactic and/or syntactic analysis of the documents. However, in the present work, performing this sort of analysis is not feasible: the speech transcriptions are too noisy to allow for both accurate and robust linguistic analysis based on typical rules, and the processing time of most existing linguistic analyzers is not compatible with the high speed we require.</p>
      <p>We decided to tackle the problem with rules based on regular expressions on words, as in other works [16]: we allow the use of lists for initial detection, and the definition of local contexts and simple categorizations. The tool used to implement the rule-based automatic annotation system is called Wmatch. This engine matches (and substitutes) regular expressions using words as the base unit instead of characters. This property allows for a more readable syntax than traditional regular expressions and enables the use of classes (lists of words) and macros (sub-expressions in-line in a larger expression). Wmatch also includes NLP-oriented features such as strategies for prioritizing rule application, recursive substitution modes, word tagging (for tags like noun, verb...), and word categories (number, acronym, proper name...). It has multiple input and output formats, including an XML-based one for interoperability and to allow chaining of instances of the tool with different rule sets. Rules are pre-analyzed and optimized in several ways, and stored in a compact format in order to speed up the process. Analysis is multi-pass, and subsequent rule applications operate on the results of previous rule applications, which can be enriched or modified. The full analysis comprises some 50 steps and takes roughly 4 ms on a typical user utterance (or document sentence).</p>
      <p>The analysis provides 96 different types of entities. Figure 3 shows an example of the analysis on a query (top) and on a transcription (bottom).</p>
      <p>[Figure 3: Example analysis of a query and of a transcription, e.g. "_prep in _org NIST _NN metadata evaluations _verb reported _NN speaker tracking _score error rates _aux are _prep about _val_score 15 %".]</p>
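      <p>Word-based (rather than character-based) pattern matching with word classes can be emulated in a few lines of Python. This is only a sketch of the idea behind Wmatch; it does not reproduce its actual syntax, macros, substitution modes or multi-pass machinery:</p>
      <preformat>
import re

# Word classes (lists of words), standing in for Wmatch classes.
CLASSES = {"ORG": ["NIST", "NATO", "LIMSI"]}

def word_rule(pattern):
    """Compile a word-based pattern such as '@ORG \\w+ evaluations' into a
    character-level regex, expanding @CLASS names into alternations."""
    parts = []
    for word in pattern.split():
        if word.startswith("@"):
            parts.append("(?:%s)" % "|".join(map(re.escape, CLASSES[word[1:]])))
        else:
            parts.append(word)
    return re.compile(r"\s+".join(parts))

rule = word_rule(r"@ORG \w+ evaluations")
match = rule.search("in NIST metadata evaluations reported speaker tracking")
print(match.group(0))  # NIST metadata evaluations
      </preformat>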
    </sec>
    <sec id="sec-4">
      <title>Figure 2: Examples of the main entity types</title>
      <p>Classical named entities:
pers: Romano Prodi ; Winston Churchill
org: European Commission ; NATO
loc: Cambridge ; England
time: third century ; 1998 ; June 30th
amount: 500 ; two hundred and fifty thousand
measure: year ; mile ; Hertz</p>
      <p>Extended named entities:
prod: Pulp Fiction ; Titanic
event: the 9th conference on speech communication and technology
method: HMM ; Gaussian mixture model
compound: language processing ; information technology
color: red ; spring green
adj_sup: the biggest producer of cocoa of the world
adj_comp: the microphones would be similar to ...
verb: Roberto Martinez now knows the full size of the task</p>
      <p>Question markers:
Qpers: who wrote... ; who directed Titanic
Qloc: where is IBM
Qmeasure: what is the weight of the blue spoon headset</p>
      <p>2. Snippet retrieval: we submit each query, according to its rank, to the indexation server, and stop as soon as we get document snippets (sentences or small groups of consecutive sentences) back.</p>
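      <p>The ranked, stop-at-first-hit query loop can be sketched as follows; query_server is a hypothetical stand-in for the indexation server interface:</p>
      <preformat>
def retrieve_snippets(ranked_queries, query_server):
    """Submit back-off queries from most to least specific and stop as
    soon as one of them returns document snippets."""
    for query in ranked_queries:
        snippets = query_server(query)
        if snippets:
            return snippets
    return []

# Toy stand-in for the indexation server: only the looser query matches.
def query_server(query):
    index = {"capital France": ["Paris is the capital of France ."]}
    return index.get(query, [])

print(retrieve_snippets(["capital France Europe", "capital France"],
                        query_server))
# ['Paris is the capital of France .']
      </preformat>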
    </sec>
    <sec id="sec-5">
      <title>3. Answer extraction and selection</title>
      <p>The detection of the answer type has been extracted beforehand from the question, using Question Marker, Named, Non-specific and Extended Entities co-occurrences (_Qwho → _pers or _pers_def or _org). Therefore, we select the entities in the snippets with the expected type of the answer. At last, a clustering of the candidate answers is done, based on frequencies. The most frequent answer wins, and the distribution of the counts gives an idea of the confidence of the system in the answer.</p>
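      <p>A minimal sketch of this frequency-based selection, assuming that clustering the candidates reduces to normalizing their surface forms (the actual clustering is richer than this):</p>
      <preformat>
from collections import Counter

def select_answer(candidates):
    """Cluster candidates by normalized surface form, return the most
    frequent one and the share of votes it received as a confidence."""
    counts = Counter(c.strip().lower() for c in candidates)
    if not counts:
        return None, 0.0
    answer, votes = counts.most_common(1)[0]
    return answer, votes / sum(counts.values())

print(select_answer(["Paris", "paris", "Lyon", "Paris "]))
# ('paris', 0.75)
      </preformat>
      <p>Candidates with equal counts remain tied under this scheme, which is the first-rank tie problem noted below.</p>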
      <p>This approach has several drawbacks:</p>
      <p>• The back-off query lists require a large amount of maintenance work and will never cover all of the combinations of entities which may be found in the questions.</p>
      <p>• The answer selection uses only frequencies of occurrence, often ending up with lists of first-rank candidate answers with the same score.</p>
      <p>• The system answering speed directly depends on the number of snippets to retrieve, which may sometimes be very large. Limiting the number of snippets is not easy, as they are not ranked according to pertinence.</p>
      <p>Table 2: Results for Passage Retrieval for System 2. "Passage limit = 5": at most 5 passages are returned; "Passage without limit": there is no limit on the number of passages. Acc. is the accuracy, MRR is the Mean Reciprocal Rank and Recall the total number of correct answers in the returned answers.</p>
      <p>Passage without limit (Acc. / MRR / Recall):
30.2% / 0.38 / 68.8%
29.6% / 0.37 / 57.0%
44.9% / 0.53 / 71.4%
18.3% / 0.24 / 51.6%
Passage limit = 5 (Acc. / MRR / Recall):
30.2% / 0.37 / 47.9%
18.3% / 0.22 / 31.2%
44.9% / 0.52 / 67.3%
29.6% / 0.36 / 46.9%</p>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>The different modules we can evaluate are the analysis module, the passage retrieval and the answer extraction. The passage retrieval is easier to evaluate for System 2 because it is a complete separate module, which is not the case in System 1. Table 2 gives the results on the passage retrieval in two conditions: with a limitation of the number of passages at 5 and without limitation. The difference between the Recall on the snippets (how often the answer is present in the selected snippets) and the QA Accuracy shows that the extraction and the scoring of the answer have a reasonable margin for improvement. The difference between the snippet Recall and its Accuracy (from 26 to 38% for the no-limit condition) illustrates that the snippet scoring can be improved. System 2 did not perform better than System 1 on the T2 task; further analysis is needed to understand why.</p>
      <p>[Table 1: General Results. Sys1: System 1; Sys2: System 2; Acc. is the accuracy, MRR is the Mean Reciprocal Rank and Recall the total number of correct answers in the 5 returned answers. One row per system and task (T1 to T4); Recall values: 22.6%, 32.2%, 57.1%, 28.5%, 41.6%, 43.8%, 28.5%, 22.6%.]</p>
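      <p>For reference, the three reported measures can be computed from ranked answer lists as in the sketch below; the variable names and the toy data are illustrative:</p>
      <preformat>
def evaluate(runs, is_correct):
    """Accuracy: first answer correct; MRR: mean reciprocal rank of the
    first correct answer; Recall: share of questions with any correct
    answer among those returned."""
    acc = mrr = rec = 0.0
    for question, answers in runs.items():
        ranks = [r for r, a in enumerate(answers, 1) if is_correct(question, a)]
        if ranks:
            rec += 1.0
            mrr += 1.0 / ranks[0]
            acc += 1.0 if ranks[0] == 1 else 0.0
    n = len(runs)
    return acc / n, mrr / n, rec / n

gold = {"q1": "paris", "q2": "rome"}
runs = {"q1": ["lyon", "paris"], "q2": ["rome"]}
print(evaluate(runs, lambda q, a: gold[q] == a))
# (0.5, 0.75, 1.0)
      </preformat>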
    </sec>
    <sec id="sec-7">
      <title>Routing evaluation</title>
      <p>One of the key uses of the analysis results is routing the question, which is determining a rough class for the type of the answer (language, location, ...). The results of the routing component are given in Table 3 with details by answer category. Two questions of T1/T2 and three of T3/T4 were not routed. Most of the wrongly routed questions have been routed to the generic answer type class. In System 1 this class selects specific entities (method, models, system, language...) over the other entity types for the possible answers. In System 2 no such adaptation to the task has been done and all possible entity types have equal priority.</p>
      <p>We observed large differences with the results obtained on the development data, in particular with the method, color and time categories. The analysis module has been built on corpus observations and it seems to be too dependent on the development data. That can explain the absence of major differences between System 1 and System 2 for the T1/T2 tasks.</p>
      <p>[Table 3: Routing evaluation. All: all questions; LAN: language; LOC: location; MEA: measure; MET: method/system; ORG: organization; PER: person; TIM: time; SHAP: shape; COL: colour. For each category, the table gives the number of questions and the percentage of correctly routed questions, separately for T1/T2 and T3/T4.]</p>
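      <p>Routing can be sketched as a lookup from the question markers produced by the analysis to a rough answer-type class, with a fallback to the generic class for unrouted questions; the routing table below is an illustrative assumption, not the system's actual rules:</p>
      <preformat>
# Hypothetical routing table from question markers to answer-type classes.
ROUTES = {"_Qpers": "PER", "_Qloc": "LOC", "_Qmeasure": "MEA",
          "_Qtime": "TIM", "_Qorg": "ORG"}

def route_question(analyzed_tokens, default="GENERIC"):
    """Map the first question marker found in the analyzed question to a
    rough answer-type class; unrouted questions fall back to a generic
    class in which all entity types have equal priority."""
    for token in analyzed_tokens:
        if token in ROUTES:
            return ROUTES[token]
    return default

print(route_question(["_Qloc", "where", "is", "_org", "IBM"]))  # LOC
      </preformat>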
    </sec>
    <sec id="sec-8">
      <title>6 Conclusion and future work</title>
      <p>We presented the Question Answering systems used for our participation in the QAst evaluation. Two different systems have been used for this participation. The two main changes between System 1 and System 2 are the replacement of the large set of hand-made rules by the automatic generation of a research descriptor, and the addition of an efficient scoring of the candidate answers.</p>
      <p>These systems have been evaluated on different data corresponding to different tasks. On the manually transcribed lectures, the best result is 39% for Accuracy; on manually transcribed meetings, 24% for Accuracy. There was no specific effort done on the automatically transcribed lectures and meetings, so the performances only give an idea of what can be done without trying to handle speech recognition errors. The best result is 18.3% on meetings and 21.3% on lectures.</p>
      <p>The results show that System 2 outperforms System 1. The main reasons are:</p>
      <p>1. The automatic generation of document/snippet queries, which greatly improves the coverage as compared to handcrafted rules.
2. More pertinent answer scoring using proximities, which allows a smoothing of the results.
3. Presence of various tuning parameters, which enable the adaptation of the system to the various question and document types.</p>
      <p>From the analysis presented in the previous section, performance can be improved at every step. For example, the analysis and routing component can be improved in order to better take into account some types of questions, which should improve the answer typing and extraction. The scoring of the snippets and the candidate answers can also be improved. In particular, some tuning parameters (like the weight of the transformations generated in the DDR) have not been optimized yet.</p>
    </sec>
    <sec id="sec-9">
      <title>MET: method/system; ORG: organization; PER: person; TIM: time; SHAP: shape; COL: olour.</title>
      <p>Table 3: Routing evaluation. All: all questions; LAN: language; LOC: lo ation; MEA: measure;
10
TIM
80%
9
89%
LOC
handle spee h re ognition errors. The best result is 18.3% on meeting and 21.3% on le tures. From
meetings, 24% for A ura y. There was no spe i eort done on the automati ally trans ribed
(like the weight of the transformations generated in the DDR) have not been optimized yet.
le tures and meetings, so the performan es only give an idea of what an be done without trying to
some type of questions whi h should improve the answer typing and extra tion. The s oring of the
the analysis presented in the previous se tion, performan e an be improved at every step. For
snippets and the andidate answers an also be improved. In parti ular some tuning parameters
example, the analysis and routing omponent an be improved in order to better take into a ount
These systems have been evaluated on dieren t data orresponding to dieren t tasks. On
the manually trans ribed le tures, the best result is 39% for A ura y, on manually trans ribed
89%
PER
9
COL
% Corre t
# Questions
80%
15
71%
14</p>
    </sec>
    <sec id="sec-10">
      <title>7 Acknowledgments</title>
      <p>This work was partially funded by the European Commission under the FP6 Integrated Project IP 506909 CHIL and the LIMSI AI/ASP Ritel grant.</p>
      <p>References</p>
      <p>[1] E. M. Voorhees, L. P. Buckland. The Fifteenth Text REtrieval Conference Proceedings (TREC 2006). In Voorhees and Buckland, eds., 2006.
[3] C. Ayache, B. Grau, A. Vilnat. Evaluation of question-answering systems: The French EQueR-EVALDA Evaluation Campaign. Proceedings of LREC'06, Genoa, Italy, 2006.
[7] CHIL Project. http://chil.server.de
[9] AMI Project. http://www.amiproject.org
[16] S. Sekine. Definition, dictionaries and tagger of Extended Named Entity hierarchy. Proceedings of LREC'04, Lisbon, Portugal, 2004.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>