The LIMSI participation in the QAst track

Sophie Rosset, Olivier Galibert, Gilles Adda, Eric Bilinski

Spoken Language Processing Group, LIMSI-CNRS, B.P. 133, 91403 Orsay cedex, France

{firstname.lastname}@limsi.fr


Abstract

In this paper, we present two different question-answering systems on speech transcripts which participated in the QAst 2007 evaluation. These two systems are based on a complete and multi-level analysis of both queries and documents. The first system uses handcrafted rules for small text fragment (snippet) selection and answer extraction. The second one replaces the handcrafting with an automatically generated research descriptor. A score based on those descriptors is used to select documents and snippets. The extraction and scoring of candidate answers is based on proximity measurements within the research descriptor elements and a number of secondary factors. The evaluation results range from 17% to 39% accuracy depending on the task.



Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software



General Terms

Measurement, Performance, Experimentation



Keywords

Question answering, speech transcriptions of meetings and lectures



1 Introduction

In the QA and Information Retrieval domains, progress has been demonstrated via evaluation campaigns for both open and limited domains [1, 2, 3]. In these evaluations, systems are presented with independent questions and should provide one answer extracted from textual data to each question. Recently, there has been growing interest in extracting information from multimedia data such as meetings and lectures. Spoken data differs from textual data in various ways. The grammatical structure of spontaneous speech is quite different from written discourse and includes various types of disfluencies. The lecture and interactive meeting data provided in the QAst evaluation are particularly difficult due to run-on sentences and interruptions. Most QA systems use a complete and heavy syntactic and semantic analysis of both the question and the documents or snippets returned by a search engine in which the answer has to be found. Such analysis cannot reliably be performed on the data we are interested in. Typical textual QA systems are composed of question analysis, information retrieval and answer extraction components [1, 4]. The answer extraction component is quite complex and involves natural language analysis, pattern matching and sometimes even logical inference [5]. Most of these natural language tools are not designed to handle spoken phenomena.



In this paper, we present the architecture of the two QA systems developed at LIMSI for the QAst evaluation. Our QA systems are part of an interactive and bilingual (English and French) QA system called Ritel [6] which specifically addresses speed issues. The following sections present the document and query pre-processing and the non-contextual analysis which are common to both systems. Section 3 describes the older system (System 1). Section 4 presents the new system (System 2). Section 5 finally presents the results for these two systems on both development and test data.



2 Analysis of documents and queries

Usually, the syntactic/semantic analysis is different for the document and for the query; our approach is to perform the same complete and multilevel analysis on both queries and documents. There are several reasons for this. First of all, the system has to deal with both transcribed speech (transcriptions of meetings and lectures, user utterances) and text documents, so there should be a common analysis that takes into account the specificities of both data types. Moreover, incorrect analyses due to the lack of context or to limitations of hand-coded rules are likely to happen on both data types, so using the same strategy for document and utterance analysis helps to reduce their negative impact. In order to use the same analysis module for all kinds of data, we transform the query and the documents, which may come from different modalities (text, manual transcripts, automatic transcripts), into a common representation of sentences, words, etc. This process is the normalization.



2.1 Normalization

Normalization, in our application, is the process by which raw texts are converted to a text form where words and numbers are unambiguously delimited, punctuation is separated from words, and the text is split into sentence-like segments (or as close to sentences as is reasonably possible). Different normalization steps are applied, depending on the kind of input data; these steps are:



    1. Separating words and numbers from punctuation.

    2. Reconstructing correct case for the words.

    3. Adding punctuation.

    4. Splitting into sentences at period marks.

In the QAst evaluation, four data types are of interest (a minimal dispatch sketch is given after this list):

    • CHIL lectures [7] with manual transcriptions, where manual punctuation is separated from words. Only the splitting step is needed.

    • CHIL lectures with automatic transcriptions [8]. These require adding punctuation and splitting.

    • AMI meetings [9] with manual transcriptions. The transcriptions had been textified, with punctuation joined to the words, the first word of each sentence upper-cased, etc. They require all the steps except adding punctuation.

    • AMI meetings with automatic transcriptions [10]. Lacking case, they require the last three steps.
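
As a minimal sketch of this dispatch (not the LIMSI code), the mapping below mirrors the list above, while the step implementations are trivial placeholders; the real recasing and punctuation insertion are the LM-based process described next.

    # Minimal sketch of the normalization dispatch: each source only gets
    # the steps it needs. Step implementations are placeholders; only the
    # source-to-steps mapping reflects the description above.
    import re

    def tokenize_punct(text):        # step 1: separate words/numbers from punctuation
        return re.sub(r"([.,;:?!])", r" \1 ", text)

    def recase(text):                # step 2: placeholder for LM-based case restoration
        return text

    def add_punct(text):             # step 3: placeholder for LM-based punctuation insertion
        return text

    def split_sentences(text):       # step 4: split into sentence-like segments at periods
        return [s.strip() for s in text.split(".") if s.strip()]

    PIPELINE = [tokenize_punct, recase, add_punct, split_sentences]

    NEEDED = {                       # which steps each source requires, per the list above
        "chil_manual":    {split_sentences},
        "chil_automatic": {add_punct, split_sentences},
        "ami_manual":     {tokenize_punct, recase, split_sentences},
        "ami_automatic":  {recase, add_punct, split_sentences},
    }

    def normalize(text, source):
        out = text
        for step in PIPELINE:
            if step in NEEDED[source]:
                out = step(out)
        return out

    print(normalize("the committee met. decisions were made.", "ami_automatic"))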

Reconstructing the case and adding punctuation are done in the same process, based on a fully-cased, punctuated language model [11]. A word graph is built covering all the possible variants (all possible punctuation marks added between words, all possible word cases), and a 4-gram language model is used to select the most probable hypothesis. The language model was estimated on the House of Commons Daily Debates, the final edition of the European Parliament Proceedings and various newspaper archives. The final result, with upper case only on proper nouns and words clearly separated by white space, is then passed to the non-contextual analysis.
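
The variant expansion and language-model selection can be pictured with the following minimal sketch; lm_logprob() stands in for the fully-cased, punctuated 4-gram model of [11] and is only a toy placeholder here, so the sketch illustrates the search, not the quality of the real system.

    # Minimal sketch of case/punctuation restoration: expand each word into
    # casing/punctuation variants (a small word graph) and keep the sequence
    # the language model prefers.

    def variants(word):
        forms = {word.lower(), word.capitalize()}
        return [f + p for f in forms for p in ("", " .", " ,")]

    def lm_logprob(tokens):
        # Placeholder: a real system scores the hypothesis with a 4-gram LM.
        # This toy version merely penalizes inserted punctuation marks.
        return -sum(t in {".", ","} for t in tokens)

    def restore(words, beam=50):
        hyps = [([], 0.0)]
        for w in words:
            expanded = []
            for seq, _ in hyps:
                for v in variants(w):
                    cand = seq + v.split()
                    expanded.append((cand, lm_logprob(cand)))
            # keep only the most probable partial hypotheses (beam pruning)
            hyps = sorted(expanded, key=lambda h: h[1], reverse=True)[:beam]
        return " ".join(hyps[0][0])

    print(restore("the european parliament met in strasbourg".split()))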
2.2 Non-contextual analysis module

The analysis is considered non-contextual because each sentence is processed in isolation. The general objective of this analysis is to find the bits of information that may be of use for search and extraction, which we call pertinent information chunks. These can be of different categories: named entities, linguistic entities (e.g. verbs, prepositions), or specific entities (e.g. scores). All words that do not fall into such chunks are automatically grouped into chunks via a longest-match strategy. Some examples of pertinent information chunks are given in Figure 1. In the following sections, the types of entities handled by the system are described, along with how they are recognized.



     _prep in _org NIST _NN metadata evaluations _verb reported _NN speaker tracking
     _score error rates _aux are _prep about _val_score 15 %

        Figure 1: Examples of pertinent information chunks from the CHIL data collection




2.2.1 Definition of Entities
Following commonly adopted definitions, the named entities are expressions that denote locations, people, companies, times, and monetary amounts. These entities have commonly known and accepted names. For example, while the country France is a named entity, capital of France is not. However, our experience is that the information present in the named entities is not sufficient to analyze the wide range of user utterances that can be found in lecture or meeting transcripts. Therefore we defined a set of specific entities in order to collect all observed information expressions contained in a corpus of questions and texts from a variety of sources (proceedings, transcripts of lectures, dialogs, etc.). Figure 2 summarizes the different entity types that are used.


    Type of entities      Examples
    classical             pers: Romano Prodi ; Winston Churchill
    named entities        prod: Pulp Fiction ; Titanic
                          time: third century ; 1998 ; June 30th
                          org: European Commission ; NATO
                          loc: Cambridge ; England
    extended              method: HMM ; Gaussian mixture model
    named entities        event: the 9th conference on speech communication and technology
                          amount: 500 ; two hundred and fifty thousand
                          measure: year ; mile ; Hertz
                          color: red ; spring green
    question markers      Qpers: who wrote... ; who directed Titanic
                          Qloc: where is IBM
                          Qmeasure: what is the weight of the blue spoon headset
    linguistic chunks     compound: language processing ; information technology
                          verb: Roberto Martinez now knows the full size of the task
                          adj_comp: the microphones would be similar to ...
                          adj_sup: the biggest producer of cocoa of the world

                             Figure 2: Examples of the main entity types




2.2.2 Automatic detection of typed entities
The types we need to detect correspond to two levels of analysis: named-entity recognition and chunk-based shallow parsing. Various strategies for named-entity recognition using machine learning techniques have been proposed [12, 13, 14]. In these approaches, a statistically pertinent coverage of all defined types and subtypes requires a large number of occurrences, and therefore relies on the availability of large annotated corpora, which are difficult to build. Rule-based approaches to named-entity recognition (e.g. [15]) rely on morphosyntactic and/or syntactic analysis of the documents. However, in the present work, performing this sort of analysis is not feasible: the speech transcriptions are too noisy to allow for an accurate and robust linguistic analysis based on typical rules, and the processing time of most existing linguistic analyzers is not compatible with the high speed we require.
We decided to tackle the problem with rules based on regular expressions over words, as in other works [16]: we allow the use of lists for initial detection, and the definition of local contexts and simple categorizations. The tool used to implement the rule-based automatic annotation system is called Wmatch. This engine matches (and substitutes) regular expressions using words as the base unit instead of characters. This property allows for a more readable syntax than traditional regular expressions and enables the use of classes (lists of words) and macros (sub-expressions inlined in a larger expression). Wmatch also includes NLP-oriented features such as strategies for prioritizing rule application, recursive substitution modes, word tagging (for tags like noun, verb, ...) and word categories (number, acronym, proper name, ...). It has multiple input and output formats, including an XML-based one for interoperability and for chaining instances of the tool with different rule sets. Rules are pre-analyzed, optimized in several ways, and stored in a compact format in order to speed up the process. Analysis is multi-pass, and subsequent rule applications operate on the results of previous rule applications, which can be enriched or modified. The full analysis comprises some 50 steps and takes roughly 4 ms on a typical user utterance (or document sentence). The analysis provides 96 different types of entities. Figure 3 shows an example of the analysis on a query (top) and on a transcription (bottom).


 <_Qorg> which organization  <_action> provided
 <_det> a  <_NN> significant amount
 <_prep> of  <_NN> training data  <_punct> ?

 <_pro> it  <_verb> 's  <_adv> just
 <_prep_comp> sort of  <_det> a
 <_NN> very pale  <_color> blue  <_conj> and
 <_det> a  <_adj> light-up  <_color> yellow
 <_punct> .

Figure 3: Example annotation of a query, which organization provided a significant amount of training data ? (top), and of a transcription, it's just sort of a very pale blue (bottom).
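
To make the word-level rule idea concrete, here is a minimal sketch in the spirit of Wmatch (it is not Wmatch itself, whose formalism and roughly 50-pass rule set are far richer); the classes and rules below are illustrative.

    # Minimal sketch of word-level rule matching: a pattern is a sequence of
    # items, each a literal word, the name of a word class (a word list), or
    # a simple category; a match tags the covered span with an entity type.

    CLASSES = {"_orgword": {"nist", "ibm", "nato", "limsi"}}   # toy word class

    RULES = [                      # (entity type, pattern over words)
        ("_org",       ["_orgword"]),
        ("_val_score", ["NUMBER", "%"]),
    ]

    def item_matches(item, word):
        if item in CLASSES:                                    # class item
            return word.lower() in CLASSES[item]
        if item == "NUMBER":                                   # category item
            return word.replace(".", "", 1).isdigit()
        return word.lower() == item.lower()                    # literal word

    def annotate(words):
        tags = [None] * len(words)
        for etype, pattern in RULES:                           # earlier rules have priority
            for i in range(len(words) - len(pattern) + 1):
                span = words[i:i + len(pattern)]
                if all(item_matches(p, w) for p, w in zip(pattern, span)):
                    for j in range(i, i + len(pattern)):
                        tags[j] = tags[j] or etype
        return list(zip(words, tags))

    print(annotate("error rates are about 15 % in NIST evaluations".split()))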


3 Question-Answering System 1

The Question-Answering system handles search in documents of any type (news articles, web documents, transcribed broadcast news, etc.). For speed reasons, the documents are all available locally and preprocessed: they are first normalized, and then analyzed with the NCA module. The (type, value) pairs are then managed by a specialized indexer for quick search and retrieval. This somewhat bag-of-typed-words system [6] works in three steps (a minimal sketch is given after the list):




    1. Document query list creation. Using the entities found in the question, we generate a document query and an ordered list of handcrafted back-off queries. These queries are obtained by relaxing some of the constraints on the presence of the entities, using a relative importance ordering (Named entity > NN > adj_comp > action > subs ...).

    2. Snippet retrieval: we submit each query, according to its rank, to the indexation server, and stop as soon as we get document snippets (sentences or small groups of consecutive sentences) back.

    3. Answer extraction and selection: the expected answer type has been extracted beforehand from the question, using co-occurrences of Question Markers, Named, Non-specific and Extended Entities (_Qwho → _pers or _pers_def or _org). We therefore select the entities in the snippets with the expected answer type. Finally, a clustering of the candidate answers is done, based on frequencies. The most frequent answer wins, and the distribution of the counts gives an idea of the confidence of the system in the answer.
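
Here is a minimal sketch of these three steps, assuming the analysis yields (type, value) pairs and that search() stands in for the indexation server; the relaxation ordering follows step 1 above, everything else is illustrative.

    # Minimal sketch of System 1. search(query) is a stand-in for the
    # indexation server and is expected to return snippets as lists of
    # (type, value) pairs.
    from collections import Counter

    RELAXATION_ORDER = ["subs", "action", "adj_comp", "NN", "named_entity"]  # least to most important

    def backoff_queries(question_elements):
        """Step 1: the full query, then queries with the least important
        constraint types progressively removed."""
        queries = [list(question_elements)]
        remaining = list(question_elements)
        for etype in RELAXATION_ORDER:
            remaining = [e for e in remaining if e[0] != etype]
            if remaining and remaining != queries[-1]:
                queries.append(list(remaining))
        return queries

    def answer(question_elements, expected_types, search):
        # Step 2: submit the queries in order, stop at the first one returning snippets.
        snippets = []
        for query in backoff_queries(question_elements):
            snippets = search(query)
            if snippets:
                break
        # Step 3: keep entities of the expected type(s) and cluster by frequency.
        candidates = [v for snippet in snippets for (t, v) in snippet if t in expected_types]
        counts = Counter(candidates)
        return counts.most_common(1)[0][0] if counts else None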



4 Question-Answering System 2

System 1 has three main problems:

    • The back-off query lists require a large amount of maintenance work and will never cover all of the combinations of entities which may be found in the questions.

    • The answer selection uses only frequencies of occurrence, often ending up with lists of first-rank candidate answers with the same score.

    • The system answering speed directly depends on the number of snippets to retrieve, which may sometimes be very large. Limiting the number of snippets is not easy, as they are not ranked according to pertinence.

A new system, System 2, has been designed to solve these problems. We have kept the three steps described in Section 3, with some major changes. In step 1, instead of instantiating document queries from a large number of preexisting handcrafted rules (about 5000), we generate a research descriptor using a very small set of rules (about 10); this descriptor contains all the needed information about the entities and the answer types, together with weights. In step 2, a score is calculated from the proximity between the research descriptor and the documents and snippets, in order to choose the most relevant ones. In step 3, the answer is selected according to a score which takes into account many different features and tuning parameters, which allow an automatic and efficient adaptation.



4.1 Research Descriptor generation

The first step of System 2 is to build a research descriptor (data descriptor record, DDR) which contains the important elements of the question, and the possible answer types with associated weights. Some elements are marked as critical, which makes them mandatory in future steps, while others are secondary. The element extraction and weighting is based on an empirical classification of the element types into importance levels. Answer types are predicted through rules based on combinations of elements of the question. Figure 4 shows an example of a DDR.
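
The structure shown in Figure 4 can be produced by a handful of rules of the following kind; this is a minimal sketch in which the rule contents, weights and critical-type list are illustrative assumptions, not the actual rule set.

    # Minimal sketch of DDR construction: question elements are stored with
    # a weight and a critical/secondary flag, and the possible answer types
    # are predicted from the question marker.

    ANSWER_TYPE_RULES = {            # question marker -> weighted answer types
        "Qorg":     [("orgof", 1.0), ("organisation", 1.0), ("loc", 0.3),
                     ("acronym", 0.1), ("np", 0.1)],
        "Qpers":    [("pers", 1.0), ("pers_def", 0.8), ("org", 0.3)],
        "Qmeasure": [("val_measure", 1.0), ("amount", 0.5)],
    }

    CRITICAL_TYPES = {"pers", "org", "loc", "NN"}    # treated as mandatory in later steps

    def build_ddr(question_elements):
        """question_elements: (type, value) pairs from the analysis of the question."""
        ddr = {"elements": [], "answer_type": []}
        for etype, value in question_elements:
            if etype in ANSWER_TYPE_RULES:
                ddr["answer_type"] = [{"w": w, "type": t} for t, w in ANSWER_TYPE_RULES[etype]]
            else:
                ddr["elements"].append({"w": 1.0,
                                        "critical": etype in CRITICAL_TYPES,
                                        "type": etype,
                                        "value": value})
        return ddr

    ddr = build_ddr([("Qorg", "which company"), ("pers", "Bart"),
                     ("NN", "project manager"), ("action", "works")])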



4.2 Document and snippet selection and scoring

Each document is scored with the geometric mean of the numbers of occurrences of all the DDR elements which appear in it. Using a geometric mean prevents rescaling problems due to some elements being naturally more frequent. The documents are sorted by score and the n-best ones are kept. The speed of the entire system can be controlled by choosing n, the whole system being in practice IO-bound rather than CPU-bound.
The selected documents are then loaded and all the lines in a predefined window (2-10 lines depending on the question type) around the critical elements are kept, creating snippets. Each snippet is scored using the geometric mean of the numbers of occurrences of all the DDR elements which appear in the snippet, smoothed with the document score.
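
As a minimal sketch of this selection, assume the index can return per-document and per-snippet occurrence counts for the DDR elements; the handling of critical elements and the smoothing of the snippet score with the document score (a simple interpolation here) are illustrative assumptions.

    # Minimal sketch of the geometric-mean scoring of this section.
    # counts maps (type, value) of a DDR element to its number of
    # occurrences in the document or snippet.
    import math

    def geometric_mean_score(ddr_elements, counts):
        occurrences = [counts.get((e["type"], e["value"]), 0) for e in ddr_elements]
        if any(c == 0 for c, e in zip(occurrences, ddr_elements) if e["critical"]):
            return 0.0                          # critical elements must be present
        present = [c for c in occurrences if c > 0]
        if not present:
            return 0.0
        return math.exp(sum(math.log(c) for c in present) / len(present))

    def rank_documents(ddr, doc_counts, n_best=20):
        """doc_counts: {doc_id: counts}; keep the n best, n trades speed for recall."""
        scored = sorted(((geometric_mean_score(ddr["elements"], c), d)
                         for d, c in doc_counts.items()), reverse=True)
        return scored[:n_best]

    def snippet_score(ddr, snippet_counts, doc_score, lam=0.8):
        # The exact smoothing is not specified in the paper; a linear
        # interpolation with the document score is assumed here.
        return lam * geometric_mean_score(ddr["elements"], snippet_counts) + (1 - lam) * doc_score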
                     {
                     question: in which company Bart works as a project manager ?
                     ddr:
                       { w=1, critical, pers, Bart },
                       { w=1, critical, NN, project manager },
                       { w=1, secondary, action, works },
                     answer_type = {
                       { w=1.0, type=orgof },
                       { w=1.0, type=organisation },
                       { w=0.3, type=loc },
                       { w=0.1, type=acronym },
                       { w=0.1, type=np },
                     }
                     }

Figure 4: Example of a DDR constructed from the question in which company Bart works as a project manager; each element contains a weight w, its importance for future steps, and the pair (type, value); each possible answer type contains a weight w and the type of the answer.



4.3 Answer extraction, scoring and clustering

In each snippet, all the elements whose type is one of the predicted possible answer types are candidate answers. We associate to each candidate answer A a score S(A):
                                            P                w(E)     1−γ    γ
                                    [w(A)       E maxe=E (1+d(e,A))α ]    × Ssnip
                           S(A) =                                                                    (1)
                                                    Cd (A)β Cs (A)δ
in which:

    • d(e, A) is the distance between the candidate A and an element e of the snippet instantiating a search element E of the DDR;

    • C_s(A) is the number of occurrences of A in the extracted snippets, and C_d(A) the number of occurrences in the whole document collection;

    • S_snip is the score of the extracted snippet (see section 4.2);

    • w(A) is the weight of the answer type and w(E) the weight of the element E in the DDR;

    • α, β, γ and δ are tuning parameters estimated by systematic trials on the development data, with α, β, γ ∈ [0, 1] and δ ∈ [−1, 1].
An intuitive explanation of the formula is that each element of the DDR adds to the score of the candidate (the sum over E) proportionally to its weight (w(E)) and inversely proportionally to its distance to the candidate (d(e, A)). If multiple instances of the element are found in the snippet, only the best one is kept (max over e=E). The score is then smoothed with the snippet score (S_snip) and compensated in part by the candidate frequency in all the documents (C_d) and in the snippets (C_s). The scores for identical (type, value) pairs are added together and give the final scoring for all the possible candidate answers.
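
Equation (1) translates directly into code; the following minimal sketch assumes the snippet elements carry token positions so that d(e, A) can be computed (argument names are illustrative). Scores obtained for the same (type, value) pair across snippets would then be summed, as described above.

    # Minimal sketch implementing equation (1). snippet_elements are
    # (type, value, position) triples from the analysed snippet, pos_A is
    # the position of candidate A, and alpha/beta/gamma/delta are the
    # tuning parameters estimated on the development data.

    def answer_score(w_A, ddr_elements, snippet_elements, pos_A,
                     S_snip, C_d, C_s, alpha, beta, gamma, delta):
        total = 0.0
        for E in ddr_elements:                                 # sum over DDR search elements
            best = 0.0
            for etype, value, pos in snippet_elements:
                if (etype, value) == (E["type"], E["value"]):  # instances e of element E
                    d = abs(pos - pos_A)                       # distance d(e, A)
                    best = max(best, E["w"] / (1.0 + d) ** alpha)
            total += best                                      # only the best instance counts
        numerator = (w_A * total) ** (1.0 - gamma) * S_snip ** gamma
        return numerator / (C_d ** beta * C_s ** delta)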



5 Evaluation

In this section, we present the results obtained in the four tasks. Tasks T1 and T2 were composed of an identical set of 98 questions; task T3 was composed of a different set of 96 questions and task T4 of a subset of 93 questions. Table 1 shows the overall results with the 3 measures used in this evaluation. We submitted two runs, one for each system, for each of the four tasks. As required by the evaluation procedure, a maximum of 5 answers per question was provided.
    Globally, we can see that System 2 obtains better results than System 1. The improvement in Recall (9-11%) observed on the T1 and T3 tasks for System 2 illustrates that the automatic generation of document/snippet queries greatly improves the coverage as compared to handcrafted rules. System 2 did not perform better than System 1 on the T2 task; further analysis is needed to understand why.
                                 Task      System    Acc.      MRR      Recall
                                 T1        Sys1      32.6%     0.37     43.8%
                                           Sys2      39.7%     0.46     57.1%
                                 T2        Sys1      20.4%     0.23     28.5%
                                           Sys2      21.4%     0.24     28.5%
                                 T3        Sys1      26.0%     0.28     32.2%
                                           Sys2      26.0%     0.31     41.6%
                                 T4        Sys1      18.3%     0.19     22.6%
                                           Sys2      17.2%     0.19     22.6%


Table 1: General results. Sys1: System 1; Sys2: System 2; Acc. is the accuracy, MRR the Mean Reciprocal Rank, and Recall the total number of correct answers within the 5 returned answers.



    The different modules we can evaluate are the analysis module, the passage retrieval and the answer extraction. The passage retrieval is easier to evaluate for System 2 because it is a completely separate module, which is not the case in System 1. Table 2 gives the results for passage retrieval in two conditions: with a limitation of the number of passages to 5 and without limitation. The difference between the Recall on the snippets (how often the answer is present in the selected snippets) and the QA accuracy shows that the extraction and scoring of the answer have a reasonable margin for improvement. The difference between the snippet Recall and its accuracy (from 26 to 38% for the no-limit condition) illustrates that the snippet scoring can be improved.


                                 Passage limit = 5            Passage without limit
                       Task     Acc.      MRR      Recall     Acc.      MRR      Recall
                       T1       44.9%     0.52     67.3%      44.9%     0.53     71.4%
                       T2       29.6%     0.36     46.9%      29.6%     0.37     57.0%
                       T3       30.2%     0.37     47.9%      30.2%     0.38     68.8%
                       T4       18.3%     0.22     31.2%      18.3%     0.24     51.6%


Table 2: Results for passage retrieval for System 2. Passage limit = 5: the maximum number of passages is 5; Passage without limit: no limit on the number of passages; Acc. is the accuracy, MRR the Mean Reciprocal Rank, and Recall the total number of correct answers in the returned answers.


    One of the key uses of the analysis results is routing the question, that is, determining a rough class for the type of the answer (language, location, ...). The results of the routing component are given in Table 3, with details by answer category. Two questions of T1/T2 and three of T3/T4 were not routed.
    We observed large differences with the results obtained on the development data, in particular for the method, color and time categories. The analysis module has been built on corpus observations and it seems to be too dependent on the development data. That can explain the absence of major differences between System 1 and System 2 for the T1/T2 tasks. Most of the wrongly routed questions have been routed to the generic answer type class. In System 1 this class selects specific entities (method, models, system, language, ...) over the other entity types for the possible answers. In System 2 no such adaptation to the task has been done and all possible entity types have equal priority.
                          All    LAN    LOC    MEA    MET    ORG    PER    TIM    SHA    COL    MAT
    T1/T2  % Correct      72%    100%   89%    75%    17%    95%    89%    80%    -      -      -
           # Questions    98     4      9      28     18     20     9      10     -      -      -

    T3/T4  % Correct      80%    100%   93%    83%    -      85%    80%    71%    89%    73%    50%
           # Questions    96     2      14     12     -      13     15     14     9      11     6


Table 3: Routing evaluation. All: all questions; LAN: language; LOC: location; MEA: measure; MET: method/system; ORG: organization; PER: person; TIM: time; SHA: shape; COL: colour; MAT: material.

6 Conclusion and future work

We presented the Question Answering systems used for our participation in the QAst evaluation. Two different systems have been used for this participation. The two main changes between System 1 and System 2 are the replacement of the large set of hand-made rules by the automatic generation of a research descriptor, and the addition of an efficient scoring of the candidate answers. The results show that System 2 outperforms System 1. The main reasons are:


    1. Better genericity through the use of a kind of expert system to generate the research descriptors.

    2. More pertinent answer scoring using proximities, which allows a smoothing of the results.

    3. The presence of various tuning parameters, which enables the adaptation of the system to the various question and document types.


    These systems have been evaluated on different data corresponding to different tasks. On the manually transcribed lectures, the best result is 39% accuracy; on the manually transcribed meetings, 24% accuracy. No specific effort was made on the automatically transcribed lectures and meetings, so those performances only give an idea of what can be done without trying to handle speech recognition errors: the best results are 18.3% on meetings and 21.3% on lectures. From the analysis presented in the previous section, performance can be improved at every step. For example, the analysis and routing component can be improved in order to better take into account some types of questions, which should improve the answer typing and extraction. The scoring of the snippets and of the candidate answers can also be improved. In particular, some tuning parameters (like the weight of the transformations generated in the DDR) have not been optimized yet.



7 Acknowledgments

This work was partially funded by the European Commission under the FP6 Integrated Project IP 506909 CHIL and the LIMSI AI/ASP Ritel grant.

References

[1] E. M. Voorhees, L. P. Buckland (eds.). The Fifteenth Text REtrieval Conference Proceedings (TREC 2006). 2006.

[2] B. Magnini, D. Giampiccolo, P. Forner, C. Ayache, P. Osenova, A. Peñas, V. Jijkoun, B. Sacaleanu, P. Rocha, R. Sutcliffe. Overview of the CLEF 2006 Multilingual Question Answering Track. Working Notes for the CLEF 2006 Workshop. 2006.

[3] C. Ayache, B. Grau, A. Vilnat. Evaluation of question-answering systems: The French EQueR-EVALDA Evaluation Campaign. Proceedings of LREC'06, Genoa, Italy.

[4] S. Harabagiu and D. Moldovan. Question Answering. In The Oxford Handbook of Computational Linguistics, R. Mitkov (ed.). Oxford University Press. 2003.

[5] S. Harabagiu, A. Hickl. Methods for Using Textual Entailment in Open-Domain Question Answering. Proceedings of COLING'06, Sydney, Australia. July 2006.

[6] B. van Schooten, S. Rosset, O. Galibert, A. Max, R. op den Akker, G. Illouz. Handling Speech Input in the Ritel QA Dialogue System. Proceedings of Interspeech'07, Antwerp, Belgium. August 2007.

[7] CHIL Project. http://chil.server.de

[8] L. Lamel, G. Adda, E. Bilinski, and J.-L. Gauvain. Transcribing Lectures and Seminars. In InterSpeech, Lisbon, September 2005.

[9] AMI Project. http://www.amiproject.org

[10] T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiat, M. Lincoln, J. Vepa, and V. Wan. The AMI Meeting Transcription System: Progress and Performance. Rich Transcription 2006 Spring (RT06s) Meeting Recognition Evaluation, Bethesda, Maryland, USA. May 2006.

[11] D. Déchelotte, H. Schwenk, G. Adda, J.-L. Gauvain. Improved Machine Translation of Speech-to-Text Outputs. Proceedings of Interspeech'07, Antwerp, Belgium. August 2007.

[12] D. M. Bikel, S. Miller, R. Schwartz, R. Weischedel. Nymble: A High-Performance Learning Name-Finder. Proceedings of ANLP'97, Washington, USA. 1997.

[13] H. Isozaki, H. Kazawa. Efficient Support Vector Classifiers for Named Entity Recognition. Proceedings of COLING, Taipei. 2002.

[14] M. Surdeanu, J. Turmo, E. Comelles. Named Entity Recognition from Spontaneous Open-Domain Speech. Proceedings of InterSpeech'05, Lisbon, Portugal. 2005.

[15] F. Wolinski, F. Vichot, B. Dillet. Automatic Processing of Proper Names in Texts. Proceedings of EACL'95, Dublin, Ireland. 1995.

[16] S. Sekine. Definition, Dictionaries and Tagger of Extended Named Entity Hierarchy. Proceedings of LREC'04, Lisbon, Portugal. 2004.