APPENDIX B

Results of the Multiple Language Question Answering Track (QA@CLEF)

Prepared by: Alessandro Vallin† and Jesús Herrera‡
† ITC-Irst, Trento, Italy, vallin@itc.it
‡ Dpto. Lenguajes y Sistemas Informáticos, UNED, Madrid, Spain, jesus.herrera@lsi.uned.es

List of Run Characteristics

The following table lists the 18 participating teams and the 49 runs submitted to the CLEF 2004 Question Answering Track.

GROUP                           COUNTRY      MONOLINGUAL  BILINGUAL        PILOT  RUNS
Bulgarian Academy of Sciences   Bulgaria     -            BG=>EN           -      bgas041bgen
U. Da Coruna                    Spain        ES           -                -      cole041eses
DAEDALUS (*)                    Spain        ES           -                -      mira041eses
DFKI                            Germany      DE           DE=>EN           -      dfki041dede, dfki041deen
U. Helsinki                     Finland      -            FI=>EN           -      hels041fien
U. Edinburgh                    UK           -            DE=>EN, FR=>EN   -      edin041deen, edin042deen,
                                                                                  edin041fren, edin042fren
ILC-CNR                         Italy        IT           -                -      ILCP-QA-ITIT
Inst. Nac. Astrofisica,         Mexico       ES           -                -      inao041eses, inao042eses
  Optica y Electron.
ITC-irst, TCC Division          Italy        IT           IT=>EN           -      irst041itit, irst042itit,
                                                                                  irst041iten, irst042iten
Linguateca, Sintef              Norway       PT           -                -      sfnx041ptpt, sfnx042ptpt
LIMSI-CNRS                      France       -            FR=>EN           -      lire041fren, lire042fren
U. Politecnica Catalunya        Spain        ES           -                -      talp041eses, talp042eses
U. Alicante                     Spain        ES           -                ES     aliv041eses, aliv042eses,
                                                                                  alivpilot
U. Amsterdam                    Netherlands  NL           EN=>NL           -      uams041nlnl, uams042nlnl,
                                                                                  uams041ennl
U. Evora                        Portugal     PT           -                -      PTUE041ptpt
U. Hagen                        Germany      DE           -                -      FUHA041dede
U. Limerick                     Ireland      -            FR=>EN           -      dltg041fren, dltg042fren
U. Neuchatel                    Switzerland  FR           BG=>FR, DE=>FR,  -      gine041frfr, gine042frfr,
                                                          EN=>FR, ES=>FR,         gine041bgfr, gine042bgfr,
                                                          IT=>FR, NL=>FR,         gine041defr, gine042defr,
                                                          PT=>FR                  gine041enfr, gine042enfr,
                                                                                  gine041esfr, gine042esfr,
                                                                                  gine041itfr, gine042itfr,
                                                                                  gine041nlfr, gine042nlfr,
                                                                                  gine041ptfr, gine042ptfr

(*) The DAEDALUS group submitted its results after the scheduled deadline.

Results for Main Tasks

The following tables give the results for the main QA tasks. They are grouped by target language, with one table per target language, so that tasks sharing a target language appear in the same table. Each table provides the following information:
- the name of the submitted run;
- the task in which the group participated;
- the number of answers contained in the submission, divided into Right, Wrong, ineXact and Unsupported. Every task had 200 questions and systems were allowed to return just one response per question; nevertheless, some runs contain fewer than 200 answers, because some questions that contained mistakes were discarded;
- the overall accuracy of the run, i.e. the percentage of Right answers;
- the accuracy over the Factoid (F) questions;
- the accuracy over the Definition (D) questions (each test set contained around 20 of them);
- the system's Precision and Recall in recognising the questions that did not have any answer in the collection (for which the correct answer string was "NIL");
- the Confidence-weighted Score (CWS), which takes into account the system's ability to rank its answers according to confidence. This additional measure ranges between 0 (no correct response at all) and 1 (all answers correct and the system always fully confident about them). Since reporting a confidence value was not mandatory, the CWS was not computed for all runs; missing values are marked "/". A short illustrative sketch of how these measures can be computed follows this list.
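As a rough illustration of the measures above, the sketch below recomputes them from a list of judged answers. It is not the official CLEF evaluation script: the record fields (judgement, answer, gold_is_nil, confidence) are assumptions made for this example. The CWS implementation follows the standard confidence-weighted score used at TREC 2002, which matches the description above.

```python
# Minimal sketch of the main-task evaluation measures (not the official CLEF
# scripts). Each judged answer is assumed to be a dict with the fields:
#   judgement:   'R' (Right), 'W' (Wrong), 'X' (ineXact) or 'U' (Unsupported)
#   answer:      the returned answer string ('NIL' if none)
#   gold_is_nil: True if the question had no answer in the collection
#   confidence:  the system's self-reported score in [0, 1]

def overall_accuracy(answers):
    """Percentage of answers assessed as Right."""
    return 100.0 * sum(a['judgement'] == 'R' for a in answers) / len(answers)

def nil_precision_recall(answers):
    """Precision and recall over the questions whose correct answer is NIL."""
    returned_nil = [a for a in answers if a['answer'] == 'NIL']
    gold_nil = [a for a in answers if a['gold_is_nil']]
    correct_nil = [a for a in returned_nil if a['gold_is_nil']]
    precision = len(correct_nil) / len(returned_nil) if returned_nil else 0.0
    recall = len(correct_nil) / len(gold_nil) if gold_nil else 0.0
    return precision, recall

def confidence_weighted_score(answers):
    """Confidence-weighted Score: rank the answers by decreasing confidence
    and average the precision of every prefix of the ranking, so that correct
    answers placed early (i.e. given with high confidence) weigh more."""
    ranked = sorted(answers, key=lambda a: a['confidence'], reverse=True)
    right_so_far, total = 0, 0.0
    for i, a in enumerate(ranked, start=1):
        right_so_far += a['judgement'] == 'R'
        total += right_so_far / i
    return total / len(ranked)
```

The overall accuracy column can be checked directly against the counts: in the German table below, for instance, FUHA041dede has 67 Right answers out of 197 returned, i.e. 67/197 ≈ 34%.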
German (DE) as target language:

Run Name      Task    Answers  Right  Wrong  ineXact  Unsup.  Overall %  F %    D %   NIL Prec.  NIL Rec.  CWS
dfki041dede   DE=>DE  197      50     143    1        3       25.3       28.25  0     0.13       0.85      /
FUHA041dede   DE=>DE  197      67     128    2        0       34         31.64  55    0.13       1         0.333

English (EN) as target language:

Run Name      Task    Answers  Right  Wrong  ineXact  Unsup.  Overall %  F %    D %   NIL Prec.  NIL Rec.  CWS
bgas041bgen   BG=>EN  200      26     168    5        1       13         11.6   25    0.13       0.4       0.056
dfki041deen   DE=>EN  200      47     151    0        2       23.5       23.8   20    0.1        0.75      0.177
dltg041fren   FR=>EN  200      38     155    7        0       19         17.7   30    0.17       0.55      /
dltg042fren   FR=>EN  200      29     164    7        0       14.5       12.7   30    0.14       0.45      /
edin041deen   DE=>EN  200      28     166    5        1       14         13.3   20    0.14       0.35      0.049
edin041fren   FR=>EN  200      33     161    6        0       16.5       17.7   5     0.15       0.55      0.056
edin042deen   DE=>EN  200      34     159    7        0       17         16.1   25    0.14       0.35      0.052
edin042fren   FR=>EN  200      40     153    7        0       20         20.5   15    0.15       0.55      0.058
hels041fien   FI=>EN  193      21     171    1        0       10.8       11.5   5     0.1        0.85      0.046
irst041iten   IT=>EN  200      45     146    6        3       22.5       22.2   25    0.24       0.3       0.121
irst042iten   IT=>EN  200      35     158    5        2       17.5       16.6   25    0.24       0.3       0.075
lire041fren   FR=>EN  200      22     172    6        0       11         10     20    0.05       0.05      0.032
lire042fren   FR=>EN  200      39     155    6        0       19.5       20     15    0          0         0.075

Spanish (ES) as target language:

Run Name      Task    Answers  Right  Wrong  ineXact  Unsup.  Overall %  F %    D %   NIL Prec.  NIL Rec.  CWS
aliv041eses   ES=>ES  200      63     130    5        2       31.5       30.5   40    0.17       0.35      0.121
aliv042eses   ES=>ES  200      65     129    4        2       32.5       31.1   45    0.17       0.35      0.144
cole041eses   ES=>ES  200      22     178    0        0       11         11.6   5     0.1        1         /
inao041eses   ES=>ES  200      45     145    5        5       22.5       19.44  50    0.19       0.5       /
inao042eses   ES=>ES  200      37     152    6        5       18.5       17.78  25    0.21       0.5       /
mira041eses   ES=>ES  200      18     174    7        1       9          10     0     0.14       0.55      /
talp041eses   ES=>ES  200      48     150    1        1       24         18.8   70    0.19       0.5       0.087
talp042eses   ES=>ES  200      52     143    3        2       26         21.1   70    0.2        0.55      0.102

French (FR) as target language:

Run Name      Task    Answers  Right  Wrong  ineXact  Unsup.  Overall %  F %    D %   NIL Prec.  NIL Rec.  CWS
gine041bgfr   BG=>FR  200      13     182    5        0       6.5        6.6    5     0.1        0.5       0.051
gine041defr   DE=>FR  200      27     162    11       0       13.5       13.8   10    0.15       0.2       0.071
gine041enfr   EN=>FR  200      16     171    13       0       8          8.3    5     0.05       0.1       0.031
gine041esfr   ES=>FR  200      25     166    9        0       12.5       13.8   0     0.12       0.15      0.054
gine041frfr   FR=>FR  200      26     160    14       0       13         13.8   5     0          0         0.046
gine041itfr   IT=>FR  200      23     166    11       0       11.5       12.2   5     0.15       0.3       0.047
gine041nlfr   NL=>FR  200      17     171    12       0       8.5        8.8    5     0.12       0.2       0.041
gine041ptfr   PT=>FR  200      22     170    8        0       11         11.1   10    0.11       0.15      0.041
gine042bgfr   BG=>FR  200      13     180    7        0       6.5        6.1    10    0.1        0.35      0.038
gine042defr   DE=>FR  200      32     155    13       0       16         15     25    0.23       0.2       0.087
gine042enfr   EN=>FR  200      25     165    10       0       12.5       11.6   20    0.06       0.1       0.048
gine042esfr   ES=>FR  200      30     164    6        0       15         15.5   10    0.11       0.1       0.063
gine042frfr   FR=>FR  200      42     147    11       0       21         20.5   25    0.09       0.05      0.095
gine042itfr   IT=>FR  200      27     165    8        0       13.5       14.4   5     0.14       0.3       0.052
gine042nlfr   NL=>FR  200      26     158    16       0       13         12.2   20    0.14       0.2       0.06
gine042ptfr   PT=>FR  200      25     166    9        0       12.5       11.6   20    0.1        0.15      0.05

Italian (IT) as target language:

Run Name      Task    Answers  Right  Wrong  ineXact  Unsup.  Overall %  F %    D %   NIL Prec.  NIL Rec.  CWS
ILCP-QA-ITIT  IT=>IT  200      51     117    29       3       25.5       22.7   50    0.62       0.5       /
irst041itit   IT=>IT  200      56     131    11       2       28         26.6   40    0.27       0.3       0.155
irst042itit   IT=>IT  200      44     147    9        0       22         20     40    0.66       0.2       0.107
Dutch (NL) as target language:

Run Name      Task    Answers  Right  Wrong  ineXact  Unsup.  Overall %  F %    D %    NIL Prec.  NIL Rec.  CWS
uams041ennl   EN=>NL  200      70     122    7        1       35         31     65.2   0          0         0.222
uams041nlnl   NL=>NL  200      88     98     10       4       44         42.3   56.5   0          0         0.284
uams042nlnl   NL=>NL  200      91     97     10       2       45.5       45.2   47.8   0.55       0.25      0.326

Portuguese (PT) as target language:

Run Name      Task    Answers  Right  Wrong  ineXact  Unsup.  Overall %  F %    D %    NIL Prec.  NIL Rec.  CWS
PTUE041ptpt   PT=>PT  199      56     125    18       0       28.1       28.5   25.8   0.14       0.9       0.243
sfnx041ptpt   PT=>PT  199      22     165    8        4       11         11.9   6.4    0.13       0.7       /
sfnx042ptpt   PT=>PT  199      30     154    10       5       16         11.3   9.6    0.16       0.6       /

Spanish Pilot Task

An additional pilot task was set up for Spanish only. Unlike the main tasks, it posed list questions and questions that required more sophisticated temporal reasoning. The following table describes the results of the run alivpilot, submitted by the University of Alicante, the only participating team. Results are grouped by type of question: definition, factoid, list, and temporally restricted by date, by event and by period. In addition, a couple of the posed questions had no answer in the corpus (NIL), but the system did not recognise them. The table provides the following information:
- the number of questions;
- the number of known distinct answers, i.e. the number of different correct answers retrieved by the University of Alicante system in this exercise and by humans during the pre-assessment process;
- the number of given answers;
- the number of questions with at least 1 correct answer, i.e. questions with at least one answer assessed as Right;
- the number of given correct answers;
- the system's recall in recognising correct answers, i.e. the ratio between the number of given correct answers and the number of known distinct answers;
- the system's precision in recognising correct answers, i.e. the ratio between the number of given correct answers and the number of given answers;
- the K-measure¹ value; this metric ranges over [-1, 1] and rewards systems that:
  • answer as many questions as possible,
  • give as many different right answers to each question as possible,
  • give as few wrong answers to each question as possible,
  • assign higher scores to right answers,
  • assign lower scores to wrong answers,
  • answer the questions that have fewer known answers;
- the correlation coefficient (r) between the confidence score and the human assessment, where the assessment equals 1 when an answer is judged Right and 0 otherwise; r gives an idea of the quality of the system's self-scoring.
A sketch of the recall, precision and self-scoring computations is given after the footnotes below.

               # questions  # known   # given   # questions with    # given
                            distinct  answers   at least 1 correct  correct  Recall  Precision  K       r
                            answers             answer              answers
Definition     2            3         2         0 (0%)              0        0%      0%         0       N/A †
Factoid        18           26        42        4 (22.2%)           5        19.2%   11.9%      -0.029  -0.089
List           20           191       55        4 (20%)             6        3.1%    10.9%      -0.07   0.284
Date           20           20        30        2 (10%)             2        10%     6.6%       -0.019  N/A
Temp. Event    20           20        42        2 (10%)             2        10%     4.7%       -0.024  0.255
Period         20           20        29        3 (15%)             3        15%     10.3%      -0.003  0.648
Total          100          280       200       15 (15%)            18       6.4%    9%         -0.086  0.246

† r is not available because one of the variables was 0 for every answer, so the correlation is undefined.

¹ The K-measure is defined in: J. Herrera, A. Peñas, and F. Verdejo. Question Answering Pilot Task at CLEF 2004.
In Proceedings of the CLEF 2004 Workshop, Bath, United Kingdom, September 2004.
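As a rough illustration of the pilot measures, the sketch below recomputes the recall, precision and self-scoring correlation described above (the K-measure is omitted here; its definition is given in the paper cited in footnote 1). The function names and inputs are assumptions made for this example, not part of the official evaluation.

```python
import math

# Illustrative sketch of the pilot-task measures (not the official evaluation
# code). Inputs are plain counts and per-answer lists, named for this example.

def pilot_recall_precision(given_correct, known_distinct, given_total):
    """Recall = given correct / known distinct answers;
    precision = given correct / given answers."""
    return given_correct / known_distinct, given_correct / given_total

def self_scoring_correlation(confidences, judgements):
    """Pearson r between the system's confidence scores and the human
    assessment (1 if the answer was judged Right, 0 otherwise)."""
    assessment = [1.0 if j == 'R' else 0.0 for j in judgements]
    n = len(confidences)
    mean_c = sum(confidences) / n
    mean_a = sum(assessment) / n
    cov = sum((c - mean_c) * (a - mean_a) for c, a in zip(confidences, assessment))
    sd_c = math.sqrt(sum((c - mean_c) ** 2 for c in confidences))
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in assessment))
    if sd_c == 0 or sd_a == 0:
        return None  # undefined when a variable is constant: the N/A cells above
    return cov / (sd_c * sd_a)

# Example check against the Factoid row: 5 correct answers, 26 known distinct
# answers and 42 given answers yield recall 5/26 ≈ 19.2% and precision
# 5/42 ≈ 11.9%.
print(pilot_recall_precision(5, 26, 42))  # (0.1923..., 0.1190...)
```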