Passage Retrieval vs. Document Retrieval in the Monolingual Task with the IR-n System

Elisa Noguera and Fernando Llopis
Grupo de investigación en Procesamiento del Lenguaje Natural y Sistemas de Información
Departamento de Lenguajes y Sistemas Informáticos
University of Alicante, Spain
{elisa,llopis}@dlsi.ua.es

Abstract

This paper describes our participation in the monolingual tasks at CLEF 2006. We submitted results for the following languages: English, French, Portuguese and Hungarian. We focused on studying different weighting schemes (okapi and dfr) and retrieval strategies (passage retrieval and document retrieval) in order to improve retrieval performance. An analysis of our experiments and of the official CLEF results shows that our configurations for French, Portuguese and Hungarian achieve considerably improved scores.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval

General Terms
Experimentation, Measurement, Performance

Keywords
Information Retrieval

1 Introduction

In our sixth participation at CLEF, we focused on evaluating a new weighting model (dfr), retrieval based on passages versus documents, and the selection of the best configuration for each language. Specifically, we participated in the following languages: English, French, Portuguese and Hungarian.

The IR-n system [5] was developed in 2001. It is a Passage Retrieval (PR) system which uses passages with a fixed number of sentences, which provides the passages with some syntactic content. Previous research with the IR-n system ([6] [7] [8] [9]) focused on detecting the most suitable passage size for each collection (experimenting with test collections), while computing the similarity of a document from its most similar passage. Last year [10] we proposed a new method, 'combined size passages', which improves system performance by combining passages of different sizes.
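IR-n builds passages from a fixed number of consecutive sentences. As a rough illustration (not IR-n's actual implementation; the regex-based sentence splitter and the non-overlapping windows are simplifying assumptions), a document could be segmented like this:

```python
import re

def split_into_passages(document: str, passage_size: int = 8) -> list[str]:
    """Segment a document into passages of `passage_size` consecutive sentences.

    The sentence splitter and the non-overlapping windowing are simplifying
    assumptions for illustration; IR-n's own sentence detection is not
    described in this paper.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    return [
        " ".join(sentences[i:i + passage_size])
        for i in range(0, len(sentences), passage_size)
    ]
```

With `passage_size=8`, a 20-sentence document yields three passages of 8, 8 and 4 sentences; the last, shorter window is kept as-is.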
This year we implemented a new similarity measure in the system and tested it with different configurations for each language. Furthermore, our team participated in other tasks at CLEF 2006, such as GeoCLEF and CL-SR, and we also applied PR systems to other tasks, such as Question Answering (QA) [3].

This paper is organized as follows: the next section describes the IR-n system and its new features. We then describe the task carried out by our system at CLEF 2006 and the training process. Finally, we present the results achieved and our conclusions.

2 IR-n system

This section presents the main characteristics of the IR-n system and gives details on the resources and techniques used in CLEF 2006.

2.1 Resources: stemmers and stopword lists

We used the stemmers and stopword lists available at http://www.unine.ch/info/clef. We note that the Hungarian collections are encoded in UTF-8.

2.2 Weighting models

The IR-n system supports several similarity measures. This year, the dfr weighting model [1] was added, and we also used the okapi weighting model [4]. The document ranking produced by each weighting model is represented with the same general expression, namely the product of a document-based term weight and a query-based term weight:

    sim(q, d) = Σ_{t ∈ q ∧ p} w_{t,p} · w_{t,q}        (1)

List of variables. The variables used in the formulas below are:
• f_{t,p} is the frequency of term t in passage p,
• f_{t,q} is the frequency of term t in query q,
• f_t is the frequency of term t in the collection,
• n is the number of documents in the collection,
• n_t is the number of documents in which t appears,
• c, k_1, b and k_3 are constants,
• l_d is the length of the document,
• avg_ld is the average document length.

Okapi. Using the okapi model, the relevance score of a passage p for a query q is given by:

    w_{t,p} = ((k_1 + 1) · f_{t,p}) / (K + f_{t,p})
    w_{t,q} = (((k_3 + 1) · f_{t,q}) / (k_3 + f_{t,q})) · w_t
    K = (1 − b) + b · (l_d / avg_ld)
    w_t = log_2((n − n_t + 0.5) / (n_t + 0.5))        (2)

DFR. Using this model, the weight of a passage p for a query q is given by:

    w_{t,q} = f_{t,q}
    w_{t,p} = (log_2(1 + w_t) + w'_{t,p} · log_2((1 + w_t) / w_t)) · (f_t + 1) / (n_t · (w'_{t,p} + 1))
    w'_{t,p} = f_{t,p} · log_2(1 + (c · avg_ld) / l_d)
    w_t = f_t / n        (3)

2.3 Query expansion

Most IR systems use query expansion techniques [2] based on adding to the original query the most frequent terms contained in the most relevant documents. The IR-n architecture allows query expansion based on either the most relevant passages or the most relevant documents. In previous research, we obtained better results using the most relevant passages.

3 Training

This section describes the training process carried out to find the features that best improve system performance. First, the collections and resources are described; the following section then explains the specific experiments we carried out.

3.1 Data Collections

This year our system participated in the following monolingual tasks: English, French, Portuguese and Hungarian. Table 1 shows the characteristics of the language collections.
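The okapi and dfr weighting schemes (equations 2 and 3) can be sketched as plain functions. This is a hedged reading of the formulas, with the okapi saturation term written in its standard form; the parameter defaults are illustrative only:

```python
from math import log2

def okapi_weights(f_tp, f_tq, n, n_t, ld, avg_ld, k1=1.2, b=0.3, k3=1000):
    """Okapi passage- and query-side term weights, per Eq. (2).

    Defaults for k1, b and k3 are illustrative; the exact placement of K in
    the denominator is an assumption about the standard okapi form.
    """
    w_t = log2((n - n_t + 0.5) / (n_t + 0.5))      # idf-like component
    K = (1 - b) + b * (ld / avg_ld)                # length normalization
    w_p = ((k1 + 1) * f_tp) / (K + f_tp)
    w_q = (((k3 + 1) * f_tq) / (k3 + f_tq)) * w_t
    return w_p, w_q

def dfr_weights(f_tp, f_tq, f_t, n, n_t, ld, avg_ld, c=2):
    """DFR passage- and query-side term weights, per Eq. (3); c is illustrative."""
    w_t = f_t / n                                   # collection term density
    f_prime = f_tp * log2(1 + (c * avg_ld) / ld)    # normalized term frequency
    w_p = ((log2(1 + w_t) + f_prime * log2((1 + w_t) / w_t))
           * (f_t + 1) / (n_t * (f_prime + 1)))
    w_q = f_tq
    return w_p, w_q

def sim(term_weights):
    """Eq. (1): sum of w_{t,p} * w_{t,q} over terms shared by query and passage."""
    return sum(w_p * w_q for w_p, w_q in term_weights)
```

Both schemes reward terms that are frequent in the passage but rare in the collection; the final passage score is the sum of the per-term products, as in equation (1).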
Language    Collections              NDocs    Size    SDAvg  WDAvg  WSAvg
English     Los Angeles Times 94,    169477   579 MB  25     529    20
            Glasgow Herald 95
French      Le Monde 94/95,          177452   487 MB  17     388    21
            SDA French 94/95
Portuguese  Público 94/95,           210734   564 MB  18     433    23
            Folha 94/95
Hungarian   Magyar Hirlap 02         49530    105 MB  11     245    20

Table 1: Data Collections

• SDAvg is the average number of sentences per document.
• WDAvg is the average number of words per document.
• WSAvg is the average number of words per sentence.

3.2 Experiments

The aim of the experiment phase is to find the optimum values of the input parameters for each collection. The CLEF 2005 collections (English, French, Portuguese and Hungarian) were used for training, and query expansion techniques were used for all languages. The input parameters of the system are:

• Passage size (sp): we mainly tested two passage sizes, 8 (normal passage) and 30 (big passage).
• Weighting model (wm): we used two weighting models, okapi and dfr.
• Okapi parameters: k_1, b and avg_ld (k_3 is fixed at 1000).
• Dfr parameters: c and avg_ld.
• Query expansion parameters: if exp is 1, relevance feedback is based on passages; if exp is 2, it is based on documents. Moreover, np and nd denote the number of terms added to the original query, extracted from the best-ranked passages (np) or documents (nd).
• Evaluation measure: mean average precision (avgP) is used to evaluate the experiments.

3.2.1 English

As Table 2 shows, the best weighting scheme for English is dfr: with a passage size of 8 and document-based query expansion, it obtains an average precision of 0.5403.

sp  wm   c  avgld  k1  b  exp  np  nd  avgP
8   dfr  4  600    -   -  -    -   -   0.4940
30  dfr  4  600    -   -  -    -   -   0.5063
8   dfr  4  600    -   -  2    10  10  0.5403
30  dfr  4  600    -   -  1    5   10  0.5384

Table 2: Training results English 2005

3.2.2 French

For French, the best weighting scheme is okapi with a passage size of 9. This configuration obtains an average precision of 0.3701.
sp  wm     c  avgld  k1   b    exp  np  nd  avgP
8   dfr    3  300    -    -    -    -   -   0.3090
30  dfr    3  300    -    -    -    -   -   0.3105
50  dfr    2  300    -    -    -    -   -   0.3102
8   dfr    2  300    -    -    2    10  10  0.3686
30  dfr    2  300    -    -    2    5   10  0.3607
9   okapi  -  300    1.2  0.3  -    -   -   0.2964
8   okapi  -  300    1.2  0.3  -    -   -   0.2954
9   okapi  -  300    1.5  0.3  -    -   -   0.3011
9   okapi  -  300    1.5  0.3  1    5   10  0.3701
30  okapi  -  300    1.5  0.3  1    5   10  0.3603

Table 3: Training results French 2005

3.2.3 Hungarian

For Hungarian, the best weighting scheme is dfr with a passage size of 30. The best average precision obtained with this configuration is 0.3644.

sp  wm     c  avgld  k1   b    exp  np  nd  avgP
8   dfr    5  300    -    -    -    -   -   0.3119
30  dfr    2  300    -    -    -    -   -   0.3333
50  dfr    2  300    -    -    -    -   -   0.3334
8   dfr    5  300    -    -    1    10  10  0.3534
30  dfr    2  300    -    -    2    10  10  0.3644
8   okapi  -  100    1.5  0.3  -    -   -   0.2930
8   okapi  -  300    1.2  0.3  1    10  10  0.3264

Table 4: Training results Hungarian 2005

3.2.4 Portuguese

The best configuration for Portuguese is the same as for Hungarian (dfr as weighting scheme and 30 as passage size). The average precision obtained with this configuration is 0.3948.

sp  wm     c  avgld  k1   b    exp  np  nd  avgP
8   dfr    6  300    -    -    -    -   -   0.3362
30  dfr    4  300    -    -    -    -   -   0.3484
30  dfr    6  300    -    -    -    -   -   0.3457
50  dfr    4  300    -    -    -    -   -   0.3474
8   dfr    6  300    -    -    1    10  10  0.3733
30  dfr    4  300    -    -    1    5   10  0.3948
8   okapi  -  300    1.5  0.3  -    -   -   0.3283
8   okapi  -  300    1.5  0.3  1    10  10  0.3676
30  okapi  -  300    1.5  0.3  1    10  10  0.3793

Table 5: Training results Portuguese 2005

3.2.5 Experiments summary

Table 6 shows the best configuration for each language. These configurations were used at CLEF 2006.

language    run          sp  wm     c  avgld  k1   b    exp  np  nd  avgP
English     8-dfr-exp    8   dfr    4  600    -    -    2    10  10  0.5403
French      9-okapi-exp  9   okapi  -  300    1.5  0.3  1    5   10  0.3701
Hungarian   30-dfr-exp   30  dfr    2  300    -    -    2    10  10  0.3644
Portuguese  30-dfr-exp   30  dfr    4  300    -    -    1    5   10  0.3948

Table 6: Configurations used at CLEF 2006

4 Results at CLEF-2006

We submitted four runs for each language at CLEF 2006, except for English, for which we submitted one run. In all cases we used the parameters that gave the best results in training.
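The training-selected configurations in Table 6 can be collected into a small lookup. The field names mirror the table's columns; the dictionary layout itself is only an illustration of how such per-language settings might be passed to a retrieval system:

```python
# Per-language configurations from Table 6 (field names follow the table
# columns; this data structure is illustrative, not IR-n's actual format).
BEST_CONFIG = {
    "English":    {"run": "8-dfr-exp",    "sp": 8,  "wm": "dfr",
                   "c": 4, "avg_ld": 600, "exp": 2, "np": 10, "nd": 10},
    "French":     {"run": "9-okapi-exp",  "sp": 9,  "wm": "okapi",
                   "k1": 1.5, "b": 0.3, "avg_ld": 300, "exp": 1, "np": 5, "nd": 10},
    "Hungarian":  {"run": "30-dfr-exp",   "sp": 30, "wm": "dfr",
                   "c": 2, "avg_ld": 300, "exp": 2, "np": 10, "nd": 10},
    "Portuguese": {"run": "30-dfr-exp",   "sp": 30, "wm": "dfr",
                   "c": 4, "avg_ld": 300, "exp": 1, "np": 5, "nd": 10},
}

def config_for(language: str) -> dict:
    """Return the training-selected configuration for a language."""
    return BEST_CONFIG[language]
```

Note that French is the only language whose best run uses okapi in training; the other three use dfr with language-specific values of c.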
The runs we submitted at CLEF 2006 are named as follows:

• yy-xx-zexp
  – yy is the passage size
  – xx is the weighting model used (dfr or okapi)
  – z indicates query expansion ('nexp' if not used)

The official results for each run are shown in Table 7. Like other systems that use query expansion techniques, our models improve performance with respect to the base system. Our results are appreciably above average for all languages except English, where they are slightly below average. The largest improvement is for Portuguese, at 15.43% over the average in avgP.

Language    Run            AvgP   Dif
English     CLEF Average   38.73
            30-dfr-exp     38.17  -1.44%
French      CLEF Average   37.09
            30-dfr-exp     37.13
            8-dfr-exp      38.28  +3.2%
            9-okapi-exp    35.28
            30-okapi-exp   37.80
Portuguese  CLEF Average   37.32
            30-dfr-exp     41.95
            8-dfr-exp      42.06
            8-okapi-exp    42.41
            30-okapi-exp   43.08  +15.43%
Hungarian   CLEF Average   33.37
            30-dfr-exp     35.32  +5.5%
            8-dfr-exp      34.25
            30-dfr-nexp    30.60
            8-dfr-nexp     29.50

Table 7: CLEF 2006 official results. Monolingual tasks.

5 Conclusions and Future Work

In this seventh CLEF evaluation campaign, we proposed a different configuration for each of the English, French, Portuguese and Hungarian languages (see Table 6). To enhance retrieval performance, we evaluated different weighting models together with a query expansion approach based on passages and documents. The results indicate that the configurations for French, Portuguese and Hungarian proved effective (see Table 7), since their results are above average, whereas the English results are slightly below average.

For Portuguese, the best results are obtained with the okapi weighting model; for the other languages (English, French and Hungarian), the best results are obtained with dfr (see Table 7). The best passage size for French was 9, whereas for the other languages (English, Portuguese and Hungarian) it was 30 (a passage size comparable to IR based on the complete document).
As in previous evaluation campaigns, pseudo-relevance feedback based on passages improves mean average precision for all languages, even though the improvement is not always statistically significant.

As future work, we plan to evaluate further languages, such as Bulgarian and Spanish. We also intend to investigate different ways of providing Natural Language information to the basic IR system and to evaluate the impact of each approach.

6 Acknowledgements

This research has been partially funded by the Spanish Government under project CICyT number TIC2003-07158-C04-01 and by the Valencia Government under project number GV06-161.

References

[1] G. Amati and C. J. Van Rijsbergen. Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness. ACM TOIS, 20(4):357–389, 2002.

[2] Aitao Chen and Fredric C. Gey. Combining Query Translation and Document Translation in Cross-Language Retrieval. In Carol Peters, Julio Gonzalo, Martin Braschler, et al., editors, 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003, Lecture Notes in Computer Science, pages 108–121, Trondheim, Norway, 2003. Springer-Verlag.

[3] Elisa Noguera, Fernando Llopis, and Antonio Ferrández. Passage Filtering for Open-Domain Question Answering. In Tapio Salakoski, Filip Ginter, Sampo Pyysalo, and Tapio Pahikkala, editors, Advances in Natural Language Processing, Proceedings of the 5th International Conference on Natural Language Processing (FinTAL), volume 4139 of Lecture Notes in Computer Science, pages 756–767. Springer-Verlag, August 2006.

[4] J. Savoy. Fusion of Probabilistic Models for Effective Monolingual Retrieval. In Carol Peters, Julio Gonzalo, Martin Braschler, et al., editors, 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003, Lecture Notes in Computer Science, Trondheim, Norway, 2003. Springer-Verlag.

[5] F. Llopis.
IR-n: Un Sistema de Recuperación de Información Basado en Pasajes. PhD thesis, University of Alicante, 2003.

[6] Fernando Llopis and José Luis Vicedo González. IR-n: A Passage Retrieval System at CLEF-2001. In Proceedings of CLEF 2001, pages 244–252, 2001.

[7] Fernando Llopis, José Luis Vicedo González, and Antonio Ferrández. IR-n System at CLEF-2002. In Proceedings of CLEF 2002, pages 291–300, 2002.

[8] Fernando Llopis and Rafael Muñoz. Cross-Language Experiments with the IR-n System. In Proceedings of CLEF 2003, 2003.

[9] Fernando Llopis, Rafael Muñoz, Rafael M. Terol, and Elisa Noguera. IR-n r2: Using Normalized Passages. In Carol Peters and Francesca Borri, editors, Cross Language Evaluation Forum: Working Notes for the CLEF 2004 Workshop, pages 65–72, Pisa, Italy, 2004. IST-CNR.

[10] Fernando Llopis and Elisa Noguera. Combining Passages in the Monolingual Task with the IR-n System. In Proceedings of CLEF 2005, 2005.