Mercure at CLEF-1

M. Boughanem, N. Nassr
IRIT/SIG, Campus Univ. Toulouse III
118, Route de Narbonne
F-31062 Toulouse Cedex 4
Email: trec@irit.fr

1 Summary

This paper describes the tests performed by our team in the CLEF programme. These tests were done using the Mercure system and concern the multilingual, bilingual and monolingual tasks. Section 2 presents the Mercure system. Section 3 describes our general approach to CLIR. Section 4 gives the details of the experiments and the results.

2 Mercure model

Mercure is an information retrieval system based on a connectionist approach and modelled by a multi-layered network. The network is composed of a query layer (the set of query terms), a term layer representing the indexing terms and a document layer [3], [2]. Mercure implements a retrieval process based on spreading activation forward and backward through the weighted links. Queries and documents can be either inputs or outputs of the network. The links between two layers are symmetric and their weights are based on a tf-idf measure inspired by the OKAPI [4] term weighting formula.

- The term-document link weights are expressed by:

$$w_{ij} = \frac{tf_{ij}\,\left(h_1 + h_2 \log\frac{N}{n_i}\right)}{h_3 + h_4\,\frac{dl_j}{\bar{d}} + h_5\, tf_{ij}} \qquad (1)$$

- The query-term links (at stage $s$) are weighted as follows:

$$q_{ui}^{(s)} = \begin{cases} \dfrac{nq \times qtf}{nq - qtf} & \text{if } nq > qtf \\[4pt] qtf & \text{otherwise} \end{cases} \qquad (2)$$

The query evaluation is based on spreading activation: each node computes an input and spreads an output signal [2].

2.1 Query evaluation

A query is evaluated using the spreading activation process described as follows:

1. The query $Q_u$ is the input of the network. Each node of the term layer computes an input value from this initial query, $In(t_i) = q_{ui}^{(s)}$, and then an activation value, $Out(t_i) = g(In(t_i))$, where $g$ is the identity function.

2. These signals are propagated forwards through the network from the term layer to the document layer. Each document node computes an input, $In(d_j) = \sum_{i=1}^{T} Out(t_i) \times w_{ij}$, and then an activation, $Out(d_j) = RSV(Q_u, d_j) = g(In(d_j))$.

Notations:
- $T$: the total number of indexing terms
- $N$: the total number of documents
- $q_{ui}$: the weight of the term $t_i$ in the query $u$
- $t_i$: the term $t_i$; $d_j$: the document $d_j$
- $w_{ij}$: the weight of the link between the term $t_i$ and the document $d_j$
- $dl_j$: the document length in words (without stop words)
- $\bar{d}$: the average document length
- $tf_{ij}$: the frequency of the term $t_i$ in the document $d_j$
- $n_i$: the number of documents containing the term $t_i$
- $nq$: the query length (number of unique terms)
- $qtf$: the query term frequency
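To make the evaluation concrete, here is a minimal Python sketch of equations (1) and (2) and the forward propagation of Section 2.1, assuming a small in-memory index. The constants h1-h5 are illustrative placeholders (the tuned values used in the official runs are not given here), and all function names are ours, not Mercure's.

```python
import math
from collections import defaultdict

def term_doc_weight(tf_ij, n_i, N, dl_j, avg_dl,
                    h1=0.2, h2=0.7, h3=0.8, h4=0.2, h5=1.0):
    """Term-document link weight w_ij of equation (1)."""
    return (tf_ij * (h1 + h2 * math.log(N / n_i))) / (
        h3 + h4 * (dl_j / avg_dl) + h5 * tf_ij)

def query_term_weight(qtf, nq):
    """Query-term link weight q_ui of equation (2)."""
    return (nq * qtf) / (nq - qtf) if nq > qtf else qtf

def evaluate_query(query_tf, postings, doc_len, N):
    """Spread activation from the query layer to the document layer.

    query_tf: {term: qtf}; postings: {term: {doc: tf}}; doc_len: {doc: dl}.
    Returns documents ranked by RSV (g is the identity function).
    """
    avg_dl = sum(doc_len.values()) / len(doc_len)
    nq = len(query_tf)                        # number of unique query terms
    rsv = defaultdict(float)
    for term, qtf in query_tf.items():
        out_t = query_term_weight(qtf, nq)    # Out(t_i) = g(In(t_i))
        docs = postings.get(term, {})
        n_i = len(docs)                       # documents containing t_i
        for doc, tf in docs.items():
            w_ij = term_doc_weight(tf, n_i, N, doc_len[doc], avg_dl)
            rsv[doc] += out_t * w_ij          # In(d_j) = sum Out(t_i)*w_ij
    return sorted(rsv.items(), key=lambda hit: hit[1], reverse=True)
```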
3 General CLIR Methodology

Our CLIR approach is based on query translation. It is illustrated by Figure 1.

Indexing: a separate index is built for the documents in each language. English words are stemmed using the Porter algorithm; French words are stemmed by truncation (first 7 characters); no stemming is applied to the German and Italian words. The German and Italian stoplists were downloaded from the Internet.

Translation: it is based on "dictionaries". For the CLEF-1 experiments, three bilingual dictionaries were used, all of which were simply lists of terms in a language l1 paired with equivalent terms in a language l2. Table 1 shows the source and the number of entries of each dictionary.

[Figure 1: General CLIR approach. The source query is translated with a dictionary into a substitution list, disambiguated using an aligned corpus, matched against the index of each target language, and the resulting document lists are merged.]

Type   Source                    nb. entries
E2F    http://www.freedict.com   42443
E2G    http://www.freedict.com   87951
E2I    http://www.freedict.com   13478

Table 1: Dictionary characteristics

Disambiguation: when multiple translations exist for a given term, they are generally relevant only in a specific context. The disambiguation consists of selecting the terms that are in the context of the query. We consider that the context of a given query can be represented by the list of its terms. The disambiguation process consists of building a context for the target query and using this context to disambiguate the list of substitutions resulting from the translation of the source query.

A context for the target query is built using an aligned corpus. It consists of selecting the best terms appearing in the top (X = 12) documents in the target language that are aligned with the top (X = 12) documents retrieved by the source query. The terms are ranked according to the following formula:

$$score(t_i) = \sum_{d_k \in D_x} d_{ik}$$

where $D_x$ is the set of documents aligned with those retrieved by the source query and $d_{ik}$ is the weight of the term $t_i$ in the document $d_k$.

The disambiguation of the translated query consists of retaining only the terms that appear in the list of terms of the target context. However, if a term has a unique substitution, this term is retained even if it does not appear in the context of the target query. Note that in this process all the translations appearing in the target context are retained; we do not select only the best translation, as is done in some other works [1].
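As an illustration, the sketch below implements this context-based filtering, assuming each aligned target-language document is given as a dict of term weights d_ik and the translation step produces a substitution list per source term; all names here are hypothetical.

```python
from collections import defaultdict

def build_target_context(aligned_docs, top_n_terms=100):
    """Rank target-language terms over the aligned documents:
    score(t_i) = sum of d_ik over d_k in D_x."""
    score = defaultdict(float)
    for doc in aligned_docs:                 # doc: {term: weight d_ik}
        for term, weight in doc.items():
            score[term] += weight
    ranked = sorted(score, key=score.get, reverse=True)
    return set(ranked[:top_n_terms])

def disambiguate(substitutions, context):
    """Keep the translations found in the target context; a unique
    substitution is always kept, even when outside the context."""
    target_query = []
    for source_term, translations in substitutions.items():
        if len(translations) == 1:           # unique substitution
            target_query.extend(translations)
        else:
            target_query.extend(t for t in translations if t in context)
    return target_query
```

The cut-off top_n_terms is our assumption: the paper only says that the best terms of the top 12 aligned documents form the context, without giving the exact list size.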
4 Experiments and Results

4.1 Multilingual experiment

Two runs using English topics and retrieving documents from the pool of documents in all four languages (German, French, Italian and English) were submitted. The queries were translated using the downloaded dictionaries. No disambiguation was performed: all the translated words were retained in the target queries. The runs were performed by doing individual runs for each language pair and merging the results to form the final ranked list. Two merging strategies were tested (both are sketched in code at the end of this subsection):

- Naive strategy: all the documents resulting from the pair searches join a final list. These documents are then sorted according to their RSV. The top 1000 were submitted.

- Normalised strategy: each list of retrieved documents resulting from a pair search is normalised. The normalisation consists simply of dividing the RSV of each document by the maximum RSV in that list. The documents of the different lists are then merged and sorted according to their normalised RSV. The final list corresponds to the top 1000 documents.

Two runs were submitted: irit1men2a, based on normalised merging, and irit2men2a, based on naive merging.

                                   irit1men2a    irit2men2a
better than median at Avg. Prec.   15 (best 0)   16 (best 0)
worse than median at Avg. Prec.    25 (worst 2)  24 (worst 1)

Table 2: Comparison with the median at average precision

Table 2 compares our runs against the published median runs. We notice that, for both runs, the numbers of topics better and worse than the median are roughly the same.

Run-Id       P5      P10     P15     P30     Exact   Avg. Prec.
irit1men2a   0.3750  0.3250  0.2900  0.2433  0.1996  0.1519
irit2men2a   0.3950  0.3400  0.3017  0.2500  0.2284  0.1545

Table 3: Comparison between the merging strategies

Table 3 compares the merging strategies. It can be seen that the naive strategy is slightly better than the normalised strategy on the top documents and at exact precision, but there is no difference at average precision. Nothing was gained from the normalised strategy.

The impact of the merging strategy. Table 4 shows the results per language pair (for example, E2F means English queries translated to French and compared to the French documents).

Pair language     P5      P10     P15     P30     Exact   Avg. Prec.
E2F (34 queries)  0.2941  0.2118  0.1824  0.1353  0.2185  0.2046
E2G (37 queries)  0.2378  0.2189  0.1910  0.1396  0.1683  0.1489
E2I (34 queries)  0.1882  0.1647  0.1333  0.0843  0.1877  0.1891
E2E (33 queries)  0.5091  0.4212  0.3677  0.2798  0.4490  0.4611

Table 4: Results of the pair searches

We can easily notice that the monolingual (E2E) search performs much better than all the cross-language pair (E2F, E2G, E2I) searches. Moreover, all the pair searches (except E2G) have an average precision better than that of the best multilingual run. The merging strategy caused the loss of relevant documents: Table 5 shows, for each pair list, the total number of relevant documents retrieved, the number kept in the final merged list, and the number lost when merging. Relevant documents were lost from all the pair lists.

                               E2E   E2F   E2I   E2G
Rel. retrieved by pair list    554   389   228   467
Rel. kept in the final list    500   281   152   296
Rel. lost                      54    107   76    171

Table 5: Comparison between the numbers of relevant documents in the pair and multilingual lists
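As referenced above, here is a minimal sketch of the two merging strategies, assuming each pair search returns a list of (document_id, rsv) tuples; names and structure are illustrative.

```python
def merge_naive(pair_results, k=1000):
    """Pool all pair lists and sort by raw RSV."""
    pooled = [hit for result in pair_results for hit in result]
    pooled.sort(key=lambda hit: hit[1], reverse=True)
    return pooled[:k]

def merge_normalised(pair_results, k=1000):
    """Divide each RSV by the maximum RSV of its own list, then pool."""
    pooled = []
    for result in pair_results:
        if not result:
            continue
        max_rsv = max(rsv for _, rsv in result)
        pooled.extend((doc, rsv / max_rsv) for doc, rsv in result)
    pooled.sort(key=lambda hit: hit[1], reverse=True)
    return pooled[:k]
```

Dividing by the per-list maximum is meant to make scores coming from collections of very different sizes comparable, although Table 3 shows that it did not pay off here.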
4.2 Bilingual experiment

The bilingual experiment was carried out using the free F2E dictionary plus disambiguation. The disambiguation was performed using the WAC (Word-Wide-Web Aligned Corpus) parallel corpus built by the RALI Lab (http://www-rali.iro.umontreal.ca/wac/).

                                   irit1bfr2en
better than median at Avg. Prec.   22 (best 3)
worse than median at Avg. Prec.    11 (worst 2)

Table 6: Comparative bilingual F2E results at average precision

Table 6 compares our run against the published median runs. Most queries give results better than the median, and 3 were the best.

Run-id (33 queries)  P5      P10     P15     P30     Exact   Avg. Prec.
Dico+Des.            0.3152  0.2636  0.2182  0.1636  0.2841  0.2906
Dico                 0.2788  0.2515  0.2000  0.1566  0.2685  0.2741
Impr. (%)            13      4.8     9       4.5     5.8     6

Table 7: Impact of the disambiguation

Table 7 compares the results of the run with dictionary plus disambiguation (Dico+Des.) and the run with the dictionary only (Dico). The disambiguation is effective: the average precision improves by 6%.

4.3 Monolingual experiments

Three runs were submitted in the monolingual tasks: iritmonofr, iritmonoit and iritmonoge. First of all, we clearly notice that the monolingual searches are much better than both the multilingual and the bilingual searches. Secondly, the French monolingual results seem to be better than both the Italian and the German ones, and the Italian results are better than the German ones.

Run-id                      P5      P10     P15     P30     Exact   Avg. Prec.
iritmonofr FR (34 queries)  0.4765  0.4000  0.3510  0.2637  0.4422  0.4523
iritmonoit IT (34 queries)  0.4412  0.3324  0.2490  0.1637  0.4182  0.4198
iritmonoge GE (37 queries)  0.4108  0.3892  0.3550  0.2766  0.3197  0.3281

Table 8: Comparison between the monolingual searches

These runs were done using exactly the same procedures; the only difference concerns the stemming, which was used only for French.

5 Acknowledgements

This work was supported in part by the EC through the 5th Framework Information Societies Technology programme (IRAIA Project, IST-1999-10602, http://iraia.diw.de).

References

[1] L. Ballesteros and W. B. Croft. Resolving Ambiguity for Cross-Language Retrieval. In Proceedings of the 21st ACM SIGIR Conference (SIGIR'98), pages 64-71.

[2] M. Boughanem, C. Chrisment and C. Soule-Dupuy. Query modification based on relevance backpropagation in an ad hoc environment. Information Processing and Management, April 1999.

[3] M. Boughanem, T. Dkaki, J. Mothe and C. Soule-Dupuy. Mercure at TREC-7. In Proceedings of the 7th Text REtrieval Conference (TREC-7), E. M. Voorhees and D. K. Harman (Eds.), NIST SP 500-236, Nov. 1997.

[4] S. Robertson et al. Okapi at TREC-6. In Proceedings of the 6th Text REtrieval Conference (TREC-6), D. K. Harman (Ed.), NIST SP 500-236, Nov. 1997.