<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UniNE at CLEF 2012</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mitra Akasereh</string-name>
          <email>mitra.akasereh@unine.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nada Naji</string-name>
          <email>nada.naji@unine.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacques Savoy</string-name>
          <email>jacques.savoy@unine.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Dept., University of Neuchatel</institution>
          ,
          <addr-line>Rue Emile Argand 11, 2000 Neuchatel</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>As participants in this CLEF evaluation campaign, our first objective is to propose and evaluate various indexing and search strategies for the CHiC corpus, in order to compare retrieval effectiveness across different IR models. Our second objective is to measure the relative merit of various stemming strategies when applied to the French and English monolingual tasks in the CH context. Our third objective is to assess the effectiveness of query translation methods in bilingual retrieval. To do so, we evaluated the CHiC test-collections using the Okapi model, various IR models derived from the Divergence from Randomness (DFR) paradigm, and the dtu-dtn vector-space model. We also evaluated different pseudo-relevance feedback approaches. In the bilingual task, we searched the English corpus using the French and German topics, with two different translations for each of them. For both the English and French languages, we find that word-based indexing with our light stemming procedure results in better retrieval effectiveness than the other strategies tested. When stemming was skipped, the performance variations were relatively small; for the French corpus, skipping stemming even yielded better performance than applying the light stemmer. At the bilingual level, the results show that using a combination of translation resources gives better results than a single source.</p>
      </abstract>
      <kwd-group>
        <kwd>Probabilistic IR Models</kwd>
        <kwd>Stemmer</kwd>
        <kwd>Data Fusion</kwd>
        <kwd>Cultural Heritage</kwd>
<kwd>Bilingual IR</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Cultural heritage can be defined as any man-made object or intangible feature remaining from previous societies. It can refer to artefacts, built or natural environments, traditions, languages, etc. The growing use of digital information challenges cultural heritage organizations to provide their collections in electronic format. The data may come from different sources (libraries, archives, museums, audiovisual archives, books, journals, etc.), in various languages and formats. These digital libraries should not only be created but also properly managed and assessed in order to bring the maximum utility to their users. As yet, no proper evaluation approaches are available and there is work to be done in this area. The goal of the Cultural Heritage in CLEF (CHiC) evaluation lab is thus to provide a systematic and large-scale evaluation of cultural heritage digital libraries.</p>
<p>The IR group of the University of Neuchâtel focuses, as one of its main tasks, on the design, implementation and evaluation of various indexing and search strategies for a set of different natural languages. Up to this point, we have provided a groundwork for the evaluation and comparison of different tools for monolingual IR, in different languages, using generic test-collections (e.g., newspaper articles). Our second goal is to evaluate different tools within a specific field of knowledge in order to integrate domain-specific search into our system. The aim here is to be able to evaluate the impact of document structure and query formulation on retrieval effectiveness, in order to study possible ways to improve search quality in a domain-specific search. As a third objective, we also want to integrate translation into the search process and adapt our system for bilingual and multilingual IR. Reaching these objectives has been our main motive to participate in the CHiC evaluation lab at CLEF 2012.</p>
<p>The rest of this report is organized as follows: Section 2 presents our experiment setup. Section 3 describes the results obtained during the experiment and the related analysis. Section 4 presents our official runs and finally Section 5 concludes the experiment.</p>
    </sec>
    <sec id="sec-2">
      <title>Experiment Setup</title>
      <sec id="sec-2-1">
        <title>Overview of the Task</title>
<p>In our participation in CHiC we worked on the ad-hoc retrieval task. This is a standard retrieval task in which retrieval effectiveness for individual queries is assessed. At this level the only authorized user/system interaction is blind-query expansion techniques. The expected output is a ranked list of retrieved documents for each query. The task covers monolingual, bilingual and multilingual subtasks in English, French and German. In our experiment we worked on monolingual English and French retrieval, as well as bilingual retrieval in which the French and German topics are used to search the English corpus.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Overview of the Test-collection</title>
<p>The corpus used in the CHiC test-collection is extracted from Europeana (www.europeana.eu) and is offered in three major European languages, namely English (EN), French (FR) and German (DE). Europeana is a digitized collection of Europe's cultural and scientific heritage. It provides access to over 23 million objects such as books, paintings, films, museum objects, etc., collected from more than 2,200 institutions in 33 countries. The Europeana collection is cross-domain and in multiple languages. The document metadata are mapped to a single data model. Each document consists of elements providing brief descriptions of the objects (title, keywords, description, date, provider, etc.). It is worth mentioning that some documents contain fewer of these tags than others, which sometimes leaves them with very poor content. As far as our experiment is concerned, only human-readable informative texts are of use.</p>
<p>The English corpus consists of 1,106,426 documents while the French one contains 3,635,388. A sample of the French and English documents is shown in Figures 1 and 2.
&lt;ims:metadata ims:identifier="http://www.europeana.eu/resolve/record/10105/662DC5085397837C8C8891836EA6431C4A477CB2" ims:namespace="http://www.europeana.eu/" ims:language="eng"&gt;
&lt;ims:fields&gt;
&lt;dc:identifier&gt;Orn.0446&lt;/dc:identifier&gt;
&lt;dc:subject&gt;Australian Pelican&lt;/dc:subject&gt;
&lt;dc:title&gt;Australian Pelican (Orn.0446)&lt;/dc:title&gt;
&lt;dc:type&gt;mounted specimen&lt;/dc:type&gt;
&lt;europeana:country&gt;malta&lt;/europeana:country&gt;
&lt;europeana:dataProvider&gt;Heritage Malta&lt;/europeana:dataProvider&gt;
&lt;europeana:isShownAt&gt;http://www.heritagemalta.org/sterna/orn.php?id=0446&lt;/europeana:isShownAt&gt;
&lt;europeana:language&gt;en&lt;/europeana:language&gt;
&lt;europeana:provider&gt;STERNA&lt;/europeana:provider&gt;
&lt;europeana:type&gt;IMAGE&lt;/europeana:type&gt;
&lt;europeana:uri&gt;http://www.europeana.eu/resolve/record/10105/662DC5085397837C8C8891836EA6431C4A477CB2&lt;/europeana:uri&gt;
&lt;/ims:fields&gt;
&lt;/ims:metadata&gt;</p>
        <p>
For the ad-hoc task there are 50 very short topics. These topics are mostly named entities (people, places and works) and are mainly extracted from the Europeana query logs. Thus they convey real users' information needs in a cultural heritage search context. Among the 50 French topics, 11 have no relevant documents in the collection. This number grows to 14 for the English topics. One topic from each language is shown in Figure 3. As shown in the sample below, each topic consists of a title and, sometimes, a description of the content. However, the only field that should be used for retrieval is the title.
&lt;topic lang="en"&gt;
&lt;identifier&gt;CHIC-006&lt;/identifier&gt;
&lt;title&gt;esperanto&lt;/title&gt;
&lt;description&gt;Constructed international auxiliary language&lt;/description&gt;
&lt;/topic&gt;
&lt;topic lang="fr"&gt;
&lt;identifier&gt;CHIC-004&lt;/identifier&gt;
&lt;title&gt;film muet&lt;/title&gt;
&lt;description /&gt;
&lt;/topic&gt;
&lt;topic lang="de"&gt;
&lt;identifier&gt;CHIC-025&lt;/identifier&gt;
&lt;title&gt;amerikanische sklaverei&lt;/title&gt;
&lt;description /&gt;
&lt;/topic&gt;</p>
        <p>In our experiment we applied stopword removal along with a light stemmer for both
English and French corpora. Our stopword list for English contains 571 terms while the French one has 464. These tools are freely available at members.unine.ch/jacques.savoy/clef/. The lists are composed of high-frequency terms such as determiners, prepositions, conjunctions, pronouns, and some verbal forms which convey no important meaning. The light stemmer that we used for English removes only the plural '-s' and is called the S-stemmer [
          <xref ref-type="bibr" rid="ref1">1</xref>
]. The stemmer for French removes the inflectional suffixes from the plural and feminine forms of words
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
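        <p>To make the indexing pipeline concrete, the following minimal sketch (in Python) illustrates the two steps just described: stopword filtering and the S-stemmer, implemented here with Harman's three plural rules [1]. The stopword set is only a tiny excerpt of the real 571-term list, and the function names are ours.</p>
        <p>
# Minimal sketch of the English indexing pipeline: stopword removal
# followed by the S-stemmer (remove only the plural '-s').
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is"}

def s_stem(word):
    # S-stemmer: Harman's three rules for the plural '-s'.
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"      # "skies" -> "sky"
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]            # "houses" -> "house"
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]            # "stamps" -> "stamp"
    return word

def index_terms(text):
    tokens = text.lower().split()
    return [s_stem(t) for t in tokens if t not in STOPWORDS]

# index_terms("the australian pelicans") -> ['australian', 'pelican']
</p>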
        <p>
          Our choice of these light stemmers is based on previous experiments which show
that light stemmers tend to be as effective as stemmers based on morphological
analysis [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], [
          <xref ref-type="bibr" rid="ref4">4</xref>
]. Moreover, applying stemming is not necessarily a good way to achieve high precision, which is the aim of this experiment [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>IR Models</title>
        <p>
In our experiments we tried different weighting schemes in order to compare them and identify the most effective ones in terms of achieving high precision. First we picked the dtu-dtn model [
          <xref ref-type="bibr" rid="ref6">6</xref>
] as an effective vector-space model. Second, as a probabilistic model, we used Okapi (BM25) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Then we tried three other probabilistic
models extracted from the Divergence from Randomness (DFR) family [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], namely
DFR-PL2, DFR-I(ne)C2, and DFR-I(ne)B2. The indexing weight (weight of term tj in
document di) in these models is computed as shown in Table 1, where $l_i$ is the length of document $d_i$ and $avdl$ is the average document length. For all DFR models the indexing weight is defined as
$w_{ij} = \mathrm{Inf}^{1}_{ij} \cdot \mathrm{Inf}^{2}_{ij} = -\log_{2}\left(\mathrm{Prob}^{1}_{ij}(tf_{ij})\right) \cdot \left(1 - \mathrm{Prob}^{2}_{ij}(tf_{ij})\right)$.
For the DFR-I(ne)B2 and DFR-I(ne)C2 models the first component is
$\mathrm{Inf}^{1}_{ij} = tfn_{ij} \cdot \log_{2}\left(\frac{n+1}{n_{e}+0.5}\right)$,
while for DFR-PL2 it is derived from the Poisson probability
$\mathrm{Prob}^{1}_{ij} = \frac{e^{-\lambda_{j}} \cdot \lambda_{j}^{tfn_{ij}}}{tfn_{ij}!}$,
where the normalized term frequency $tfn_{ij}$ depends on the constant $c$ and on $mean\_dl$ (the average document length).</p>
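        <p>To make these formulas concrete, the following minimal sketch computes the Inf1 components in Python, using the standard DFR term-frequency normalization from [8]; the function names are ours, and the default value of c is only a placeholder, not our tuned setting.</p>
        <p>
import math

def tfn(tf, dl, mean_dl, c=1.5):
    # DFR normalized term frequency: tf * log2(1 + c * mean_dl / dl).
    # c is a tunable constant (placeholder default).
    return tf * math.log2(1.0 + c * mean_dl / dl)

def inf1_ine(tfn_ij, n, tc):
    # Inf1 for I(ne)B2/I(ne)C2: tfn * log2((n+1)/(ne+0.5)),
    # with ne = n * (1 - ((n-1)/n)**tc); n = #documents, tc = collection frequency.
    ne = n * (1.0 - ((n - 1.0) / n) ** tc)
    return tfn_ij * math.log2((n + 1.0) / (ne + 0.5))

def inf1_pl2(tfn_ij, n, tc):
    # Inf1 for PL2: -log2(Prob1) with Prob1 = exp(-lam) * lam**tfn / tfn!,
    # lam = tc / n; lgamma handles the factorial of a real-valued tfn.
    lam = tc / n
    log_prob = -lam + tfn_ij * math.log(lam) - math.lgamma(tfn_ij + 1.0)
    return -log_prob / math.log(2.0)
</p>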
        <p>For evaluating the retrieval performance we chose the MAP (mean average precision) measure. It is computed with the TREC_EVAL program, based on a maximum of 1,000 retrieved items per query. It is important to mention that when computing the MAP, the topics with no relevant items are not taken into account (14 topics among the English ones and 11 among the French ones).</p>
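        <p>For clarity, here is a minimal sketch of the average precision computation underlying this measure; it reproduces, in simplified form, the TREC_EVAL behaviour of cutting the list at 1,000 items and skipping topics without relevant documents.</p>
        <p>
def average_precision(ranked_ids, relevant_ids, cutoff=1000):
    # Average precision of one ranked list, over the first `cutoff` items.
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids[:cutoff], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids)

def mean_average_precision(runs, qrels):
    # MAP over all topics; topics with no relevant items are skipped.
    aps = [average_precision(runs[t], qrels[t])
           for t in runs if qrels.get(t)]
    return sum(aps) / len(aps)
</p>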
        <p>In order to enhance the retrieval effectiveness we also applied a blind-query
expansion to our test. Our previous experiments on other corpora show that pseudo-relevance feedback (PRF, or blind-query expansion) tends to improve the retrieval effectiveness [
          <xref ref-type="bibr" rid="ref9">9</xref>
]. As a first approach we tried Rocchio's approach [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] with α =
0.75, β = 0.75. In this method the system expands the query by adding m terms
selected from the k best ranked documents retrieved for the original query. As a second
approach we tried an idf-based query expansion model [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The reason for trying
both approaches is that in some cases adding frequently occurring terms produces
noise and consequently Rocchio's approach does not give good results [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
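        <p>A minimal sketch of this Rocchio expansion step, assuming documents and queries are represented as simple {term: weight} dictionaries; the parameter names mirror the α and β values given above, and the helper name is ours.</p>
        <p>
from collections import defaultdict

def rocchio_expand(query, top_docs, m=10, alpha=0.75, beta=0.75):
    # Blind query expansion: q' = alpha*q + beta*centroid(top-k docs),
    # keeping the m best new terms from the k best-ranked documents [10].
    centroid = defaultdict(float)
    for doc in top_docs:
        for term, w in doc.items():
            centroid[term] += w / len(top_docs)
    expanded = {t: alpha * w for t, w in query.items()}
    added = 0
    for term, w in sorted(centroid.items(), key=lambda x: x[1], reverse=True):
        if added == m:
            break
        expanded[term] = expanded.get(term, 0.0) + beta * w
        if term not in query:
            added += 1
    return expanded
</p>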
      </sec>
      <sec id="sec-2-4">
        <title>Data Fusion</title>
        <p>
In our experiment we tried to see whether or not combining different indexing schemes and IR models improves the retrieval effectiveness, as it is supposed to [
          <xref ref-type="bibr" rid="ref13">13</xref>
]. It is probable that different strategies retrieve the same relevant items in their top ranks, but different non-relevant ones. Therefore we expect that by combining ranked lists resulting from different IR models, we will obtain a list with the relevant documents at higher ranks and the non-relevant items at lower ones [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. In
order to produce this combination of ranked lists, different fusion operators can be
used. In our study we chose the Z-score scheme which tends to perform the best [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ],
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. More details about the Z-score strategy can be found in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
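        <p>A minimal sketch of this Z-score combination, assuming one {doc_id: score} dictionary per IR model; each run's scores are standardized before summing, which is one common variant of the scheme described in [14], [16].</p>
        <p>
import statistics
from collections import defaultdict

def zscore_fusion(runs):
    # Standardize each run's scores (z = (s - mean) / stdev)
    # and sum them per document; return documents by fused score.
    fused = defaultdict(float)
    for run in runs:
        scores = list(run.values())
        mean = statistics.mean(scores)
        sd = statistics.stdev(scores) if len(scores) > 1 else 1.0
        for doc_id, s in run.items():
            fused[doc_id] += (s - mean) / sd
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)
</p>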
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results &amp; Analysis</title>
      <sec id="sec-3-1">
        <title>Monolingual Retrieval</title>
        <p>For the monolingual ad-hoc task, we tested our system on the English and French corpora. Tables 2 and 3 show the Mean Average Precision (MAP) for, respectively, the English and French corpora. For both languages, we tried different IR models while applying a light stemmer (Section 2.3) and compared these results with the ones obtained when stemming is ignored. When using the Okapi model, the avdl (average document length) is set to 181 for the English corpus and 169 for the French one, the constant k1 to 1.2 for both languages, and we tried three different values for the constant b: 0.5, 0.7 and 0.9.</p>
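        <p>For reference, a minimal sketch of the Okapi BM25 weight with the constants given above (k1 = 1.2, avdl = 181 for English); the idf component shown is one common formulation [7], not necessarily the exact variant in our system.</p>
        <p>
import math

def bm25_term_weight(tf, df, N, dl, avdl=181.0, k1=1.2, b=0.5):
    # Okapi BM25 contribution of a single query term:
    # tf: term frequency in the document, df: document frequency,
    # N: number of documents, dl: document length.
    K = k1 * ((1.0 - b) + b * dl / avdl)
    idf = math.log((N - df + 0.5) / (df + 0.5))
    return idf * ((k1 + 1.0) * tf) / (K + tf)
</p>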
        <p>As the results show, for the English corpus the DFR-I(ne)B2 model achieves the highest MAP, while the best performing model for French is the Okapi model (with b=0.5). The results show that applying the light stemmer improves search effectiveness for the English language, which is not the case for the French collection. As can be seen in Table 3, we achieved a higher MAP when ignoring the stemming phase for the French language. A query-by-query analysis of the results reveals some examples where stemming misleads the retrieval. In Topic #21 the title “chardonne” (Jacques Chardonne, a French writer, or a place in Switzerland) is indexed as “chardon” (after applying the light stemmer), which leads the system to retrieve non-relevant documents in its top ranks (in which “chardon” refers to a flower), such as:
─ Etude de feuilles de echirops, de sphoerophalus, chardon cultivé, de chardon
sauvage de la mer, de fleur lilas, de chardon sauvage
─ Sujet ou décor : représentation végétale (fleur, chardon) ; chardon bleu ; Etude
de chardon fleuri
─ Chardons sur la côte rocheuse
As another example we can mention Topic #9, for which the title “îles malouines” changes to “malouin” after stemming, resulting in the retrieval of non-relevant documents (where “Malouin” is a proper name) in the top ranks, such as:
─ L'Avare, comédie de Molière en 5 actes, mise en vers, par A. Malouin
─ villas de la Malouine</p>
        <p>Table 4 contains the MAP obtained when applying pseudo-relevance feedback. These results reveal that in this experiment the PRF technique did not enhance the retrieval performance. The reason is probably that we are dealing with relatively short documents (the average number of distinct indexing terms per document is about 54 for English and 56 for French).</p>
        <p>In Table 5 we can see the results of our data fusion approach for the English corpus. Combining different result lists slightly enhances the performance in some cases. However, the difference between the MAP obtained for each model separately and for the combined one is rather small.</p>
        <p>[Tables 4 and 5: the pseudo-relevance feedback settings tested (5, 10 or 20 documents; 10 or 30 terms) and the data fusion combinations of the DFR-I(ne)B2, DFR-PL2, DFR-I(ne)C2, dtu-dtn and Okapi (b=0.9) models.]</p>
      </sec>
      <sec id="sec-3-2">
        <title>Bilingual Retrieval</title>
        <p>
In our bilingual retrieval we used the German and French topics to search the English corpus. Our approach was based on query translation (QT): we produced English translations of the German and French topics and then launched the search on the English corpus. To translate the queries we used two different strategies. First we used Google Translate, which seems to give reasonable results when dealing with very short query formulations [
          <xref ref-type="bibr" rid="ref17">17</xref>
]. As a second approach we used a combination of Wikipedia and Google, considering that a combination of translation strategies slightly improves the retrieval performance [
          <xref ref-type="bibr" rid="ref16">16</xref>
]. The results for the bilingual retrieval are shown in Tables 6 and 7. We can see that using the combination of Google and Wikipedia results in better performance, even though the difference is not remarkable.
        </p>
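        <p>A minimal sketch of how two translation sources can be merged into a single query; here we simply take the union of the terms produced by each resource, keeping their first-seen order. This is an illustrative simplification, not the exact combination scheme of [16].</p>
        <p>
def combine_translations(*translations):
    # Merge the terms of several candidate translations into one query,
    # dropping duplicates while preserving order.
    seen, merged = set(), []
    for t in translations:
        for term in t.lower().split():
            if term not in seen:
                seen.add(term)
                merged.append(term)
    return " ".join(merged)

# Hypothetical example for Topic #5:
# combine_translations("stamp", "postage stamp") -> "stamp postage"
</p>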
        <p>The topics used in this collection are mostly named entities, and only the title is used for the search, which makes the translation easier and less critical. As a result there are not many differences between the translations produced by the two strategies. However, by inspecting the results in detail we can find some cases in which a better translation led to better retrieval. When translating Topic #5 (“briefmarke”) from German to English, Google gives the word “stamp”, versus “postage stamp” resulting from the Google and Wikipedia combination. As a result, the system returns 9 relevant documents among its first 10 ranks when searching “postage stamp”, while with “stamp” the first relevant document only appears at rank 82. Using the French version of the same topic (“timbre poste”), Google gives “stamp post”, versus “postage stamp” from the combination method. Here again the system retrieves 9 relevant documents among its first 10 ranks using “postage stamp”, while searching “stamp post” retrieves 5 relevant documents among the first 10, with the first relevant one at rank 5.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Official Runs</title>
      <p>Table 8 summarizes our twelve official runs. We submitted four runs for the
English monolingual ad-hoc task and four French monolingual ad-hoc runs. For the bilingual ad-hoc task we submitted two runs using French topics to retrieve English documents and two runs using German topics, again on the English corpus. In each run we used our different selected models, while applying our light stemmers or alternatively skipping the stemming phase. In some cases we applied a pseudo-relevance feedback strategy [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] to evaluate its impact on the system's performance. We also tried to
merge different models into a single ranked list using the Z-score scheme [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] in
order to improve the retrieval effectiveness.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>The results obtained in the CLEF 2012 CHiC lab show that the models derived from the
Divergence from Randomness (DFR) family yield the best retrieval effectiveness regardless of the underlying language and test-collection. Applying DFR-I(ne)B2 and DFR-PL2 to both the French and English corpora produced a high MAP compared to the other tested models. Our results reveal that the Okapi model (with b=0.5) also tends to be an effective model. The remaining question is how to define the best values for the underlying constants.
        </p>
      <p>Our experiment shows that applying a light stemmer (removing only the plural '-s') for English helps to achieve better results than when the stemming phase is skipped. On the contrary, using our light stemmer for French (removing plural and feminine suffixes) does not seem to enhance the retrieval performance. A simpler stemmer for the French language may produce better effectiveness than the applied light stemmer.</p>
      <p>Considering the results, we can also conclude that when dealing with relatively short documents, blind-query expansion is not a useful method for improving the retrieval effectiveness. In such cases, it seems difficult to select the most appropriate terms to be included in the expanded query.</p>
      <p>Finally, our results from the bilingual search confirm the effectiveness of the DFR-I(ne)B2 model and the S-stemmer (used for English). Furthermore, they show that a combined translation strategy leads to better results than a single one, even though in our experiment, with very short topics (mostly named entities), the difference between the various translation methods is not remarkable.</p>
      <p>Acknowledgements. This work was supported in part by the Swiss National Science Foundation under Grant #200020-129535/1.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>D.K.</given-names>
          </string-name>
          :
<article-title>How effective is suffixing?</article-title>
          .
          <source>JASIS</source>
          .
          <volume>42</volume>
          (
          <issue>1</issue>
          ),
          <fpage>7</fpage>
          -
          <lpage>15</lpage>
          (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A stemming procedure and stopword list for general French corpora</article-title>
          .
          <source>JASIS</source>
          .
          <volume>50</volume>
          ,
          <fpage>944</fpage>
          -
          <lpage>952</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Savoy</surname>
          </string-name>
, J.:
          <article-title>Light Stemming Approaches for the French, Portuguese, German and Hungarian Languages</article-title>
          .
          <source>Proceedings ACM-SAC</source>
          ,
          <fpage>1031</fpage>
          -
          <lpage>1035</lpage>
          . The ACM Press, (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fautsch</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savoy</surname>
            <given-names>J</given-names>
          </string-name>
          .:
<article-title>Algorithmic Stemmers or Morphological Analysis: An Evaluation</article-title>
          .
          <source>JASIST</source>
          .
          <volume>60</volume>
          ,
          <fpage>1616</fpage>
          -
          <lpage>1624</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Savoy</surname>
            <given-names>J.</given-names>
          </string-name>
,
          <string-name>
            <surname>Rasolofo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Report on the TREC 11 Experiment: Arabic, Named Page and Topic Distillation Searches</article-title>
          .
          <source>In: Proceedings of the Eleventh Text Retrieval Conference TREC-2002</source>
          , pp.
          <fpage>765</fpage>
          -
          <lpage>774</lpage>
          . NIST Special Publication (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Singhal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
:
          <article-title>AT&amp;T at TREC-6</article-title>
          .
          <source>ACM Conference on Research and Development in Information Retrieval</source>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          . ACM/SIGIR (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Beaulieu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Experimentation as a way of life: Okapi at TREC</article-title>
          .
          <source>Information Processing &amp; Management</source>
          .
          <volume>36</volume>
          (
          <issue>1</issue>
          ),
          <fpage>95</fpage>
          -
          <lpage>108</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Amati</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp; van
          <string-name>
            <surname>Rijsbergen</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>Probabilistic models of information retrieval based on measuring the divergence from randomness</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          .
          <volume>20</volume>
          (
          <issue>4</issue>
          ),
          <fpage>357</fpage>
          -
          <lpage>389</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Akasereh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savoy</surname>
          </string-name>
          , J.:
          <article-title>Ad Hoc Retrieval with Marathi Language</article-title>
          . Working notes,
          <source>Forum for Information Retrieval Evaluation</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singhal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salton</surname>
          </string-name>
          , G.:
          <article-title>New Retrieval Approaches Using SMART</article-title>
          .
          <source>Proceedings TREC-4</source>
          ,
          <fpage>25</fpage>
          -
          <lpage>48</lpage>
          . NIST Publication #
          <fpage>500</fpage>
          -
          <lpage>236</lpage>
          , Gaithersburg, (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Abdou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
<string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Searching in Medline: Stemming, Query Expansion, and Manual Indexing Evaluation</article-title>
          .
          <source>Information Processing &amp; Management</source>
          .
          <volume>44</volume>
          ,
          <fpage>781</fpage>
          -
          <lpage>789</lpage>
          , (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Peat</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willett</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems</article-title>
.
          <source>JASIS</source>
          .
          <volume>42</volume>
          ,
          <fpage>378</fpage>
          -
          <lpage>383</lpage>
          , (
          <year>1991</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Vogt</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Cottrell</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          :
          <article-title>Fusion via a linear combination of scores</article-title>
          .
          <source>IR Journal</source>
          .
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>151</fpage>
          -
          <lpage>173</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Savoy</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Data Fusion for Effective European Monolingual Information Retrieval</article-title>
          .
          <source>CLEF 2004. LNCS</source>
          , vol.
          <volume>3491</volume>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>244</lpage>
          . Springer, Heidelberg (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Dolamic</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fautsch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savoy</surname>
          </string-name>
, J.:
          <article-title>UniNE at CLEF 2008: TEL, and Persian IR</article-title>
          .
          <source>CLEF 2008. LNCS</source>
          , vol.
          <volume>5706</volume>
          , pp.
          <fpage>178</fpage>
          -
          <lpage>185</lpage>
          . Springer, Heidelberg (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berger</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
<article-title>Selection and Merging Strategies for Multilingual Information Retrieval</article-title>
          .
          <source>CLEF 2004. LNCS</source>
          , vol.
          <volume>3491</volume>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>37</lpage>
          . Springer, Heidelberg (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Dolamic</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savoy</surname>
            <given-names>J</given-names>
          </string-name>
          .:
<article-title>How effective is Google's translation service in search?</article-title>
          .
          <source>Communications of the ACM</source>
          .
          <volume>52</volume>
          ,
          <fpage>139</fpage>
          -
          <lpage>143</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>