<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>XRCE's Participation to CLEF 2007 Domain-specific Track</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Stephane Clinchant and Jean-Michel Renders Xerox Research Centre Europe</institution>
          ,
          <addr-line>6 ch. de Maupertuis, 38240 Meylan</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2006</year>
      </pub-date>
      <fpage>334</fpage>
      <lpage>342</lpage>
      <abstract>
        <p>Our participation in CLEF 2007 (Domain-specific Track) was motivated this year by the assessment of several query translation and expansion strategies that we recently designed and developed. One line of research and development was to use our own Statistical Machine Translation system (called Matrax) and its intermediate outputs to perform query translation and disambiguation. Our idea was to benefit from Matrax's flexibility to output more than one plausible translation, and to train its Language Model component on the CLEF 2007 target corpora. The second line of research consisted in designing algorithms to adapt an initial, general probabilistic dictionary to a particular (query, target corpus) pair; this constitutes an extreme viewpoint on the “bilingual lexicon extraction and adaptation” topic that we have been investigating for more than six years. For this strategy, our main contributions lie in a pseudo-feedback algorithm and an EM-like optimisation algorithm that realise this adaptation. A third axis was to evaluate the potential impact of “Lexical Entailment” models in a cross-lingual framework, as they had only been used in a monolingual setting up to now. Experimental results on the CLEF 2007 corpora (domain-specific track) show that the dictionary adaptation mechanisms are quite effective in the CLIR framework, exceeding in certain cases the performance of much more complex Machine Translation systems, and even the performance of the monolingual baseline. In most cases, Lexical Entailment models, used as query expansion mechanisms, also turned out to be beneficial.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We can distinguish at least two families of approaches to query translation. The first is to use
Machine Translation (MT) systems (such as Babylon, Systran, etc.); the second is to rely on multilingual
dictionaries or lexicons. MT systems aim at translating a source sentence into
a target sentence, and are built to produce well-formed, grammatical sentences. However,
most information retrieval models (and users’ queries) do not rely today on proper syntax: this is
the bag-of-words hypothesis. A query is a set of terms, and no use is made of the order or the
syntax of the query, if any. One need not translate the query into a correct sentence:
a rough term-to-term translation can be sufficient to capture the concept of a query. Hence,
term-to-term translation relies on bilingual dictionaries, and cross-lingual information retrieval has been
concerned with the extraction of bilingual dictionaries on the one hand, and with algorithms to
obtain the best translation of a query from a dictionary on the other hand.</p>
      <p>
        The first and naive use of a dictionary is to use all translations (possibly weighted) of
a query word. Albeit simple, this approach does not address the polysemy of words. A classical
example is the translation of the English word bank: bank can refer either to a financial institution
or to the edge of a river. Choosing the right translation of a query term can be obvious given the
context of the complete query. If one were to translate the word bank in a query and also observed
the word account, then the translation would no longer be ambiguous. Note, though, that the retrieval
process is a disambiguating process in itself, in that spurious translations are generally filtered
out simply because it is very unlikely that they co-occur with the other translations. Several
approaches [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15 ref8 ref9">15, 12, 8, 13, 14, 9</xref>
        ] resolve the translation of a query with the notion of coherence.
Each query term has candidate translation terms, and a co-occurrence statistic can be computed
between all the candidate translation terms; an optimisation algorithm is then used to solve a
maximum-coherence problem. The idea is that the query defines a lexical field: the more likely
a candidate is to belong to this lexical field, the better it is as a translation.
      </p>
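As an illustration, the maximum-coherence idea can be sketched with a toy exhaustive search (the candidate lists and co-occurrence scores below are invented for the example; the cited approaches use corpus statistics and more careful optimisation):

```python
from itertools import product

def most_coherent_translation(candidates, cooc):
    """Pick one translation per query term so that the total pairwise
    co-occurrence score between the chosen translations is maximal.
    `candidates`: one list of candidate translations per query term.
    `cooc`: dict mapping frozenset({t1, t2}) to a co-occurrence score."""
    best_choice, best_score = None, float("-inf")
    # Exhaustive search; fine for short queries with few candidates.
    for choice in product(*candidates):
        score = sum(cooc.get(frozenset({a, b}), 0.0)
                    for i, a in enumerate(choice)
                    for b in choice[i + 1:])
        if score > best_score:
            best_choice, best_score = choice, score
    return list(best_choice)

# Toy example: translating the English query "bank account" into French.
candidates = [["banque", "rive"], ["compte"]]   # invented candidate sets
cooc = {frozenset({"banque", "compte"}): 5.0,   # invented statistics
        frozenset({"rive", "compte"}): 0.1}
print(most_coherent_translation(candidates, cooc))  # ['banque', 'compte']
```

Here "compte" pulls the translation of "bank" towards "banque", the financial sense, exactly as in the account example above.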
    </sec>
    <sec id="sec-2">
      <title>Cross-lingual Information Retrieval and Language Modelling</title>
      <p>We will first introduce the standard monolingual language modeling approach to information
retrieval. Then, we will present the classical extensions to cross-lingual information retrieval.</p>
      <p>
        The core idea of language models is to determine the probability P(q|d), the probability that
the query would be generated from a particular document. Formally, given a query q, the language
model approach to IR [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] scores documents d by estimating P(q|d), the probability of the query
according to some language model of the document. Using an independence assumption, for a
query q = {q1, . . . , qℓ}, we get:
      </p>
      <p>P(q|d) = ∏_{i=1}^{ℓ} P(qi|d)    (1)</p>
      <p>
        We assume that for each document there exists some parameter θd, which is a probability
distribution over words, that is, a language model. With a slight abuse of notation, we write P(q|d) ≡ P(q|θd). Standard language
models in information retrieval are multinomial distributions: the language model of a document
is defined by its parameter vector θd, whose dimension is the size of the vocabulary. As this
multinomial parameter is normalised (its components sum up to one), another notation is
used: θdw = P(w|d).
      </p>
      <p>For each document d, a simple language model can be obtained by considering the frequency
of words in d, PML(w|d) ∝ #(w, d) (this is the Maximum Likelihood, or ML, estimator). The
probabilities are smoothed by the corpus language model PML(w|C) ∝ ∑_d #(w, d). The resulting
language model is:</p>
      <p>P(w|d) = λ PML(w|d) + (1 − λ) PML(w|C)    (2)</p>
      <p>The reasons for smoothing are twofold. First, a word can be present in a query but absent from a
document; this does not make the word impossible, and the document model should still give it a
probability. The second reason is that smoothing plays a role similar to the Inverse Document Frequency:
it implicitly renormalises the frequency of a word in a document with respect to its
occurrence in the corpus. Other smoothing methods (Dirichlet smoothing,
Absolute Discounting, ...) could be applied and can be found in [20]. The Query Likelihood approach above
gives an intuitive view of how language models work in information retrieval. Other, equivalent
ranking functions can be considered and lead to the same ranking as the Query Likelihood
formulation; for example, the KL-divergence and the Cross-Entropy functions can also be used in
information retrieval. Let θq be the multinomial parameter of the language model of a query q, and θd
the language model of a document d; the cross-entropy between these two objects is:</p>
      <p>CE(θq|θd) = ∑_w P(w|q) log P(w|d) = ∑_w θqw log θdw</p>
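A minimal sketch of this query-likelihood scoring, combining equations (1) and (2) with Jelinek-Mercer smoothing (the two toy documents and the λ value are invented for the example):

```python
import math
from collections import Counter

def smoothed_prob(w, doc_counts, corpus_counts, lam=0.5):
    """Eq. (2): P(w|d) = lam * P_ML(w|d) + (1 - lam) * P_ML(w|C)."""
    p_ml_d = doc_counts[w] / sum(doc_counts.values())
    p_ml_c = corpus_counts[w] / sum(corpus_counts.values())
    return lam * p_ml_d + (1 - lam) * p_ml_c

def query_log_likelihood(query, doc_counts, corpus_counts, lam=0.5):
    """Eq. (1) in log space: log P(q|d) = sum_i log P(q_i|d)."""
    return sum(math.log(smoothed_prob(q, doc_counts, corpus_counts, lam))
               for q in query)

# Toy corpus of two documents (invented).
docs = {"d1": Counter("bank account money bank".split()),
        "d2": Counter("river bank water".split())}
corpus = sum(docs.values(), Counter())

query = ["bank", "account"]
ranked = sorted(docs, reverse=True,
                key=lambda d: query_log_likelihood(query, docs[d], corpus))
print(ranked)  # d1 ranks above d2 for this query
```

Note how smoothing lets d2, which does not contain "account", still receive a finite (if low) score, instead of a zero probability.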
      <p>As far as cross-lingual IR is concerned, the core idea remains the same: modelling the probability
of the query given the document. Let qs be the query in some source language, ws a word in the
source language, dt a document in the target language, wt a word in the target language, and P(wt|ws)
the probability that word ws is translated into wt. We can distinguish two methods:</p>
      <p>
        The first method, which we will refer to as CL LM1, translates the query into a query language model
in the target language [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Then a monolingual search is performed, using a ranking criterion
such as the Cross-Entropy:
      </p>
      <p>CE(qs|dt) = ∑_{wt} P(wt|qs) log P(wt|dt)    (3)
= ∑_{wt,ws} P(wt|ws, qs) P(ws|qs) log P(wt|dt)    (4)
≈ ∑_{wt,ws} P(wt|ws) P(ws|qs) log P(wt|dt)    (5)</p>
      <p>
        The second method, which we will refer to as CL LM2 [
        <xref ref-type="bibr" rid="ref11 ref2">2, 11</xref>
        ], models the translation from the document
side: a language model of the document is built in the source language and compared to the
query:
      </p>
      <p>CE(qs|dt) = ∑_{ws} P(ws|qs) log P(ws|dt)
≈ ∑_{ws} P(ws|qs) log(∑_{wt} P(ws|wt) P(wt|dt))</p>
      <p>Both models are based on probabilistic dictionaries, but the first model uses a dictionary from
the source language to the target language, whereas the second model uses a dictionary from target to
source. In CL LM1, the translation process is independent of the document, whereas, in CL LM2,
one tries to model the probability that a particular document is translated and “distilled” into
the original query.</p>
      <p>In the following part of this report, we will adopt the viewpoint of model CL LM1, for two
reasons: first, it is simpler to use, because it only requires a monolingual retrieval system, unlike
CL LM2, which needs a dedicated cross-lingual system. The second reason is a benchmarking one:
we wanted to compare our results with Machine Translation tools, which operate in that direction
(translating the query from source to target) for obvious practical reasons.</p>
    </sec>
    <sec id="sec-3">
      <title>Dictionary Adaptation</title>
      <p>The main idea of dictionary adaptation is to be able to adapt the entries of a dictionary to a query
and a target corpus. Formally, let qs = (ws1, . . . , wsl) be the query in the source language. Ideally,
we are looking for P(wt|qs), the probability of a target term given the source query. As we adopt
the CL LM1 model, this leads us to focus on P(wt|ws, qs), the probability that source
term ws translates to wt, given the context of the query. Computing this probability would require
clearly defining the context of a query, or its associated “concept”. The next question is how
we can find the context of the query in the target language. We argue that relevant documents in the
target language contain such information; in other words, the coherence is implicitly present in
relevant documents. Even if relevant documents are obviously not known in advance, they can be
found by active relevance feedback or pseudo-relevance feedback (PRF). Hence, our algorithm will
adapt the probabilities in the dictionary based on the set of (pseudo-)relevant documents. Before
going into the details of this adaptation mechanism, let us first review monolingual PRF techniques
in the framework of Language Modelling-based retrieval; their extension to the cross-lingual case
will provide us with the adaptation method.</p>
      <sec id="sec-3-1">
        <title>Monolingual PRF within the language modeling framework</title>
        <p>Traditional methods, such as Rocchio’s algorithm, extract terms from feedback documents and
add them to the query. The language modeling approach to information retrieval goes beyond
this approach: it extracts a probability distribution over words from the feedback documents. We
shall first present the general setting for pseudo-feedback with monolingual language models.
• Let C be a corpus and dk a document of the corpus.
• Let n be the number of top documents selected after a first retrieval.
• Let F = (d1, . . . , dn) be the feedback documents.
• Let θF, a multinomial parameter, stand for the distribution of relevant terms in F: in
other words, θF is a probability distribution over words, peaked on relevant terms.
Feedback methods have two aspects: first, extracting the relevant information (identification of θF)
and, secondly, enriching the query.</p>
        <sec id="sec-3-1-1">
          <title>Estimation of θF</title>
          <p>To estimate θF from the feedback documents F, we present as an example the method of Zhai and
Lafferty [21]. They propose the following generative process for F:
• For i from 1 to n, draw document di following the distribution:</p>
          <p>– di ∼ Multinomial(ldi, λθF + (1 − λ)P(.|C))
so that we have the following global likelihood:</p>
          <p>P(F|θF) = ∏_k ∏_w (λθFw + (1 − λ)P(w|C))^{c(w,dk)}    (6)</p>
          <p>P(w|C) is the word probability built upon the corpus; λ is a fixed parameter, which can be understood
as a noise parameter for the distribution of terms; c(w, dk) is the number of occurrences of term w
in document dk. Finally, θF is learned by optimising the data log-likelihood with an Expectation-Maximization algorithm.</p>
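The EM iteration for this mixture model can be sketched as follows (a minimal sketch: the feedback documents and background probabilities are invented, and the uniform initialisation is one possible choice):

```python
from collections import Counter

def estimate_feedback_model(feedback_docs, corpus_prob, lam=0.5, n_iter=30):
    """EM for the mixture model of eq. (6): each word of a feedback document
    is drawn from lam * theta_F + (1 - lam) * P(.|C); theta_F is re-estimated
    from the posterior responsibility of the feedback component."""
    counts = sum((Counter(d) for d in feedback_docs), Counter())
    vocab = list(counts)
    theta = {w: 1.0 / len(vocab) for w in vocab}  # uniform initialisation
    for _ in range(n_iter):
        # E-step: posterior that word w was generated by theta_F.
        post = {w: lam * theta[w] /
                   (lam * theta[w] + (1 - lam) * corpus_prob[w])
                for w in vocab}
        # M-step: theta_F(w) proportional to c(w, F) * posterior(w).
        new = {w: counts[w] * post[w] for w in vocab}
        z = sum(new.values())
        theta = {w: v / z for w, v in new.items()}
    return theta

# Toy feedback set (invented): "model" is topical, "the" is background noise.
docs = [["model", "model", "the"], ["model", "the", "the"]]
background = {"model": 0.05, "the": 0.5}  # invented corpus probabilities
theta_F = estimate_feedback_model(docs, background)
# theta_F gives "model" the larger share, since the background
# explains "the" well but "model" poorly.
```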
        </sec>
        <sec id="sec-3-1-2">
          <title>Updating the original query</title>
          <p>Now suppose the relevance language model θF has been estimated; how can we add the information
from the feedback to the query? Within the language model approach to IR, a query is represented
as a probability distribution over words (in practice a multinomial distribution estimated
by maximum likelihood). If θQ is the multinomial parameter for a query Q, then the
ML estimate of θQw is equal to the proportion of occurrences of word w in the query Q. To come back to the
initial question of how to combine the information from the initial query and the feedback documents, a
simple method is to mix the parameters of their distributions:</p>
          <p>θnew query = α θold query + (1 − α) θF    (7)</p>
          <p>In practice, we restrict θF to its top N words, by considering all other values of this vector
as null.</p>
          <p>We can note that more elaborate techniques exist [18]. The value of α is set
experimentally and adapted to each collection. The robustness of the estimation of θF has a significant
impact on the value of α. Lastly, the value of α can be understood as a trade-off between precision
and recall.</p>
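The interpolation of equation (7), with the top-N cut on θF, can be sketched as follows (the model values are invented; renormalising the truncated θF is one possible design choice, not prescribed by the text):

```python
def interpolate_query(theta_query, theta_F, alpha=0.6, top_n=2):
    """theta_new = alpha * theta_query + (1 - alpha) * theta_F (eq. 7),
    keeping only the top_n words of theta_F and treating the rest as null."""
    top = dict(sorted(theta_F.items(), key=lambda kv: kv[1],
                      reverse=True)[:top_n])
    z = sum(top.values())
    top = {w: v / z for w, v in top.items()}  # renormalise the truncated model
    words = set(theta_query) | set(top)
    return {w: alpha * theta_query.get(w, 0.0) + (1 - alpha) * top.get(w, 0.0)
            for w in words}

theta_q = {"bank": 0.5, "account": 0.5}               # invented query model
theta_F = {"loan": 0.4, "interest": 0.3, "the": 0.3}  # invented feedback model
new_q = interpolate_query(theta_q, theta_F)
# "loan" and "interest" enter the query model; "the" is cut by the top-N filter
```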
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Extension to the Cross-lingual case: Dictionary Adaptation</title>
        <p>We generalise the monolingual mixture model for feedback to the case of CLIR: the input data are
an initial source query language model p(ws|qs) and an initial dictionary p(wt|ws). The monolingual
mixture model can be interpreted as follows: for each term in a document, first choose between the
relevant topic model and the corpus language model, then generate the term from
the chosen mixture component. We extend this process by choosing, for each term
in a feedback document, either a source query term ws (instead of the relevant topic model)
or the target corpus (C) language model. If a query term ws has been chosen, then a target term wt is generated
with some unknown (ideal) probabilistic dictionary. Mathematically, this gives:
• For i from 1 to n, draw document di with:</p>
        <p>– di ∼ Multinomial(ldi, λ ∑_{ws} θs p(ws|qs) + (1 − λ) p(.|C))
where ldi is the length of document di.</p>
        <p>In this framework, θs can be interpreted as an adapted translation probability: θst ≡
p(wt|ws, qs). But it can also be interpreted as a probability distribution (multinomial parameter)
over the vocabulary of target terms; it is like a language model, but associated with a specific word ws.
To understand the connections between the monolingual model and the bilingual model, we can
make an analogy of the form θF ≡ ∑_{ws} θs p(ws|qs). Note that the same algorithm realises both
the query enrichment and the dictionary adaptation. Note also that the translation/adaptation is
limited to the words of the query (ws) if we adopt a simple maximum likelihood language model
for the query (as is assumed in the following). Lastly, but importantly, the role of the initial
(probabilistic), non-adapted dictionary lies in providing the algorithm with a good starting
candidate solution for θs.</p>
        <p>From this generative process, it remains to solve the problems of estimating the parameters
(θs)ws∈Q and of generating the new query language model (on the target side).</p>
        <sec id="sec-3-2-1">
          <title>Estimation of adapted translation probabilities</title>
          <p>We now proceed to the estimation of the parameters (θs)ws∈Q with a maximum likelihood approach,
using an EM-like algorithm. Recall that, as in the monolingual setting, λ is a fixed parameter, and
p(ws|qs) is also known, since it represents the distribution of words in a particular query.</p>
          <p>First, the model likelihood can be written in the equivalent form:</p>
          <p>P(F|θ) = ∏_k ∏_{wt} (λ ∑_{ws} θst p(ws|qs) + (1 − λ) P(wt|C))^{c(wt,dk)}    (8)</p>
          <p>We can maximise the log-likelihood with an EM algorithm. Let twd be the hidden random variable
whose value is 1 if word w in document d has been generated by p(.|C). Let rwd indicate
which query word ws has been chosen (it is only defined when twd = 0). Let θst = p(wt|ws, qs) be the unknown
parameters of the model.</p>
          <p>The E-step gives:</p>
          <p>p(twd = 1|F, θ(i)) = (1 − λ)P(wt|C) / (λ ∑_{ws} θst(i) p(ws|qs) + (1 − λ)P(wt|C))    (9)
p(twd = 0|F, θ(i)) = 1 − p(twd = 1|F, θ(i))    (10)
p(rwd = ws|F, θ(i), twd = 0) ∝ p(ws|qs) θst(i)    (11)</p>
          <p>As usual, in the M-step, we optimise a lower bound of the expected log-likelihood:</p>
          <p>Q(θ(i+1), θ(i)) = ∑_{d,w} c(w, d) (p(twd = 1|θ(i)) log((1 − λ)p(w|C))
+ p(twd = 0|θ(i)) ∑_{ws} p(rwd = ws|θ(i)) log(p(ws|qs) θst(i+1)))    (12)</p>
          <p>Differentiating w.r.t. θ(i+1) and adding a Lagrange multiplier (for the constraint ∑_{wt} θst = 1) gives the M-step:</p>
          <p>θst(i+1) ∝ ∑_d c(wt, d) p(twd = 0|F, θ(i)) p(rwd = ws|F, θ(i), twd = 0)    (13)</p>
          <p>As already mentioned, θ(0) is given by the corresponding part of an initial (probabilistic),
non-adapted dictionary.</p>
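A minimal sketch of this adaptation EM (equations 8-13), assuming, as in the text, that only the non-zero entries of the initial dictionary are updated; the query, dictionary, feedback documents and corpus probabilities are invented for the example:

```python
from collections import Counter

def adapt_dictionary(feedback_docs, query_probs, dictionary, corpus_prob,
                     lam=0.5, n_iter=20):
    """EM adaptation sketch: feedback_docs are target-language documents,
    query_probs is p(ws|qs), dictionary[ws] is the initial p(wt|ws) used as
    theta^(0), corpus_prob is P(wt|C)."""
    counts = sum((Counter(d) for d in feedback_docs), Counter())
    theta = {ws: dict(d) for ws, d in dictionary.items()}  # theta^(0)
    for _ in range(n_iter):
        new = {ws: {wt: 0.0 for wt in theta[ws]} for ws in theta}
        for wt, c in counts.items():
            # Mixture probability of wt under the current model (cf. eq. 8).
            mix = sum(query_probs[ws] * theta[ws].get(wt, 0.0) for ws in theta)
            if mix == 0.0:
                continue  # wt has no non-zero dictionary entry
            denom = lam * mix + (1 - lam) * corpus_prob.get(wt, 1e-9)
            p_t0 = lam * mix / denom          # eqs (9)-(10)
            for ws in theta:
                if wt in theta[ws]:
                    r = query_probs[ws] * theta[ws][wt] / mix  # eq (11)
                    new[ws][wt] += c * p_t0 * r               # eq (13)
        for ws in theta:
            z = sum(new[ws].values())
            if z > 0:
                theta[ws] = {wt: v / z for wt, v in new[ws].items()}
    return theta

# Toy example (all numbers invented): disambiguating English "bank" in German.
query_probs = {"bank": 1.0}
dictionary = {"bank": {"bank_fin": 0.5, "ufer": 0.5}}  # financial vs. river sense
feedback = [["bank_fin", "konto"], ["bank_fin", "geld"]]
corpus_prob = {"bank_fin": 0.01, "ufer": 0.01, "konto": 0.1, "geld": 0.1}
adapted = adapt_dictionary(feedback, query_probs, dictionary, corpus_prob)
# The adapted entry shifts almost all mass to "bank_fin",
# the sense supported by the feedback documents.
```

The adapted θst can then be plugged into equation (14) below, P(wt|qs) = ∑_ws θst P(ws|qs), to build the translated query.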
        </sec>
        <sec id="sec-3-2-2">
          <title>Query Update</title>
          <p>When the algorithm converges, giving some optimal adapted parameters θst(adapted), a new query can be
generated by using all the entries of the adapted dictionary, so no selection method
nor threshold is required to compute the new query. To make the analogy with monolingual IR:
we do not use a parameter like α (or, in a sense, we use α = 1), since we only use the dictionary
learnt by feedback. The new query language model becomes:</p>
          <p>P(wt|qs) = ∑_{ws} θst(adapted) P(ws|qs)    (14)</p>
          <p>In other words, model CL LM1 with p(wt|ws) = θst(adapted) is used to perform the retrieval.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Remarks</title>
        <p>The initial dictionary is used as the starting point for the EM algorithm. As a consequence, only
non-zero entries are used in this algorithm. During the iterations of EM, the dictionary weights
are adapted to fit the feedback documents, and hence to choose the correct translations for the query.</p>
        <p>
          In the introduction to dictionary adaptation, we argued that one should model the probability
P(wt|ws, q). In the model represented by equation 4, we made an independence assumption which
discards the query q from this latter probability. However, the query q is implicitly present in the
feedback documents, which makes it possible to learn translation probabilities from the context of the query.
The authors of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] propose a feedback method for CL LM2 that also relies on dictionary adaptation. Our method
is an extension of the classical monolingual mixture model for feedback to the cross-lingual case,
which is also a natural feedback method for CL LM1. Note, however, that the experiments of Hiemstra et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] show
that their model was unable to perform pseudo-relevance feedback, but performed very well with
active relevance feedback.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Lexical Entailment as Query Expansion Mechanism</title>
      <p>
        Lexical Entailment (LE) [
        <xref ref-type="bibr" rid="ref10 ref3 ref5">3, 10, 5</xref>
        ] models the probability that one term entails another, in a
monolingual framework. It can be understood as a probabilistic term similarity or as a unigram
language model associated to a word (rather than to a document or a query). Let u be a term in
the corpus, then lexical entailment models compute a probability distribution over terms v of the
corpus, P(v|u). These probabilities can be used in information retrieval models to enrich queries
and/or documents, with an effect similar to that of a semantic thesaurus. However,
lexical entailment is purely automatic, extracting statistical relationships from the considered
corpus. In practice, a sparse representation of P(v|u) is adopted, where we restrict v to be one of
the Nmax terms that are the closest to u according to an Information Gain metric 1 .
      </p>
      <p>
        We refer to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for all the technical and practical details of the method. Still, one important thing
to be mentioned is that the LE models P(v|u) are used as if this were a cross-lingual framework (for
instance one of the CL LM1 or CL LM2 models), i.e. as if P(v|u) were a probabilistic translation
matrix. If q = (q1, ..., ql) and if CL LM2 is chosen, this gives, using the CE criterion:
      </p>
      <p>CE(q|d) = ∑_{qi} P(qi|q) log(∑_w P(qi|w) P(w|d))    (15)</p>
      <p>P(qi|w) is the result of the Lexical Entailment model, and P(w|d) is given by equation 2. We
also used a slightly modified formula, introducing a background query-language smoothing P(qi|D).
Instead of eq. 15, the document score is now computed as:</p>
      <p>CE(q|d) = ∑_{qi} P(qi|q) log(β ∑_w P(qi|w) P(w|d) + (1 − β) P(qi|D))    (16)</p>
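The smoothed scoring of equation (16) can be sketched as follows (all the model probabilities below are invented; a uniform P(qi|q) is assumed, as for a maximum-likelihood query model without repeated terms):

```python
import math

def le_score(query, doc_lm, le_model, query_bg, beta=0.8):
    """Eq. (16): CE(q|d) = sum_qi P(qi|q) *
       log(beta * sum_w P(qi|w) P(w|d) + (1 - beta) * P(qi|D)).
    le_model[qi] maps terms w to P(qi|w) (sparse lexical entailment entries);
    doc_lm is the smoothed document model P(w|d); query_bg is P(qi|D)."""
    p_qi = 1.0 / len(query)  # uniform query model P(qi|q)
    score = 0.0
    for qi in query:
        expanded = sum(p * doc_lm.get(w, 0.0)
                       for w, p in le_model.get(qi, {}).items())
        score += p_qi * math.log(beta * expanded + (1 - beta) * query_bg[qi])
    return score

# Toy models (all probabilities invented for the example).
le_model = {"economy": {"economy": 0.6, "market": 0.3, "growth": 0.1}}
doc_lm = {"market": 0.2, "growth": 0.1, "the": 0.7}
query_bg = {"economy": 0.01}
s = le_score(["economy"], doc_lm, le_model, query_bg)
```

Note that the document scores even though it never contains "economy" itself: the entailed terms "market" and "growth" carry the match, which is the thesaurus-like effect described above.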
    </sec>
    <sec id="sec-5">
      <title>Experiments on GIRT - 2004 to 2006</title>
      <p>
        We refer to the overview paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for the description of the task, the corpora and the available
resources (see also http://www.gesis.org/en/research/information technology/girt4.htm
for specific information).
      </p>
      <p>In order to do some preliminary tunings and validations, we used the domain-specific corpus
GIRT as available in 2006 from the CLEF Evaluation Forum, as well as the 75 queries and their
relevance assessments collected from the years 2004, 2005 and 2006. In the next section, we will
present the results on the test data, namely the new GIRT corpus (extended on the English side,
by additional documents coming from the CSA corpus) and the corresponding new queries. We
used Mean Average Precision (MAP) as retrieval performance measure.</p>
      <p>For the whole collection and the queries, we used our home-made lemmatiser and word
segmenter (decompounder) for German. Classical stopword removal was performed. We used
only the title and the description of the queries.</p>
      <p>As multilingual resources, we used on the one hand the English-German GIRT Thesaurus
(considered as domain-specific, but very narrow) and, on the other hand, a probabilistic one, called
ELRAC, that is a combination of a very standard one (ELRA) and a lexicon automatically extracted
from the parallel JRC-AC (Acquis Communautaire) Corpus (see URL: langtech.jrc.it/JRC-Acquis.html)
using the Giza++ word alignment algorithm.</p>
      <p>As already mentioned, one goal of the experiments was to compare the query translation
approach using dictionary adaptation with the use of our Statistical Machine Translation system
(MATRAX). The latter needs two kinds of corpora: a parallel corpus for the alignment models,
and a corpus in the target language to learn a “language model”. We fed MATRAX with the
JRC-AC (Acquis Communautaire) Corpus for the alignment models, and with our GIRT / CSA
corpora (in the target language) for the language models. In this way, we can expect to introduce
some bias or adaptation to our target corpus in the translation process, as the Language Model
component of Matrax will favour translation and disambiguation consistent with this corpus.</p>
      <p>
        1The Information Gain, aka Generalised (or average) Mutual Information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], is used for selecting features in
text categorisation [
        <xref ref-type="bibr" rid="ref18 ref7">19, 7</xref>
        ] or detecting collocations [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Monolingual results make it possible to evaluate the performance of the cross-lingual results, as a
reference to compete with. Table 1 shows the results of the monolingual experiments. Dirichlet prior
smoothing was used with a value of 200, and PRF was applied, using the top 15 documents with
the mixture model algorithm described in 3.1. We observe a significant difference between the
behaviour of the English corpus and the German one. English documents are sparser than German
ones, which explains the retrieval deficiency.
      </p>
      <p>Table 2 shows monolingual experiments using lexical entailment models. We used the top
20 entailed terms (Nmax = 20) for each German term, and the top 10 terms for each English term
(Nmax = 10), since the English corpus is sparser than the German one. We applied LE first on
the basic query (results are given in column 2). Then the mixture model algorithm for
pseudo-feedback described in 3.1 is applied on the top 15 documents, which provides a new query, and once
again the lexical entailment model is applied. The lexical entailment model using pseudo-relevance
feedback will also be called PRF+Lexical Entailment, or Double Lexical Entailment (as actually
the top 15 documents are retrieved using a first lexical entailment step). The performance of this model
is given in column 3 of Table 1. One can see that lexical entailment models perform better than
the baseline monolingual models without feedback, and that lexical entailment techniques provide
improvements comparable to (and better than) those obtained by pseudo-relevance feedback.</p>
      <sec id="sec-5-1">
        <title>Cross-lingual Experiments</title>
        <sec id="sec-5-1-1">
          <title>Baseline</title>
          <p>Then, we perform dictionary adaptation with parameter λ = 0.5 (in equation 8) and the number
of feedback documents set to 50 (Table 3). The results show that, with dictionary adaptation,
we gain in performance for every dictionary and translation direction. We obtain a global improvement
ranging from 3% to 10%, a relative improvement from 10% to 50%, and an average gain of
6% for both directions and both dictionaries.</p>
          <p>The thesaurus used is the one provided by GIRT, and it already performs well since it is adapted
to the GIRT corpus: there is less ambiguity in this dictionary than in the standard ELRAC
dictionary. Still, the method is able to gain in precision. The interesting fact is the improvement
obtained by the ELRAC dictionary after adaptation. ELRAC is a general dictionary, not at all
adapted to the social science corpus of GIRT. The initial performance of ELRAC is worse than that obtained
using the GIRT thesaurus. However, dictionary adaptation considerably improves the query translation
process (8% average increase in MAP over both directions). This shows that a general dictionary with an
adequate adaptation mechanism can be used for a specialised corpus without a huge loss compared
to a domain-specific dictionary. Of course, domain-specific dictionaries work better, but they require
external resources, or comparable corpora to be extracted from, whereas general dictionaries are
more easily available. Beyond giving a more accurate translation, a second
reason for these improvements is that dictionaries often encode some semantic enrichment. For
example, the word area can be translated into French as région or zone.</p>
          <p>Figure 1 shows the evolution of mean average precision with an increasing number of
pseudo-feedback documents. This graph indicates that the algorithm seems to be very stable and robust
to a large set of feedback documents. One can also notice that much of the gain can be obtained
using only the top 10 documents. We believe the stability is due to the initialisation of the algorithm
with the initial dictionary, so that only non-zero entries serve as training data.</p>
          <p>Figure 2 shows the influence of the λ parameter. This parameter can be interpreted as a noise
parameter in the feedback documents. Since we restrict ourselves to non-zero entries, a better
interpretation would be as a noise parameter in the dictionary. The conclusion we can draw from
this graph is that modelling the noise is useless when only non-zero entries are used, so the
algorithm can be used with λ = 1. This parameter could have more influence if we extended the
number of feedback documents to a larger value: the data would then be noisier. The results
on the influence of the number of top documents show that they are sufficient to disambiguate
the query. However, if we were to “smooth” the zero entries of the dictionary (and thereby allow new
translation candidates that were not present in the initial dictionary), this noise parameter would
influence the performance much more. There are two problems acting at the same time: query
translation and query enrichment. Enriching the query amounts to smoothing the zero entries
of the dictionary. We believe it is more important to solve the query translation problem first, and
to enrich the query later (possibly with another, monolingual mechanism). Hence, the λ parameter
can be set to 1 without loss of performance.</p>
          <p>Table 4 shows the results of the lexical entailment model after a first step of dictionary adaptation.
To sum up, the original query is first roughly translated with an initial dictionary; then a first
retrieval is done and the dictionary is adapted to the query, so that a new translation of the query is
obtained. The baseline model is model CL LM1 using the newly translated query. Instead of
using CL LM1, the other models rely on a Lexical Entailment model. As before, Simple Lex
Entailment names the model CL LM2 with the lexical entailment model based on the information
gain. PRF Lex Ent denotes the same model, but with a step of pseudo-feedback with the mixture
model introduced previously. Once again, the lexical entailment model outperforms the baseline.
One can argue that both models, CL LM1 and CL LM2, are used alternately in the same retrieval
process. This comes from historical reasons: we first developed the lexical entailment model a
few years ago, and the dictionary adaptation model later on (for CLEF 2007). The two models were
combined afterwards. Theoretically, it could be interesting to develop a single model tackling
multilinguality and the use of a monolingual thesaurus in a single framework.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Experimental results on GIRT 07</title>
      <p>
        We now report on our participation in the Domain Specific Task at CLEF 2007, on the GIRT
and CSA corpora. Once again, we refer to [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for a precise description of the task, the corpora,
and the available resources. We submitted monolingual runs as well as bilingual runs, restricted
to English and German. Our monolingual runs mainly rely on lexical entailment models. The
bilingual runs stem from two techniques: either query translation with our home-developed
Statistical Machine Translation system called Matrax, or query translation through dictionary
adaptation.
      </p>
      </p>
      <sec id="sec-6-1">
        <title>Parameters, Nomenclature and Monolingual Runs</title>
        <p>• PRF Lexical Entailment: this is the Double Lexical Entailment model explained
before, where a first lexical entailment model is used to provide the system with an initial
set of TOPn documents, from which a mixture model for pseudo-feedback is built; a
second retrieval is then performed, based once again on the lexical entailment model applied
to the enriched query.
The bilingual retrieval model adopts the same nomenclature as in the previous sections.</p>
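The double-retrieval scheme above can be sketched in a few lines. This is a minimal, hypothetical illustration: a plain unigram language model stands in for the lexical entailment scorer, and a simple maximum-likelihood mix over the TOPn documents stands in for the paper's mixture model; the corpus and parameter values are invented.

```python
# First retrieval, pseudo-feedback term distribution from the TOPn
# documents, query enrichment, then a second retrieval on the new query.
from collections import Counter

def lm_score(query_weights, doc):
    """Unigram score: weighted sum of in-document term frequencies."""
    tf, n = Counter(doc), len(doc)
    return sum(w * (tf[t] / n) for t, w in query_weights.items())

def rank(query_weights, corpus):
    return sorted(corpus, key=lambda d: lm_score(query_weights, corpus[d]),
                  reverse=True)

def enrich(query_weights, feedback_docs, alpha=0.6, m=3):
    """Mix the query with the m most frequent feedback terms."""
    tf = Counter(t for doc in feedback_docs for t in doc)
    total = sum(tf.values())
    fb = {t: c / total for t, c in tf.most_common(m)}
    terms = set(query_weights) | set(fb)
    return {t: alpha * query_weights.get(t, 0) + (1 - alpha) * fb.get(t, 0)
            for t in terms}

corpus = {
    "d1": ["solar", "energy", "panel", "energy"],
    "d2": ["wind", "energy", "turbine"],
    "d3": ["football", "league"],
}
query = {"energy": 1.0}
top2 = rank(query, corpus)[:2]                       # first retrieval
enriched = enrich(query, [corpus[d] for d in top2])  # pseudo-feedback
final = rank(enriched, corpus)                       # second retrieval
```

The enriched query picks up related terms such as "solar" from the feedback documents while keeping most of its mass on the original terms.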
        <p>As already explained, all our bilingual runs follow the same scheme: query translation
followed by a monolingual search (most often with PRF or query expansion in the target language).
For the first step (query translation), we used either our Statistical Machine Translation system
(MATRAX) or an initial standard dictionary adapted following the strategy described in
this paper. The monolingual search component obeys the same nomenclature as in the previous
section.</p>
        <p>In order to increase the recall of what can be obtained with MATRAX, we intentionally kept the
TOP5 most plausible translations given by MATRAX and concatenated them to obtain the new
query in the target language (this indeed significantly increased retrieval performance).</p>
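The recall-oriented trick above can be sketched directly: keep the n-best translation hypotheses and concatenate them into a single bag of words. The function name and the example hypotheses are hypothetical stand-ins for Matrax's n-best output; note how terms repeated across hypotheses are naturally weighted higher.

```python
# Merge the top-n translation hypotheses into one weighted query.
from collections import Counter

def concat_nbest(nbest_translations, n=5):
    """Concatenate the top-n hypotheses; duplicates weight terms."""
    tokens = [tok for hyp in nbest_translations[:n] for tok in hyp.split()]
    return Counter(tokens)

# Illustrative n-best list (not actual Matrax output).
nbest = [
    "renewable energy sources",
    "renewable power sources",
    "sources of renewable energy",
]
query = concat_nbest(nbest)
```

Here "renewable" and "sources" appear in all three hypotheses and so dominate the resulting query, while rarer alternatives like "power" are kept with low weight.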
        <p>In order to perform lexicon adaptation, the choice of the initial dictionary is crucial to the task.
We used two initial dictionaries that were at our disposal: the first one, CsaGirt, was extracted
from the concatenation of the GIRT and CSA thesauri; the second one was ELRAC,
composed as described before. To benefit from both sources, the dictionaries were merged
hierarchically: an entry of one dictionary is added to the other only if it is not already
present in the master dictionary. The dictionary named Hier-CsaGirtElrac (abbreviation: hcge)
is obtained by giving priority to CsaGirt and then adding any Elrac
entry not already present in CsaGirt. The dictionary named Hier-ElracCsaGirt (abbreviation:
hecg) is obtained by giving priority to Elrac and then adding the
dictionary CsaGirt.</p>
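The hierarchical merge described above amounts to a one-way fill: entries of the secondary dictionary are added only when the headword is absent from the master dictionary. The sketch below assumes dictionaries map source words to weighted target candidates; the entries themselves are invented for illustration.

```python
# Hierarchical dictionary merge: master entries win; the secondary
# dictionary only fills in headwords missing from the master.
def merge_hierarchical(master, secondary):
    merged = dict(master)
    for word, translations in secondary.items():
        if word not in merged:
            merged[word] = translations
    return merged

# Toy German -> English entries (illustrative data only).
csa_girt = {"Arbeit": {"work": 0.7, "labour": 0.3}}
elrac = {"Arbeit": {"job": 1.0}, "Haus": {"house": 1.0}}

hcge = merge_hierarchical(csa_girt, elrac)  # priority to CsaGirt
hecg = merge_hierarchical(elrac, csa_girt)  # priority to Elrac
```

Swapping the argument order changes which dictionary's entry survives for shared headwords, which is exactly the difference between hcge and hecg.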
        <p>Table 8 shows the results of our bilingual runs with their mean average precision and the model
used for translation and retrieval. If no query expansion (in the target language) is done
beyond the lexical entailment model, Matrax offers the best results (but recall that Matrax is
significantly harder and more time-consuming to train than our simple dictionary extraction and
adaptation). However, it seems that, once we want to adopt more complex PRF techniques after
translation, there is a substantial advantage in using our dictionary adaptation method, which,
presumably, gives less noisy translations. Consequently, the best absolute performances are obtained
by combining (1) the hierarchical building of the initial dictionary (the order in the hierarchy
depends on the source and target languages), (2) adapting this initial dictionary with the
proposed algorithm, and (3) performing a rather sophisticated (PRF + Lexical Entailment) query
expansion/enrichment in the target language. Note that, when English is the target language,
bilingual performances are even better than monolingual ones.</p>
        <p>Table 9 shows the results of some experiments that we performed after the submission to
CLEF, but using the CLEF 2007 queries and relevance assessments. The intent of this table is to better
understand the individual effect of the basic components of our official runs. We can observe
that the monolingual pseudo-relevance feedback algorithm greatly improves the results: for German,
it boosted the mean average precision from 0.30 to 0.44. We can also see that the dictionary
adaptation works for this year's queries as well. Finally, there is still a deficiency when the target
corpus is the English one: we still believe this is due to the unbalanced nature of the documents
(German documents are longer on average and, consequently, more reliable, because they most
often contain the abstract field).</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion to GIRT Participation</title>
      <p>Our main goal this year was to validate two query translation and disambiguation strategies. The
first one relies on the use of our Statistical Machine Translation tool, especially taking benefit
from its flexibility to output more than one plausible translation and to train its Language Model
component on the CLEF07 target corpora. The second one relies on a pseudo-feedback adaptation
mechanism that performs dictionary adaptation and query expansion simultaneously.</p>
      <p>Experimental results on CLEF-2007 corpora (domain-specific track) show that the dictionary
adaptation mechanisms appear quite effective in the CLIR framework, exceeding in certain cases
the performance of much more complex Machine Translation systems and even the performance of
the monolingual baseline. The pseudo-feedback adaptation method turns out to be robust to the
number of feedback documents and relatively efficient, since we do not need to extract co-occurrence
statistics. It is also robust to noise in the feedback documents, contrary to several traditional
monolingual feedback methods whose performance decreased in our experiments. Lastly, it
enables the use of general dictionaries in a domain-specific context with performance almost as
good as that of domain-specific dictionaries.</p>
      <p>We believe that the concept of lexicon adaptation has other applications in cross-lingual
information access tasks. For instance, if there is some underlying class or category system (built
in a supervised or unsupervised way), lexicons could be adapted to a particular category or cluster.
Moreover, the adaptation model could be useful to adapt a dictionary to a user profile: from
feedback sessions, one can learn a bilingual lexicon adapted to a particular user, which has
significant applications. Our future work will focus on such aspects.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was partly supported by the IST Programme of the European Community, under the
SMART project, FP6-IST-2005-033917. The authors also want to thank Francois Pacull for his
greatly appreciated help in applying the MATRAX tools in CLEF07 experiments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Baerisch</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Stempfhuber</surname>
          </string-name>
          .
          <article-title>Domain-Specific Track CLEF 2006: Overview of the results</article-title>
          .
          <source>In CLEF 2006: Proceedings of the Workshop of the Cross-Language Evaluation Forum</source>
          , Alicante, Spain,
          <source>September 20 - 22</source>
          ,
          <year>2006</year>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Berger</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          .
          <article-title>Information retrieval as statistical translation</article-title>
          .
          <source>In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>222</fpage>
          -
          <lpage>229</lpage>
          . ACM,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goutte</surname>
          </string-name>
          , and É. Gaussier.
          <article-title>Lexical entailment for information retrieval</article-title>
          . In M. Lalmas, A. MacFarlane, S. M. Rüger, A. Tombros,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tsikrika</surname>
          </string-name>
          , and A. Yavlinsky, editors,
          <source>ECIR</source>
          , volume
          <volume>3936</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>217</fpage>
          -
          <lpage>228</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Colin</surname>
          </string-name>
          .
          <article-title>Information et analyse des données</article-title>
          . Pub. Inst. Stat. Univ. Paris, XXXVII(3
          <issue>-4</issue>
          ):
          <fpage>43</fpage>
          -
          <lpage>60</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Dagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Glickman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          .
          <article-title>The pascal recognising textual entailment challenge</article-title>
          .
          <source>In PASCAL Challenges Workshop for Recognizing Textual Entailment</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dunning</surname>
          </string-name>
          .
          <article-title>Accurate methods for the statistics of surprise and coincidence</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ):
          <fpage>61</fpage>
          -
          <lpage>74</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Forman</surname>
          </string-name>
          .
          <article-title>An extensive empirical study of feature selection metrics for text classification</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          :
          <fpage>1289</fpage>
          -
          <lpage>1305</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Xun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Improving query translation for cross-language information retrieval using statistical models</article-title>
          .
          <source>In SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>96</fpage>
          -
          <lpage>104</lpage>
          , New York, NY, USA,
          <year>2001</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Statistical query translation models for cross-language information retrieval</article-title>
          .
          <source>ACM Transactions on Asian Language Information Processing (TALIP)</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <fpage>323</fpage>
          -
          <lpage>359</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Glickman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dagan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Koppel</surname>
          </string-name>
          .
          <article-title>A probabilistic classification approach for lexical textual entailment</article-title>
          .
          <source>In Twentieth National Conference on Artificial Intelligence (AAAI-05)</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kraaij</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pohlmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Westerveld</surname>
          </string-name>
          .
          <article-title>Translation resources, merging strategies, and relevance feedback for cross-language information retrieval</article-title>
          . In C. Peters, editor,
          <source>CLEF</source>
          , volume
          <volume>2069</volume>
          <source>of Lecture Notes in Computer Science</source>
          , pages
          <fpage>102</fpage>
          -
          <lpage>115</lpage>
          . Springer,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kraaij</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Simard</surname>
          </string-name>
          .
          <article-title>Embedding web-based statistical translation models in cross-language information retrieval</article-title>
          .
          <source>Comput. Linguist.</source>
          ,
          <volume>29</volume>
          (
          <issue>3</issue>
          ):
          <fpage>381</fpage>
          -
          <lpage>419</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Chai</surname>
          </string-name>
          .
          <article-title>A maximum coherence model for dictionary-based crosslanguage information retrieval</article-title>
          .
          <source>In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>536</fpage>
          -
          <lpage>543</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Monz</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Dorr</surname>
          </string-name>
          .
          <article-title>Iterative translation disambiguation for cross-language information retrieval</article-title>
          .
          <source>In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>520</fpage>
          -
          <lpage>527</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Simard</surname>
          </string-name>
          .
          <article-title>Using statistical translation models for bilingual IR</article-title>
          .
          <source>In CLEF '01: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems</source>
          , pages
          <fpage>137</fpage>
          -
          <lpage>150</lpage>
          , London, UK,
          <year>2002</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>V.</given-names>
            <surname>Petras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baerisch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Stempfhuber</surname>
          </string-name>
          .
          <article-title>The Domain-Specific Track at CLEF 2007</article-title>
          .
          <source>In CLEF 2007: Proceedings of the Workshop of the Cross-Language Evaluation Forum</source>
          , Budapest, Hungary,
          <source>September 19 - 21</source>
          ,
          <year>2007</year>
          , forthcoming. Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ponte</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A language modelling approach to information retrieval</article-title>
          .
          <source>In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>275</fpage>
          -
          <lpage>281</lpage>
          . ACM,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. O.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          .
          <article-title>A comparative study on feature selection in text categorization</article-title>
          .
          <source>In Proceedings of ICML-97, 14th International Conference on Machine Learning</source>
          , pages
          <fpage>412</fpage>
          -
          <lpage>420</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>