<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>XRCE's Participation to CLEF 2007 Domain-specific Track</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Stephane Clinchant and Jean-Michel Renders Xerox Research Centre Europe</institution>
          ,
          <addr-line>6 ch. de Maupertuis, 38240 Meylan</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2006</year>
      </pub-date>
      <fpage>334</fpage>
      <lpage>342</lpage>
      <abstract>
        <p>Our participation in CLEF 2007 (Domain-specific Track) was motivated this year by the assessment of several query translation and expansion strategies that we recently designed and developed. One line of research and development was to use our own Statistical Machine Translation system (called Matrax) and its intermediate outputs to perform query translation and disambiguation. Our idea was to benefit from Matrax's flexibility to output more than one plausible translation, and to train its Language Model component on the CLEF 2007 target corpora. The second line of research consisted in designing algorithms to adapt an initial, general probabilistic dictionary to a particular (query, target corpus) pair; this constitutes an extreme viewpoint on the “bilingual lexicon extraction and adaptation” topic that we have been investigating for more than six years. For this strategy, our main contributions lie in a pseudo-feedback algorithm and an EM-like optimisation algorithm that realise this adaptation. A third axis was to evaluate the potential impact of “Lexical Entailment” models in a cross-lingual framework, as they had only been used in a monolingual setting up to now. Experimental results on the CLEF 2007 corpora (domain-specific track) show that the dictionary adaptation mechanisms are quite effective in the CLIR framework, exceeding in certain cases the performance of much more complex Machine Translation systems, and even the performance of the monolingual baseline. In most cases, Lexical Entailment models, used as query expansion mechanisms, also turned out to be beneficial.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We can distinguish at least two families of approaches to query translation. The first is to use
Machine Translation (MT) systems (such as Babylon, Systran, etc.); the second is to rely on multilingual
dictionaries or lexicons. MT systems aim at translating a source sentence into
a target sentence, and are built to produce well-formed, grammatical sentences. However,
most information retrieval models (and users’ queries) do not rely today on proper syntax: this is
the bag-of-words hypothesis. A query is a set of terms, and no use is made of the order or the
syntax of the query, if any. One need not translate the query into a correct sentence:
a rough term-to-term translation can be sufficient to capture the concept of a query. Hence,
term-to-term translation relies on bilingual dictionaries, and cross-lingual information retrieval has been
concerned with the extraction of bilingual dictionaries on the one hand, and with algorithms to
obtain the best translation of a query from a dictionary on the other hand.</p>
      <p>
        The first and naive use of a dictionary is to use all translations (possibly weighted) of
a query word. Albeit simple, this approach does not address the polysemy of words. A classical
example is the translation of the English word bank: bank can refer either to a financial institution
or to the edge of a river. Choosing the right translation of a query term can be obvious given the
context of the complete query. If one were to translate the word bank in a query and also observed
the word account, then the translation would no longer be ambiguous. Note, though, that the retrieval
process is a disambiguating process in itself, in that spurious translations are generally filtered
out simply because it is very unlikely that they co-occur with the other translations. Several
approaches [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15 ref8 ref9">15, 12, 8, 13, 14, 9</xref>
        ] resolve the translation of a query with the notion of coherence.
Each query term has candidate translation terms, and a co-occurrence statistic can be computed
between all the candidate translation terms; an optimisation algorithm is then used to solve a
maximum-coherence problem. The idea is that the query defines a lexical field: the more likely
a candidate is to belong to this lexical field, the better it is as a translation.
      </p>
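As an illustration, the maximum-coherence idea can be sketched with a toy exhaustive search (the candidate lists and co-occurrence scores below are invented for the example; the cited approaches use corpus statistics and more careful optimisation):

```python
from itertools import product

def most_coherent_translation(candidates, cooc):
    """Pick one translation per query term so that the total pairwise
    co-occurrence score between the chosen translations is maximal.
    `candidates`: one list of candidate translations per query term.
    `cooc`: dict mapping frozenset({t1, t2}) to a co-occurrence score."""
    best_choice, best_score = None, float("-inf")
    # Exhaustive search; fine for short queries with few candidates.
    for choice in product(*candidates):
        score = sum(cooc.get(frozenset({a, b}), 0.0)
                    for i, a in enumerate(choice)
                    for b in choice[i + 1:])
        if score > best_score:
            best_choice, best_score = choice, score
    return list(best_choice)

# Toy example: translating the English query "bank account" into French.
candidates = [["banque", "rive"], ["compte"]]   # invented candidate sets
cooc = {frozenset({"banque", "compte"}): 5.0,   # invented statistics
        frozenset({"rive", "compte"}): 0.1}
print(most_coherent_translation(candidates, cooc))  # ['banque', 'compte']
```

Here "compte" pulls the translation of "bank" towards "banque", the financial sense, exactly as in the account example above.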
    </sec>
    <sec id="sec-2">
      <title>Cross-lingual Information Retrieval and Language Modelling</title>
      <p>We will first introduce the standard monolingual language modeling approach to information
retrieval. Then, we will present the classical extensions to cross-lingual information retrieval.</p>
      <p>
        The core idea of language models is to determine the probability P(q|d), the probability that
the query would be generated from a particular document. Formally, given a query q, the language
model approach to IR [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] scores documents d by estimating P(q|d), the probability of the query
according to some language model of the document. Using an independence assumption, for a
query q = {q1, . . . , qℓ}, we get:
      </p>
      <p>P(q|d) = ∏_{i=1}^{ℓ} P(qi|d)    (1)</p>
      <p>
        We assume that for each document there exists some parameter θd, which is a probability
distribution over words, that is, a language model. With a slight abuse of notation, we write P(q|d) ≡ P(q|θd). Standard language
models in information retrieval are multinomial distributions: the language model of a document
is defined by its parameter vector θd, whose dimension is the size of the vocabulary. As this
multinomial parameter is normalised (its components sum up to one), another notation is
used: θdw = P(w|d).
      </p>
      <p>For each document d, a simple language model can be obtained by considering the frequency
of words in d, PML(w|d) ∝ #(w, d) (this is the Maximum Likelihood, or ML, estimator). The
probabilities are smoothed by the corpus language model PML(w|C) ∝ ∑_d #(w, d). The resulting
language model is:</p>
      <p>P(w|d) = λ PML(w|d) + (1 − λ) PML(w|C)    (2)</p>
      <p>The reasons for smoothing are twofold. First, a word can be present in a query but absent from a
document; this does not make the word impossible, and the document model should still give it a
probability. The second reason is that smoothing plays a role similar to the Inverse Document Frequency:
it implicitly renormalises the frequency of a word in a document with respect to its
occurrence in the corpus. Other smoothing methods (Dirichlet smoothing,
Absolute Discounting, ...) could be applied and can be found in [20]. The Query Likelihood approach above
gives an intuitive view of how language models work in information retrieval. Other, equivalent
ranking functions can be considered and lead to the same ranking as the Query Likelihood
formulation; for example, the KL-divergence and the Cross-Entropy functions can also be used in
information retrieval. Let θq be the multinomial parameter of the language model of a query q, and θd
the language model of a document d; the cross-entropy between these two objects is:</p>
      <p>CE(θq|θd) = ∑_w P(w|q) log P(w|d) = ∑_w θqw log θdw</p>
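A minimal sketch of this query-likelihood scoring, combining equations (1) and (2) with Jelinek-Mercer smoothing (the two toy documents and the λ value are invented for the example):

```python
import math
from collections import Counter

def smoothed_prob(w, doc_counts, corpus_counts, lam=0.5):
    """Eq. (2): P(w|d) = lam * P_ML(w|d) + (1 - lam) * P_ML(w|C)."""
    p_ml_d = doc_counts[w] / sum(doc_counts.values())
    p_ml_c = corpus_counts[w] / sum(corpus_counts.values())
    return lam * p_ml_d + (1 - lam) * p_ml_c

def query_log_likelihood(query, doc_counts, corpus_counts, lam=0.5):
    """Eq. (1) in log space: log P(q|d) = sum_i log P(q_i|d)."""
    return sum(math.log(smoothed_prob(q, doc_counts, corpus_counts, lam))
               for q in query)

# Toy corpus of two documents (invented).
docs = {"d1": Counter("bank account money bank".split()),
        "d2": Counter("river bank water".split())}
corpus = sum(docs.values(), Counter())

query = ["bank", "account"]
ranked = sorted(docs, reverse=True,
                key=lambda d: query_log_likelihood(query, docs[d], corpus))
print(ranked)  # d1 ranks above d2 for this query
```

Note how smoothing lets d2, which does not contain "account", still receive a finite (if low) score, instead of a zero probability.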
      <p>As far as cross-lingual IR is concerned, the core idea remains the same: modelling the probability
of the query given the document. Let qs be the query in some source language, ws a word in the
source language, dt a document in the target language, wt a word in the target language, and P(wt|ws)
the probability that word ws is translated into wt. We can distinguish two methods:</p>
      <p>
        The first method, which we will refer to as CL LM1, translates the query into a query language model
in the target language [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Then a monolingual search is performed, using a ranking criterion
such as the Cross-Entropy:
      </p>
      <p>CE(qs|dt) = ∑_{wt} P(wt|qs) log P(wt|dt)    (3)
= ∑_{wt,ws} P(wt|ws, qs) P(ws|qs) log P(wt|dt)    (4)
≈ ∑_{wt,ws} P(wt|ws) P(ws|qs) log P(wt|dt)    (5)</p>
      <p>
        The second method, which we will refer to as CL LM2 [
        <xref ref-type="bibr" rid="ref11 ref2">2, 11</xref>
        ], models the translation from the document
side: a language model of the document is built in the source language and compared to the
query:
      </p>
      <p>CE(qs|dt) = ∑_{ws} P(ws|qs) log P(ws|dt)
≈ ∑_{ws} P(ws|qs) log(∑_{wt} P(ws|wt) P(wt|dt))</p>
      <p>Both models are based on probabilistic dictionaries, but the first model uses a dictionary from
the source language to the target language, whereas the second model uses a dictionary from target to
source. In CL LM1, the translation process is independent of the document, whereas, in CL LM2,
one tries to model the probability that a particular document is translated and “distilled” into
the original query.</p>
      <p>In the following part of this report, we will adopt the viewpoint of model CL LM1, for two
reasons: first, it is simpler to use, because it only requires a monolingual retrieval system, unlike
CL LM2, which needs a dedicated cross-lingual system. The second reason is a benchmarking one:
we wanted to compare our results with Machine Translation tools, which operate in that direction
(translating the query from source to target) for obvious practical reasons.</p>
    </sec>
    <sec id="sec-3">
      <title>Dictionary Adaptation</title>
      <p>The main idea of dictionary adaptation is to be able to adapt the entries of a dictionary to a query
and a target corpus. Formally, let qs = (ws1, . . . , wsl) be the query in the source language. Ideally,
we are looking for P(wt|qs), the probability of a target term given the source query. As we adopt
the CL LM1 model, this leads us to focus on P(wt|ws, qs), the probability that source
term ws translates to wt, given the context of the query. Computing this probability would require
clearly defining the context of a query, or its associated “concept”. The next question is how
we can find the context of the query in the target language. We argue that relevant documents in the
target language contain such information; in other words, the coherence is implicitly present in
relevant documents. Even if relevant documents are obviously not known in advance, they can be
found by active relevance feedback or pseudo-relevance feedback (PRF). Hence, our algorithm will
adapt the probabilities in the dictionary based on the set of (pseudo-)relevant documents. Before
going into the details of this adaptation mechanism, let us first review monolingual PRF techniques
in the framework of Language Modelling-based retrieval; their extension to the cross-lingual case
will provide us with the adaptation method.</p>
      <sec id="sec-3-1">
        <title>Monolingual PRF within the language modeling framework</title>
        <p>Traditional methods, such as Rocchio’s algorithm, extract terms from feedback documents and
add them to the query. The language modeling approach to information retrieval goes beyond
this approach: it extracts a probability distribution over words from the feedback documents. We
shall first present the general setting for pseudo-feedback with monolingual language models.
• Let C be a corpus and dk a document of the corpus.
• Let n be the number of top documents selected after a first retrieval.
• Let F = (d1, . . . , dn) be the feedback documents.
• Let θF, a multinomial parameter, stand for the distribution of relevant terms in F: in
other words, θF is a probability distribution over words, peaked on relevant terms.
Feedback methods have two aspects: first, extracting the relevant information (identification of θF)
and, secondly, enriching the query.</p>
        <sec id="sec-3-1-1">
          <title>Estimation of θF</title>
          <p>To estimate θF from the feedback documents F, we present as an example the method of Zhai and
Lafferty [21]. They propose the following generative process for F:
• For i from 1 to n, draw document di following the distribution:</p>
          <p>– di ∼ Multinomial(ldi, λθF + (1 − λ)P(.|C))
so that we have the following global likelihood:</p>
          <p>P(F|θF) = ∏_k ∏_w (λθFw + (1 − λ)P(w|C))^{c(w,dk)}    (6)</p>
          <p>P(w|C) is the word probability built upon the corpus; λ is a fixed parameter, which can be understood
as a noise parameter for the distribution of terms; c(w, dk) is the number of occurrences of term w
in document dk. Finally, θF is learned by optimising the data log-likelihood with an Expectation-Maximization algorithm.</p>
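The EM iteration for this mixture model can be sketched as follows (a minimal sketch: the feedback documents and background probabilities are invented, and the uniform initialisation is one possible choice):

```python
from collections import Counter

def estimate_feedback_model(feedback_docs, corpus_prob, lam=0.5, n_iter=30):
    """EM for the mixture model of eq. (6): each word of a feedback document
    is drawn from lam * theta_F + (1 - lam) * P(.|C); theta_F is re-estimated
    from the posterior responsibility of the feedback component."""
    counts = sum((Counter(d) for d in feedback_docs), Counter())
    vocab = list(counts)
    theta = {w: 1.0 / len(vocab) for w in vocab}  # uniform initialisation
    for _ in range(n_iter):
        # E-step: posterior that word w was generated by theta_F.
        post = {w: lam * theta[w] /
                   (lam * theta[w] + (1 - lam) * corpus_prob[w])
                for w in vocab}
        # M-step: theta_F(w) proportional to c(w, F) * posterior(w).
        new = {w: counts[w] * post[w] for w in vocab}
        z = sum(new.values())
        theta = {w: v / z for w, v in new.items()}
    return theta

# Toy feedback set (invented): "model" is topical, "the" is background noise.
docs = [["model", "model", "the"], ["model", "the", "the"]]
background = {"model": 0.05, "the": 0.5}  # invented corpus probabilities
theta_F = estimate_feedback_model(docs, background)
# theta_F gives "model" the larger share, since the background
# explains "the" well but "model" poorly.
```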
        </sec>
        <sec id="sec-3-1-2">
          <title>Updating the original query</title>
          <p>Now suppose the relevance language model θF has been estimated; how can we add the information
from the feedback to the query? Within the language model approach to IR, a query is represented
as a probability distribution over words (in practice a multinomial distribution estimated
by maximum likelihood). If θQ is the multinomial parameter for a query Q, then the
ML estimate of θQw is equal to the proportion of occurrences of word w in the query Q. To come back to the
initial question of how to combine the information from the initial query and the feedback documents, a
simple method is to mix the parameters of their distributions:</p>
          <p>θnew query = α θold query + (1 − α) θF    (7)</p>
          <p>In practice, we restrict θF to its top N words, by considering all other values of this vector
as null.</p>
          <p>We can note that more elaborate techniques exist [18]. The value of α is set
experimentally and adapted to each collection. The robustness of the estimation of θF has a significant
impact on the value of α. Lastly, the value of α can be understood as a trade-off between precision
and recall.</p>
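The interpolation of equation (7), with the top-N cut on θF, can be sketched as follows (the model values are invented; renormalising the truncated θF is one possible design choice, not prescribed by the text):

```python
def interpolate_query(theta_query, theta_F, alpha=0.6, top_n=2):
    """theta_new = alpha * theta_query + (1 - alpha) * theta_F (eq. 7),
    keeping only the top_n words of theta_F and treating the rest as null."""
    top = dict(sorted(theta_F.items(), key=lambda kv: kv[1],
                      reverse=True)[:top_n])
    z = sum(top.values())
    top = {w: v / z for w, v in top.items()}  # renormalise the truncated model
    words = set(theta_query) | set(top)
    return {w: alpha * theta_query.get(w, 0.0) + (1 - alpha) * top.get(w, 0.0)
            for w in words}

theta_q = {"bank": 0.5, "account": 0.5}               # invented query model
theta_F = {"loan": 0.4, "interest": 0.3, "the": 0.3}  # invented feedback model
new_q = interpolate_query(theta_q, theta_F)
# "loan" and "interest" enter the query model; "the" is cut by the top-N filter
```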
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Extension to the Cross-lingual case: Dictionary Adaptation</title>
        <p>We generalise the monolingual mixture model for feedback to the case of CLIR: the input data are
an initial source query language model p(ws|qs) and an initial dictionary p(wt|ws). The monolingual
mixture model can be interpreted as follows: for each term in a document, first choose between the
relevant topic model and the corpus language model, then generate the term from
the chosen mixture component. We extend this process by choosing, for each term
in a feedback document, either a source query term ws (instead of the relevant topic model)
or the target corpus (C) language model. If a query term ws has been chosen, then a target term wt is generated
with some unknown (ideal) probabilistic dictionary. Mathematically, this gives:
• For i from 1 to n, draw document di with:</p>
        <p>– di ∼ Multinomial(ldi, λ ∑_{ws} θs p(ws|qs) + (1 − λ) p(.|C))
where ldi is the length of document di.</p>
        <p>In this framework, θs can be interpreted as an adapted translation probability: θst ≡
p(wt|ws, qs). But it can also be interpreted as a probability distribution (multinomial parameter)
over the vocabulary of target terms; it is like a language model, but associated with a specific word ws.
To understand the connections between the monolingual model and the bilingual model, we can
make an analogy of the form θF ≡ ∑_{ws} θs p(ws|qs). Note that the same algorithm realises both
the query enrichment and the dictionary adaptation. Note also that the translation/adaptation is
limited to the words of the query (ws) if we adopt a simple maximum likelihood language model
for the query (as is assumed in the following). Lastly, but importantly, the role of the initial
(probabilistic), non-adapted dictionary lies in providing the algorithm with a good starting
candidate solution for θs.</p>
        <p>From this generative process, it remains to solve the problems of estimating the parameters
(θs)ws∈Q and of generating the new query language model (on the target side).</p>
        <sec id="sec-3-2-1">
          <title>Estimation of adapted translation probabilities</title>
          <p>We now proceed to the estimation of the parameters (θs)ws∈Q with a maximum likelihood approach,
using an EM-like algorithm. Recall that, as in the monolingual setting, λ is a fixed parameter, and
p(ws|qs) is also known, since it represents the distribution of words in a particular query.</p>
          <p>First, the model likelihood can be written in the equivalent form:</p>
          <p>P(F|θ) = ∏_k ∏_{wt} (λ ∑_{ws} θst p(ws|qs) + (1 − λ) P(wt|C))^{c(wt,dk)}    (8)</p>
          <p>We can maximise the log-likelihood with an EM algorithm. Let twd be the hidden random variable
whose value is 1 if word w in document d has been generated by p(.|C). Let rwd indicate
which query word ws has been chosen (it is only defined when twd = 0). Let θst = p(wt|ws, qs) be the unknown
parameters of the model.</p>
          <p>The E-step gives:</p>
          <p>p(twd = 1|F, θ(i)) = (1 − λ)P(wt|C) / (λ ∑_{ws} θst(i) p(ws|qs) + (1 − λ)P(wt|C))    (9)
p(twd = 0|F, θ(i)) = 1 − p(twd = 1|F, θ(i))    (10)
p(rwd = ws|F, θ(i), twd = 0) ∝ p(ws|qs) θst(i)    (11)</p>
          <p>As usual, in the M-step, we optimise a lower bound of the expected log-likelihood:</p>
          <p>Q(θ(i+1), θ(i)) = ∑_{d,w} c(w, d) (p(twd = 1|θ(i)) log((1 − λ)p(w|C))
+ p(twd = 0|θ(i)) ∑_{ws} p(rwd = ws|θ(i)) log(p(ws|qs) θst(i+1)))    (12)</p>
          <p>Differentiating w.r.t. θ(i+1) and adding a Lagrange multiplier (for the constraint ∑_{wt} θst = 1) gives the M-step:</p>
          <p>θst(i+1) ∝ ∑_d c(wt, d) p(twd = 0|F, θ(i)) p(rwd = ws|F, θ(i), twd = 0)    (13)</p>
          <p>As already mentioned, θ(0) is given by the corresponding part of an initial (probabilistic),
non-adapted dictionary.</p>
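A minimal sketch of this adaptation EM (equations 8-13), assuming, as in the text, that only the non-zero entries of the initial dictionary are updated; the query, dictionary, feedback documents and corpus probabilities are invented for the example:

```python
from collections import Counter

def adapt_dictionary(feedback_docs, query_probs, dictionary, corpus_prob,
                     lam=0.5, n_iter=20):
    """EM adaptation sketch: feedback_docs are target-language documents,
    query_probs is p(ws|qs), dictionary[ws] is the initial p(wt|ws) used as
    theta^(0), corpus_prob is P(wt|C)."""
    counts = sum((Counter(d) for d in feedback_docs), Counter())
    theta = {ws: dict(d) for ws, d in dictionary.items()}  # theta^(0)
    for _ in range(n_iter):
        new = {ws: {wt: 0.0 for wt in theta[ws]} for ws in theta}
        for wt, c in counts.items():
            # Mixture probability of wt under the current model (cf. eq. 8).
            mix = sum(query_probs[ws] * theta[ws].get(wt, 0.0) for ws in theta)
            if mix == 0.0:
                continue  # wt has no non-zero dictionary entry
            denom = lam * mix + (1 - lam) * corpus_prob.get(wt, 1e-9)
            p_t0 = lam * mix / denom          # eqs (9)-(10)
            for ws in theta:
                if wt in theta[ws]:
                    r = query_probs[ws] * theta[ws][wt] / mix  # eq (11)
                    new[ws][wt] += c * p_t0 * r               # eq (13)
        for ws in theta:
            z = sum(new[ws].values())
            if z > 0:
                theta[ws] = {wt: v / z for wt, v in new[ws].items()}
    return theta

# Toy example (all numbers invented): disambiguating English "bank" in German.
query_probs = {"bank": 1.0}
dictionary = {"bank": {"bank_fin": 0.5, "ufer": 0.5}}  # financial vs. river sense
feedback = [["bank_fin", "konto"], ["bank_fin", "geld"]]
corpus_prob = {"bank_fin": 0.01, "ufer": 0.01, "konto": 0.1, "geld": 0.1}
adapted = adapt_dictionary(feedback, query_probs, dictionary, corpus_prob)
# The adapted entry shifts almost all mass to "bank_fin",
# the sense supported by the feedback documents.
```

The adapted θst can then be plugged into equation (14) below, P(wt|qs) = ∑_ws θst P(ws|qs), to build the translated query.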
        </sec>
        <sec id="sec-3-2-2">
          <title>Query Update</title>
          <p>When the algorithm converges, giving some optimal adapted parameters θst(adapted), a new query can be
generated by using all the entries of the adapted dictionary, so no selection method
nor threshold is required to compute the new query. To make the analogy with monolingual IR:
we do not use a parameter like α (or, in a sense, we use α = 1), since we only use the dictionary
learnt by feedback. The new query language model becomes:</p>
          <p>P(wt|qs) = ∑_{ws} θst(adapted) P(ws|qs)    (14)</p>
          <p>In other words, model CL LM1 with p(wt|ws) = θst(adapted) is used to perform the retrieval.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Remarks</title>
        <p>The initial dictionary is used as the starting point for the EM algorithm. As a consequence, only
non-zero entries are used in this algorithm. During the iterations of EM, the dictionary weights
are adapted to fit the feedback documents, and hence to choose the correct translations for the query.</p>
        <p>
          In the introduction to dictionary adaptation, we argued that one should model the probability
P(wt|ws, q). In the model represented by equation 4, we made an independence assumption which
discards the query q from this latter probability. However, the query q is implicitly present in the
feedback documents, which makes it possible to learn translation probabilities from the context of the query.
The authors of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] propose a feedback method for CL LM2 that also relies on dictionary adaptation. Our method
is an extension of the classical monolingual mixture model for feedback to the cross-lingual case,
which is also a natural feedback method for CL LM1. Note, however, that the experiments of Hiemstra et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] show
that their model was unable to perform pseudo-relevance feedback, but performed very well with
active relevance feedback.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Lexical Entailment as Query Expansion Mechanism</title>
      <p>
        Lexical Entailment (LE) [
        <xref ref-type="bibr" rid="ref10 ref3 ref5">3, 10, 5</xref>
        ] models the probability that one term entails another, in a
monolingual framework. It can be understood as a probabilistic term similarity or as a unigram
language model associated to a word (rather than to a document or a query). Let u be a term in
the corpus, then lexical entailment models compute a probability distribution over terms v of the
corpus, P(v|u). These probabilities can be used in information retrieval models to enrich queries
and/or documents, with an effect similar to that of a semantic thesaurus. However,
lexical entailment is purely automatic, extracting statistical relationships from the considered
corpus. In practice, a sparse representation of P(v|u) is adopted, where we restrict v to be one of
the Nmax terms that are the closest to u according to an Information Gain metric 1 .
      </p>
      <p>
        We refer to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for all the technical and practical details of the method. Still, one important thing
to be mentioned is that the LE models P(v|u) are used as if this were a cross-lingual framework (for
instance one of the CL LM1 or CL LM2 models), i.e. as if P(v|u) were a probabilistic translation
matrix. If q = (q1, ..., ql) and if CL LM2 is chosen, this gives, using the CE criterion:
      </p>
      <p>CE(q|d) = ∑_{qi} P(qi|q) log(∑_w P(qi|w) P(w|d))    (15)</p>
      <p>P(qi|w) is the result of the Lexical Entailment model, and P(w|d) is given by equation 2. We
also used a slightly modified formula, introducing a background query-language smoothing P(qi|D).
Instead of eq. 15, the document score is now computed as:</p>
      <p>CE(q|d) = ∑_{qi} P(qi|q) log(β ∑_w P(qi|w) P(w|d) + (1 − β) P(qi|D))    (16)</p>
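The smoothed scoring of equation (16) can be sketched as follows (all the model probabilities below are invented; a uniform P(qi|q) is assumed, as for a maximum-likelihood query model without repeated terms):

```python
import math

def le_score(query, doc_lm, le_model, query_bg, beta=0.8):
    """Eq. (16): CE(q|d) = sum_qi P(qi|q) *
       log(beta * sum_w P(qi|w) P(w|d) + (1 - beta) * P(qi|D)).
    le_model[qi] maps terms w to P(qi|w) (sparse lexical entailment entries);
    doc_lm is the smoothed document model P(w|d); query_bg is P(qi|D)."""
    p_qi = 1.0 / len(query)  # uniform query model P(qi|q)
    score = 0.0
    for qi in query:
        expanded = sum(p * doc_lm.get(w, 0.0)
                       for w, p in le_model.get(qi, {}).items())
        score += p_qi * math.log(beta * expanded + (1 - beta) * query_bg[qi])
    return score

# Toy models (all probabilities invented for the example).
le_model = {"economy": {"economy": 0.6, "market": 0.3, "growth": 0.1}}
doc_lm = {"market": 0.2, "growth": 0.1, "the": 0.7}
query_bg = {"economy": 0.01}
s = le_score(["economy"], doc_lm, le_model, query_bg)
```

Note that the document scores even though it never contains "economy" itself: the entailed terms "market" and "growth" carry the match, which is the thesaurus-like effect described above.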
    </sec>
    <sec id="sec-5">
      <title>Experiments on GIRT - 2004 to 2006</title>
      <p>
        We refer to the overview paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for the description of the task, the corpora and the available
resources (see also http://www.gesis.org/en/research/information technology/girt4.htm
for specific information).
      </p>
      <p>In order to do some preliminary tunings and validations, we used the domain-specific corpus
GIRT as available in 2006 from the CLEF Evaluation Forum, as well as the 75 queries and their
relevance assessments collected from the years 2004, 2005 and 2006. In the next section, we will
present the results on the test data, namely the new GIRT corpus (extended on the English side,
by additional documents coming from the CSA corpus) and the corresponding new queries. We
used Mean Average Precision (MAP) as retrieval performance measure.</p>
      <p>For the whole collection and the queries, we used our home-made lemmatiser and word
segmenter (decompounder) for German. Classical stopword removal was performed. We used
only the title and the description of the queries.</p>
      <p>As multilingual resources, we used on the one hand the English-German GIRT Thesaurus
(considered as domain-specific, but very narrow) and, on the other hand, a probabilistic one, called
ELRAC, that is a combination of a very standard one (ELRA) and a lexicon automatically extracted
from the parallel JRC-AC (Acquis Communautaire) Corpus (see URL: langtech.jrc.it/JRC-Acquis.html)
using the Giza++ word alignment algorithm.</p>
      <p>As already mentioned, one goal of the experiments was to compare the query translation
approach using dictionary adaptation with the use of our Statistical Machine Translation system
(MATRAX). The latter needs two kinds of corpora: a parallel corpus for the alignment models,
and a corpus in the target language to learn a “language model”. We fed MATRAX with the
JRC-AC (Acquis Communautaire) Corpus for the alignment models, and with our GIRT / CSA
corpora (in the target language) for the language models. In this way, we can expect to introduce
some bias or adaptation to our target corpus in the translation process, as the Language Model
component of Matrax will favour translation and disambiguation consistent with this corpus.</p>
      <p>
        1The Information Gain, aka Generalised (or average) Mutual Information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], is used for selecting features in
text categorisation [
        <xref ref-type="bibr" rid="ref18 ref7">19, 7</xref>
        ] or detecting collocations [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Monolingual results make it possible to evaluate the performance of the cross-lingual results, as a
reference to compete with. Table 1 shows the results of the monolingual experiments. Dirichlet prior
smoothing was used with a value of 200, and PRF was applied, using the top 15 documents with
the mixture model algorithm described in 3.1. We observe a significant difference between the
behaviour of the English corpus and the German one. English documents are sparser than German
ones, which explains the retrieval deficiency.
      </p>
      <p>Table 2 shows monolingual experiments using lexical entailment models. We used the top
20 entailed terms (Nmax = 20) for each German term, and the top 10 terms for each English term
(Nmax = 10), since the English corpus is sparser than the German one. We applied LE first on
the basic query (results are given in column 2). Then the mixture model algorithm for
pseudo-feedback described in 3.1 is applied on the top 15 documents, which provides a new query, and once
again the lexical entailment model is applied. The lexical entailment model using pseudo-relevance
feedback will also be called PRF+Lexical Entailment, or Double Lexical Entailment (as actually
the top 15 documents are retrieved using a first lexical entailment step). The performance of this model
is given in column 3 of Table 1. One can see that lexical entailment models perform better than
the baseline monolingual models without feedback, and that lexical entailment techniques provide
improvements comparable to (and better than) those obtained by pseudo-relevance feedback.</p>
      <sec id="sec-5-1">
        <title>Cross-lingual Experiments</title>
        <sec id="sec-5-1-1">
          <title>Baseline</title>
          <p>Then, we perform dictionary adaptation with parameter λ = 0.5 (in equation 8) and the number
of feedback documents set to 50 (Table 3). The results show that, with dictionary adaptation,
we gain in performance for every dictionary and translation direction. We obtain a global improvement
ranging from 3% to 10%, a relative improvement from 10% to 50%, and an average gain of
6% for both directions and both dictionaries.</p>
          <p>The thesaurus used is the one provided by GIRT, and it already performs well since it is adapted
to the GIRT corpus: there is less ambiguity in this dictionary than in the standard ELRAC
dictionary. Still, the method is able to gain in precision. The interesting fact is the improvement
obtained by the ELRAC dictionary after adaptation. ELRAC is a general dictionary, not at all
adapted to the social science corpus of GIRT. The initial performance of ELRAC is worse than that obtained
using the GIRT thesaurus. However, dictionary adaptation considerably improves the query translation
process (8% average increase in MAP over both directions). This shows that a general dictionary with an
adequate adaptation mechanism can be used for a specialised corpus without a huge loss compared
to a domain-specific dictionary. Of course, domain-specific dictionaries work better, but they require
external resources, or comparable corpora to be extracted from, whereas general dictionaries are
more easily available. Beyond giving a more accurate translation, a second
reason for these improvements is that dictionaries often encode some semantic enrichment. For
example, the word area can be translated into French as région or zone.</p>
          <p>Figure 1 shows the evolution of mean average precision with an increasing number of
pseudo-feedback documents. This graph indicates that the algorithm seems to be very stable and robust
to a large set of feedback documents. One can also notice that much of the gain can be obtained
using only the top 10 documents. We believe the stability is due to the initialisation of the algorithm
with the initial dictionary, so that only non-zero entries serve as training data.</p>
          <p>Figure 2 shows the influence of the λ parameter. This parameter can be interpreted as a noise
parameter in the feedback documents. Since we restrict ourselves to non-zero entries, a better
interpretation would be as a noise parameter in the dictionary. The conclusion we can draw from
this graph is that modelling the noise is useless when only non-zero entries are used, so the
algorithm can be used with λ = 1. This parameter could have more influence if we extended the
number of feedback documents to a larger value: the data would then be noisier. The results
on the influence of the number of top documents show that they are sufficient to disambiguate
the query. However, if we were to “smooth” the zero entries of the dictionary (and thereby allow new
translation candidates that were not present in the initial dictionary), this noise parameter would
influence the performance much more. There are two problems acting at the same time: query
translation and query enrichment. Enriching the query amounts to smoothing the zero entries
of the dictionary. We believe it is more important to solve the query translation problem first, and
to enrich the query later (possibly with another, monolingual mechanism). Hence, the λ parameter
can be set to 1 without loss of performance.</p>
          <p>Table 4 shows the results of the lexical entailment model after a first step of dictionary adaptation.
To sum up, the original query is first roughly translated with an initial dictionary; then a first
retrieval is done and the dictionary is adapted to the query, so that a new translation of the query is
obtained. The baseline model is model CL LM1 using the newly translated query. Instead of
using CL LM1, the other models rely on a Lexical Entailment model. As before, Simple Lex
Entailment names the model CL LM2 with the lexical entailment model based on the information
gain. PRF Lex Ent denotes the same model, but with a step of pseudo-feedback with the mixture
model introduced previously. Once again, the lexical entailment model outperforms the baseline.
One can argue that both models, CL LM1 and CL LM2, are used alternately in the same retrieval
process. This comes from historical reasons: we first developed the lexical entailment model a
few years ago, and the dictionary adaptation model later on (for CLEF 2007). The two models were
combined afterwards. Theoretically, it could be interesting to develop a single model tackling
multilinguality and the use of a monolingual thesaurus in a single framework.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Experimental results on GIRT 07</title>
      <p>
        We now report on our participation in the Domain Specific Task at CLEF 2007, on the GIRT
and CSA corpora. Once again, we refer to [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for a precise description of the task, the corpora,
and the available resources. We submitted monolingual runs as well as bilingual runs, restricted
to English and German. Our monolingual runs mainly rely on lexical entailment models. The
bilingual runs stem from two techniques: either query translation with our home-developed
Statistical Machine Translation system called Matrax, or query translation through dictionary
adaptation.
      </p>
      </p>
      <sec id="sec-6-1">
        <title>Parameters, Nomenclature and Monolingual Runs</title>
        <p>• PRF Lexical Entailment: this is the Double Lexical Entailment model explained
before, where a first lexical entailment model is used to provide the system with an initial
set of TOPn documents, from which a mixture model for pseudo-feedback is built; a
second retrieval is then performed, based once again on the lexical entailment model applied
to the enriched query.
The bilingual retrieval model adopts the same nomenclature as in the previous sections.</p>
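The double-retrieval scheme above can be sketched in a few lines. This is a minimal, hypothetical illustration: a plain unigram language model stands in for the lexical entailment scorer, and a simple maximum-likelihood mix over the TOPn documents stands in for the paper's mixture model; the corpus and parameter values are invented.

```python
# First retrieval, pseudo-feedback term distribution from the TOPn
# documents, query enrichment, then a second retrieval on the new query.
from collections import Counter

def lm_score(query_weights, doc):
    """Unigram score: weighted sum of in-document term frequencies."""
    tf, n = Counter(doc), len(doc)
    return sum(w * (tf[t] / n) for t, w in query_weights.items())

def rank(query_weights, corpus):
    return sorted(corpus, key=lambda d: lm_score(query_weights, corpus[d]),
                  reverse=True)

def enrich(query_weights, feedback_docs, alpha=0.6, m=3):
    """Mix the query with the m most frequent feedback terms."""
    tf = Counter(t for doc in feedback_docs for t in doc)
    total = sum(tf.values())
    fb = {t: c / total for t, c in tf.most_common(m)}
    terms = set(query_weights) | set(fb)
    return {t: alpha * query_weights.get(t, 0) + (1 - alpha) * fb.get(t, 0)
            for t in terms}

corpus = {
    "d1": ["solar", "energy", "panel", "energy"],
    "d2": ["wind", "energy", "turbine"],
    "d3": ["football", "league"],
}
query = {"energy": 1.0}
top2 = rank(query, corpus)[:2]                       # first retrieval
enriched = enrich(query, [corpus[d] for d in top2])  # pseudo-feedback
final = rank(enriched, corpus)                       # second retrieval
```

The enriched query picks up related terms such as "solar" from the feedback documents while keeping most of its mass on the original terms.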
        <p>As already explained, all our bilingual runs follow the same scheme: query translation
followed by a monolingual search (most often with PRF or query expansion in the target language).
For the first step (query translation), we used either our Statistical Machine Translation system
(MATRAX) or an initial standard dictionary adapted following the strategy described in
this paper. The monolingual search component obeys the same nomenclature as in the previous
section.</p>
        <p>In order to increase the recall of what can be obtained with MATRAX, we intentionally kept the
TOP5 most plausible translations given by MATRAX and concatenated them to obtain the new
query in the target language (this indeed significantly increased retrieval performance).</p>
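The recall-oriented trick above can be sketched directly: keep the n-best translation hypotheses and concatenate them into a single bag of words. The function name and the example hypotheses are hypothetical stand-ins for Matrax's n-best output; note how terms repeated across hypotheses are naturally weighted higher.

```python
# Merge the top-n translation hypotheses into one weighted query.
from collections import Counter

def concat_nbest(nbest_translations, n=5):
    """Concatenate the top-n hypotheses; duplicates weight terms."""
    tokens = [tok for hyp in nbest_translations[:n] for tok in hyp.split()]
    return Counter(tokens)

# Illustrative n-best list (not actual Matrax output).
nbest = [
    "renewable energy sources",
    "renewable power sources",
    "sources of renewable energy",
]
query = concat_nbest(nbest)
```

Here "renewable" and "sources" appear in all three hypotheses and so dominate the resulting query, while rarer alternatives like "power" are kept with low weight.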
        <p>In order to perform lexicon adaptation, the choice of the initial dictionary is crucial to the task.
We used two initial dictionaries that were at our disposal: the first one, CsaGirt, was extracted
from the concatenation of the GIRT and CSA thesauri; the second one was ELRAC,
composed as described before. To benefit from both sources, the dictionaries were merged
hierarchically: an entry of one dictionary is added to the other only if it is not already
present in the master dictionary. The dictionary named Hier-CsaGirtElrac (abbreviation: hcge)
is obtained by giving priority to CsaGirt and then adding any Elrac
entry not already present in CsaGirt. The dictionary named Hier-ElracCsaGirt (abbreviation:
hecg) is obtained by giving priority to Elrac and then adding the
dictionary CsaGirt.</p>
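The hierarchical merge described above amounts to a one-way fill: entries of the secondary dictionary are added only when the headword is absent from the master dictionary. The sketch below assumes dictionaries map source words to weighted target candidates; the entries themselves are invented for illustration.

```python
# Hierarchical dictionary merge: master entries win; the secondary
# dictionary only fills in headwords missing from the master.
def merge_hierarchical(master, secondary):
    merged = dict(master)
    for word, translations in secondary.items():
        if word not in merged:
            merged[word] = translations
    return merged

# Toy German -> English entries (illustrative data only).
csa_girt = {"Arbeit": {"work": 0.7, "labour": 0.3}}
elrac = {"Arbeit": {"job": 1.0}, "Haus": {"house": 1.0}}

hcge = merge_hierarchical(csa_girt, elrac)  # priority to CsaGirt
hecg = merge_hierarchical(elrac, csa_girt)  # priority to Elrac
```

Swapping the argument order changes which dictionary's entry survives for shared headwords, which is exactly the difference between hcge and hecg.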
        <p>Table 8 shows the results of our bilingual runs with their mean average precision and the model
used for translation and retrieval. If no query expansion (in the target language) is done
beyond the lexical entailment model, Matrax offers the best results (but recall that Matrax is
significantly harder and more time-consuming to train than our simple dictionary extraction and
adaptation). However, it seems that, once we want to adopt more complex PRF techniques after
translation, there is a substantial advantage in using our dictionary adaptation method, which,
presumably, gives less noisy translations. Consequently, the best absolute performances are obtained
by combining (1) the hierarchical building of the initial dictionary (the order in the hierarchy
depends on the source and target languages), (2) adapting this initial dictionary with the
proposed algorithm, and (3) performing a rather sophisticated (PRF + Lexical Entailment) query
expansion/enrichment in the target language. Note that, when English is the target language,
bilingual performances are even better than monolingual ones.</p>
        <p>Table 9 shows the results of some experiments that we performed after the submission to
CLEF, but using the CLEF 2007 queries and relevance assessments. The intent of this table is to better
understand the individual effect of the basic components of our official runs. We can observe
that the monolingual pseudo-relevance feedback algorithm greatly improves the results: for German,
it boosted the mean average precision from 0.30 to 0.44. We can also see that the dictionary
adaptation works for this year's queries as well. Finally, there is still a deficiency when the target
corpus is the English one: we still believe this is due to the unbalanced nature of the documents
(German documents are longer on average and, consequently, more reliable, because they most
often contain the abstract field).</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion to GIRT Participation</title>
      <p>Our main goal this year was to validate two query translation and disambiguation strategies. The
first one relies on the use of our Statistical Machine Translation tool, especially taking benefit
from its flexibility to output more than one plausible translation and to train its Language Model
component on the CLEF07 target corpora. The second one relies on a pseudo-feedback adaptation
mechanism that performs dictionary adaptation and query expansion simultaneously.</p>
      <p>Experimental results on CLEF-2007 corpora (domain-specific track) show that the dictionary
adaptation mechanisms appear quite effective in the CLIR framework, exceeding in certain cases
the performance of much more complex Machine Translation systems and even the performance of
the monolingual baseline. The pseudo-feedback adaptation method turns out to be robust to the
number of feedback documents and relatively efficient, since we do not need to extract co-occurrence
statistics. It is also robust to noise in the feedback documents, contrary to several traditional
monolingual feedback methods whose performance decreased in our experiments. Lastly, it
enables the use of general dictionaries in a domain-specific context with performance almost as
good as that of domain-specific dictionaries.</p>
      <p>We believe that the concept of lexicon adaptation has other applications in cross-lingual
information access tasks. For instance, if there is some underlying class or category system (built
in a supervised or unsupervised way), lexicons could be adapted to a particular category or cluster.
Moreover, the adaptation model could be useful to adapt a dictionary to a user profile: from
feedback sessions, one can learn a bilingual lexicon adapted to a particular user, which has
significant applications. Our future work will focus on such aspects.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was partly supported by the IST Programme of the European Community, under the
SMART project, FP6-IST-2005-033917. The authors also want to thank Francois Pacull for his
greatly appreciated help in applying the MATRAX tools in CLEF07 experiments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Baerisch</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Stempfhuber</surname>
          </string-name>
          .
          <article-title>Domain-Specific Track CLEF 2006: Overview of the results</article-title>
          .
          <source>In CLEF 2006: Proceedings of the Workshop of the Cross-Language Evaluation Forum</source>
          , Alicante, Spain,
          <source>September 20 - 22</source>
          ,
          <year>2006</year>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Berger</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          .
          <article-title>Information retrieval as statistical translation</article-title>
          .
          <source>In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>222</fpage>
          -
          <lpage>229</lpage>
          . ACM,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goutte</surname>
          </string-name>
          , and É. Gaussier.
          <article-title>Lexical entailment for information retrieval</article-title>
          . In M. Lalmas, A. MacFarlane, S. M. Rüger, A. Tombros,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tsikrika</surname>
          </string-name>
          , and A. Yavlinsky, editors,
          <source>ECIR</source>
          , volume
          <volume>3936</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>217</fpage>
          -
          <lpage>228</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Colin</surname>
          </string-name>
          .
          <article-title>Information et analyse des données</article-title>
          . Pub. Inst. Stat. Univ. Paris, XXXVII(3
          <issue>-4</issue>
          ):
          <fpage>43</fpage>
          -
          <lpage>60</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Dagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Glickman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          .
          <article-title>The pascal recognising textual entailment challenge</article-title>
          .
          <source>In PASCAL Challenges Workshop for Recognizing Textual Entailment</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dunning</surname>
          </string-name>
          .
          <article-title>Accurate methods for the statistics of surprise and coincidence</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ):
          <fpage>61</fpage>
          -
          <lpage>74</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Forman</surname>
          </string-name>
          .
          <article-title>An extensive empirical study of feature selection metrics for text classification</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          :
          <fpage>1289</fpage>
          -
          <lpage>1305</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Xun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Improving query translation for cross-language information retrieval using statistical models</article-title>
          .
          <source>In SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>96</fpage>
          -
          <lpage>104</lpage>
          , New York, NY, USA,
          <year>2001</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Statistical query translation models for cross-language information retrieval</article-title>
          .
          <source>ACM Transactions on Asian Language Information Processing (TALIP)</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <fpage>323</fpage>
          -
          <lpage>359</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Glickman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dagan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Koppel</surname>
          </string-name>
          .
          <article-title>A probabilistic classification approach for lexical textual entailment</article-title>
          .
          <source>In Twentieth National Conference on Artificial Intelligence (AAAI-05)</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kraaij</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pohlmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Westerveld</surname>
          </string-name>
          .
          <article-title>Translation resources, merging strategies, and relevance feedback for cross-language information retrieval</article-title>
          . In C. Peters, editor,
          <source>CLEF</source>
          , volume
          <volume>2069</volume>
          <source>of Lecture Notes in Computer Science</source>
          , pages
          <fpage>102</fpage>
          -
          <lpage>115</lpage>
          . Springer,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kraaij</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Simard</surname>
          </string-name>
          .
          <article-title>Embedding web-based statistical translation models in cross-language information retrieval</article-title>
          .
          <source>Comput. Linguist.</source>
          ,
          <volume>29</volume>
          (
          <issue>3</issue>
          ):
          <fpage>381</fpage>
          -
          <lpage>419</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Chai</surname>
          </string-name>
          .
          <article-title>A maximum coherence model for dictionary-based crosslanguage information retrieval</article-title>
          .
          <source>In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>536</fpage>
          -
          <lpage>543</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Monz</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Dorr</surname>
          </string-name>
          .
          <article-title>Iterative translation disambiguation for cross-language information retrieval</article-title>
          .
          <source>In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>520</fpage>
          -
          <lpage>527</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Simard</surname>
          </string-name>
          .
          <article-title>Using statistical translation models for bilingual IR</article-title>
          .
          <source>In CLEF '01: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems</source>
          , pages
          <fpage>137</fpage>
          -
          <lpage>150</lpage>
          , London, UK,
          <year>2002</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>V.</given-names>
            <surname>Petras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baerisch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Stempfhuber</surname>
          </string-name>
          .
          <article-title>The Domain-Specific Track at CLEF 2007</article-title>
          .
          <source>In CLEF 2007: Proceedings of the Workshop of the Cross-Language Evaluation Forum</source>
          , Budapest, Hungary,
          <source>September 19 - 21</source>
          ,
          <year>2007</year>
          , forthcoming. Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ponte</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A language modelling approach to information retrieval</article-title>
          .
          <source>In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>275</fpage>
          -
          <lpage>281</lpage>
          . ACM,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. O.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          .
          <article-title>A comparative study on feature selection in text categorization</article-title>
          .
          <source>In Proceedings of ICML-97, 14th International Conference on Machine Learning</source>
          , pages
          <fpage>412</fpage>
          -
          <lpage>420</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>