<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How One Word Can Make all the Difference - Using Subject Metadata for Automatic Query Expansion and Reformulation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vivien Petras</string-name>
          <email>vivienp@sims.berkeley.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Information Management and Systems, University of California</institution>
          ,
          <addr-line>Berkeley, CA 94720</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Query enhancement with domain-specific metadata (thesaurus terms) is analyzed for monolingual and bilingual retrieval on the GIRT social science collection. We describe our technique of Entry Vocabulary Modules, which associates query words with thesaurus terms and suggest its use for monolingual as well as bilingual retrieval. Different weighting and merging schemes for adding keywords to queries as well as translation techniques are described. Query enhancement generally improves average precision scores for both monolingual and bilingual retrieval. We take a closer look at individual queries and discuss how the query enhancements (or substitutions in bilingual retrieval) can change retrieval results quite dramatically. A query-by-query analysis provides deeper insight into strengths and weaknesses of strategies and serves as a cautionary reminder that average precision scores don't always tell the whole story.</p>
      </abstract>
      <kwd-group>
        <kwd>Controlled vocabulary</kwd>
        <kwd>thesauri</kwd>
        <kwd>automatic query expansion</kwd>
        <kwd>entry vocabulary modules</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The technique of Entry Vocabulary Modules was designed to serve as an interface between
the query vocabulary of the searcher (natural language) and the controlled vocabulary entries of a database.
Given any search word or phrase, an Entry Vocabulary Module suggests controlled vocabulary terms that represent the concept of
the search. A searcher can append these terms to his or her query, or substitute them for his or her own query
terms, in the hope of achieving more precise and complete
retrieval.</p>
      <p>
        Query expansion has been researched in the information retrieval field for a long time [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However,
automatic query expansion has been mostly discussed in the context of blind feedback or highly evolved
expert systems [e.g. 2,3]. Thesauri are mainly used for manual or interactive query expansion (for an
overview, see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), but authors report mixed results [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5-8</xref>
        ] when comparing those techniques to free-text
search.
      </p>
      <p>For CLEF 2005, Berkeley’s group 2 experimented with Entry Vocabulary Modules (EVMs) to
automatically enhance queries with subject metadata terms or to replace query terms with them. The GIRT
collection (German Indexing and Retrieval Test database) contains titles, abstracts and thesaurus terms
providing an ideal test bed for monolingual and bilingual retrieval (German and English documents as well
as a bilingual thesaurus).</p>
      <p>The paper is organized as follows: first, we briefly introduce the GIRT collection and then explain Entry
Vocabulary Modules and the basics of our retrieval technique. Section 5 explains the runs for German
and English monolingual retrieval in detail. Section 6 explains our translation techniques and how EVMs
can be used for query translation. Sections 6.2 and 6.3 compare different translation techniques and discuss
combinations for bilingual retrieval from English to German and from German to English, respectively.</p>
    </sec>
    <sec id="sec-2">
      <title>2 The GIRT Collection</title>
      <p>
        The GIRT collection (German Indexing and Retrieval Test database) consists of 151,319 documents
containing titles, abstracts and thesaurus terms in the social science domain. The GIRT thesaurus terms are
assigned from the Thesaurus for the Social Sciences [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and are provided in German, English and Russian.
Two parallel GIRT corpora in English and German each containing 151,319 records are made available. For
a detailed description of GIRT and its uses, see [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>The English GIRT collection contains only 26,058 abstracts (about one in six records), whereas the German
collection contains 145,941, providing an abstract for almost all documents. Consequently, the German
collection contains more terms per record to search on. The English corpus has 1,535,445 controlled
vocabulary entries (7064 unique phrases) and the German corpus has 1,535,582 controlled vocabulary
entries (7154 unique phrases) assigned. On average, 10 controlled vocabulary terms / phrases are appended
to each document.</p>
      <p>Controlled vocabulary terms are not uniformly distributed. Most thesaurus terms occur fewer than 100
times, but 307 occur more than 1,000 times and the most frequent one, “Bundesrepublik Deutschland”,
occurs 60,955 times.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Entry Vocabulary Modules</title>
      <p>Entry Vocabulary Modules are automatically created search aids that function as intermediaries between the
searcher's queries and the controlled vocabulary of a bibliographic database, in this case the GIRT
thesaurus. They are referred to as Entry Vocabulary Modules because they provide a mapping from the
“query vocabulary” of the searcher to the “entry vocabulary” of the database. A database’s entry vocabulary
consists of the subject metadata. It is this controlled vocabulary that provides an effective “entry” (access
point) to the database records.</p>
      <p>
        An Entry Vocabulary Module is in essence a dictionary of associations between terms occurring in document titles and
abstracts and the controlled vocabulary terms assigned to those documents. If title/abstract words and
thesaurus terms co-occur more often than random chance would predict, they are likely to be
associated. A likelihood ratio statistic is used to measure the association between any natural language term
and a controlled vocabulary term. Each pair is assigned an association weight (rank) representing the
strength of their association. The higher the rank, the better the thesaurus term represents the concept
expressed by the document word. The methodology of constructing Entry Vocabulary Modules has been
described in detail in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
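<p>The association statistic can be illustrated with Dunning's log-likelihood ratio over a 2x2 co-occurrence table. This is a minimal sketch of the general approach, not the exact statistic of [11] and [12]:</p>

```python
import math

def entropy(*counts):
    """N*log(N) minus the sum of x*log(x) over the given counts."""
    n = sum(counts)
    return n * math.log(n) - sum(x * math.log(x) for x in counts if x > 0)

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 co-occurrence table:
    k11 = documents containing both the title word and the thesaurus term,
    k12 = word only, k21 = term only, k22 = neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)
```

<p>A word/term pair that co-occurs far more often than independence would predict receives a large weight; a pair whose counts are exactly what independence predicts receives a weight of zero.</p>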
      <p>Once an Entry Vocabulary Module is constructed and a table of associations and their weights exist, we can
look up a word in the dictionary and find its most highly associated thesaurus term. This is how we find
thesaurus terms to associate with the GIRT queries. After experimenting with looking up query title and
description words, we found that query title words are sufficient to find relevant thesaurus terms. For all
CLEF 2005 experiments, only query title words (after stopword removal) were used for thesaurus term
look-up. If more than one word appears in the query title, we need to merge the results of the individual thesaurus
term look-ups to obtain a list of terms for the query as a whole. We experimented with the two merging
strategies discussed below.</p>
      <sec id="sec-3-1">
        <title>3.1 Absolute Rank Merging</title>
        <p>For absolute rank merging, an absolute rank for each thesaurus term is calculated by adding the association
weights if it is associated with several title words. The five thesaurus terms with the highest rank are then
added to the query. We will use the English GIRT query 132 to illustrate this:</p>
        <sec id="sec-3-1-1">
          <title>Title 132: Sexual Abuse of Children</title>
          <p>Sexual:
3365.05 sexuality
1233.47 sexual abuse
936.22 sex offense
650.17 sexual harassment
471.52 homosexuality</p>
          <p>Abuse:
1014.61 sexual abuse
767.84 abuse
431.38 child
307.05 sex offense
275.07 maltreatment</p>
          <p>Children:
19711.75 child
2778.81 family
2605.75 parents
2344 parents-child relationship
2178.56 adolescent</p>
          <p>Absolute rank:
20468.45 child
3640.36 sexuality
2836.85 family
2741.82 parents
2569.02 sexual abuse</p>
          <p>This table shows a sample of the thesaurus terms associated with each individual title word, and the absolute
rank order of the thesaurus terms after the weights for each thesaurus term have been added and the terms ranked again. For *child*,
the association rank of the word “sexual” with the thesaurus term *child* is looked up (325.31; not shown in the
table), added to the association rank of the title word “abuse” with *child* (431.38), and then added to
the association for “children” (19711.75). The resulting 20468.45 is the absolute rank for the thesaurus term
*child* and makes it the top-ranking thesaurus term for this query.</p>
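<p>The absolute rank computation can be sketched as follows; the shape of the EVM dictionary is an assumption for illustration, and only a few of the per-word weights from the example above are shown:</p>

```python
def absolute_rank_merge(evm, title_words, k=5):
    """Sum each thesaurus term's association weight over all title words
    and keep the k terms with the highest combined weight.
    `evm` maps a query word to a dict {thesaurus_term: weight}."""
    totals = {}
    for word in title_words:
        for term, weight in evm.get(word, {}).items():
            totals[term] = totals.get(term, 0.0) + weight
    return sorted(totals, key=totals.get, reverse=True)[:k]
```

<p>With the weights for query 132, *child* accumulates 325.31 + 431.38 + 19711.75 and ends up on top.</p>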
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Round Robin Merging</title>
        <p>The above example for absolute rank merging also shows the pitfall of this merging strategy: some
association pairs (like “children” - *child* in query 132) have such high weights that other important query
word – thesaurus term combinations will be ranked lower no matter what. To avoid this problem, we also
tested a round robin merging strategy: for each query word, we looked up the two highest ranked thesaurus
terms and added them to the query. The English GIRT query 138 will serve as an example:</p>
        <sec id="sec-3-2-1">
          <title>Title 138: Insolvent Companies</title>
          <p>Absolute rank merging:
enterprise
firm
medium-sized firm
small-scale business
flotation</p>
          <p>Round robin merging:
liquidity
indebtedness
enterprise
firm</p>
          <p>The first two thesaurus terms in the round robin strategy are highly associated with “insolvent”, the second
two with “companies”. As one can see in the absolute rank strategy, the thesaurus terms for “companies”
seem to ‘overpower’ the ones for “insolvent”.</p>
          <p>Sometimes, this strategy is prone to errors as topic 143 proves. The words looked up in the EVM are
“smoking” and “giving”, which is misleading. The absolute rank strategy performs better in this case.</p>
          <p>Absolute rank merging:
smoking
tobacco consumption
tobacco
behavior modification
behavior therapy</p>
          <p>Round robin merging:
donation
social relations
smoking
tobacco consumption</p>
          <p>For German with its compounds (“Unternehmensinsolvenzen” instead of “Insolvent Companies” for topic
138), the round robin strategy sometimes adds only two instead of five thesaurus terms to the query, the
ranking otherwise being equal to that of the absolute rank strategy.</p>
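<p>Round robin merging can be sketched in the same style; the dictionary shape and the weights in the usage example are illustrative assumptions:</p>

```python
def round_robin_merge(evm, title_words, per_word=2):
    """Take the `per_word` highest-weighted thesaurus terms for each
    title word, in query-word order, skipping duplicates.
    `evm` maps a query word to a dict {thesaurus_term: weight}."""
    merged = []
    for word in title_words:
        weights = evm.get(word, {})
        for term in sorted(weights, key=weights.get, reverse=True)[:per_word]:
            if term not in merged:
                merged.append(term)
    return merged
```

<p>A one-word German compound query naturally yields only two terms, matching the behavior described above for topic 138.</p>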
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Retrieval Technique</title>
      <sec id="sec-4-1">
        <title>4.1 Document Ranking</title>
        <p>
          In all its CLEF submissions, the Berkeley 2 group used a document ranking algorithm based on logistic
regression first used in the TREC-2 conference [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The logodds of relevance of document D to query Q is
given by
log O(R | D, Q) = log [ P(R | D, Q) / P(¬R | D, Q) ]
= −3.51 + 37.4 · x1 + 0.330 · x2 − 0.1937 · x3 + 0.0929 · x4
where P(R | D, Q) is the probability of relevance of document D with respect to query Q and
P(¬R | D, Q) is the probability of non-relevance of document D with respect to query Q. The regression
variables are defined as follows:
x1 = (1 / (n + 1)) · Σ_{i=1..n} qtf_i / (ql + 35)   (1)
x2 = (1 / (n + 1)) · Σ_{i=1..n} log( dtf_i / (dl + 80) )   (2)
x3 = (1 / (n + 1)) · Σ_{i=1..n} log( ctf_i / cl )   (3)
x4 = n   (4)
where n is the number of terms common to both a document and a query, qtf_i / dtf_i represent the frequency
of term i within the query and document respectively, ctf_i is the frequency of term i in the collection, ql / dl
represent the number of terms in the query and document respectively and cl is the collection length, i.e. the
number of terms in the collection.
        </p>
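<p>The ranking formula can be transcribed directly; the function below follows equations (1)-(4), with the list arguments holding qtf_i, dtf_i and ctf_i for the n terms shared by query and document:</p>

```python
import math

def logodds(query_tf, doc_tf, coll_tf, ql, dl, cl):
    """Berkeley TREC-2 logistic regression score log O(R | D, Q).
    query_tf, doc_tf, coll_tf hold qtf_i, dtf_i, ctf_i for the n
    terms common to query and document; ql, dl, cl are the query,
    document, and collection lengths in terms."""
    n = len(query_tf)
    x1 = sum(q / (ql + 35) for q in query_tf) / (n + 1)
    x2 = sum(math.log(d / (dl + 80)) for d in doc_tf) / (n + 1)
    x3 = sum(math.log(c / cl) for c in coll_tf) / (n + 1)
    x4 = n
    return -3.51 + 37.4 * x1 + 0.330 * x2 - 0.1937 * x3 + 0.0929 * x4
```

<p>The signs behave as expected: a higher within-document frequency raises the score (positive coefficient on x2), while a higher collection frequency lowers it (negative coefficient on x3 penalizes common terms).</p>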
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Collection and Query Processing</title>
        <p>
          For all runs, we used a stopword list to remove very common words from the English and German
collections and queries as well as an implementation of the Muscat stemmer for both English and German.
For German runs, we used a decompounding procedure developed and described by Aitao Chen [
          <xref ref-type="bibr" rid="ref14 ref15">14,15</xref>
          ],
which has been shown to improve retrieval results. The decompounding procedure looks up document and
query words in a base dictionary and splits compounds when found.
        </p>
        <p>
          As a general procedure, we also use Aitao Chen’s blind feedback algorithm [
          <xref ref-type="bibr" rid="ref14 ref15">14,15</xref>
          ] in every run. It selects
the top 30 ranked terms from the top 20 ranked documents from the initial search to merge with the original
query.
        </p>
        <p>query → stopword removal → (decompounding) → stemming → ranking → blind feedback</p>
        <p>All query expansion and reformulation experiments described below apply to the original query before submission
to these processing steps, which otherwise remain the same.</p>
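<p>The processing chain up to ranking can be sketched as below; the stemmer and decompounder in the usage example are toy stand-ins for the Muscat stemmer and Aitao Chen's dictionary-based decompounding procedure [14, 15]:</p>

```python
def process_query(query, stopwords, stem, decompound=None):
    """Apply the per-run query processing in order:
    stopword removal -> (decompounding, German only) -> stemming.
    `decompound` maps a word to its parts (or to [word] if not
    a known compound); ranking and blind feedback come afterwards."""
    terms = [w for w in query.lower().split() if w not in stopwords]
    if decompound:
        terms = [part for w in terms for part in decompound(w)]
    return [stem(w) for w in terms]
```

<p>Both documents and queries go through the same chain, so a split compound in the query can match the same split in a document.</p>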
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Monolingual Retrieval</title>
      <p>For monolingual retrieval, we experimented with three query expansion strategies:
- adding five thesaurus terms retrieved with the EVM absolute rank merging from query title words;
- adding five thesaurus terms from the absolute rank merging strategy (using only query title words) but
removing all thesaurus terms from the dictionary that occurred more than 1,000 times in the document
collection, thereby hoping to remove thesaurus terms that would not discriminate effectively;
- adding two thesaurus terms retrieved from the EVM for each query title word using the round robin
merging strategy.</p>
      <p>Last year, we observed an improvement in precision when we weighted the expanded part of the query (the
thesaurus terms) half as much as the original query words. The same holds for our other expansion mechanism
(blind feedback), where new terms are added with half the weight of the original query terms. For
every expansion strategy, we analyze one run where the thesaurus terms are downweighted and one where they
are treated as an equally important part of the query.</p>
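<p>The downweighting scheme can be represented as term-weight pairs; this representation is an illustrative assumption, not the actual internal format of the retrieval system:</p>

```python
def build_weighted_query(original_terms, expansion_terms, weight=0.5):
    """Attach a relative weight to each query term: original query
    terms get 1.0, expansion terms (EVM thesaurus terms or blind
    feedback terms) get `weight`; terms already in the original
    query are not added again."""
    query = [(t, 1.0) for t in original_terms]
    query += [(t, weight) for t in expansion_terms
              if t not in original_terms]
    return query
```

<p>Setting weight=1.0 yields the not-downweighted variant, where expansion terms count as much as the searcher's own words.</p>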
      <p>We also experimented with submitting only the title of the query to the retrieval system, assuming that the
shortness of these queries simulates real user queries better than a title+description query. Since the EVMs
need no more information than the title words, we can also use the technique for these sparse queries.
For every run, we compared not only the overall average precision but also the precision scores on a
query-by-query basis. This shows more clearly where the strengths and weaknesses of the individual strategies lie, but also
reveals that sometimes just one word can influence precision scores dramatically.</p>
      <sec id="sec-5-1">
        <title>5.1 German</title>
        <sec id="sec-5-1-1">
          <title>5.1.1 Title + Description Runs</title>
          <p>As the following table 1 shows, query expansion always improves over the title+description baseline run if the
expanded part is downweighted. If the thesaurus terms are not downweighted, only the round robin strategy
improves over the baseline run. However, that run is also the dominant strategy overall, not only improving on the
baseline by 13% but also on the downweighted variant and on the other merging strategies.
[Table 1, giving the average precision scores and official run names for the title+description German monolingual runs, did not survive text extraction.]</p>
          <p>
Comparing precision on a query-by-query basis, it becomes clear that downweighting clearly dominates for the
absolute rank strategies, whereas not downweighting equally dominates for the round robin strategy although the
average precision scores are much closer. In 18 of 25 queries, absolute rank merging with downweighting had a
better precision than the not downweighted absolute rank strategy, for the absolute rank –1000 strategy,
downweighting achieved a better result in 20 cases. For round robin, not downweighting turned out to be better
in 17 of 25 cases compared to downweighting.
Comparing all seven runs with each other shows that the best run (RR) dominates in 11 cases, the baseline run in
6 cases, ABS HW in 3 cases, RR HW in 3 cases and ABS –1000 HW in 2 cases, changing the ranking order
compared to average precision scores.</p>
          <p>However, it makes more sense to compare strategies pair wise to see which one is stronger. We will look at the
absolute rank and round robin strategies more closely to see how expanding a query by just a few words can
change the results. Although downweighting works better for absolute rank merging (16 queries better than
baseline) than not downweighting (9 queries better than baseline), we will use the not downweighted strategy to
control for the effects of the weighting schemes.</p>
          <p>Graph 1. Comparing precision scores per query for German Monolingual Retrieval</p>
          <p>Graph 1 shows that results can vary for each strategy and query, the most dramatic change being the
improvement from 0.2812 in the baseline to 0.6003 for ABS in query 139 (an improvement of 113%!). Even
more striking, looking at individual queries shows how little it takes to improve or degrade a result.
Query 131 serves as an example where the baseline is better than query expansion:
&lt;DE-title&gt; Zweisprachige Erziehung &lt;/DE-title&gt;
&lt;DE-desc&gt; Finde Dokumente, die die bilinguale Erziehung diskutieren. &lt;/DE-desc&gt;</p>
          <p>ABS:
Erziehung
Pädagogik
Schule
Bildung
interkulturelle Erziehung</p>
          <p>RR:
Mehrsprachigkeit
interkulturelle Erziehung
Erziehung
Pädagogik</p>
          <p>The table shows the thesaurus terms that were appended to the query. Even though all of them seem relevant, the
double occurrence of the word “Erziehung” in the thesaurus terms might skew the results too much towards
documents dealing with education (Erziehung) alone and less with the bilingual aspect of it. Indeed, deleting the
word “Erziehung” from the thesaurus terms in the RR strategy raises the precision from 0.43 to 0.55 (+28%),
proving that sometimes one word can cause a huge improvement.</p>
          <p>Query 139 serves as an example where the expansion strategy works much better than the baseline:
&lt;DE-title&gt; Gesundheitsökonomie &lt;/DE-title&gt;
&lt;DE-desc&gt; Finde Dokumente, die die Versorgung der Bevölkerung mit medizinischen und ärztlichen
Dienstleistungen aus ökonomischer Sicht diskutieren. &lt;/DE-desc&gt;</p>
          <p>Finally, query 148 is an interesting case showing how query expansion can be both advantageous and
disadvantageous, depending on the terms added.
&lt;DE-title&gt; Russlanddeutsche und Sprache &lt;/DE-title&gt;
&lt;DE-desc&gt; Finde Dokumente, die die sprachliche Integrität von Russlanddeutschen der ehemaligen Sowjetunion
in Deutschland oder Russland diskutieren. &lt;/DE-desc&gt;</p>
          <p>ABS:
Sprache
Sprachgebrauch
Linguistik
Fachsprache
Kommunikation</p>
          <p>RR:
Auswanderung
Spätaussiedler
Sprache
Sprachgebrauch</p>
          <p>The absolute rank strategy adds thesaurus terms that are too general for the query, decreasing precision by 62%.
However, just adding the term “Spätaussiedler” from the round robin strategy improves precision by 44%.</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>5.1.2 Title only Runs</title>
          <p>For title only runs we only experimented with the best strategy: round robin merging. As table 2 shows, queries
expanded with thesaurus terms clearly improve precision over the baseline run (19%). For title only runs,
downweighting thesaurus terms works better, improving the precision over the baseline by 30% and even more
so, slightly improving on the baseline of the title+description run!
</p>
          <p>Table 2. Average precision for title only German monolingual runs: title-only baseline 0.3643; RR 0.4339; RR HW 0.4748.</p>
          <p>Comparing these runs on a query-by-query basis shows the dominance of the query expansion strategy even
more clearly: in 18 of 25 cases RR is better than the baseline, and in 22 out of 25 cases RR HW is better than the
baseline. RR HW is better than a title+description run in 14 cases.</p>
          <p>One more experiment gives food for thought: instead of submitting the original query text, we only submitted the
suggested EVM thesaurus terms from the round robin strategy, therefore reformulating the query instead of
expanding it. Although the precision compared to the baseline decreases to 0.3075, substituting the thesaurus
terms for the original query text works better in 12 of the 25 cases, showing that free-text does not dominate a
controlled vocabulary strategy.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 English</title>
        <sec id="sec-5-2-1">
          <title>5.2.1 Title + Description Runs</title>
          <p>As table 3 shows, query expansion with EVM suggested thesaurus terms is not as successful for English
monolingual retrieval. However, the trend remains the same as in German monolingual retrieval. The round
robin strategy without downweighting is still the dominating strategy, improving on the baseline by 6%. For the
absolute rank strategies, downweighting works better, although they don’t improve on the baseline.
</p>
          <p>Table 3. Average precision scores for title + description English monolingual runs: TD baseline 0.4531; ABS HW 0.4149; ABS 0.3462; ABS –1000 HW 0.4125; ABS –1000 0.3092; RR HW 0.4697; RR 0.4818.</p>
          <p>
The difference between downweighting or not is more pronounced when looking at the results on a
query-by-query basis: in 21 out of 25 cases downweighting is better for the absolute rank strategy, and in 20 of 25 cases for
the absolute rank –1000 strategy. Not downweighting works better for round robin merging in 14 out of the 25
cases.</p>
          <p>Comparing all seven runs shows that the best run (RR) only dominates in 9 cases, the baseline in 5, RR HW in 4,
ABS HW in 3, ABS –1000 HW in 2 cases and ABS in 1 case demonstrating a weaker trend than in German
monolingual retrieval.</p>
          <p>Once again, graph 2 shows a comparison of precision scores for the baseline, the absolute rank and the round
robin strategy. The absolute strategy works better than the baseline in 8 cases, but round robin is clearly better in
16 cases.</p>
          <p>Graph 2. Comparing precision scores per query for English Monolingual Retrieval</p>
          <p>Looking at graph 2 reveals two things. First, the absolute strategy seems to make things much worse in
some cases (131, 138, 141) because it adds thesaurus terms that are too general. Second, even the round robin
strategy doesn’t seem to improve precision as much as in German monolingual retrieval. Ironically, it seems that
the unique characteristics of the German language (compounds) help in suggesting thesaurus terms that are not
only more on the mark but are also compounds themselves, retrieving more relevant documents. For example, the
thesaurus term *way of life* translates to *Lebensweise* in German. Whereas for English, the retrieval system
will look for documents containing “way” and “life” (very general!), in German it will look for
“Lebensweise”, which is much more precise.</p>
          <p>However, it also cannot be overlooked that the English collection contains less text (fewer abstracts) than the
German collection to search on. It might be that the added thesaurus terms skew the search results in that they take
weight away from the free-text search terms, ranking documents containing the thesaurus terms (more likely)
higher than ones containing the free-text search terms. This would explain the greater improvement of the
downweighting strategies for absolute rank merging as compared to German (precision increases by 20% and
33% for ABS and ABS –1000 in English, whereas only by 8% and 19% in German) and the smaller
improvement of not downweighting for round robin (2.5% in English vs. 4% in German).</p>
          <p>Nevertheless, one query can serve as an example that one word can make a difference in English also: just
adding the EVM suggested thesaurus term *morals* to query 142 (Advertising and Ethics) will improve
precision by 31%.</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>5.2.2 Title only Runs</title>
          <p>For title only runs, query expansion seems to improve on the baseline (+7%), although not as much as in German (19%). Downweighting again works better, improving on the baseline by 14%. [Table 4, giving the average precision scores for the title only English runs (baseline, RR, RR HW), did not survive text extraction.]</p>
          <p>
Looking at the results on a query-by-query basis shows the dominance of the expansion strategies a little better:
in 16 cases out of 25 RR dominates over the baseline, whereas RR HW is better in 18 cases. The best strategy for
title only runs can compete with the baseline title+description run, with similar average precision and a better
performance in 12 out of 25 cases.</p>
          <p>However, replacing the title words with EVM suggested thesaurus terms works less well than in German: this
strategy performs better in only 5 cases, decreasing the overall average precision to 0.2983 (-25%).</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3 EVM Query Expansion vs. Blind Feedback</title>
        <p>Although it has been shown that query expansion with EVM suggested thesaurus terms will improve
monolingual retrieval in general, it might be of interest to compare this automatic technique of query expansion
to another one – blind feedback. We have used blind feedback with success in previous years and now use it in
all our retrieval experiments. Although EVM and blind feedback query expansion are quite different in nature –
EVM works from the query title text, blind feedback from the result set document text – they are used to enhance
the query to achieve better results. Table 5 gives a quick overview of runs using either strategy, both or none.
[Table 5: the table layout did not survive text extraction; surviving values include average precision scores 0.4175, 0.4517, 0.4622, and 0.4818, and best-query counts 15 and 10.]</p>
        <p>Table 5 compares blind feedback and EVM query expansion pair-wise; the numbers represent the number of queries where one strategy achieved a higher precision score than the other (e.g. for German, the EVM technique achieved a higher precision in 18 cases).</p>
        <p>The combination of both techniques outperforms the baseline and the individual query expansion techniques. For
German monolingual retrieval, only EVM suggested terms improve over the baseline (in 16 out of 25 cases). For
English, however, EVM terms improve only slightly over the baseline (13 cases), whereas blind feedback
improves over the baseline (16 cases) and outperforms EVM expansion (better in 15 cases).</p>
      </sec>
      <sec id="sec-5-4">
        <title>6.1 Translation Methods</title>
        <p>
          For bilingual retrieval, we experimented with query expansion and query reformulation using EVMs in addition
to query translation. Three translation techniques are compared:
1. Machine translation. We used a combination of the Systran translator (http://babelfish.altavista.com/)
and the L &amp; H Power Translator.
2. Thesaurus matching. Words and phrases from the query are looked up in the thesaurus with a
fuzzy-matching algorithm and if a matching thesaurus term in the query language is found, the equivalent
thesaurus term in the target language is used. See [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] for a more detailed description.
3. EVM. The query title words were submitted to the query language EVM and the round robin merging
technique was used to retrieve thesaurus terms. The thesaurus terms in the query language were then
replaced by the thesaurus terms in the target language. The query was then reformulated using only
thesaurus terms.
        </p>
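<p>The EVM translation route (method 3) can be sketched as follows; the dictionaries in the usage example are hypothetical stand-ins for the EVM and the bilingual GIRT thesaurus:</p>

```python
def evm_translate(title_words, evm, bilingual_thesaurus, per_word=2):
    """EVM-based query 'translation': retrieve round-robin thesaurus
    terms in the query language, then replace each with its
    target-language equivalent from the bilingual thesaurus. The
    reformulated query consists of thesaurus terms only."""
    translated = []
    for word in title_words:
        weights = evm.get(word, {})
        for term in sorted(weights, key=weights.get, reverse=True)[:per_word]:
            target = bilingual_thesaurus.get(term)
            if target and target not in translated:
                translated.append(target)
    return translated
```

<p>Note that no dictionary or machine translation of the query words themselves is involved: the mapping goes through the controlled vocabulary.</p>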
        <p>Query 144 serves as example for the different output of the translation strategies.</p>
        <sec id="sec-5-4-1">
          <title>Query 144: Translation Outputs</title>
          <p>[The example listing the German query title, the English query title, the machine translation output, and the thesaurus matching output for query 144 did not survive text extraction.]</p>
          <p>For bilingual retrieval, we will first compare these translation techniques separately and then in combination. In
previous years, a combination of machine translation and thesaurus matching achieved the best results. For
machine translation and thesaurus matching, both title and description of the query were submitted, for EVM
only the suggested thesaurus terms were submitted.</p>
        </sec>
      </sec>
      <sec id="sec-5-5">
        <title>6.2 Translation</title>
        <p>[Table 6, comparing the individual translation techniques for bilingual retrieval, did not survive text extraction.] The table demonstrates once again that although average precision scores might differ significantly, a
query-by-query analysis shows a different picture. Although thesaurus matching seems to perform worse in German-English
retrieval (-23%), machine translation is better in only a little over half of the cases. And although machine
translation and thesaurus matching seem to perform equally well in English-German retrieval, thesaurus
matching performs better in three fifths of the cases. The performance of the EVM suggested thesaurus terms
compared to machine translation is astonishing: an automatically associated list of controlled vocabulary terms
performs almost as well as the combined textual-based translations of two commercial machine translation
programs!</p>
      </sec>
      <sec id="sec-5-6">
        <title>6.3 Combining translation techniques</title>
        <p>Combining translation techniques means submitting the translated output from the different methods in one and
the same run. This increases the number of query words and with it the danger of introducing more non-discriminating
search terms, as well as of favoring easy-to-translate terms (they are the most likely to occur in the output of all methods). For CLEF,
however, this strategy has worked successfully in previous years. Combining translation methods helps with hard-to-translate
words (there is a higher chance of one method getting it right) and reduces the risk of mis-translation.</p>
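<p>Combination is then simply concatenation of the per-method outputs into one query; duplicates are kept on purpose, since a term produced by several methods effectively carries more weight in the ranking (a sketch, not the system's actual merge code):</p>

```python
def combine_translations(*translations):
    """Merge the output of several translation methods into one query.
    Duplicates are deliberately kept: a term that several methods
    agree on appears several times and so gets more weight."""
    combined = []
    for terms in translations:
        combined.extend(terms)
    return combined
```
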
        <sec id="sec-5-6-1">
          <title>6.3.1 Official runs</title>
          <p>The official runs for bilingual retrieval used the absolute rank merging technique when EVM suggested
thesaurus terms were used. However, later experiments showed that round robin merging is the stronger strategy for
bilingual retrieval as well, so we report results for round robin merging. For documentation purposes, we briefly state
below which official runs used which translation combinations. Later sections report combination runs with
EVM round robin merging in more detail.</p>
          <p>BK2GBLEG1 / GE1 machine translation + thesaurus matching
BK2GBLEG2 / GE2 machine translation + thesaurus matching + EVM absolute rank
BK2GBLEG3 / GE3 thesaurus matching + EVM absolute rank</p>
          <p>BK2GBLEG4 machine translation + thesaurus matching + EVM absolute rank (downweighted)</p>
        </sec>
        <sec id="sec-5-6-2">
          <title>6.3.2 German-English Bilingual Retrieval</title>
          <p>Table 7 compares combination runs for German-English retrieval.</p>
          <p>[Table 7: average precision and number of best queries for Machine Translation + Thesaurus Matching and for Machine Translation + EVM thesaurus terms.]</p>
          <p>As one can see, a combination of all three techniques is clearly the dominating strategy – it seems that adding
more words describing the same concept generally improves precision rather than adding too many
non-discriminating terms. It is also worth mentioning that all combination runs perform better than machine
translation alone, even if one combines only thesaurus matching and EVM terms. In fact, even though lower in
precision, this combination performs better in 13 out of 25 cases compared to both the machine translation –
thesaurus matching and the machine translation – EVM pairs; a worthy competitor to the commercial translation
solutions.</p>
        </sec>
        <sec id="sec-5-6-3">
          <title>6.3.3 English-German Bilingual Retrieval</title>
          <p>[Table: average precision and number of best queries for Machine Translation + Thesaurus Matching, Machine Translation + EVM thesaurus terms, Thesaurus Matching + EVM thesaurus terms, and Machine Translation + Thesaurus Matching + EVM thesaurus terms.]</p>
          <p>For English-German retrieval, all combination runs perform similarly. However, once again, they clearly
outperform machine translation alone. Of course, not all combinations work equally well for each query and,
sometimes, one translation technique alone works much better. Query 136 serves as an example:</p>
        </sec>
        <sec id="sec-5-6-4">
          <title>Example (query 136): English query title, German query title, machine translation, and thesaurus matching output</title>
          <p>Only the EVM round robin strategy manages to suggest the important word “Abfall” (waste), whereas the other
strategies either mistranslate “waste” or select the wrong thesaurus term due to incorrect fuzzy matching. The
EVM words alone achieve a precision score of 0.6558, whereas the best combination strategy achieves only
0.414 (thesaurus matching + EVM) – still better than the combination of machine translation and thesaurus
matching (0.136), which in turn beats machine translation (0.0295) or thesaurus matching (0.0236) alone.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7 Conclusion</title>
      <p>Query expansion techniques have been a topic of research in the IR field for decades. Automatic query expansion
has been analyzed mostly in terms of blind feedback mechanisms based on a preliminary ranked list of
documents. Query expansion based on thesauri or other controlled vocabularies is mostly a topic for manual
query expansion or interactive modes of query expansion. This paper discusses an automatic query expansion
strategy using controlled vocabulary terms.</p>
      <p>Expanding a query with terms from a thesaurus is like asking an information expert to translate your search
strategy into the search language of the database, hopefully providing better search terms than the original search
statement. The information expert for this set of experiments is an association dictionary of thesaurus terms and
free-text words from titles and abstracts from the collection. Based on title words from the query, thesaurus terms
that are highly associated with those words are suggested. Two merging strategies have been tested: absolute
rank merging, which ranks thesaurus terms against all title words as a set, and round robin merging, which
suggests two thesaurus terms for each individual query word.</p>
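      <p>The two merging strategies can be sketched as follows. The association scores below are hypothetical; the real Entry Vocabulary Module derives its rankings from co-occurrence statistics over titles and abstracts in the collection:</p>

```python
def absolute_rank_merge(ranked_terms, k=4):
    """Absolute rank merging: thesaurus terms are ranked once against the
    whole set of title words; take the top k overall."""
    ordered = sorted(ranked_terms, key=lambda ts: -ts[1])
    return [term for term, score in ordered[:k]]

def round_robin_merge(per_word_rankings, per_word=2):
    """Round robin merging: take the top `per_word` thesaurus terms suggested
    for each individual query word (the experiments use two), skipping
    duplicates while preserving order."""
    suggested = []
    for word, rankings in per_word_rankings.items():
        for term, score in rankings[:per_word]:
            if term not in suggested:
                suggested.append(term)
    return suggested

# Hypothetical association scores between query words and thesaurus terms:
per_word = {
    "waste": [("Abfall", 0.9), ("Recycling", 0.7), ("Umwelt", 0.3)],
    "trade": [("Handel", 0.8), ("Aussenhandel", 0.6), ("Export", 0.5)],
}
round_robin = round_robin_merge(per_word)
# → ["Abfall", "Recycling", "Handel", "Aussenhandel"]
```

Round robin guarantees every query word contributes suggestions, whereas absolute rank can let one strongly associated word crowd out the others.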
      <p>For monolingual retrieval, query expansion with EVM suggested thesaurus terms improves over the baseline of
title + description submission by 13% (German) and 6% (English), respectively. Downweighting the added terms
performs better for absolute rank merging but not for round robin merging. For German, submitting only thesaurus
terms (replacing the original query) decreases average precision over the 25 queries, but achieves better precision
in 12 individual cases.</p>
      <p>Comparing EVM query expansion to blind feedback (where terms are taken from the top-ranked result documents
and downweighted when added to the query) shows that EVM query expansion improves over blind feedback in
German and performs similarly in English; a combination of both dominates either strategy and the baseline.</p>
      <p>For bilingual retrieval, using the thesaurus for translation works surprisingly well. Using thesaurus terms alone for
the query submission works almost as well as machine translation. Although average precision decreases (9% for
English-German and 15% for German-English), EVM suggested thesaurus terms perform better in one third of
the queries. A combination of two thesaurus techniques (EVM and thesaurus matching) outperforms machine
translation. The combination of machine translation, thesaurus matching and EVM suggested terms outperforms
all other strategies.</p>
      <p>It has been shown that EVM suggested terms can provide the impact needed to raise precision for a query – if they are
high-quality search terms. High-quality search terms are those that provide discriminating search power (they
occur mostly in relevant documents), describe the information need exactly and, ideally, add new terms to the
query. Added terms that are too vague will almost always degrade performance. One word is all it takes to
make the difference – now if we could only figure out which one!</p>
    </sec>
    <sec id="sec-7">
      <title>8 Acknowledgement</title>
      <p>Thanks to Aitao Chen for implementing and permitting the use of the logistic regression formula for probabilistic
information retrieval as well as German decompounding and blind feedback in his MULIR retrieval system.</p>
    </sec>
    <sec id="sec-8">
      <title>9 References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Efthimiadis</surname>
            ,
            <given-names>Efthimis N.</given-names>
          </string-name>
          <year>1996</year>
          .
          <article-title>Query Expansion</article-title>
          .
          <source>In Annual Review of Information Systems and Technology (ARIST)</source>
          ,
          edited by
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Williams</surname>
          </string-name>
          . Medford, NJ: Information Today.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gauch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <year>1993</year>
          .
          <article-title>An expert system for automatic query reformation</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          <volume>44</volume>
          (
          <issue>3</issue>
          ):
          <fpage>124</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Doszkocs</surname>
            ,
            <given-names>T.E.</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>R.K.</given-names>
            <surname>Sass</surname>
          </string-name>
          .
          <year>1992</year>
          .
          <article-title>An Associative Semantic Network for Machine-Aided Indexing, Classification and Searching</article-title>
          .
          <source>In Advances in Classification Research</source>
          , Vol.
          <volume>3</volume>
          .
          <source>Proceedings of the 3rd ASIS SIG/CR</source>
          Classification Research Workshop: Medford, NJ: Learned Information.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Shiri</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Revie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Chowdhury</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Thesaurus-enhanced search interfaces</article-title>
          .
          <source>Journal of Information Science</source>
          <volume>28</volume>
          (
          <issue>2</issue>
          ):
          <fpage>111</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>Interactive thesaurus navigation: intelligence rules OK</article-title>
          ?
          <source>Journal of the American Society for Information Science</source>
          <volume>46</volume>
          (
          <issue>1</issue>
          ):
          <fpage>52</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Sihvonen</surname>
            ,
            <given-names>Anne</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Pertti</given-names>
            <surname>Vakkari</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Subject knowledge improves interactive query expansion assisted by a thesaurus</article-title>
          .
          <source>Journal of Documentation</source>
          <volume>60</volume>
          (
          <issue>6</issue>
          ):
          <fpage>673</fpage>
          -
          <lpage>690</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Joho</surname>
            ,
            <given-names>Hideo</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Mark</given-names>
            <surname>Sanderson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Beaulieu</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>A study of user interaction with a concept-based interactive query expansion support tool</article-title>
          .
          <source>In ECIR</source>
          <year>2004</year>
          ,
          edited by
          <string-name>
            <given-names>S.</given-names>
            <surname>McDonald</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Tait</surname>
          </string-name>
          . Berlin Heidelberg: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Suomela</surname>
            ,
            <given-names>Sari</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Jaana</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Ontology as a search-tool: a study of real users' query formulation with and without conceptual support</article-title>
          .
          <source>In ECIR</source>
          <year>2005</year>
          , edited by D. E. Losada and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Fernández-Luna</surname>
          </string-name>
          . Berlin Heidelberg: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Schott</surname>
            ,
            <given-names>Hannelore</given-names>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Thesaurus for the Social Sciences. 2 vols</article-title>
          . Vol.
          <volume>1</volume>
          , German - English; Vol. 2, English - German. Bonn:
          Informationszentrum Sozialwissenschaften.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kluck</surname>
            ,
            <given-names>Michael</given-names>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>The GIRT Data in the Evaluation of CLIR Systems - from 1997 Until 2003</article-title>
          .
          <source>In Comparative Evaluation of Multilingual Information Access Systems: 4th Workshop of the Cross-Language Evaluation Forum</source>
          , CLEF
          <year>2003</year>
          , edited by
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Peters</surname>
          </string-name>
          . Trondheim, Norway, August 21-22,
          <year>2003</year>
          . Lecture Notes in Computer Science 3237, Springer
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Plaunt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Norgard</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>An association-based method for automatic indexing with a controlled vocabulary</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          <volume>49</volume>
          (
          <issue>10</issue>
          ):
          <fpage>888</fpage>
          -
          <lpage>902</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gey</surname>
            ,
            <given-names>Fred</given-names>
          </string-name>
          , et al.
          <year>1999</year>
          .
          <article-title>Advanced Search Technology for Unfamiliar Metadata</article-title>
          .
          <source>Third IEEE Metadata Conference</source>
          ,
          <year>April 1999</year>
          . Bethesda, Maryland.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cooper</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Gey</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Full text retrieval based on probabilistic equations with coefficients fitted by logistic regression</article-title>
          .
          <source>In The Second Text Retrieval Conference (TREC-2)</source>
          , edited by D. K. Harman.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Aitao</given-names>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Cross-Language Retrieval Experiments at CLEF 2002</article-title>
          . Lecture Notes in Computer Science, Vol. 2785.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>F</given-names>
            <surname>Gey</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Multilingual Information Retrieval Using Machine Translation, Relevance Feedback and Decompounding</article-title>
          .
          <source>Information Retrieval</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          -2):
          <fpage>149</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Petras</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Perelman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Gey</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>UC Berkeley at CLEF 2003 - Russian Language Experiments and Domain-Specific Cross-Language Retrieval</article-title>
          .
          <source>In Comparative Evaluation of Multilingual Information Access Systems: 4th Workshop of the Cross-Language Evaluation Forum</source>
          ,
          CLEF
          <year>2003</year>
          . Trondheim, Norway, August 21-22,
          <year>2003</year>
          . Lecture Notes in Computer Science 3237, Springer
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>