<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>European Ad Hoc Retrieval Experiments with Hummingbird SearchServer™ at CLEF 2005</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stephen Tomlinson</string-name>
          <email>stephen.tomlinson@hummingbird.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
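          <institution>Hummingbird</institution>
          ,
          <addr-line>Ottawa, Ontario</addr-line>
          ,
          <country country="CA">Canada</country>
          ,
          <ext-link ext-link-type="uri" xlink:href="http://www.hummingbird.com/">http://www.hummingbird.com/</ext-link>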
        </aff>
      </contrib-group>
      <abstract>
        <p>Hummingbird participated in the 4 monolingual information retrieval tasks (Bulgarian, French, Hungarian and Portuguese) of the Ad-Hoc Track of the Cross-Language Evaluation Forum (CLEF) 2005. In the ad hoc retrieval tasks, the system was given 50 natural language queries, and the goal was to find all of the relevant documents (with high precision) in a particular document set. We conducted diagnostic experiments with different techniques for matching word variations and handling stopwords. We found that the experimental stemmers significantly increased mean average precision for the 4 languages. Analysis of individual topics found that the algorithmic Bulgarian and Hungarian stemmers encountered some unanticipated stopword collisions. A comparison to an experimental 4-gram technique suggested that Hungarian stemming would further benefit from decompounding. A blind feedback technique which significantly increased mean average precision for some languages was also significantly detrimental to the rank of the first relevant retrieved for one language.</p>
      </abstract>
      <kwd-group>
        <kwd>Bulgarian Retrieval</kwd>
        <kwd>Hungarian Retrieval</kwd>
        <kwd>First Relevant Score</kwd>
        <kwd>Per-Topic Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Hummingbird SearchServer is a toolkit for developing enterprise search and retrieval applications.
The SearchServer kernel is also embedded in other Hummingbird products for the enterprise.</p>
      <p>SearchServer works in Unicode internally [3] and supports most of the world’s major
character sets and languages.</p>
      <p>SearchServer™, SearchSQL™ and Intuitive Searching™ are trademarks of Hummingbird Ltd. All other
copyrights, trademarks and tradenames are the property of their respective owners.</p>
    </sec>
    <sec id="sec-2">
      <p>
        The major conferences in text retrieval experimentation (CLEF [2], NTCIR [6] and TREC [
        <xref ref-type="bibr" rid="ref3">11</xref>
        ]) have provided judged test collections for objective experimentation
with SearchServer in more than a dozen languages.
      </p>
      <p>This (draft) paper describes experimental work with SearchServer for the task of finding
relevant documents for natural language queries in 4 European languages (Bulgarian, French,
Hungarian and Portuguese) using the CLEF 2005 Ad-Hoc Track test collections.</p>
      <sec id="sec-2-1">
        <title>Methodology</title>
        <sec id="sec-2-1-1">
          <title>Data</title>
          <p>The CLEF 2005 Ad-Hoc Track document sets consisted of tagged (SGML-formatted) news articles
in 4 different languages: Bulgarian, French, Hungarian and Portuguese. Table 1 gives the sizes.</p>
          <p>The CLEF organizers created 50 natural language “topics” (numbered 251-300) and translated
them into many languages. One topic was discarded for Bulgarian because it had no relevant
documents. Table 1 gives the final number of topics for each language and their average number
of relevant documents (along with the lowest, median and highest number of relevant documents
of any topic). For more information on the CLEF test collections, see the track overview paper.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Indexing</title>
          <p>
            Our indexing approach was mostly the same as last year [
            <xref ref-type="bibr" rid="ref7">15</xref>
            ]. Accents were not indexed except
for the combining breve in Bulgarian. The apostrophe was treated as a word separator for the 4
investigated languages. The custom text reader, cTREC, was updated to maintain support for
the CLEF guidelines of only indexing specifically tagged fields.
          </p>
          <p>
            Some stop words were excluded from indexing (e.g. “the”, “by” and “of” in English). For these
experiments, the stop word list for Portuguese was based on the Porter list [7], and the lists for
Bulgarian and Hungarian were based on Savoy’s [
            <xref ref-type="bibr" rid="ref1">9</xref>
            ]. We used our own list for French.
          </p>
          <p>Unlike previous years, this year we added AL=“0-9” to the stopfiles to specify that the digits
0-9 were to be treated as alphabet characters (e.g. so that “G7” would be indexed as 1 term instead
of 2).</p>
          <p>By default, the SearchServer index supports both exact matching (after some Unicode-based
normalizations, such as decompositions and conversion to upper-case) and morphological matching
(e.g. inflections, derivations and compounds, depending on the linguistic component used).</p>
          <p>For many languages (including French and Portuguese), SearchServer provides the option of
finding inflections based on lexical stemming (i.e. stemming based on a dictionary or lexicon for
the language). For example, in English, “baby”, “babied”, “babies”, “baby’s” and “babying” all have
“baby” as a stem. Specifying an inflected search for any of these terms will match all of the others.
The lexical stemming of the post-6.0 experimental development version of SearchServer used for
the experiments in this paper was based on internal stemming component 3.7.0.15. We treat each
linguistic component as a black box in this paper.</p>
          <p>Lexical stemming in SearchServer typically does “inflectional” stemming which generally retains
the part of speech (e.g. a plural of a noun is typically stemmed to the singular form). It typically
does not do “derivational” stemming which would often change the part of speech or the meaning
more substantially (e.g. “performer” is not stemmed to “perform”).</p>
          <p>Lexical stemming in SearchServer includes compound-splitting (decompounding) for compound
words in particular languages (such as Dutch, Finnish, German and Swedish). For example, in
German, “babykost” (baby food) has “baby” and “kost” as stems.</p>
          <p>Lexical stemmers can produce more than one stem, even for non-compound words. For
example, in English, “axes” has both “axe” and “axis” as stems (different meanings), and in French,
“important” has both “important” (adjective) and “importer” (verb) as stems (different parts of
speech). SearchServer records all the stem mappings at index-time to support maximum recall
and does so in a way that allows searching to weight some inflections higher than others.</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>Searching</title>
          <p>We experimented with the SearchServer CONTAINS predicate. Our test application specified
SearchSQL to perform a boolean-OR of the query words. For example, for Bulgarian topic 279
whose Title was “Референдуми в Швейцария” (Swiss referendums), a corresponding SearchSQL
query would be:</p>
          <preformat>SELECT RELEVANCE('2:3') AS REL, DOCNO
FROM CLEF05BG
WHERE FT_TEXT CONTAINS 'Референдуми'|'в'|'Швейцария'
ORDER BY REL DESC;</preformat>
          <p>(Note that “в” is a stopword for Bulgarian, so its inclusion in the query would not actually add
any matches.)</p>
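          <p>The boolean-OR query construction described above can be sketched in Python. This is only an illustration of the kind of string a test application would need to produce, not the code actually used for the experiments; the function name build_or_query and the doubling of embedded apostrophes (as in standard SQL) are assumptions.</p>
          <preformat>
```python
def build_or_query(table, words, relevance_method="2:3"):
    """Return a SearchSQL statement that ORs the query words together."""
    # Assumption: an embedded single quote is escaped by doubling it,
    # as in standard SQL string literals.
    quoted = ["'" + w.replace("'", "''") + "'" for w in words]
    contains = "|".join(quoted)
    return (
        f"SELECT RELEVANCE('{relevance_method}') AS REL, DOCNO\n"
        f"FROM {table}\n"
        f"WHERE FT_TEXT CONTAINS {contains}\n"
        "ORDER BY REL DESC;"
    )


# Title field of Bulgarian topic 279, split on whitespace.
title = "Референдуми в Швейцария"
print(build_or_query("CLEF05BG", title.split()))
```
          </preformat>
          <p>Stopwords such as “в” are left in the generated query on purpose, since they add no matches against an index that excludes them.</p>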
          <p>
            Most aspects of the SearchServer relevance value calculation are the same as described last
year [
            <xref ref-type="bibr" rid="ref7">15</xref>
            ]. Briefly, SearchServer dampens the term frequency and adjusts for document length in a
manner similar to Okapi [8] and dampens the inverse document frequency using an approximation
of the logarithm. These calculations are based on the stems of the terms (roughly speaking)
when doing morphological searching (i.e. when SET TERM_GENERATOR 'word!ftelp/inflect'
was previously specified). The SearchServer RELEVANCE_METHOD setting was set to '2:3'
and RELEVANCE_DLEN_IMP was set to 750 for all experiments in this paper.
          </p>
        </sec>
        <sec id="sec-2-1-4">
          <title>Diagnostic Runs</title>
          <p>
            For the diagnostic runs listed in Table 2, the run names consist of a language code (“BG” for
Bulgarian, “FR” for French, “HU” for Hungarian and “PT” for Portuguese) followed by one of the
following labels:
            <list list-type="bullet">
              <list-item><p>“lex” (FR and PT only): The run used SearchServer lexical stemming. The /inflect option (SET TERM_GENERATOR 'word!ftelp/inflect') was specified.</p></list-item>
              <list-item><p>“lexnos”: Same as “lex” except that /nostop was additionally specified, which prevents query terms from being discarded if all of their stems are stopwords (note that stopwords themselves were still not found because they were not indexed).</p></list-item>
              <list-item><p>“lexall”: Same as “lex” except that a separate index was used which did not stop any words from being indexed (specifying /nostop would make no difference with this index).</p></list-item>
              <list-item><p>“lexsing”: Same as “lex” except that /single was additionally specified (so that just one stemming interpretation was used at search time).</p></list-item>
              <list-item><p>“neu” (BG and HU only): Same as “lex” except that the experimental Neuchatel stemmer was used [<xref ref-type="bibr" rid="ref1">9</xref>].</p></list-item>
              <list-item><p>“neunos”: Same as “lexnos” except that the Neuchatel stemmer was used.</p></list-item>
              <list-item><p>“neuall”: Same as “lexall” except that the Neuchatel stemmer was used.</p></list-item>
            </list>
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <p>Table 2 runs: BG-neuall, BG-neunos, BG-4gram, BG-snru, BG-neu, BG-none, FR-sn, FR-lex, FR-lexnos, FR-lexall, FR-4gram, FR-lexsing, FR-none, HU-4gram, HU-neunos, HU-neuall, HU-neu, HU-neuposs, HU-none, PT-sn, PT-lexall, PT-lex, PT-lexnos, PT-lexsing, PT-none, PT-4gram.</p>
      <list list-type="bullet">
        <list-item><p>“neuposs” (HU only): Same as “neu” except that the call to the remove_possessive function was skipped. (Prof. Savoy suggested to us that it was unclear if removing possessive pronouns was a good idea, which we interpreted as uncertainty about the remove_possessive function.)</p></list-item>
        <list-item><p>“sn” (FR and PT only): Same as “lex” except that the Porter (Snowball) stemmer [7] was used.</p></list-item>
        <list-item><p>“snru” (BG only): Same as “neu” except that the Porter (Snowball) stemmer for Russian was used.</p></list-item>
        <list-item><p>“4gram”: Same as “lexall” except that the run used a different index which primarily consisted of the 4-grams of terms, e.g. the word ‘search’ would produce index terms of ‘sear’, ‘earc’ and ‘arch’. No stemming was done; searching used the IS_ABOUT predicate (instead of the CONTAINS predicate) with morphological options disabled to search for the 4-grams of the query terms.</p></list-item>
        <list-item><p>“none”: The run disabled morphological searching. (The run used the same index as “lex” for FR and PT and the same index as “neu” for HU and BG, but SET TERM_GENERATOR '' was specified so that variations from stemming were not matched.)</p></list-item>
      </list>
      <p>Note that all diagnostic runs used only the Title field of the topic.</p>
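      <p>The 4-gram index terms mentioned for the “4gram” runs can be illustrated with a short Python sketch. It only shows how overlapping character 4-grams are derived from a single term; the function name char_ngrams and the choice to keep terms shorter than four characters whole are assumptions, not a description of the SearchServer index.</p>
      <preformat>
```python
def char_ngrams(term, n=4):
    """Return the overlapping character n-grams of a term, in order."""
    if n > len(term):
        # Assumption: a term shorter than n is kept as a single index term.
        return [term]
    return [term[i:i + n] for i in range(len(term) - n + 1)]


print(char_ngrams("search"))  # ['sear', 'earc', 'arch']
```
      </preformat>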
      <sec id="sec-3-1">
        <title>Evaluation Measures</title>
        <p>Traditionally in ad hoc retrieval experiments, the primary evaluation measure is “average
precision”. For a topic, it is the average of the precision after each relevant document is retrieved (using
zero as the precision for relevant documents which are not retrieved). By convention, it is based on
the first 1000 retrieved documents for the topic. The score ranges from 0.0 (no relevants found) to
1.0 (all relevants found at the top of the list). Average precision takes into account both precision
and recall, and it is very good for detecting retrieval differences because even small differences in
the ranks of relevant documents affect the score. “Mean Average Precision” (MAP) is the mean of
the average precision scores over all of the topics (i.e. all topics are weighted equally).</p>
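        <p>As a concrete reading of this definition, the following Python sketch computes average precision for one topic and MAP over several topics. The function names and the toy document identifiers are illustrative assumptions; it is not the evaluation software used for the reported scores, and it assumes the ranked list contains no duplicate documents.</p>
        <preformat>
```python
def average_precision(ranked_doc_ids, relevant_ids, cutoff=1000):
    """Average precision of one ranked list, using the first `cutoff` rows."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids[:cutoff], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision after this relevant document
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(per_topic):
    """per_topic: list of (ranked_doc_ids, relevant_ids) pairs, one per topic."""
    scores = [average_precision(ranked, rel) for ranked, rel in per_topic]
    return sum(scores) / len(scores)


# Relevant documents at ranks 2 and 4, one relevant document never retrieved:
# (1/2 + 2/4 + 0) / 3
print(average_precision(["d3", "d1", "d7", "d2"], ["d1", "d2", "d9"]))
```
        </preformat>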
        <p>If one wishes to focus on just the first relevant document, the traditional measure is “Reciprocal
Rank” (RR). For a topic, it is 1/r where r is the rank of the first row for which a relevant document is
found, or zero if no relevant document was found. “Mean Reciprocal Rank” (MRR) is the mean of
the reciprocal ranks over all the topics.</p>
        <p>
          An experimental measure introduced in this paper (along with the companion web retrieval
paper [
          <xref ref-type="bibr" rid="ref4">12</xref>
          ]) is “First Relevant Score” (denoted “FRS”). Like reciprocal rank, it is based on just the
rank of the first relevant retrieved for a topic, but it is better suited to per-topic analysis. FRS is
1.08<sup>1 − r</sup> where r is the rank of the first row for which a relevant document is found, or zero if no relevant
page was not found. Like reciprocal rank, finding the first relevant at rank 1 produces a score of
1.0. At rank 2, FRS is just 7 points lower (0.93), whereas RR is 50 points lower (0.50). At rank
3, FRS is another 7 points lower (0.86), whereas RR is 17 points lower (0.33). At rank 10, FRS
is 0.50, whereas RR is 0.10. FRS is greater than RR for ranks 2 to 52 and lower for ranks 53
and beyond. A possible interpretation of FRS is that it may be an indicator of the percentage of
potential result list reading the system saved the user to get to the first relevant, assuming that
users are less and less likely to continue reading as they get deeper into the result list.
        </p>
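        <p>The comparison between the two first-relevant measures can be checked numerically. The sketch below takes FRS to be 1.08 raised to the power (1 − r), which reproduces the values quoted above (0.93 at rank 2, 0.86 at rank 3, 0.50 at rank 10); the function names, and the use of None for a topic with no relevant document retrieved, are assumptions made for illustration.</p>
        <preformat>
```python
def reciprocal_rank(r):
    """RR for first-relevant rank r (None if nothing relevant was retrieved)."""
    return 1.0 / r if r else 0.0


def first_relevant_score(r):
    """FRS for first-relevant rank r (None if nothing relevant was retrieved)."""
    return 1.08 ** (1 - r) if r else 0.0


for r in (1, 2, 3, 10, 52, 53):
    print(r, round(first_relevant_score(r), 4), round(reciprocal_rank(r), 4))
```
        </preformat>
        <p>Printing the two columns side by side shows FRS falling gradually near the top of the list, and the crossover with RR between ranks 52 and 53.</p>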
        <p>“Success@n” is the percentage of topics for which at least one relevant document was returned
in the first n rows. Like the other first relevant measures, this measure hides a lot of retrieval
differences (particularly in recall), but it is more intuitive and may be an indicator of a user’s
impression of a method’s robustness across topics. This paper lists Success@1, Success@5 and
Success@10.</p>
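        <p>Success@n follows directly from the per-topic rank of the first relevant document. A minimal Python sketch, assuming one rank per topic with None standing for a topic where nothing relevant was retrieved (the function name and the sample ranks are hypothetical):</p>
        <preformat>
```python
def success_at_n(first_relevant_ranks, n):
    """Percentage of topics whose first relevant document is in the top n rows."""
    hits = sum(1 for r in first_relevant_ranks if r is not None and n >= r)
    return 100.0 * hits / len(first_relevant_ranks)


# Hypothetical first-relevant ranks for five topics.
ranks = [1, 3, None, 7, 12]
for n in (1, 5, 10):
    print(f"Success@{n}: {success_at_n(ranks, n):.0f}%")
```
        </preformat>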
      </sec>
      <sec id="sec-3-2">
        <title>Statistical Significance Tables</title>
        <p>For tables comparing 2 diagnostic runs (such as Table 3), the columns are as follows:</p>
        <list list-type="bullet">
          <list-item><p>“Expt” specifies the experiment. The language code is given, followed by the labels of the 2 runs being compared. The difference is the first run minus the second run. For example, “FR lex-none” specifies the difference of subtracting the scores of the French ‘none’ run from the French ‘lex’ run (of Table 2).</p></list-item>
          <list-item><p>“ΔMAP” is the difference of the mean average precision scores of the two runs being compared (and “ΔFRS” is the difference of the (mean) FRS scores).</p></list-item>
          <list-item><p>“95% Conf” is an approximate 95% confidence interval for the difference (calculated from plus/minus twice the standard error of the mean difference). If zero is not in the interval, the result is “statistically significant” (at the 5% level), i.e. the feature is unlikely to be of neutral impact (on average), though if the average difference is small (e.g. &lt;0.020) it may still be too minor to be considered “significant” in the magnitude sense.</p></list-item>
          <list-item><p>“vs.” is the number of topics on which the first run scored higher, lower and tied (respectively) compared to the second run. These numbers should always add to the number of topics (49 for Bulgarian, 50 for the others).</p></list-item>
          <list-item><p>“3 Extreme Diffs (Topic)” lists 3 of the individual topic differences, each followed by the topic number in brackets (the topic numbers range from 251 to 300). The first difference is the largest one of any topic (based on the absolute value). The third difference is the largest difference in the other direction (so the first and third differences give the range of differences observed in this experiment). The middle difference is the largest of the remaining differences (based on the absolute value).</p></list-item>
        </list>
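        <p>The columns described above can be derived from two sets of per-topic scores. The Python sketch below is one possible reading, not the code used to produce the tables: it assumes the standard error is the sample standard deviation of the per-topic differences divided by the square root of the number of topics, and the function name, dictionary layout and sample scores are all hypothetical.</p>
        <preformat>
```python
from math import sqrt
from statistics import mean, stdev


def compare_runs(run_a, run_b):
    """Summarise per-topic score differences (run_a minus run_b).

    run_a and run_b map topic number to score and cover the same topics.
    """
    topics = sorted(run_a)
    diffs = {t: run_a[t] - run_b[t] for t in topics}
    values = list(diffs.values())
    m = mean(values)
    # Assumption: standard error from the sample standard deviation.
    se = stdev(values) / sqrt(len(values))
    low, high = m - 2 * se, m + 2 * se
    wins = sum(1 for d in values if d > 0)
    losses = sum(1 for d in values if 0 > d)
    # Extreme differences: largest overall, largest in the other direction,
    # and the largest of whatever remains.
    by_size = sorted(topics, key=lambda t: abs(diffs[t]), reverse=True)
    first = by_size[0]
    opposite = [t for t in by_size if 0 > diffs[t] * diffs[first]]
    third = opposite[0] if opposite else None
    middle = next((t for t in by_size if t not in (first, third)), None)
    return {
        "mean_diff": m,
        "conf_95": (low, high),
        "significant": low > 0 or 0 > high,
        "vs": (wins, losses, len(values) - wins - losses),
        "extreme_topics": (first, middle, third),
    }


# Hypothetical scores for four topics.
run_a = {251: 0.5, 252: 0.4, 253: 0.9, 254: 0.2}
run_b = {251: 0.3, 252: 0.4, 253: 0.1, 254: 0.3}
print(compare_runs(run_a, run_b))
```
        </preformat>
        <p>With only four hypothetical topics the interval is wide and includes zero, so the difference would not be reported as statistically significant.</p>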
        <sec id="sec-3-2-1">
          <title>Results of Morphological Experiments</title>
          <p>In the per-topic analysis, the official topic translations were used as much as possible. Online
translation services were consulted at times ([5] was sometimes helpful for Hungarian, and we
found the Russian-to-English translations at [1] often worked for Bulgarian). Prof. Savoy also
assisted with some Bulgarian words. But any translation errors are the responsibility of the
author.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Impact of Stemming</title>
        <list list-type="bullet">
          <list-item><p>HU-279 (Svájci népszavazások): Without Hungarian stemming, no document contained both of the query terms. No relevant document contained the query word ‘népszavazások’. Only some of the relevant documents even contained ‘Svájci’ (and lots of non-relevants also did). With stemming, average precision was 87 points higher from extra matches such as ‘svájciak’, ‘Svájc’, ‘Svájcban’, ‘Svájcot’, ‘Svájcról’, ‘népszavazáson’, ‘népszavazás’, ‘népszavazást’ and ‘népszavazással’.</p></list-item>
          <list-item><p>BG-279 (Референдуми в Швейцария): With Bulgarian stemming, average precision was 58 points higher from extra matches for ‘referendums’ such as референдум and референдума.</p></list-item>
          <list-item><p>FR-279 (Référendums en Suisse): This French topic scored lower with stemming (the rank of the first relevant fell from 1 to 13, and average precision fell from 0.10 to 0.01). It appears that the relevant documents were more likely to use the plural ‘Référendums’ than the singular ‘Référendum’, and the latter was a more common word which generated lots of matches when stemming.</p></list-item>
        </list>
      </sec>
      <sec id="sec-3-4">
        <title>Impact of Experimental /nostop Option</title>
        <list list-type="bullet">
          <list-item><p>HU-265 (A Deutsche Bank szerzeményei (Deutsche Bank Takeovers)): The query word ‘Bank’ stemmed to ‘ban’ (in) which was a stopword, so by default, the word ‘Bank’ was not matched in the documents. With the /nostop option, ‘Bank’ was matched and average precision was 13 points higher. (Incidentally, this issue is presumably why Table 3 shows that stemming scored 12 points lower on HU-265; without stemming, ‘Bank’ was found in the documents.) Perhaps this issue would not have arisen with a lexical stemmer which would preserve the meaning more closely.</p></list-item>
          <list-item><p>HU-292 (Német városok újjáépítése (Rebuilding German Cities)): The query word ‘Német’ (German) stemmed to ‘nem’ (not) which was a stopword and so this useful word was dropped from the query by default. With the /nostop option, average precision was 40 points higher.</p></list-item>
          <list-item><p>HU-282 (Elítéltekkel szembeni durva bánásmód (Prison Abuse)): In this topic, the default scored higher. Using /nostop changed the rank of the first relevant from 3 to 7. The stopword list contained ‘szemben’ (in front of), and the query word ‘szembeni’ presumably is a related noise word, and discarding it was useful. The /nostop option kept ‘szembeni’, which only occurred in 319 documents, so it had a high enough weighting from inverse document frequency to hurt precision.</p></list-item>
          <list-item><p>BG-273 (Разширяването на НАТО (NATO Expansion)): НАТО (NATO) stemmed to НА (on) which was a stopword, so the default behaviour removed a key word from the query. With /nostop, the first relevant score was 80 points higher.</p></list-item>
          <list-item><p>BG-267 (Най-добрите чуждоезикови филми (Best Foreign Language Films)): The query word филми (films) stemmed to филм (film) which surprisingly was a stopword, so the default behaviour discarded a key query term. Our supplier [<xref ref-type="bibr" rid="ref1">9</xref>] has confirmed that this was an error in the Bulgarian stopword list.</p></list-item>
          <list-item><p>BG-257 (Етническото прочистване на Балканите (Ethnic Cleansing in the Balkans)): The query word Балканите (Balkans) stemmed to балкан (Balkan mountain) which surprisingly was a stopword. Even though it turned out that precision was a little higher without the Balkans term in this case, in general this appears to be another error in the stopword list.</p></list-item>
        </list>
        <p>In the topics we examined, in 3 cases the default behaviour of dropping useful terms may have
resulted from the stemmers for Bulgarian and Hungarian being algorithmic instead of lexical (a lexical
stemmer typically does not change the meaning of a word, except when words are ambiguous). It
appears that for algorithmic stemmers it may be better to use the /nostop option by default.</p>
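        <p>The stopword collisions discussed in this section can be mimicked with a toy model in Python. It is a sketch of the behaviour as described here (a query term is discarded by default when all of its stems are stopwords, and kept when /nostop is specified), not SearchServer's implementation; the function name, the stem table and the rule that unknown words stem to themselves are assumptions.</p>
        <preformat>
```python
def effective_query_terms(words, stems, stopwords, nostop=False):
    """Return the query words that still contribute after stopword handling."""
    kept = []
    for word in words:
        key = word.lower()
        if key in stopwords:
            continue  # stopwords were never indexed, so they cannot match
        # Assumption: a word missing from the toy stem table stems to itself.
        word_stems = stems.get(key, [key])
        if not nostop and all(s in stopwords for s in word_stems):
            continue  # default behaviour: every stem is a stopword, drop the term
        kept.append(word)
    return kept


# Toy data from topic BG-273: the stemmer maps НАТО to НА, and НА is a stopword.
stems = {"нато": ["на"]}
stopwords = {"на"}
query = ["Разширяването", "на", "НАТО"]

print(effective_query_terms(query, stems, stopwords))
print(effective_query_terms(query, stems, stopwords, nostop=True))
```
        </preformat>
        <p>In the default call the key term НАТО disappears from the query, while the second call keeps it, mirroring the difference observed for BG-273.</p>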
        <p>In another 2 cases, it appears the stoplist was in error, which illustrates the usefulness of the
CLEF judged test collections: they enable an analyst who does not understand a language to find
issues in a resource for the language and make inferences about its quality.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Impact of Indexing All Words</title>
        <list list-type="bullet">
          <list-item><p>HU-292 (Német városok újjáépítése (Rebuilding German Cities)): We saw earlier that this topic benefitted from the /nostop option (average precision up 40 points), but when indexing all words, average precision fell back (33 points). The reason was that the common word ‘nem’ (not) was now indexed, so ‘Német’ (German), which stems to ‘nem’ with the algorithmic stemmer, had a much lower inverse document frequency than before, and this useful word received less weight. (Even if it had received more weight, there would have been potential confusion with all the indexed occurrences of ‘nem’.)</p></list-item>
          <list-item><p>BG-271 (Бракове между хомосексуални (Gay Marriages)): The stopword между (between) was not in the 2 relevant documents. When it was indexed, its inclusion caused some nonrelevants to be preferred, and average precision dropped 55 points.</p></list-item>
          <list-item><p>BG-295 (Пране на пари (Money Laundering)): This topic scored higher when indexing all words. Surprisingly, the word пари (money) was a stopword, presumably another error (the Bulgarian stoplist apparently needs a review). It seems fine that на (on) was a stopword.</p></list-item>
        </list>
        <p>In practice, indexing all words may not be so troublesome because it is typically easy for users
to omit noise words from the query, and stemming issues can be worked around by disabling the
finding of word variants (SearchServer makes it optional at search-time).</p>
      </sec>
      <sec id="sec-3-6">
        <title>Comparison to 4-grams</title>
        <p>
          Compound words appear to be fairly common in Hungarian, but the algorithmic stemmer did not
perform decompounding, a technique we have found to be useful for languages such as Finnish [
          <xref ref-type="bibr" rid="ref7">15</xref>
          ].
However, [4] has found that using 4-grams as index terms works well in ad hoc ranking experiments
for many European languages, including compound-word languages. Table 6 compares our 4-gram
runs to the stemming runs which indexed all words (because we did not use stopwords with our
4-gram index). As anticipated, there was a statistically significant increase in mean average precision
for Hungarian, though there was a decrease for Portuguese which was also statistically significant.
We look at the largest per-topic differences for Hungarian:
          <list list-type="bullet">
            <list-item><p>HU-255 (Internetfüggők (Internet Junkies)): Average precision was 46 points higher with 4-grams for this topic (a compound word). The stemmer found the 3 relevant documents which contained ‘internetfüggő’ or the original query word ‘internetfüggők’. 4-grams matched other variants such as ‘Internetfüggőség’ (Internet dependence), ‘internetfüggőséggel’ and ‘internetfüggőségben’ and found all 6 relevant documents. 4-grams also matched other potentially helpful words such as ‘internet’, ‘internetezők’, ‘internetezés’, ‘komputerfüggőséget’ and ‘függővé’. But 4-grams also produced unwanted matches, such as ‘intervallum’ (interval) and ‘Szinte’ (as good as); these both came from the 4-gram ‘inte’. If the stemmer had just additionally matched ‘Internetfüggőség’, all 6 relevants would have been found, but we’re still investigating if the -ség suffix is one that a Hungarian stemmer should generally remove or not.</p></list-item>
            <list-item><p>HU-292 (Német városok újjáépítése (Rebuilding German Cities)): On this topic, 4-grams still just found 1 of the 2 relevant documents, but it moved it from rank 3 to 1 (compared to the stemming run). While 4-grams additionally matched ‘újjáépítik’, the bigger advantage was probably that the 4-gram method did not match ‘nem’ which we know from earlier was a troublesome match for the stemming run.</p></list-item>
            <list-item><p>HU-283 (James Bond-filmek (James Bond Films)): On this topic, the 4-gram run scored 30 points lower in average precision than the stemming run. The 4-gram run favored documents with the ‘filmek’ pattern (which corresponded to three 4-grams (‘film’, ‘ilme’ and ‘lmek’) and so it received roughly 3 times the weight compared to the stemming run). However, the relevant documents tended not to use ‘filmek’; instead they tended to use other variants matched by the stemmer such as ‘film’, ‘filmet’, ‘filmnél’, ‘filmben’ and ‘filmhez’.</p></list-item>
            <list-item><p>HU-286 (Futballsérülések (Football Injuries)): This topic had no matches in the stemming run, but a relevant document was ranked first in the 4-gram run. 4-gram matches in the relevant documents included ‘futballista’, ‘futballkapus’ (goalkeeper), ‘futballválogatott’, ‘vállsérülést’, ‘vállsérüléssel’, ‘vállsérülés’, ‘sérülés’ (injury), ‘sérült’ and ‘sérültet’. This might be a case for which decompounding would be helpful.</p></list-item>
            <list-item><p>HU-261 (Jövendőmondás (Fortune-telling)): The stemming run only matched the one document which contained ‘jövendőmondást’ and ‘jövendőmondás’ and it was judged nonrelevant, so it scored 0 on this topic. The 4-gram run returned 1 of the 3 relevant documents at rank 2 (the others weren’t ranked in the top 100). Matches in the relevant document included ‘jövendölők’ and ‘jövendőmondók’. The latter of these perhaps could have been matched with additional stemming rules, but the former would require a stemmer to do decompounding (or, if the user had decompounded the query, the latter would require index-time decompounding to match).</p></list-item>
          </list>
        </p>
        <p>SearchServer can find character sequences inside European words without n-gramming if the
user specifies wildcards, so for precise searches it’s unclear if n-gram indexes would add value.
N-gram approaches typically produce larger indexes and their queries can be slower for common
word-searching cases. We’re not aware of them being used in practice for European language
retrieval, except perhaps by web search engines for URL indexing.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Comparison to Alternate Stemmers</title>
        <p>
          Table 8 isolates the impact of using the SearchServer /single option. This option only makes
a difference for the SearchServer lexical stemmers which can produce more than one stem for a
term. Like last year [
          <xref ref-type="bibr" rid="ref7">15</xref>
          ], our method for including all stems without overweighting some of the
terms apparently was effective. Even in the high-variance first relevant score measure, the bigger
differences favored including all stems.
        </p>
        <p>
Table 11 isolates the impact of the blind feedback technique (based on using the first 2 returned
rows to expand the query). While mean average precision increased for all 4 languages (and the
increase was statistically significant for 3 of them), the first relevant score decreased for all 4
languages (and the decrease was statistically significant for the other 1 of them).
        </p>
        <p>The blind feedback technique presumably works best if relevant documents appear in the first
2 rows, in which case first relevant score cannot be improved. If the first 2 rows do not contain
relevant documents, then using those rows to expand the query may hurt the query and push
down the first relevant even further.</p>
        <p>This result may explain in part why blind feedback techniques are not known to be used
in practice even though they have been popular with experimenters for several years in ad hoc
evaluations (which typically focus on mean average precision).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>References</title>
      <p>[1] AltaVista’s Babel Fish Translation Service. http://babelfish.altavista.com/tr</p>
      <p>[2] Cross-Language Evaluation Forum web site. http://www.clef-campaign.org/</p>
      <p>[3] Andrew Hodgson. Converting the Fulcrum Search Engine to Unicode. Sixteenth International Unicode Conference, 2000.</p>
      <p>[4] Paul McNamee and James Mayfield. JHU/APL Experiments in Tokenization and Non-Word Translation. Working Notes for the CLEF 2003 Workshop, 2003.</p>
      <p>[5] MTA SZTAKI: English-Hungarian, Hungarian-English Online Dictionary. http://dict.sztaki.hu/english-hungarian</p>
      <p>[6] NTCIR (NII-NACSIS Test Collection for IR Systems) Home Page. http://research.nii.ac.jp/~ntcadm/index-en.html</p>
      <p>[7] M. F. Porter. Snowball: A language for stemming algorithms. October 2001. http://snowball.tartarus.org/texts/introduction.html</p>
      <p>[8] S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu and M. Gatford. Okapi at TREC-3. Proceedings of TREC-3, 1995.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jacques</given-names>
            <surname>Savoy</surname>
          </string-name>
          . CLEF and Multilingual information retrieval resource. http://www.unine.ch/info/clef/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Börkur</given-names>
            <surname>Sigurbjörnsson</surname>
          </string-name>
          , Jaap Kamps and Maarten de Rijke.
          <article-title>Overview of WebCLEF 2005</article-title>
          . To appear in
          <source>Working Notes for the CLEF 2005 Workshop</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [11]
          <article-title>Text REtrieval Conference (TREC) Home Page</article-title>
          . http://trec.nist.gov/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Tomlinson</surname>
          </string-name>
          .
          <article-title>European Web Retrieval Experiments with Hummingbird SearchServer™ at CLEF 2005</article-title>
          . To appear in
          <source>Working Notes for the CLEF 2005 Workshop</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Tomlinson</surname>
          </string-name>
          .
          <article-title>Experiments in 8 European Languages with Hummingbird SearchServer™ at CLEF 2002</article-title>
          .
          <source>Proceedings of CLEF 2002</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Tomlinson</surname>
          </string-name>
          .
          <article-title>Lexical and Algorithmic Stemming Compared for 9 European Languages with Hummingbird SearchServer™ at CLEF 2003</article-title>
          .
          <source>Working Notes for the CLEF 2003 Workshop</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Tomlinson</surname>
          </string-name>
          .
          <article-title>Finnish, Portuguese and Russian Retrieval with Hummingbird SearchServer™ at CLEF 2004</article-title>
          .
          <source>Working Notes for the CLEF 2004 Workshop</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>