=Paper=
{{Paper
|id=Vol-1166/CLEF2000wn-adhoc-MoulinierEt2000
|storemode=property
|title=West Group at CLEF2000: Non-English Monolingual Retrieval
|pdfUrl=https://ceur-ws.org/Vol-1166/CLEF2000wn-adhoc-MoulinierEt2000.pdf
|volume=Vol-1166
|dblpUrl=https://dblp.org/rec/conf/clef/MoulinierML00a
}}
==West Group at CLEF2000: Non-English Monolingual Retrieval==
Isabelle Moulinier, J. Andrew McCulloh, Elizabeth Lund

West Group, 610 Opperman Drive, Eagan, MN 55123, USA

Isabelle.Moulinier@westgroup.com

West Group participated in the non-English monolingual retrieval task for French and German. Our primary interest was to investigate whether retrieval of German or French documents is any different from the retrieval of English documents. We focused on two aspects, stemming for both languages and compound breaking for German, and studied several query formulations to take advantage of compounds. Our results suggest that German retrieval is indeed different from English or French retrieval, inasmuch as breaking compounds can significantly improve performance.

==Introduction==
West Group's first attempt at non-English monolingual retrieval was its participation in the Amaryllis-2 campaign. Our finding during that campaign was that there was little difference between French and English retrieval, once the inflectional nature of French was handled through stemming or morphological analysis.

For CLEF-2000, our goal for French document retrieval was to investigate the impact of the stemming method. We compare performing no stemming, stemming using an inflectional morphological analyzer, and stemming using a rule-based algorithm similar to Porter's English stemmer. Our main focus, however, was German document retrieval. German introduced a new dimension to our previous work: compound terms. We set up our experiments to assess whether we could ignore compound terms, i.e. handle German retrieval the way we handled French or English retrieval, or whether we could leverage the existence of compounds.

For both our French and German experiments, we relied on a slightly altered version of the WIN engine, West Group's implementation of the inference network retrieval model [Tur90]. We used third-party stemmers to handle the non-English languages.
In the following, we briefly describe the WIN engine and its adaptation to non-English languages. Next, we describe our variants for German document retrieval. The section after that describes experiments with stemming for French monolingual retrieval.

==General System Description==
The WIN system is a full-text natural language search engine, and corresponds to West Group's implementation of the inference network retrieval model. While based on the same retrieval model as the INQUERY system [CCB92], WIN has evolved separately, focusing on the retrieval of legal material in large collections in a commercial environment that supports both Boolean and natural language searches [Tur94].

The WIN engine supports three types of document scoring:
* the document as a whole is scored;
* each paragraph is scored, and the score of the document becomes the best paragraph score;
* the document score and the best paragraph score are combined.

We used the following scoring techniques:
* German: document score based on the whole document;
* French: document score based on a combination of the whole document and the best paragraph.

We indexed the non-English collections using a slightly modified WIN for each language:
* German:
** We used a third-party stemmer based on a morphological analyzer. One of its features was compound decomposition; forcing decomposition or not was a parameter in our experiments.
** We indexed both German collections as one single retrieval collection. We did not investigate merging retrieved sets.
* French:
** We added a tokenization rule to handle elision.
** We used two kinds of stemmers: a third-party stemmer based on a morphological analyzer, and a rule-based stemmer (à la Porter) from the Muscat project.

A WIN query consists of concepts extracted from natural language text. Normal WIN query processing eliminates stopwords and noise phrases (or introductory phrases), and recognizes phrases or other important concepts for special handling.
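As an illustration, the three scoring modes can be sketched as follows. The function name and the mixing weight `alpha` are hypothetical; the paper does not state how WIN actually combines the two scores.

```python
def document_score(doc_score: float, paragraph_scores: list[float],
                   mode: str = "combined", alpha: float = 0.5) -> float:
    """Sketch of WIN's three document-scoring modes (names hypothetical).

    alpha is an assumed mixing weight; the actual combination used by
    the engine is not given in the paper.
    """
    best_paragraph = max(paragraph_scores) if paragraph_scores else 0.0
    if mode == "document":    # whole document only (used for the German runs)
        return doc_score
    if mode == "paragraph":   # best paragraph only
        return best_paragraph
    # combination of whole-document and best-paragraph scores (French runs)
    return alpha * doc_score + (1 - alpha) * best_paragraph
```
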
Many of the concepts ordinarily recognized by WIN are specific to both English documents and the legal domain. To perform these tasks, WIN relies on various resources: a stopword list, a list of introductory phrases ("Find cases about…", "A relevant document describes…"), and a dictionary of (legal) phrases.

Query processing for French was similar to English query processing. We used a stopword list of 1745 terms (highly frequent terms, and noise terms like adverbs). Using the TREC-6, 7 and 8 topics, we refined the list of introductory patterns we created for Amaryllis-2. In the end, there were 160 patterns (a pattern is a regular expression that handles case variants and some spelling errors). We did not use phrase identification, for lack of a general French phrase dictionary.

We investigated several options for structuring German queries, decomposing or not decomposing compounds. This specific processing is described below. We used a stopword list of 333 terms. Using the TREC-6, 7 and 8 topics, we derived a set of introductory patterns for German. There were 11 regular expressions, summarizing over 200 noise phrases. We did not perform phrase identification through a dictionary; however, German compounds were treated as "natural phrases" in some of our runs.

Finally, we extracted concepts from the full topics, but gave more weight to concepts appearing in the Title or Description fields than to concepts extracted from the Narrative field. Following West's participation at TREC-3 [TTYF95], we assigned a weight of 4 to concepts extracted from the Title field, while concepts originating from the Description and Narrative fields were given weights of 2 and 1, respectively.

==German monolingual retrieval experiments and results==
Our experiments with monolingual German retrieval focused on query processing and compound decomposition. Our submitted runs rely on decomposing compounds, but we also experimented with no decomposition, and with no stemming at all.
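The topic processing just described (introductory-pattern removal, stopword filtering, and per-field concept weights) can be sketched as follows. The pattern and stopword lists here are tiny illustrative stand-ins, not the 1745-term French or 333-term German resources actually used.

```python
import re

# Illustrative stand-ins for the real resources (160 French / 11 German
# introductory patterns, large stopword lists).
INTRO_PATTERNS = [
    re.compile(r"^find (documents|cases) (about|on)\s+", re.I),
    re.compile(r"^a relevant document (describes|reports)\s+", re.I),
]
STOPWORDS = {"the", "a", "of", "and", "in"}
FIELD_WEIGHTS = {"title": 4, "description": 2, "narrative": 1}  # from [TTYF95]

def extract_concepts(topic: dict) -> dict:
    """Return {term: weight}, keeping the highest field weight per term."""
    concepts = {}
    for field, weight in FIELD_WEIGHTS.items():
        text = topic.get(field, "")
        for pattern in INTRO_PATTERNS:   # strip introductory noise phrases
            text = pattern.sub("", text)
        for term in re.findall(r"\w+", text.lower()):
            if term not in STOPWORDS:    # drop stopwords
                concepts[term] = max(concepts.get(term, 0), weight)
    return concepts
```
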
Indexing followed the choices made for query processing. For instance, when no decomposition was performed for query terms, parts of compounds were not indexed.

When breaking compound terms, we faced the choice of considering a compound term as a single concept in our WIN query, or treating the compound as several concepts (as many concepts as there were parts in the compound). The submitted run WESTgg1 considers that a compound corresponds to several concepts; the run WESTgg2 handles a compound as a single concept. When faced with the compound Energiequellen, the structured query in WESTgg1 introduces two concepts, Energie and Quelle; the structured query in WESTgg2 introduces one concept, #PHRASE(Energie Quelle). The #PHRASE operator is a soft phrase, i.e. the component terms must appear within 3 words of one another. The score of the #PHRASE concept in our experiment was set to be the maximum of the score of the soft phrase itself and the scores of its components.

Table 1 summarizes the results of our two official runs, as well as the results of the run Nostem, where no stemming was used, and the run Nobreak, where stemming was used but no decomposition.

{| class="wikitable"
! rowspan="2" | Run !! rowspan="2" | Avg. Prec. !! rowspan="2" | R-Prec. !! colspan="5" | Performance of individual queries
|-
! Best !! Above !! Median !! Below !! Worst
|-
| WESTgg1 || 0.3840 || 0.3706 || 3 || 21 || 3 || 9 || 1
|-
| WESTgg2 || 0.3779 || 0.3628 || 3 || 18 || 6 || 9 || 1
|-
| Nostem || 0.2986 || 0.3080 || 0 || 15 || 1 || 19 || 2
|-
| Nobreak || 0.2989 || 0.3141 || 0 || 18 || 1 || 15 || 3
|}
Table 1: Summary of individual run performance on the 37 German topics with relevant documents.

The results reported in Table 1 support the hypothesis that German document retrieval differs from English document retrieval. Treating compound words as forms of phrases improves the performance of the German retrieval system. Indeed, searching with compound stems did not perform better than searching with no stemming. We expected a greater difference between our two submitted runs.
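The contrast between the two query formulations can be sketched as follows; the per-concept belief scores here are hypothetical inputs standing in for the engine's internal computation.

```python
def compound_concepts(part_scores: list[float], phrase_score: float):
    """Contrast the two compound treatments (scores are hypothetical beliefs).

    part_scores: per-part scores in a document, e.g. for Energie and Quelle.
    phrase_score: score of the soft phrase #PHRASE(Energie Quelle).
    """
    # WESTgg1: each part is its own concept, so a compound with n parts
    # contributes n concept scores to the query.
    westgg1 = list(part_scores)
    # WESTgg2: a single concept, scored as the maximum of the soft-phrase
    # score and the scores of its components.
    westgg2 = [max([phrase_score] + list(part_scores))]
    return westgg1, westgg2
```
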
WESTgg1 allows compound terms to contribute more to the score of a document, while WESTgg2 gives the same contribution to compound and non-compound terms. The contribution of a compound term in WESTgg1 is weighted by the number of parts in the compound, so one would expect its occurrence in a document to significantly alter the document score. After reviewing the individual queries, we noticed the following behavior. First, for those queries where both the compounds and their parts had an average frequency, neither particularly common nor particularly rare, the two runs behaved similarly: the parts helped locate documents, but did not add to or draw away from the document relevance score. Second, for those queries where the compound itself was above average in frequency but its individual parts were average, or even fairly common, the weighted contribution provided in WESTgg1 performed better. Third, for those queries where at least one part of a compound was very common, the high frequency of that part degraded the weighting scheme of WESTgg1, so the single-concept construct of WESTgg2 provided a more representative score.

Compound handling in WESTgg1 and WESTgg2 is also only as influential as the number of compounds in the query allows; in the 40 German topics, roughly 16% of the query terms are compound terms. In addition, it should be noted that we indexed the individual parts of compounds. As a result, a simple query term may also match a part of a compound in a document.

==French monolingual retrieval experiments and results==
The goal of our experiments with French document retrieval was to assess the difference between stemming algorithms. Our motivation was to further investigate the particularities of French compared to English. [Hul96] reported results on various kinds of stemmers for English document retrieval.
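Rule-based suffix stripping of the kind studied in [Hul96] can be sketched as a toy example; the suffix list below is an illustrative stand-in, not the Muscat stemmer actually used in our runs. A stripper like this conflates parti to part and directive to direct, collapsing distinct meanings into common forms.

```python
# Toy rule-based French suffix stripper, illustrative only; the actual
# rule-based stemmer used in the runs came from the Muscat project.
SUFFIXES = ["isme", "able", "aise", "ive", "ais", "es", "e", "i", "s"]

def toy_stem(word: str) -> str:
    """Strip the longest matching suffix, keeping a stem of 3+ letters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```
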
So far, we have studied two of the five types of stemmers in [Hul96], as well as no stemming at all:
* a stemmer based on an inflectional morphological analyzer; e.g. it conflates verb forms to the infinitive of the verb, noun forms to the singular noun, and adjectives to the masculine singular form. This stemmer is based on a lexicon.
* a rule-based stemmer "à la Porter" that approximates mainly inflectional rules, but also provides a limited set of derivational rules based on suffix stripping; e.g. it strips suffixes like -able or -isme.

Our runs also took advantage of the multiple TEXT elements in a document. We considered those elements to mark paragraph boundaries and used this information for document scoring and ranking.

Our submitted run, WESTff, used the inflectional stemmer. Table 2 summarizes the performance of runs using the inflectional stemmer, the Porter-like stemmer, and no stemmer at all. We also ran experiments in which a document was scored either as a whole or as its best paragraph; those runs are not reported here, as they did not perform as well as the combined score.

{| class="wikitable"
! rowspan="2" | Run !! rowspan="2" | Avg. Prec. !! rowspan="2" | R-Prec. !! colspan="5" | Performance of individual queries
|-
! Best !! Above !! Median !! Below !! Worst
|-
| WESTff || 0.4903 || 0.4371 || 11 || 9 || 7 || 7 || 0
|-
| Porter || 0.4680 || 0.4297 || 6 || 14 || 1 || 13 || 0
|-
| Nostem || 0.4526 || 0.4210 || 7 || 8 || 0 || 19 || 0
|}
Table 2: Summary of individual run performance on the 34 French topics with relevant documents. For some queries, our runs achieved an average precision that was better than the best average precision reported at CLEF.

While we usually consider no stemming as a baseline, our tests showed that no stemming performed better on several topics. In those instances, we found that the Porter stemmer was too aggressive and stemmed important query terms to very common forms. For instance, parti was stemmed to part, directive to direct, and français to franc. The inflectional stemmer did exactly what it was supposed to do, e.g. stem française to français. However, certain stems were very common while their raw forms were less common. Phrase identification, e.g. académie française or monnaie européenne, may well improve performance, as it has proven beneficial for the English version of the WIN search engine. In addition, the inflectional stemmer is only as good as its lexicon; we found a couple of queries where the Porter stemmer performed better because important query terms were not in the lexicon.

While our analysis is only partial at this time, it appears that our French stemming results follow the patterns exhibited by [Hul96] for English stemming, except that inflectional stemming seems slightly superior. We do not know yet whether this is a particularity of the French language or of this particular collection and set of topics.

[Figure 1: Recall-precision curves for our submitted runs (WESTff, WESTgg1, WESTgg2) in non-English monolingual document retrieval.]

==Summary==
The WIN retrieval system achieved good performance for both German and French document retrieval without any major modification to its retrieval engine. On the one hand, we showed that German document retrieval requires specific handling because of the use of compound words in the language; our results showed that decomposing compounds during indexing and query processing enhanced the capabilities of our system. Our French experiments, on the other hand, did not uncover any striking difference between French and English retrieval, except a preference for an inflectional stemmer.

==References==
[CCB92] W.B. Croft, J. Callan and J. Broglio. The INQUERY retrieval system. In Proceedings of the 3rd International Conference on Database and Expert Systems Applications, Spain, 1992.

[Hul96] D.A. Hull. Stemming Algorithms: A Case Study for Detailed Evaluation. Journal of the American Society for Information Science, 47(1):70-84, 1996.

[TTYF95] P. Thompson, H. Turtle, B. Yang and J. Flood. TREC-3 Ad Hoc Retrieval and Routing Experiments using the WIN System. In Overview of the 3rd Text REtrieval Conference (TREC-3), NIST Special Publication 500-225, Gaithersburg, MD, April 1995.

[Tur90] H. Turtle. Inference Networks for Document Retrieval. PhD Thesis, Computer Science Department, University of Massachusetts, Amherst, 1990.

[Tur94] H. Turtle. Natural language vs. Boolean query evaluation: a comparison of retrieval performance. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994.