<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Experiments in 8 European Languages with Hummingbird SearchServer™ at CLEF 2002</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stephen Tomlinson</string-name>
          <aff>Hummingbird, Ottawa, Ontario</aff>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2002</year>
      </pub-date>
      <fpage>3</fpage>
      <lpage>4</lpage>
      <abstract>
        <p>Hummingbird submitted ranked result sets for all Monolingual Information Retrieval tasks of the Cross-Language Evaluation Forum (CLEF) 2002. Enabling stemming in SearchServer increased average precision by 16 points in Finnish, 9 points in German, 4 points in Spanish, 3 points in Dutch, 2 points in French and Italian, and 1 point in Swedish and English. Accent-indexing increased average precision by 3 points in Finnish and 2 points in German, but decreased it by 2 points in French and 1 point in Italian and Swedish. Treating apostrophes as word separators increased average precision by 3 points in French and 1 point in Italian. Confidence intervals produced using the bootstrap percentile method were found to be very similar to those produced using the standard method; both were of similar width to rank-based intervals for differences in average precision, but substantially narrower for differences in Precision@10.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <sec id="sec-1-2">
        <title>Setup</title>
        <p>For the experiments described in this paper, an internal development build of SearchServer 5.3
was used (5.3.500.279).</p>
        <p>[Table 1: Sizes of the CLEF 2002 document sets, by language: German, Spanish, Dutch, Swedish, English, Italian, French, Finnish.]</p>
        <p>
          The CLEF 2002 document sets consisted of tagged (SGML-formatted) news articles (mostly from
1994) in 8 different languages: German, French, Italian, Spanish, Dutch, Swedish, Finnish and
English. Table 1 gives their sizes. For more information on the CLEF collections, see the CLEF
web site [<xref ref-type="bibr" rid="ref2">2</xref>].
        </p>
        <sec id="sec-1-2-1">
          <title>Text Reader</title>
          <p>The custom text reader called cTREC, originally written for handling TREC collections [12],
handled expansion of the library files of the CLEF collections and was extended to support the
CLEF guidelines of only indexing specific fields of specific documents. The entities described in
the DTD files were also converted, e.g. “&amp;equals;” was converted to the equal sign “=”.</p>
          <p>The documents were assumed to be in the Latin-1 character set, the code page which, for
example, assigns e-acute (é) hexadecimal 0xe9 or decimal 233. cTREC passes through the Latin-1
characters, i.e. it does not convert them to Unicode. SearchServer’s Translation Text Reader (nti)
was chained on top of cTREC, and the Win_1252_UCS2 translation was specified via its /t option
to translate from Latin-1 to the Unicode character set desired by SearchServer.</p>
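          <p>The effect of this translation step can be illustrated with a minimal sketch. It uses Python's standard codecs as a stand-in for the nti text reader, which is an assumption made purely for illustration; it is not how SearchServer performs the conversion. It relies on the fact that Latin-1 and Windows-1252 assign the same characters to bytes 0xA0 through 0xFF.</p>
          <preformat><![CDATA[
```python
# Illustration only: Python's standard codecs stand in for the nti text reader.
raw = bytes([0xE9])                     # e-acute stored as a single byte

as_latin1 = raw.decode("latin-1")       # ISO 8859-1 interpretation
as_cp1252 = raw.decode("cp1252")        # Windows-1252 interpretation

# Both code pages map byte 0xE9 to the same Unicode code point, U+00E9.
code_point = ord(as_latin1)
print(hex(code_point), code_point)
```
]]></preformat>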
        </sec>
        <sec id="sec-1-2-2">
          <title>Indexing</title>
          <p>A separate SearchServer table was created for each language with a SearchSQL statement
such as the following:</p>
          <p>CREATE SCHEMA CLEF02DE CREATE TABLE CLEF02DE
(DOCNO VARCHAR(256) 128)
TABLE_LANGUAGE 'GERMAN'
STOPFILE 'LANGDE.STP'
PERIODIC
BASEPATH 'e:\data\clef';</p>
          <p>The TABLE_LANGUAGE parameter specifies which language to use when performing
stemming operations at index time. The STOPFILE parameter specifies a stop file, which typically
contains a couple hundred stop words that are not indexed; the stop file also contains instructions
on changes to the default indexing rules, for example, to enable accent-indexing, or to change the
apostrophe to a word separator. Here are the first few lines of the stop file used for the French task:</p>
          <p>PST = "'`"
STOPLIST =
a
à
afin</p>
          <p>The IAC line enables indexing of the specified accents (Unicode combining diacritical marks
0x0300-0x0345). Accent-indexing was enabled for all runs except the Italian and English runs.
Accents were known to be specified in the Italian queries but were not consistently used in the
Italian documents. The PST line adds the specified characters (apostrophes in this case) to the
list of word separators. Apostrophes were changed to word separators for all runs except the English runs.</p>
          <p>Into each table, we needed to insert just one row, specifying the top directory of the library
files for the language, using an INSERT statement such as the following:</p>
          <p>INSERT INTO CLEF02DE ( FT_SFNAME, FT_FLIST ) VALUES
('German','cTREC/E/d=128:s!nti/t=Win_1252_UCS2:cTREC/C/@:s');</p>
          <p>To index each table, we executed a VALIDATE INDEX statement such as the following:</p>
          <p>VALIDATE INDEX CLEF02DE VALIDATE TABLE;</p>
          <p>By default, the index supports both exact matching (after some Unicode-based normalizations,
such as converting to upper-case and decomposed form) and matching on stems.</p>
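          <p>As a rough illustration of what upper-casing, decomposition and accent handling do to a term, the following sketch uses Python's unicodedata module. The function name normalize and its keep_accents flag are hypothetical, and the exact normalizations SearchServer applies are not specified here, so this should be read as an approximation of the idea rather than the product's behaviour.</p>
          <preformat><![CDATA[
```python
import unicodedata

def normalize(term, keep_accents=True):
    """Upper-case and decompose a term; optionally drop combining marks."""
    decomposed = unicodedata.normalize("NFD", term.upper())
    if keep_accents:
        return decomposed
    # Drop combining diacritical marks in the range U+0300..U+0345,
    # which approximates an index built without accent-indexing.
    return "".join(ch for ch in decomposed if not 0x0300 <= ord(ch) <= 0x0345)

with_accents = normalize("\u00e9cole")                 # "école"
without_accents = normalize("\u00e9cole", keep_accents=False)
```
]]></preformat>
          <p>With accents kept, the term becomes an "E" followed by a combining acute accent and "COLE"; with accents dropped, it matches the unaccented spelling "ECOLE".</p>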
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Search Techniques</title>
      <p>The CLEF organizers created 50 “topics” (numbered 91-140) and translated them into many
languages. Each topic contained a “Title” (subject of the topic), “Description” (a one-sentence
specification of the information need) and “Narrative” (more detailed guidelines for what a
relevant document should or should not contain). The participants were asked to use the Title and
Description fields for at least one automatic submission per task this year to facilitate comparison
of results.</p>
      <p>We created an ODBC application, called QueryToRankings.c, based on the example
stsample.c program included with SearchServer, to parse the CLEF topics files, construct and execute
corresponding SearchSQL queries, fetch the top 1000 rows, and write out the rows in the results
format requested by CLEF. SELECT statements were issued with the SQLExecDirect API call.
Fetches were done with SQLFetch (typically 1000 SQLFetch calls per query).</p>
      <sec id="sec-2-1">
        <title>Intuitive Searching</title>
        <p>For all runs, we used SearchServer’s Intuitive Searching, i.e. the IS_ABOUT predicate of
SearchSQL, which accepts unstructured text. For example, for the German version of topic 41 (from last
year), the Title was “Pestizide in Babykost” (Pesticides in Baby Food), and the Description was
“Berichte über Pestizide in Babynahrung sind gesucht” (Find reports on pesticides in baby food).
A corresponding SearchSQL query would be:</p>
        <p>SELECT RELEVANCE('V2:3') AS REL, DOCNO
FROM CLEF02DE
WHERE FT_TEXT IS_ABOUT 'Pestizide in Babykost Berichte über
Pestizide in Babynahrung sind gesucht'
ORDER BY REL DESC;</p>
        <p>This query would create a working table with the 2 columns named in the SELECT clause: a
REL column containing the relevance value of the row for the query, and a DOCNO column
containing the document’s identifier. The ORDER BY clause specifies that the most relevant rows
should be listed first. The statement “SET MAX_SEARCH_ROWS 1000” was previously executed
so that the working table would contain at most 1000 rows.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Stemming</title>
        <p>SearchServer “stems” each distinct word to one or more base forms, called stems. For example,
in English, “baby”, “babied”, “babies”, “baby’s” and “babying” all have “baby” as a stem.
Compound words in German, Dutch and Finnish produce multiple stems; e.g., in German, “babykost”
has “baby” and “kost” as stems. SearchServer 5.3 uses the lexicon-based Inxight LinguistX
Platform 3.3.1 for stemming operations.</p>
        <p>By default, Intuitive Searching stems each word in the query, counts the number of occurrences
of each stem, and creates a vector. Optionally some stems are discarded (secondary term selection)
if they have a high document frequency or to enforce a maximum number of stems, but we didn’t
discard any for our CLEF runs. The index is searched for documents containing terms which stem
to any of the stems of the vector.</p>
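        <p>The construction of the query vector can be sketched as follows. This is a toy illustration only: TOY_STEMS is a hypothetical hand-written lexicon (only the "babykost" entry comes from the example above; the other entries are invented), and query_vector is a hypothetical helper, not the stemmer or any SearchServer interface.</p>
        <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical toy lexicon mapping a lower-cased word to its stems.
# Only the "babykost" entry is taken from the text; the rest are made up.
TOY_STEMS = {
    "pestizide": ["pestizid"],
    "babykost": ["baby", "kost"],
    "babynahrung": ["baby", "nahrung"],
    "berichte": ["bericht"],
}

def query_vector(query):
    """Count how often each stem occurs among the words of the query."""
    counts = Counter()
    for word in query.lower().split():
        # Words missing from the toy lexicon are used as their own stem.
        counts.update(TOY_STEMS.get(word, [word]))
    return counts

vector = query_vector("Pestizide in Babykost Berichte Pestizide in Babynahrung")
```
]]></preformat>
        <p>In this sketch the stem "baby" receives a count of 2 because both compounds contribute it, which is the kind of overlap that compound-breaking makes visible to the ranking.</p>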
        <p>The VECTOR_GENERATOR set option controls which stemming operations are performed by
Intuitive Searching. To enable stemming, we used the same setting for each language except for the
/lang parameter. For example, for German, the setting was ‘word!ftelp/lang=german/base/noalt
| * | word!ftelp/lang=german/inflect’. To disable stemming, the setting was just ‘’.</p>
        <p>Besides linguistic expansion from stemming, we did not do any other kinds of query expansion.
For example, we did not use approximate text searching for spell-correction because the queries
were believed to be spelled correctly. We did not use row expansion or any other kind of blind
feedback technique.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Statistical Relevance Ranking</title>
        <p>
          SearchServer calculates a relevance value for a row of a table with respect to a vector of stems based
on several statistics. The inverse document frequency of the stem is estimated from information
in the dictionary. The term frequency (the number of occurrences of the stem in the row, including
any term that stems to it) is determined from the reference file. The length of the row (based on
the number of indexed characters in all columns of the row, which is typically dominated by the
external document) is optionally incorporated. The already-mentioned count of the stem in the
vector is also used. To synthesize this information into a relevance value, SearchServer dampens
the term frequency and adjusts for document length in a manner similar to Okapi [<xref ref-type="bibr" rid="ref5">6</xref>] and dampens
the inverse document frequency in a manner similar to [8]. SearchServer’s relevance values are
always an integer in the range 0 to 1000.
        </p>
        <p>SearchServer’s RELEVANCE_METHOD setting can be used to optionally square the
importance of the inverse document frequency (by choosing a RELEVANCE_METHOD of ‘V2:4’
instead of ‘V2:3’). The importance of document length to the ranking is controlled by
SearchServer’s RELEVANCE_DLEN_IMP setting (scale of 0 to 1000). For all runs in this paper,
RELEVANCE_METHOD was set to ‘V2:3’ and RELEVANCE_DLEN_IMP was set to 750.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Query Stop Words</title>
        <p>Our QueryToRankings program removed words such as “find”, “relevant” and “document” from
the topics before presenting them to SearchServer, i.e. words which are not stop words in general
but were commonly used in the CLEF topics as general instructions. For the submitted runs, the
lists were developed by examining the CLEF 2000 and 2001 topics (not this year’s topics). For
the diagnostic runs in this paper, “finde” was added as a query stop word because it was noticed
to be common in the German topics this year. An evaluation of the impact of query stop words
is provided below.</p>
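        <p>The filtering step can be sketched in a few lines. The list below is a hypothetical stand-in built from the examples named in this paper; the actual per-language lists used by QueryToRankings are not reproduced here, and the function name is likewise an assumption. As a side effect of the simple tokenization, punctuation is also dropped.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical short list; the actual lists used for the runs are not given here.
QUERY_STOP_WORDS = {"find", "relevant", "document", "documents"}

def strip_query_stop_words(topic_text, stop_words=QUERY_STOP_WORDS):
    """Remove instruction words from a topic, keeping the other words in order."""
    words = re.findall(r"\w+", topic_text)
    return " ".join(w for w in words if w.lower() not in stop_words)

cleaned = strip_query_stop_words("Find documents on pesticides in baby food.")
```
]]></preformat>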
        <sec id="sec-2-4-7">
          <title>Results</title>
          <p>[Table 2: Per-language results for Finnish, German, Spanish, Dutch, French, Italian, Swedish and English; column headers include Run and AvgP.]</p>
          <p>
The evaluation measures are likely explained in an appendix of this volume. Briefly: “Precision”
is the percentage of retrieved documents which are relevant. “Precision@n” is the precision after
n documents have been retrieved. “Average precision” for a topic is the average of the precision
after each relevant document is retrieved (using zero as the precision for relevant documents which
are not retrieved). “Recall” is the percentage of relevant documents which have been retrieved.
“Interpolated precision” at a particular recall level for a topic is the maximum precision achieved
for the topic at that or any higher recall level. For a set of topics, the measure is the average of
the measure for each topic (i.e. all topics are weighted equally).</p>
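          <p>The definitions of Precision@n and average precision can be made concrete with a small sketch. The function names and the example data are illustrative assumptions; the scores reported in this paper were computed with the evaluation program mentioned below, not with this code. Values are returned as fractions rather than percentages, and the ranking is assumed to contain no duplicate documents.</p>
          <preformat><![CDATA[
```python
def precision_at(ranking, relevant, n):
    """Fraction of the first n retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:n] if doc in relevant) / n

def average_precision(ranking, relevant):
    """Mean of the precision after each relevant document is retrieved.

    Relevant documents that are never retrieved contribute zero.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

ranking = ["d3", "d7", "d1", "d9"]      # made-up ranked result list
relevant = {"d3", "d1", "d5"}           # made-up judgments; d5 is never retrieved

ap = average_precision(ranking, relevant)
p_at_2 = precision_at(ranking, relevant, 2)
```
]]></preformat>
          <p>Here the relevant documents are found at ranks 1 and 3, giving precisions of 1/1 and 2/3, while the unretrieved document contributes zero, so the average precision is (1 + 2/3 + 0) / 3 = 5/9.</p>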
          <p>The Monolingual Information Retrieval tasks were to run 50 queries against document
collections in the same language and submit a list of the top-1000 ranked documents to CLEF for
judging (in June 2002). CLEF produced a “qrels” file for each of the 8 tasks: a list of documents
judged to be relevant or not relevant for each topic. From these, the evaluation measures were
calculated with Chris Buckley’s trec_eval program.</p>
          <p>For some topics and languages, no documents were judged relevant. The precision scores are
averaged only over the topics for which at least one document was judged relevant.</p>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>Impact of Stemming</title>
        <p>Most of the remaining tables will focus on one particular precision measure (usually average
precision), comparing the scores when a particular feature (such as stemming) is enabled to when
it is disabled. The columns of these tables are as follows:</p>
        <list list-type="bullet">
          <list-item><p>“Experiment” is the language and topic fields used (for example, “-td” indicates the Title
and Description fields were used).</p></list-item>
          <list-item><p>“AvgDiff” is the average difference in the precision score. In [9], a difference of at least 2
full points (i.e. &gt;=0.020) is considered “noticeable”, 4 points “material”, 6 points “striking”
and 8 points “dramatic”.</p></list-item>
          <list-item><p>“95% Confidence” is an approximate 95% confidence interval for the average difference
calculated using the bootstrap percentile method (described in the last section). If zero is not
in the interval, the result is “statistically significant” (at the 5% level), i.e. the feature is
unlikely to be of neutral impact, though if the average difference is small (e.g. &lt;0.020) it
may still be too minor to be considered “significant” in the magnitude sense.</p></list-item>
          <list-item><p>“vs.” is the number of topics on which the precision was higher, lower and tied (respectively)
with the feature enabled. These numbers should always add to the number of topics for the
language (as per Table 2).</p></list-item>
          <list-item><p>“2 Largest Diffs (Topic)” lists the two largest differences in the precision score (based on
the absolute value), with each followed by the corresponding topic number in brackets (the
topic numbers range from 91 to 140).</p></list-item>
        </list>
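        <p>The AvgDiff, “vs.” and “2 Largest Diffs (Topic)” columns are simple functions of the per-topic differences, as the following sketch shows. The function name summarize and the example differences are made up for illustration; they are not values from the tables in this paper.</p>
        <preformat><![CDATA[
```python
def summarize(diffs_by_topic):
    """Return AvgDiff, the (higher, lower, tied) counts and the two largest differences."""
    diffs = list(diffs_by_topic.values())
    avg_diff = sum(diffs) / len(diffs)
    higher = sum(1 for d in diffs if d > 0)
    lower = sum(1 for d in diffs if d < 0)
    tied = len(diffs) - higher - lower
    # Rank topics by the absolute value of the difference, largest first.
    largest = sorted(diffs_by_topic.items(), key=lambda item: abs(item[1]), reverse=True)[:2]
    return avg_diff, (higher, lower, tied), [(d, topic) for topic, d in largest]

# Made-up per-topic differences (score with feature enabled minus score with it disabled).
example = {91: 0.25, 92: -0.5, 93: 0.0, 94: 0.125}
avg_diff, versus, largest_two = summarize(example)
```
]]></preformat>
        <p>For these made-up values the counts are 2 higher, 1 lower and 1 tied, and the largest difference by absolute value is the negative one on topic 92.</p>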
        <p>Table 3 shows the impact of stemming on the average precision measure. The benefit for Finnish
and German, for which stemming includes compound-breaking, is dramatic. For example, Finnish
topic 115, regarding “avioerotilastoja” (divorce statistics), apparently benefits from
compound-breaking. Surprisingly, the other investigated language for which compounds are broken, Dutch,
does not similarly stand out, unlike last year [11], though its confidence interval still overlaps the
one for German.</p>
        <p>Table 4 shows the impact of stemming on the shorter (Title-only) queries. It appears the
benefits are a little bigger for the shorter queries in most languages, with English the only language
without a noticeable benefit on average. Of course, stemming can hurt precision for some queries,
as in English topic 139 (EU fishing quotas), so an application probably should make stemming a
user-controllable option.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Impact of Query Stop Words</title>
        <p>Table 5 shows the impact of discarding query stop words, such as “find”, “relevant” and
“documents”. Query stop words differ from general stop words (such as “the”, “of”, “by”) in that
they do not seem to be noise words in general, but their common use in past CLEF topic sets
(particularly the Description and Narrative fields) suggests they are likely not useful terms when
encountered in CLEF queries. In the table, a positive difference indicates a benefit from removing
query stop words from the topics.</p>
        <p>Table 5 shows that the impact of discarding query stop words was always minor (the biggest
average benefit was just 1.6 points), though some of the differences are “statistically significant”
because of the consistency of the minor benefits. This is a case where a “statistically significant”
benefit is still not a “significant” benefit.</p>
        <p>Sometimes noise words may occur in relevant documents by chance and scores may fall if
the noise words are discarded. Apparently that happened in French topics 123 and 132
(regarding “mariage” and “Kaliningrad” respectively) in which excluding “trouver” and “documents”
decreased the scores, even though they don’t seem to be meaningful terms for their queries.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Impact of Stop Words</title>
        <p>Tables 6 and 7 show the impact of using stop words on the average precision measure. To do
this experiment, two tables were created for each language, one indexed with a stopfile containing
typically a couple hundred stop words, the other with no stop words (though other SearchServer
stopfile instructions, such as accent-indexing and apostrophes as word separators, were kept the
same as used for the submitted runs). For this experiment, query stop words were not discarded
for either run, to isolate the impact of the general stop words on precision. In the tables, a positive
difference indicates a benefit to specifying stop words.</p>
        <p>Table 6 shows the impact of using stop words for Title plus Description queries was very slight
on average, and none of the differences were statistically significant. Table 7 shows there was a
noticeable benefit for full topic queries (i.e. when additionally including the Narrative) for some
languages, and a statistically significant benefit for most of them. Other benefits of specifying
stop words are to reduce search time, indexing time and index size. However, there may be cases
when what is usually a stop word is meaningful to a query (e.g. find documents containing “to be
or not to be”), so it may be better to make stop word elimination an option at search time rather
than at index time, depending on the goals of the application.</p>
        <p>Stop word lists for many languages are on the Neuchâtel resource page [7]. Our stop word lists
may differ from those.</p>
      </sec>
      <sec id="sec-2-8">
        <title>Impact of Indexing Accents</title>
        <p>Otherwise, the settings were the same as for the submitted runs; in particular, apostrophes were used as word
separators except in English.</p>
        <p>Tables 8 and 9 show that topic 98, regarding the Kaurismäki brothers, was strongly affected in
many languages by whether or not accents were preserved. Spanish, French and Italian topics 98
included the accent in Kaurismäki, but the documents more often did not include the accent, so
accent-indexing hurt precision in those cases. But accent-indexing was helpful for Finnish for this
topic, apparently because in Finnish there were variants which required stemming to match (e.g.
Kaurismäkien and Kaurismäen), and the stemmer was more effective when given the words with
the accents preserved. It appears it would help if the stemmer were modified to be more tolerant
of missing accents.</p>
      </sec>
      <sec id="sec-2-9">
        <title>Impact of Apostrophes as Word Separators</title>
        <p>Table 10 shows the impact of treating apostrophes as word separators on the average precision
measure. To do this experiment, two tables were created for each language, one treating
apostrophes as word separators, the other not. No stop words were used and no query stop words
were discarded. Otherwise, the settings were the same as for the submitted runs; in particular,
accent-indexing was enabled except in Italian and English.</p>
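        <p>The two indexing variants compared in this experiment can be approximated with a small tokenizer sketch. The regular expressions and the function name tokenize are assumptions chosen for illustration; SearchServer's own word-breaking rules are configured through the stop file and are not reproduced here.</p>
        <preformat><![CDATA[
```python
import re

def tokenize(text, apostrophe_separates=True):
    """Split text into words, optionally treating apostrophes as word separators."""
    if apostrophe_separates:
        pattern = r"[^\W_]+"                          # runs of letters and digits only
    else:
        pattern = r"[^\W_]+(?:['\u2019`][^\W_]+)*"    # keep apostrophes inside a word
    return re.findall(pattern, text)

separated = tokenize("l'\u00e9cole d'Ayrton")                          # "l'école d'Ayrton"
joined = tokenize("l'\u00e9cole d'Ayrton", apostrophe_separates=False)
```
]]></preformat>
        <p>With apostrophes as separators, “d'Ayrton” yields the separate word “Ayrton”, which can then match documents directly; without them, the whole string is a single term and a match depends on the stemmer.</p>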
        <p>Table 10 shows that treating apostrophes as word separators had a noticeable benefit for
French. For example, French topic 121 may be benefiting from breaking “d’Ayrton” at its
apostrophe. The benefit for Italian may have been less because stemming appears to be handling
apostrophes. For example, in Italian, if apostrophes are not word separators, “l’ombrello” still
matches “ombrello” when stemming is enabled, whereas in French, “l’école” still does not match
“école” (again, this difference is moot when apostrophes are treated as word separators). The
impact for other languages is slight, including for English.</p>
        <p>We submitted 10 monolingual runs (the maximum allowed) in June 2002. Runs humDE02,
humFR02, humIT02, humES02, humNL02, humSV02 and humFI02 provided a run for each
language using the Title and Description fields as requested by the organizers (note that English
monolingual runs were not accepted). For the remaining 3 runs, we submitted an extra run for Finnish,
Swedish and Dutch including the Narrative field (runs humFI02n, humSV02n, humNL02n); these
languages were expected to have the fewest participants, so additional submissions seemed more
likely to be helpful for the judging pools. The precision scores of the submitted runs are expected
to be included in an appendix of this volume. Table 11 shows a comparison of the submitted runs
with the median scores of submitted monolingual runs for each language. In all but one case,
SearchServer scored higher than the median on more topics than it scored lower. Note that the
relative performance on different languages may not be meaningful for several reasons, including
that the medians are from a mix of runs where some may have used the Narrative field, multiple
runs may be submitted by the same group, and the mixture can vary across languages.</p>
        <p>The submitted runs of June used an older experimental build than the one used for the diagnostic
runs in August, and there may be minor differences in the scores even when the settings are the
same.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Confidence Intervals for Precision Differences</title>
      <p>
        The 95% confidence intervals presented in this paper have been produced using Efron’s Bootstrap
(percentile method). If there are 50 topics (i.e. 50 precision differences), then precision differences
are chosen randomly (with replacement) 50 times, producing a “bootstrap sample”, and a mean
(average) is computed from this sample. This step is repeated B times (e.g. B=100,000). The
B sample means are sorted, the bottom and top 2.5% are discarded, and the endpoints of the
remaining range of sample means are an approximate 95% confidence interval for the average
difference in precision (we always rounded so that the listed endpoints are not actually in the
produced interval). The bootstrap percentile method is considered to work well in more cases
than the standard method of using the mean plus/minus 1.96 times the standard error, though
there are more complicated bootstrap methods which are considered even more general [<xref ref-type="bibr" rid="ref1">1</xref>].
      </p>
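      <p>The procedure described above can be sketched in Python as follows. The function name, its parameters and the fixed random seed are assumptions introduced for illustration, and the sketch does not reproduce the rounding of the endpoints mentioned above; it is not the code used to produce the intervals in this paper.</p>
      <preformat><![CDATA[
```python
import random

def bootstrap_percentile_ci(diffs, b=100_000, level=0.95, seed=0):
    """Approximate confidence interval for the mean of diffs (percentile method)."""
    rng = random.Random(seed)             # fixed seed only to make the sketch repeatable
    n = len(diffs)
    # Each bootstrap sample draws n differences with replacement and records their mean.
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(b))
    cut = round(b * (1 - level) / 2)      # number of sample means dropped at each end
    return means[cut], means[b - cut - 1]

# Made-up per-topic differences in average precision.
diffs = [0.10, -0.05, 0.20, 0.00, 0.15, 0.05, -0.10, 0.30]
low, high = bootstrap_percentile_ci(diffs, b=2000)
```
]]></preformat>
      <p>A smaller value of B is used in the usage line only to keep the example fast; if zero lies outside the returned interval, the difference would be called statistically significant in the sense used in this paper.</p>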
      <p>Table 12 shows the bootstrap confidence intervals produced for the impact of stemming on
average precision with different numbers of iterations. Even at just 1000 iterations the values are
fairly close to the values at 1 million iterations. When comparing 1,000,000 iterations to 100,000,
very few of the endpoints changed, and they only changed by 0.001. For the confidence intervals
in this paper, we used B=100,000.</p>
      <p>Tables 13 and 14 contain side-by-side comparisons of the approximate 95% confidence intervals
produced by the bootstrap percentile method and the standard method. It turns out they are very
similar. There is a disagreement on statistical significance (i.e. when zero is not in the interval)
in the case of Dutch in Table 13, but it is a borderline case.</p>
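      <p>For comparison, the standard method (the mean plus or minus 1.96 times the standard error) can be sketched as below. The function name is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the exact estimator is not stated here.</p>
      <preformat><![CDATA[
```python
import math
import statistics

def standard_ci(diffs, z=1.96):
    """Mean plus/minus z standard errors of the per-topic differences."""
    mean = statistics.mean(diffs)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean - z * std_err, mean + z * std_err

low, high = standard_ci([0.0, 0.2])      # made-up differences
```
]]></preformat>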
      <p>
        Tables 13 and 14 also include an estimator and 95% confidence interval based on the Wilcoxon
signed rank test (the 2 rightmost columns). (We implemented an exact computation, including for
the case of ties in the absolute values of the differences [<xref ref-type="bibr" rid="ref4">4</xref>].) For differences in average precision,
the widths of the intervals are very similar (the bootstrap intervals are a little smaller than the
Wilcoxon intervals for the Finnish and German results, and the Wilcoxon intervals are a little
smaller for the others); the methods agree on which differences are statistically significant.
However, for differences in Precision@10, the bootstrap intervals are a lot smaller than the Wilcoxon
intervals: because the Wilcoxon is based on ranks, it cannot distinguish between a shift of 0.01
and a shift of 0.09, which have the same effect on the ranks because every difference is a multiple of 0.10.
The methods still agree on statistically significant results (for the 8 cases listed).
      </p>
      <p>[5] NTCIR (NII-NACSIS Test Collection for IR Systems) Home Page. http://research.nii.ac.jp/~ntcadm/index-en.html</p>
      <p>[7] Jacques Savoy. (Université de Neuchâtel.) CLEF and Multilingual information retrieval
resource page. http://www.unine.ch/info/clef/</p>
      <p>[...] Fulcrum SearchServer [...] Proceedings of the [...] Publication 500-249.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Michael R.</given-names>
            <surname>Chernick</surname>
          </string-name>
          .
          <source>Bootstrap Methods: A Practitioner's Guide</source>
          .
          <year>1999</year>
          . John Wiley &amp; Sons.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <source>Cross-Language Evaluation Forum web site</source>
          . http://www.clef-campaign.org/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Hodgson</surname>
          </string-name>
          .
          <article-title>Converting the Fulcrum Search Engine to Unicode</article-title>
          . In Sixteenth International Unicode Conference, Amsterdam, The Netherlands,
          March
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Myles</given-names>
            <surname>Hollander</surname>
          </string-name>
          and
          <string-name>
            <given-names>Douglas A.</given-names>
            <surname>Wolfe</surname>
          </string-name>
          .
          <source>Nonparametric Statistical Methods</source>
          . Second Edition
          ,
          <year>1999</year>
          . John Wiley &amp; Sons.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hancock-Beaulieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gatford</surname>
          </string-name>
          . (City University.)
          <article-title>Okapi at TREC-3</article-title>
          . In D. K. Harman, editor,
          <source>Overview of the Third Text REtrieval Conference (TREC-3)</source>
          . NIST Special Publication. http://trec.nist.gov/pubs/trec3/t3_proceedings.html
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>