<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Ray R. Larson School of Information University of California</institution>
          ,
          <addr-line>Berkeley</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we briefly describe the approaches taken by Berkeley for the main GeoCLEF 2008 tasks (Mono and Bilingual retrieval). This year all of our runs used probabilistic text retrieval based on logistic regression with blind relevance feedback; in addition, we ran a number of tests combining this type of search with OKAPI BM25 searches using a fusion approach. All translation for bilingual tasks was performed using the LEC Power Translator PC-based MT system. Our results were good overall, with Cheshire system runs appearing in the top 5 participants for each task (German, English and Portuguese, both Monolingual and Bilingual), and with the highest ranked runs for Monolingual Portuguese and for Bilingual German, English and Portuguese. All of these top-ranked runs used the fusion approach. However, once again this year we did not attempt any specialized geographic processing, because it appears that purely textual approaches to GIR are more effective when only textual topics, lacking explicit geographic coordinate constraints, are used.</p>
      </abstract>
      <kwd-group>
        <kwd>Cheshire II</kwd>
        <kwd>Logistic Regression</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Geographic Information Retrieval (GIR) as it was originally defined was concerned with providing
access to georeferenced information resources using a combination of Information Retrieval (IR)
and Data Retrieval (DR) (or database) methods[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In GeoCLEF the nature of the topics has
tended to emphasize the IR aspects of GIR, largely because no explicit georeferencing of documents
or topics has been supplied to the experimenter.
      </p>
      <p>Without the explicit georeferencing of documents and/or topics the experimenter (or searcher)
is faced with attempting to provide such georeferencing, and therefore with solving all of the attendant
problems of ambiguity and multiplicity of toponyms and the issues of name polysemy that explicit
georeferencing is intended to alleviate. An alternative approach is to, in effect, ignore geographic
clues by treating them like any other term in a normal IR search process. This year we took
this latter approach, and used only text retrieval methods on the provided topics
with no explicit identification or treatment of toponyms.</p>
      <p>This paper describes the retrieval algorithms and evaluation results for Berkeley’s official
submissions for the GeoCLEF 2008 track. All of the submitted runs were automatic without manual
intervention in the queries (or translations). We submitted nine Monolingual runs (three German,
three English, and three Portuguese) and eighteen Bilingual runs (three runs for each of the three
languages to each other language). The runs varied in the topic elements used, and whether
or not a fusion approach (described below) was used.</p>
      <p>This paper first describes the retrieval algorithms and fusion operations used for our
submissions, followed by a discussion of the processing used for the runs. We then examine the results
obtained for our officially submitted runs, and finally present conclusions and future directions for
GeoCLEF participation.
</p>
    </sec>
    <sec id="sec-2">
      <title>The Retrieval Algorithms and Fusion Operators</title>
      <p>
        Note that this section is virtually identical to one that appears in our Adhoc-TEL and Domain
Specific papers, with the addition of the Okapi BM-25 description and fusion operations
subsections. The basic form and variables of the Logistic Regression (LR) algorithm used for all of our
submissions were originally developed by Cooper et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. As originally formulated, the LR model
of probabilistic IR attempts to estimate the probability of relevance for each document based on
a set of statistics about a document collection and a set of queries in combination with a set of
weighting coefficients for those statistics. The statistics to be used and the values of the
coefficients are obtained from regression analysis of a sample of a collection (or similar test collection)
for some set of queries where relevance and non-relevance have been determined. More formally,
given a particular query and a particular document in a collection, P(R | Q, D) is calculated and
the documents or components are presented to the user ranked in order of decreasing values of
that probability. To avoid invalid probability values, the usual calculation of P(R | Q, D) uses the
“log odds” of relevance given a set of S statistics, si, derived from the query and database, such
that:
      </p>
      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[\log O(R \mid Q, D) = b_0 + \sum_{i=1}^{S} b_i s_i]]></tex-math>
        </disp-formula>
      </p>
      <p>where b0 is the intercept term and the bi are the coefficients obtained from the regression analysis of
the sample collection and relevance judgements. The final ranking is determined by the conversion
of the log odds form to probabilities:</p>
      <p>
        <disp-formula>
          <label>(2)</label>
          <tex-math><![CDATA[P(R \mid Q, D) = \frac{e^{\log O(R \mid Q, D)}}{1 + e^{\log O(R \mid Q, D)}}]]></tex-math>
        </disp-formula>
      </p>
      <sec id="sec-2-1">
        <title>TREC2 Logistic Regression Algorithm</title>
        <p>
          For GeoCLEF we used a version of the Logistic Regression (LR) algorithm that has been used very
successfully in Cross-Language IR by Berkeley researchers for a number of years [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The formal
definition of the TREC2 Logistic Regression algorithm used is:
        </p>
        <p>
          <disp-formula>
            <tex-math><![CDATA[\begin{aligned}
\log O(R \mid C, Q) &= \log \frac{p(R \mid C, Q)}{1 - p(R \mid C, Q)} = \log \frac{p(R \mid C, Q)}{p(\overline{R} \mid C, Q)} \\
&= c_0 + c_1 \cdot \frac{1}{\sqrt{|Q_c| + 1}} \sum_{i=1}^{|Q_c|} \frac{qtf_i}{ql + 35} \\
&\quad + c_2 \cdot \frac{1}{\sqrt{|Q_c| + 1}} \sum_{i=1}^{|Q_c|} \log \frac{tf_i}{cl + 80} \\
&\quad - c_3 \cdot \frac{1}{\sqrt{|Q_c| + 1}} \sum_{i=1}^{|Q_c|} \log \frac{ctf_i}{N_t} \\
&\quad + c_4 \cdot |Q_c|
\end{aligned}]]></tex-math>
          </disp-formula>
        </p>
        <p>where C denotes a document component (i.e., an indexed part of a document which may be the
entire document) and Q a query, R is a relevance variable,
p(R|C, Q) is the probability that document component C is relevant to query Q,
p(R̄|C, Q) is the probability that document component C is not relevant to query Q, which is 1.0 − p(R|C, Q),
|Qc| is the number of matching terms between a document component and a query,
qtfi is the within-query frequency of the ith matching term,
tfi is the within-document frequency of the ith matching term,
ctfi is the occurrence frequency in a collection of the ith matching term,
ql is query length (i.e., number of terms in a query like |Q| for non-feedback situations),
cl is component length (i.e., number of terms in a component),
Nt is collection length (i.e., number of terms in a test collection), and
ck are the k coefficients obtained through the regression analysis.</p>
        <p>If stopwords are removed from indexing, then ql, cl, and Nt are the query length, document
length, and collection length, respectively. If the query terms are re-weighted (in feedback, for
example), then qtfi is no longer the original term frequency, but the new weight, and ql is the
sum of the new weight values for the query terms. Note that, unlike the document and collection
lengths, query length is the “optimized” relative frequency without first taking the log over the
matching terms.</p>
        <p>
          The coefficients were determined by fitting the logistic regression model specified in log O(R|C, Q)
to TREC training data using a statistical software package. The coefficients, ck, used for our
official runs are the same as those described by Chen[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. These were: c0 = −3.51, c1 = 37.4,
c2 = 0.330, c3 = 0.1937 and c4 = 0.0929. Further details on the TREC2 version of the Logistic
Regression algorithm may be found in Cooper et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
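        <p>As an illustration only, the TREC2 log odds and the conversion to a probability can be sketched in Python as follows. The function names and the input layout (one tuple of qtf, tf and ctf per matching term) are assumptions made for this sketch and are not taken from the Cheshire II code; the constants 35 and 80 are those of the commonly published form of the TREC2 formula.</p>
        <preformat>
```python
import math

# Coefficients quoted in the text above (from Chen's fit to TREC data).
C0, C1, C2, C3, C4 = -3.51, 37.4, 0.330, 0.1937, 0.0929


def trec2_log_odds(matches, ql, cl, nt):
    """Log odds of relevance for one document component.

    matches: hypothetical list of (qtf, tf, ctf) tuples, one per matching term.
    ql, cl, nt: query, component and collection lengths (in terms).
    """
    n = len(matches)  # |Qc|, the number of matching terms
    norm = 1.0 / math.sqrt(n + 1)
    x1 = norm * sum(qtf / (ql + 35.0) for qtf, _, _ in matches)
    x2 = norm * sum(math.log(tf / (cl + 80.0)) for _, tf, _ in matches)
    x3 = norm * sum(math.log(ctf / nt) for _, _, ctf in matches)
    return C0 + C1 * x1 + C2 * x2 - C3 * x3 + C4 * n


def probability(log_odds):
    """Convert log odds to a probability, e^x / (1 + e^x), without overflow."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```
        </preformat>
        <p>Components would then be ranked by decreasing probability(trec2_log_odds(...)); because the conversion is monotonic, ranking by the log odds alone gives the same order.</p>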
      </sec>
      <sec id="sec-2-2">
        <title>Blind Relevance Feedback</title>
        <p>
          In addition to the direct retrieval of documents using the TREC2 logistic regression algorithm
described above, we have implemented a form of “blind relevance feedback” as a supplement to the
basic algorithm. The algorithm used for blind feedback was originally developed and described by
Chen [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Blind relevance feedback has become established in the information retrieval community
due to its consistent improvement of initial search results as seen in TREC, CLEF and other
retrieval evaluations [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The blind feedback algorithm is based on the probabilistic term relevance
weighting formula developed by Robertson and Sparck Jones [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>Blind relevance feedback is typically performed in two stages. First, an initial search using
the original topic statement is performed, after which a number of terms are selected from some
number of the top-ranked documents (which are presumed to be relevant). The selected terms
are then weighted and then merged with the initial query to formulate a new query. Finally the
reweighted and expanded query is submitted against the same collection to produce a final ranked
list of documents. Obviously there are important choices to be made regarding the number of
top-ranked documents to consider, and the number of terms to extract from those documents. For
ImageCLEF this year, having no prior data to guide us, we chose to use the top 10 terms from 10
top-ranked documents. The terms were chosen by extracting the document vectors for each of the
10 and computing the Robertson and Sparck Jones term relevance weight for each document. This
weight is based on a contingency table that records the counts for the four combinations
of (assumed) relevance and whether or not the term occurs in a document. Table 1 shows
this contingency table.</p>
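        <p>The relevance weight is calculated using the assumption that the first 10 documents are relevant
and all others are not. For each term in these documents the following weight is calculated:</p>
        <p>
          <disp-formula>
            <tex-math><![CDATA[w_t = \log \frac{R_t / (R - R_t)}{(N_t - R_t) / (N - N_t - R + R_t)}]]></tex-math>
          </disp-formula>
        </p>
        <p>Here N is the number of documents in the collection, R is the number of documents assumed to be
relevant, Nt is the number of documents containing term t, and Rt is the number of assumed-relevant
documents containing term t.</p>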
        <p>The 10 terms (including those that appeared in the original query) with the highest wt are
selected and added to the original query terms. For the terms not in the original query, the new
“term frequency” (qtfi in main LR equation above) is set to 0.5. Terms that were in the original
query, but are not in the top 10 terms are left with their original qtfi. For terms in the top 10 and
in the original query the new qtfi is set to 1.5 times the original qtfi for the query. The new query
is then processed using the same TREC2 LR algorithm described above, and the ranked results are
returned as the response for that topic.</p>
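        <p>The term selection and reweighting steps described above might be sketched as follows. This is a minimal illustration, not the Cheshire II implementation: the function names and dictionary-based inputs are hypothetical, and the 0.5 smoothing constant is an assumption added here only to avoid division by zero when a term occurs in every top-ranked document (the weight formula in the text has no smoothing).</p>
        <preformat>
```python
import math


def relevance_weight(r_t, n_t, n_rel, n_docs, smooth=0.5):
    """Robertson/Sparck Jones weight w_t; `smooth` is an added assumption."""
    num = (r_t + smooth) / (n_rel - r_t + smooth)
    den = (n_t - r_t + smooth) / (n_docs - n_t - n_rel + r_t + smooth)
    return math.log(num / den)


def expand_query(query_qtf, top_docs, doc_freq, n_docs, n_terms=10):
    """Blind feedback: reweight and expand a query.

    query_qtf: dict term -> within-query frequency of the original query.
    top_docs:  list of term sets, one per top-ranked (assumed relevant) document.
    doc_freq:  dict term -> number of documents in the collection containing it.
    """
    n_rel = len(top_docs)
    counts = {}
    for doc in top_docs:
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
    weights = {t: relevance_weight(c, doc_freq[t], n_rel, n_docs)
               for t, c in counts.items()}
    # Highest weights first; ties broken alphabetically for determinism.
    selected = sorted(weights, key=lambda t: (-weights[t], t))[:n_terms]
    new_query = dict(query_qtf)  # unselected original terms keep their qtf
    for term in selected:
        if term in query_qtf:
            new_query[term] = 1.5 * query_qtf[term]
        else:
            new_query[term] = 0.5
    return new_query
```
        </preformat>
        <p>The query length ql used in the second search would then be the sum of the values in the returned dictionary, as noted in the description of the TREC2 algorithm.</p>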
      </sec>
      <sec id="sec-2-3">
        <title>Okapi BM-25 Algorithm</title>
        <p>
          The version of the Okapi BM-25 algorithm used in these experiments is based on the description
of the algorithm in Robertson [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], and in TREC notebook proceedings [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. As with the LR
algorithm, we have adapted the Okapi BM-25 algorithm to deal with document components:
        </p>
        <p>
          <disp-formula>
            <label>(4)</label>
            <tex-math><![CDATA[\sum_{j=1}^{|Q_c|} w^{(1)} \, \frac{(k_1 + 1)\, tf_j}{K + tf_j} \cdot \frac{(k_3 + 1)\, qtf_j}{k_3 + qtf_j}]]></tex-math>
          </disp-formula>
        </p>
        <p>Where (in addition to the variables already defined):
K is k1((1 − b) + b · dl/avcl),
k1, b and k3 are parameters (1.5, 0.45 and 500, respectively, were used),
avcl is the average component length measured in bytes, and
w(1) is the Robertson-Sparck Jones weight:</p>
        <p>
          <disp-formula>
            <label>(5)</label>
            <tex-math><![CDATA[w^{(1)} = \log \frac{(r + 0.5) / (R - r + 0.5)}{(n_{t_j} - r + 0.5) / (N - n_{t_j} - R + r + 0.5)}]]></tex-math>
          </disp-formula>
        </p>
        <p>r is the number of relevant components of a given type that contain a given term, and
R is the total number of relevant components of a given type for the query.</p>
        <p>
          Our current implementation uses only the a priori version (i.e., without relevance information)
of the Robertson-Sparck Jones weights, and therefore the w(1) value is effectively just an IDF
weighting. The results of searches using our implementation of Okapi BM-25 and the LR algorithm
seemed sufficiently different to offer the kind of conditions where data fusion has been shown to
be most effective [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and our overlap analysis of results for each algorithm (described in the
evaluation and discussion section) has confirmed this difference and the fit to the conditions for
effective fusion of results.
        </p>
        <p>The system used supports searches combining probabilistic and (strict) Boolean elements, as
well as operators to support various merging operations for both types of intermediate result sets.
However, in GeoCLEF we did not use this capability.
</p>
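        <p>A rough sketch of the component-level Okapi BM-25 score with the a priori weight is given below. With no relevance information (r = R = 0) the Robertson-Sparck Jones weight reduces to an IDF-like term, which is what idf_weight computes. The function names and the (tf, qtf, n_t) input layout are assumptions for illustration; only the parameter values 1.5, 0.45 and 500 come from the text.</p>
        <preformat>
```python
import math

# Parameter values reported in the text.
K1, B, K3 = 1.5, 0.45, 500.0


def idf_weight(n_t, n_total):
    """A priori Robertson-Sparck Jones weight, i.e. w(1) with r = R = 0."""
    return math.log((n_total - n_t + 0.5) / (n_t + 0.5))


def bm25_score(matches, dl, avcl, n_total):
    """BM-25 score of one component.

    matches: hypothetical list of (tf, qtf, n_t) tuples per matching term,
             where n_t is the number of components containing the term.
    dl, avcl: component length and average component length (same unit).
    n_total: total number of components of this type.
    """
    k = K1 * ((1.0 - B) + B * dl / avcl)
    score = 0.0
    for tf, qtf, n_t in matches:
        tf_part = (K1 + 1.0) * tf / (k + tf)
        qtf_part = (K3 + 1.0) * qtf / (K3 + qtf)
        score += idf_weight(n_t, n_total) * tf_part * qtf_part
    return score
```
        </preformat>
        <p>Note that with this form of the weight, a term that occurs in more than half of the components receives a negative weight; how an implementation handles that case is not covered by this sketch.</p>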
      </sec>
      <sec id="sec-2-4">
        <title>Fusion Operators</title>
        <p>The Cheshire II system used in this evaluation provides a number of operators to combine the
intermediate results of a search from different components or indexes. With these operators we
have available an entire spectrum of combination methods ranging from strict Boolean operations
to fuzzy Boolean and normalized score combinations for probabilistic and Boolean results. These
operators are the means available for performing fusion operations between the results for different
retrieval algorithms and the search results from different components of a document. We
will only describe one of these operators here, because it was the only type used in the GeoCLEF
runs reported in this paper.</p>
        <p>The MERGE PIVOT operator is used primarily to adjust the probability of relevance for one
search result based on matching elements in another search result. It was developed primarily to
adjust the probabilities of a search result consisting of sub-elements of a document (such as titles
or paragraphs) based on the probability obtained for the same search over the entire document.
It is basically a weighted combination of the probabilities based on a “DocPivot” fraction, such
that:</p>
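        <p>
          <disp-formula>
            <label>(6)</label>
            <tex-math><![CDATA[P_n = \mathit{DocPivot} \cdot P_d + (1 - \mathit{DocPivot}) \cdot P_s]]></tex-math>
          </disp-formula>
        </p>
        <p>where Pd represents the document-level probability of relevance, Ps represents the subelement
probability, and Pn represents the resulting new probability estimate.</p>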
        <p>For all of our fusion experiments this year, Ps was the estimated probability of relevance for
a document obtained by a TREC2 search with blind feedback using the topic title and description
(as described above), normalized using MINMAX normalization, and Pd was the score from an OKAPI
BM25 search using the topic title, description, and narrative, also normalized to the 0-1 range using
MINMAX normalization. The “DocPivot” value used for all of the submitted runs was 0.29.</p>
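        <p>The combination just described can be illustrated with the short sketch below. It is not the Cheshire II operator itself: the function names are hypothetical, and two behaviours are assumptions of this sketch rather than statements from the text, namely that a document missing from one result list is given a score of 0 for that list, and that a list whose scores are all equal is normalized to 1.0.</p>
        <preformat>
```python
def minmax(scores):
    """Rescale a dict of doc_id -> score to the 0-1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: a constant list maps every document to 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_pivot(p_d, p_s, doc_pivot=0.29):
    """P_n = DocPivot * P_d + (1 - DocPivot) * P_s for every document.

    Assumption: documents absent from one list score 0.0 in that list.
    """
    fused = {}
    for doc in set(p_d) | set(p_s):
        fused[doc] = (doc_pivot * p_d.get(doc, 0.0)
                      + (1.0 - doc_pivot) * p_s.get(doc, 0.0))
    return fused


# Hypothetical raw scores for three documents.
okapi_tdn = {"a": 10.0, "b": 5.0, "c": 0.0}  # plays the role of P_d
trec2_td = {"a": 0.2, "b": 0.8}              # plays the role of P_s
fused = merge_pivot(minmax(okapi_tdn), minmax(trec2_td))
ranking = sorted(fused, key=lambda doc: -fused[doc])
```
        </preformat>
        <p>With a pivot of 0.29 the TREC2 result carries most of the weight (0.71), so in this toy example document "b" outranks "a" even though Okapi scored "a" higher.</p>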
        <p>
          Note that this is not the first time we have used some fusion approaches in GeoCLEF. In the
first GeoCLEF (2005) we also employed fusion approaches in some of our runs, but these did not
use the TREC2 with blind feedback algorithm[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], which appears to make an important difference.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approaches for GeoCLEF</title>
      <p>In this section we describe the specific approaches taken for our submitted runs for the GeoCLEF
tasks. First we describe the indexing and term extraction methods used, and then the search
features we used for the submitted runs.
</p>
      <sec id="sec-3-1">
        <title>Indexing and Term Extraction</title>
        <p>The Cheshire II system uses the XML structure of the documents to extract selected portions for
indexing and retrieval. Any combination of tags can be used to define the index contents.</p>
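        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Name</th></tr>
            </thead>
            <tbody>
              <tr><td>docno</td></tr>
              <tr><td>pauthor</td></tr>
              <tr><td>headline</td></tr>
              <tr><td>topic</td></tr>
              <tr><td>date</td></tr>
              <tr><td>geotext</td></tr>
              <tr><td>geopoint</td></tr>
              <tr><td>geobox</td></tr>
            </tbody>
          </table>
        </table-wrap>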
        <p>Table 2 lists the indexes created by the Cheshire II system for the GeoCLEF database and the
document elements from which the contents of those indexes were extracted. The “Used” column
in Table 2 indicates whether or not a particular index was used in the submitted GeoCLEF runs.</p>
        <p>The georeferencing indexing subsystem of Cheshire II was used for the geotext, geopoint, and
geobox indexes. This subsystem extracts proper nouns from the text being indexed
and then attempts to match them in a digital gazetteer. For GeoCLEF we used a gazetteer
derived from the World Gazetteer (http://www.world-gazetteer.com) with 224698 entries in both
English and German. The indexing subsystem provides three different index types: verified place
names (an index of names which matched the gazetteer), point coordinates (latitude and longitude
coordinates of the verified place name) and bounding box coordinates (bounding boxes for the
matched places from the gazetteer). All three types were created, but we ended up not using
any of the geographic indexes in this year’s submissions. Because we do not use complete NLP
parsing techniques, the system is unable to distinguish proper nouns for places from those
for individuals. This leads to errors in geographic assignment where, for example, articles about
Irving Berlin might be tagged as referring to the city.</p>
        <p>Because there was no explicit tagging of location-related terms in the collections used for
GeoCLEF, we applied the above approach to the “TEXT”, “LD”, and “TX” elements of the records
of the various collections. The part of news articles normally called the “dateline” indicating the
location of the news story was not separately tagged in any of the GeoCLEF collections, but often
appeared as the first part of the text for the story.</p>
        <p>Geographic indexes were not created for the Portuguese sub-collection due to the lack of a
suitable gazetteer. In later work we plan to substitute the “GeoNames” database, which is much
more detailed and provides a more complete geographical hierarchy in its records, along with
alternate names in multiple languages.</p>
        <p>For all indexing we used language-specific stoplists to exclude function words and very common
words from the indexing and searching. The German language runs did not use decompounding
in the indexing and querying processes to generate simple word forms from compounds. Although
we tried again this year to make this work within the Cheshire system, we again lacked the time
needed to implement it correctly.</p>
        <p>The Snowball stemmer was used by Cheshire for language-specific stemming.
</p>
      </sec>
      <sec id="sec-3-2">
        <title>Search Processing</title>
        <p>Searching the GeoCLEF collection using the Cheshire II system involved using TCL scripts to
parse the topics and submit the title and description or the title, description, and narrative from
the topics. For monolingual search tasks we used the topics in the appropriate language (English,
German, and Portuguese); for bilingual tasks the topics were translated from the source language
to the target language using the LEC Power Translator PC-based machine translation system.</p>
        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>Run Name</th></tr>
            </thead>
            <tbody>
              <tr><td>BERKGCMODETD</td></tr>
              <tr><td>BERKGCMODETDN</td></tr>
              <tr><td>BERKMODETDNPIV</td></tr>
              <tr><td>BERKGCMOENTD</td></tr>
              <tr><td>BERKGCMOENTDN</td></tr>
              <tr><td>BERKMOENTDNPIV</td></tr>
              <tr><td>BERKGCMOPTTD</td></tr>
              <tr><td>BERKGCMOPTTDN</td></tr>
              <tr><td>BERKMOPTTDNPIV</td></tr>
              <tr><td>BERKGCBIENDETD</td></tr>
              <tr><td>BERKGCBIENDETDN</td></tr>
              <tr><td>BERKBIENDETDNPIV</td></tr>
              <tr><td>BERKGCBIPTDETD</td></tr>
              <tr><td>BERKGCBIPTDETDN</td></tr>
              <tr><td>BERKBIPTDETDNPIV</td></tr>
              <tr><td>BERKGCBIDEENTD</td></tr>
              <tr><td>BERKGCBIDEENTDN</td></tr>
              <tr><td>BERKBIDEENTDNPIV</td></tr>
              <tr><td>BERKGCBIPTENTD</td></tr>
              <tr><td>BERKGCBIPTENTDN</td></tr>
              <tr><td>BERKBIPTENTDNPIV</td></tr>
              <tr><td>BERKGCBIDEPTTD</td></tr>
              <tr><td>BERKGCBIDEPTTDN</td></tr>
              <tr><td>BERKBIDEPTTDNPIV</td></tr>
              <tr><td>BERKGCBIENPTTD</td></tr>
              <tr><td>BERKGCBIENPTTDN</td></tr>
              <tr><td>BERKBIENPTTDNPIV</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Table 3 shows the runs submitted and the characteristics of each run, including which task it was
submitted for, and the topic elements used in searching (these are indicated by “T” for
the “title” element, “D” for the “description” element and “N” for the “narrative” element). The
topic elements used were combined into a single probabilistic query. For those runs including the
term “fusion” in the “Type” column, both TREC2 with blind feedback algorithm results (using
only topic title and description) and Okapi BM-25 results (using title, description and narrative)
were combined using the MERGE PIVOT fusion operator described above with a pivot value of
0.29.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results for Submitted Runs</title>
      <p>The summary results (as Mean Average Precision) for the submitted monolingual and bilingual
runs for English, German and Portuguese are also shown in Table 3; the Recall-Precision
curves for these runs are shown in Figures 1 (for monolingual) and 2 (for bilingual). In
Figures 1 and 2 the names for the monolingual runs represent the search algorithms and topic
elements used for that particular target language, which can easily be compared with full names
and descriptions in Table 3 (since each combination has only a single run). The names for Bilingual
runs indicate the languages and topic elements used (except for PIV, which denotes the fusion runs that
use title, description and narrative).</p>
      <p>[Figure 1: Recall-Precision curves for the monolingual runs. Legend entries: PIV-TDN, T2FB-TDN, T2FB-TD.]</p>
      <p>Table 3 indicates the runs that had the highest overall MAP for the task and language by
asterisks next to the run name.</p>
      <p>Once again we found some rather interesting results among the official runs. For example,
it seems clear that using topic title and description alone is a much better approach with our
algorithms than using title, description and narrative. However, in most cases the fusion approach
either exceeds, or is very close to, the performance of the TREC2 with blind feedback search with
title and description, due to “supporting evidence” for relevance from the Okapi BM-25 algorithm.</p>
      <p>Last year the “weak man” in our runs was German, both monolingual and bilingual. At the
time we were not sure if that might just be due to decompounding issues. However, it turned
out that the real cause of last year’s poor performance was that in indexing the database we had
completely forgotten to include the SDA German collection, and thus the database we used in 2007 had
only about half of the German documents. Needless to say, this was remedied in our submitted
runs for this year.</p>
    </sec>
    <sec id="sec-5">
      <title>Analysis and Conclusions</title>
      <p>Because we used a similar processing approach (except for fusion operations) this year as we
used for some of our runs submitted for GeoCLEF 2006 and 2007, we built Table 4 to examine the
differences. Overall, we did see some distinct improvements in the text-based approaches over
those used in 2006 and 2007. In Table 4 the MAP for our best runs from 2006, 2007, and 2008 for
various tasks are shown, along with the percentage difference between each pair of years.</p>
      <p>[Figure 2: Recall-Precision curves for the bilingual runs. Legend entries: DEPT-PIV, ENPT-PIV, DEPT-TD, DEPT-TDN, ENPT-TD, ENPT-TDN.]</p>
      <p>Based on the summary data across participants for GeoCLEF available on the DIRECT system,
it is apparent that the text-based and fusion approaches that we used this year are quite effective
relative to some other approaches. In all of the six main GeoCLEF tasks there was a Cheshire run
in the top five participants listed. In Monolingual Portuguese and each Bilingual task (German,
English, and Portuguese) one of the Cheshire runs was ranked highest in MAP over all participants.</p>
      <p>We need to do further testing with the fusion approaches, since the results are sensitive to the
“pivot” value used. In addition, since the data show the relative importance of using only the title
and description elements instead of title, description, and narrative, we need to see whether using
only title and description in both parts of the fusion query further improves or degrades
performance.</p>
      <p>The challenge for next year is to reintroduce actual geographic elements to the mix to see if, for
example, automatic expansion of toponyms in the topic texts will enhance or degrade performance
over the purely textual approach. Since this was done explicitly in many of the topic narratives,
and use of the narrative proved counter-productive, we suspect that such expansion may be more
a source of noise than a means of improving results. In previous years it appeared that implicit
or explicit toponym inclusion in queries led to better performance when compared to using titles
and descriptions alone in retrieval. But given the results this time, some doubt has been cast over
that assumption, at least for the algorithms that we have been using.</p>
      <p>Although we did not do any explicit geographic processing other than in indexing for this year,
we plan to do so in the future, because we still believe that use of geographical knowledge and
evidence in topics and documents should improve performance over purely text-based methods.
However, given the results reported above, this belief is still unsupported in our experiments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Aitao</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Multilingual information retrieval using English and Chinese queries</article-title>
          . In Carol Peters, Martin Braschler, Julio Gonzalo, and Michael Kluck, editors,
          <source>Evaluation of Cross-Language Information Retrieval Systems: Second Workshop of the Cross-Language Evaluation Forum</source>
          , CLEF-2001, Darmstadt, Germany,
          <year>September 2001</year>
          , pages
          <fpage>44</fpage>
          -
          <lpage>58</lpage>
          . Springer Computer Science Series LNCS 2406,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Aitao</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <source>Cross-Language Retrieval Experiments at CLEF</source>
          <year>2002</year>
          , pages
          <fpage>28</fpage>
          -
          <lpage>48</lpage>
          . Springer (LNCS #2785),
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Aitao</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Fredric C.</given-names>
            <surname>Gey</surname>
          </string-name>
          .
          <article-title>Multilingual information retrieval using machine translation, relevance feedback and decompounding</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>7</volume>
          :
          <fpage>149</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Gey</surname>
          </string-name>
          .
          <article-title>Full Text Retrieval based on Probabilistic Equations with Coefficients fitted by Logistic Regression</article-title>
          .
          <source>In Text REtrieval Conference (TREC-2)</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>William S.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fredric C.</given-names>
            <surname>Gey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Daniel P.</given-names>
            <surname>Dabney</surname>
          </string-name>
          .
          <article-title>Probabilistic retrieval based on staged logistic regression</article-title>
          .
          <source>In 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Copenhagen, Denmark, June 21-24, pages
          <fpage>198</fpage>
          -
          <lpage>210</lpage>
          , New York,
          <year>1992</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Ray R.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Geographic information retrieval and spatial browsing</article-title>
          .
          In Linda Smith and Myke Gluck, editors,
          <source>GIS and Libraries: Patrons, Maps and Spatial Information</source>
          , pages
          <fpage>81</fpage>
          -
          <lpage>124</lpage>
          . University of Illinois at Urbana-Champaign, GSLIS, Urbana-Champaign
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Ray R.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Probabilistic retrieval, component fusion and blind feedback for XML retrieval</article-title>
          .
          <source>In INEX 2005</source>
          , pages
          <fpage>225</fpage>
          -
          <lpage>239</lpage>
          .
          <source>Springer (Lecture Notes in Computer Science, LNCS 3977)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ray R.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fredric C.</given-names>
            <surname>Gey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Vivien</given-names>
            <surname>Petras</surname>
          </string-name>
          .
          <article-title>Berkeley at GeoCLEF: Logistic regression and fusion for geographic information retrieval</article-title>
          .
          <source>In Cross-Language Evaluation Forum: CLEF 2005</source>
          , pages
          <fpage>963</fpage>
          -
          <lpage>976</lpage>
          .
          <source>Springer (Lecture Notes in Computer Science LNCS 4022)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Joon Ho</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Analyses of multiple evidence combination</article-title>
          .
          <source>In SIGIR '97: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , July 27-31, 1997, Philadelphia, pages
          <fpage>267</fpage>
          -
          <lpage>276</lpage>
          . ACM,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Sparck Jones</surname>
          </string-name>
          .
          <article-title>Relevance weighting of search terms</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>146</lpage>
          , May-June
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Walker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Micheline M.</given-names>
            <surname>Hancock-Beaulieu</surname>
          </string-name>
          .
          <article-title>OKAPI at TREC-7: ad hoc, filtering, vlc and interactive track</article-title>
          .
          <source>In Text Retrieval Conference (TREC-7)</source>
          , Nov. 9-11, 1998 (Notebook)
          , pages
          <fpage>152</fpage>
          -
          <lpage>164</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Walker</surname>
          </string-name>
          .
          <article-title>On relevance weights with little relevance information</article-title>
          .
          <source>In Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>16</fpage>
          -
          <lpage>24</lpage>
          . ACM Press,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>