<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cheshire II at GeoCLEF: Fusion and Query Expansion for GIR</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Ray R. Larson School of Information Management and Systems University of California</institution>
          ,
          <addr-line>Berkeley</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper I will describe the Berkeley (group 1) approach to the GeoCLEF task for CLEF 2005. The main technique we are testing is the fusion of multiple probabilistic searches against different XML components using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm. We also combine multiple translations of queries in cross-language searching. Since this is the first time that the Cheshire system has been used for CLEF, this approach can, at best, be considered a very preliminary base testing of some retrieval algorithms and approaches. The primary geographically based approaches taken for GeoCLEF were to georeference proper nouns in the text using a gazetteer derived from the World Gazetteer, with both English and German names for each place, and to expand place names for regions or countries in the queries by the names of the countries or cities in those regions or countries.</p>
      </abstract>
      <kwd-group>
        <kwd>Cheshire II</kwd>
        <kwd>Logistic Regression</kwd>
        <kwd>Data Fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>For GeoCLEF 2005 the Berkeley IR research group split into two groups (Berkeley 1 and Berkeley
2). Berkeley 2 used the same techniques as in previous CLEF evaluations, while Berkeley 1
tried some alternative algorithms and fusion methods for both the GeoCLEF and Domain Specific
tasks. This paper will focus on the techniques used by the Berkeley 1 group for GeoCLEF and the
results of our official submissions, as well as some additional tests using versions of the algorithms
employed by the Berkeley 2 group. The main technique being tested is the fusion of multiple
probabilistic searches against different XML components using both Logistic Regression (LR)
algorithms and a version of the Okapi BM-25 algorithm. We also combine multiple translations
of queries in cross-language searching. Since this is the first time that the Cheshire II system has
been used for CLEF, this approach can at best be considered a very preliminary base testing of
some retrieval algorithms and approaches. This paper is organized as follows: in the next section
we discuss the retrieval algorithms and fusion methods used for the submitted runs. We then
discuss the specific approaches taken for indexing and retrieval in GeoCLEF and the results of the
submitted runs. Then we compare our submitted results to some additional runs with alternate
approaches conducted later. Finally we present conclusions and some discussion of the GeoCLEF
task.</p>
    </sec>
    <sec id="sec-2">
      <title>The Retrieval Algorithms and Fusion Operators</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] we conducted an analysis of the overlap between the result lists retrieved by our Logistic
Regression algorithm and the Okapi BM-25 algorithm for the INEX XML Retrieval test collection.
We found that, on average, over half of the result lists retrieved by each algorithm in these overlap
tests were both non-relevant and unique to that algorithm, fulfilling the main criteria for effective
algorithm combination suggested by Lee [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: that the algorithms have similar sets of relevant
documents and different sets of non-relevant documents. This section is largely a repetition of the material
presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], with additional discussion of how these algorithms were applied for the
GeoCLEF task.
      </p>
      <p>
        In the remainder of this section we describe the Logistic Regression and Okapi BM-25
algorithms that were used for GeoCLEF, and we also discuss the methods used to combine the results
of the different algorithms. The algorithms and combination methods are implemented as part
of the Cheshire II XML/SGML search engine [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">6, 7, 5</xref>
        ], which also supports a number of other
algorithms for distributed search and operators for merging result lists from ranked or Boolean
sub-queries.
      </p>
      <sec id="sec-2-1">
        <title>Logistic Regression Algorithm</title>
        <p>
          The basic form and variables of the Logistic Regression (LR) algorithm used was originally
developed by Cooper, et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. It provided good full-text retrieval performance in the TREC3 ad hoc
task, in TREC interactive tasks [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and for distributed IR [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. As originally formulated, the
LR model of probabilistic IR attempts to estimate the probability of relevance for each document
based on a set of statistics about a document collection and a set of queries, in combination with a
set of weighting coefficients for those statistics. The statistics to be used and the values of the
coefficients are obtained from regression analysis of a sample of a collection (or a similar test collection)
for some set of queries where relevance and non-relevance has been determined. More formally,
given a particular query and a particular document in a collection, P(R | Q, D) is calculated and
the documents or components are presented to the user ranked in order of decreasing values of
that probability. To avoid invalid probability values, the usual calculation of P(R | Q, D) uses the
"log odds" of relevance given a set of S statistics, s_i, derived from the query and database, such
that:
log O(R | Q, D) = b_0 + Σ_{i=1}^{S} b_i s_i    (1)
where b_0 is the intercept term and the b_i are the coefficients obtained from the regression analysis of
the sample collection and relevance judgements. The final ranking is determined by the conversion
of the log odds form to probabilities:
P(R | Q, D) = e^{log O(R|Q,D)} / (1 + e^{log O(R|Q,D)})    (2)
        </p>
        <p>Based on the structure of XML documents as a tree of XML elements, we define a "document
component" as an XML subtree that may include zero or more subordinate XML elements or
subtrees, with text as the leaf nodes of the tree. Thus, a component might be defined using any of
the tagged elements in a document. However, not all possible components are likely to be useful
in content-oriented retrieval (e.g., tags indicating that a word in the title should be in italic type,
or the page number range); therefore we defined the retrievable components selectively, including
the titles, dates, and document ids.</p>
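        <p>As a quick numerical illustration of equations (1) and (2), the sketch below computes the log odds for each document from its statistics and ranks by the resulting probability. The statistics and coefficients here are placeholders for illustration, not the fitted Base values given later in the text:</p>

```python
import math

def log_odds(b, s):
    # Equation (1): log O(R|Q,D) = b0 + sum over i of b_i * s_i
    return b[0] + sum(bi * si for bi, si in zip(b[1:], s))

def probability(lo):
    # Equation (2): P(R|Q,D) = e^lo / (1 + e^lo)
    return math.exp(lo) / (1.0 + math.exp(lo))

# Placeholder coefficients and per-document statistics (illustrative only)
b = [-3.7, 1.269, -0.310]
docs = {"d1": [2.0, 1.0], "d2": [0.5, 3.0]}

# Rank documents by decreasing probability of relevance
ranked = sorted(docs, key=lambda d: probability(log_odds(b, docs[d])), reverse=True)
```

        <p>Since the logistic conversion in equation (2) is monotonic in the log odds, ranking by either quantity yields the same ordering.</p>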
        <p>Naturally, a full XML document may also be considered a "document component". As
discussed below, the indexing and retrieval methods used in this research take into account a selected
set of document components for generating the statistics used in the search process and for
extraction of the parts of a document to be returned in response to a query. Because we are dealing
not only with full documents, but also with document components (which for some collections include
elements such as sections and paragraphs or similar structures) derived from the documents, we
will use C to represent document components in place of D. Therefore, the full equation describing
the LR algorithm used in these experiments is:</p>
        <p>log O(R | Q, C) = b_0
+ b_1 · (1/|Qc| · Σ_{j=1}^{|Qc|} log qtf_j)
+ b_2 · √|Q|
+ b_3 · (1/|Qc| · Σ_{j=1}^{|Qc|} log tf_j)
+ b_4 · √cl
+ b_5 · (1/|Qc| · Σ_{j=1}^{|Qc|} log ((N − nt_j)/nt_j))
+ b_6 · log |Qd|    (3)</p>
        <sec id="sec-2-1-1">
          <title>Where:</title>
          <p>Q is a query containing terms T,
|Q| is the total number of terms in Q,
|Qc| is the number of terms in Q that also occur in the document component,
tf_j is the frequency of the jth term in a specific document component,
qtf_j is the frequency of the jth term in Q,
nt_j is the number of components (of a given type) containing the jth term,
cl is the document component length measured in bytes.</p>
          <p>N is the number of components of a given type in the collection.
b_i are the coefficients obtained through the regression analysis.</p>
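          <p>The TREC3 formula of equation (3) can be sketched as a scoring function over these statistics. This is an illustrative reading of the equation, not the Cheshire II implementation; in particular, |Qd| is taken here to be the number of matching terms |Qc|, an assumption, since the text does not define it separately:</p>

```python
import math

# "Base" coefficients as reported in the text (signs restored)
B = (-3.70, 1.269, -0.310, 0.679, -0.021, 0.223, 4.01)

def trec3_log_odds(qtf, tf, nt, q_len, cl, N):
    """Equation (3) for one component.
    qtf, tf, nt: per-matching-term lists (query term frequency, component
    term frequency, number of components containing the term);
    q_len = |Q|, cl = component length in bytes,
    N = number of components of this type in the collection."""
    qc = len(tf)  # |Qc|: query terms that occur in the component
    if qc == 0:
        return B[0]
    mean = lambda xs: sum(xs) / qc
    return (B[0]
            + B[1] * mean([math.log(x) for x in qtf])
            + B[2] * math.sqrt(q_len)
            + B[3] * mean([math.log(x) for x in tf])
            + B[4] * math.sqrt(cl)
            + B[5] * mean([math.log((N - n) / n) for n in nt])
            + B[6] * math.log(qc))  # assumption: |Qd| taken as |Qc|
```
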
          <p>
            This equation, used in estimating the probability of relevance in this research, is essentially the
same as that used in [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] for TREC3. The b_i coefficients in the "Base" version of this algorithm were
estimated using relevance judgements and statistics from the TREC/TIPSTER test collection.
For GeoCLEF we used this Base version for our retrieval of all components, with the addition
of the component fusion methods described later. The coefficients for the Base version were
b_0 = −3.70, b_1 = 1.269, b_2 = −0.310, b_3 = 0.679, b_4 = −0.021, b_5 = 0.223 and b_6 = 4.01.
          </p>
          <p>
            The version of the Okapi BM-25 algorithm used in these experiments is based on the description
of the algorithm in Robertson [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] and in the TREC notebook proceedings [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. As with the LR
algorithm, we have adapted the Okapi BM-25 algorithm to deal with document components:
Σ_{j=1}^{|Qc|} w^(1) · ((k_1 + 1) tf_j)/(K + tf_j) · ((k_3 + 1) qtf_j)/(k_3 + qtf_j)    (4)
          </p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Where (in addition to the variables already defined):</title>
          <p>K is k_1 · ((1 − b) + b · dl/avcl),
k_1, b and k_3 are parameters (1.5, 0.45 and 500, respectively, were used),
avcl is the average component length measured in bytes, and
w^(1) is the Robertson-Sparck Jones weight:
w^(1) = log [ ((r + 0.5)/(R − r + 0.5)) / ((nt_j − r + 0.5)/(N − nt_j − R + r + 0.5)) ]
where r is the number of relevant components of a given type that contain a given term, and
R is the total number of relevant components of a given type for the query.</p>
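          <p>A sketch of the component-adapted BM-25 of equation (4), using the parameter values given above and the a priori weight (r = R = 0), under which w^(1) reduces to log((N − nt_j + 0.5)/(nt_j + 0.5)). This is an illustrative reading of the formula, not the system's code:</p>

```python
import math

K1, B_PARAM, K3 = 1.5, 0.45, 500.0  # parameter values reported in the text

def bm25_component_score(tf, qtf, nt, cl, avcl, N):
    """Equation (4) summed over the matching query terms.
    tf, qtf, nt are per-matching-term lists; cl is the component length,
    avcl the average component length, N the number of components."""
    K = K1 * ((1 - B_PARAM) + B_PARAM * cl / avcl)
    score = 0.0
    for tfj, qtfj, ntj in zip(tf, qtf, nt):
        # a priori Robertson-Sparck Jones weight (r = R = 0): IDF-like
        w1 = math.log((N - ntj + 0.5) / (ntj + 0.5))
        score += w1 * ((K1 + 1) * tfj / (K + tfj)) * ((K3 + 1) * qtfj / (K3 + qtfj))
    return score
```
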
          <p>
            Our current implementation uses only the a priori version (i.e., without relevance information)
of the Robertson-Sparck Jones weights, and therefore the w^(1) value is effectively just an IDF
weighting. The results of searches using our implementation of Okapi BM-25 and the LR algorithm
seemed sufficiently different to offer the kind of conditions where data fusion has been shown to
be most effective [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], and our overlap analysis of results for each algorithm (described in the
evaluation and discussion section) has confirmed this difference and the fit to the conditions for
effective fusion of results.
          </p>
          <p>The system used supports searches combining probabilistic and (strict) Boolean elements, as
well as operators to support various merging operations for both types of intermediate result sets.
However, in GeoCLEF we did not use this capability.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Result Combination Operators</title>
        <p>The Cheshire II system used in this evaluation provides a number of operators to combine the
intermediate results of a search from different components or indexes. With these operators we
have available an entire spectrum of combination methods, ranging from strict Boolean operations
to fuzzy Boolean and normalized score combinations for probabilistic and Boolean results. These
operators are the means available for performing fusion operations between the results for different
retrieval algorithms and the search results from different components of a document.
We will describe only two of these operators here, because they were the only ones used in the
GeoCLEF runs reported in this paper.</p>
        <p>
          The MERGE_CMBZ operator is based on the "CombMNZ" fusion algorithm developed by
Shaw and Fox [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and used by Lee [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. In our version we take the normalized scores, but then
further enhance scores for components appearing in both lists (doubling them) and penalize
normalized scores appearing low in a single result list, while using the unmodified normalized score
for higher-ranking items in a single list.
        </p>
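        <p>The MERGE_CMBZ behavior described above can be sketched as follows. The low-score cutoff and the penalty factor are illustrative assumptions, since the text does not give the exact values used:</p>

```python
def minmax(scores):
    """MINMAX-normalize a dict of scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    rng = (hi - lo) or 1.0
    return {k: (v - lo) / rng for k, v in scores.items()}

def merge_cmbz(a, b, low_cutoff=0.2, penalty=0.5):
    """Sketch of MERGE_CMBZ: items in both lists get doubled combined
    scores; items in only one list keep their normalized score unless it
    is low, in which case it is penalized. Cutoff/penalty are assumed."""
    na, nb = minmax(a), minmax(b)
    merged = {}
    for k in set(na) | set(nb):
        if k in na and k in nb:
            merged[k] = 2.0 * (na[k] + nb[k])  # boost items found by both
        else:
            s = na.get(k, nb.get(k))
            merged[k] = s if s >= low_cutoff else s * penalty
    return merged
```
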
        <p>The MERGE_PIVOT operator is used primarily to adjust the probability of relevance for one
search result based on matching elements in another search result. It was developed primarily to
adjust the probabilities of a search result consisting of sub-elements of a document (such as titles
or paragraphs) based on the probability obtained for the same search over the entire document.
It is basically a weighted combination of the probabilities based on a "DocPivot" fraction, such
that:
P_n = DocPivot · P_d + (1 − DocPivot) · P_s    (5)
where P_d represents the document-level probability of relevance, P_s represents the sub-element
probability, and P_n represents the resulting new probability. The "DocPivot" value used for all
of the runs submitted was 0.64. Since this was the first year for GeoCLEF, this value was derived
from experiments on 2004 data for other CLEF collections (which may have been inappropriate
for the GeoCLEF data, as further testing will reveal). The basic operator can be applied to
probabilistic results, non-probabilistic results, or both (in the latter case the scores are
normalized using MINMAX normalization to range between 0 and 1).</p>
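        <p>Equation (5) is straightforward to sketch; a minimal illustration, applying the pivoted combination to sub-element probabilities keyed by their parent document:</p>

```python
def merge_pivot(doc_probs, sub_probs, doc_pivot=0.64):
    """Equation (5): Pn = DocPivot * Pd + (1 - DocPivot) * Ps.
    doc_probs: document-level probabilities; sub_probs: sub-element
    probabilities keyed by the same document ids. 0.64 was the value
    used for all submitted runs."""
    return {k: doc_pivot * doc_probs.get(k, 0.0) + (1 - doc_pivot) * ps
            for k, ps in sub_probs.items()}
```
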
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approaches for GeoCLEF</title>
      <p>In this section we describe the specific approaches taken for our submitted runs for the GeoCLEF
task. First we describe the indexing and term extraction methods used, and then the search
features we used for the submitted runs.</p>
      <sec id="sec-3-1">
        <title>Indexing and Term Extraction</title>
        <p>For both the monolingual and bilingual tasks we indexed the documents using the Cheshire II
system. The document index entries and queries were stemmed using the Snowball stemmer,
and a new georeferencing indexing subsystem was used. This subsystem extracts proper nouns
from the text being indexed and attempts to match them in a digital gazetteer. For GeoCLEF we
used a gazetteer derived from the World Gazetteer (http://www.world-gazetteer.com) with 224,698
entries in both English and German. The indexing subsystem provides three different index types:
verified place names (an index of names which matched the gazetteer), point coordinates (latitude
and longitude coordinates of the verified place name) and bounding box coordinates (bounding
boxes for the matched places from the gazetteer). All three types were created, but due to time
constraints we only used the verified place names in our tests. Text indexes were also created for
separate XML elements (such as document titles or dates) as well as for the entire document. It
is worth noting that, although the names are compared against the gazetteer, it is quite common
for proper names of persons and places to be the same, and this leads to potential false associations
between articles mentioning persons with such names and particular places.</p>
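        <p>The verified-place-name idea can be sketched as follows. The mini-gazetteer and the single-token matching are simplifying assumptions; the real subsystem matched proper nouns (including multi-word names) against 224,698 English and German gazetteer entries:</p>

```python
# Hypothetical mini-gazetteer standing in for the World Gazetteer entries
GAZETTEER = {"Berlin", "Berkeley", "Paris", "London"}

def verified_place_names(text):
    """Treat capitalized tokens as candidate proper nouns and keep those
    that match the gazetteer. A real implementation needs multi-word
    names and disambiguation, since person and place names often
    coincide, which causes the false associations noted in the text."""
    return [t.strip(".,") for t in text.split()
            if t[:1].isupper() and t.strip(".,") in GAZETTEER]
```
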
        <p>Table 1. Cheshire II indexes for the GeoCLEF collection (name: description):
docno: Document ID
pauthor: Author Names
headline: Article Title
topic: Content Words
date: Date of Publication
geotext: Validated place names
geopoint: Validated coordinates for place names
geobox: Validated bounding boxes for place names</p>
        <p>Searching the GeoCLEF collection used Cheshire II scripts to parse the topics and submit the
title and description from the topics to one or more indexes. For monolingual search tasks we
used the topics in the appropriate language (English or German); for bilingual tasks the topics
were translated from the source language to the target language using three different machine
translation (MT) systems: the L&amp;H PC-based system, SYSTRAN (via Babelfish at Altavista),
and PROMT (also via their web interface). These translations were then combined into a
single probabilistic query. The hope was to overcome the translation errors of a single system by
including alternatives.</p>
        <p>We tried two main approaches for searching: the first used only the topic text from the title
and desc elements; the second included the spatialrelation and location elements as well. In all
cases the different indexes mentioned above were used, probabilistic searches were carried
out on each index, and the results were combined using the CombMNZ algorithm and by a weighted
combination of partial element and full document scores. For bilingual searching we used both the
Berkeley TREC3 and the Okapi BM-25 algorithms; for monolingual we used only TREC3. For one
submitted run in each task we did no query expansion and did not use the location elements in the
topics. For the other runs, each of the place names identified in the queries was expanded when
that place was the name of a region or country. For example, when running a search against the
English databases the name "Europe" was expanded to "Albania Andorra Austria Belarus
Belgium Bosnia and Herzegovina Bulgaria Croatia Cyprus Czech Republic Denmark Estonia Faroe
Islands Finland France Georgia Germany Gibraltar Greece Guernsey and Alderney Hungary
Iceland Ireland Isle of Man Italy Jersey Latvia Liechtenstein Lithuania Luxembourg Macedonia Malta
Moldova Monaco Netherlands Norway Poland Portugal Romania Russia San Marino Serbia and
Montenegro Slovakia Slovenia Spain Svalbard and Jan Mayen Sweden Switzerland Turkey Ukraine
United Kingdom Vatican City", while for searches against the German databases "Europa" was
expanded to "Albanien Andorra Österreich Weißrussland Belgien Bosnien und Herzegowina
Bulgarien Kroatien Zypern Tschechische Republik Dänemark Estland Färöer-Inseln Finnland Frankreich
Georgien Deutschland Gibraltar Griechenland Guernsey und Alderney Ungarn Island Irland Man
Italien Jersey Lettland Liechtenstein Litauen Luxemburg Mazedonien Malta Moldawien Monaco
Niederlande Norwegen Polen Portugal Rumänien Russland San Marino Serbien und Montenegro
Slowakei Slowenien Spanien Svalbard und Jan Mayen Schweden Schweiz Türkei Ukraine
Großbritannien Vatikan". Example queries for monolingual searches are shown in Figure 3.</p>
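        <p>The region-name expansion can be sketched as a simple lookup. The mapping shown is a truncated, hypothetical stand-in for the full country lists quoted above:</p>

```python
# Hypothetical, truncated region-to-country mapping; the actual expansion
# used the full country lists derived from the World Gazetteer.
REGION_EXPANSIONS = {
    "europe": ["Albania", "Andorra", "Austria", "Belarus", "Belgium"],
}

def expand_place_names(query):
    """Append member-country names for any region name found in the
    query, mirroring the expansion of 'Europe' shown in the text."""
    extra = []
    for word in query.lower().split():
        extra.extend(REGION_EXPANSIONS.get(word.strip("?.,"), []))
    return query if not extra else query + " " + " ".join(extra)
```
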
        <p>The indexes combined in searching included the headline, topic, and geotext indexes (as
described in Table 1) for searches that include the location element, and the headline and topic indexes for
the searches without the location element. For the bilingual tasks, three sub-queries, one for each
query translation, were run and then the results were merged using the CombMNZ algorithm. For
monolingual tasks the title and topic results were combined with each other using CombMNZ and
the final score combined with an expanded search for place names in the topic and geotext indexes.
Examples of the queries used are shown in Figures 3 and 4 in Appendix A; as close inspection
will show, there were some bugs in the scripts used to generate these queries, some of
which have been removed for this paper. These included things such as including "Kenya" in the
expansion for Europe, and including two copies of all expansion names when a single copy should
have been used. We intend (time permitting) to rerun a number of the queries to see if, and how,
these errors affected the results.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results for Submitted Runs</title>
      <p>The summary results (as Mean Average Precision) for the submitted bilingual and monolingual
runs for both English and German are shown in Table 2; the Recall-Precision curves for these runs
are also shown in Figures 1 (for monolingual) and 2 (for bilingual). In Figures 1 and 2 the names
are abbreviated to the final letters and numbers of the full names in Table 2, and those beginning
with "POST" are unofficial runs described in the next section.</p>
      <p>Table 2. Submitted runs (run name: task, use of location element/expansion):
BERK1BLDEENLOC01: Bilingual German→English, location: yes
BERK1BLDEENNOL01: Bilingual German→English, location: no
BERK1BLENDELOC01: Bilingual English→German, location: yes
BERK1BLENDENOL01: Bilingual English→German, location: no
BERK1MLDELOC02: Monolingual German, location: yes
BERK1MLDELOC03: Monolingual German, location: yes
BERK1MLDENOL01: Monolingual German, location: no
BERK1MLENLOC02: Monolingual English, location: yes
BERK1MLENLOC03: Monolingual English, location: yes
BERK1MLENNOL01: Monolingual English, location: no</p>
      <p>Table 2 indicates some rather curious results that warrant further investigation as to the cause.
Notice that the results for all of the English monolingual runs exceed the results for the bilingual German
to English runs; this is typical for cross-language retrieval. However, in the case of German this
expected pattern is reversed, and the German monolingual runs perform worse than either of the
bilingual English to German runs. We haven't yet determined exactly why this might be the case,
but there are a number of possible reasons (e.g., since a combination of Okapi and Logistic Regression
searches was used for the bilingual task, this may be an indication that Okapi is more effective for
German). Also, in the monolingual runs, both English and German, use of the location tag and
expansion of the query (runs numbered LOC02 and LOC03 respectively) did better than no use
of the location tag or expansion. For the bilingual runs the results are mixed, with German to
English runs showing an improvement with location use and expansion (LOC01) and English to
German showing the opposite.</p>
    </sec>
    <sec id="sec-5">
      <title>Additional Runs</title>
      <p>
        After the official submission we used the same version of the Logistic Regression algorithm as the
Berkeley2 group (the "TREC2" algorithm), which incorporates blind feedback (which is lacking
in the LR algorithm described above). The parameters used for blind feedback were 13 documents
and the top-ranked 16 terms from those documents added to the original query. We used essentially
an identical algorithm to that defined by Cooper, Gey and Chen in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The results from the
bilingual and monolingual runs for both English and German are shown in Table 3; the
Recall-Precision curves for these runs are also shown in Figures 1 (for monolingual) and 2 (for bilingual).
In Figures 1 and 2 the names are abbreviated to the final letters of the full names in Table 3, prefixed by
"POST". These are unofficial runs to test the difference in the algorithms in an identical runtime
environment.
      </p>
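      <p>The blind-feedback step (assume the top 13 documents are relevant, add 16 top-ranked terms to the query) can be sketched as follows. Ranking candidate terms by raw frequency here is a simplification; the actual TREC2 algorithm uses a relevance-weighted term ranking:</p>

```python
from collections import Counter

def blind_feedback_terms(initial_ranking, doc_terms, n_docs=13, n_terms=16):
    """Sketch of blind feedback: treat the top n_docs results of an
    initial search as relevant, pool their terms, and return the top
    n_terms to append to the original query before re-searching."""
    counts = Counter()
    for doc_id in initial_ranking[:n_docs]:
        counts.update(doc_terms[doc_id])
    return [t for t, _ in counts.most_common(n_terms)]
```
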
      <p>Table 3. Unofficial post-submission runs (run name: task, use of location element/expansion):
POSTBLDEENEXP: Bilingual German→English, location: yes
POSTBLDEENNOL: Bilingual German→English, location: no
POSTBLENDEEXP: Bilingual English→German, location: yes
POSTBLENDENOL: Bilingual English→German, location: no
POSTMLDELOC: Monolingual German, location: yes
POSTMLDENOL: Monolingual German, location: no
POSTMLENEXP: Monolingual English, location: yes
POSTMLENLOC: Monolingual English, location: yes
POSTMLENNOL: Monolingual English, location: no</p>
      <p>As can be seen by comparing Table 3 with Table 2, all of the comparable runs show
improvement in results with the TREC2 algorithm with blind feedback. We have compared notes
with the Berkeley2 group and, with minor differences to be expected given the different indexing
methods, stoplists, etc. used, these results are comparable to theirs.</p>
      <p>The queries submitted in these unofficial runs were much simpler than those used in the official
runs. For monolingual retrieval only the "topic" index was used and the geotext index was not used
at all; for the bilingual runs the same pattern of using multiple query translations and combining
the results was used as in our official runs. This may actually be detrimental to the performance,
since the expanded queries perform worse than the unexpanded queries: the opposite of the
behaviour observed in the official runs.</p>
      <p>The monolingual runs show similar behavior: using the topic titles
and descriptions along with the location tag provided the best results, but expanding the locations
as in the official runs (the English ML run ending in EXP) performed considerably worse than
the unexpanded runs.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>Analysis of these results is still ongoing. There are a number of, as yet, unexplained behaviors
in some of our results. We plan to continue working on the use of fusion, and hope to discover
effective ways to combine highly effective algorithms, such as the TREC2 algorithm, as well as to work
on adding the same blind feedback capability to the TREC3 Logistic Regression algorithm.</p>
      <p>One obvious conclusion that can be drawn is that basic TREC2 is a highly effective algorithm
for the GeoCLEF tasks, and the fusion approaches tried in these tests are most definitely NOT
very effective (in spite of their relatively good effectiveness in other retrieval tasks such as INEX).</p>
      <p>Another conclusion is that, in some cases, query expansion of region names to a list of names
of particular countries in that region is modestly effective (although we haven't yet been able to
test for statistical significance). In other cases, however, it can be quite detrimental. However, we
still need to determine if the problems with the expansion were due to the nature of the expansion
itself, or to errors in how it was done.</p>
      <p>Appendix A. Example Queries Submitted</p>
      <p>
search ((headline @ {vegetable exporters of europe what countries are
exporters of fresh, dried or frozen vegetables? })
!MERGE_CMBZ (topic @ {vegetable exporters of europe what countries
are exporters of fresh, dried or frozen vegetables? }))
!MERGE_PIVOT/64 (topic @ {vegetable exporters of europe what</p>
      <p>countries are exporters of fresh, dried or frozen vegetables? })
search ((headline @ {vegetable exporters of europe what countries are
exporters of fresh, dried or frozen vegetables? vegetable exporters europe }
!MERGE_CMBZ (topic @ {vegetable exporters of europe what countries are
exporters of fresh, dried or frozen vegetables? vegetable exporters europe})
!MERGE_CMBZ ((geotext @ {vegetable exporters of europe what countries are
exporters of fresh, dried or frozen vegetables? vegetable exporters europe })
!MERGE_CMBZ (topic @ { Albania Andorra Austria Belarus Belgium</p>
      <p>Bosnia and Herzegovina Bulgaria Croatia Cyprus Czech Republic Denmark
Estonia Faroe Islands Finland France Georgia Germany Gibraltar Greece
Guernsey and Alderney Hungary Iceland Ireland Isle of Man Italy Jersey
Latvia Liechtenstein Lithuania Luxembourg Macedonia Malta Moldova Monaco
Netherlands Norway Poland Portugal Romania Russia San Marino
Serbia and Montenegro Slovakia Slovenia Spain Svalbard and Jan Mayen
Sweden Switzerland Turkey Ukraine United Kingdom Vatican City }))
!MERGE_PIVOT/64 (topic @ {vegetable exporters of europe what countries are
exporters of fresh, dried or frozen vegetables? vegetable exporters europe })
PART QUERY1: search (topic @+ { shark attacks against australia and california
the documents reports over attacks of sharks on people.})
!MERGE_CMBZ (topic @ { shark attacks against australia and california the
documents reports over attacks of sharks on people.}) RESULTSETID SET1
PART QUERY2: search (topic @+ { shark fish attacks before australia and
california the documents report?r attacks of shark fish on humans.})
!MERGE_CMBZ (topic @ { shark fish attacks before australia and california
the documents report?r attacks of shark fish on humans.}) RESULTSETID SET2
PART QUERY3: search (topic @+ { shark fish attacks before australia and
california the documents report about attacks about shark fishing on person.})
!MERGE_CMBZ (topic @ { shark fish attacks before australia and california
the documents report about attacks about shark fishing on person.})
RESULTSETID SET3
FINAL QUERY: search SET1: !MERGE_CMBZ SET2: !MERGE_CMBZ SET3: RESULTSETID SET4
PART QUERY1: search (topic @+ { shark attacks against australia and california
the documents reports over attacks of sharks on people. shark attacks
australia : california})
!MERGE_CMBZ (topic @ { shark attacks against australia and california the
documents reports over attacks of sharks on people. shark attacks australia
: california})
!MERGE_CMBZ (topic @ { australien californien australien</p>
      <p>californien }) RESULTSETID SET1
PART QUERY2: search (topic @+ { shark fish attacks before australia and
california the documents report?r attacks of shark fish on humans.
shark fish attacks australia : california})
!MERGE_CMBZ (topic @ { shark fish attacks before australia and california the
documents report?r attacks of shark fish on humans. shark fish attacks
australia : california})
!MERGE_CMBZ (topic @ {australien californien australien californien})</p>
      <p>RESULTSETID SET2
PART QUERY3: search (topic @+ {shark fish attacks before australia and
california the documents report about attacks about shark fishing on person.
shark fish attacks australia : california})
!MERGE_CMBZ (topic @ { shark fish attacks before australia and california the
documents report about attacks about shark fishing on person. shark fish
attacks australia : california})
!MERGE_CMBZ (topic @ {australien californien australien californien})</p>
      <p>RESULTSETID SET3
FINAL QUERY: search SET1: !MERGE_CMBZ SET2: !MERGE_CMBZ SET3: RESULTSETID SET4</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Aitao</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Cross-language retrieval experiments at CLEF 2002</article-title>
          . pages
          <fpage>28</fpage>
          –
          <fpage>48</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>William</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Cooper</surname>
          </string-name>
          , Aitao Chen, and
          <string-name>
            <surname>Fredric</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gey</surname>
          </string-name>
          .
          <article-title>Experiments in the probabilistic retrieval of full text documents</article-title>
          . In Donna K. Harman, editor,
          <source>Overview of the Third Text Retrieval Conference (TREC-3): (NIST Special Publication</source>
          <volume>500</volume>
          -225), Gaithersburg,
          <string-name>
            <surname>MD</surname>
          </string-name>
          ,
          <year>1994</year>
          . National Institute of Standards and Technology.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>William S.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fredric C.</given-names>
            <surname>Gey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Daniel P.</given-names>
            <surname>Dabney</surname>
          </string-name>
          .
          <article-title>Probabilistic retrieval based on staged logistic regression</article-title>
          .
          <source>In 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Copenhagen, Denmark, June 21-24, pages
          <fpage>198</fpage>
          –
          <lpage>210</lpage>
          , New York,
          <year>1992</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Ray R.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>TREC interactive with Cheshire II</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>37</volume>
          :
          <fpage>485</fpage>
          –
          <lpage>505</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Ray R.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>A logistic regression approach to distributed IR</article-title>
          .
          <source>In SIGIR 2002: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 11-15, 2002</source>
          , Tampere, Finland, pages
          <fpage>399</fpage>
          –
          <lpage>400</lpage>
          . ACM,
          <year>2002</year>
          .
        </mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Ray R.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Cheshire II at INEX: Using a hybrid logistic regression and boolean model for XML retrieval</article-title>
          .
          <source>In Proceedings of the First Annual Workshop of the Initiative for the Evaluation of XML retrieval (INEX)</source>
          , pages
          <fpage>18</fpage>
          –
          <lpage>25</lpage>
          . DELOS workshop series,
          <year>2003</year>
          .
        </mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Ray R.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Cheshire II at INEX 03: Component and algorithm fusion for XML retrieval</article-title>
          .
          <source>In INEX 2003 Workshop Proceedings</source>
          , pages
          <fpage>38</fpage>
          –
          <lpage>45</lpage>
          . University of Duisburg,
          <year>2004</year>
          .
        </mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ray R.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>A fusion approach to XML structured document retrieval</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>8</volume>
          :
          <fpage>601</fpage>
          –
          <lpage>629</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Joon Ho</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Analyses of multiple evidence combination</article-title>
          .
          <source>In SIGIR '97: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 27-31, 1997</source>
          , Philadelphia, pages
          <fpage>267</fpage>
          –
          <lpage>276</lpage>
          . ACM,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Walker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Micheline M.</given-names>
            <surname>Hancock-Beaulieu</surname>
          </string-name>
          .
          <article-title>Okapi at TREC-7: ad hoc, filtering, VLC and interactive track</article-title>
          .
          <source>In Text REtrieval Conference (TREC-7), Nov. 9-11, 1998 (Notebook)</source>
          , pages
          <fpage>152</fpage>
          –
          <lpage>164</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Walker</surname>
          </string-name>
          .
          <article-title>On relevance weights with little relevance information</article-title>
          .
          <source>In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>16</fpage>
          –
          <lpage>24</lpage>
          . ACM Press,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Joseph A.</given-names>
            <surname>Shaw</surname>
          </string-name>
          and
          <string-name>
            <given-names>Edward A.</given-names>
            <surname>Fox</surname>
          </string-name>
          .
          <article-title>Combination of multiple searches</article-title>
          .
          <source>In Proceedings of the 2nd Text REtrieval Conference (TREC-2), National Institute of Standards and Technology Special Publication 500-215</source>
          , pages
          <fpage>243</fpage>
          –
          <lpage>252</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>