<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Experiments in Classification Clustering and Thesaurus Expansion for Domain Specific Cross-Language Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ray R. Larson</string-name>
          <email>ray@sims.berkeley.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Information</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of California</institution>
          ,
          <addr-line>Berkeley</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we describe Berkeley's approach to the Domain Specific (DS) track for CLEF 2007. This year we are using forms of the Entry Vocabulary Index and Thesaurus expansion approaches used by Berkeley in 2005 [10]. Despite the basic similarity of approach, we are using quite different implementations with different characteristics. We are not, however, using the tools for de-compounding German that were developed over the past several years and used very successfully in earlier Berkeley entries in this track. All of the runs submitted were performed using the Cheshire II system. This year Berkeley submitted a total of 24 runs, including one for each subtask of the DS track: 6 Monolingual runs for English, German, and Russian; 12 Bilingual runs (4 X2EN, 4 X2DE, and 4 X2RU); and 6 Multilingual runs (2 EN, 2 DE, and 2 RU). Since the overall results were not available at the time this paper was due, we do not know how these results fared compared to other participants, so the discussion in this paper focuses on comparisons between our own runs.</p>
      </abstract>
      <kwd-group>
        <kwd>Cheshire II</kwd>
        <kwd>Logistic Regression</kwd>
        <kwd>Entry Vocabulary Indexes</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper discusses the retrieval methods and evaluation results for Berkeley’s participation in
the CLEF 2007 Domain Specific track. Last year for this track we used a baseline approach using
text retrieval methods only [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] without query expansion or use of the Thesaurus. This year we have
focused instead on query expansion using Entry Vocabulary Indexes (EVIs) [
        <xref ref-type="bibr" rid="ref10 ref4">4, 10</xref>
        ], and thesaurus
lookup of topic terms. We continue to use probabilistic IR methods based on logistic regression.
      </p>
      <p>
        All of the submitted runs for this year’s Domain Specific track used the Cheshire II system
for indexing and retrieval. The “Classification Clustering” feature of the system was used to
generate the EVIs used in query expansion. The original approach for using Classification Clustering
in searching was described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Although the method has experienced considerable
changes in implementation, the basic approach is still the same: topic-rich elements extracted
from individual records in the database (such as titles, classification codes, or subject headings)
are merged based on a normalized version of a particular organizing element (usually the
classification or subject headings), and each such classification cluster is treated as a single “document”
containing the combined topic-rich elements of all the individual documents that have the same
values of the organizing element. The EVI creation and search approach taken for this research is
described below in Section 3.3.
      </p>
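      <p>As a rough illustration (a minimal sketch, not the actual Cheshire II implementation; the record layout and field names are hypothetical), the clustering step amounts to grouping the topic-rich fields of all records under a normalized organizing element:

```python
from collections import defaultdict


def normalize(term):
    """Normalize an organizing element: here, lowercase and collapse whitespace."""
    return " ".join(term.lower().split())


def build_clusters(records, organizing_field="subject",
                   topic_fields=("title", "subject")):
    """Merge the topic-rich fields of every record sharing a normalized
    organizing element into one pseudo-document per cluster."""
    clusters = defaultdict(list)
    for rec in records:
        for value in rec.get(organizing_field, []):
            key = normalize(value)
            for field in topic_fields:
                clusters[key].extend(rec.get(field, []))
    return dict(clusters)


records = [
    {"title": ["Infant mortality in Europe"], "subject": ["Infant Mortality"]},
    {"title": ["Child health survey"], "subject": ["infant mortality"]},
]
clusters = build_clusters(records)
# Both records share the organizing element "infant mortality", so their
# titles and headings are merged into a single cluster pseudo-document.
```

Each cluster can then be indexed and searched exactly as if it were an ordinary document.</p>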
      <p>
        This paper first very briefly describes the probabilistic retrieval methods used, including our
blind feedback method for text, which are discussed in greater detail in our ImageCLEF notebook
paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We then describe our submissions for the various DS sub-tasks and the results obtained.
Finally we present conclusions and discussion of future approaches to this track.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Retrieval Algorithms</title>
      <p>
        As we have discussed in our other papers for the ImageCLEF and GeoCLEF tracks in this volume,
the basic form and variables of the Logistic Regression (LR) algorithm used for all of our submissions
were originally developed by Cooper, et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Formally, the goal of the logistic
regression method is to define a regression model that will estimate (given a set of training data),
for a particular query Q and a particular document D in a collection, the value P (R | Q, D), that
is, the probability of relevance for that Q and D. This value is then used to rank the documents
in the collection, which are presented to the user in order of decreasing values of that probability.
To avoid invalid probability values, the usual calculation of P (R | Q, D) uses the “log odds” of
relevance given a set of S statistics, si, derived from the query and database, giving a regression
formula for estimating the log odds from those statistics:
log O(R | Q, D) = b0 + Σi=1..S bi si   (1)
where b0 is the intercept term and the bi are the coefficients obtained from the regression analysis
of a sample set of queries, a collection and relevance judgements. The final ranking is determined
by the conversion of the log odds form to probabilities:
P (R | Q, D) = e^log O(R | Q, D) / (1 + e^log O(R | Q, D))   (2)
      </p>
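      <p>The ranking step described above can be sketched as follows (the statistics and coefficient values here are illustrative placeholders, not the fitted values):

```python
import math


def log_odds(stats, b0, b):
    """Estimate log O(R | Q, D) = b0 + sum over i of b_i * s_i."""
    return b0 + sum(bi * si for bi, si in zip(b, stats))


def probability_of_relevance(stats, b0, b):
    """Convert log odds back to P(R | Q, D) = e^L / (1 + e^L)."""
    L = log_odds(stats, b0, b)
    return math.exp(L) / (1.0 + math.exp(L))


# Documents are presented in order of decreasing probability of relevance.
docs = {"d1": [0.2, 1.1], "d2": [0.9, 0.3]}   # per-document statistics s_i
b0, b = -0.5, [1.0, 0.8]                      # illustrative coefficients
ranked = sorted(docs, key=lambda d: probability_of_relevance(docs[d], b0, b),
                reverse=True)
```

Since the logistic transform is monotonic, ranking by the log odds directly yields the same ordering.</p>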
      <sec id="sec-2-1">
        <title>TREC2 Logistic Regression Algorithm</title>
        <p>
          For all of our Domain Specific submissions this year we used a version of the Logistic Regression
(LR) algorithm that has been used very successfully in Cross-Language IR by Berkeley researchers
for a number of years[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and which is also used in our GeoCLEF and ImageCLEF submissions.
For the Domain Specific track we used the Cheshire II information retrieval system
implementation of this algorithm. One of the current limitations of this implementation is the lack of
decompounding for German documents and query terms in the current system. As noted in our
other CLEF notebook papers, the Logistic Regression algorithm used was originally developed by
Cooper et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] for text retrieval from the TREC collections for TREC2. The basic formula is:
        </p>
        <p>log O(R|C, Q) = log [ p(R|C, Q) / (1 − p(R|C, Q)) ]
= c0 + c1 · (1/√|Qc|) Σi=1..|Qc| qtfi/(ql + 35)
+ c2 · (1/√|Qc|) Σi=1..|Qc| log( tfi/(cl + 80) )
− c3 · (1/√|Qc|) Σi=1..|Qc| log( ctfi/Nt )
+ c4 · |Qc|
where C denotes a document component (i.e., an indexed part of a document which may be the
entire document) and Q a query, R is a relevance variable,
p(R|C, Q) is the probability that document component C is relevant to query Q,
p(¬R|C, Q) is the probability that document component C is not relevant to query Q, which is
1.0 − p(R|C, Q),
|Qc| is the number of matching terms between a document component and a query,
qtfi is the within-query frequency of the ith matching term,
tfi is the within-document frequency of the ith matching term,
ctfi is the occurrence frequency in the collection of the ith matching term,
ql is query length (i.e., number of terms in a query, like |Q| for non-feedback situations),
cl is component length (i.e., number of terms in a component), and
Nt is collection length (i.e., number of terms in the test collection).
The ck are the k coefficients obtained through the regression analysis.</p>
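        <p>A direct transcription of the formula above reads as follows (a sketch only: the coefficient values ck used here are placeholders, not the fitted Cheshire II coefficients):

```python
import math


def trec2_log_odds(matches, ql, cl, Nt, c):
    """TREC2 log odds of relevance for one document component.

    matches: (qtf, tf, ctf) triples for the |Qc| matching terms;
    ql, cl, Nt: query length, component length, and collection length;
    c: coefficients [c0, c1, c2, c3, c4] from the regression analysis.
    """
    n = len(matches)                # |Qc|, the number of matching terms
    if n == 0:
        return c[0]
    root = 1.0 / math.sqrt(n)
    s1 = root * sum(qtf / (ql + 35.0) for qtf, tf, ctf in matches)
    s2 = root * sum(math.log(tf / (cl + 80.0)) for qtf, tf, ctf in matches)
    s3 = root * sum(math.log(ctf / Nt) for qtf, tf, ctf in matches)
    return c[0] + c[1] * s1 + c[2] * s2 - c[3] * s3 + c[4] * n
```

The second sum rewards terms that are frequent within the component, while the subtracted third sum penalizes terms that are common across the collection as a whole.</p>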
        <p>
          More details of this algorithm and the coefficients used with it may be found in our ImageCLEF
notebook paper where the same algorithm and coefficients were used. In addition to this primary
algorithm we used a version that performs “blind feedback” during the retrieval process. The
method used is described in detail in our ImageCLEF notebook paper[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Our blind feedback
approach uses the 10 top-ranked documents from an initial retrieval using the LR algorithm above,
and selects the top 10 terms from the content of those documents, using a version of the Robertson
and Sparck Jones probabilistic term relevance weights [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Those ten terms are merged with the
original query and new term frequency weights are calculated, and the revised query is submitted
to obtain the final ranking.
        </p>
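        <p>The feedback loop can be sketched as follows (the weighting is a simplified form of the Robertson and Sparck Jones relevance weight, and all function and variable names are hypothetical):

```python
import math
from collections import Counter


def rsj_weight(r, R, n, N):
    """Simplified Robertson/Sparck Jones term relevance weight: r of the R
    pseudo-relevant documents contain the term, n of the N documents overall
    do; the 0.5 terms are the usual smoothing."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((R - r + 0.5) * (n - r + 0.5)))


def blind_feedback(query_terms, initial_ranking, doc_terms, doc_freq, N,
                   top_docs=10, top_terms=10):
    """Expand a query with the best terms from the top-ranked documents
    of an initial retrieval run."""
    pseudo_relevant = initial_ranking[:top_docs]
    R = len(pseudo_relevant)
    counts = Counter()
    for d in pseudo_relevant:
        counts.update(set(doc_terms[d]))
    scored = sorted(counts,
                    key=lambda t: rsj_weight(counts[t], R, doc_freq[t], N),
                    reverse=True)
    return list(query_terms) + scored[:top_terms]


doc_terms = {"d1": ["mortality", "infant"], "d2": ["mortality", "europe"],
             "d3": ["football"], "d4": ["weather"]}
doc_freq = {"mortality": 2, "infant": 1, "europe": 1, "football": 1, "weather": 1}
expanded = blind_feedback(["rates"], ["d1", "d2", "d3", "d4"],
                          doc_terms, doc_freq, N=4, top_docs=2, top_terms=3)
```

The expanded term list is then re-weighted and resubmitted as a single query to produce the final ranking.</p>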
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approaches for Domain Specific Retrieval</title>
      <p>In this section we describe the specific approaches taken for our submitted runs for the Domain
Specific track. First we describe the database creation and the indexing and term extraction
methods used, and then the search features we used for the submitted runs.
For the purposes of this research we combined the GIRT German/English thesaurus with
the English and Russian mappings for the CSASA and ISISS databases to produce a
multilingual thesaurus containing elements from each of the original sources, as well as transliterations
and capitalized forms, with all data converted to UTF-8 encoding (this conversion was also performed
on the databases themselves before indexing). An example entry from this thesaurus is shown below:
&lt;entry&gt;
&lt;german&gt;Absatz&lt;/german&gt;
&lt;german-caps&gt;ABSATZ&lt;/german-caps&gt;
&lt;scope-note-de&gt;nicht im Sinne von Vertrieb&lt;/scope-note-de&gt;
&lt;english-translation&gt;sale&lt;/english-translation&gt;
&lt;german_utf8&gt;Absatz&lt;/german_utf8&gt;
&lt;russian&gt;
sbyt
&lt;/russian&gt;
&lt;translit&gt;sbyt &lt;/translit&gt;
&lt;mapping&gt;
&lt;original-term&gt;Absatz&lt;/original-term&gt;
&lt;mapped-term&gt;Sales&lt;/mapped-term&gt;
&lt;/mapping&gt;
&lt;mapping&gt;
&lt;original-term&gt;sale&lt;/original-term&gt;
&lt;mapped-term&gt;Sales&lt;/mapped-term&gt;
&lt;/mapping&gt;
&lt;/entry&gt;</p>
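      <p>For illustration, extracting the expansion terms from such an entry might look like this (a sketch using Python's standard ElementTree; the element names follow the example above, and the entry is built programmatically here only to keep the sketch self-contained):

```python
import xml.etree.ElementTree as ET


def make_mapping(original, mapped):
    """Build one mapping element as in the thesaurus example."""
    m = ET.Element("mapping")
    ET.SubElement(m, "original-term").text = original
    ET.SubElement(m, "mapped-term").text = mapped
    return m


entry = ET.Element("entry")
ET.SubElement(entry, "german").text = "Absatz"
ET.SubElement(entry, "english-translation").text = "sale"
entry.append(make_mapping("Absatz", "Sales"))
entry.append(make_mapping("sale", "Sales"))


def expansion_terms(entry_elem):
    """Collect the distinct mapped terms of an entry for query expansion."""
    return sorted({m.findtext("mapped-term") for m in entry_elem.iter("mapping")})


terms = expansion_terms(entry)
```

Here both the German term and its English translation map to the same controlled term, so a single expansion term results.</p>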
      <p>Note that the spacing around the Russian Cyrillic term was inserted in the paper formatting
process and was not in the original data.</p>
      <p>Because not all of the terms had mappings or equivalent Russian terms, those parts are not
present for all of the thesaurus entries.</p>
      <sec id="sec-3-1">
        <title>Indexing and Term Extraction</title>
        <p>Although the Cheshire II system uses the XML structure of documents and extracts selected
portions of the record for indexing and retrieval, for the submitted runs this year we used only a
single one of these indexes that contains the entire content of the document.</p>
        <p>
          Table 1 lists the indexes created for the Domain Specific database and the document elements
from which the contents of those indexes were extracted. The “Used” column in Table 1 indicates
whether or not a particular index was used in the submitted Domain Specific runs. This year we
used the Entry Vocabulary Indexes (search term recommenders) that were used in somewhat
different form by Berkeley in previous years (see [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]). Without overall data on the track performance
this year, it is difficult to say whether this approach improved upon, or degraded, the text-retrieval
baseline we established last year. Given the changes in the collections used (the addition of the
CSASA English collection and elimination of the Russian SocioNet data), it is not possible to
directly compare MAP or other evaluation measures across years. The implementation of the
Classification Cluster-based EVIs will be discussed in the next section.
        </p>
        <p>For all indexing we used language-specific stoplists to exclude function words and very common
words from the indexing and searching. The German language runs, however, did not use
decompounding in the indexing and querying processes to generate simple word forms from compounds
(actually we tried, but there was a bug that failed to match any compounds in our runs). This
is another aspect of our indexing for this year’s Domain Specific task that reduced our results
relative to last year.
</p>
      </sec>
      <sec id="sec-3-2">
        <title>Entry Vocabulary Indexes</title>
        <p>
          As noted above, the earliest versions of Entry Vocabulary Indexes were developed to facilitate
automatic classification of library catalog records, and were first used in searching in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Those used a
simple frequency-based probabilistic model in searching, but a primary feature was that the
“Classification clusters” were treated as documents and the terms associated with top-ranked clusters
were combined with the original query, in a method similar to “blind feedback”, to provide an
enhanced second stage of search.
        </p>
        <p>
          Our later work with EVIs used a maximum likelihood weighting for each term (word or phrase)
in each classification. This was the approach described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and used for Cross-language Domain Specific
retrieval for CLEF 2005. One limitation of that approach is that the EVI can produce
maximum likelihood estimates for only a single term at a time, and alternative approaches needed
to be explored for combining terms (see [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for the various approaches).
        </p>
        <p>Although the method has experienced considerable changes in implementation, the basic
approach for “Classification Clustering” in Cheshire II is still the same. Various topic-rich elements
are extracted from individual records in the database (such as titles, classification codes, or
subject headings) and are merged into single records based on a normalized version of a particular
organizing element (usually the classification or subject headings, e.g., one record is created for
each unique classification or subject heading). Each of these classification clusters is treated as
a single “document” containing the combined topic-rich elements of all the individual documents
that have the same values of the organizing element. In place of the simpler probabilistic model
used in the early research, we use the same logistic regression based algorithm that is used for
text retrieval. In effect, we just search the “Classification Clusters” as if they were documents
using the TREC2 algorithm with blind feedback described above, then take some number of the
top-ranked terms and use those to expand the query for submission to the normal document
collection. Testing with the 2006 data showed that just taking the single top-ranked term performed
better than using multiple terms for this approach, so only the single top-ranked recommended
term was used in the experiments reported here.</p>
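        <p>In outline, the expansion stage works as follows (search_clusters and search_documents stand in for the Cheshire II TREC2-with-blind-feedback searches and are hypothetical):

```python
def evi_expand_and_search(query_terms, search_clusters, search_documents,
                          n_expansion_terms=1):
    """Two-stage EVI search: rank the classification clusters against the
    query, take the organizing term(s) of the top-ranked cluster(s), and
    re-search the document collection with the expanded query.  Testing on
    the 2006 data favored a single expansion term."""
    ranked_clusters = search_clusters(query_terms)            # stage 1
    expansion = [term for term, score in ranked_clusters[:n_expansion_terms]]
    return search_documents(list(query_terms) + expansion)    # stage 2


def search_clusters(terms):
    # Stand-in: a fixed ranking of (organizing term, score) pairs.
    return [("infant mortality", 0.92), ("demography", 0.55)]


def search_documents(terms):
    # Stand-in: just echo the expanded query it would search with.
    return terms


expanded_query = evi_expand_and_search(["mortality", "rates"],
                                       search_clusters, search_documents)
```

The organizing term of the best-matching cluster thus acts as a controlled-vocabulary recommendation appended to the free-text query.</p>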
        <p>Two separate EVIs were built for the databases in each target language. The first used the
contents of the “CONTROLLED-TERM-??” (or “KEYWORD” for Russian) fields as the
organizing element. The second EVI used the contents of the “CLASSIFICATION-??” fields. Both of
these EVIs were used in query expansion. One problem was that some records included multiple
controlled terms in a single field instead of as separate fields. This was particularly common for the
Russian “KEYWORD” terms. For this year we just ignored this problem rather than attempting
to fix it, but we will be examining the effects in our analysis of the results.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Search Processing</title>
        <p>Searching the Domain Specific collection used Cheshire II scripts to parse the topics and submit
the title and description elements from the topics to the “topic” index containing all terms from
the documents. For the monolingual search tasks we used the topics in the appropriate language
(English, German, or Russian), and for bilingual tasks the topics were translated from the source
language to the target language using the LEC Power Translator PC-based program. Our original
testing of LEC Power Translator seemed to show good translations between any of the languages
needed for the track, but we intend to do some further testing to compare to previous approaches
(which used web-based translation tools like Babelfish and PROMT). We suspect that, as always,
different tools provide a more accurate representation of different topics for some languages, but
the LEC Power Translator seemed to do pretty good (and often better) translations for all of the
needed languages.</p>
        <p>Because all of our submitted runs this year used some form of query expansion, each required
a 2-phase search process. The first phase involved a search in the EVI or the merged thesaurus,
and the second phase combined some of the results of first phase search with the original query
and used the expanded query to search the collections in the target language.
For the monolingual and bilingual EVI searches (all those indicated in Table 2 with “EVI” in the
“Exp.” or expansion column) the first search phase used all terms included in the “title” and
“desc” fields of the topics (or the translated version of these fields). These terms were searched
using the TREC2 algorithm with blind feedback to obtain a ranked result of classification clusters
from the EVIs. The main or “organizing term” phrases for the top-ranked two clusters from the
results for the “CONTROLLED-TERM” EVI, and the single top-ranked result phrase for the
“CLASSIFICATION” EVI were extracted for use in the second phase.</p>
        <p>For example, Topic #190 was searched using “mortality rate : find information on mortality
rates in individual european countries” and the two EVIs yielded the following terms: “child
mortality : infant mortality : demography and human biology; demography (population studies)”.</p>
        <p>
          For the second phase search the original query was searched using the initial title+desc from
the topic using the “topic” index, and the expansion terms were searched in the “subject” index;
these searches were merged using a weighted sum for items in both lists that is based on the
“Pivot” method described by Mass and Mandelbrod[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] to combine the results of different document
components. In our case the probability of relevance for a component is a weighted combination of
the initial estimate probability of relevance for the subject search and the probability of relevance
for the entire document. Formally this is:
        </p>
        <p>P (R | Q, Cnew) = (X ∗ P (R | Q, Csubj)) + ((1 − X) ∗ P (R | Q, Cdoc))   (3)</p>
        <p>Where X is a “pivot value” between 0 and 1, and P (R | Q, Cnew), P (R | Q, Csubj) and
P (R | Q, Cdoc) are the new weight, the original subject search weight, and the document weight for
a given query. We found that a pivot value of 0.15 was most effective for CLEF 2006 data when
combining EVI and search queries.
The basic steps for searches using thesaurus lookup are the same as for EVIs, but the search
structure is different. For the first phase search the topic title is searched among the
language-appropriate main terms of the thesaurus, and the description is searched among all terms in the
thesaurus entry. These intermediate results are combined using the pivot merge method described
above with a pivot weight of 0.55. The top two results are used, and both the language-appropriate
main term and the appropriate mapping terms are used for the query expansion. In the second
phase the full topic title and desc fields are searched as topics, and the thesaurus terms are also
searched as topics. These searches are combined using the pivot merge with a pivot weight of 0.07.</p>
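        <p>Equation 3 amounts to the following merge over the two result lists (a minimal sketch; 0.15 is the pivot value reported above for combining EVI and document searches):

```python
def pivot_merge(subject_scores, doc_scores, pivot=0.15):
    """Weighted combination of subject-search and whole-document relevance
    probabilities: P_new = pivot * P_subj + (1 - pivot) * P_doc.  Items
    missing from one list contribute 0 for that component."""
    merged = {}
    for doc in set(subject_scores) | set(doc_scores):
        p_subj = subject_scores.get(doc, 0.0)
        p_doc = doc_scores.get(doc, 0.0)
        merged[doc] = pivot * p_subj + (1.0 - pivot) * p_doc
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

With a small pivot such as 0.15, the whole-document probability dominates and the subject (expansion-term) search acts as a gentle boost.</p>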
        <p>For topic #190 the first part of the query (i.e., the topic title and desc terms) is the same as
for the EVI searches, but the second part of the search uses the terms yielded by the thesaurus
search: “mortality : Infant mortality” (only a single thesaurus entry was retrieved in the search).</p>
        <p>For multilingual searches, we combined the various translations of the topic title and desc fields
produced by the LEC Power Translator for each source language and searched those combined
translations in each target language. The results for each language were merged based on the
MINMAX normalized score for each resultset. Within each language the same approaches were
used as for EVI and Thesaurus-based expansion of bilingual and monolingual searches.
</p>
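      <p>The cross-language merge can be sketched as follows (a minimal sketch; how ties across languages are broken is an assumption, as the run descriptions do not specify it):

```python
def minmax_merge(resultsets):
    """Min-max normalize each per-language result list of (doc, score)
    pairs to the range [0, 1] via (s - min) / (max - min), then merge
    all lists into one ranking by normalized score."""
    merged = []
    for results in resultsets:
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0        # guard against constant scores
        merged.extend((doc, (s - lo) / span) for doc, s in results)
    return sorted(merged, key=lambda kv: kv[1], reverse=True)
```

Normalizing each result set first keeps one language's raw score scale from dominating the merged ranking.</p>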
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results for Submitted Runs</title>
      <p>The summary results (as Mean Average Precision) for all of our submitted runs for English,
German and Russian are shown in Table 2, the Recall-Precision curves for these runs are also
shown in Figure 1 (for monolingual), Figure 2 (for bilingual) and Figure 3 (for multilingual). In
Figures 1, 2, and 3 the names are abbreviated to the letters and numbers of the full name in Table 2
describing the languages and query expansion approach used. For example, in Figure 2 DEEN-CC
corresponds to run Berk B DEEN CC p15 in Table 2.</p>
      <p>Since summary information on the scores for all submissions was not available at the time
this paper was written, we have no idea of how our results stack up against other approaches for
the same data. We can, however, compare the results for EVIs versus Thesaurus lookup.</p>
      <p>Since our experiments were conducted using the same topics, database, translation tools, and
basic combination approaches for both EVIs and Thesaurus-based expansion, we were hoping
to find a clear benefit for one approach versus the other. Unfortunately, the results are not at
all clear. While EVIs seem to give the best results when English is the target language, the opposite
is true for German and Russian targets. As always our multilingual results are significantly
lower than monolingual or bilingual results for a given source language, with the exception of
German⇒Russian, which is the lowest MAP of any of the runs.</p>
      <p>As the precision/recall graphs show (Figures 1, 2, and 3), there is very little difference in
the curves for the EVI and Thesaurus-based expansion in a given source/target language set.
Although we did not run significance testing of the results, we suspect that there is no statistically
significant difference between these runs.</p>
      <p>It is worth noting that the approaches used in our submitted runs provided the best results
when testing with 2006 data and topics. However, as we discovered after the 2007 qrels were made
available, some simpler approaches worked as well or better than the more complex methods
described above. For example, a simplified version of English monolingual search using only the
topic title and desc fields, and searching each of those in the topic and subject indexes, and
merging the results using a pivot value of 0.15 obtained a MAP result of 0.2848, compared to
the 0.2814 obtained in our best submitted monolingual run. Further simplification to individual
index searches does not, however, provide results approaching those of the pivot-merged results.
We suspect that the range of MAP scores for the track is different from previous years, or else our
results are much worse than we thought they would be with the 2007 databases and topics.
</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>We cannot say, overall, how effective query expansion by EVI or Thesaurus is relative to other
approaches for this task. We can assume that there is very little difference in the effectiveness of
the two methods, and that both seem to perform better than simple single-index “bag of words”
searches of the collection contents.</p>
      <p>We plan to conduct further runs to test whether modifications and simplifications, as well
as combinations, of the EVI and Thesaurus-based approaches can provide improved
performance for the Domain Specific tasks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Aitao</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Fredric C.</given-names>
            <surname>Gey</surname>
          </string-name>
          .
          <article-title>Multilingual information retrieval using machine translation, relevance feedback and decompounding</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>7</volume>
          :
          <fpage>149</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Gey</surname>
          </string-name>
          .
          <article-title>Full Text Retrieval based on Probabilistic Equations with Coefficients fitted by Logistic Regression</article-title>
          .
          <source>In Text REtrieval Conference (TREC-2)</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>William</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <surname>Fredric C. Gey</surname>
          </string-name>
          , and Daniel P. Dabney.
          <article-title>Probabilistic retrieval based on staged logistic regression</article-title>
          .
          <source>In 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Copenhagen, Denmark, June 21-24, pages
          <fpage>198</fpage>
          -
          <lpage>210</lpage>
          , New York,
          <year>1992</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Fredric</given-names>
            <surname>Gey</surname>
          </string-name>
          , Michael Buckland, Aitao Chen, and
          <string-name>
            <given-names>Ray</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Entry vocabulary - a technology to enhance digital search</article-title>
          .
          <source>In Proceedings of HLT2001, First International Conference on Human Language Technology</source>
          , San Diego, pages
          <fpage>91</fpage>
          -
          <lpage>95</lpage>
          ,
          <year>March 2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ray</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Classification clustering, probabilistic information retrieval, and the online catalog</article-title>
          .
          <source>Library Quarterly</source>
          ,
          <volume>61</volume>
          (
          <issue>2</issue>
          ):
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Ray</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Evaluation of advanced retrieval techniques in an experimental online catalog</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          ,
          <volume>43</volume>
          (
          <issue>1</issue>
          ):
          <fpage>34</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ray</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Domain specific retrieval: Back to basics. In Evaluation of Multilingual and Multi-modal Information Retrieval - Seventh Workshop of the Cross-Language Evaluation Forum</article-title>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2006</year>
          , LNCS, page to appear, Alicante, Spain,
          <year>September 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Ray</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Linked relevance feedback for the imageclef photo task</article-title>
          .
          <source>In CLEF 2007 - Notebook Papers</source>
          , page to appear, Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Yosi</given-names>
            <surname>Mass</surname>
          </string-name>
          and
          <string-name>
            <given-names>Matan</given-names>
            <surname>Mandelbrod</surname>
          </string-name>
          .
          <article-title>Component ranking and automatic query refinement for xml retrieval</article-title>
          .
          <source>In Advances in XML Information Retrieval: Third International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX2004</source>
          , pages
          <fpage>73</fpage>
          -
          <lpage>84</lpage>
          . Springer (LNCS #3493),
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Vivien</surname>
            <given-names>Petras</given-names>
          </string-name>
          , Fredric Gey, and
          <string-name>
            <given-names>Ray</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Domain-specific CLIR of english, german and russian using fusion and subject metadata for query expansion</article-title>
          .
          <source>In Cross-Language Evaluation Forum: CLEF</source>
          <year>2005</year>
          , pages
          <fpage>226</fpage>
          -
          <lpage>237</lpage>
          .
          <source>Springer (Lecture Notes in Computer Science LNCS 4022)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>K. Sparck</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>Relevance weighting of search terms</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>146</lpage>
          , May-June
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>