<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>REINA at the WebCLEF Task: Combining evidences and Link Analysis</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Carlos G. Figuerola, José L. Alonso Berrocal, Ángel F. Zazo Rodríguez, Emilio Rodríguez; REINA Research Group, University of Salamanca</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The participation of the REINA Research Group in WebCLEF 2005 focused on the monolingual mixed task. Queries or topics are of two types: named pages and home pages. For both, we first perform a search by thematic content; for the same query, we search several elements of information in every page (title, some meta tags, text of backlinks) and then combine the results. For queries about home pages, we try to detect them with a method based on certain keywords and their patterns of use. Afterwards, the results of the thematic content retrieval are re-ranked, based on Page-Rank and Centrality coefficients.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Web Search</kwd>
        <kwd>Link Analysis</kwd>
        <kwd>Search Fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Our participation in WebCLEF 2005 focused on the monolingual (Spanish) mixed task. This
task has two goals: to find named web pages and home web pages. Every query has only one right
answer; both kinds of queries are mixed, and we do not know in advance which kind each query is.</p>
      <p>In principle, the basic approach consists of finding the pages whose content is most similar to
each query; the hope is that the valid answer is among the first retrieved pages, and how good
the ranking is depends on the techniques applied in this search.</p>
      <p>For the queries searching for a home page, we apply a procedure that rearranges the
list of retrieved documents, considering, in addition to each document's similarity to the query,
several pieces of evidence that it may be a home page. An additional problem is that we do not know
a priori which queries or topics look for home pages and which do not, so we have to include a
procedure that analyzes the queries and determines which ones seek a home page.</p>
      <p>This paper is organized as follows: section 2 describes the part of the document collection
that we worked with. Section 3 describes our approach to the task; next, we present the runs
submitted and their results; finally, conclusions are given.</p>
    </sec>
    <sec id="sec-2">
      <title>The collection of documents</title>
      <p>Our participation this year is limited to the .es domain of the EuroGov collection. This domain has
35,168 documents; not all of these are HTML pages, and it is not always easy to identify the format
of every document. This year, all the topics refer to HTML pages; the organizers provide a
blacklist of 4,365 documents (in the .es domain) which are not HTML.</p>
      <p>Nevertheless, there are documents in other formats that are not included in the blacklist. Of the 35,168
documents in the .es domain, 8,642 do not contain the &lt;HTML&gt; tag.</p>
      <p>In addition, documents seem to be truncated at a size close to 64 K; in binary files, as is
the case of some PDFs, chr(0) characters seem to have been replaced by a space (chr(32)).</p>
      <sec id="sec-2-1">
        <title>Topics</title>
        <p>There are 118 topics in Spanish, 59 searching for home pages and 59 for named pages. The concept
of home page, however, is somewhat fuzzy; the classification of some of the searched pages as home
pages is quite debatable.</p>
        <p>In addition, there are some mistakes in the topic set. Some topics are duplicated, or
even triplicated, in some cases with a different correct page as the answer in the qrels file. Some topics
are formulated too broadly. For example, topic WC0098: Consejería de Educación y Cultura; there
are 17 Autonomous Communities in Spain, and every one of them has a Council of Education
and Culture. Besides, we have found that many embassies also have a Consejería de Educación y
Cultura, and there are a lot of embassies. Which of all these is the right answer?</p>
        <p>A few topics have as their correct answer a page which is not in the .es domain. This may be
right; but, since we work only in the .es domain, we cannot find the correct page anyway.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Our approach</title>
      <p>As we said before, the basic idea is to find the pages most similar to every query and, for the
home page queries, to rearrange the list of retrieved documents, boosting those more likely to be home
pages. This also requires us to analyze the queries to determine their type.</p>
      <p>The first part, finding the pages most similar to every query, can be solved by a classic information
retrieval approach. Nevertheless, web pages have informative elements other than the plain text
that we can see in the browser window. We can use these elements to improve
retrieval.</p>
      <sec id="sec-3-1">
        <title>Combining elements</title>
        <p>The list of elements we could take into account in web pages is extensive, but we focused
on:
          <list list-type="bullet">
            <list-item><p>the body field, which seems the most important</p></list-item>
            <list-item><p>the title field</p></list-item>
            <list-item><p>the contents of some META tags, namely Description and Keywords</p></list-item>
            <list-item><p>the text of the backlinks, that is, the links in other documents that point to the
page we are analyzing</p></list-item>
          </list>
        </p>
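        <p>As a rough illustration of how these four kinds of evidence might be pulled out of a page, the following Python sketch uses only the standard library. It is not the code used for the experiments: the class name FieldExtractor, the sample page and the example.es URL are hypothetical, and a real collection would need more robust handling of malformed HTML.</p>
        <preformat><![CDATA[
```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects the title, META description/keywords and outgoing anchor texts."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}      # e.g. {"description": "...", "keywords": "..."}
        self.links = []     # list of (href, anchor text) pairs
        self._in_title = False
        self._href = None   # href of the <a> element currently being read
        self._anchor = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._anchor).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._anchor.append(data.strip())


# Hypothetical sample page, made up for this example.
page = """<html><head><title>Ministerio de Cultura</title>
<meta name="Keywords" content="cultura; museos nacionales">
</head><body><a href="http://www.example.es/museos/">Museos</a></body></html>"""

extractor = FieldExtractor()
extractor.feed(page)
# Each anchor text would be indexed under the *target* page, as backlink text.
backlink_text = {href: text for href, text in extractor.links}
```
]]></preformat>
        <p>In this sketch the anchor text is keyed by the target URL, so that, after processing the whole collection, each page accumulates the texts of the links that point to it.</p>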
        <p>All these elements are pieces of evidence that we can combine to find the pages most similar to every query.
There are several ways to do the fusion, or combination, of these elements; a first issue is whether to do the
fusion before or after running the query.</p>
        <p>
          Our choice is to do it afterwards; the procedure that we applied is as follows:
          <list list-type="order">
            <list-item><p>build an index for each of the elements that we take into account</p></list-item>
            <list-item><p>run the query against each of these indexes</p></list-item>
            <list-item><p>combine the results obtained from each index</p></list-item>
          </list>
        </p>
        <p>
          For the first step, we used our software Karpanta [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], based on the well-known vector space
model, and we built indexes of: body, title, meta description, meta keywords and text of backlinks.
Term weights are computed in a classic way, using the tf-idf scheme known as atc. In all cases stop
words (from a standard list of about 300 Spanish words) were removed, and an enhanced s-stemmer
was applied [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
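        <p>For readers unfamiliar with the notation, atc in the SMART naming convention usually stands for augmented term frequency (a), inverse document frequency (t) and cosine normalization (c). The following Python sketch shows that weighting for a single document; the function name atc_weights and the toy collection statistics are assumptions made for illustration, and Karpanta's actual implementation may differ in details such as the logarithm base.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def atc_weights(doc_terms, doc_freq, n_docs):
    """Return {term: weight} for one document under the SMART 'atc' scheme."""
    if not doc_terms:
        return {}
    tf = Counter(doc_terms)
    max_tf = max(tf.values())
    raw = {}
    for term, freq in tf.items():
        augmented_tf = 0.5 + 0.5 * freq / max_tf     # 'a': augmented tf
        idf = math.log(n_docs / doc_freq[term])       # 't': inverse document frequency
        raw[term] = augmented_tf * idf
    norm = math.sqrt(sum(w * w for w in raw.values()))  # 'c': cosine normalization
    if norm == 0.0:
        return {term: 0.0 for term in raw}
    return {term: w / norm for term, w in raw.items()}


# Toy collection statistics, made up for the example.
doc_freq = {"ministerio": 2, "cultura": 1, "de": 4}
weights = atc_weights(["ministerio", "de", "cultura", "de"], doc_freq, n_docs=4)
```
]]></preformat>
        <p>In the toy data the term "de" occurs in every document, so its idf, and therefore its weight, is zero; in the actual runs such words were already removed as stop words.</p>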
        <p>The size of the indexes differs, as do the fields on which the indexes are based. Almost
all HTML pages have a body field (although some contain only JavaScript and the like), but the same
is not true of the other fields. Thus, 71.5 % of the pages in the .es domain have a title field, and
the average size of the titles is about 40 characters; that is, titles are, in general, very
short.</p>
        <p>The META Description tag is present in only 16.9 % of the documents, with an average
size of 38.6 characters. In 7.4 % of the documents that have a META Description tag, its
content is identical to the title field.</p>
        <p>Keywords (META Keywords tag) are present in 24.7 % of the documents, with
7.7 keywords per document on average (a keyword is not a term, but any expression delimited
by a semicolon inside the tag; so, there are keywords which are multiword expressions).</p>
        <p>24.7 % of the documents do not receive any link (from the documents in the collection);
documents with backlinks receive an average of 9 per document. The text of these backlinks is very short
(18.7 characters on average) but, perhaps, very significant.</p>
        <p>So it seems clear that, except for the body field, the other elements have limited
importance, as they are absent from many documents.</p>
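        <p>
          For the fusion of the lists produced by the retrieval on each index, a z-score normalization
of the similarity values [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] was performed and then the lists were merged with the CombMNZ
algorithm [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], adapted to weight the results obtained with each index differently:
          <disp-formula><tex-math>\mathit{Score} = \left( \sum_{i=1}^{n} \mathit{score}_i \, k_i \right) \times \bigl| \{\, i : \mathit{score}_i \neq 0 \,\} \bigr|</tex-math></disp-formula>
          where n is the number of indexes, score_i is the normalized score of the document in index i, k_i is the weight given to that index, and the second factor is the number of scores different from 0.
        </p>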
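        <p>The fusion step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the runs: the function names, the sample scores and the weights are hypothetical, and "score different from 0" is interpreted here as "the document was retrieved by that index", because a z-score can legitimately be zero or negative.</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Map {doc: similarity} to z-scores; a constant list maps to all zeros."""
    values = list(scores.values())
    if not values:
        return {}
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - mu) / sigma for doc, s in scores.items()}


def weighted_comb_mnz(result_lists, weights):
    """result_lists: {index name: {doc: similarity}}; weights: {index name: k_i}."""
    normalized = {name: z_normalize(scores) for name, scores in result_lists.items()}
    docs = set()
    for scores in normalized.values():
        docs.update(scores)
    fused = {}
    for doc in docs:
        # Indexes that retrieved this document (assumed meaning of "score != 0").
        hits = [name for name, scores in normalized.items() if doc in scores]
        weighted_sum = sum(weights[name] * normalized[name][doc] for name in hits)
        fused[doc] = weighted_sum * len(hits)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up similarity values for two of the indexes.
results = {
    "body": {"d1": 0.9, "d2": 0.5, "d3": 0.1},
    "title": {"d1": 0.8, "d4": 0.2},
}
weights = {"body": 1.0, "title": 0.5}  # illustrative weights, not those of the runs
ranking = weighted_comb_mnz(results, weights)
```
]]></preformat>
        <p>In this toy case d1 is retrieved by both indexes with above-average scores, so the multiplication by the number of hits pushes it clearly to the top of the fused list.</p>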
        <p>
          There are several combination procedures [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Most of them are based on
combining the similarity values obtained after running the query on each of the indexes; nevertheless, we can
also work with the rank positions in the lists of documents retrieved from each index [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. This
approach has the advantage of simplicity, since it is not even necessary to normalize the similarity
values.
        </p>
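        <p>A rank-based combination can be sketched as follows. The mapping from rank to score used here, 1 - (rank - 1) / n for a list of n retrieved documents, is one simple choice and is an assumption of this example, as are the function names and the sample lists; it only illustrates why no normalization of similarity values is needed.</p>
        <preformat><![CDATA[
```python
def rank_scores(ranked_docs):
    """Map a ranked list of doc ids to scores in (0, 1]: 1 - (rank - 1) / n."""
    n = len(ranked_docs)
    return {doc: 1.0 - position / n for position, doc in enumerate(ranked_docs)}


def fuse_by_rank(ranked_lists):
    """Sum the rank-derived scores of each document over all lists."""
    fused = {}
    for ranked_docs in ranked_lists:
        for doc, score in rank_scores(ranked_docs).items():
            fused[doc] = fused.get(doc, 0.0) + score
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Made-up ranked lists from two indexes.
body_list = ["d1", "d2", "d3"]
title_list = ["d2", "d4"]
fused_ranking = fuse_by_rank([body_list, title_list])
```
]]></preformat>
        <p>Only the order of each list enters the computation, so lists whose similarity values lie on very different scales can be merged directly.</p>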
      </sec>
      <sec id="sec-3-2">
        <title>To find home pages</title>
        <p>First we must determine which queries are about home pages. The concept of home page,
however, is fuzzy; not everybody would consider some of the correct answers to some queries
to be home pages.</p>
        <p>In an exploratory phase, we manually examined several home pages from the .es domain; in particular,
we examined the title field, since we think that a query searching for such a page is probably
quite similar to its title. Besides, we examined the home page queries used in
TREC. They are in English but, once translated into Spanish, they approximate the structure
and characteristics of this kind of query.</p>
        <p>
          In this exploratory phase we observed some common elements in the structure of home
page queries. This structure relies on the use of certain terms related to the searched home
page. Pages of this kind are entry pages to the websites of certain institutions: ministries,
institutes, centers, etc. So, these terms will be present in the query [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>Besides, they appear in certain positions inside the query, and they are accompanied,
before and after, by certain auxiliary words (articles and other connectors). This allowed us to
build a set of home page query patterns, to which we added a simple heuristic: the presence of
expressions such as home page, portal, etc.</p>
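        <p>The kind of pattern matching described here might look like the following Python sketch. The actual patterns are not listed in this paper, so the institution terms, connectors and cue expressions below are hypothetical examples chosen only to show the mechanism.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical institution terms, connectors and cue expressions.
INSTITUTION_TERMS = r"(?:ministerio|instituto|centro|consejería|ayuntamiento)"
CONNECTORS = r"(?:de|del|de la|de los|de las)"
PATTERNS = [
    # optional article + institution term + connector + at least one more word
    re.compile(
        rf"^(?:(?:el|la|los|las)\s+)?{INSTITUTION_TERMS}\s+{CONNECTORS}\s+\w+",
        re.IGNORECASE,
    ),
    # explicit cue expressions anywhere in the query
    re.compile(r"\b(?:home\s*page|portal|página\s+principal)\b", re.IGNORECASE),
]


def looks_like_home_page_query(query):
    """Return True if the query matches any of the illustrative patterns."""
    text = query.strip()
    return any(pattern.search(text) for pattern in PATTERNS)
```
]]></preformat>
        <p>For example, looks_like_home_page_query("Ministerio de la Presidencia") matches the first pattern, while a query such as "normativa sobre becas 2005" matches neither and would be left as a named page query.</p>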
        <p>With this technique we were able to correctly identify 32 home page queries; 4 queries were erroneously
considered home page queries, and 27 could not be classified.</p>
        <p>Once the presumed home page queries were identified in this way, the results of a retrieval made
with the fusion of evidence described above were re-ranked so that the relevant
pages most likely to be home pages appeared in the first positions.</p>
        <p>There are several techniques to determine which retrieved pages can be home pages. These
techniques are not mutually exclusive and can be combined. The best-known techniques are based on
two types of information: the structure of the page URL, and link analysis.</p>
        <p>
          Techniques based on URL structure work with the URL depth. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] studied the statistical
distribution of home pages across several URL depth levels, as did [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] also use techniques based
on the URL length, as does [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
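        <p>To make the notion of URL depth concrete, the following Python sketch counts path segments with the standard urllib.parse module. It illustrates the cited URL-based techniques, which were not part of our runs; the function name url_depth, the decision to ignore a trailing index or default file, and the example.es URLs are assumptions of this example.</p>
        <preformat><![CDATA[
```python
from urllib.parse import urlparse


def url_depth(url):
    """Number of non-empty path segments, ignoring a trailing index/default file."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and segments[-1].lower().split(".")[0] in ("index", "default"):
        segments.pop()
    return len(segments)


# Hypothetical candidate pages; shallower URLs are more likely to be entry pages.
candidates = [
    "http://www.example.es/cultura/museos/lista.html",
    "http://www.example.es/index.html",
    "http://www.example.es/cultura/",
]
by_depth = sorted(candidates, key=url_depth)
```
]]></preformat>
        <p>A depth of 0 corresponds to a site root, which is where most entry pages are expected; URL length can be obtained in the same spirit with len(url).</p>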
        <p>
          Techniques based on link analysis are also widely used. Although considered less useful
for content-based searches, they seem effective for retrieving home pages [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Several coefficients
are used, from the simple in- and out-degrees [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], to the more sophisticated Page-Rank [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] or HITS [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
algorithms.
        </p>
        <p>
          We have tried Page-Rank [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and Centrality [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], both based on backlinks.
        </p>
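        <p>Both coefficients can be computed from the link graph of the collection. The Python sketch below is a simplified illustration under several assumptions: Centrality is approximated by the plain in-degree (number of distinct backlinks), Page-Rank is computed by a basic power iteration with a damping factor of 0.85, and the re-ranking policy (reordering only the first top_k retrieved pages by the coefficient) is hypothetical, since the exact re-ranking formula is not detailed here. All names and the toy graph are made up.</p>
        <preformat><![CDATA[
```python
def in_degree(links):
    """links: {page: [pages it links to]}; returns {page: number of backlinks}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    counts = {page: 0 for page in pages}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:       # ignore self-links
                counts[target] += 1
    return counts


def page_rank(links, damping=0.85, iterations=50):
    """Plain power iteration; dangling pages spread their rank uniformly."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


def rerank_top(retrieved, coefficient, top_k=10):
    """Reorder the first top_k retrieved pages by a link coefficient (assumed policy)."""
    head = sorted(retrieved[:top_k], key=lambda page: -coefficient.get(page, 0.0))
    return head + retrieved[top_k:]


# Toy link graph: every other page links to "home".
graph = {
    "home": ["a", "b"],
    "a": ["home"],
    "b": ["home", "a"],
    "c": ["home"],
}
backlinks = in_degree(graph)
ranks = page_rank(graph)
reranked = rerank_top(["a", "c", "home", "b"], backlinks, top_k=3)
```
]]></preformat>
        <p>In the toy graph the page called home receives links from all the others, so it obtains both the highest in-degree and the highest Page-Rank and moves to the first position after re-ranking.</p>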
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Runs submitted</title>
      <p>Our goal is to determine which elements or pieces of evidence are useful in a content-based search,
and also to test the effectiveness of coefficients based on link analysis for finding home pages.</p>
      <p>Official results are given in table 2. Run USAL0 acts as the baseline; it consists of queries in
Spanish against the pages of the .es domain of the EuroGov collection. In this run, we work with the
body field only.</p>
      <p>Run USAL1 combines the results of the fields body, title, META Description and text of backlinks
of every page.</p>
      <p>Run USAL2 adds the META Keywords field to USAL1. Runs USAL3 and USAL4
apply specific techniques to find home pages. Starting from the documents retrieved in run USAL1, we
try to detect the home page topics, and the results for those topics are then re-ranked with Page-Rank
(USAL3) and Centrality (USAL4).</p>
      <sec id="sec-4-1">
        <title>Evaluation</title>
        <p>It seems clear that working with more elements, in addition to the body field, improves retrieval.
This is true in the case of title, META Description and the text of the backlinks. However,
including META Keywords makes the results worse. This may be surprising (some simplistic
retrieval systems are based only on this field) but, if we examine how pages use this field,
we see that the usage is, at the least, strange. Table 4 shows the most used keyword expressions
(not individual terms) in the .es domain.</p>
        <p>Most of them are very generic expressions, of little use for searches on a
governmental collection. Some are included in pages that are also translated into English; some are
given directly in English, with no Spanish version (although the language of the rest of the page is
Spanish).</p>
        <p>A manual examination of some pages of the collection shows that there are pages (especially
home pages of certain institutions) with, literally, hundreds of keywords. In some cases, these
lists of keywords are inherited without variation by the rest of the pages of that site. Probably
this has something to do with certain myths that circulate about the way in which search engines
find and rank pages. Some pages repeat the same keyword many times, in the hope that search
engines will place them in the first positions of the list.</p>
        <p>As for locating home pages, it seems that using patterns to distinguish home page
queries and treating them specifically works, since runs USAL3 and USAL4 improve on the
previous ones. Of these two, Centrality produces better results for detecting home pages. Centrality
is simpler and does not discriminate among backlinks, but it seems that home pages are not necessarily
the most prestigious pages.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>We have described our participation in WebCLEF 2005, based on content retrieval by
means of the fusion or combination of different elements, as well as on the use of coefficients
derived from link analysis for locating home pages.</p>
      <p>The use of elements of information such as the TITLE or the text of backlinks clearly improves
retrieval, even though many pages lack a TITLE or backlinks, and even though the texts of many
backlinks are very short. However, keywords introduced by the authors of the pages are of
little help and do not produce good results.</p>
      <p>Coefficients based on link analysis, such as Page-Rank or the simple Centrality coefficient,
help to locate home pages.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Bartell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Cottrell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Belew</surname>
          </string-name>
          .
          <article-title>Automatic combination of multiple ranked retrieval systems</article-title>
          .
          <source>In Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval</source>
          , Dublin, Ireland, 3–6 July 1994 (Special Issue of the SIGIR Forum). ACM/Springer-Verlag,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Steve</given-names>
            <surname>Beitzel</surname>
          </string-name>
          , Eric Jensen, Rebecca Cathey, Ling Ma, David Grossman,
          <string-name>
            <given-names>Ophir</given-names>
            <surname>Frieder</surname>
          </string-name>
          , Abdur Chowdury, Greg Pass, and
          <string-name>
            <given-names>Herman</given-names>
            <surname>Vandermolen</surname>
          </string-name>
          .
          <article-title>Task classification and document structure for known-item search</article-title>
          .
          <source>In TREC12 [16].</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Brin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Page</surname>
          </string-name>
          .
          <article-title>The anatomy of a large-scale hypertextual Web search engine</article-title>
          .
          <source>Computer Networks and ISDN Systems</source>
          ,
          <volume>30</volume>
          (
          <issue>1–7</issue>
          ):
          <fpage>107</fpage>
          –
          <lpage>117</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Mohamed</given-names>
            <surname>Farah</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Vanderpooten</surname>
          </string-name>
          .
          <article-title>Novel approaches in text information retrieval. Experiments in the Web track of TREC-2004</article-title>
          . In TREC13 [17].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Figuerola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Zazo</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L. Alonso</given-names>
            <surname>Berrocal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          .
          <article-title>Karpanta: Un motor de búsqueda para la investigación experimental en recuperación de la información</article-title>
          .
          <source>In IBERSID</source>
          <year>2003</year>
          , Zaragoza, Spain,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Carlos G.</given-names>
            <surname>Figuerola</surname>
          </string-name>
          , Ángel F. Zazo, Emilio Rodríguez Vázquez de Aldana, and José Luis Alonso Berrocal.
          <article-title>La recuperación de información en español y la normalización de términos</article-title>
          .
          <source>Revista Iberoamericana de Inteligencia Artificial</source>
          ,
          <volume>8</volume>
          (
          <issue>22</issue>
          ):
          <fpage>135</fpage>
          –
          <lpage>145</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Fox</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Shaw</surname>
          </string-name>
          .
          <article-title>Combination of multiple searches</article-title>
          .
          <source>In Overview of the Third Text REtrieval Conference (TREC-3)</source>
          , pages
          <fpage>243</fpage>
          –
          <lpage>252</lpage>
          . NIST Special Publication 500-226
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>David</given-names>
            <surname>Hawking</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nick</given-names>
            <surname>Craswell</surname>
          </string-name>
          .
          <article-title>Very large scale retrieval and web search</article-title>
          .
          In Ellen Voorhees and Donna Harman, editors,
          <source>TREC: Experiment and Evaluation in Information Retrieval</source>
          . MIT Press,
          <year>2005</year>
          . http://es.csiro.au/pubs/trecbook for website.pdf
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jon M.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          , Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and
          <string-name>
            <given-names>Andrew S.</given-names>
            <surname>Tomkins</surname>
          </string-name>
          .
          <article-title>The web as a graph: measurements, models, and methods</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          ,
          <volume>1627</volume>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kraaij</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Westerveld</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          .
          <article-title>The importance of prior probabilities for entry page search</article-title>
          .
          In
          <source>25th Annual International ACM SIGIR Conference</source>
          , pages
          <fpage>27</fpage>
          –
          <lpage>34</lpage>
          . Association for Computing Machinery,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Joon Ho</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Combining multiple evidence from different relevance feedback methods</article-title>
          .
          <source>Technical report, Center for Intelligent Information Retrieval (CIIR)</source>
          , Department of Computer Science, University of Massachusetts,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Joon Ho</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Analyses of multiple evidence combination</article-title>
          .
          <source>In SIGIR '97: Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>267</fpage>
          –
          <lpage>276</lpage>
          , New York, NY, USA,
          <year>1997</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          , I. Ounis,
          <string-name>
            <surname>C. J. van Rijsbergen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Cacheda</surname>
          </string-name>
          .
          <article-title>University of Glasgow at the Web track: Dynamic application of hyperlink analysis using the query scope</article-title>
          .
          <source>In TREC12 [16], page 646.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Thompson</surname>
          </string-name>
          .
          <article-title>A combination of expert opinion approach to probabilistic information retrieval, part 1: The conceptual model</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>26</volume>
          (
          <issue>3</issue>
          ):
          <fpage>371</fpage>
          –
          <lpage>382</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Tomlinson</surname>
          </string-name>
          .
          <article-title>Robust, Web and Terabyte retrieval with Hummingbird SearchServer at TREC 2004</article-title>
          . In TREC13 [17].
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <source>The Twelfth Text REtrieval Conference (TREC 2003)</source>
          , Gaithersburg, Maryland, 2003. NIST Special Publication 500-255,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <source>The Thirteenth Text REtrieval Conference (TREC 2004)</source>
          , Gaithersburg, Maryland (USA). NIST Special Publication 500-261,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Albertson</surname>
          </string-name>
          .
          <article-title>WIDIT in TREC 2004 Genomics, HARD, Robust and Web tracks</article-title>
          .
          <source>In TREC13 [17].</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          , Nick Craswell, Michael Taylor, Suchi Saria, and
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Robertson</surname>
          </string-name>
          .
          <article-title>Microsoft Cambridge at TREC-13: Web and HARD tracks</article-title>
          .
          <source>In TREC13 [17].</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>