<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation of a Recursive Weighting Scheme for Federated Web Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emanuele Di Buccio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivano Masiero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massimo Melucci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The informative resources available on the Web are not always directly accessible and therefore cannot be crawled, since access is permitted only through appropriate services, e.g. specialized search engines. On the other hand, specialized search engines can help address the problem of heterogeneity of the informative resources due to the type of content, the structure or the media. Federated Web Search systems address the problem of searching multiple, heterogeneous, and possibly uncooperative collections. One issue of Federated Web Search is resource selection, i.e. the selection of the search engines that are most likely to provide documents relevant to the query. This paper reports on the experimental evaluation, in a Federated Web Search setting, of a recursive weighting scheme for ranking informative resources in architectures that involve an arbitrary number of resource levels.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The informative resources available on the Web are not always directly
accessible, and therefore cannot be crawled, because access is permitted only through
appropriate services, e.g. specialized search engines. Access to
informative resources is only one of the issues that motivate the adoption of
diverse search engines. For instance, specialized search engines
can help to address the problem of heterogeneity among the informative resources.
Informative resources, hereafter referred to as documents, can be
heterogeneous because of the type of content (e.g. medical documents), the structure
(e.g. patents), or the media (e.g. video and image).</p>
      <p>Federated Web Search [12] is concerned with the problem of searching multiple,
heterogeneous, and possibly uncooperative collections. In a Federated Web Search
setting, the broker is a system that selects the most promising search
engines to which the query should be forwarded, on the basis of a description of
the diverse collections handled by the engines; the selected engines return their
most promising results, which are then merged by the broker into a final ranked
list. Forms of federated search are vertical search and peer-to-peer search. In the
first case, the objective is to select the most promising verticals, e.g. web pages,
images or videos, and then merge the results from those verticals into a single
result page. In the second case, retrieval is performed in a Peer-To-Peer (P2P)
network, where each participating node can act both as client and as server:
from an Information Retrieval (IR) perspective, it can both submit a query to the
other participating nodes and act as a search server, providing the most relevant
documents in its local collection in response to a given query. In P2P search,
besides the documents in the peer collections, the peers themselves should be ranked in order
to select the most promising ones to be contacted, thus avoiding flooding, which
can be infeasible for large networks.</p>
      <p>This paper focuses on the problem of resource selection, i.e. on ranking
resources at higher levels, e.g. search engines, verticals or peers. The problem is
addressed by adopting Term Weighted Frequency (TWF) · Inverse Resource
Frequency (IRF). This weighting framework was originally introduced in [3] to
address the problem of resource selection in Hybrid Hierarchical P2P Networks.
In this kind of network there are two types of nodes, i.e. peers and super-peers.
A peer has to update and transfer to the super-peers the data structures that summarize
its own document collection. A query is sent from a peer
to the super-peers and is then routed from a super-peer to the other
super-peers on the basis of the summaries stored in each super-peer. While all the
peers are involved when routing the query in an unstructured network, only
the super-peers are involved in routing in a hybrid unstructured network. When
routing the query, a super-peer ranks both the other super-peers and the peers
by expected recall.</p>
      <p>TWF·IRF addresses the problem of ranking informative resources in
architectures with an arbitrary number of resource levels. This paper reports on the
experimental investigation of the effectiveness of this scheme in Federated Web
Search settings. The evaluation was carried out through participation in
the Federated Web Search Track of the Twenty-Second Text REtrieval
Conference (TREC) (FedWeb13). In the FedWeb13 setting there are three resource
levels: (i) documents, (ii) search engines, and (iii) sets of search engines. In
particular, there is a single set of 157 search engines and the objective of the resource
selection task is to rank them according to (their predictive capability for) a
given query. Even if search engines use features that might not be available in
peer-to-peer settings,<sup>1</sup> participation in the track has given insights into the
effectiveness of TWF·IRF in ranking peers in a group when considering a
completely uncooperative environment. Indeed, a FedWeb13 broker performs the
same task as a super-peer does when forwarding queries to the most effective peers
in its group. Moreover, the summaries stored in the broker are the results of
query-based sampling performed on the considered search engines, since the
indexes of the distinct search engines cannot be accessed; for this reason they are
considered "uncooperative".</p>
      <p><sup>1</sup> We consider the case where each participating peer provides search functionalities
to access its local collection, e.g. part of the personal documents of a user.</p>
    </sec>
    <sec id="sec-2">
      <title>A Recursive Weighting Scheme</title>
      <p>In line with the literature on Distributed IR, reported for example in [2, 4], the
specific approach adopted in this work is to describe the informative resources at
the diverse levels (documents, search engines) in terms of document descriptors,
e.g. terms. Therefore, a search engine is described as a set of document
descriptors, specifically the distinct descriptors appearing in the documents stored in
it.</p>
      <p>
        The innovative contribution of our approach has been the computation of the
weights. In order to support the description of TWF·IRF, let us consider an
architecture with three resource levels, e.g. that depicted in Figure 1. Examples
of resource levels are (1) documents, (2) peers, and (3) super-peers in Hybrid
P2P networks, or (1) documents, (2) search engines, and (3) sets of search engines
in a Federated Web Search setting. For instance, the search engines adopted in the
FedWeb13 test collection have been categorized by the track organizers according
to a set of categories that includes news, books, academic, and travel.
      </p>
      <p>
        In our approach the weight of a descriptor t in a resource i at level z is
        <disp-formula><label>(1)</label><tex-math>w^{(z)}_{i,t} = \mathrm{twf}^{(z)}_{i,t} \cdot \mathrm{irf}^{(z)}_{t}</tex-math></disp-formula>
where
        <disp-formula><label>(2)</label><tex-math>\mathrm{twf}^{(z)}_{i,t} = \sum_{r \in R^{z}_{i}} \mathrm{twf}^{(z-1)}_{r,t} \cdot \mathrm{irf}^{(z-1)}_{t}</tex-math></disp-formula>
and <inline-formula><tex-math>R^{z}_{i}</tex-math></inline-formula> denotes the set of resources in the ith resource at level z. For instance,
in Figure 1, when z = 3 the r's are search engines (level 2) and the <inline-formula><tex-math>R^{3}_{i}</tex-math></inline-formula>'s are sets of search
engines; in Equation 2 the sum for the resource <inline-formula><tex-math>R^{3}_{i} = se_3</tex-math></inline-formula> is computed over
the engines e<sub>8</sub> and e<sub>9</sub>. For a given query q, resources at level z can be ranked
according to <inline-formula><tex-math>\sum_{t \in q} w^{(z)}_{i,t}</tex-math></inline-formula>. Equation 1 shows how the weight of a descriptor t in a
resource is the product of two components: TWF and IRF. The TWF is peculiar
to this scheme and its definition is recursive, since it relies on the TWF of the
resources at lower levels; e.g. the TWF of a set of search engines is computed
as the weighted sum of the TWFs of the search engines in the considered set.</p>
      <p>The Inverse Resource Frequency (IRF) is a generalization of the Inverse
Document Frequency (IDF), that is,</p>
      <p>
        <disp-formula><label>(3)</label><tex-math>\mathrm{irf}^{(z)}_{t} = \log \frac{N^{(z)}}{n^{(z)}_{t}}</tex-math></disp-formula>
where t denotes the term, <inline-formula><tex-math>N^{(z)}</tex-math></inline-formula> is the number of resources at level z contained
by the resource at level z + 1 and <inline-formula><tex-math>n^{(z)}_{t}</tex-math></inline-formula> is the number of those resources that are
indexed by t. For instance, the search engine e<sub>2</sub> in Figure 1 is contained in the
set se<sub>1</sub> and, for that set, <inline-formula><tex-math>N^{(2)} = 4</tex-math></inline-formula>. A generalization of the IDF was proposed
in [2] to rank collections (Inverse Collection Frequency, ICF) and another was
proposed in [4] to rank peers (Inverse Peer Frequency, IPF). ICF and IPF are
instances of the IRF weight at level 2. In FedWeb13 the informative resources
at level 2 are search engines.
[...] <inline-formula><tex-math>\mathrm{twf}^{(1)}_{e_2,t} \cdot \mathrm{irf}^{(1)}_{t}</tex-math></inline-formula>
over the documents in that posting list; in the considered example,
<inline-formula><tex-math>\mathrm{twf}^{(1)}_{e_2,t} = \mathrm{tf}(t, e_2)</tex-math></inline-formula>,
but normalized values of TF can be adopted.
      </p>
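      <p>The recursion in Equations 1 to 3 can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the nested-list representation of resources, the function names (contains, irf, twf, scores) and the convention that the IRF is zero when no resource contains the term are assumptions made only for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

# Assumed toy representation: a document is a dict {term: tf};
# any higher-level resource is a list of its child resources.

def contains(resource, term):
    """True if the term occurs somewhere inside the resource."""
    if isinstance(resource, dict):
        return resource.get(term, 0) > 0
    return any(contains(child, term) for child in resource)

def irf(resources, term):
    """Equation 3: log(N / n_t) over a group of sibling resources.
    Returns 0.0 when no sibling contains the term (assumption)."""
    n_t = sum(1 for r in resources if contains(r, term))
    return math.log(len(resources) / n_t) if n_t else 0.0

def twf(resource, term):
    """Equation 2: TWF of a resource as the IRF-weighted sum of the
    TWFs of its children; at document level TWF is the raw tf."""
    if isinstance(resource, dict):
        return float(resource.get(term, 0))
    child_irf = irf(resource, term)
    return sum(twf(child, term) * child_irf for child in resource)

def scores(siblings, query):
    """Equation 1 summed over the query terms, for each sibling."""
    return [sum(twf(r, t) * irf(siblings, t) for t in query)
            for r in siblings]

e1 = [{"a": 2}, {"b": 1}]   # engine with two sampled documents
e2 = [{"a": 1}, {"a": 3}]   # every document contains "a"
e3 = [{"c": 1}]
print(scores([e1, e2, e3], ["a"]))
```
]]></preformat>
      <p>In this toy example e2 receives a TWF of zero for the term "a", because with the unsmoothed IRF of Equation 3 a term that occurs in every document of an engine contributes log 1 = 0.</p>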
      <p>Figure 3 illustrates how search engine ranking can be performed on the basis
of the TWFs of the diverse engines. The matrix on the right reports the
information stored in the broker index; this is a search engine-level index that
contains the TWFs and IRFs for all the terms in the search engine indexes
(or a subset of them in uncooperative environments). The score assigned to a
search engine e<sub>i</sub> is the sum of the TWF·IRF scores computed over the query
terms. In the reported example, the final ranking is e<sub>2</sub>, e<sub>4</sub>, e<sub>1</sub>, e<sub>3</sub>. Some
remarks on how this information can be stored in an inverted index and on the
actual implementation adopted in the experiments are reported in Section 3.3.</p>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <sec id="sec-3-1">
        <title>Research Task and Questions</title>
        <p>The experiments reported in this paper consider the following task. Given a set
S of search engines, a set QT of queries and a set of sample documents obtained
by query-based sampling performed on each of the search engines, a Federated
Web Search system should return, for each query in QT, a list of search engines
ranked by a measure of their capability of satisfying the user's information
need expressed by the query.</p>
        <p>The objective of our work is to investigate the effectiveness of TWF·IRF for
search engine selection in a Federated Web Search setting. In particular, we have
investigated the following research questions:
– Is TWF·IRF effective when adopted to rank the most promising search
engines at high rank positions?
– What is the effect of IRF when TWF·IRF is adopted for search engine
ranking? Is TWF, which is peculiar to this scheme, "sufficient"?
The results reported in this paper are representative of the results obtained in
the TREC 2013 Federated Web Search Track. We compared the effectiveness
of the search engine ranking based only on TWF with the effectiveness of the
original TWF·IRF. We also report the comparison with one well-known retrieval
model for distributed collection selection, i.e. bGlOSS [6], the boolean version of
the Glossary of Servers Server (GlOSS). bGlOSS ranks the collections according
to the estimated number of documents that satisfy the query q:
          <disp-formula><label>(4)</label><tex-math>\mathrm{ESize}_{\mathrm{Ind}}(q, e_i) = |e_i| \cdot \prod_{t \in q} \frac{\mathrm{df}(t, e_i)}{|e_i|}</tex-math></disp-formula>
where df(t, e<sub>i</sub>) denotes the number of documents in the collection e<sub>i</sub> (in our
setting e<sub>i</sub> is a search engine) that are indexed with the term t, and |e<sub>i</sub>| denotes the
number of documents in the ith collection.</p>
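        <p>The bGlOSS estimate of Equation 4 can be sketched in a few lines of Python. The function names and the toy document frequencies below are hypothetical and serve only to show how the estimate produces a ranking of collections.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def bgloss_estimate(query, df, size):
    """Equation 4: |e| times the product over query terms of df(t, e) / |e|."""
    if size == 0:
        return 0.0
    estimate = float(size)
    for term in query:
        estimate *= df.get(term, 0) / size
    return estimate

def rank_engines(query, engines):
    """Sort engine ids by decreasing bGlOSS estimate (ties broken by id).
    `engines` maps an id to a (df dict, collection size) pair."""
    est = {eid: bgloss_estimate(query, df, size)
           for eid, (df, size) in engines.items()}
    return sorted(est, key=lambda eid: (-est[eid], eid))

# Hypothetical sampled statistics for three engines.
engines = {
    "e1": ({"web": 50, "search": 20}, 100),
    "e2": ({"web": 10}, 100),
    "e3": ({"web": 40, "search": 40}, 80),
}
print(rank_engines(["web", "search"], engines))
```
]]></preformat>
        <p>Because the estimate is a product, a collection that lacks any one of the query terms gets an estimate of zero, which reflects the boolean (conjunctive) nature of the model.</p>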
      </sec>
      <sec id="sec-3-2">
        <title>Test Collection and Effectiveness Measures</title>
        <p>The research questions described in Section 3.1 were addressed using the
FedWeb13 test collection. This collection consists of a list of 157 search
engines<sup>2</sup> and a set of sample search results obtained by performing query-based
sampling on those search engines. A set QS of 2000 queries was adopted to
perform the sampling. For each search engine and for each query in the given query
set, the top 10 results were retrieved, both snippets and landing documents. Half
of the queries in QS were obtained using the Zipf method, which exploits
"single term queries taken evenly from the binned term distribution in ClueWeb09,
where terms were binned on a log-scale of their document frequency (df) to
ensure that there are queries from the complete frequency distribution" [10]. The
other half of the queries were built by randomly selecting terms from the sample
documents collected from the search engine.</p>
        <p>A set of 200 queries, QT, was provided by the track organizers to address
the two research tasks described in Section 3.1.</p>
        <p>The evaluation for the two tasks was performed on a subset of 50 queries
among those in QT. The primary effectiveness measure adopted for the resource
selection task was NDCG@20. The Normalized Discounted Cumulative Gain
(NDCG) [7] version adopted in the experiments is that proposed in [1]. The
relevance of a search engine was computed by using the graded precision [8] on
the top 10.<sup>3</sup></p>
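        <p>As an illustration of the measure, the following Python sketch computes NDCG@k with an exponential gain and a logarithmic rank discount, the form commonly associated with [1]. The way the track scales graded precision into relevance levels is not reproduced here; the relevance values and function names are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math

def dcg_at_k(gains, k):
    """DCG with gain (2^rel - 1) and discount log2(1 + rank), ranks from 1."""
    return sum((2 ** g - 1) / math.log2(i + 2)
               for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_rels, all_rels, k=20):
    """ranked_rels: relevance of the engines in the order returned by a run.
    all_rels: relevance of every engine, used to build the ideal ranking."""
    ideal = dcg_at_k(sorted(all_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance levels for two engines.
all_rels = [1.0, 0.0]
print(ndcg_at_k([1.0, 0.0], all_rels, k=2))  # ideal order
print(ndcg_at_k([0.0, 1.0], all_rels, k=2))  # relevant engine ranked second
```
]]></preformat>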
        <sec id="sec-3-2-1">
          <title>2 The list of search engines is available at the following url:</title>
          <p>http://snipdex.org/datasets/fedweb2013/FW13-engines.txt</p>
          <p>3 Details are provided in the FedWeb13 Track web page:
https://sites.google.com/site/trecfedweb/</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Parsing and Indexing</title>
        <p>The indexing module of our system relies on an XML parser written in Java for
extracting the document fields from the sample searches and the sample
documents in the test collection, and on the Apache Lucene library. The sample
documents in the FedWeb13 test collection were indexed by creating a distinct
index for each of the 157 search engines. These indexes are document-level
indexes. Each (Lucene) document in a document-level index consists of four
fields: link, title, description, and the content of the document associated with the
sample search result. For each field, the document-level index stores information
on the frequency of the descriptors in each document and in the collection, as
well as their positions in each document.</p>
        <p>Starting from these indexes, a search engine-level index was built. The set of
descriptors in this index is the union of all the distinct descriptors in the
document-level indexes associated with the search engines. As in the
document-level index, in the search engine-level index a list of postings is associated with each
descriptor. Each posting stores the identifier of the search engine,
the number of documents in the search engine where the descriptor appears,
and the TWF of the descriptor. In the specific Lucene-based implementation
adopted, TWF was stored in the payload that can be associated with each term;
the weight value was approximated and stored as a float.<sup>4</sup></p>
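        <p>The posting layout described above can be sketched in Python as follows. This is an in-memory illustration, not the Lucene-based implementation: the Posting tuple, the function names and the toy samples are hypothetical, and the document-level IRF is assumed to take the smoothed form given in Section 3.4.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import defaultdict, namedtuple

# Hypothetical posting layout mirroring the description in the text.
Posting = namedtuple("Posting", ["engine_id", "df", "twf"])

def doc_irf(n_docs, df):
    """Assumed smoothed document-level IRF: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))

def build_engine_index(samples):
    """samples: {engine_id: [{term: tf}, ...]}, the sampled documents.
    Returns {term: [Posting, ...]}, one posting per engine containing the term."""
    index = defaultdict(list)
    for engine_id, docs in sorted(samples.items()):
        df = defaultdict(int)
        tf_sum = defaultdict(int)
        for doc in docs:
            for term, tf in doc.items():
                df[term] += 1
                tf_sum[term] += tf
        for term in sorted(df):
            # The IRF is constant within an engine, so it factors out of the sum.
            twf = tf_sum[term] * doc_irf(len(docs), df[term])
            index[term].append(Posting(engine_id, df[term], twf))
    return dict(index)

samples = {
    "e1": [{"a": 2}, {"b": 1}],
    "e2": [{"a": 1}, {"a": 3}],
}
index = build_engine_index(samples)
print(index["a"])
```
]]></preformat>
        <p>The sketch keeps TWF as a Python float, whereas the implementation described in the text stores an approximated 32-bit value in the term payload.</p>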
      </sec>
      <sec id="sec-3-4">
        <title>Resource Selection</title>
        <p>The experiments exploit two specific instantiations of the weighting scheme
described in Section 2. The first instantiation exploits only TWF for search engine
ranking:
          <disp-formula><tex-math>\sum_{t \in q} \mathrm{twf}^{(2)}_{i,t}</tex-math></disp-formula>
where <inline-formula><tex-math>\mathrm{twf}^{(2)}_{i,t} = \sum_{d_j \in D_i} \mathrm{twf}^{(1)}_{j,t} \cdot \mathrm{irf}^{(1)}_{t}</tex-math></inline-formula>, D<sub>i</sub> denotes the set of documents in
the ith search engine, and <inline-formula><tex-math>\mathrm{twf}^{(1)}_{j,t} = \mathrm{tf}(t, j)</tex-math></inline-formula> is the term frequency of term t in the
document d<sub>j</sub>. The IRF at the document level was implemented as:
          <disp-formula><tex-math>\mathrm{irf}^{(1)}_{t} = \log\left(1 + \frac{N^{(1)} - n^{(1)}_{t} + 0.5}{n^{(1)}_{t} + 0.5}\right)</tex-math></disp-formula>
        </p>
        <p>The second instantiation exploits both TWF and IRF; search engines are ranked
according to:
          <disp-formula><tex-math>\sum_{t \in q} \mathrm{twf}^{(2)}_{i,t} \cdot \mathrm{irf}^{(2)}_{t}</tex-math></disp-formula>
where <inline-formula><tex-math>\mathrm{twf}^{(2)}_{i,t}</tex-math></inline-formula> is computed as above and <inline-formula><tex-math>\mathrm{irf}^{(2)}_{t}</tex-math></inline-formula> is computed as
          <disp-formula><tex-math>\mathrm{irf}^{(2)}_{t} = \log\left(1 + \frac{N^{(2)} - n^{(2)}_{t} + 0.5}{n^{(2)}_{t} + 0.5}\right)</tex-math></disp-formula>
where <inline-formula><tex-math>N^{(2)}</tex-math></inline-formula> is the number of search engines (in this test collection <inline-formula><tex-math>N^{(2)} = 157</tex-math></inline-formula>)
and <inline-formula><tex-math>n^{(2)}_{t}</tex-math></inline-formula> is the number of those search engines that are indexed by t.</p>
        <sec id="sec-3-4-1">
          <title>4 Single-precision 32-bit IEEE 754 floating point</title>
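          <p>The difference between the two instantiations can be made concrete with a short Python sketch. The in-memory index layout, the function names and the TWF values are hypothetical; only the smoothed IRF form and the two scoring sums follow the definitions in this section.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math

def engine_irf(n_engines, n_with_term):
    """Smoothed IRF at level 2: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (n_engines - n_with_term + 0.5) / (n_with_term + 0.5))

def rank(index, query, n_engines, use_irf):
    """index: {term: {engine_id: twf}} (hypothetical in-memory layout).
    Returns the engine ids sorted by decreasing score, ties broken by id."""
    scores = {}
    for term in query:
        postings = index.get(term, {})
        weight = engine_irf(n_engines, len(postings)) if use_irf else 1.0
        for engine_id, twf in postings.items():
            scores[engine_id] = scores.get(engine_id, 0.0) + twf * weight
    return sorted(scores, key=lambda e: (-scores[e], e))

# Hypothetical TWF values: "jazz" is rare, "music" occurs in every engine.
index = {
    "jazz":  {"e1": 3.0},
    "music": {"e1": 1.0, "e2": 5.0, "e3": 3.0},
}
print(rank(index, ["jazz", "music"], n_engines=3, use_irf=False))
print(rank(index, ["jazz", "music"], n_engines=3, use_irf=True))
```
]]></preformat>
          <p>In this toy example the TWF-only ranking places e2 first, while adding the IRF factor promotes e1, the only engine indexed by the rarer term.</p>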
          <p>For each instantiation, we considered two runs, whose labels end with
sh and mu. In the sh runs, the query is built by performing an OR among the
terms appearing in the query.<sup>5</sup> The ranked list of search engines that constitutes
a mu run is obtained by appending three ranked lists:
L1: the list of search engines ranked by their TWFs with regard to the query,
using the AND boolean constraint among the occurrences of the distinct
terms in the query;<sup>6</sup>
L2: the list of search engines that do not belong to L1, ranked by their
TWFs with regard to the query using the OR boolean constraint among
the occurrences of the distinct terms in the query;
L3: the list of search engines that belong to neither L1 nor L2, ranked by their
identifier, i.e. the identifier associated with the search engine in the test
collection.</p>
          <p>The final ranked list of search engines was obtained by appending L2 to L1, and
then L3 to the concatenation of the first two lists.</p>
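          <p>The construction of a mu run from the three lists can be sketched in Python as follows. The data structures, names and scores are hypothetical, and a single score per engine is assumed for both the AND and the OR lists.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def mu_run(query, engine_terms, score, all_engine_ids):
    """Build the final list L1 + L2 + L3.
    engine_terms: {engine_id: set of terms indexing the engine}
    score: {engine_id: score of the engine for the query}"""
    q = set(query)
    has_all = [e for e in all_engine_ids
               if q.issubset(engine_terms.get(e, set()))]
    has_some = [e for e in all_engine_ids
                if q & engine_terms.get(e, set()) and e not in has_all]

    def by_score(e):
        return (-score.get(e, 0.0), e)

    l1 = sorted(has_all, key=by_score)    # AND constraint
    l2 = sorted(has_some, key=by_score)   # OR constraint, not already in L1
    l3 = sorted(e for e in all_engine_ids
                if e not in l1 and e not in l2)  # the rest, by identifier
    return l1 + l2 + l3

# Hypothetical engines and scores.
engine_terms = {"e1": {"a", "b"}, "e2": {"a"}, "e3": {"c"}, "e4": {"a", "b"}}
score = {"e1": 1.0, "e2": 5.0, "e4": 2.0}
print(mu_run(["a", "b"], engine_terms, score, ["e1", "e2", "e3", "e4"]))
```
]]></preformat>
          <p>In this example e2 has the highest score but still follows e4 and e1, because engines matching all the query terms are always listed first.</p>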
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>Results</title>
        <p>Results are reported in Figure 4. With regard to the first research question,
TWF·IRF in the two instantiations outperforms bGlOSS for all runs. Moreover,
the mu runs perform better than the sh runs for both instantiations. The only
drawback of the mu runs is that two queries have to be performed, one performing
the AND and one performing the OR among the query terms. However, the
number of resources to rank is much lower than the number of documents in a
collection, e.g. hundreds of search engines versus billions of documents in a web
search setting; therefore the additional computational load of the mu runs can
be acceptable.</p>
        <p>With regard to the second research question, IRF provides an improvement in
terms of NDCG@20. A 7.37% increment is observed for the sh runs, while the
increment for the mu runs, 1.73%, is negligible (the difference is not significant).
Therefore, for this test collection, when all the query terms are required to
occur in the documents, TWF seems to be "sufficient" for search engine
ranking. In contrast, when only some of the query terms are required to occur
in the documents, IRF appears "necessary", i.e. both TWF and IRF are needed for search
engine ranking.</p>
        <p><sup>5</sup> The Lucene query was a BooleanQuery consisting of PayloadTermQuery instances connected
by SHOULD clauses.</p>
        <p><sup>6</sup> The Lucene query was a BooleanQuery consisting of PayloadTermQuery instances connected
by MUST clauses.</p>
        <p>[Figure 4: results of the runs; the only value recoverable from the extracted figure is 0.244 for bGlOSS.]</p>
        <p>This paper reports on the investigation of TWF·IRF in a Federated Web Search
setting. We participated in the TREC 2013 Federated Web Search Track. This
weighting scheme was shown to be effective in addressing the problem of loss in
recall in Hierarchical Hybrid P2P Networks [9]. The results reported in this
paper show that this weighting scheme can also support Federated Web Search.</p>
        <p>Future work will focus on further experimental investigations,
in particular:
– the comparison with the most effective retrieval models, e.g. Collection
Retrieval Inference Network (CORI) [2], Decision Theoretic Framework (DTF)
[5], Relevant Document Distribution Estimation (ReDDE) [13], and
Central-Rank-Based Collection Selection (CRCS) [11];
– the effect of different resource descriptions, e.g. based on result snippets or on a
combination of snippet and document content (snippets are available in
the FedWeb13 test collection), or the adoption of external resources, e.g. to
perform "resource description expansion";
– the effect of the sampling strategy on resource selection effectiveness;
– the effect of IRF in diverse test collections.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Burges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shaked</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Renshaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lazier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Deeds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hullender</surname>
          </string-name>
          .
          <article-title>Learning to rank using gradient descent</article-title>
          .
          <source>In Proceedings of the 22nd international conference on Machine learning, ICML '05</source>
          , pages
          <fpage>89</fpage>
          –
          <lpage>96</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Searching distributed collections with inference networks</article-title>
          .
          <source>In Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <source>SIGIR '95</source>
          , pages
          <fpage>21</fpage>
          –
          <lpage>28</lpage>
          , New York, NY, USA,
          <year>1995</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>R.</given-names>
            <surname>Castiglion</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Melucci</surname>
          </string-name>
          .
          <article-title>An evaluation of a recursive weighing scheme for information retrieval in peer-to-peer networks</article-title>
          .
          <source>In Proceedings of the 2005 ACM workshop on Information retrieval in peer-to-peer networks, P2PIR '05</source>
          , pages
          <fpage>9</fpage>
          –
          <lpage>16</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Cuenca-Acuna</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          .
          <article-title>Text-Based Content Search and Retrieval in Ad-hoc P2P Communities</article-title>
          . In E. Gregori, L. Cherkasova, G. Cugola, F. Panzieri, and G. Picco, editors,
          <source>Web Engineering and Peer-to-Peer Computing</source>
          , volume
          <volume>2376</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>220</fpage>
          –
          <lpage>234</lpage>
          . Springer Berlin Heidelberg,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          .
          <article-title>A decision-theoretic approach to database selection in networked ir</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ):
          <fpage>229</fpage>
          –
          <lpage>249</lpage>
          ,
          <year>July 1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>L.</given-names>
            <surname>Gravano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>García-Molina</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomasic</surname>
          </string-name>
          .
          <article-title>The effectiveness of GlOSS for the text database discovery problem</article-title>
          .
          <source>ACM SIGMOD Record</source>
          ,
          <volume>23</volume>
          (
          <issue>2</issue>
          ):
          <fpage>126</fpage>
          –
          <lpage>137</lpage>
          , May
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <article-title>Cumulated gain-based evaluation of IR techniques</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>422</fpage>
          –
          <lpage>446</lpage>
          ,
          <year>October 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          .
          <article-title>Using graded relevance assessments in IR evaluation</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>53</volume>
          (
          <issue>13</issue>
          ):
          <fpage>1120</fpage>
          –
          <lpage>1129</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Melucci</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggiani</surname>
          </string-name>
          .
          <article-title>A Study of a Weighting Scheme for Information Retrieval in Hierarchical Peer-to-peer Networks</article-title>
          .
          <source>In Proceedings of the 29th European Conference on IR Research</source>
          , ECIR'
          <volume>07</volume>
          , pages
          <fpage>136</fpage>
          –
          <lpage>147</lpage>
          , Berlin, Heidelberg,
          <year>2007</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Trieschnigg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          .
          <article-title>Federated search in the wild: the combined power of over a hundred search engines</article-title>
          .
          <source>In Proceedings of the 21st ACM international conference on Information and knowledge management</source>
          ,
          <source>CIKM '12</source>
          , pages
          <fpage>1874</fpage>
          –
          <lpage>1878</lpage>
          , New York, NY, USA,
          <year>2012</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Shokouhi</surname>
          </string-name>
          .
          <article-title>Central-rank-based collection selection in uncooperative distributed information retrieval</article-title>
          .
          <source>In Proceedings of the 29th European Conference on IR Research</source>
          , ECIR'
          <volume>07</volume>
          , pages
          <fpage>160</fpage>
          –
          <lpage>172</lpage>
          , Berlin, Heidelberg,
          <year>2007</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M.</given-names>
            <surname>Shokouhi</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          .
          <article-title>Federated search</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>102</lpage>
          , Jan.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          .
          <article-title>Relevant document distribution estimation method for resource selection</article-title>
          .
          <source>In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '03</source>
          , pages
          <fpage>298</fpage>
          –
          <lpage>305</lpage>
          , New York, NY, USA,
          <year>2003</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>