8th Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS-IR'10)

Efficient Dynamic Pruning with Proximity Support (Best Paper)

Nicola Tonellotto
Information Science and Technologies Institute, National Research Council
Via G. Moruzzi 1, 56124 Pisa, Italy
nicola.tonellotto@isti.cnr.it

Craig Macdonald, Iadh Ounis
Department of Computing Science, University of Glasgow
Glasgow, G12 8QQ, UK
{craigm,ounis}@dcs.gla.ac.uk

ABSTRACT
Modern retrieval approaches apply not just single-term weighting models when ranking documents: instead, proximity weighting models are in common use, which highly score the co-occurrence of pairs of query terms in close proximity to each other in documents. The adoption of these proximity weighting models can cause a computational overhead when documents are scored, negatively impacting the efficiency of the retrieval process. In this paper, we discuss the integration of proximity weighting models into efficient dynamic pruning strategies. In particular, we propose to modify document-at-a-time strategies to include proximity scoring without any modifications to pre-existing index structures. Our resulting two-stage dynamic pruning strategies only consider single query terms during first-stage pruning, but can early terminate the proximity scoring of a document if it can be shown that it will never be retrieved. We empirically examine the efficiency benefits of our approach using a large Web test collection of 50 million documents and 10,000 queries from a real query log. Our results show that our proposed two-stage dynamic pruning strategies are considerably more efficient than the original strategies, particularly for queries of 3 or more terms.

Categories and Subject Descriptors: H.3.3 [Information Storage & Retrieval]: Information Search & Retrieval
General Terms: Algorithms, Performance, Experimentation
Keywords: Dynamic Pruning, Efficient Proximity

Copyright © 2010 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. LSDS-IR Workshop, July 2010. Geneva, Switzerland.

1. INTRODUCTION
In most information retrieval (IR) systems, the relevance score for a document given a query follows the general outline given by the best match strategy: a score is calculated for each query term occurring in the document. These scores are then aggregated by a summation to give the final document relevance score. However, there are many queries where the relevant documents contain the query terms in close proximity. Hence, modern retrieval systems apply not just single-term weighting models when ranking documents. Instead, proximity weighting models are commonly applied, which highly score the co-occurrence of pairs of query terms in close proximity to each other in documents [8].

Dynamic pruning strategies reduce the scoring of documents, such that efficient retrieval can be obtained without impacting on the retrieval effectiveness before rank K; such strategies are safe-up-to-rank-K. However, when additional proximity scores must be calculated for each document, the computational overhead impacts the efficiency of the retrieval process. While pruning techniques have been studied to efficiently score documents without considering term proximity [4, 20], there are very few proposals considering efficient top-K retrieval where proximity is considered [19, 21, 22]. Moreover, these proposals require modifications of the index structure to implement efficient scoring strategies. Indeed, such modifications include sorting the posting lists by frequency or impact [2, 10], or using additional index structures containing the intersection of pairs of posting lists [19, 21, 22]. However, these can lead to negative effects on other aspects of the IR system, such as the compression of index structures or the impossibility of using other existing ranking strategies.

This work contributes a study into the behaviour of dynamic pruning strategies when combined with proximity weighting models. In particular, we analyse two existing document-at-a-time (DAAT) dynamic pruning strategies, namely MaxScore [20] and Wand [4], that can efficiently score documents without decreasing the retrieval effectiveness at rank K, nor requiring impact-sorted indices. Moreover, we propose a runtime modification of these strategies to take into account proximity scores. We generate at runtime the posting lists of the term pairs, and transparently include the processing of these pair posting lists in the MaxScore and Wand strategies. Next, we propose a reorganisation of these strategies to increase their efficiency. Using thorough experiments on a 50 million document corpus and 10,000 queries from a real query log, we evaluate the proposed modifications to determine their efficiency.

The remainder of this paper is structured as follows: In Section 2, we describe the state-of-the-art approaches to efficient ranking and the current existing solutions taking into account proximity scoring. In Section 3, we describe in detail the proposed framework to support proximity scores in DAAT strategies, and in Section 4, we evaluate the efficiency of the proposed modifications. We provide concluding remarks in Section 5.

2. BACKGROUND
In the following, we outline the state-of-the-art strategies for dynamic pruning, followed by a discussion of proximity weighting models.
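As a concrete illustration of the best match outline recalled in the introduction, a minimal scorer can be sketched as follows. This is our own illustration, not a model used in the paper: the damped-tf weighting and all names are purely illustrative.

```python
# A toy best-match scorer: each query term occurring in the document
# contributes a term score, and the contributions are summed to give the
# document's relevance score. The damped-tf weighting is illustrative only.

from collections import Counter

def term_score(tf):
    """Toy weighting model: damped term frequency."""
    return tf / (1.0 + tf)

def best_match_score(document, query):
    """Sum the term scores of the query terms occurring in the document."""
    tfs = Counter(document.lower().split())
    return sum(term_score(tfs[t]) for t in query.lower().split() if t in tfs)

doc = "dynamic pruning makes dynamic retrieval efficient"
# A query term matching more occurrences contributes a higher score.
assert best_match_score(doc, "dynamic pruning") > best_match_score(doc, "pruning")
```

Terms absent from the document contribute nothing, so the summation only ranges over the matching query terms, exactly as in the outline above.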
2.1 Dynamic Pruning
The algorithms to match and score documents for a query fall into two main categories [16]: in term-at-a-time (TAAT) scoring, the query term posting lists are processed and scored in sequence, so that documents containing query term t_i gain a partial score before scoring commences on term t_{i+1}. In contrast, in document-at-a-time (DAAT) scoring, the query term posting lists are processed in parallel, such that all postings of document d_j are considered before scoring commences on d_{j+1}. Compared to TAAT, DAAT has a smaller memory footprint, as it does not maintain intermediate scores for many documents, and is reportedly applied by large search engines [1]. An alternative strategy to DAAT and TAAT is called score-at-a-time [2]; however, it is suitable only for indices sorted or partially sorted by document importance, which must be calculated before the actual query processing. The algorithms from the family of threshold algorithms [10] work similarly.

Efficient dynamic pruning strategies do not rank every document in the collection for each user query; they manage to rank only the documents that have a chance to enter the top-K results returned to the users. These strategies are safe-up-to-rank-K [20], meaning that the ranking of documents up to rank K will have full possible effectiveness, but with increased efficiency. Dynamic pruning strategies rely on maintaining, at query scoring time, a threshold score that documents must overcome to be considered in the top-K documents. To guarantee that the dynamic pruning strategy will provide the correct top-K documents, an upper bound for each term on its maximal contribution to the score of any document in its posting list is used. In this paper, we focus on two state-of-the-art safe-up-to-rank-K DAAT dynamic pruning strategies, namely MaxScore and Wand.

The MaxScore strategy maintains, at query scoring time, a list containing the current top-K documents scored so far, sorted in decreasing order of score. The score of the last top-K document is a threshold score that documents must overcome to be considered in the top-K documents. A new document is given a partial score while the posting lists containing that document are processed. A document's scoring can terminate early when it is possible to guarantee that the document will never obtain a score greater than the current threshold. This happens when the current document score plus the upper bounds of the terms yet to be scored is not greater than the threshold.

The Wand strategy maintains the same top-K document list and threshold score but, for any new document, it calculates an approximate score, summing up the upper bounds for the terms associated with the document. If this approximate score is greater than the current threshold, then the document is fully scored. It is then inserted in the top-K candidate document set if its full score is greater than the current threshold, and the current threshold is updated. If the approximate score check fails, the next document is processed. The selection of the next document to score is optimised [4]; for our purposes, it is of note that the set of posting lists is sorted by the document identifier (docid) they currently represent. More details on the Wand document selection strategy, which uses the skipping [16] of postings in the posting lists to reduce disk IO and increase efficiency, are presented in Appendix A.

The MaxScore and Wand dynamic pruning strategies can both enhance retrieval efficiency, whilst ensuring that the top-K documents are fully scored, i.e. that the retrieval effectiveness at rank K is not at all negatively impacted. Generally speaking, Wand is more efficient [14], due to its ability to skip postings for unimportant query terms. Note that both strategies examine at least one posting for each candidate document, and hence cannot improve efficiency for single-term queries.

2.2 Proximity
There are many queries where the relevant documents contain the query terms in close proximity. Hence, modern retrieval systems apply not just single-term weighting models when ranking documents. Instead, proximity weighting models are commonly applied, which highly score the co-occurrence of pairs of query terms in close proximity to each other in documents [8]. Hence, some proximity scoring (or term dependence) models have recently been proposed that integrate single-term and proximity scores for ranking documents [5, 15, 18]. In this manner, the basic ranking model of an IR system for a query Q can be expressed as:

  score_Q(d, Q) = ω S(d) + κ Σ_{t∈Q} score(tf_d, *_d, t) + φ prox(d, Q)

where S(d) is the combination of some query-independent features of document d (e.g. PageRank, URL length), and score(tf_d, *_d, t) is the application of a weighting model to score tf_d occurrences of term t in document d. *_d denotes any other document statistics required by a particular weighting model (e.g. document length). prox(d, Q) represents some proximity document scoring function. The influence of the various features is controlled using the weights ω, κ and φ.

However, none of the proximity weighting models proposed have been designed for efficient document scoring. The main approaches to integrate proximity weighting models into pruning strategies require modifications to the index structure to include information on the proximity score upper bounds. In [19, 21, 22], the authors detail several approaches to leverage early termination when proximity scores are included in the ranking model. While these strategies alter the index structure (e.g. by adding term-pair inverted indices), we aim to exploit the proximity scores without modifying the index structure (other than keeping position occurrence information in the standard inverted index posting lists). In particular, we use the sequential term dependence model of Markov Random Fields (MRF) [15], which has been shown to be effective at modelling the proximity of query term occurrences in documents. In MRF, the proximity score is calculated as follows:

  prox(d, Q) = Σ_{p=(t_i, t_{i+1})∈Q} [ score(pf(t_i, t_{i+1}, d, k_1), l_d, p) + score(pf(t_i, t_{i+1}, d, k_2), l_d, p) ]

where pf(t_i, t_{i+1}, d, k) represents the number of occurrences of the pair of sequential query terms (t_i, t_{i+1}) occurring in document d in windows of size k (abbreviated as pair frequency pf_d). Following [15], we set κ = 1, φ = 0.1, and k_1 = 2 and k_2 = 8, to account for the proximity of two terms as an exact phrase, and proximity at distance 8, respectively. score(pf_d, l_d, p) is implemented using Dirichlet language modelling [11], but where the pair frequency takes the role of the term frequency. However, for the background statistics of the language model, in contrast to term weighting, when using proximity weighting it is common to assume a constant frequency for the pair in the collection [13]¹.

¹ As implemented by the authors of MRF in the Ivory retrieval system, see www.umiacs.umd.edu/~jimmylin/ivory
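To make the shape of this model concrete, the following sketch counts pair frequencies and combines them at the two window sizes. It is an illustration of the formula's structure only: the paper scores pairs with Dirichlet language modelling and MRF distinguishes ordered and unordered windows, which we abbreviate here with a simple logarithmic pair score; all function names are ours.

```python
# Sketch of the sequential-dependence proximity score of Section 2.2: for
# each pair of consecutive query terms, pair_frequency counts co-occurrences
# within a window of k tokens, and the pair is scored at both window sizes
# k1 and k2. The log1p pair score stands in for the Dirichlet weighting.

import math

def pair_frequency(tokens, t1, t2, k):
    """pf(t1, t2, d, k): occurrences of t2 at most k-1 tokens after t1."""
    pos1 = [i for i, t in enumerate(tokens) if t == t1]
    pos2 = [i for i, t in enumerate(tokens) if t == t2]
    return sum(1 for i in pos1 for j in pos2 if 0 < j - i < k)

def prox(tokens, query_terms, k1=2, k2=8, pair_score=math.log1p):
    """prox(d, Q): sum the pair scores at window sizes k1 and k2 over all
    pairs of sequential query terms."""
    total = 0.0
    for t1, t2 in zip(query_terms, query_terms[1:]):
        total += pair_score(pair_frequency(tokens, t1, t2, k1))
        total += pair_score(pair_frequency(tokens, t1, t2, k2))
    return total

d = "dynamic pruning with dynamic proximity pruning".split()
assert pair_frequency(d, "dynamic", "pruning", 2) == 1  # exact phrase at (0, 1)
assert pair_frequency(d, "dynamic", "pruning", 8) == 3  # adds pairs (0, 5), (3, 5)
```

Note how computing pf requires scanning the position lists of both terms; this is the expensive step that the dynamic pruning machinery of the next section tries to avoid.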
3. FRAMEWORK
The integration of proximity weighting models within efficient dynamic pruning strategies requires the materialisation of term pair posting lists and their integration into the existing dynamic pruning decision mechanism. In the following, we discuss how we propose to address both aspects.

3.1 Term pair posting lists
Most dynamic pruning algorithms use posting list iterators: object-oriented interfaces to a posting list, allowing a posting to be read, or to be moved on to the next posting. With a standard inverted index, one posting list's iterator represents the documents in which a single query term occurs, ordered by docid.

Proximity weighting models require knowledge of the occurrence of pairs of query terms in a document. The posting list of a pair of terms can be constructed either statically (i.e., at indexing time, calculating the intersections of all pairs of term posting lists) or dynamically (i.e., at retrieval time, generating term pair postings on the fly). Previous approaches [19, 21, 22] investigated different methodologies to statically calculate these intersections. However, the static approach has two drawbacks. Firstly, storing new posting lists requires additional space on disk, and secondly, the pairs of terms whose posting lists must be intersected must be known in advance (e.g. by identifying popular phrases in the corpus [22]), to avoid generating a large number of new, potentially useless posting lists. While these drawbacks may be lightened by caching solutions that store paired posting lists [12], even in this case there is always a certain consumption of disk or memory resources.

Instead, the pair posting lists can be built dynamically. Given two single-term iterators on posting lists, there is a valid term pair posting each time they point to the same docid. In order to transparently include these pair postings in existing DAAT strategies, we must be sure that they are ordered by docid. A pair posting list is illustrated in Figure 1, based on the postings for terms t1 and t2. In our proposed approach, to avoid additional I/O operations at runtime, only the single-term posting lists are responsible for reading from disk and decompressing the single postings, while the pair posting docid is updated, each time a new single posting is read, with the minimum of the current single-term docids. The pair posting is valid only when the docids of the underlying single-term posting lists are equal (i.e., in Figure 1, only two valid postings exist, namely docid 1 and docid 8). When a term posting list ends, all the associated pair posting lists end as well. Overall, the pair posting list is docid-sorted and cannot skip over potentially useful term pair postings; however, a number of invalid pair postings will occur (e.g. (8,2) and (9,14) in Figure 1).

[Figure 1: The dynamic creation of a pair posting list for terms t1 and t2. Bold entries are valid pair postings, while × indicates the end of a posting list.]

3.2 Dynamic pruning with proximity
The dynamic pair posting lists can be directly put to work in existing DAAT strategies without modification. When a term pair posting is selected for scoring, it is necessary to calculate the exact value of the pair frequency at window size k, by comparing the lists of positions stored in both term postings. With dynamic pruning strategies (MaxScore and Wand), this computation can be avoided if the posting is not considered for scoring. Moreover, both the MaxScore and Wand pruning strategies require upper bounds on the score contributions of single terms. Hence, when using proximity, we also need to provide upper bounds on the score contributions of pairs. In [4], the authors proposed using a dynamic estimation of the inverse document frequency of pairs to determine the upper bound (the particular proximity weighting model is not defined, but assumed to be similar to [5]). In [14], we proposed a new approximation for the upper bounds of the Markov Random Fields model, requiring only the knowledge of the maximum term frequency of the postings in the two term posting lists.

We now describe how proximity weighting is achieved using the dynamic pruning strategies. In particular, the MaxScore strategy must always know the minimum docid in the currently processed posting list set (which can be obtained by maintaining a heap), while the Wand strategy must have access to the posting lists sorted by docid (i.e., in the worst case, every posting in each posting list must be removed and inserted in a sorted set). However, when proximity is considered, many extra pair postings must be considered (i.e., |Q| single-term postings, plus an additional 2(|Q| − 1) pair postings), causing the efficiency of Wand to be hindered. Moreover, both strategies must make additional checks to ensure that only 'valid' pair postings are considered, which can cause a performance bottleneck.

To deal with these limitations, we propose a modification that can be applied to both the MaxScore and Wand pruning strategies, whereby the processing of single terms is separated from that of term pairs during each document scoring. We refer to these two-stage strategies as MaxScoreP and WandP. In particular, if a pair posting is updated after each term posting update, we will generate two potentially invalid pair postings. With the proposed modification, we update the pair postings only after all single terms have been moved to their respective next posting. This implies that we generate only one pair posting instead of two each time both of the single-term posting iterators advance. Hence, MaxScoreP and WandP process the single-term posting lists according to their respective algorithms; however, the term pairs are subsequently processed in a second stage using early termination, according to the MaxScore strategy. The use of early termination for proximity scoring is motivated by the fact that the pair frequency of a pair posting is expensive to compute (in comparison to the term frequency, which is directly recorded in the posting); hence, early termination can reduce the unnecessary pair frequency and proximity score calculations.

In summary, we propose to implement proximity scoring using only normal index structures at retrieval time, and in such a way as to integrate directly with existing DAAT dynamic pruning strategies. Moreover, we propose a modification to these dynamic pruning strategies, where the single terms are processed first according to the original dynamic pruning strategy, while the term pairs are processed according to the MaxScore strategy. This can significantly reduce the performance impact of the additional data structures required at runtime, and, as will be shown in Section 4, leads to improved retrieval efficiency. Moreover, both of the modified MaxScoreP and WandP strategies remain safe-up-to-rank-K.

4. EVALUATION
In the following experiments, we want to evaluate the benefits of the proposed modification of the dynamic pruning strategies. We are mainly interested in efficiency, because all strategies are safe-up-to-rank-K, and hence have no impact on effectiveness. We tackle the following research questions:

1. How do MaxScore and Wand compare when applying the Markov Random Fields proximity weighting model?
2. Do the proposed modifications benefit the efficiency of DAAT retrieval when applying proximity?
3. What impact does the length of the query have?

Table 1: Average response times (in seconds), original strategies and two-stage strategies.

                        # of query terms
                     1      2      3      4     >4
  # queries       3456   3149   1754    853    550
  Full            0.53   2.76   5.57   9.95  16.42
  MaxScore        0.53   2.07   3.92   6.45  12.68
  Wand            0.53   3.01   5.78  10.67  18.42
  MaxScoreP       0.53   1.90   3.32   5.27   8.51
  WandP           0.53   1.67   2.73   4.46   7.77

Experiments are performed using a 230GB, 50 million English document subset of the TREC ClueWeb09 (CW09B) corpus [6].
Documents are ranked using the Dirichlet LM weighting model [11] (with parameter setting µ = 4000) and the Markov Random Fields proximity weighting (see Section 2.2). No query-independent features are used (i.e., ω = 0). CW09B is indexed using the Terrier IR platform [17] (http://terrier.org), applying Porter's English stemmer and removing standard stopwords. In the posting lists, docids are encoded using Elias Gamma-encoded deltas [9] and term frequencies using Elias Unary [9]. The positions of the occurrences of each term within the document are also recorded in each posting, using Elias Gamma-encoded deltas. Each posting list also includes skip points [16], one every 10,000 postings. The resulting size of the inverted index is 72GB.

For testing retrieval efficiency, we extract a stream of user queries from a real search engine log. In particular, we select the first 10,000 queries of the MSN 2006 query log [7], applying Porter's English stemmer and removing standard stopwords (empty queries are removed). The experiments measure the average query response time for each dynamic pruning strategy, broken down by the number of query terms (1, 2, 3, 4 and more than 4). The number of documents retrieved for each query is K = 1,000. All experiments are made using a dual quad-core Intel Xeon 2.6GHz machine, with 8GB RAM and a 2TB SATA2 disk containing the index.

In the following, we compare five strategies, namely: an exhaustive "Full" DAAT strategy, which fully scores every posting for all query terms and pairs; the original MaxScore and Wand dynamic pruning strategies without any modification; and the proposed two-stage MaxScoreP and WandP dynamic pruning strategies, which integrate the modification proposed in Section 3.2. Every strategy uses dynamic pair posting lists, as discussed in Section 3.1.

Table 1 details the average query response time for both the original and two-stage strategies per number of query terms. From this table, we note that the average response times are reduced by approximately 22%-30% by applying the original MaxScore, but for the original Wand the average response times are worse than for Full DAAT scoring. This counters the normal efficiency of Wand, and supports our assertion that Wand is not suitable for proximity scoring in its normal form. Indeed, when using the pair posting lists, there is a higher number of posting lists to maintain in the docid-sorted set of posting lists used by Wand, in addition to the check that each pair posting is valid.

In contrast, the two-stage pruning strategies perform better in comparison to their original versions. MaxScoreP exhibits improvements in average response times varying from 31% for two terms up to 48% for more than four terms, while the benefits of WandP vary from 39% to 58%. Moreover, we note that WandP exhibits better efficiency than MaxScoreP, in common with the relative behaviour of Wand versus MaxScore for non-proximity queries [14]. For single-term queries, no proximity is applied and, as expected, all dynamic pruning strategies are as efficient as Full DAAT scoring.

[Figure 2: The relative impact on the average response time of the proposed strategies w.r.t. the Full DAAT strategy. y-axis: % improvement over Full; x-axis: # of query terms; series: MaxScore (original), Wand (original), MaxScoreP (two-stage), WandP (two-stage).]

Figure 2 summarises the percentage differences of the dynamic pruning strategies with respect to Full DAAT scoring, for varying lengths of query. As already reported, the two-stage MaxScoreP and WandP strategies outperform their original equivalents. Moreover, their benefits increase as the length of the queries (and hence the number of pairs of terms) increases, up to 40-55% improvements for queries with 3 or more terms.
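Schematically, the two-stage evaluation of a single document measured above proceeds as follows. This is a control-flow sketch under our own naming, not the implementation used in the experiments: stage one sums the single-term contributions, and stage two pays the expensive pair-frequency computation one pair at a time, stopping as soon as the document, even credited with the upper bounds of all remaining pairs, can no longer overcome the threshold.

```python
# Control-flow sketch of two-stage document scoring. term_scores are the
# stage-one single-term contributions; pair_bounds are the pair score upper
# bounds; pair_score_fns compute each pair's exact (expensive) score.

def two_stage_score(term_scores, pair_bounds, pair_score_fns, threshold):
    """Return the document's final score, or None if scoring terminated
    early (the document cannot enter the top-K)."""
    score = sum(term_scores)            # stage one: single query terms
    remaining = sum(pair_bounds)        # optimistic credit for unscored pairs
    for bound, exact_score in zip(pair_bounds, pair_score_fns):
        if score + remaining <= threshold:
            return None                 # early termination: pf never computed
        score += exact_score()          # stage two: pay for this pair only now
        remaining -= bound
    return score if score > threshold else None

calls = []
pairs = [lambda: calls.append(1) or 2.0, lambda: calls.append(1) or 2.0]
# Threshold too high: pruned immediately, so no pair frequency is computed.
assert two_stage_score([1.0, 0.5], [2.0, 2.0], pairs, threshold=10.0) is None
assert calls == []
```

The saving comes precisely from the branch that returns None: every pair whose exact score is skipped is one position-list intersection that never has to be performed.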
5. CONCLUSIONS
In this work, we examined how to efficiently score documents using dynamic pruning strategies when using the Markov Random Field proximity weighting model [15]. In particular, we discussed how pair posting lists could be used to allow proximity scoring without changes to the underlying index. Moreover, the most efficient way to score documents using DAAT dynamic pruning strategies was discussed. We proposed that dynamic pruning should be performed as a two-stage process (as exemplified by the MaxScoreP and WandP strategies), whereby only single query terms are processed and pruned in the first stage, while the pair posting lists are only considered during the second stage.

We performed large-scale experiments comparing the original and the proposed two-stage dynamic pruning strategies, using a corpus of 50 million documents and 10,000 user queries from a real query log. Our results demonstrated the benefit of the two-stage versions of the dynamic pruning strategies, particularly for queries of 3 or more terms, and are a promising start for the future efficient examination of other proximity weighting models.

Dynamic pruning techniques have previously been shown to be readily applicable in distributed retrieval settings [3]. Similarly, we infer that our strategies can also be applied in distributed retrieval without requiring adaptation.

6. REFERENCES
[1] A. Anagnostopoulos, A. Z. Broder and D. Carmel. Sampling search-engine results. In Proc. of WWW 2005, 245–256.
[2] V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impact scores. In Proc. of SIGIR 2006, 372–379.
[3] R. Baeza-Yates, A. Gionis, F. Junqueira, V. Plachouras and L. Telloli. On the feasibility of multi-site web search engines. In Proc. of CIKM 2009, 425–434.
[4] A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer and J. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. of CIKM 2003, 426–434.
[5] S. Büttcher, C. L. A. Clarke and B. Lushman. Term proximity scoring for ad-hoc retrieval on very large text collections. In Proc. of SIGIR 2006, 621–622.
[6] C. L. A. Clarke, N. Craswell and I. Soboroff. Overview of the TREC 2009 Web track. In Proc. of TREC 2009.
[7] N. Craswell, R. Jones, G. Dupret and E. Viegas. Proc. of the Web Search Click Data Workshop at WSDM 2009.
[8] W. B. Croft, D. Metzler and T. Strohman. Search Engines: Information Retrieval in Practice. Addison Wesley, 2009.
[9] P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194–203, 1975.
[10] R. Fagin, A. Lotem and M. Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci., 66(4):614–656, 2003.
[11] J. Lafferty and C. Zhai. A study of smoothing methods for language models applied to information retrieval. In Proc. of SIGIR 2001, 334–342.
[12] X. Long and T. Suel. Three-level caching for efficient query processing in large web search engines. In Proc. of WWW 2005, 257–266.
[13] C. Macdonald and I. Ounis. Global statistics in proximity weighting models. In Proc. of Web N-gram Workshop at SIGIR 2010.
[14] C. Macdonald, N. Tonellotto and I. Ounis. Upper bound approximations for dynamic pruning. Manuscript submitted for publication, 2010.
[15] D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In Proc. of SIGIR 2005, 472–479.
[16] A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, 14(4):349–379, 1996.
[17] I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald and C. Lioma. Terrier: a high performance and scalable information retrieval platform. In Proc. of OSIR Workshop at SIGIR 2006.
[18] J. Peng, C. Macdonald, B. He, V. Plachouras and I. Ounis. Incorporating term dependency in the DFR framework. In Proc. of SIGIR 2007, 843–844.
[19] R. Schenkel, A. Broschart, S. Hwang, M. Theobald and G. Weikum. Efficient text proximity search. In Proc. of SPIRE 2007, 287–299.
[20] H. Turtle and J. Flood. Query evaluation: strategies and optimizations. Information Processing and Management, 31(6):831–850, 1995.
[21] M. Zhu, S. Shi, M. Li and J.-R. Wen. Effective top-k computation in retrieving structured documents with term-proximity support. In Proc. of CIKM 2007, 771–780.
[22] M. Zhu, S. Shi, N. Yu and J.-R. Wen. Can phrase indexing help to process non-phrase queries? In Proc. of CIKM 2008, 679–688.

APPENDIX
A. WAND NEXT DOCUMENT SELECTION
The selection of the next document performed in the Wand strategy is explained with the help of Figure 3. The posting lists are maintained in increasing order of docid. Then a pivot term is computed, i.e. the first term for which the accumulated sum of the upper bounds of the preceding terms and itself exceeds the current threshold (e.g., term t3, with an accumulated score of 7). The corresponding docid identifies the pivot document, i.e. the smallest docid having a chance to overcome the current threshold. If the current docids of the preceding terms are equal to the pivot document's docid, the document is fully scored. Otherwise, the posting list of one of the preceding terms is moved to the pivot document's docid, and the procedure is repeated. In the example, at the third step a good candidate document is found (docid 23) and it is fully processed. This implementation can benefit from skipping to a particular document in each posting list; however, the selection of the right document requires some additional CPU time.

[Figure 3: How the Wand strategy selects the next document to score. Posting lists are shown sorted by their current docid, each annotated with its term upper bound (current docid: 4, current threshold: 6); the pivot term and the pivot document are highlighted.]
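The pivot computation described above can be sketched as follows, using the values of Figure 3. This is a simplified illustration under our own naming; the skipping and cursor movement of the full algorithm are omitted.

```python
# Sketch of Wand pivot selection: posting lists are kept sorted by their
# current docid, and the pivot term is the first list at which the
# accumulated term upper bounds exceed the current threshold.

def find_pivot(docids, bounds, threshold):
    """docids[i] and bounds[i] are the current docid and the term upper
    bound of the i-th posting list (lists sorted by docid). Returns the
    index of the pivot term, or None if no document can overcome the
    threshold."""
    acc = 0.0
    for i, b in enumerate(bounds):
        acc += b
        if acc > threshold:
            return i
    return None

# Figure 3's example: lists at docids 11, 22, 23, 23 with term upper bounds
# 2, 1, 4, 3 and current threshold 6. Accumulated bounds are 2, 3, 7, so
# the third list (t3) is the pivot term and docid 23 is the pivot document.
docids, bounds = [11, 22, 23, 23], [2.0, 1.0, 4.0, 3.0]
pivot = find_pivot(docids, bounds, 6.0)
assert pivot == 2 and docids[pivot] == 23
```

When the preceding lists do not yet sit on the pivot docid, one of them is advanced (skipped) to it and the pivot is recomputed, which is where the CPU cost mentioned above arises.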