<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Random input helps searching predecessors</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Djamal Belazzougui</string-name>
          <email>dbelazzougui@cerist.dz</email>
          <aff>CAPA, DTISI, CERIST, Algeria</aff>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexis C. Kaporis</string-name>
          <aff>ICSD, University of the Aegean and Hellenic Open University, <country country="GR">Greece</country></aff>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul G. Spirakis</string-name>
          <aff>University of Liverpool and University of Patras</aff>
        </contrib>
      </contrib-group>
      <fpage>106</fpage>
      <lpage>115</lpage>
      <abstract>
        <p>A data structure problem consists of finite sets D of data, Q of queries, and A of query answers, associated with a function f : D × Q → A. The data structure of a file X is "static" ("dynamic") if we do not (do) require quick updates as X changes. An important goal is to compactly encode a file X ∈ D, such that for each query y ∈ Q the function f(X, y) requires the minimum time to compute an answer in A. This goal is trivial if the size of D is large, since for each query y ∈ Q it was shown that f(X, y) requires O(1) time for the most important queries in the literature. Hence, this goal becomes interesting to study as a trade-off between the "storage space" and the "query time", both measured as functions of the file size n = |X|. The ideal solution would be to use linear O(n) = O(|X|) space, while retaining a constant O(1) query time. However, if f(X, y) computes the static predecessor search (find the largest x ∈ X : x ≤ y), then Ajtai [Ajt88] proved a negative result: using just n^{O(1)} = |X|^{O(1)} data space, it is not possible to evaluate f(X, y) in O(1) time for every y ∈ Q. The proof exhibited a bad distribution of the data D such that there exists a query y* ∈ Q (a "difficult" query y*) for which f(X, y*) requires ω(1) time. Essentially, [Ajt88] is an existential result, resolving the worst-case scenario. But [Ajt88] left open the question: do we typically, that is, with high probability (w.h.p.), encounter such "difficult" queries y* ∈ Q when assuming reasonable distributions with respect to (w.r.t.) queries and data? Below we make reasonable assumptions w.r.t. the distribution of the queries y ∈ Q, as well as w.r.t. the distribution of the data X ∈ D. In two interesting scenarios studied in the literature, we resolve the typical (w.h.p.) query time.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Overview</title>
      <p>First scenario: we assume an arbitrary distribution of the data X ∈ D, but we restrict ourselves to uniform queries y ∈ Q. We show that Ajtai's result is not typical: by just using O(n) space, w.h.p. f(X, y) can be computed in O(1) time. Then we extend the w.h.p. O(1) time to average O(1) time.</p>
      <p>Second scenario: we assume arbitrary queries y ∈ Q, but we restrict to "real world" (f1, f2)-smooth [AM93, MT93, Wil85] distributions of the data X ∈ D. (i) We show that w.h.p. f(X, y) can be computed in O(1) time using just O(n^{1+ε}) space, for any constant ε &gt; 0; we also extend the w.h.p. O(1) time to average O(1) time. (ii) We limit the data space further, to n^{1+1/log log n} = n^{1+o(1)}, and show that w.h.p. f(X, y) can be computed in the slightly higher time O(log log log n). (iii) Finally, we limit the data space even more, to O(n), and show that w.h.p. f(X, y) can be computed in O(log log n) time.</p>
      <p>Introduction</p>
      <p>The problem. Let X be a dynamic file of n keys, each stored in a memory cell with ℓ ≥ b bits, where b = log |U| is the word length of any key in the universe U = {1, …, 2^b}. The ℓ bits must uniquely represent the n keys in file X, therefore ℓ ≥ log n. The data size of X, the total number of bits stored in the n memory cells, must in our work be O(n^{1+ε}) for every ε &gt; 0; therefore, as we show, ℓ must be O(log n). To prove time upper bounds in the RAM model of computation, we use O(log n) bits per cell, which makes realistic the assumption that cell-content-based operations take O(1) time in the C language. Thus, in this context of the cell probe model [Yao81] that we use, it is not realistic if cells contain real numbers, or keys with arbitrarily large digit precision. We consider queries such as: Membership Memb(y, X) "determine if y ∈ X, for any y ∈ U", Insertion Ins(y, X) "insert y into X", Deletion Del(y, X) "delete y ∈ X", and Predecessor Pred(y, X) "determine the largest x ∈ X such that x ≤ y, for an arbitrary y ∈ U". An intrinsic trade-off arises between size and time, as the following two examples suggest. If file X is static and we wish the minimum time but compromise space, then for every y ∈ U we can tabulate, using 2^b = |U| cells, the corresponding predecessor in X. Inspecting the appropriate cell, we get the predecessor in O(1) time. But the number of cells equals the universe size |U| = 2^b, which may be arbitrarily larger than the file size |X|. In the other example, if we wish the minimum space but compromise time, we can store in O(n) cells the ordered keys of X and, via binary search, get the predecessor of any y ∈ U in O(log n) time. Paragraph (i) below presents related work with no input distributional assumptions; it describes how prohibitive time lower bounds arise as space gets closer to linear. Then paragraph (ii) presents related work with input distributional assumptions; it shows that such time lower bounds can be surpassed, as space becomes almost linear, via careful probabilistic analyses.</p>
      <p>Our goal. We show in our main results, Theorem 2.1 (Section 2) and Theorem 3.1 (Section 3), that under reasonable assumptions about the distributions of queries and data, queries w.h.p. take close to O(1) time, while space is at most O(n^{1+ε}) for every ε &gt; 0 (almost linear). To the best of our knowledge, these are the best space bounds w.r.t. the total bits stored such that O(1) time can be achieved, as paragraph (ii) in the related work suggests.</p>
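      <p>The two extremes of the trade-off above can be sketched in a few lines; a minimal illustration, assuming a tiny universe U = {1, …, 16} and returning None when no predecessor exists:

```python
from bisect import bisect_right

def build_table(keys, universe_size):
    """Tabulate the predecessor of every y in U: O(1) query, |U| cells."""
    key_set = set(keys)
    table, pred = [], None
    for y in range(1, universe_size + 1):
        if y in key_set:
            pred = y
        table.append(pred)      # table[y - 1] holds the predecessor of y
    return table

def pred_by_binary_search(sorted_keys, y):
    """O(n) cells and O(log n) query time, via binary search."""
    i = bisect_right(sorted_keys, y) - 1
    return sorted_keys[i] if i >= 0 else None

keys = [3, 7, 12]               # must be sorted for the binary search
table = build_table(keys, 16)
```
</p>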
    </sec>
    <sec id="sec-2">
      <title>Space vs time complexity results</title>
      <p>(i) Let us start with the related work without assumptions about query and data distributions. Membership requires O(1) time and O(n) space [FKS84, Pag00]. Willard [Wil83] showed an O(n) space and O(log b) = O(log log |U|) time upper bound for predecessor search. Ajtai's [Ajt88] original time lower bound, revisited by Miltersen [Mil94], shows that there exists a file size n, as a function of the memory cell bits ℓ, such that predecessor search takes Ω(√(log ℓ)) time when space is limited to poly(n), or, equivalently, there exists an ℓ, as a function of n, such that the time becomes Ω(∛(log n)). Later, [FW93] improved the time upper bound in [Wil83] to O(min{log n / log ℓ, log ℓ}) = O(√(log n)) using O(n) space, where here, and in the related work of this paragraph, ℓ = b = log |U|. The followup [MNSW98] generalized the lower bound in [Ajt88] to randomized algorithms. Beame and Fich [BF02] improved the corresponding lower bounds in [Ajt88] up to Ω(log ℓ / log log ℓ) and Ω(√(log n / log log n)), respectively. Similar lower bounds appeared in [Xia92]. Also, [BF02] improved the time upper bounds to O(min{log n / log ℓ, log ℓ / log log ℓ}) = O(√(log n / log log n)), but using O(n²) space, while posing the important question of whether their time bound can be achieved in linear space. The work of Sen and Venkatesh [SV08] yields the lower bound Ω(min{log n / log b, log ℓ / log log S}), with S the total file size and b ≤ ℓ. Finally, the work [PT06a] proved detailed optimal space vs time trade-offs, and the followup [PT07] generalized them to randomized algorithms. In particular, [PT06a] answered the important question in [BF02], showing that with space as close to linear as n^{1 + 1/exp(lg^{1−ε} ℓ)}, for every ε &gt; 0, the O(log ℓ / log log ℓ) time is achievable; here lg x = ⌈log₂(x + 2)⌉. However, [PT06a] also proved that if the space drops to n·log^{O(1)} n, the van Emde Boas (vEB) O(lg ℓ) time cannot be improved.</p>
      <p>(ii) We proceed with the related work that assumes query and data distributions. Peterson's experimental Interpolation Search (IS) [Pet57] for predecessors assumed n uniform real keys stored in a static file X. The idea of IS is to use the input distribution to estimate the expected location, in the static file X, of the predecessor of each query y ∈ U. Yao and Yao [YY76] proved that IS takes Θ(log log n) average time for a static file X with n uniformly distributed real keys. Interestingly, their result and the subsequent ones presented below surpass the time bounds mentioned in the paragraph above. However, the space is not bounded up to O(n) w.r.t. the total bits stored in the cells, since this and subsequent results simplify their probabilistic analysis [KMSTTZ06, Sect. 2] by assuming real keys of arbitrary precision. In [GRG80, PR77, Gon77] several aspects of IS are described and analyzed on (almost) uniform and static files X. Willard [Wil85] proved the same time bound assuming regular input distributions of real keys, significantly extending over the uniform input assumed before. Recently, [DJP04] considered non-random input data of a static file X that possess enough "pseudo-randomness" for effective IS to be applied. The study of a dynamic file X, with uniform insertions/deletions, was initiated in [Fre83, IKR81]. In [Fre83] the dynamic data structure supports insertions/deletions of real keys in O(n^ε) time, ε &gt; 0, and O(log log n) expected time for IS. The dynamic structure of [IKR81] has expected insertion time O(log n) for real keys and amortized insertion time O(log² n), and it was claimed to support IS. Mehlhorn and Tsakalidis [MT93] analyzed a dynamic version of IS, the Interpolation Search Tree (IST), with O(log log n) expected search/update time for the (f1, f2)-smooth continuous distributions of real keys, which are significantly more general than the regular ones in [Wil85]. Andersson and Mattsson [AM93] further generalized and refined the notion of (f1, f2)-smooth continuous distributions of real keys from [MT93] and presented the Augmented Sampled Forest, extending the class of continuous input distributions with Θ(log log n) expected search time. Notably, they surpassed the lower bounds of the previous paragraph, achieving w.h.p. O(1) time for bounded densities of real keys. Their use of real keys made it elusive to bound the space w.r.t. the total bits stored. In [KMSTTZ03], with real keys as in [AM93], a finger-search version of IS was studied, with O(1) worst-case update time (update position given) and w.h.p. O(log log d) expected search time. The novelty here is that the search initiates from a leaf node (pointed to by a finger) of the search tree, where d is the distance from the fingered leaf up to the searched key. Then [KMSTTZ06] initiated the probabilistic analysis for discrete (f1, f2)-smooth distributions with keys of limited bits. It alleviated the obstacles of previous work about retaining randomness per recursive step. However, in [KMSTTZ06] no space upper bounds w.r.t. the total of bits were proven. Recently, [BHM13] reviewed how distributional information improves time and space. Also, [BFHM16] bounds the time w.r.t. the entropy of the input distribution, while bounding the space w.r.t. the universe size. Finally, a preliminary work [Cun15] assumes a smooth input distribution w.r.t. queries and claims to achieve O(log log n) time in O(n) space.</p>
      <p>Smooth data distributions. [MT93] introduced the class of (f1, f2)-smooth real distributions, subsequently generalized by [AM93]. For appropriate f1, f2 parameters the class contains the regular [Wil85], as well as the bounded (an extension of uniform) distributions [GRG80, PR77, Gon77]. Recently, [KMSTTZ06] introduced the analogous Definition 1.1 for discrete distributions.</p>
      <p>Definition 1.1 ([KMSTTZ06]). An unknown discrete probability distribution P is (f1, f2)-smooth w.r.t. the interval [a, b] that corresponds to the discrete universe U = {a, …, b} of the input keys, if there exists β &gt; 0 such that for all a ≤ c1 &lt; c2 &lt; c3 ≤ b and all n ∈ ℕ, for a P-random key y it holds that:</p>
      <p>Pr[c2 − (c3 − c1)/f1(n) ≤ y ≤ c2 | c1 ≤ y ≤ c3] = Σ_{xi = c2 − (c3 − c1)/f1(n)}^{c2} P[c1, c3](xi) ≤ β·f2(n)/n,</p>
      <p>where P[c1, c3](xi) = 0 for all xi &lt; c1 or xi &gt; c3, and P[c1, c3](xi) = P(xi) / Σ_{xj ∈ [c1, c3]} P(xj) for all xi ∈ [c1, c3].</p>
      <p>In Definition 1.1, function f1 partitions an arbitrary subinterval [c1, c3] ⊆ [a, b] into f1 equal parts, each of length (c3 − c1)/f1 = O(1/f1). That is, f1 measures how fine the partitioning is. Function f2 guarantees that no part, of the f1 possible, gets more probability mass than β·f2/n; that is, f2 measures the sparseness of any subinterval [c2 − (c3 − c1)/f1, c2] ⊆ [c1, c3]. In fact, any probability distribution is (f1, Θ(n))-smooth, for a suitable choice of β. The most general class of unknown (f1, f2)-smooth input distributions (Definition 1.1) for which O(log log n) expected search time is achievable by IS was defined in [AM93], with f1, f2 parameters f1(n) = n / log^{1+ε} log n and f2(n) = n^α, α &lt; 1. In our Theorem 3.1 (Section 3) we consider more general parameters: f1(n) = n^α and f2(n) = n^β, with constants 0 &lt; α &lt; 1, β &gt; 0.</p>
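      <p>To make Definition 1.1 concrete, the inequality can be probed empirically on a candidate pmf; a minimal sketch with assumed illustrative parameters f1(n) = n^{1/2}, f2(n) = n^{3/4}, β = 4 (not parameters taken from the paper):

```python
import random

def smooth_violations(pmf, f1, f2, n, beta, trials=200, seed=1):
    """Count sampled triples c1, c2, c3 violating Definition 1.1: the
    conditional mass of the length-(c3 - c1)/f1(n) part ending at c2,
    inside [c1, c3], must stay below beta * f2(n) / n."""
    U = len(pmf)
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        c1, c2, c3 = sorted(rng.sample(range(U), 3))
        total = sum(pmf[c1:c3 + 1])            # normalizing mass of [c1, c3]
        if total == 0:
            continue
        lo = max(c1, c2 - (c3 - c1) // f1(n))  # left end of the probed part
        part = sum(pmf[lo:c2 + 1]) / total     # conditional mass of the part
        if part > beta * f2(n) / n:
            bad += 1
    return bad

n = 1024
uniform = [1.0 / n] * n   # uniform pmf over a universe of n values
violations = smooth_violations(uniform, lambda m: int(m ** 0.5),
                               lambda m: m ** 0.75, n, beta=4.0)
```

With these parameters the uniform pmf passes every sampled triple, while choosing too strict an f2 makes the bound fail.</p>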
      <p>Uniform queries, arbitrary data distribution: O(1) time &amp; O(n) space</p>
      <p>Theorem 2.1. In a static set X stored by an adversary, regardless of the distribution of the data D, (almost) uniform queries take O(1) time w.h.p.</p>
      <p>Proof. It suffices to assume that the queries are uniformly distributed, while an adversary selects from the universe U and stores in file X the n keys in the worst possible way. Our approach is simple: divide the universe U into n·log^c n equally sized buckets (c a constant) and store a bitmap in which we set to 1 only the non-empty buckets. This bitmap has size m with redundancy m/log^c m and supports O(1) rank queries [Pat08]. It contains n ones (as only n = |X| buckets can be non-empty) and n·log^c n − n zeros, thus it can be encoded in O(n) space while supporting the needed queries in O(1) time. In this bitmap, if a query key lands in an empty bucket i, we immediately get its predecessor, which is the largest key in bucket j &lt; i, the closest non-empty bucket before the empty bucket i. A query chosen uniformly at random lands in an empty bucket with probability 1 − n/(n·log^c n) = 1 − 1/log^c n = 1 − o(1), i.e., w.h.p., and then the query is answered in O(1) time. Only when a query lands in a non-empty bucket (which happens with the remaining probability 1/log^c n = o(1)) do we need to do more complicated things, taking up to O(√(log n / log log n)) = o(log^c n) time (see Section 4.3 for this worst-case guarantee), to answer the predecessor query in ω(1) time. In turn, this yields O(1) average time.</p>
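      <p>A toy version of this construction, with the O(1)-rank succinct bitmap of [Pat08] replaced by plain Python lists and a backwards scan, purely for illustration:

```python
import math
from bisect import bisect_right

def build_buckets(keys, universe_size, c=1):
    """Split U into about n * log^c n equal buckets; keep one bit per
    bucket (1 = non-empty) plus each bucket's sorted keys."""
    n = len(keys)
    m = max(1, int(n * math.log2(max(n, 2)) ** c))   # number of buckets
    width = -(-universe_size // m)                   # ceil(universe_size / m)
    bitmap = [0] * m
    bucket = [[] for _ in range(m)]
    for x in sorted(keys):
        bitmap[x // width] = 1
        bucket[x // width].append(x)
    return bitmap, bucket, width

def pred(bitmap, bucket, width, y):
    b = y // width
    i = bisect_right(bucket[b], y) - 1   # look inside y's own bucket first
    if i >= 0:
        return bucket[b][i]
    while b > 0:                         # scan stands in for O(1) rank/select
        b -= 1
        if bitmap[b]:
            return bucket[b][-1]         # largest key of the nearest
    return None                          # non-empty bucket to the left
```
</p>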
      <p>Arbitrary queries, smooth data distribution: O(1) time &amp; O(n^{1+ε}) space for every ε &gt; 0. In this section we prove our main Theorem 3.1 using the data structure DS illustrated in Figure 1.</p>
      <p>Theorem 3.1. Consider arbitrary queries y ∈ Q. Data X ∈ D are drawn from an unknown (f1, f2)-smooth distribution (Definition 1.1), with f1(n) = n^α and f2(n) = n^β, with constants 0 &lt; α &lt; 1, β &gt; 0. For any positive constant ε &gt; 0 the following hold: a. DS requires O(n^{1+ε}) space and construction time and supports Θ(n^{1+ε}) queries, where Ins(y, X) and Del(y, X) can occur in arbitrary order, subject to the load constraint that Θ(n) = C_max·n keys exist in DS over all query steps. b. DS supports O(1) query time w.h.p. c. DS supports O(log log log n) query time w.h.p., using only n^{1+1/log log n} = n^{1+o(1)} space (and construction time). d. DS supports O(log log n) query time w.h.p., using O(n) space (and build time). e. DS supports O(1) w.h.p. amortized query time. f. DS supports worst-case O(√(log n / log log n)) query time (Section 4.3).</p>
      <p>During the preprocessing phase, which takes O(n^{1+ε}) time, we build in Section 3.1 the upper static vEB tree shown in Figure 1 and prove its time &amp; space bounds in Corollary 3.4. During the operational phase, which consists of Θ(n^{1+ε}) queries, the role of this upper static vEB tree is to place each query y ∈ Q into the appropriate lower dynamic bucket (pointed to by a leaf), which in turn answers the query y ∈ Q by implementing a q*-heap. More precisely, in Section 3.2 we construct the lower dynamic buckets illustrated in Figure 1, which are implemented as q*-heaps with time &amp; space bounds proved in Corollary 3.6. Hence, Theorem 3.1 follows directly by combining Corollary 3.4 and Corollary 3.6. For the worst-case time, we use two dynamic predecessor search data structures [BF02] with worst-case O(√(log n / log log n)) query and update times and linear space; see Section 4.3.</p>
    </sec>
    <sec id="sec-3">
      <title>The upper static vEB tree in Figure 1</title>
    </sec>
    <sec id="sec-4">
      <title>Construction of the upper static vEB tree in Figure 1</title>
      <p>As is folklore in the literature of dynamic IS, during the preprocessing phase we draw n keys from U = {0, …, 2^b} according to an unknown (f1, f2)-smooth distribution (Definition 1.1), with parameters as in Theorem 3.1: f1(n) = n^α, f2(n) = n^β, where 0 &lt; α &lt; 1 and β &gt; 0 are constants. We order the n keys increasingly:</p>
      <sec id="sec-4-1">
        <title>We select Δ − 1 &lt; n representative keys (order statistics)</title>
        <p>X = {x1 ≤ … ≤ xn}   (1)</p>
        <p>R = {r0, …, r_Δ}   (2)</p>
        <p>from (1), setting r0 = 0 and r_Δ = 2^b; for each 0 &lt; i &lt; Δ, the (bucket) representative key ri ∈ R in (2) is the key x_{i·ρ(n)·n} ∈ X in (1). These consecutive keys in R in (2) are the endpoints of the Δ consecutive buckets that constitute the lower dynamic part of the data structure (Figure 1). The value ρ(n)·n = Θ(log n) is tuned in Section 4.2 (the proof of Property 3.2) and Section 4.1 (the proof of Property 3.5). Intuitively, these Δ − 1 keys in R are sufficiently "ρ(n)·n apart" so that in each part of the partition P (Property 3.2 below) there appears w.h.p. at most one representative from R. This follows from Property 3.2 (proved in Section 4.2), since each part of P w.h.p. contains fewer than ρ(n)·n keys from X.</p>
        <p>Property 3.2. If we partition by P the interval [0, 2^b], with endpoints dictated by the universe U = {0, …, 2^b} of the discrete input keys, into |P| = n^{C1}, C1 = O(1), equal-sized parts, then w.h.p. each part contains at most one representative from R in (2).</p>
        <p>Thus, using the partition P of Property 3.2, we uniquely index each representative key ri ∈ R in (2) with log |P| = C1 log n = O(log n) bits. Let R̃ be this indexing of R in (2) obtained from the partition P. We store R̃ as the nodes of the upper static vEB data structure (Figure 1), as described in [PT06b, Lem. 17].</p>
        <p>During the operational phase, which consists of Θ(n^{1+ε}), 0 &lt; ε &lt; 1, queries, we use the upper static vEB tree (Figure 1) and, for each query key y ∈ Q, determine in O(1) time the i_y-th lower dynamic bucket (Figure 1), whose endpoints in R̃ satisfy r̃_{i_y} ≤ y &lt; r̃_{i_y + 1}. This is done by running the query Pred(y, R̃) in the upper vEB tree, which returns r̃_{i_y} in O(1) time.</p>
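        <p>The selection of R in (2) and the routing of a query key to its bucket can be sketched as follows; a minimal illustration in which binary search stands in for the O(1)-time Pred(y, R̃) of the upper vEB tree, and `spacing` plays the role of ρ(n)·n:

```python
from bisect import bisect_right

def choose_representatives(sorted_keys, universe_top, spacing):
    """Pick every `spacing`-th order statistic as a bucket representative,
    with sentinels r0 = 0 and the universe top as the last endpoint."""
    reps = [0]
    for i in range(spacing, len(sorted_keys), spacing):
        reps.append(sorted_keys[i])
    reps.append(universe_top)
    return reps

def bucket_of(reps, y):
    """Index i_y of the bucket whose endpoints enclose y; binary search
    here, O(1) via the upper static vEB tree in the actual structure."""
    return bisect_right(reps, y) - 1
```
</p>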
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Space &amp; time bounds for the upper static vEB tree in Figure 1</title>
      <p>These are derived from Theorem 3.3 and Corollary 3.4 below.</p>
      <p>Theorem 3.3 ([PT06b, Lem. 17]). Let γ be any positive integer. Given any set S ⊆ [0, m − 1] of n keys, we can build in O(n·2^γ·log m) time a static data structure which occupies O(n·2^γ·log m) bits of space and solves the static predecessor search problem in O(log((log m)/γ)) time per query.</p>
      <p>In our case and terminology, we map the characteristic function of the set R̃ to a bitstring S of length |S| = m, where m equals the number of possible values of the keys stored in R̃. To figure out how large m is, recall that each key r̃i ∈ R̃ is uniquely indexed by P (Property 3.2) with C1 log n bits. Hence the number m of possible values of the keys in R̃ is |S| = m = 2^{C1 log n}. Corollary 3.4 below follows by instantiating the parameter γ of Theorem 3.3.</p>
    </sec>
    <sec id="sec-6">
      <title>The lower dynamic structure of q*-heaps in Figure 1</title>
      <p>Corollary 3.4. Setting γ = ε log n, the predecessor data structure built on R̃ takes O(n^{1+ε}) space and answers queries in time O(log((log m)/γ)) = O(log(C1/ε)) = O(1). Setting γ = log n / log log n, the data structure built on R̃ takes O(n^{1+1/log log n}) = O(n^{1+o(1)}) space and answers queries in time O(log((log m)/γ)) = O(log(C1 log log n)) = O(log log log n). Finally, setting γ = 1, the data structure built on R̃ takes O(n) space and answers queries in time O(log((log m)/γ)) = O(log(C1 log n)) = O(log log n).</p>
      <p>In the lower dynamic structure in Figure 1, the i-th bucket is implemented as a q*-heap and has endpoints r̃i, r̃i+1 ∈ R̃, which are indexed with C1 log n bits using P (Property 3.2), which truncates the representatives ri, ri+1 ∈ R in (2). Each query y ∈ Q is landed by the upper static vEB tree into the i_y-th lower dynamic bucket, with endpoints satisfying r̃_{i_y} ≤ y &lt; r̃_{i_y + 1}. The i_y-th bucket is located by running the query Pred(y, R̃) in the upper vEB tree, which returns r̃_{i_y} in O(1) time. Given that Property 3.5 (proved in Section 4.1) holds, we get in Corollary 3.6 the space &amp; time bounds.</p>
      <p>Property 3.5. During the operational phase, which consists of Θ(n^{1+ε}), 0 &lt; ε &lt; 1, queries, each lower dynamic bucket (Figure 1) w.h.p. has load C3 log n ≤ M ≤ C5 log n, with C3, C5 = O(1).</p>
      <p>Corollary 3.6 (Corollary 3.2, [Wil92]). Assume that in a database of n elements we have available the use of precomputed tables of size o(n). Then, for sets of arbitrary cardinality M ≤ n, it is possible to have available variants of q*-heaps, using O(M) space, that have a worst-case time of O(1 + log M / log log n) for member, predecessor, and rank searches, and that support an amortized time of O(1 + log M / log log n) for insertions and deletions.</p>
      <p>To apply Corollary 3.6 in our case, the database is the dynamic file X with n elements, so for our purposes it suffices to use a precomputed table of size o(n). Also, the sets of arbitrary cardinality M are, in our case, the lower dynamic buckets (Figure 1), each with endpoints [r̃i, r̃i+1), implemented as q*-heaps. By Property 3.5 (proved in Section 4.1), each such bucket w.h.p. has load within C3 log n ≤ M ≤ C5 log n, with C3, C5 = O(1). It follows that w.h.p. each q*-heap time is O(1 + log M / log log n) = O(1 + log(C5 log n) / log log n) = O(1), and the space per q*-heap is O(M) = O(C5 log n) = O(log n). Also, the exponentially small failure probability, when multiplied by the worst-case O(√(log n / log log n)) time proved in Section 4.3, yields O(1) average query time.</p>
      <p>Technical properties</p>
    </sec>
    <sec id="sec-7">
      <title>Proof of Property 3.5</title>
      <p>In the lower dynamic structure in Figure 1, the i-th bucket is implemented as a q*-heap and has endpoints r̃i, r̃i+1 ∈ R̃, which are indexed with C1 log n bits using P (Property 3.2), which truncates the representatives ri, ri+1 ∈ R in (2). Let qi(n) be the i-th bucket's probability mass under the unknown smooth (Definition 1.1) distribution P, over the part with endpoints the representatives ri, ri+1 ∈ R in (2); that is, qi(n) = Σ_{y ∈ U: ri ≤ y &lt; ri+1} P(y). Furthermore, by the construction of the ordering in (2), these ri, ri+1 are ρ(n)·n = Θ(log n) apart; that is, during the initialization of the data structure, exactly ρ(n)·n = Θ(log n) random keys appear between ri and ri+1. Now assume the i-th bucket's bad scenario, namely that this probability mass is either qi(n) = ω(ρ(n)) or qi(n) = o(ρ(n)). The probability of this bad scenario is dominated by the Binomial expression:</p>
      <p>C(n, ρ(n)·n) · qi(n)^{ρ(n)·n} · (1 − qi(n))^{(1 − ρ(n))·n} ≤ [qi(n)/ρ(n)]^{ρ(n)·n} · [(1 − qi(n))/(1 − ρ(n))]^{(1 − ρ(n))·n},</p>
      <p>which vanishes exponentially in n if qi(n) = ω(ρ(n)) or if qi(n) = o(ρ(n)). Note that there are Δ &lt; n such bad scenarios (the possible buckets); hence the union bound gives</p>
      <p>n · max_{i ∈ [Δ]} [qi(n)/ρ(n)]^{ρ(n)·n} · [(1 − qi(n))/(1 − ρ(n))]^{(1 − ρ(n))·n},</p>
      <p>which also vanishes if qi(n) = ω(ρ(n)) or if qi(n) = o(ρ(n)). Thus, w.h.p. each bucket has probability mass Θ(ρ(n)) = Θ(log n / n). From exponentially strong tail bounds, w.h.p. no bucket contains fewer than C3 log n keys or more than C5 log n keys per query step, for appropriate constants C3, C5 = Θ(1).</p>
      <p>Proof of Property 3.2. We partition the universe into n^{C1}, C1 = O(1), equal parts, such that each part of P gets expected load log n &lt; C5 log n during each query step t = 1, …, Θ(n^{1+ε}) of the operational phase. The result follows by exponential tail bounds w.r.t. the load of the parts of P, since by Property 3.5 each pair ri, ri+1 ∈ R in (2) is at least C5 log n apart, with the constant C5 appropriately high.</p>
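      <p>The concentration claimed above can be illustrated numerically. A small simulation with assumed toy parameters (uniform keys, spacing k = ⌈log₂ n⌉, a linear scan instead of real predecessor search): bucket endpoints are every k-th order statistic of one sample, and we count where a fresh sample of n keys lands:

```python
import math
import random

def bucket_loads(n, universe, seed=7):
    """Every k-th order statistic (k about log n) of one sample forms the
    bucket endpoints; count the loads induced by a fresh sample of n keys."""
    rng = random.Random(seed)
    k = max(1, int(math.log2(n)))
    sample = sorted(rng.randrange(universe) for _ in range(n))
    reps = [0] + sample[k::k] + [universe]   # bucket endpoints
    loads = [0] * (len(reps) - 1)
    for _ in range(n):
        y = rng.randrange(universe)
        b = 0
        for j in range(1, len(reps)):        # locate y's bucket by scanning
            if reps[j] > y:
                b = j - 1
                break
        loads[b] += 1
    return loads, k

loads, k = bucket_loads(4096, 1000000)
```

In runs of this sketch the maximum load stays within a small constant factor of k, in line with the Θ(log n) bucket load of Property 3.5.</p>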
      <p>We construct P recursively, noting that the endpoints of each part of P are a deterministic function of the parameters f1, f2 assumed in our main Theorem 3.1. Also, in Theorem 3.1 we assume that per query step t = 1, …, Θ(n^{1+ε}) the total number n_t of stored keys is &lt; C_max·n = Δ, with C_max a constant. In the 1st recursive step, P partitions the interval [0, 2^b], with endpoints dictated by the universe U = {0, …, 2^b} of the discrete input keys, into f1(Δ) ≥ f1(n_t) equally sized parts. Smoothness (Definition 1.1) implies that each part of P gets at most β·f2(n_t)/n_t ≤ β·f2(Δ)/n_t probability mass per update step t (f2 is increasing w.r.t. n). Here β is a constant (Definition 1.1) depending only on the characteristics of the unknown distribution. Hence, during each query step t, each part of P gets at most β·(f2(Δ)/n_t)·n_t = β·f2(Δ) keys in expectation, of the n_t keys currently stored in X. Moreover, the number of input keys distributed on any such part of P has exponentially small probability of deviating from its expectation β·f2(Δ). Without loss of generality, we will not take the constant β into account in the subsequent deterministic partitioning. This partitioning is applied recursively within each such part of P, until we reach a sufficiently small part with expected number of keys ≤ log n per query step t = 1, …, Θ(n^{1+ε}). Let h be the number of such recursive partitions; then it suffices that:</p>
      <p>Δ^{β^h} ≤ log n  ⟹  h ≥ ln(ln ln(n) / ln(Δ)) / ln(β) &lt; ln(ln ln(n) / ln(n)) / ln(β) = log_β(ln ln(n) / ln(n)),   (5)</p>
      <p>where the strict inequality follows from n &lt; Δ. We upper bound the number of parts of P obtained by the h recursions in (5). In the 1st partition of the universe U, the number of parts of P is f1(Δ^{β^0}) = Θ((Δ^{β^0})^α) = Θ(Δ^{α·β^0}). In the 2nd recursive partition, each part of P is further partitioned into f1(Δ^{β^1}) = Θ(Δ^{α·β^1}) subparts. In general, in the (i + 1)-th recursive partition an arbitrary part of P is divided into f1(Δ^{β^i}) = Θ(Δ^{α·β^i}) subparts. Taking into account Eq. (5), at the final level of the recursive partitioning the total number |P| of subparts is</p>
      <p>|P| = Π_{i=0}^{h} f1(Δ^{β^i}) = Π_{i=0}^{h} Θ(Δ^{α·β^i}) &lt; Π_{i=0}^{h} Δ^{β^i} = Δ^{Σ_{i=0}^{h} β^i},</p>
      <p>so the number of bits needed to index a final part of P is at most log(|P|) = Σ_{i=0}^{h} β^i · (log n + log C_max) ≤ C1 log n.</p>
      <p>Proof of the O(√(log n / log log n)) worst-case time. 1. In the first predecessor data structure, which we denote B1, we initially insert all the numbers in the interval [1, Δ]. B1 will then maintain the set of non-empty buckets during the operational phase. The role of B1 is to ensure the correctness of predecessor queries when some buckets become empty. 2. The second predecessor data structure, which we denote B2, is initially empty. The role of B2 is to store all the overflowing elements which could not be stored in the q*-heaps corresponding to their buckets.</p>
      <p>Handling overflows. In addition, we maintain an array of counters, where each counter ci associated with bucket i stores how many keys are stored inside that bucket, and we assume that the capacity of the q*-heap associated with each bucket is C = Θ(log n). At initialisation, all the buckets have initial load Θ(log n), and all the keys of any bucket i are stored in the corresponding q*-heap. Then, during the operational phase, the insertions/deletions of keys belonging to a given bucket are directed to the corresponding q*-heap, unless it has overflowed. More precisely, when trying to insert a key x into a bucket number i and we observe that ci ≥ C, we instead insert the key x into B2. Symmetrically, when deleting a key x from a bucket i with ci ≥ C, we proceed as follows: if the key x is found in the q*-heap, we delete it from there and additionally look into B2 for any key belonging to bucket i and transfer it to the q*-heap of bucket i (that is, delete it from B2 and insert it into the q*-heap). If the key x to be deleted is not found in the q*-heap, we instead try to delete the key from B2. By using this strategy we ensure that any insertion/deletion in any bucket takes at worst O(√(log n / log log n)) time, and still O(1) time w.h.p. Queries are first directed to the corresponding q*-heap; if the key x is not found there, it is also searched in B2, in time O(√(log n / log log n)). The event of an overflowing bucket is very rare, so the time of queries remains O(1) w.h.p.</p>
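      <p>The counter-and-overflow mechanics can be sketched as follows; sorted Python lists stand in for the q*-heaps and for B2, and the capacity is an assumed toy value:

```python
from bisect import insort

class OverflowBuckets:
    """Each bucket keeps at most `capacity` keys locally (the q*-heap
    stand-in); further insertions spill into the shared structure B2."""
    def __init__(self, num_buckets, capacity):
        self.capacity = capacity
        self.heap = [[] for _ in range(num_buckets)]  # per-bucket sorted keys
        self.count = [0] * num_buckets                # counters c_i
        self.B2 = []                                  # overflow structure

    def insert(self, i, x):
        self.count[i] += 1
        if len(self.heap[i]) >= self.capacity:        # bucket overflowed
            insort(self.B2, (i, x))
        else:
            insort(self.heap[i], x)

    def delete(self, i, x):
        self.count[i] -= 1
        if x in self.heap[i]:
            self.heap[i].remove(x)
            # pull back one overflowed key of bucket i from B2, if any
            for j, (b, z) in enumerate(self.B2):
                if b == i:
                    self.B2.pop(j)
                    insort(self.heap[i], z)
                    break
        else:
            self.B2.remove((i, x))
```
</p>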
      <p>Handling empty buckets. The data structure B1 helps us handle a subtle problem which occurs when we have some empty buckets. Suppose that we have a non-empty bucket i followed by a range of empty buckets [i + 1, j]. Then the answer to any predecessor search directed towards any bucket k ∈ [i + 1, j] should return the largest element in bucket i. Thus, in the highly unlikely case that a predecessor search lands in an empty bucket k (which is checked by verifying that ck = 0), we need to be able to efficiently compute the largest non-empty bucket index i such that i &lt; k. This can be done precisely by querying B1 for the value k, which returns i, as B1 is a predecessor data structure storing precisely the indices of the non-empty buckets, and i is the largest non-empty bucket index preceding k. This last step takes O(√(log Δ / log log Δ)) = O(√(log n / log log n)) time. What remains is to show how to maintain B1. For that, we only need to insert a bucket i into B1 whenever it becomes non-empty after being empty, and to delete it from B1 whenever it becomes empty after being non-empty. These two events (a bucket becoming empty or becoming non-empty) are expected to be rare enough that the time bound for any insertion/deletion remains O(1) w.h.p.</p>
      <p>[Ajt88] M. Ajtai. A lower bound for finding predecessors in Yao's cell probe model. Combinatorica, 8:235-247, 1988.</p>
      <p>[AFK84] M. Ajtai, M. Fredman and J. Komlos. Hash functions for priority queues. Information and Control, 63:217-225, 1984.</p>
      <p>[AM93] A. Andersson and C. Mattsson. Dynamic interpolation search in o(log log n) time. In Proceedings of the 20th Colloquium on Automata, Languages and Programming (ICALP '93), Lecture Notes in Computer Science, 700:15-27, 1993.</p>
      <p>[BF99] P. Beame and F. E. Fich. Optimal bounds for the predecessor problem. STOC '99, pp. 295-304.</p>
      <p>[BF02] P. Beame and F. Fich. Optimal bounds for the predecessor problem and related problems. Journal of Computer and System Sciences, 65(1):38-72, 2002.</p>
      <p>[BKZ77] P. van Emde Boas, R. Kaas and E. Zijlstra. Design and implementation of an efficient priority queue. Mathematical Systems Theory, 10:99-127, 1977.</p>
      <p>[BFHM16] P. Bose, R. Fagerberg, J. Howat and P. Morin. Biased predecessor search. Algorithmica, 76(4):1097-1105, 2016.</p>
      <p>[BHM13] P. Bose, J. Howat and P. Morin. A history of distribution-sensitive data structures. In Space-Efficient Data Structures, Streams, and Algorithms - Papers in Honor of J. I. Munro on the Occasion of His 66th Birthday, Lecture Notes in Computer Science, 8066:133-149, 2013.</p>
      <p>[Cun15] V. Cunat. Predecessor problem on smooth distributions. At https://arxiv.org/abs/1511.08598, 2015.</p>
      <p>[DJP04] E. Demaine, T. Jones and M. Patrascu. Interpolation search for non-independent data. SODA '04, pp. 522-523.</p>
      <p>[Fel71] W. Feller. An Introduction to Probability Theory and Its Applications, Vol. II, 2nd edition. John Wiley &amp; Sons, New York-London-Sydney, 1971.</p>
      <p>[Fre83] G. Frederickson. Implicit data structures for the dictionary problem. Journal of the ACM, 30(1):80-94, 1983.</p>
      <p>[FKS84] M. Fredman, J. Komlos and E. Szemeredi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31:538-544, 1984.</p>
      <p>[FW90] M. L. Fredman and D. E. Willard. Blasting through the information theoretic barrier with fusion trees. In Proceedings of the ACM Symposium on Theory of Computing (STOC '90), pp. 1-7, Baltimore, Maryland, USA.</p>
      <p>[PT06] M. Patrascu and M. Thorup. Time-Space Trade-Offs for Predecessor Search. At https://arxiv.org/abs/cs/0603043, 2006.
[PT07] M. Patrascu and M. Thorup. Randomization does not help searching predecessors. In: Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms (SODA '07), New Orleans, Louisiana, USA, pp. 555-564, 2007.
[PG92] Y. Perl, L. Gabriel. Arithmetic Interpolation Search for Alphabet Tables. IEEE Transactions on Computers, 41(4):493-499, 1992.
[PIA78] Y. Perl, A. Itai and H. Avni. Interpolation Search - A log log N Search. Communications of the ACM, 21(7):550-554, 1978.
[PR77] Y. Perl, E. M. Reingold. Understanding the Complexity of the Interpolation Search. Information Processing Letters, 6(6):219-222, 1977.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [FW93]
          <string-name><given-names>M. L.</given-names> <surname>Fredman</surname></string-name>
          and
          <string-name><given-names>D. E.</given-names> <surname>Willard</surname></string-name>
          .
          <article-title>Surpassing the information theoretic bound with fusion trees</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          ,
          <volume>47</volume>
          :
          <fpage>424</fpage>
          -
          <lpage>436</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Gon77]
          <string-name><given-names>G.</given-names> <surname>Gonnet</surname></string-name>
          .
          <article-title>Interpolation and Interpolation-Hash Searching</article-title>
          .
          <source>PhD Thesis</source>
          , University of Waterloo, Waterloo,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [GRG80]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rogers</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>George</surname>
          </string-name>
          .
          <article-title>An Algorithmic and Complexity Analysis of Interpolation Search</article-title>
          .
          <source>Acta Informatica</source>
          ,
          <volume>13</volume>
          :
          <fpage>39</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Gra06]
          <string-name>
            <given-names>G.</given-names>
            <surname>Graefe</surname>
          </string-name>
          .
          <article-title>B-tree indexes, interpolation search, and skew</article-title>
          .
          <source>In: Proceedings of the Workshop on Data Management on New Hardware (DaMoN '06)</source>
          , Chicago, Illinois, USA, June 25,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [IKR81]
          <string-name>
            <given-names>A.</given-names>
            <surname>Itai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Konheim</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodeh</surname>
          </string-name>
          .
          <article-title>A Sparse Table Implementation of Priority Queues</article-title>
          .
          <source>In: Proceedings of the 8th ICALP, Lecture Notes in Computer Science</source>
          ,
          <volume>115</volume>
          :
          <fpage>417</fpage>
          -
          <lpage>431</lpage>
          ,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [KMSTTZ03]
          <string-name>
            <given-names>A.C.</given-names>
            <surname>Kaporis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Makris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sioutas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsakalidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tsichlas</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaroliagis</surname>
          </string-name>
          .
          <article-title>Improved bounds for finger search on a RAM</article-title>
          .
          <source>ESA '03, Lecture Notes in Computer Science</source>
          ,
          <volume>2832</volume>
          :
          <fpage>325</fpage>
          -
          <lpage>336</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [KMSTTZ06]
          <string-name>
            <given-names>A.C.</given-names>
            <surname>Kaporis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Makris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sioutas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsakalidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tsichlas</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaroliagis</surname>
          </string-name>
          .
          <article-title>Dynamic Interpolation Search Revisited</article-title>
          .
          <source>ICALP '06</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [MT93]
          <string-name>
            <given-names>K.</given-names>
            <surname>Mehlhorn</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsakalidis</surname>
          </string-name>
          .
          <article-title>Dynamic Interpolation Search</article-title>
          .
          <source>Journal of the ACM</source>
          ,
          <volume>40</volume>
          (
          <issue>3</issue>
          ):
          <fpage>621</fpage>
          -
          <lpage>634</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Mil94]
          <string-name><given-names>P. B.</given-names> <surname>Miltersen</surname></string-name>
          .
          <article-title>Lower bounds for Union-Split-Find related problems on random access machines</article-title>
          .
          <source>In: Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC '94)</source>
          , pp.
          <fpage>625</fpage>
          -
          <lpage>634</lpage>
          , Montreal, Quebec, Canada,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [MNSW95]
          <string-name><given-names>P. B.</given-names> <surname>Miltersen</surname></string-name>
          ,
          <string-name><given-names>N.</given-names> <surname>Nisan</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Safra</surname></string-name>
          and
          <string-name><given-names>A.</given-names> <surname>Wigderson</surname></string-name>
          .
          <article-title>On data structures and asymmetric communication complexity</article-title>
          .
          <source>STOC '95</source>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [MNSW98]
          <string-name><given-names>P. B.</given-names> <surname>Miltersen</surname></string-name>
          ,
          <string-name><given-names>N.</given-names> <surname>Nisan</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Safra</surname></string-name>
          and
          <string-name><given-names>A.</given-names> <surname>Wigderson</surname></string-name>
          .
          <article-title>On Data Structures and Asymmetric Communication Complexity</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          ,
          <volume>57</volume>
          (
          <issue>1</issue>
          ):
          <fpage>37</fpage>
          -
          <lpage>49</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Pag00]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pagh</surname>
          </string-name>
          .
          <article-title>Faster deterministic dictionaries</article-title>
          .
          <source>In: Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '00)</source>
          , pp.
          <fpage>487</fpage>
          -
          <lpage>493</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Pat08]
          <string-name>
            <given-names>M.</given-names>
            <surname>Patrascu</surname>
          </string-name>
          .
          <article-title>Succincter</article-title>
          .
          <source>FOCS '08</source>
          , pp.
          <fpage>305</fpage>
          -
          <lpage>313</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [PT06a]
          <string-name>
            <given-names>M.</given-names>
            <surname>Patrascu</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Thorup</surname>
          </string-name>
          .
          <article-title>Time-space trade-offs for predecessor search</article-title>
          .
          <source>In: Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC '06)</source>
          , Seattle, WA, USA, pp.
          <fpage>232</fpage>
          -
          <lpage>240</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [PT06b]
          <string-name>
            <given-names>M.</given-names>
            <surname>Patrascu</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Thorup</surname>
          </string-name>
          .
          <article-title>Time-Space Trade-Offs for Predecessor Search</article-title>
          . At https://arxiv.org/abs/cs/0603043,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Pet57]
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Peterson</surname>
          </string-name>
          .
          <article-title>Addressing for Random Storage</article-title>
          .
          <source>IBM Journal of Research and Development</source>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ):
          <fpage>130</fpage>
          -
          <lpage>146</lpage>
          ,
          <year>1957</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Sen03]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sen</surname>
          </string-name>
          .
          <article-title>Lower bounds for predecessor searching in the cell probe model</article-title>
          .
          <source>IEEE Conference on Computational Complexity</source>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [SV08]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Venkatesh</surname>
          </string-name>
          .
          <article-title>Lower bounds for predecessor searching in the cell probe model</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          ,
          <volume>74</volume>
          :
          <fpage>364</fpage>
          -
          <lpage>385</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Yao81]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Yao</surname>
          </string-name>
          .
          <article-title>Should tables be sorted?</article-title>
          .
          <source>Journal of the ACM</source>
          ,
          <volume>28</volume>
          (
          <issue>3</issue>
          ):
          <fpage>615</fpage>
          -
          <lpage>628</lpage>
          ,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [YY76]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Yao</surname>
          </string-name>
          and
          <string-name>
            <given-names>F. F.</given-names>
            <surname>Yao</surname>
          </string-name>
          .
          <article-title>The Complexity of Searching an Ordered Random Table</article-title>
          .
          <source>In: Proceedings of the 17th IEEE Symposium on Foundations of Computer Science (FOCS '76)</source>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>177</lpage>
          ,
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Wil83]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Willard</surname>
          </string-name>
          .
          <article-title>Log-Logarithmic Worst-Case Range Queries are Possible in Space Θ(N)</article-title>
          .
          <source>Information Processing Letters</source>
          ,
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>84</lpage>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Wil85]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Willard</surname>
          </string-name>
          .
          <article-title>Searching Unindexed and Nonuniformly Generated Files in log log N Time</article-title>
          .
          <source>SIAM Journal on Computing</source>
          ,
          <volume>14</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1013</fpage>
          -
          <lpage>1029</lpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Wil92]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Willard</surname>
          </string-name>
          .
          <article-title>Applications of the Fusion Tree Method to Computational Geometry and Searching</article-title>
          .
          <source>In: Proceedings of the 3rd ACM-SIAM Symposium on Discrete Algorithms (SODA '92)</source>
          , pp.
          <fpage>286</fpage>
          -
          <lpage>295</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Xia92]
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>New bounds in cell probe model</article-title>
          .
          <source>PhD thesis</source>
          , UC San Diego,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>