Metric Pseudoentropy: Characterizations and Applications

Maciej Skorski
Cryptology and Data Security Group, University of Warsaw
maciej.skorski@gmail.com

Abstract. Metric entropy is a computational variant of entropy, often used as a convenient substitute for HILL entropy, which is the slightly stronger, standard notion of entropy in cryptographic applications. In this paper we develop a general method to characterize metric-type computational variants of entropy, in a way that depends only on the properties of a chosen class of test functions (adversaries). As a consequence, we obtain an elegant geometric interpretation of metric entropy. We apply these characterizations to simplify and modularize the proofs of some important results, in particular: (a) the computational dense model theorem, (b) the derivation of the improved version of the Leftover Hash Lemma, and (c) the equivalence between unpredictability entropy and HILL entropy for short strings.

1 Introduction

1.1 Computational Entropy

Entropy. Entropy, as a measure of uncertainty or randomness, is a fundamental notion in information theory. The best-known measure of entropy is Shannon entropy [15]. For cryptographic applications such as extracting randomness, it is more convenient to work with the so-called min-entropy, which bounds the probability that a computationally unbounded adversary guesses a value sampled according to a given distribution. A slightly weaker but also very useful notion, especially in the context of hashing, is collision entropy, which upper-bounds the probability that two independent samples of a given distribution collide.

Defining computational variants of entropy. Computational analogues of entropy can be defined in different ways. In any case, we need to formalize that a distribution has, from a computational point of view, the same or almost the same properties as a distribution having "true" information-theoretic entropy. This might be based on the hardness of compressing and decompressing, the hardness of prediction, or the hardness of distinguishing. In this paper we follow the last approach, which is the most widely used. A good survey of different entropy notions and their properties can be found in [3] and [12]. We stress that, contrary to the information-theoretic case, for computational entropy it is not only the amount of entropy that matters: its quality is important as well.

Computational Indistinguishability. Indistinguishability is a fundamental concept in computational complexity and cryptography. For two distributions $X, Y$ taking values in the same space, a class $\mathcal{D}$ of $[0,1]$-valued functions (referred to as the "attacker class") and a parameter $\epsilon$ (referred to as the "distinguishing advantage"), we say that $X$ and $Y$ are $(\mathcal{D},\epsilon)$-indistinguishable if for all $D\in\mathcal{D}$ we have $|\mathbb{E}\,D(X) - \mathbb{E}\,D(Y)| \leq \epsilon$. An attacker $D$ distinguishes $X$ and $Y$ if $\mathbb{E}\,D(X) - \mathbb{E}\,D(Y) > 0$ or $\mathbb{E}\,D(X) - \mathbb{E}\,D(Y) < 0$, and the farther from $0$ this difference is, the better the "advantage" he achieves. Sometimes we want to define indistinguishability between two sets $\mathcal{X}$ and $\mathcal{Y}$ of probability distributions. We can formalize this by saying that no single adversary $D$ achieves advantage bigger than $\epsilon$ for every pair $(X,Y)$ where $X$ comes from $\mathcal{X}$ and $Y$ comes from $\mathcal{Y}$. Since the expectation $\mathbb{E}\,D(X)$ can be thought of as the scalar product of the vectors representing $D$ and the distribution of $X$, the concept of indistinguishability is exactly the idea of separating hyperplanes.
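To make the vector picture concrete, here is a minimal sketch (Python; the distributions and the attacker class are hypothetical toy data) that computes the advantage $\mathbb{E}\,D(X)-\mathbb{E}\,D(Y)$ as the inner product $\langle D, P_X - P_Y\rangle$ and tests $(\mathcal{D},\epsilon)$-indistinguishability over a small class of boolean test functions.

```python
import numpy as np

def advantage(D, pX, pY):
    """Advantage of distinguisher D: E D(X) - E D(Y) = <D, pX - pY>."""
    return float(np.dot(D, pX - pY))

def indistinguishable(Dclass, pX, pY, eps):
    """(D, eps)-indistinguishability: |E D(X) - E D(Y)| <= eps for all D in the class."""
    return all(abs(advantage(D, pX, pY)) <= eps for D in Dclass)

# Toy example over a 4-point space; the values are illustrative only.
pX = np.array([0.40, 0.30, 0.20, 0.10])   # distribution of X
pY = np.array([0.25, 0.25, 0.25, 0.25])   # uniform distribution
# Hypothetical attacker class: all 16 boolean functions on 4 points.
Dclass = [np.array([(i >> j) & 1 for j in range(4)], dtype=float)
          for i in range(16)]
print(max(abs(advantage(D, pX, pY)) for D in Dclass))  # 0.20
print(indistinguishable(Dclass, pX, pY, eps=0.3))      # True
```

Maximizing the advantage over all boolean tests recovers the statistical distance, the information-theoretic limit of distinguishing; bounded attacker classes can only do worse.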
Computational Entropy. Having formalized the concept of "computational closeness", one can define the "computational" entropy (also called pseudoentropy) of a distribution $X$ in one of the following ways:

(a) (stronger) $X$ is computationally indistinguishable from a single distribution having the required amount of information-theoretic entropy (min-entropy, Shannon entropy, etc.);
(b) (weaker) $X$ is computationally indistinguishable from the set of all distributions having the required amount of information-theoretic entropy.

Both approaches turn out to be useful. Setting the underlying information-theoretic entropy measure to be min-entropy, in case (a) we obtain the notion of HILL entropy [9], which directly generalizes the notion of pseudorandomness, whereas in case (b) we get the notion of the so-called metric entropy [3]. Roughly speaking, with HILL entropy one generalizes most information-theoretic facts about entropy into the computational setting. Metric entropy is commonly thought of as a less intuitive and less well understood notion than HILL entropy. Quite surprisingly, it has proven to be technically more convenient in many problems. The typical approach is to work with metric entropy and then convert it to HILL entropy (which is possible with some loss in quality [3]). For example, the use of metric entropy simplifies and improves the proof of the computational variant of the dense model theorem [2], applicable in leakage-resilient cryptography [7]. Notions of pseudoentropy have also found important applications in general complexity theory; for example, in [18] a HILL-like variant of Shannon entropy is used to simplify the construction of a PRG from a one-way function. These two examples show that the notion of pseudoentropy is a key ingredient of important, even breakthrough, results and as such is worth studying.

Worst Case Distributions. In problems involving computational indistinguishability it is often convenient to know the distribution which makes the attacker's advantage maximal. This distribution is typically subject to some entropy restrictions. In particular, one might ask the following question:

  Given $D$ and $X$, what is the best (minimal) attacker advantage $|\Delta_D| = |\mathbb{E}\,D(X) - \mathbb{E}\,D(Y)|$ over all distributions $Y$ of entropy at least $k$?

An answer to this question yields a bound on how (computationally) close $X$ is to the set of all distributions of entropy $k$. Such problems arise naturally wherever one uses HILL and metric entropy, see for instance [3,5,18].

1.2 Our Results

Summary of our contribution. As mentioned, the idea of characterizing the "worst case" distribution which optimizes the attacker's advantage is very common, though not always explicitly stated [2,3,5,14]. In this paper we give a uniform treatment of this idea and use it to obtain characterizations of pseudoentropy and other interesting corollaries.

Characterizing Metric Pseudoentropy via Optimizing the Attacker's Advantage. Using standard constrained optimization techniques, we develop a general method to characterize metric-type pseudoentropy. A characterization is based on explicitly calculating the distribution which minimizes the attacker's advantage, subject to entropy constraints. These characterizations could be used in studying properties of variants of pseudoentropy based on entropy notions other than min-entropy.
In particular, they could be applied in studying the problem of comparing the amount of metric pseudoentropy against deterministic and randomized adversaries, or in verifying the so-called "chain rule". We also unify the definitions of metric and HILL entropy in a nice geometric way.

Applications: the power of pseudoentropy characterizations. Our technique leads to interesting corollaries beyond the basic properties of pseudoentropy. From the characterization of metric pseudoentropy we immediately obtain the computational Dense Model Theorem [2,7,14]. Extending our characterization to the conditional case, where side information is available to the attacker, we reprove the equivalence between the unpredictability-based and the indistinguishability-based definitions of pseudoentropy for short strings [18]. Finally, from the characterization of collision pseudoentropy we derive the improved Leftover Hash Lemma [1]. Our results show that metric entropy is a powerful tool which deserves systematic study.

2 Preliminaries

Entropy notions. The min-entropy of a distribution $X$ equals $H_\infty(X) = -\log(\max_x \Pr[X=x])$. The collision entropy of $X$ is $H_2(X) = -\log\big(\sum_x \Pr[X=x]^2\big)$. If there is side information $Z$, we define the average conditional min-entropy [6] of $X$ given $Z$ by $\widetilde{H}_\infty(X|Z) = -\log\big(\mathbb{E}_{z\leftarrow Z}\max_x \Pr[X=x\,|\,Z=z]\big)$.

Computational advantage. The advantage of an attacker $D$ in distinguishing random variables $X$ and $Y$, which take values in the same space, is defined to be $\Delta_D(X;Y) = \mathbb{E}\,D(X) - \mathbb{E}\,D(Y)$.

Computational Entropy. There are many ways to define computational analogues of entropy. We follow the most popular approach, which is based on the concept of computational indistinguishability.

Definition 1 (HILL Pseudoentropy [9]). Let $X$ be a distribution with the following property: there exists $Y$ of min-entropy at least $k$ such that for all circuits $D$ of size at most $s$ we have $|\Delta_D(X;Y)| \leq \epsilon$. Then we say that $X$ has $k$ bits of HILL min-entropy of quality $(s,\epsilon)$ and write $H_\infty^{\mathrm{HILL},(s,\epsilon)}(X) \geq k$.

Remark 1 (HILL entropy against different circuit classes). It is known that for HILL entropy all kinds of circuits, deterministic boolean, deterministic real-valued and randomized boolean, are equivalent (for the same size $s$). That is why we can abbreviate the notation and omit declaring the circuit type in Definition 1.

Definition 2 (Metric Pseudoentropy [3]). Let $X$ be a distribution with the following property: for every deterministic boolean (respectively: deterministic real-valued or randomized boolean) circuit $D$ of size at most $s$ there exists $Y$ of min-entropy at least $k$ such that $|\Delta_D(X;Y)| \leq \epsilon$. Then we say that $X$ has $k$ bits of deterministic boolean (respectively: deterministic real-valued or randomized boolean) metric min-entropy of quality $(s,\epsilon)$ and write $H_\infty^{\mathrm{M,det}\{0,1\},(s,\epsilon)}(X) \geq k$ (respectively $H_\infty^{\mathrm{M,det}[0,1],(s,\epsilon)}(X) \geq k$ and $H_\infty^{\mathrm{M,rand}\{0,1\},(s,\epsilon)}(X) \geq k$).

Definitions of HILL and metric entropy for entropy notions other than min-entropy, for instance collision entropy, are obtained by replacing min-entropy with the chosen notion in Definition 1 and Definition 2.

Remark 2 (Metric entropy against different circuit classes). For metric min-entropy it does not matter whether the deterministic circuits are boolean or real-valued (see [12] and the errata of [3]). However, this is not true in the conditional case and does not extend to other entropy notions.
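For concreteness, a minimal sketch (Python; the probability tables are hypothetical toy data) of the three information-theoretic notions above, computed directly from probability vectors.

```python
import numpy as np

def min_entropy(p):
    """H_inf(X) = -log2(max_x Pr[X=x])."""
    return -np.log2(np.max(p))

def collision_entropy(p):
    """H_2(X) = -log2(sum_x Pr[X=x]^2)."""
    return -np.log2(np.sum(p ** 2))

def avg_cond_min_entropy(joint):
    """Average conditional min-entropy of X given Z for a table joint[z, x] = Pr[Z=z, X=x]:
       -log2( E_{z<-Z} max_x Pr[X=x|Z=z] ) = -log2( sum_z max_x Pr[Z=z, X=x] )."""
    return -np.log2(np.sum(np.max(joint, axis=1)))

# Hypothetical toy distributions over 4 points.
p = np.array([0.4, 0.3, 0.2, 0.1])
print(min_entropy(p), collision_entropy(p))      # ~1.32 and ~1.74 bits
joint = np.array([[0.30, 0.10, 0.05, 0.05],      # Pr[Z=0, X=x]
                  [0.05, 0.05, 0.10, 0.30]])     # Pr[Z=1, X=x]
print(avg_cond_min_entropy(joint))               # -log2(0.3 + 0.3) ~ 0.74 bits
```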
Computational Entropy with Side Information. Sometimes we assume that information $Z$ correlated with $X$ might be available to the adversary.

Definition 3 (Conditional HILL Pseudoentropy [10]). Let $X, Z$ be a joint distribution with the following property: there exists $Y$ of average conditional min-entropy at least $k$ given $Z$ such that for all circuits $D$ of size at most $s$ we have $|\Delta_D(X,Z;\,Y,Z)| \leq \epsilon$. Then we say that $X$ given $Z$ has $k$ bits of HILL min-entropy of quality $(s,\epsilon)$ and write $H_\infty^{\mathrm{HILL},(s,\epsilon)}(X|Z) \geq k$.

Remark 3 (HILL entropy against different circuit classes). Similarly to Remark 1, all kinds of circuits, deterministic boolean, deterministic real-valued and randomized boolean, are equivalent here (for the same size $s$).

Definition 4 (Conditional Metric Pseudoentropy [2]). Let $X, Z$ be a joint distribution with the following property: for every deterministic boolean (respectively: deterministic real-valued or randomized boolean) circuit $D$ of size at most $s$ there exists $Y$ of average conditional min-entropy at least $k$ given $Z$ such that $|\Delta_D(X,Z;\,Y,Z)| \leq \epsilon$. Then we say that $X$ given $Z$ has $k$ bits of deterministic boolean (respectively: deterministic real-valued or randomized boolean) metric min-entropy of quality $(s,\epsilon)$ and write $H_\infty^{\mathrm{M,det}\{0,1\},(s,\epsilon)}(X|Z) \geq k$ (respectively $H_\infty^{\mathrm{M,det}[0,1],(s,\epsilon)}(X|Z) \geq k$ and $H_\infty^{\mathrm{M,rand}\{0,1\},(s,\epsilon)}(X|Z) \geq k$).

There is a variant of conditional pseudoentropy where $(X,Z)$ is required to be computationally close to $(Y,Z')$ but $Z'$ is not necessarily the same as $Z$. This notion is called "relaxed" HILL entropy [12] and is denoted by $H^{\mathrm{HILL\text{-}rlx},(s,\epsilon)}(X|Z)$ (for the metric variants, $H^{\mathrm{M\text{-}rlx,det}\{0,1\},(s,\epsilon)}(X|Z)$ and $H^{\mathrm{M\text{-}rlx,det}[0,1],(s,\epsilon)}(X|Z)$). Typically we want $Z'$ to be the same as $Z$, for instance when $Z$ represents information that the adversary might have learned, but the relaxed notion is also useful [8,12]. It satisfies the so-called chain rule, a property desired in leakage-resilient cryptography, which does not hold for HILL entropy [11].

Relations between HILL and Metric Pseudoentropy. For any "reasonable" notion of (information-theoretic) entropy, the metric and HILL variants are equivalent up to some loss in the quality parameters $s, \epsilon$.

Lemma 1 (HILL vs Metric Pseudoentropy, [3]). Let $H$ be an entropy notion which is concave, that is, a convex combination of distributions with entropy at least $k$ is a distribution with entropy at least $k$ (this assumption is fulfilled for most notions, for example for all Rényi entropies, which include min-entropy and collision entropy). Then for any $n$-bit random variable $X$ we have
$$H^{\mathrm{HILL},(s',\epsilon')}(X) \;\geq\; H^{\mathrm{M,det}[0,1],(s,\epsilon)}(X),$$
where $\delta\in(0,1)$ is arbitrary, $s' = O\big(s\cdot\delta^2/n\big)$ and $\epsilon' = \epsilon + \delta$. The same is true for conditional pseudoentropy and relaxed pseudoentropy, with $s' = O\big(s\cdot\delta^2/(n+m)\big)$, where $m$ is the length of $Z$.

3 Characterizing Metric Pseudoentropy

In what follows we assume that $H$ is a concave entropy notion (like min-entropy or collision entropy), and that all distributions and distinguishers are over $\{0,1\}^n$.

3.1 Connections to Separating Hyperplanes

We start with the following simple observation, which gives a nice geometric formulation of the definition of pseudoentropy. We say that two sets $\mathcal{X}$ and $\mathcal{Y}$ of probability distributions are $(\mathcal{D},\epsilon)$-indistinguishable if there exists no adversary $D\in\mathcal{D}$ such that $|\mathbb{E}\,D(X) - \mathbb{E}\,D(Y)| > \epsilon$ for all $X\in\mathcal{X}$ and all $Y\in\mathcal{Y}$. It is easy to see that if $\mathcal{X}$ and $\mathcal{Y}$ are convex and if $\mathcal{D}$ is closed under complements (that is, $D\in\mathcal{D}$ implies $1-D\in\mathcal{D}$), then this is equivalent to:

  There is no $D\in\mathcal{D}$ such that $\mathbb{E}\,D(X) - \mathbb{E}\,D(Y) > \epsilon$ for all $X\in\mathcal{X}$, $Y\in\mathcal{Y}$.
We can interpret the expectation $\mathbb{E}\,D(X)$ as the scalar product $\langle D, P_X\rangle$, identifying $D$ and the distribution of $X$ with vectors in $\mathbb{R}^{2^n}$. Hence we can write the above condition as:

  There is no $D\in\mathcal{D}$ such that $\langle D, P_X - P_Y\rangle > \epsilon$ for all $X\in\mathcal{X}$, $Y\in\mathcal{Y}$,

which means that the distinguisher $D$ is precisely a separating hyperplane. If $\mathcal{D}$ is a circuit class, $\mathcal{X} = \{X\}$ and $\mathcal{Y} = \{Y : H(Y)\geq k\}$, we obtain the following (we may assume that the class of circuits of size at most $s$ is closed under complements, because every complement is of size at most $s+1$; formally we would need to start from size $s_0 = s+1$, but we omit this negligible difference).

Corollary 1 (Alternative definitions of metric and HILL entropy). Let $X$ be an $n$-bit random variable and let $H$ be a concave entropy notion. Then
(a) $H^{\mathrm{HILL},(s,\epsilon)}(X) \geq k$ iff $X$ is $(\mathcal{D},\epsilon)$-indistinguishable from some $Y$ of entropy $H$ at least $k$, where $\mathcal{D}$ is the class of boolean circuits of size $s$ with $n$ inputs (randomized or deterministic, it makes no difference);
(b) $H^{\mathrm{M,det}\{0,1\},(s,\epsilon)}(X) \geq k$ iff $X$ is $(\mathcal{D},\epsilon)$-indistinguishable from the set of all $Y$ of entropy $H$ at least $k$, where $\mathcal{D}$ is the class of all deterministic boolean circuits of size $s$ with $n$ inputs (analogously for randomized and deterministic real-valued circuits).

3.2 Reduction to Constrained Optimization

From the "geometric" view of pseudoentropy given in Corollary 1, we obtain the following characterization of pseudoentropy.

Lemma 2 (Characterization of metric pseudoentropy). Let $X$ and $H$ be as in Corollary 1. Then $H^{\mathrm{M,det}\{0,1\},(s,\epsilon)}(X) \geq k$ (respectively $H^{\mathrm{M,det}[0,1],(s,\epsilon)}(X) \geq k$) if and only if for every boolean (respectively real-valued) deterministic circuit $D$ of size at most $s$ we have $\mathbb{E}\,D(X) \leq \mathbb{E}\,D(Y^*) + \epsilon$, where $Y^*$ is optimal for the following optimization problem:
$$\max_{Y}\ \mathbb{E}\,D(Y) \quad\text{s.t.}\quad H(Y)\geq k. \tag{1}$$

This result is useful if we can solve the optimization problem in Equation (1). In the next subsections we explain how to solve it in general and discuss two concrete and simple cases: min-entropy and collision entropy.

3.3 Maximizing Expectations Under Convex Constraints

We can characterize the optimal solutions of (1) in terms of Lagrange multipliers. Due to convexity, the characterization is both necessary and sufficient.

Lemma 3 (Maximizing expectation under convex constraints). Let $f$ be a differentiable convex real-valued function on $\mathbb{R}^d$. Assume that $a$ is a number such that $\min_p f(p) < a$, where the minimum is over all probability vectors, and consider the following optimization program:
$$\max_{(p_i)_i}\ \sum_i D_i\,p_i \quad\text{s.t.}\quad f(p)\leq a,\quad -p_i\leq 0,\quad \sum_i p_i = 1. \tag{2}$$
Then a feasible point $p = p^*$ is optimal for (2) if and only if there exist $\lambda_1 \geq 0$, $\lambda_2\in\mathbb{R}$ and $\lambda_{3,i} \geq 0$ for $i=1,\dots,d$ such that the following relations hold:
$$D_i = \lambda_1\,(\nabla f(p^*))_i - \lambda_{3,i} + \lambda_2 \quad\text{for } i=1,\dots,d, \tag{3}$$
and the following complementarity condition is satisfied:
$$p^*_i\cdot\lambda_{3,i} = 0. \tag{4}$$

Proof. The Slater constraint qualification holds, by the assumption on $a$, and we have strong duality. In other words, the first-order Karush-Kuhn-Tucker conditions are sufficient and necessary [4]. The numbers $\lambda_1, \lambda_2, \lambda_{3,i}$ are exactly the KKT multipliers for the convex program in Equation (2), and Equation (3) states that the gradient of the objective function is a combination of the gradients of the constraints. The condition in Equation (4) means that we take only active constraints into account. Finally, to the inequality constraints we assign non-negative multipliers, which explains the requirements $\lambda_1 \geq 0$ and $\lambda_{3,i} \geq 0$. □
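Before solving (1) in closed form, a minimal numerical sketch may help (Python; the distinguisher values and parameters are hypothetical toy data). For $H = H_\infty$ the feasible set is $\{Y : \max_x \Pr[Y=x]\leq 2^{-k}\}$ and (1) is a linear program; the sketch builds the flat distribution on the points where $D$ is largest, the closed form derived in the next subsection, and checks it against random feasible distributions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 2                      # distributions over 2^n points, constraint H_inf >= k
N, cap = 2 ** n, 2.0 ** (-k)     # feasibility: every probability <= 2^-k
D = rng.uniform(0.0, 1.0, N)     # hypothetical [0,1]-valued distinguisher

# Candidate optimizer of (1) for H = H_inf (cf. Theorem 1 below):
# uniform over the 2^k coordinates where D is largest.
top = np.argsort(D)[-2 ** k:]
y_star = np.zeros(N)
y_star[top] = cap
best = float(D @ y_star)

# Sanity check against randomly sampled feasible distributions.
for _ in range(10000):
    y = rng.dirichlet(np.ones(N))
    if y.max() <= cap:           # keep only feasible samples
        assert D @ y <= best + 1e-12
print("E D(Y*) =", best)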
Remark 4. If $f$ is not differentiable, we replace the gradient of $f$ in the optimality conditions by the subdifferential of $f$, which always exists for a convex function.

3.4 Characterization of Metric Min-Entropy

For $H = H_\infty$ we obtain from Lemma 3 the following simple characterization of pseudoentropy based on min-entropy (see [3] for a restricted variant).

Theorem 1 (Characterization of metric min-entropy). Let $X$ be an $n$-bit random variable. Then $H_\infty^{\mathrm{M,det}\{0,1\},(s,\epsilon)}(X) \geq k$ (respectively $H_\infty^{\mathrm{M,det}[0,1],(s,\epsilon)}(X) \geq k$) if and only if for every boolean (respectively real-valued) deterministic circuit $D$ of size at most $s$ with $n$ inputs we have $\mathbb{E}\,D(X) \leq \mathbb{E}\,D(Y^*) + \epsilon$, where $Y^*$ is uniform over the set of $2^k$ values of $x$ which correspond to the biggest values of $D(x)$.

Extending Lemma 3 by adding additional constraints, to cover the case of side information, we obtain the characterization of conditional metric entropy.

Theorem 2 (Characterization of conditional metric min-entropy). Let $X$ and $Z$ be, respectively, $n$- and $m$-bit random variables. Then $H_\infty^{\mathrm{M,det}\{0,1\},(s,\epsilon)}(X|Z) \geq k$ (respectively $H_\infty^{\mathrm{M,det}[0,1],(s,\epsilon)}(X|Z) \geq k$) iff for every boolean (respectively real-valued) deterministic circuit $D$ of size at most $s$ on $\{0,1\}^{n+m}$ we have $\mathbb{E}\,D(X,Z) \leq \mathbb{E}\,D(Y^*,Z) + \epsilon$, for $Y^*$ such that $Y^*|_{Z=z}$ is uniform over the set $\{x : D(x,z) > t(z)\}$ for every $z$, where the thresholds $t(z)$ satisfy the following two conditions:
$$\mathbb{E}_{x\leftarrow U_n}\max(D(x,z) - t(z),\,0) = \mathrm{const} \quad\text{for all } z,$$
$$\mathbb{E}_{z\leftarrow Z}\big[1/\#\{x : D(x,z) > t(z)\}\big] \;\leq\; 2^{-k} \;\leq\; \mathbb{E}_{z\leftarrow Z}\big[1/\#\{x : D(x,z) \geq t(z)\}\big].$$

3.5 Characterization of Metric Collision Entropy

The characterization of the worst-case collision-entropy distribution is slightly different: it is proportional to the distinguisher, after applying a threshold.

Theorem 3 (Characterization of metric collision entropy). Let $X$ be an $n$-bit random variable. Then $H_2^{\mathrm{M,det}\{0,1\},(s,\epsilon)}(X) \geq k$ (respectively $H_2^{\mathrm{M,det}[0,1],(s,\epsilon)}(X) \geq k$) if and only if for every boolean (respectively real-valued) deterministic circuit $D$ of size at most $s$ with $n$ inputs we have $\mathbb{E}\,D(X) \leq \mathbb{E}\,D(Y^*) + \epsilon$, where $Y^*$ satisfies
$$\lambda\cdot P_{Y^*}(x) = \max(D(x) - t,\,0)$$
for some $t\in\mathbb{R}$ and $\lambda > 0$.

Remark 5. Note that $t$ is a solution of $\mathbb{E}\,D'(U)^2 = 2^{n-k}\,\big(\mathbb{E}\,D'(U)\big)^2$, where $D'(x) = \max(D(x)-t,\,0)$, and $\lambda = 2^n\,\mathbb{E}\,D'(U)$. It follows that
$$\mathbb{E}\,D'(Y^*) = \sqrt{2^{n-k}\,\mathbb{E}\,D'(U)^2} = \mathbb{E}\,D'(U) + \sqrt{\mathrm{Var}\,D'(U)\cdot\big(2^{n-k}-1\big)}.$$
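A minimal numerical sketch of Theorem 3 and Remark 5 (Python; the distinguisher $D$ is hypothetical toy data): the threshold $t$ is found by binary search so that the distribution proportional to $\max(D-t,0)$ has collision entropy $k$, and Remark 5's expression for $\mathbb{E}\,D'(Y^*)$ is then checked numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
N = 2 ** n
D = rng.uniform(0.0, 1.0, N)           # hypothetical [0,1]-valued distinguisher

def collision_entropy(p):
    return -np.log2(np.sum(p ** 2))

def y_star(t):
    """Distribution proportional to max(D - t, 0), as in Theorem 3."""
    w = np.maximum(D - t, 0.0)
    return w / w.sum()

# Binary search for t: raising t concentrates Y* and lowers H_2(Y*).
lo, hi = 0.0, D.max()
for _ in range(100):
    mid = (lo + hi) / 2
    if collision_entropy(y_star(mid)) >= k:
        lo = mid
    else:
        hi = mid
t = lo
Y = y_star(t)
Dp = np.maximum(D - t, 0.0)            # D'(x) = max(D(x) - t, 0)

# Remark 5: E D'(Y*) = E D'(U) + sqrt(Var D'(U) * (2^(n-k) - 1)).
lhs = float(Dp @ Y)
rhs = Dp.mean() + np.sqrt(Dp.var() * (2 ** (n - k) - 1))
print(collision_entropy(Y), lhs, rhs)  # H_2(Y*) ~ k and lhs ~ rhs
```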
4 Applications

4.1 Computational Dense Model Theorem

We say that a distribution $A$ is $\gamma$-dense in $B$ if $\Pr[A=x] \leq \Pr[B=x]/\gamma$ for every $x$. The Dense Model Theorem is a statement of the following form: if $X$ is $(s,\epsilon)$-indistinguishable from the uniform distribution $R$ and $X'$ is $\gamma$-dense in $X$, then there exists a distribution $R'$ which is $\gamma$-dense in $R$ and is $(s',\epsilon')$-indistinguishable from $X'$, where $s'$ and $\epsilon'$ depend as explicit functions on $s$ and $\epsilon$. In this sense, $R'$ is a dense "model" of $X'$. The dense model theorem was first proved by Tao and Ziegler [17]. Its efficient versions (with a loss of at most $\mathrm{poly}(1/\delta)$ in $s$ and $\epsilon$; in the original proof the loss is $\exp(1/\delta)$) have found important applications in complexity theory and cryptography [2,7,14], see also [16]. Below we recall a version with improved parameters, stated in the language of pseudoentropy and called the "leakage lemma".

Theorem 4 (Leakage Lemma [2,7]). Let $X$ be an $n$-bit random variable such that $H_\infty^{\mathrm{HILL},(s,\epsilon)}(X) \geq k$ and let $Z$ be correlated with $X$. Then we have $H_\infty^{\mathrm{HILL},(s',\epsilon')}(X|_{Z=z}) \geq k'$, where $k' = k - \log(1/\Pr[Z=z])$, $s' = O\big(s\cdot\delta^2/n\big)$ and $\epsilon' = \epsilon/\Pr[Z=z] + \delta$, for any $\delta\in(0,1)$.

The lemma states that the amount of pseudoentropy decreases by roughly $t$ bits due to leakage of $t$ bits of information, hence its name. The original proof was simplified by the use of metric entropy [2]. We show how it can be simplified even further: just a few lines using the basic facts about metric entropy!

Proof. If we can prove that
$$H_\infty^{\mathrm{M,det}\{0,1\},(s,\,\epsilon/\Pr[Z=z])}(X|_{Z=z}) \;\geq\; H_\infty^{\mathrm{M,det}\{0,1\},(s,\epsilon)}(X) - \log(1/\Pr[Z=z]),$$
then the result will follow by Lemma 1 and Remark 2. Note that by Theorem 1, for any $X$ we have $H_\infty^{\mathrm{M,det}\{0,1\},(s,\epsilon)}(X) \geq k$ if and only if $\mathbb{E}\,D(X) \leq |D|/2^k + \epsilon$ for all boolean $D$ of size at most $s$, where $|D| = \#\{x: D(x)=1\}$ counts the inputs accepted by $D$, so that $\mathbb{E}\,D(Y^*)\leq |D|/2^k$. From this we get, for any such $D$,
$$\mathbb{E}\,D(X|_{Z=z}) \;\leq\; \mathbb{E}\,D(X)/\Pr[Z=z] \;\leq\; \frac{|D|}{2^k\,\Pr[Z=z]} + \frac{\epsilon}{\Pr[Z=z]} \;=\; \frac{|D|}{2^{k-\log(1/\Pr[Z=z])}} + \frac{\epsilon}{\Pr[Z=z]}.$$
Since the characterization is also sufficient, the result follows. □

4.2 Equivalence of HILL Entropy and Unpredictability Entropy for Short Strings

Unpredictability entropy. The notion of unpredictability entropy is based on the (assumed) hardness of guessing $X$ given auxiliary information $Z$. More formally, we have $H^{\mathrm{Unp},s}(X|Z) \geq k$ if and only if no adversary of size at most $s$ can predict $X$ given $Z$ with probability better than $2^{-k}$. For $Z$ independent of $X$, or of relatively short length, this reduces to the min-entropy of $X$ (provided that $s > 2^m n$, so that the adversary can hardcode his best guess for every $z$).

Separation from HILL entropy. If $f$ is a one-way function, $U$ is the uniform distribution and $X = U$, $Z = f(U)$, then $X|Z$ has a large amount of unpredictability entropy. It is also easy to see that $X|Z$ has almost no HILL entropy.

Equivalence for short strings. On the positive side, using metric entropy and the characterization in Theorem 2, we reprove the following result of Vadhan and Zheng, who established the equivalence when $X$ is short (logarithmic in the security parameter).

Theorem 5 ([18]). Suppose that $X$ and $Z$ are, respectively, $n$- and $m$-bit random variables. Then $H_\infty^{\mathrm{HILL},(s',\epsilon)}(X|Z) \gtrsim H^{\mathrm{Unp},s}(X|Z)$ with $s' = s/\mathrm{poly}(2^n, 1/\epsilon)$.

The original proof is based on a result similar to Theorem 2, proved in a much more complicated way. We note that this part is a simple consequence of the KKT optimality conditions, and we also simplify the rest of the proof.

Proof (Sketch). We prove that $H_\infty^{\mathrm{M,det}[0,1],(s',\epsilon)}(X|Z) < k$ implies $H^{\mathrm{Unp},s}(X|Z) < k$. Assume then that the metric entropy is smaller than $k$: there is a real-valued circuit $D$ of size at most $s'$ such that $\mathbb{E}\,D(X,Z) - \mathbb{E}\,D(Y,Z) > \epsilon$ for all $Y$ such that $\widetilde{H}_\infty(Y|Z) \geq k$. Let $Y^*$ be the distribution which minimizes this expression, that is, which maximizes $\mathbb{E}\,D(Y,Z)$. Let $t(z)$ be as in Theorem 2, denote $D'(x,z) = \max(D(x,z) - t(z),\,0)$ and let $\lambda = \sum_x D'(x,z)$ (according to Theorem 2 this sum does not depend on $z$). Consider the following predictor $A$: on input $z$, sample $x$ according to the probability $\Pr[A(z) = x] = D'(x,z)/\lambda$.

Note that $Y^*|_{Z=z}$ is uniform over the set $\{x : D'(x,z) > 0\}$. By Theorem 2 (the sufficiency part) it follows that $Y^*$ is also maximal for $D'$. For every $z$ we have $\mathbb{E}\,D'(Y^*|_{Z=z}, z) = \mathbb{E}\,D(Y^*|_{Z=z}, z) - t(z)$. We also have $\mathbb{E}\,D'(X|_{Z=z}, z) \geq \mathbb{E}\,D(X|_{Z=z}, z) - t(z)$ by the definition of $D'$. This proves $\mathbb{E}\,D'(X,Z) - \mathbb{E}\,D'(Y,Z) > \epsilon$ for all $Y$ such that $\widetilde{H}_\infty(Y|Z) \geq k$. It is easy to observe that
$$\Pr[A(Z)=X] \;=\; \frac{\mathbb{E}\,D'(X,Z)}{\lambda} \;\geq\; \mathbb{E}_{z\leftarrow Z}\left[\frac{\mathbb{E}\,D'(Y^*|_{Z=z},z)}{\sum_x D'(x,z)}\right] \;\geq\; \mathbb{E}_{z\leftarrow Z}\, 2^{-H_\infty(Y^*|_{Z=z})},$$
which is at least $2^{-k}$.

The circuit $D'(x,z)$ has complexity $2^m\cdot\mathrm{size}(D)$, which is too big. However, if the domain of $x$ is small, we can approximate the numbers $t(z)$ given $\lambda$ from the relations in Theorem 2 (and even $\lambda$ itself, from the second relation, in the uniform setting). Indeed, knowing the common value $\mathbb{E}_{x\leftarrow U_n}\max(D(x,z)-t(z),\,0) = \lambda/2^n$, we estimate $\mathbb{E}_{x\leftarrow U_n}\max(D(x,z)-t,\,0)$ for a fixed $t$ and then find the "right" value $t = t(z)$ by binary search. This way, for every $z$, we can approximate $D'(\cdot,z)$, and hence the distribution $\Pr[A(z)=x]$, up to a maximal error $\delta \ll 2^{-k}$ and with overwhelming probability $1-\exp(-\mathrm{poly}(1/\delta))$, using $\mathrm{poly}(1/\delta)$ samples of $D$. On average over $z$ we predict $X$ with probability $2^{-k} - \delta \approx 2^{-k}$. □
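A rough sketch of the predictor $A$ from the proof above (Python; the joint distribution, the distinguisher table and the normalization $\lambda$ are hypothetical toy data). It enforces only the first condition of Theorem 2, a common value of $\sum_x \max(D(x,z)-t(z),0)$ across $z$, and computes $\Pr[A(Z)=X] = \mathbb{E}\,D'(X,Z)/\lambda$ exactly from the tables.

```python
import numpy as np

rng = np.random.default_rng(2)
nX, nZ = 16, 4                         # toy sizes of the X-space and Z-space
D = rng.uniform(0.0, 1.0, (nZ, nX))    # hypothetical distinguisher D(x, z)
joint = rng.dirichlet(np.ones(nX * nZ)).reshape(nZ, nX)  # Pr[Z=z, X=x]

def t_of_z(row, lam):
    """Binary search for t with sum_x max(row - t, 0) = lam (first condition of Theorem 2)."""
    lo, hi = 0.0, row.max()
    for _ in range(80):
        mid = (lo + hi) / 2
        if np.maximum(row - mid, 0.0).sum() >= lam:
            lo = mid
        else:
            hi = mid
    return lo

lam = 1.0                              # hypothetical common normalization across all z
Dp = np.stack([np.maximum(D[z] - t_of_z(D[z], lam), 0.0) for z in range(nZ)])

# Predictor A: on input z, output x with probability D'(x, z) / lambda.
def predictor_success(joint, Dp, lam):
    """Pr[A(Z) = X] = E D'(X, Z) / lambda, computed exactly from the tables."""
    return float((joint * Dp).sum() / lam)

print(predictor_success(joint, Dp, lam))
```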
4.3 Improved Leftover Hash Lemma for Square-Secure Applications

In the key derivation problem we want to derive a secure $m$-bit key for some application $P$ from an imperfect source of randomness $X$. The generic approach is to use a randomness extractor. However, as implied by the RT-bounds [13], the min-entropy of $X$ needs to be at least $m + 2\log(1/\epsilon)$ if we want the derived key to be $\epsilon$-secure. Fortunately, as shown by Barak et al. [1], for many cryptographic applications one can reduce this entropy loss by half, that is, to $L = \log(1/\epsilon)$. To this end, they introduce the class of square-secure applications, where the squared advantage of every bounded attacker, over the uniform choice of keys, is small (which essentially means that the probability that an attacker breaks the key is concentrated over the choice of keys). This class contains, for example, all unpredictability applications, stateless chosen-plaintext-attack secure encryption, and weak pseudorandom functions. The reduction of the entropy loss follows by combining universal hashing with the following lemma.

Lemma 4 ([1]). For a function $D:\{0,1\}^\ell\to[-1,1]$ and $X\in\{0,1\}^\ell$ of collision entropy $k$ we have
$$\mathbb{E}\,D(X) \;\leq\; \mathbb{E}\,D(U_\ell) + \sqrt{\mathrm{Var}\,D(U_\ell)\cdot\big(2^{\ell-k}-1\big)}.$$

To see this, let $\mathrm{Win}_A(r,h)$, for an arbitrary attacker $A\in\mathcal{A}$, be the probability that $A$ breaks the key $r$ when given $h$ in addition (for a uniformly chosen key this does not help the adversary, at least in the non-uniform model), and let $D_A(r,h) = \mathrm{Win}_A(r,h) - \frac{1}{2}$ be its advantage. Let $X$ be any $n$-bit random variable of min-entropy $m + \log(1/\epsilon)$. We apply a randomly chosen universal hash function $H$ from $n$ to $m$ bits (a family $\mathcal{H}$ of functions from $n$ to $m$ bits is universal if $\Pr_{h\leftarrow\mathcal{H}}[h(x)=h(x')] = 2^{-m}$ for all $x\neq x'$). It is easy to see that $(H(X), H)$ is a distribution with collision entropy $m + \log|\mathcal{H}| - \log(1+\epsilon)$. From the lemma it now follows that
$$\mathbb{E}\,D_A(H(X), H) \;\leq\; \mathbb{E}\,D_A(U, H) + \sqrt{\mathrm{Var}\,D_A(U,H)\cdot\epsilon}.$$
If we assume that $\max_h \mathbb{E}\,D_A(U,h) \leq \epsilon$ (which means $\epsilon$-security against $A$ with the uniform key) and that $\max_h \mathbb{E}\,D_A(U,h)^2 \leq \sigma$ with $\sigma = O(\epsilon)$ (which means $\sigma$-square-security against $A$ with the uniform key), then we achieve $O(\epsilon)$-security for the extracted key, with entropy loss only $\log(1/\epsilon)$.

An alternative proof. We show that Theorem 3 implies Lemma 4. Indeed, set $n = \ell$ and $\epsilon = 0$ in Theorem 3. Let $Y^*$ be the distribution of collision entropy at least $k$ which maximizes $\mathbb{E}\,D(Y)$, and let $t$, $\lambda$ and $D'$ be as in the characterization. Denote $S = \{x : D(x) > t\}$ and let $D|_S$ be the restriction of $D$ to the set $S$. Note that $Y^*$ conditioned on $S$ is distributed as $Y^*$ itself (it is supported on $S$), that $Y^*$ maximizes $D|_S$, and that $D|_S(x) = D'|_S(x) + t$ for every $x\in S$. By Remark 5 we get
$$\mathbb{E}\,D(X) \;\leq\; \mathbb{E}\,D(Y^*) \;=\; \mathbb{E}\,D|_S(Y^*) \;=\; \mathbb{E}\,D|_S(U_S) + \sqrt{\mathrm{Var}\,D|_S(U_S)\cdot\big(|S|\,2^{-k}-1\big)}.$$
We show that one can replace $S$ by $\{0,1\}^\ell$ on the right-hand side. This follows from the following general lemma.

Lemma 5. Let $X$ be a random variable, $c \geq 1$ a constant and $S$ an event of probability $\mathbb{P}(S) \geq c^{-1}$. Then
$$\mathbb{E}[X|S] + \sqrt{\mathrm{Var}[X|S]\cdot\big(c\,\mathbb{P}(S)-1\big)} \;\leq\; \mathbb{E}[X] + \sqrt{\mathrm{Var}[X]\cdot(c-1)}. \tag{5}$$
The proof follows by a few algebraic manipulations and is omitted.
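A small numerical check of Lemma 4 (Python; the function $D$ and the sampled distributions are hypothetical toy data): for random distributions $X$ over $\{0,1\}^\ell$ we verify $\mathbb{E}\,D(X)\leq \mathbb{E}\,D(U_\ell)+\sqrt{\mathrm{Var}\,D(U_\ell)\,(2^{\ell-k}-1)}$ with $k = H_2(X)$.

```python
import numpy as np

rng = np.random.default_rng(3)
ell = 8
N = 2 ** ell
D = rng.uniform(-1.0, 1.0, N)          # hypothetical D: {0,1}^ell -> [-1, 1]

def collision_entropy(p):
    return -np.log2(np.sum(p ** 2))

for _ in range(1000):
    # Random distribution of X with varying spread, hence varying collision entropy.
    p = rng.dirichlet(np.ones(N) * rng.uniform(0.05, 5.0))
    k = collision_entropy(p)
    lhs = float(D @ p)                                        # E D(X)
    rhs = D.mean() + np.sqrt(D.var() * (2 ** (ell - k) - 1))  # Lemma 4 bound
    assert lhs <= rhs + 1e-9
print("Lemma 4 bound holds on all sampled distributions")
```

The bound is tight exactly for the worst-case distribution of Theorem 3, which is why the entropy loss $\log(1/\epsilon)$ cannot be improved within this argument.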
4.4 Some Further Applications

Lower bounds on square security. Using the characterization from Theorem 3 one can derive some non-trivial lower bounds on the square-security needed for key derivation. We discuss this problem in a separate paper.

References

1. Barak, B., Dodis, Y., Krawczyk, H., Pereira, O., Pietrzak, K., Standaert, F.-X., Yu, Y.: Leftover hash lemma, revisited. Cryptology ePrint Archive, Report 2011/088, http://eprint.iacr.org/ (2011)
2. Fuller, B., Reyzin, L.: A unified approach to deterministic encryption: New constructions and a connection to computational entropy. In: TCC 2012, Lecture Notes in Computer Science, vol. 7194, pp. 582-599. Springer (2012)
3. Barak, B., Shaltiel, R., Wigderson, A.: Computational analogues of entropy. In: Arora, S., et al. (eds.) RANDOM-APPROX 2003, Lecture Notes in Computer Science, vol. 2764, pp. 200-215. Springer (2003)
4. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York, NY, USA (2004)
5. Chung, K.-M., Kalai, Y.T., Liu, F.-H., Raz, R.: Memory delegation. Cryptology ePrint Archive, Report 2011/273, http://eprint.iacr.org/ (2011)
6. Dodis, Y., Ostrovsky, R., Reyzin, L., Smith, A.: Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. SIAM J. Comput. 38(1), 97-139 (2008)
7. Dziembowski, S., Pietrzak, K.: Leakage-resilient cryptography in the standard model. IACR Cryptology ePrint Archive, Report 2008/240 (2008)
8. Gentry, C., Wichs, D.: Separating succinct non-interactive arguments from all falsifiable assumptions. Cryptology ePrint Archive, Report 2010/610, http://eprint.iacr.org/ (2010)
9. Hastad, J., Impagliazzo, R., Levin, L.A., Luby, M.: A pseudorandom generator from any one-way function. SIAM J. Comput. 28(4), 1364-1396 (1999)
10. Hsiao, C.-Y., Lu, C.-J., Reyzin, L.: Conditional computational entropy, or toward separating pseudoentropy from compressibility. In: EUROCRYPT 2007, pp. 169-186. Springer, Berlin, Heidelberg (2007)
11. Krenn, S., Pietrzak, K., Wadia, A.: A counterexample to the chain rule for conditional HILL entropy. In: Sahai, A. (ed.) Theory of Cryptography, Lecture Notes in Computer Science, vol. 7785, pp. 23-39. Springer, Berlin, Heidelberg (2013)
12. Reyzin, L.: Some notions of entropy for cryptography. In: Fehr, S. (ed.) Information Theoretic Security, Lecture Notes in Computer Science, vol. 6673, pp. 138-142. Springer, Berlin, Heidelberg (2011)
13. Radhakrishnan, J., Ta-Shma, A.: Bounds for dispersers, extractors, and depth-two superconcentrators. SIAM Journal on Discrete Mathematics 13 (2000)
14. Reingold, O., Trevisan, L., Tulsiani, M., Vadhan, S.: Dense subsets of pseudorandom sets. In: Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS '08), pp. 76-85. IEEE Computer Society, Washington, DC (2008)
15. Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27 (1948)
16. Trevisan, L., Tulsiani, M., Vadhan, S.: Regularity, boosting, and efficiently simulating every high-entropy distribution. In: Proceedings of the 24th Annual IEEE Conference on Computational Complexity (CCC '09), pp. 126-136. IEEE Computer Society, Washington, DC (2009)
17. Tao, T., Ziegler, T.: The primes contain arbitrarily long polynomial progressions. Acta Mathematica 201(2), 213-305 (2008)
18. Vadhan, S., Zheng, C.J.: Characterizing pseudoentropy and simplifying pseudorandom generator constructions. In: Proceedings of the 44th Symposium on Theory of Computing (STOC '12), pp. 817-836. ACM, New York (2012)