<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linear-time exact sampling of sum-constrained random variables</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Andrea Sportiello</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Frederique Bassino</institution>
        </aff>
      </contrib-group>
      <fpage>94</fpage>
      <lpage>105</lpage>
      <abstract>
        <p>In the problem of exactly sampling from a probability distribution, one is often led to sample from a measure of the form \mu(x_1, \ldots, x_n) \propto \prod_i f_i(x_i) \, \delta_{x_1+\cdots+x_n, m}, with the f_i's integer-valued distributions. When the f_i's are all equal, an algorithm of Devroye essentially solves this problem in linear time. However, in several important cases these f_i's are in the same family, but have distinct parameters. A typical example is the sampling of random set partitions, related to a set of f_i's which are geometric distributions with different averages. We describe here a simple algorithmic solution for a large family of n-tuples of functions. To this aim we introduce the notion of “positive decomposition” of one probability distribution into another.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this paper we consider the problem of exactly sampling from measures of the form
\mu_m^{(f)}(x_1, \ldots, x_n) \propto \prod_i f_i(x_i) \, \delta_{x_1 + \cdots + x_n, m} ,  (1)
with the f_i's integer-valued probability distributions taken in a family F, by an algorithm whose average complexity is linear in n, the constant depending only on F and on the ratio m/n. This problem is essentially solved, by Devroye [Dev12], when the distributions f_i are all equal. However his ideas do not apply in the generic case, and it is often the case in real-life applications that these distributions are indeed in the same family, but have distinct underlying parameters (see the examples in Section 4).</p>
      <p>The main idea of our algorithm is as follows: suppose that you can decompose
f_i(x) = \sum_{s \in \mathbb{N}} q_i(s) \, g^{*s}(x) ,  (2)
where g^{*s} denotes the `convolution power', i.e. the s-fold convolution of g, and where:</p>
      <p>– g(x) is a `simple' probability distribution (for example, g = \mathrm{Bern}_{1/2});</p>
      <p>– the q_i(s) are non-negative, and, seen as probability distributions for the variable s, can be sampled efficiently;</p>
      <p>– calling \sigma^2 = \sum_i \mathrm{Var}_{f_i}, we need \sqrt{m}/\sigma = \Theta(1).</p>
      <p>In this case, as we will show in the following, we can sample in linear time the n-tuple s = (s_1, \ldots, s_n) from the distribution q(s) = \prod_i q_i(s_i), apply a rejection scheme with average acceptance rate related to the quantity \sqrt{m}/\sigma above (and thus of order 1), and finally sample x from the measure \mu_m(x|s) = \prod_i g^{*s_i}(x_i) \, \delta_{x_1 + \cdots + x_n, m}; this latter operation can be performed efficiently because the function g is `simple', and it enters this final stage just with an n-fold convolution. Thus, the resulting algorithm has linear average complexity.</p>
      <p>The understanding of which functions f admit a decomposition as in equation (2) turns out to be an interesting question per se. We outline here the main results, for which more details will be given in a forthcoming paper. Then, we provide a detailed description of the algorithm, justify the claim on its complexity, and illustrate it on some relevant examples. In particular, we provide a linear-time algorithm for the uniform sampling of set partitions of a set of cardinality n + m into exactly n parts. This marks an improvement w.r.t. previous results of the authors [BS15] (indeed this application has been the main motivation for this work), and, by previous results [BN07, BDS12], implies a linear exact sampling algorithm for uniform accessible deterministic complete automata (ACDA), and for minimal automata, in any given alphabet of finite size.</p>
    </sec>
    <sec id="sec-2">
      <title>On positive decomposition</title>
      <p>2.1  Positive decomposition of integer-valued probability distributions</p>
      <p>In this section we introduce our `fundamental' problem in Probability. As we have anticipated, for implementing our algorithm we are faced with the following basic construction. Let F and G = \{g_s(x)\} be two families of integer-valued probability distributions.</p>
      <p>Definition 2.1. We say that F decomposes in G if all f \in F are linear combinations of functions in G:
f(x) = \sum_s q(s) \, g_s(x) .  (3)
If moreover all the coefficients q(s) are non-negative, we say that F decomposes positively in G. For a single distribution g, we write Conv[g] = \{g^{*s}\}_{s \in \mathbb{N}} for the associated convolution family.</p>
      <p>We are now ready to state our problem.</p>
      <p>Problem 2.5. For a given function g, which functions f decompose positively in G = Conv[g]?</p>
      <p>A first useful property is the following.</p>
      <p>Proposition 2.6. If f_1 and f_2 decompose positively in the same G = Conv[g], also f_1 * f_2 does, and the resulting set of coefficients, seen as a probability distribution, is the convolution of the two sets:</p>
      <p>(f_1 * f_2)(x) = \sum_s (q_1 * q_2)(s) \, g^{*s}(x) .  (5)
When G is a convolution family, the average and variance of g_s are linear in s, and we have
E_f(x) = \sum_x x f(x) = \sum_s q(s) \sum_x x \, g^{*s}(x) = \sum_s q(s) \, s \, E_g(x) = E_q(s) \, E_g(x) ,  (6)
\mathrm{Var}_f(x) = \sum_{x,x'} x (x - x') f(x) f(x') = \sum_{s,s'} q(s) q(s') \sum_{x,x'} x (x - x') g_s(x) g_{s'}(x')
  = \sum_{s,s'} q(s) q(s') \left( s \, \mathrm{Var}_g(x) + s (s - s') E_g(x)^2 \right) = E_q(s) \, \mathrm{Var}_g(x) + \mathrm{Var}_q(s) \, E_g(x)^2 .  (7)</p>
      <sec id="sec-2-1">
        <title>Solving for Eq(s) and Varq(s) gives</title>
        <p>which implies in particular that
Eq(s) =</p>
        <p>Ef (x)
Eg(x)
;
Varq(s) =</p>
      </sec>
      <sec id="sec-2-2">
        <title>Varf (x)</title>
        <p>Eg(x)2</p>
      </sec>
      <sec id="sec-2-3">
        <title>Ef (x)Varg(x)</title>
        <p>Eg(x)3
;
Varf (x)
Ef (x)</p>
      </sec>
      <sec id="sec-2-4">
        <title>Varg(x)</title>
        <p>Eg(x)
:
Equation (9) implies that, if F = Conv[f ], G = Conv[g], F / G and G / F , then f = g and F = G. In other
words, the symbol / is an order relation among convolution families, which thus form a semilattice, in which the
top element is Conv[ x;1].2</p>
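      <p>As a concrete instance of the relations (8), consider f = Geom_a decomposed in g = Bern_b, whose mixing law turns out to be q = Geom_{a/b} (this is equation (20) of Section 2.2). The following minimal Python check, with arbitrary numeric values of a and b, verifies the two identities; it is only an illustration, not part of the algorithm.</p>
      <preformat>
# Arithmetic check of (8) on the decomposition of f = Geom_a in g = Bern_b,
# whose mixing law is q = Geom_{a/b} (see Section 2.2); a, b are arbitrary.
a, b = 0.3, 0.8
Ef, Varf = a, a * (1 + a)                  # mean and variance of Geom_a
Eg, Varg = b, b * (1 - b)                  # mean and variance of Bern_b
Eq, Varq = a / b, (a / b) * (1 + a / b)    # mean and variance of Geom_{a/b}
assert abs(Eq - Ef / Eg) &lt; 1e-12
assert abs(Varq - (Varf / Eg**2 - Ef * Varg / Eg**3)) &lt; 1e-12
      </preformat>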
      <p>Write \tilde f(y) = \sum_x y^x f(x) for the generating function associated to f. It is well known that \widetilde{g^{*s}} = (\tilde g)^s, and thus
\tilde f(y) = \sum_x y^x f(x) = \sum_x \sum_s q(s) \, y^x g_s(x) = \sum_s q(s) \, \widetilde{g^{*s}}(y) = \sum_s q(s) \, (\tilde g(y))^s = \tilde q(\tilde g(y)) .  (10)
Call y(z) the function such that \tilde g(y(z)) = z (a solution to \tilde g(\bar y(z)) = 1 + z exists at least as a formal power series, \bar y(z) = 1 + \sum_{i \geq 1} a_i z^i, and we can set y(z) = \bar y(z - 1)). As a result, we have the formal rule
q(s) = [z^s] \, \tilde f(y(z)) = \oint \frac{dz}{2 \pi i \, z^{s+1}} \, \tilde f(y(z)) .  (11)
This formula is valid provided that \tilde f(y(z)) is analytic in a neighbourhood of z = 0. This is a non-trivial condition, as analyticity is guaranteed under weak hypotheses only in a neighbourhood of z = 1.</p>
      <p>Let now F = Conv[f] and G = Conv[g]. In light of Proposition 2.6, a necessary and sufficient condition for F \trianglelefteq G is that f decomposes positively in G, which, from (11) above, means that \tilde f(y(z)) shall be analytic in a neighbourhood of z = 0, and have all non-negative Taylor coefficients.</p>
      <p>This is our best answer to the question posed by Problem 2.5.</p>
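      <p>As an illustration of the extraction rule (11), one can let a computer algebra system expand \tilde f(y(z)) and inspect the signs of the Taylor coefficients. The sketch below, in Python/sympy with arbitrary numeric parameters, does this for f = Geom_a and g = Bern_b, for which y(z) = 1 + (z-1)/b and the coefficients come out non-negative, as expected.</p>
      <preformat>
# Sanity check of (11), q(s) = [z^s] f~(y(z)), for f = Geom_a, g = Bern_b.
import sympy as sp

z = sp.symbols('z')
a, b = sp.Rational(1, 3), sp.Rational(1, 2)      # arbitrary choice with a, b in (0,1)
q_gf = 1 / (1 - (a / b) * (z - 1))               # f~(y(z)) with y(z) = 1 + (z-1)/b
coeffs = sp.series(q_gf, z, 0, 8).removeO().as_poly(z).all_coeffs()[::-1]
print(coeffs)          # q(s) = b a^s / (a+b)^(s+1): all non-negative
assert all(c >= 0 for c in coeffs)
      </preformat>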
      <p>2.2  Three fundamental distributions</p>
      <p>There are three fundamental examples of basic functions g, to be used for constructing convolution families G = Conv[g], corresponding to the three fundamental statistics of identically-distributed particles in statistical physics. We start with two of them:</p>
      <p>Fermionic statistics: for b \in (0, 1], the family G_b^{Fermi} is defined by
g_s(r) = \mathrm{Bino}_{s,b}(r) = \mathrm{Bern}_b^{*s}(r) = b^r (1-b)^{s-r} \binom{s}{r} .  (12)
The functions g_s are Binomial distributions, that is, s-fold convolutions of Bernoulli random variables of parameter b. The average and variance of the Bernoulli distribution of parameter b are
E_g(r) = b ,  \mathrm{Var}_g(r) = b(1-b) .  (13)
At b = 1 this set of functions reduces to the canonical basis g_s(r) = \delta_{s,r}.</p>
      <p>Bosonic statistics: for b \in \mathbb{R}_+, the family G_b^{Bose} is defined as
g_s(r) = \mathrm{Geom}_{s,b}(r) = \mathrm{Geom}_b^{*s}(r) = \frac{b^r}{(1+b)^{r+s}} \binom{r+s-1}{r} .  (14)
The functions g_s are s-fold convolutions of Geometric random variables of parameter b. The average and variance of the geometric distribution of parameter b are
E_g(r) = b ,  \mathrm{Var}_g(r) = b(1+b) .  (15)</p>
      <p>Due to the obvious relation between \mathrm{Geom}_{s,b}(r) and \mathrm{Bino}_{s,b}(r), the two families are in fact the analytic continuation one of the other. The positive decompositions q(s), in the two cases of binomial and geometric distributions, have the same limit b \to 0, related to the Poissonian distribution \mathrm{Poiss}_t(x) = e^{-t} t^x / x!. In this case the discretisation is lost, as well as the parameter b, and we have a different notion of positive decomposition, namely: f decomposes positively in Conv[Poiss] iff
f(x) = \int_0^{\infty} dt \; q(t) \, \mathrm{Poiss}_t(x)  (16)
with q(t) : \mathbb{R}_+ \to \mathbb{R}_+.³</p>
      <p>³We will often write just Poiss, with no parameter, contrarily to \mathrm{Bino}_b and \mathrm{Geom}_b. When referring to a family G, we mean a decomposition as in (16), while, when referring to a function, we mean Poiss \equiv \mathrm{Poiss}_1.</p>
      <p>The possibility of having an integral, instead of a sum, is obviously related to the fact that the Poissonian is infinitely divisible, and would hold for all and only the infinitely divisible distributions. This leads us to our third fundamental statistics.</p>
      <p>Classical statistics: the family G^{Class} is defined as
g_t(r) = \mathrm{Poiss}_t(r) = e^{-t} \frac{t^r}{r!} .  (17)
The average and variance of g_t are
E_{g_t}(r) = t ,  \mathrm{Var}_{g_t}(r) = t .  (18)
The analogue of (11) states in this case that q(t) is the inverse Laplace transform of \tilde f(y + 1), and the condition for f to be positively decomposable in Conv[Poiss] is that this function is both supported on and valued in \mathbb{R}_+.</p>
      <p>By investigating the expressions \tilde f(y_g(z)), we can identify the restriction to these families of the poset induced by the order F \trianglelefteq G. In particular, for our three families of distributions we have the table
\mathrm{Bino}_b :  \tilde g(y) = 1 + b(y-1) ,  y_g(z) = \frac{b - 1 + z}{b} ;
\mathrm{Geom}_b :  \tilde g(y) = (1 - b(y-1))^{-1} ,  y_g(z) = \frac{(1+b)z - 1}{bz} ;
\mathrm{Poiss} :  \tilde g(y) = \exp(y-1) ,  y_g(z) = 1 + \ln(z) ;
and, for the compositions \tilde f(y_g(z)) (obvious entries are omitted):
f = \mathrm{Bino}_a , g = \mathrm{Bino}_b :  \frac{b - a + az}{b} ;
f = \mathrm{Geom}_a , g = \mathrm{Bino}_b :  \frac{b}{a + b - az} ;
f = \mathrm{Poiss} , g = \mathrm{Bino}_b :  \exp\big(\frac{z-1}{b}\big) ;
f = \mathrm{Geom}_a , g = \mathrm{Geom}_b :  \frac{bz}{a - (a-b)z} ;
f = \mathrm{Geom}_a , g = \mathrm{Poiss} :  (1 - a \ln z)^{-1} .
The case of Poiss shall be treated separately, by explicit calculation, because of the variation in the criterium (11), which involves integrals instead of sums. In summary, we have
\mathrm{Bino}_a(x) = \sum_s \mathrm{Bino}_{a/b}(s) \, \mathrm{Bino}_{s,b}(x)  (19)
\mathrm{Geom}_a(x) = \sum_s \mathrm{Geom}_{a/b}(s) \, \mathrm{Bino}_{s,b}(x)  (20)
\mathrm{Poiss}_a(x) = \sum_s \mathrm{Poiss}_{a/b}(s) \, \mathrm{Bino}_{s,b}(x)  (21)
\mathrm{Geom}_a(x) = \sum_{s \geq 1} \mathrm{Geom}_{\frac{a-b}{b}}(s-1) \, \mathrm{Geom}_{s,b}(x)  (22)
\mathrm{Geom}_a(x) = \int_0^{\infty} ds \; \tfrac{1}{a} \, e^{-s/a} \, \mathrm{Poiss}_s(x)  (23)</p>
      <p>As a result we have a rather simple picture of the associated portion of our semilattice: the restriction of the partial order to these families is indeed a total order, isomorphic to an interval \theta \in (-\infty, 1], through the identification
\mathrm{Geom}_b \leftrightarrow \theta = -b ,  \quad \mathrm{Poiss} \leftrightarrow \theta = 0 ,  \quad \mathrm{Bino}_b \leftrightarrow \theta = b .
Recall that, as implied by equation (9), the order relation must be compatible with the total order given by \mathrm{Var}_g / E_g. Referring back to the explicit expressions (13), (15) and (18), we see that in this parametrisation we just have \mathrm{Var}_g / E_g = 1 - \theta for our three families of distributions.</p>
      <p>Having identified this ordering is important for our algorithm. Indeed, in our problem we are faced with the question if a given function f is positively decomposable in some family Conv[g]. The transitivity property of positive decomposition implies that, within our three families of distributions, the set of functions g for which f is positively decomposable in the family Conv[g] is an up-set of the lattice, and thus, under the identification above, an interval [\theta_{\min}(f), 1] for some \theta_{\min}(f) \leq 1.</p>
      <p>Analogously, for a family F of functions we define
\theta_{\min}(F) := \max_{f \in F} \theta_{\min}(f) .
We have thus established that all the functions in the family F can be positively decomposed in any family G = Conv[g_\theta], for g_\theta one of our fundamental distributions with parameter \theta_{\min}(F) \leq \theta \leq 1, under the identification above.</p>
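      <p>The decompositions (19)-(23) are easy to test numerically. The following self-contained Python sketch checks (20), i.e. that Geom_a is a Geom_{a/b}-mixture of the binomials Bino_{s,b}, by truncated summation over s; the values of a and b are arbitrary.</p>
      <preformat>
# Numeric check of (20): Geom_a(x) = sum_s Geom_{a/b}(s) Bino_{s,b}(x), with the
# parametrisations of the text: Geom_c(k) = c^k/(1+c)^(k+1),
# Bino_{s,b}(r) = C(s,r) b^r (1-b)^(s-r).
from math import comb

def geom(c, k):      # geometric law of average c
    return c**k / (1 + c)**(k + 1)

def bino(s, b, r):   # s-fold convolution of Bern_b
    return comb(s, r) * b**r * (1 - b)**(s - r) if 0 &lt;= r &lt;= s else 0.0

a, b = 0.7, 0.5
for x in range(6):
    mixture = sum(geom(a / b, s) * bino(s, b, x) for s in range(400))
    assert abs(mixture - geom(a, x)) &lt; 1e-12
      </preformat>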
    </sec>
    <sec id="sec-3">
      <title>The algorithm</title>
      <sec id="sec-3-1">
        <title>The rejection paradigm</title>
        <p>There exists a correspondence between measure decompositions and rejection schemes in exact-sampling algorithms, namely, given a decomposition of a measure
\mu(x) \propto \sum_y \mu_1(y) \, \mu_2(x|y) \, a(y)  (24)
with a(y) \in [0, 1], Algorithm 1 is an exact-sampling algorithm for \mu.</p>
        <sec id="sec-3-1-1">
          <title>Algorithm 1: Rejection paradigm.</title>
          <p>begin
    repeat
        y ← \mu_1 ;
        \beta ← \mathrm{Bern}_{a(y)} ;
    until \beta = 1 ;
    x ← \mu_2(\cdot|y) ;
    return x
</p>
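          <p>Algorithm 1 is a one-liner in any language; the following Python sketch fixes the interface we have in mind (the three callables are the problem-specific ingredients \mu_1, a and \mu_2, and are assumptions of this illustration).</p>
          <preformat>
# A direct transcription of Algorithm 1 (the rejection paradigm), as a sketch.
import random

def rejection_sample(sample_mu1, accept_prob, sample_mu2):
    while True:                                # repeat ... until beta = 1
        y = sample_mu1()                       # y &lt;- mu_1
        if random.random() &lt; accept_prob(y):   # beta &lt;- Bern_{a(y)}
            return sample_mu2(y)               # x &lt;- mu_2(.|y)
          </preformat>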
          <p>Call T_1(y) and T_2(y) the average complexities for sampling from \mu_1 when the outcome is y (plus sampling one Bernoulli with parameter a(y)), and for sampling from \mu_2(\cdot|y), respectively. The number of failed runs in the repeat-until loop is geometrically-distributed, and the complexity of each of these runs is an independent random variable. As a result, the average complexities of the failed runs of the loop, of the successful run of the loop, and of the unique run of the final instruction just add up, and the average complexity of the full algorithm is easily calculated, via some surprising cancellations:
T = \frac{\sum_y T_1(y) \mu_1(y) (1 - a(y))}{\sum_y \mu_1(y) a(y)} + \frac{\sum_y T_1(y) \mu_1(y) a(y)}{\sum_y \mu_1(y) a(y)} + \frac{\sum_y T_2(y) \mu_1(y) a(y)}{\sum_y \mu_1(y) a(y)} = \frac{E(T_1)}{E(a)} + \frac{E(T_2 \, a)}{E(a)} ,  (25)
where averages E(\cdot) are taken w.r.t. \mu_1. If the functions T_1(y) and T_2(y) are bounded by the constants \bar T_1 and \bar T_2, we get the bound
T \leq \frac{\bar T_1}{E(a)} + \bar T_2 .  (26)
Of course, the inequality above is an equality if the functions T_1(y) and T_2(y) are indeed constants.</p>
          <p>3.2  Why the algorithm has linear complexity</p>
          <p>First, let us address the question of why we shall hope that linear complexity can be achieved, without concentrating on the technical detail of how a single variable can be sampled with \Theta(1) complexity.</p>
          <p>Following the framework described in the introduction, n is a `large' positive integer, and f = (f_1, \ldots, f_n) is an n-tuple of integer-valued probability distributions, taken in a family F of functions satisfying the hypotheses of the Central Limit Theorem, and such that \theta_{\min}(F) &lt; 1.</p>
          <p>Let x_i be the random variable associated to the function f_i, and suppose that m = \sum_i E(x_i) is an integer. Under our hypotheses, it is trivial to sample in linear time from the unconstrained measure
\mu^{(f_1,\ldots,f_n)}(x_1, \ldots, x_n) = \mu^{(f)}(x) = \prod_i f_i(x_i) ,  (27)
however we want an algorithm which samples from the constrained measure
\mu_m^{(f)}(x) \propto \prod_i f_i(x_i) \, \delta_{x_1 + \cdots + x_n, m} ,
with average complexity which is linear in n, the constant depending only on F and on the ratio m/n.</p>
          <p>Call P_f(m') the probability, in the unconstrained measure, that \sum_i x_i = m'. By our CLT hypothesis, this probability is maximal around m' = m, has variance which is linear in n, and thus its value is of order 1/\sqrt{n} at m' = m. By definition,
P_f(m') = \sum_x \prod_i f_i(x_i) \, \delta_{x_1 + \cdots + x_n, m'} .  (28)
Now, choose \theta \in [\theta_{\min}(F), 1), and decompose the f_i's in the family G = Conv[g_\theta], for g_\theta the fundamental distribution corresponding to \theta. Call q_i(s) the corresponding expansion coefficients. We use a sum notation, with the disclaimer that for Poiss sums are replaced by integrals.</p>
          <p>Thus we can write (call g_s = (g_{s_1}, \ldots, g_{s_n}) and q = (q_1, \ldots, q_n))
\mu^{(f)}(x) = \prod_i f_i(x_i) = \sum_s \prod_i q_i(s_i) \, g^{*s_i}(x_i) = \sum_s \mu^{(q)}(s) \, \mu^{(g_s)}(x) ,  (29)
and
\mu_{m'}^{(f)}(x) \, P_f(m') = \prod_i f_i(x_i) \, \delta_{\sum_i x_i, m'} = \sum_s \prod_i q_i(s_i) \, g^{*s_i}(x_i) \, \delta_{\sum_i x_i, m'} = \sum_s \mu^{(q)}(s) \, \mu_{m'}^{(g_s)}(x) \, P_{g_s}(m') .  (30)
Now, call N = N(s) = \sum_i s_i. As g_s = g^{*s}, in fact P_{g_s}(m') depends on s only through N(s), and is
P_g^N(m') = g^{*N}(m') = \mathrm{Bino}_{N,b}(m') for g = \mathrm{Bino}_b ;  \mathrm{Poiss}_N(m') for g = \mathrm{Poiss} ;  \mathrm{Geom}_{N,b}(m') for g = \mathrm{Geom}_b .  (31)
In all three cases, we can easily evaluate \bar N := \mathrm{argmax}_N (P_g^N(m)) (the distributions above are easily seen to be log-concave in N, by direct calculation on (12), (14) and (17)), thus it suffices to analyse the ratios P_g^N(m)/P_g^{N-1}(m). We get</p>
          <p>\bar N = \lfloor m/b \rfloor for g = \mathrm{Bino}_b ;  \lfloor (1 - e^{-1/m})^{-1} \rfloor for g = \mathrm{Poiss} ;  \lceil m/b \rceil for g = \mathrm{Geom}_b .  (32)
Note that, in all three cases, when m is large, this corresponds to \bar N \simeq m/E_g. This is in agreement with (8), as indeed we have
\bar N = \sum_i E_{q_i}(s) = \frac{\sum_i E_{f_i}(x)}{E_g} = \frac{m}{E_g} ,
and the fact that, as by CLT we converge to a normal distribution, average and mode are equal at leading order.</p>
          <p>Combining the previous results, we can write
\mu_m^{(f_1,\ldots,f_n)}(x) \propto \sum_s \mu^{(q_1,\ldots,q_n)}(s) \; \mu_m^{(g_{s_1},\ldots,g_{s_n})}(x) \; \frac{P_g^{N(s)}(m)}{P_g^{\bar N}(m)} .  (33)
By definition of \bar N, the ratio a(N) := P_g^{N}(m)/P_g^{\bar N}(m) is in [0, 1], and thus can be used as the acceptance rate in a rejection algorithm as in Section 3.1. Indeed, at this point we can apply Algorithm 1 with the identifications
x \leftrightarrow x ;  y \leftrightarrow s ;  a(y) \leftrightarrow P_g^{N(s)}(m)/P_g^{\bar N}(m) ;  \mu(x) \leftrightarrow \mu_m^{(f_1,\ldots,f_n)}(x) ;  \mu_1(y) \leftrightarrow \mu^{(q_1,\ldots,q_n)}(s) ;  \mu_2(x|y) \leftrightarrow \mu_m^{(g_{s_1},\ldots,g_{s_n})}(x) .  (34)</p>
          <p>Let us neglect the dependence of the complexities T_1(s) and T_2(s) on s, and use directly equation (26). Let us assume that we have determined that T_1 and T_2 are linear in n. We shall thus calculate the acceptance rate E(a) of that formula, and verify that it scales as \Theta(1) for large n. In this setting, this reads
E(a) = \sum_s \mu^{(q_1,\ldots,q_n)}(s) \, \frac{P_g^{N(s)}(m)}{P_g^{\bar N}(m)} .  (35)
In other words, call \hat\rho(N) the distribution on N = N(s) induced by \mu^{(q_1,\ldots,q_n)}(s). Then we have
E(a) = \sum_N \hat\rho(N) \, \frac{P_g^{N}(m)}{P_g^{\bar N}(m)} = E_{\hat\rho}\Big( \frac{P_g^{N}(m)}{P_g^{\bar N}(m)} \Big) .  (36)
This already makes clear why our strategy is successful. When n tends to infinity, because of our CLT hypothesis, \hat\rho converges to a normalised Gaussian, with average \bar N and some variance \sigma_1^2 of order n. On the other side, P_g^N(m)/P_g^{\bar N}(m) converges to a non-normalised Gaussian, with average \bar N and variance \sigma_2^2 of order n. This Gaussian has been rescaled as to be valued 1 when N = \bar N. As a result, we just have</p>
          <p>E(a) \simeq \int dx \; \frac{1}{\sqrt{2\pi}\,\sigma_1} \, e^{-x^2/2\sigma_1^2} \, e^{-x^2/2\sigma_2^2} = \frac{\sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} .  (37)
The resulting expression is homogeneous, so that it evaluates to a constant of order 1 for whatever scaling in n of \sigma_1^2 and \sigma_2^2 (provided that the two variances have the same scaling). In this case the average complexity is bounded as in (26), and thus, by our assumptions, is linear.</p>
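          <p>The Gaussian integral in (37) is elementary; as a quick numeric confirmation of the value \sigma_2/\sqrt{\sigma_1^2+\sigma_2^2}, one can sum the integrand on a grid (the two variances below are arbitrary choices for the check):</p>
          <preformat>
# Numeric check of (37): a normalised Gaussian of variance s1^2 integrated
# against exp(-x^2/(2 s2^2)) gives s2/sqrt(s1^2 + s2^2).
import math

s1, s2 = 3.0, 2.0
dx = 1e-3
I = sum(dx * math.exp(-x*x/(2*s1*s1)) / math.sqrt(2*math.pi*s1*s1)
        * math.exp(-x*x/(2*s2*s2))
        for x in (i * dx for i in range(-20000, 20001)))
print(I, s2 / math.sqrt(s1*s1 + s2*s2))   # both ~ 0.5547
          </preformat>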
          <p>Note that we would have obtained an expression still of order 1, even if we did take a “wrong” tuning of m, in the sense that \sum_i E_{f_i} = m + \delta m, with \delta m \gg 1, provided that \delta m is small w.r.t. \sigma_1 and \sigma_2 (which, under the hypotheses of the Central Limit Theorem, are of order \sqrt{n}). That is, in the language of Boltzmann samplers, we have a tolerance on the tuning of the Boltzmann parameter up to a relative error on the average outcome size of order 1/\sqrt{n}.</p>
          <p>
            Now let us relate the parameters 1 and 2 to the parameters of the problem. One case is rather easy: as the
variances sum up under convolution,
For what concerns 2, we shall calculate the variance in the variable N of the function g N (m). From the
calculation on the explicit expressions (
            <xref ref-type="bibr" rid="ref3 ref6">12</xref>
            ), (14) and (
            <xref ref-type="bibr" rid="ref2">17</xref>
            )) one easily derives that, in all three cases, 22 = m VEa3grg .
          </p>
          <p>Now, using (8) we get
\sigma_1^2 = \frac{\sum_i \mathrm{Var}_{f_i}}{E_g^2} - m \, \frac{\mathrm{Var}_g}{E_g^3} ,  (40)
so that
\bar a = \frac{\sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} = \Big( 1 + \frac{\sigma_1^2}{\sigma_2^2} \Big)^{-1/2} = \sqrt{\frac{m \, \mathrm{Var}_g / E_g}{\sum_i \mathrm{Var}_{f_i}}} .  (41)
If we take as function g the one with parameter \theta_{\min}(F) (which we have determined to be optimal), the asymptotic maximal acceptance rate for the set F = \{f_1, \ldots, f_n\} within our algorithm is given by
\bar a(F) = \sqrt{\frac{(1 - \theta_{\min}(F)) \, m}{\sum_i \mathrm{Var}_{f_i}}} .  (42)
Under our assumption that the f_i's satisfy the hypotheses of the Central Limit Theorem, we have that all E_{f_i}'s and \mathrm{Var}_{f_i}'s are of order 1, so that m = \sum_i E_{f_i} and \sum_i \mathrm{Var}_{f_i} have the same scaling behaviour. Once again we have obtained that, for n going to infinity, \bar a(F) converges to a finite quantity.</p>
          <p>3.3  How the algorithm works in practice</p>
          <p>At the level of generality of this paper, we cannot give an algorithm that works just out of the box. For a given problem, given by the n-tuple (f_1, \ldots, f_n), a preliminary analytical study of the functions f_i is mandatory. One has to:</p>
          <p>– Determine that F \trianglelefteq G for G = Conv[g_\theta], for some parameter \theta smaller than 1.</p>
          <p>– Devise an algorithm to sample from the corresponding distributions q_i(s), in average time \tau_i, so that \sum_i \tau_i scales linearly with n.</p>
          <p>Only at this point, one is ready to apply our general algorithm, which reads as in Algorithm 2.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Algorithm 2: Our new algorithm.</title>
          <p>begin
    \bar N = round(m/E_g);
    repeat
        N = 0;
        for i ← 1 to n do
            s_i ← q_i;
            N += s_i;
        a = g^{*N}(m)/g^{*\bar N}(m);
        \beta ← \mathrm{Bern}_a;
    until \beta = 1;
    Sample x with \mu_m^{(g_s)}(x);
    return x
</p>
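          <p>For concreteness, here is a minimal Python sketch of Algorithm 2, specialised to the binomial case g = \mathrm{Bern}_\theta (so E_g = \theta and g^{*N} = \mathrm{Bino}_{N,\theta}). The callable sample_qi and the routine bbhl_shuffle (a sketch of which is given after Algorithm 3 below) are assumed as inputs; this is an illustration of the scheme, not the authors' reference implementation.</p>
          <preformat>
# Sketch of Algorithm 2 for g = Bern_theta: sample_qi(i) draws s_i from q_i,
# bbhl_shuffle(N, m) returns a uniform 0-1 string of length N with m ones.
import math, random

def algorithm2(n, m, theta, sample_qi, bbhl_shuffle):
    Nbar = round(m / theta)                  # round(m / E_g)

    def log_gN(N):                           # log Bino_{N,theta}(m) = log g^{*N}(m)
        if m > N:
            return -math.inf
        return (math.lgamma(N + 1) - math.lgamma(m + 1) - math.lgamma(N - m + 1)
                + m * math.log(theta) + (N - m) * math.log(1 - theta))

    while True:                              # repeat ... until beta = 1
        s = [sample_qi(i) for i in range(1, n + 1)]
        N = sum(s)
        a = math.exp(min(0.0, log_gN(N) - log_gN(Nbar)))   # a(N), clipped to [0,1]
        if random.random() &lt; a:
            break
    sigma = bbhl_shuffle(N, m)               # 'Sample x with mu_m^{(g_s)}(x)'
    x, pos = [], 0
    for si in s:                             # x_i = sigma_{j(i-1)+1} + ... + sigma_{j(i)}, cf. (43)
        x.append(sum(sigma[pos:pos + si]))
        pos += si
    return x
          </preformat>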
          <p>Contrarily to the first part of the algorithm, where a preliminary analysis concerning the functions q_i is required, the final step of the algorithm, of sampling x with \mu_m^{(g_s)}(x), is “universal”, as it depends only on the family of the basic distribution g_\theta, among binomial, geometric or Poissonian.</p>
          <p>The Poissonian case is trivial in spirit (it suffices to sample m independent values \chi \in [0, N], and increment x_i if 0 &lt; \chi - \sum_{j=1}^{i-1} s_j &lt; s_i); however, as now the intermediate variables s_i are real-valued, it must be performed with floating-point numbers and has complexity O(m \ln n). In the special case in which the s_i are integer-valued, that is the q_i's are supported on b\mathbb{N} for some b, we have a genuinely linear alternate algorithm, which we will describe elsewhere.</p>
          <p>The binomial and geometric cases are in a sense `dual', and are solved by the same algorithm, provided by Bacher, Bodini, Hollender and Lumbroso in [BBHL15, sec. 1.1] (see also [BBHT17]), which is genuinely linear. This algorithm, that we shall call “BBHL-shuffling(n, k)”, may be seen as a black box which samples strings \sigma = (\sigma_1, \ldots, \sigma_n) \in \{0,1\}^n with \sum_i \sigma_i = k, uniformly, and that, for n/k - 1 = \Theta(1), requires a time linear in n.⁴ Then, in the two cases we have:</p>
          <p>Case g^{*N}(m) = \mathrm{Bino}_{N,b}(m): use BBHL-shuffling(N, m), then set
x_i = \sigma_{j(i-1)+1} + \cdots + \sigma_{j(i)} ,  with  j(i) = s_1 + s_2 + \cdots + s_i .  (43)</p>
          <p>Case g^{*N}(m) = \mathrm{Geom}_{N,b}(m): use BBHL-shuffling(m + N - 1, m). Call j_1, \ldots, j_{N-1} the positions of the \sigma_i's which are valued 0. Let j_0 = 0 and j_N = m + N. Then
x_i = j_i - j_{i-1} - 1 .  (44)</p>
          <p>⁴For completeness, this algorithm is quickly described in the following subsection.</p>
          <p>Bacher, Bodini, Hollender and Lumbroso algorithm
Here we describe an algorithm, given by Bacher, Bodini, Hollender and Lumbroso in [BBHL15, sec. 1.1], for the
uniform sampling of strings = ( 1; : : : ; n) 2 f0; 1gn with Pi i = k. In Section 3.3 this algorithm is called
\BBHL-shu ing(n; k)". Recall that it is `optimal' for the randomness resource, in the sense that the random-bit
complexity at leading order saturates the Shannon bound.</p>
          <p>Call \lambda = k/n. The idea is that you sample the \sigma_i's one by one, as if \sigma were to be sampled with the measure \nu(\sigma) = \mathrm{Bern}_\lambda^{\otimes n} (this costs \Theta(1) per variable), and, as soon as you have an excess of \sigma_i's equal to 0 or to 1, complete with a sequence of Fisher–Yates random swaps. These swaps cost \ln n each, but are performed on average only \sqrt{n} times, so the swap part of the algorithm has subleading complexity, and the overall complexity is genuinely linear, with no logarithmic factors. For completeness, we give in Algorithm 3 a summary of the main features. In this algorithm RndInt_j returns a uniform random integer in \{1, \ldots, j\}.</p>
          <p>Algorithm 3: BBHL-shuffling.</p>
          <p>begin
    a = k; b = n - k; i = 0;
    repeat
        i++;
        \sigma_i ← \mathrm{Bern}_\lambda;
        if \sigma_i = 1 then a--;
        else b--;
    until a &lt; 0 or b &lt; 0;
    if a &lt; 0 then \bar\sigma = 0;
    else \bar\sigma = 1;
    for j ← i to n do
        \sigma_j = \bar\sigma;
        h ← RndInt_j;
        swap \sigma_j and \sigma_h;
    return \sigma
</p>
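          <p>The following Python rendering of Algorithm 3 is a sketch for reference; it mirrors the pseudocode above (budgets for the two symbols, an overwrite of the tail upon overshoot, then the Fisher–Yates completion swaps), without the random-bit optimisations of [BBHL15].</p>
          <preformat>
# Sketch of BBHL-shuffling(n, k): a uniform 0-1 string of length n with k ones.
import random

def bbhl_shuffle(n, k):
    lam = k / n
    sigma = []
    a, b = k, n - k                            # remaining budgets of ones / zeros
    while a >= 0 and b >= 0 and len(sigma) &lt; n:
        bit = 1 if random.random() &lt; lam else 0    # sigma_i &lt;- Bern_lambda
        sigma.append(bit)
        if bit == 1:
            a -= 1
        else:
            b -= 1
    if a &lt; 0 or b &lt; 0:                         # overshoot at position i
        i = len(sigma) - 1
        fill = 0 if a &lt; 0 else 1               # the symbol sigma-bar
        sigma[i:] = [fill] * (n - i)
        for j in range(i, n):                  # Fisher-Yates completion swaps
            h = random.randint(0, j)           # uniform position among 0..j
            sigma[j], sigma[h] = sigma[h], sigma[j]
    return sigma
          </preformat>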
          <p>Note that, if \lambda is almost a 2-adic number, namely \lambda = k/n = a 2^{-d} + \varepsilon for some integers a and d, and \varepsilon = O(1/\sqrt{n}), it is convenient to just use the source of randomness \mathrm{Bern}_{\lambda'} for the value \lambda' = a 2^{-d}, which speeds up the main part of the algorithm, at the price of slowing down the subleading part.</p>
          <p>4  Examples</p>
          <p>The drawback of the algorithm outlined in Section 3.3 is that the user needs to perform a preliminary analytic study of its given probability distributions f_i. There are obvious cases in which this work is not required: when all of the f_i's are distributions in our three basic families, i.e. binomial, geometric or Poissonian. In this case \theta_{\min}(F) is evaluated directly, the functions q_i(s) are given explicitly in Section 2.2, and are just, yet again, binomial, geometric or Poissonian (we provide the generating function for the basic cases, \mathrm{Bern}_b = \mathrm{Bino}_{1,b} and \mathrm{Geom}_b = \mathrm{Geom}_{1,b}; the generic case is deduced immediately via Proposition 2.6). It is well known that one can sample from these distributions in constant complexity, so that we know that we have a linear-time algorithm as soon as we verify that \mathrm{Var}_{f_i}/(E_{f_i}(1 - \theta)) = \Theta(1), which is straightforward in most cases.</p>
          <p>The paradigm of sum-constrained geometric random variables with distinct parameters applies, for example, to set partitions, enumerated by the Stirling numbers of the second kind. The same paradigm with Bernoulli variables applies, for example, to permutations with a given number of cycles, enumerated by the Stirling numbers of the first kind. In this section we describe in detail these two examples.</p>
          <p>4.1  Set partitions, Stirling numbers of the second kind and Automata</p>
          <p>Each of the P_n partitions B of a set of cardinality n has a number k(B) of parts in the range \{1, 2, \ldots, n\}. The Stirling numbers of the second kind S_2(n, k) are the corresponding enumeration, and a classical algorithmic problem is to sample B uniformly from the corresponding ensemble S_2(n, k) [FS09].</p>
          <p>This problem has acquired further interest since it has been shown, in [BN07], that accessible deterministic complete automata (ACDA) are in bijection with a certain subset of set partitions in S_2(kn + 1, n) (where k = \Theta(1) is the size of the alphabet, and n is the number of states), characterised by a property easy to test (the `backbone' should stay above the diagonal), and that this subset is asymptotically a finite fraction of the ensemble. As a result, up to a rejection scheme, a uniform sampling algorithm for set partitions provides a uniform sampling algorithm for accessible deterministic complete automata.</p>
          <p>Furthermore, in [BDS12] it is shown that minimal automata, which are at sight a subset of ACDA's, are
asymptotically dense in this ensemble (i.e., the ratio of the cardinalities of the sets converges to 1), so, yet again,
up to a rejection scheme, a uniform sampling algorithm for set partitions provides a uniform sampling algorithm
for minimal automata.</p>
          <p>Sampling from S_2(n + m, n), when m/n is \Theta(1), can be done by the Boltzmann Method [DFLS04] with complexity O(n^{3/2}), and we show here that our new method gives a linear complexity. The recursive description of S_2(n, k) provides a natural bijection with certain tableaux, which are described in [FS09, pag. 63], and, in more detail, in [BDS12], for which the so-called `backbone' relates to a list x = (x_1, \ldots, x_n) as in our setting, with \mu^{(f)}(x) \propto \prod_i \mathrm{Geom}_{b_i}(x_i) \, \delta_{x_1+\cdots+x_n, n-k}. As a result, a uniform set partition in S_2(n + m, n) can be sampled in two steps: first we sample the backbone x = (x_1, \ldots, x_n) with the measure above, then, for every 1 \leq i \leq n, we sample x_i independent random integers in \{1, \ldots, i\}, which describe the so-called `wiring part' of the tableau. The latter step has a complexity O(m \ln n), which is intrinsic to this problem and is in fact implied by the Shannon bound (i.e., the asymptotics of \ln S_2(\alpha n, n), as evinced by well-known saddle-point calculations, see e.g. [BN07]). As this latter step is invoked only once in the algorithm, it turns out that this part of the algorithm is optimal for the randomness resource. As, in our algorithm, the first part of sampling the backbone has complexity linear in n and off from optimal only by a multiplicative factor, we deduce that the whole algorithm is asymptotically optimal (although, admittedly, the convergence to optimality is slow, as it goes as 1/\ln n).</p>
          <p>The parameters b_i of the geometric variables are tuned as to have b_i/(b_i + 1) \propto i and \sum_i E(x_i) = m + o(\sqrt{m}), which is solved by
b_i = \frac{\omega i}{n - \omega i} ,  with \omega fixed by  1 + \frac{m}{n} = -\frac{\ln(1 - \omega)}{\omega} .  (45)
Then, using the fact, seen in Section 2.2, that \mathrm{Geom}_a(x) = \sum_s \mathrm{Geom}_{a/b}(s) \, \mathrm{Bino}_{s,b}(x), and choosing for simplicity \theta = \frac{1}{2}, we have
\mathrm{Geom}_{b_i}(x) = \sum_s \mathrm{Geom}_{2b_i}(s) \, \mathrm{Bino}_{s,1/2}(x) = \sum_s \mathrm{Geom}_{\frac{2\omega i}{n - \omega i}}(s) \, \mathrm{Bino}_{s,1/2}(x) ,  (46)
where \mathrm{Geom}_{2b_i} is the geometric law of average 2b_i, i.e. of success probability \frac{1}{1+2b_i} = \frac{n - \omega i}{n + \omega i}.</p>
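          <p>The tuning (45) is a one-dimensional root-finding problem, since -\ln(1-\omega)/\omega is increasing on (0, 1). A minimal Python sketch (plain bisection; the values of n and m below are arbitrary):</p>
          <preformat>
# Tuning of the set-partition sampler, eq. (45): solve 1 + m/n = -ln(1-w)/w.
import math

def tune_omega(n, m, tol=1e-12):
    target = 1 + m / n
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if -math.log1p(-mid) / mid &lt; target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, m = 1000, 1500
omega = tune_omega(n, m)
b = [omega * i / (n - omega * i) for i in range(1, n + 1)]
print(omega, sum(b))   # sum_i b_i = sum_i E(x_i), close to m by construction
          </preformat>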
        </sec>
        <sec id="sec-3-1-4">
          <title>However, a =</title>
          <p>r</p>
          <p>m
2 Pi bi(1 + bi)</p>
          <p>:
X bi(1 + bi) ' n
i</p>
          <p>Z 1
0
dx
(1
!x
!x)2 = n
ln(1</p>
          <p>!)
!
+</p>
          <p>1
1
!
m=n small, and as p 2mn e 2mn for m=n large.
for
! 1, so that a
Each of the n! permutations of Sn has a number k( ) of cycles in the range f1; 2; : : : ; ng. The Stirling numbers
of the rst kind S1(n; k) are the corresponding enumeration, and a classical algorithmic problem is to sample
uniformly from the corresponding ensemble S1(n; k) [FS09, pag. 121]. When n=k 1 is (1), this problem is
solved by Boltzmann sampling [DFLS04] with complexity O(n 32 ), and we show here that our new method gives
a linear complexity.</p>
          <p>The recursive description of S1(n; k), namely S1(n; k) ' S1(n 1; k 1) [ S1(n 1; k) f1; : : : ; n 1g, provides
a natural bijection with certain 0-1 matrices Mij , for which the diagonal relates to a list x = (x1; : : : ; xn) as in
our setting, with xi = 1 Mii and (f)(x) / Qi Bernbi (xi) x1+ +xn;n k. The parameters bi are tuned as to
have bi=(1 bi) / i 1 and m := n k = Pi E(xi) + o(pn), with solution
bi =</p>
          <p>!(i
n + !(i
1)
1)
;
1
m
n
=
ln(1 + !)
!
:
Then, using the fact, seen in Section 2.2, that Binoa(x) = Ps Bino ab (s) Binos;b(x), this expansion being positive
for a b, and choosing = 1+!! maxi bi, we have
Also, N = 2m. At this point we are ready to use our Algorithm 2, with this value of N , functions qi given by
Geom nn+!!ii , and acceptance factor
a(N ) =
g N (m)
g N (m) = 22m N
(N</p>
          <p>N ! m!
m)! (2m)!
and the part \Sample x with (mgs)(x)" is performed via the BBHL-shu ing(N; m) with parameter = 21 . The
acceptance rate a = E(a(N )) is given by (42), just with min = 0 traded for = 21 , which we have chosen as a
good compromise between e ciency and complexity of coding, and gives
Also, N = bm= c. At this point we are ready to use our Algorithm 2, with this value of N , functions qi given
by Bino (i 1)(1+!) , and acceptance factor
n+(i 1)!</p>
          <p>Binobi (x) =</p>
          <p>X Bino (i 1)(1+!) (s) Binos; (x)</p>
          <p>s n+(i 1)!
a(N ) =
g N (m)
g N (m)
= (1
)N N N ! (N
(N</p>
          <p>m)!
m)! N !
(47)
(48)
(49)
(50)
(51)</p>
          <p>for
(52)
(53)
(54)</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Acknowledgements</title>
        <p>A.S. is supported by the ANR MetaConc project (ANR-15-CE40-0014, France).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [BBHL15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hollender</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lumbroso</surname>
          </string-name>
          .
          <article-title>MergeShuffle: A Very Fast, Parallel Random Permutation Algorithm</article-title>
          . At https://arxiv.org/abs/1508.03167,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [BBHT17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-K.</given-names>
            <surname>Hwang</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.-H.</given-names>
            <surname>Tsai</surname>
          </string-name>
          .
          <article-title>Generating random permutations by coin tossing: classical algorithms, new analysis, and modern implementation</article-title>
          .
          <source>ACM Transactions on Algorithms</source>
          ,
          <volume>13</volume>
          :#24,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [BDS12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bassino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>David</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sportiello</surname>
          </string-name>
          .
          <article-title>Asymptotic enumeration of minimal automata</article-title>
          .
          <source>In: STACS</source>
          <year>2012</year>
          , Paris, France, LIPIcs – Leibniz International Proceedings in Informatics,
          <volume>14</volume>
          :
          <fpage>88</fpage>
          –
          <lpage>99</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [BN07]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bassino</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Nicaud</surname>
          </string-name>
          .
          <article-title>Enumeration and Random Generation of Accessible Automata</article-title>
          .
          <source>Theoretical Computer Science</source>
          ,
          <volume>381</volume>
          :
          <fpage>86</fpage>
          {
          <fpage>104</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [BS15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bassino</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sportiello</surname>
          </string-name>
          .
          <article-title>Linear-time generation of inhomogeneous random directed walks</article-title>
          .
          <source>In: ANALCO'15</source>
          , San Diego, California, pp.
          <fpage>51</fpage>
          –
          <lpage>65</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Dev12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Devroye</surname>
          </string-name>
          .
          <article-title>Simulating Size-constrained Galton-Watson Trees</article-title>
          .
          <source>SIAM Journal on Computing</source>
          ,
          <volume>41</volume>
          :
          <fpage>1</fpage>
          –
          <lpage>11</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [DFLS04]
          <string-name>
            <given-names>P.</given-names>
            <surname>Duchon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Flajolet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Louchard</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Schaeffer</surname>
          </string-name>
          .
          <article-title>Boltzmann Samplers for the Random Generation of Combinatorial Structures</article-title>
          .
          <source>Combinatorics Probability and Computing</source>
          ,
          <volume>13</volume>
          :
          <fpage>577</fpage>
          -
          <lpage>625</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [FS09]
          <string-name>
            <given-names>P.</given-names>
            <surname>Flajolet</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sedgewick</surname>
          </string-name>
          . Analytic Combinatorics. Cambridge University Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>