<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linear-time exact sampling of sum-constrained random variables</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Andrea Sportiello</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Frederique Bassino</institution>
        </aff>
      </contrib-group>
      <fpage>94</fpage>
      <lpage>105</lpage>
      <abstract>
        <p>In the problem of exactly sampling from a probability distribution, one is often led to sample from a measure of the form \mu(x_1, \ldots, x_n) \propto \prod_i f_i(x_i) \, \delta_{x_1+\cdots+x_n, m}, with the f_i's integer-valued distributions. When the f_i's are all equal, an algorithm of Devroye essentially solves this problem in linear time. However, in several important cases these f_i's are in the same family, but have distinct parameters. A typical example is the sampling of random set partitions, related to a set of f_i's which are geometric distributions with different averages. We describe here a simple algorithmic solution for a large family of n-tuples of functions. To this aim we introduce the notion of “positive decomposition” of one probability distribution into another.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this paper we consider the problem of exactly sampling from measures of the form
\mu_m^{(f)}(x_1, \ldots, x_n) \propto \prod_i f_i(x_i) \, \delta_{x_1 + \cdots + x_n, m} ,  (1)
with the f_i's integer-valued probability distributions taken in a family F, by an algorithm whose average complexity is linear in n, the constant depending only on F and on the ratio m/n. This problem is essentially solved, by Devroye [Dev12], when the distributions f_i are all equal. However his ideas do not apply in the generic case, and it is often the case in real-life applications that these distributions are indeed in the same family, but have distinct underlying parameters (see the examples in Section 4).</p>
      <p>The main idea of our algorithm is as follows: suppose that you can decompose
f_i(x) = \sum_{s \in \mathbb{N}} q_i(s) \, g^{*s}(x) ,  (2)
where g^{*s} denotes the `convolution power', i.e. the s-fold convolution of g, and where:</p>
      <p>– g(x) is a `simple' probability distribution (for example, g = \mathrm{Bern}_{1/2});</p>
      <p>– the q_i(s) are non-negative, and, seen as probability distributions for the variable s, can be sampled efficiently;</p>
      <p>– calling \sigma^2 = \sum_i \mathrm{Var}_{f_i}, we need \sqrt{m}/\sigma = \Theta(1).</p>
      <p>In this case, as we will show in the following, we can sample in linear time the n-tuple s = (s_1, \ldots, s_n) from the distribution q(s) = \prod_i q_i(s_i), apply a rejection scheme with average acceptance rate related to the quantity \sqrt{m}/\sigma above (and thus of order 1), and finally sample x from the measure \mu_m(x|s) = \prod_i g^{*s_i}(x_i) \, \delta_{x_1 + \cdots + x_n, m}; this latter operation can be performed efficiently because the function g is `simple', and it enters this final stage just with an n-fold convolution. Thus, the resulting algorithm has linear average complexity.</p>
      <p>The understanding of which functions f admit a decomposition as in equation (2) turns out to be an interesting question per se. We outline here the main results, for which more details will be given in a forthcoming paper. Then, we provide a detailed description of the algorithm, justify the claim on its complexity, and illustrate it on some relevant examples. In particular, we provide a linear-time algorithm for the uniform sampling of set partitions of a set of cardinality n + m into exactly n parts. This marks an improvement w.r.t. previous results of the authors [BS15] (indeed this application has been the main motivation for this work), and, by previous results [BN07, BDS12], implies a linear exact sampling algorithm for uniform accessible deterministic complete automata (ACDA), and for minimal automata, in any given alphabet of finite size.</p>
    </sec>
    <sec id="sec-2">
      <title>On positive decomposition</title>
      <p>2.1  Positive decomposition of integer-valued probability distributions</p>
      <p>In this section we introduce our `fundamental' problem in Probability. As we have anticipated, for implementing our algorithm we are faced with the following basic construction. Let F and G = \{g_s(x)\} be two families of integer-valued probability distributions.</p>
      <p>Definition 2.1. We say that F decomposes in G if all f \in F are linear combinations of functions in G:
f(x) = \sum_s q(s) \, g_s(x) .  (3)
If moreover all the coefficients q(s) are non-negative, we say that F decomposes positively in G. For a single distribution g, we write Conv[g] = \{g^{*s}\}_{s \in \mathbb{N}} for the associated convolution family.</p>
      <p>We are now ready to state our problem.</p>
      <p>Problem 2.5. For a given function g, which functions f decompose positively in G = Conv[g]?</p>
      <p>A first useful property is the following.</p>
      <p>Proposition 2.6. If f_1 and f_2 decompose positively in the same G = Conv[g], also f_1 * f_2 does, and the resulting set of coefficients, seen as a probability distribution, is the convolution of the two sets:</p>
      <p>(f_1 * f_2)(x) = \sum_s (q_1 * q_2)(s) \, g^{*s}(x) .  (5)
When G is a convolution family, the average and variance of g_s are linear in s, and we have
E_f(x) = \sum_x x f(x) = \sum_s q(s) \sum_x x \, g^{*s}(x) = \sum_s q(s) \, s \, E_g(x) = E_q(s) \, E_g(x) ,  (6)
\mathrm{Var}_f(x) = \sum_{x,x'} x (x - x') f(x) f(x') = \sum_{s,s'} q(s) q(s') \sum_{x,x'} x (x - x') g_s(x) g_{s'}(x')
  = \sum_{s,s'} q(s) q(s') \left( s \, \mathrm{Var}_g(x) + s (s - s') E_g(x)^2 \right) = E_q(s) \, \mathrm{Var}_g(x) + \mathrm{Var}_q(s) \, E_g(x)^2 .  (7)</p>
      <sec id="sec-2-1">
        <title>Solving for Eq(s) and Varq(s) gives</title>
        <p>which implies in particular that
Eq(s) =</p>
        <p>Ef (x)
Eg(x)
;
Varq(s) =</p>
      </sec>
      <sec id="sec-2-2">
        <title>Varf (x)</title>
        <p>Eg(x)2</p>
      </sec>
      <sec id="sec-2-3">
        <title>Ef (x)Varg(x)</title>
        <p>Eg(x)3
;
Varf (x)
Ef (x)</p>
      </sec>
      <sec id="sec-2-4">
        <title>Varg(x)</title>
        <p>Eg(x)
:
Equation (9) implies that, if F = Conv[f ], G = Conv[g], F / G and G / F , then f = g and F = G. In other
words, the symbol / is an order relation among convolution families, which thus form a semilattice, in which the
top element is Conv[ x;1].2</p>
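      <p>As a concrete instance of the relations (8), consider f = Geom_a decomposed in g = Bern_b, whose mixing law turns out to be q = Geom_{a/b} (this is equation (20) of Section 2.2). The following minimal Python check, with arbitrary numeric values of a and b, verifies the two identities; it is only an illustration, not part of the algorithm.</p>
      <preformat>
# Arithmetic check of (8) on the decomposition of f = Geom_a in g = Bern_b,
# whose mixing law is q = Geom_{a/b} (see Section 2.2); a, b are arbitrary.
a, b = 0.3, 0.8
Ef, Varf = a, a * (1 + a)                  # mean and variance of Geom_a
Eg, Varg = b, b * (1 - b)                  # mean and variance of Bern_b
Eq, Varq = a / b, (a / b) * (1 + a / b)    # mean and variance of Geom_{a/b}
assert abs(Eq - Ef / Eg) &lt; 1e-12
assert abs(Varq - (Varf / Eg**2 - Ef * Varg / Eg**3)) &lt; 1e-12
      </preformat>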
      <p>Write \tilde f(y) = \sum_x y^x f(x) for the generating function associated to f. It is well known that \widetilde{g^{*s}} = (\tilde g)^s, and thus
\tilde f(y) = \sum_x y^x f(x) = \sum_x \sum_s q(s) \, y^x g_s(x) = \sum_s q(s) \, \widetilde{g^{*s}}(y) = \sum_s q(s) \, (\tilde g(y))^s = \tilde q(\tilde g(y)) .  (10)
Call y(z) the function such that \tilde g(y(z)) = z (a solution to \tilde g(\bar y(z)) = 1 + z exists at least as a formal power series, \bar y(z) = 1 + \sum_{i \geq 1} a_i z^i, and we can set y(z) = \bar y(z - 1)). As a result, we have the formal rule
q(s) = [z^s] \, \tilde f(y(z)) = \oint \frac{dz}{2 \pi i \, z^{s+1}} \, \tilde f(y(z)) .  (11)
This formula is valid provided that \tilde f(y(z)) is analytic in a neighbourhood of z = 0. This is a non-trivial condition, as analyticity is guaranteed under weak hypotheses only in a neighbourhood of z = 1.</p>
      <p>Let now F = Conv[f] and G = Conv[g]. In light of Proposition 2.6, a necessary and sufficient condition for F \trianglelefteq G is that f decomposes positively in G, which, from (11) above, means that \tilde f(y(z)) shall be analytic in a neighbourhood of z = 0, and have all non-negative Taylor coefficients.</p>
      <p>This is our best answer to the question posed by Problem 2.5.</p>
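      <p>As an illustration of the extraction rule (11), one can let a computer algebra system expand \tilde f(y(z)) and inspect the signs of the Taylor coefficients. The sketch below, in Python/sympy with arbitrary numeric parameters, does this for f = Geom_a and g = Bern_b, for which y(z) = 1 + (z-1)/b and the coefficients come out non-negative, as expected.</p>
      <preformat>
# Sanity check of (11), q(s) = [z^s] f~(y(z)), for f = Geom_a, g = Bern_b.
import sympy as sp

z = sp.symbols('z')
a, b = sp.Rational(1, 3), sp.Rational(1, 2)      # arbitrary choice with a, b in (0,1)
q_gf = 1 / (1 - (a / b) * (z - 1))               # f~(y(z)) with y(z) = 1 + (z-1)/b
coeffs = sp.series(q_gf, z, 0, 8).removeO().as_poly(z).all_coeffs()[::-1]
print(coeffs)          # q(s) = b a^s / (a+b)^(s+1): all non-negative
assert all(c >= 0 for c in coeffs)
      </preformat>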
      <p>2.2  Three fundamental distributions</p>
      <p>There are three fundamental examples of basic functions g, to be used for constructing convolution families G = Conv[g], corresponding to the three fundamental statistics of identically-distributed particles in statistical physics. We start with two of them:</p>
      <p>Fermionic statistics: for b \in (0, 1], the family G_b^{Fermi} is defined by
g_s(r) = \mathrm{Bino}_{s,b}(r) = \mathrm{Bern}_b^{*s}(r) = b^r (1-b)^{s-r} \binom{s}{r} .  (12)
The functions g_s are Binomial distributions, that is, s-fold convolutions of Bernoulli random variables of parameter b. The average and variance of the Bernoulli distribution of parameter b are
E_g(r) = b ,  \mathrm{Var}_g(r) = b(1-b) .  (13)
At b = 1 this set of functions reduces to the canonical basis g_s(r) = \delta_{s,r}.</p>
      <p>Bosonic statistics: for b \in \mathbb{R}_+, the family G_b^{Bose} is defined as
g_s(r) = \mathrm{Geom}_{s,b}(r) = \mathrm{Geom}_b^{*s}(r) = \frac{b^r}{(1+b)^{r+s}} \binom{r+s-1}{r} .  (14)
The functions g_s are s-fold convolutions of Geometric random variables of parameter b. The average and variance of the geometric distribution of parameter b are
E_g(r) = b ,  \mathrm{Var}_g(r) = b(1+b) .  (15)</p>
      <p>Due to the obvious relation between \mathrm{Geom}_{s,b}(r) and \mathrm{Bino}_{s,b}(r), the two families are in fact the analytic continuation one of the other. The positive decompositions q(s), in the two cases of binomial and geometric distributions, have the same limit b \to 0, related to the Poissonian distribution \mathrm{Poiss}_t(x) = e^{-t} t^x / x!. In this case the discretisation is lost, as well as the parameter b, and we have a different notion of positive decomposition, namely: f decomposes positively in Conv[Poiss] iff
f(x) = \int_0^{\infty} dt \; q(t) \, \mathrm{Poiss}_t(x)  (16)
with q(t) : \mathbb{R}_+ \to \mathbb{R}_+.³</p>
      <p>³We will often write just Poiss, with no parameter, contrarily to \mathrm{Bino}_b and \mathrm{Geom}_b. When referring to a family G, we mean a decomposition as in (16), while, when referring to a function, we mean Poiss \equiv \mathrm{Poiss}_1.</p>
      <p>The possibility of having an integral, instead of a sum, is obviously related to the fact that the Poissonian is infinitely divisible, and would hold for all and only the infinitely divisible distributions. This leads us to our third fundamental statistics.</p>
      <p>Classical statistics: the family G^{Class} is defined as
g_t(r) = \mathrm{Poiss}_t(r) = e^{-t} \frac{t^r}{r!} .  (17)
The average and variance of g_t are
E_{g_t}(r) = t ,  \mathrm{Var}_{g_t}(r) = t .  (18)
The analogue of (11) states in this case that q(t) is the inverse Laplace transform of \tilde f(y + 1), and the condition for f to be positively decomposable in Conv[Poiss] is that this function is both supported on and valued in \mathbb{R}_+.</p>
      <p>By investigating the expressions \tilde f(y_g(z)), we can identify the restriction to these families of the poset induced by the order F \trianglelefteq G. In particular, for our three families of distributions we have the table
\mathrm{Bino}_b :  \tilde g(y) = 1 + b(y-1) ,  y_g(z) = \frac{b - 1 + z}{b} ;
\mathrm{Geom}_b :  \tilde g(y) = (1 - b(y-1))^{-1} ,  y_g(z) = \frac{(1+b)z - 1}{bz} ;
\mathrm{Poiss} :  \tilde g(y) = \exp(y-1) ,  y_g(z) = 1 + \ln(z) ;
and, for the compositions \tilde f(y_g(z)) (obvious entries are omitted):
f = \mathrm{Bino}_a , g = \mathrm{Bino}_b :  \frac{b - a + az}{b} ;
f = \mathrm{Geom}_a , g = \mathrm{Bino}_b :  \frac{b}{a + b - az} ;
f = \mathrm{Poiss} , g = \mathrm{Bino}_b :  \exp\big(\frac{z-1}{b}\big) ;
f = \mathrm{Geom}_a , g = \mathrm{Geom}_b :  \frac{bz}{a - (a-b)z} ;
f = \mathrm{Geom}_a , g = \mathrm{Poiss} :  (1 - a \ln z)^{-1} .
The case of Poiss shall be treated separately, by explicit calculation, because of the variation in the criterium (11), which involves integrals instead of sums. In summary, we have
\mathrm{Bino}_a(x) = \sum_s \mathrm{Bino}_{a/b}(s) \, \mathrm{Bino}_{s,b}(x)  (19)
\mathrm{Geom}_a(x) = \sum_s \mathrm{Geom}_{a/b}(s) \, \mathrm{Bino}_{s,b}(x)  (20)
\mathrm{Poiss}_a(x) = \sum_s \mathrm{Poiss}_{a/b}(s) \, \mathrm{Bino}_{s,b}(x)  (21)
\mathrm{Geom}_a(x) = \sum_{s \geq 1} \mathrm{Geom}_{\frac{a-b}{b}}(s-1) \, \mathrm{Geom}_{s,b}(x)  (22)
\mathrm{Geom}_a(x) = \int_0^{\infty} ds \; \tfrac{1}{a} \, e^{-s/a} \, \mathrm{Poiss}_s(x)  (23)</p>
      <p>As a result we have a rather simple picture of the associated portion of our semilattice: the restriction of the partial order to these families is indeed a total order, isomorphic to an interval \theta \in (-\infty, 1], through the identification
\mathrm{Geom}_b \leftrightarrow \theta = -b ,  \quad \mathrm{Poiss} \leftrightarrow \theta = 0 ,  \quad \mathrm{Bino}_b \leftrightarrow \theta = b .
Recall that, as implied by equation (9), the order relation must be compatible with the total order given by \mathrm{Var}_g / E_g. Referring back to the explicit expressions (13), (15) and (18), we see that in this parametrisation we just have \mathrm{Var}_g / E_g = 1 - \theta for our three families of distributions.</p>
      <p>Having identified this ordering is important for our algorithm. Indeed, in our problem we are faced with the question if a given function f is positively decomposable in some family Conv[g]. The transitivity property of positive decomposition implies that, within our three families of distributions, the set of functions g for which f is positively decomposable in the family Conv[g] is an up-set of the lattice, and thus, under the identification above, an interval [\theta_{\min}(f), 1] for some \theta_{\min}(f) \leq 1.</p>
      <p>Analogously, for a family F of functions we define
\theta_{\min}(F) := \max_{f \in F} \theta_{\min}(f) .
We have thus established that all the functions in the family F can be positively decomposed in any family G = Conv[g_\theta], for g_\theta one of our fundamental distributions with parameter \theta_{\min}(F) \leq \theta \leq 1, under the identification above.</p>
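      <p>The decompositions (19)-(23) are easy to test numerically. The following self-contained Python sketch checks (20), i.e. that Geom_a is a Geom_{a/b}-mixture of the binomials Bino_{s,b}, by truncated summation over s; the values of a and b are arbitrary.</p>
      <preformat>
# Numeric check of (20): Geom_a(x) = sum_s Geom_{a/b}(s) Bino_{s,b}(x), with the
# parametrisations of the text: Geom_c(k) = c^k/(1+c)^(k+1),
# Bino_{s,b}(r) = C(s,r) b^r (1-b)^(s-r).
from math import comb

def geom(c, k):      # geometric law of average c
    return c**k / (1 + c)**(k + 1)

def bino(s, b, r):   # s-fold convolution of Bern_b
    return comb(s, r) * b**r * (1 - b)**(s - r) if 0 &lt;= r &lt;= s else 0.0

a, b = 0.7, 0.5
for x in range(6):
    mixture = sum(geom(a / b, s) * bino(s, b, x) for s in range(400))
    assert abs(mixture - geom(a, x)) &lt; 1e-12
      </preformat>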
    </sec>
    <sec id="sec-3">
      <title>The algorithm</title>
      <sec id="sec-3-1">
        <title>The rejection paradigm</title>
        <p>There exists a correspondence between measure decompositions and rejection schemes in exact-sampling algorithms, namely, given a decomposition of a measure
\mu(x) \propto \sum_y \mu_1(y) \, \mu_2(x|y) \, a(y)  (24)
with a(y) \in [0, 1], Algorithm 1 is an exact-sampling algorithm for \mu.</p>
        <sec id="sec-3-1-1">
          <title>Algorithm 1: Rejection paradigm.</title>
          <p>begin
    repeat
        y ← \mu_1 ;
        \beta ← \mathrm{Bern}_{a(y)} ;
    until \beta = 1 ;
    x ← \mu_2(\cdot|y) ;
    return x
</p>
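          <p>Algorithm 1 is a one-liner in any language; the following Python sketch fixes the interface we have in mind (the three callables are the problem-specific ingredients \mu_1, a and \mu_2, and are assumptions of this illustration).</p>
          <preformat>
# A direct transcription of Algorithm 1 (the rejection paradigm), as a sketch.
import random

def rejection_sample(sample_mu1, accept_prob, sample_mu2):
    while True:                                # repeat ... until beta = 1
        y = sample_mu1()                       # y &lt;- mu_1
        if random.random() &lt; accept_prob(y):   # beta &lt;- Bern_{a(y)}
            return sample_mu2(y)               # x &lt;- mu_2(.|y)
          </preformat>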
          <p>Call T_1(y) and T_2(y) the average complexities for sampling from \mu_1 when the outcome is y (plus sampling one Bernoulli with parameter a(y)), and for sampling from \mu_2(\cdot|y), respectively. The number of failed runs in the repeat-until loop is geometrically-distributed, and the complexity of each of these runs is an independent random variable. As a result, the average complexities of the failed runs of the loop, of the successful run of the loop, and of the unique run of the final instruction just add up, and the average complexity of the full algorithm is easily calculated, via some surprising cancellations:
T = \frac{\sum_y T_1(y) \mu_1(y) (1 - a(y))}{\sum_y \mu_1(y) a(y)} + \frac{\sum_y T_1(y) \mu_1(y) a(y)}{\sum_y \mu_1(y) a(y)} + \frac{\sum_y T_2(y) \mu_1(y) a(y)}{\sum_y \mu_1(y) a(y)} = \frac{E(T_1)}{E(a)} + \frac{E(T_2 \, a)}{E(a)} ,  (25)
where averages E(\cdot) are taken w.r.t. \mu_1. If the functions T_1(y) and T_2(y) are bounded by the constants \bar T_1 and \bar T_2, we get the bound
T \leq \frac{\bar T_1}{E(a)} + \bar T_2 .  (26)
Of course, the inequality above is an equality if the functions T_1(y) and T_2(y) are indeed constants.</p>
          <p>3.2  Why the algorithm has linear complexity</p>
          <p>First, let us address the question of why we shall hope that linear complexity can be achieved, without concentrating on the technical detail of how a single variable can be sampled with \Theta(1) complexity.</p>
          <p>Following the framework described in the introduction, n is a `large' positive integer, and f = (f_1, \ldots, f_n) is an n-tuple of integer-valued probability distributions, taken in a family F of functions satisfying the hypotheses of the Central Limit Theorem, and such that \theta_{\min}(F) &lt; 1.</p>
          <p>Let x_i be the random variable associated to the function f_i, and suppose that m = \sum_i E(x_i) is an integer. Under our hypotheses, it is trivial to sample in linear time from the unconstrained measure
\mu^{(f_1,\ldots,f_n)}(x_1, \ldots, x_n) = \mu^{(f)}(x) = \prod_i f_i(x_i) ,  (27)
however we want an algorithm which samples from the constrained measure
\mu_m^{(f)}(x) \propto \prod_i f_i(x_i) \, \delta_{x_1 + \cdots + x_n, m} ,
with average complexity which is linear in n, the constant depending only on F and on the ratio m/n.</p>
          <p>Call P_f(m') the probability, in the unconstrained measure, that \sum_i x_i = m'. By our CLT hypothesis, this probability is maximal around m' = m, has variance which is linear in n, and thus its value is of order 1/\sqrt{n} at m' = m. By definition,
P_f(m') = \sum_x \prod_i f_i(x_i) \, \delta_{x_1 + \cdots + x_n, m'} .  (28)
Now, choose \theta \in [\theta_{\min}(F), 1), and decompose the f_i's in the family G = Conv[g_\theta], for g_\theta the fundamental distribution corresponding to \theta. Call q_i(s) the corresponding expansion coefficients. We use a sum notation, with the disclaimer that for Poiss sums are replaced by integrals.</p>
          <p>Thus we can write (call g_s = (g_{s_1}, \ldots, g_{s_n}) and q = (q_1, \ldots, q_n))
\mu^{(f)}(x) = \prod_i f_i(x_i) = \sum_s \prod_i q_i(s_i) \, g^{*s_i}(x_i) = \sum_s \mu^{(q)}(s) \, \mu^{(g_s)}(x) ,  (29)
and
\mu_{m'}^{(f)}(x) \, P_f(m') = \prod_i f_i(x_i) \, \delta_{\sum_i x_i, m'} = \sum_s \prod_i q_i(s_i) \, g^{*s_i}(x_i) \, \delta_{\sum_i x_i, m'} = \sum_s \mu^{(q)}(s) \, \mu_{m'}^{(g_s)}(x) \, P_{g_s}(m') .  (30)
Now, call N = N(s) = \sum_i s_i. As g_s = g^{*s}, in fact P_{g_s}(m') depends on s only through N(s), and is
P_g^N(m') = g^{*N}(m') = \mathrm{Bino}_{N,b}(m') for g = \mathrm{Bino}_b ;  \mathrm{Poiss}_N(m') for g = \mathrm{Poiss} ;  \mathrm{Geom}_{N,b}(m') for g = \mathrm{Geom}_b .  (31)
In all three cases, we can easily evaluate \bar N := \mathrm{argmax}_N (P_g^N(m)) (the distributions above are easily seen to be log-concave in N, by direct calculation on (12), (14) and (17)), thus it suffices to analyse the ratios P_g^N(m)/P_g^{N-1}(m). We get</p>
          <p>\bar N = \lfloor m/b \rfloor for g = \mathrm{Bino}_b ;  \lfloor (1 - e^{-1/m})^{-1} \rfloor for g = \mathrm{Poiss} ;  \lceil m/b \rceil for g = \mathrm{Geom}_b .  (32)
Note that, in all three cases, when m is large, this corresponds to \bar N \simeq m/E_g. This is in agreement with (8), as indeed we have
\bar N = \sum_i E_{q_i}(s) = \frac{\sum_i E_{f_i}(x)}{E_g} = \frac{m}{E_g} ,
and the fact that, as by CLT we converge to a normal distribution, average and mode are equal at leading order.</p>
          <p>Combining the previous results, we can write
\mu_m^{(f_1,\ldots,f_n)}(x) \propto \sum_s \mu^{(q_1,\ldots,q_n)}(s) \; \mu_m^{(g_{s_1},\ldots,g_{s_n})}(x) \; \frac{P_g^{N(s)}(m)}{P_g^{\bar N}(m)} .  (33)
By definition of \bar N, the ratio a(N) := P_g^{N}(m)/P_g^{\bar N}(m) is in [0, 1], and thus can be used as the acceptance rate in a rejection algorithm as in Section 3.1. Indeed, at this point we can apply Algorithm 1 with the identifications
x \leftrightarrow x ;  y \leftrightarrow s ;  a(y) \leftrightarrow P_g^{N(s)}(m)/P_g^{\bar N}(m) ;  \mu(x) \leftrightarrow \mu_m^{(f_1,\ldots,f_n)}(x) ;  \mu_1(y) \leftrightarrow \mu^{(q_1,\ldots,q_n)}(s) ;  \mu_2(x|y) \leftrightarrow \mu_m^{(g_{s_1},\ldots,g_{s_n})}(x) .  (34)</p>
          <p>Let us neglect the dependence of the complexities T_1(s) and T_2(s) on s, and use directly equation (26). Let us assume that we have determined that T_1 and T_2 are linear in n. We shall thus calculate the acceptance rate E(a) of that formula, and verify that it scales as \Theta(1) for large n. In this setting, this reads
E(a) = \sum_s \mu^{(q_1,\ldots,q_n)}(s) \, \frac{P_g^{N(s)}(m)}{P_g^{\bar N}(m)} .  (35)
In other words, call \hat\rho(N) the distribution on N = N(s) induced by \mu^{(q_1,\ldots,q_n)}(s). Then we have
E(a) = \sum_N \hat\rho(N) \, \frac{P_g^{N}(m)}{P_g^{\bar N}(m)} = E_{\hat\rho}\Big( \frac{P_g^{N}(m)}{P_g^{\bar N}(m)} \Big) .  (36)
This already makes clear why our strategy is successful. When n tends to infinity, because of our CLT hypothesis, \hat\rho converges to a normalised Gaussian, with average \bar N and some variance \sigma_1^2 of order n. On the other side, P_g^N(m)/P_g^{\bar N}(m) converges to a non-normalised Gaussian, with average \bar N and variance \sigma_2^2 of order n. This Gaussian has been rescaled as to be valued 1 when N = \bar N. As a result, we just have</p>
          <p>E(a) \simeq \int dx \; \frac{1}{\sqrt{2\pi}\,\sigma_1} \, e^{-x^2/2\sigma_1^2} \, e^{-x^2/2\sigma_2^2} = \frac{\sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} .  (37)
The resulting expression is homogeneous, so that it evaluates to a constant of order 1 for whatever scaling in n of \sigma_1^2 and \sigma_2^2 (provided that the two variances have the same scaling). In this case the average complexity is bounded as in (26), and thus, by our assumptions, is linear.</p>
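          <p>The Gaussian integral in (37) is elementary; as a quick numeric confirmation of the value \sigma_2/\sqrt{\sigma_1^2+\sigma_2^2}, one can sum the integrand on a grid (the two variances below are arbitrary choices for the check):</p>
          <preformat>
# Numeric check of (37): a normalised Gaussian of variance s1^2 integrated
# against exp(-x^2/(2 s2^2)) gives s2/sqrt(s1^2 + s2^2).
import math

s1, s2 = 3.0, 2.0
dx = 1e-3
I = sum(dx * math.exp(-x*x/(2*s1*s1)) / math.sqrt(2*math.pi*s1*s1)
        * math.exp(-x*x/(2*s2*s2))
        for x in (i * dx for i in range(-20000, 20001)))
print(I, s2 / math.sqrt(s1*s1 + s2*s2))   # both ~ 0.5547
          </preformat>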
          <p>Note that we would have obtained an expression still of order 1, even if we did take a “wrong” tuning of m, in the sense that \sum_i E_{f_i} = m + \delta m, with \delta m \gg 1, provided that \delta m is small w.r.t. \sigma_1 and \sigma_2 (which, under the hypotheses of the Central Limit Theorem, are of order \sqrt{n}). That is, in the language of Boltzmann samplers, we have a tolerance on the tuning of the Boltzmann parameter up to a relative error on the average outcome size of order 1/\sqrt{n}.</p>
          <p>
            Now let us relate the parameters 1 and 2 to the parameters of the problem. One case is rather easy: as the
variances sum up under convolution,
For what concerns 2, we shall calculate the variance in the variable N of the function g N (m). From the
calculation on the explicit expressions (
            <xref ref-type="bibr" rid="ref3 ref6">12</xref>
            ), (14) and (
            <xref ref-type="bibr" rid="ref2">17</xref>
            )) one easily derives that, in all three cases, 22 = m VEa3grg .
          </p>
          <p>Now, using (8) we get
\sigma_1^2 = \frac{\sum_i \mathrm{Var}_{f_i}}{E_g^2} - m \, \frac{\mathrm{Var}_g}{E_g^3} ,  (40)
so that
\bar a = \frac{\sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} = \Big( 1 + \frac{\sigma_1^2}{\sigma_2^2} \Big)^{-1/2} = \sqrt{\frac{m \, \mathrm{Var}_g / E_g}{\sum_i \mathrm{Var}_{f_i}}} .  (41)
If we take as function g the one with parameter \theta_{\min}(F) (which we have determined to be optimal), the asymptotic maximal acceptance rate for the set F = \{f_1, \ldots, f_n\} within our algorithm is given by
\bar a(F) = \sqrt{\frac{(1 - \theta_{\min}(F)) \, m}{\sum_i \mathrm{Var}_{f_i}}} .  (42)
Under our assumption that the f_i's satisfy the hypotheses of the Central Limit Theorem, we have that all E_{f_i}'s and \mathrm{Var}_{f_i}'s are of order 1, so that m = \sum_i E_{f_i} and \sum_i \mathrm{Var}_{f_i} have the same scaling behaviour. Once again we have obtained that, for n going to infinity, \bar a(F) converges to a finite quantity.</p>
          <p>3.3  How the algorithm works in practice</p>
          <p>At the level of generality of this paper, we cannot give an algorithm that works just out of the box. For a given problem, given by the n-tuple (f_1, \ldots, f_n), a preliminary analytical study of the functions f_i is mandatory. One has to:</p>
          <p>– Determine that F \trianglelefteq G for G = Conv[g_\theta], for some parameter \theta smaller than 1.</p>
          <p>– Devise an algorithm to sample from the corresponding distributions q_i(s), in average time \tau_i, so that \sum_i \tau_i scales linearly with n.</p>
          <p>Only at this point, one is ready to apply our general algorithm, which reads as in Algorithm 2.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Algorithm 2: Our new algorithm.</title>
          <p>begin
    \bar N = round(m/E_g);
    repeat
        N = 0;
        for i ← 1 to n do
            s_i ← q_i;
            N += s_i;
        a = g^{*N}(m)/g^{*\bar N}(m);
        \beta ← \mathrm{Bern}_a;
    until \beta = 1;
    Sample x with \mu_m^{(g_s)}(x);
    return x
</p>
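          <p>For concreteness, here is a minimal Python sketch of Algorithm 2, specialised to the binomial case g = \mathrm{Bern}_\theta (so E_g = \theta and g^{*N} = \mathrm{Bino}_{N,\theta}). The callable sample_qi and the routine bbhl_shuffle (a sketch of which is given after Algorithm 3 below) are assumed as inputs; this is an illustration of the scheme, not the authors' reference implementation.</p>
          <preformat>
# Sketch of Algorithm 2 for g = Bern_theta: sample_qi(i) draws s_i from q_i,
# bbhl_shuffle(N, m) returns a uniform 0-1 string of length N with m ones.
import math, random

def algorithm2(n, m, theta, sample_qi, bbhl_shuffle):
    Nbar = round(m / theta)                  # round(m / E_g)

    def log_gN(N):                           # log Bino_{N,theta}(m) = log g^{*N}(m)
        if m > N:
            return -math.inf
        return (math.lgamma(N + 1) - math.lgamma(m + 1) - math.lgamma(N - m + 1)
                + m * math.log(theta) + (N - m) * math.log(1 - theta))

    while True:                              # repeat ... until beta = 1
        s = [sample_qi(i) for i in range(1, n + 1)]
        N = sum(s)
        a = math.exp(min(0.0, log_gN(N) - log_gN(Nbar)))   # a(N), clipped to [0,1]
        if random.random() &lt; a:
            break
    sigma = bbhl_shuffle(N, m)               # 'Sample x with mu_m^{(g_s)}(x)'
    x, pos = [], 0
    for si in s:                             # x_i = sigma_{j(i-1)+1} + ... + sigma_{j(i)}, cf. (43)
        x.append(sum(sigma[pos:pos + si]))
        pos += si
    return x
          </preformat>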
          <p>Contrarily to the first part of the algorithm, where a preliminary analysis concerning the functions q_i is required, the final step of the algorithm, of sampling x with \mu_m^{(g_s)}(x), is “universal”, as it depends only on the family of the basic distribution g_\theta, among binomial, geometric or Poissonian.</p>
          <p>The Poissonian case is trivial in spirit (it suffices to sample m independent values \chi \in [0, N], and increment x_i if 0 &lt; \chi - \sum_{j=1}^{i-1} s_j &lt; s_i); however, as now the intermediate variables s_i are real-valued, it must be performed with floating-point numbers and has complexity O(m \ln n). In the special case in which the s_i are integer-valued, that is the q_i's are supported on b\mathbb{N} for some b, we have a genuinely linear alternate algorithm, which we will describe elsewhere.</p>
          <p>The binomial and geometric cases are in a sense `dual', and are solved by the same algorithm, provided by Bacher, Bodini, Hollender and Lumbroso in [BBHL15, sec. 1.1] (see also [BBHT17]), which is genuinely linear. This algorithm, that we shall call “BBHL-shuffling(n, k)”, may be seen as a black box which samples strings \sigma = (\sigma_1, \ldots, \sigma_n) \in \{0,1\}^n with \sum_i \sigma_i = k, uniformly, and that, for n/k - 1 = \Theta(1), requires a time linear in n.⁴ Then, in the two cases we have:</p>
          <p>Case g^{*N}(m) = \mathrm{Bino}_{N,b}(m): use BBHL-shuffling(N, m), then set
x_i = \sigma_{j(i-1)+1} + \cdots + \sigma_{j(i)} ,  with  j(i) = s_1 + s_2 + \cdots + s_i .  (43)</p>
          <p>Case g^{*N}(m) = \mathrm{Geom}_{N,b}(m): use BBHL-shuffling(m + N - 1, m). Call j_1, \ldots, j_{N-1} the positions of the \sigma_i's which are valued 0. Let j_0 = 0 and j_N = m + N. Then
x_i = j_i - j_{i-1} - 1 .  (44)</p>
          <p>⁴For completeness, this algorithm is quickly described in the following subsection.</p>
          <p>Bacher, Bodini, Hollender and Lumbroso algorithm
Here we describe an algorithm, given by Bacher, Bodini, Hollender and Lumbroso in [BBHL15, sec. 1.1], for the
uniform sampling of strings = ( 1; : : : ; n) 2 f0; 1gn with Pi i = k. In Section 3.3 this algorithm is called
\BBHL-shu ing(n; k)". Recall that it is `optimal' for the randomness resource, in the sense that the random-bit
complexity at leading order saturates the Shannon bound.</p>
          <p>Call \lambda = k/n. The idea is that you sample the \sigma_i's one by one, as if \sigma were to be sampled with the measure \nu(\sigma) = \mathrm{Bern}_\lambda^{\otimes n} (this costs \Theta(1) per variable), and, as soon as you have an excess of \sigma_i's equal to 0 or to 1, complete with a sequence of Fisher–Yates random swaps. These swaps cost \ln n each, but are performed on average only \sqrt{n} times, so the swap part of the algorithm has subleading complexity, and the overall complexity is genuinely linear, with no logarithmic factors. For completeness, we give in Algorithm 3 a summary of the main features. In this algorithm RndInt_j returns a uniform random integer in \{1, \ldots, j\}.</p>
          <p>Algorithm 3: BBHL-shuffling.</p>
          <p>begin
    a = k; b = n - k; i = 0;
    repeat
        i++;
        \sigma_i ← \mathrm{Bern}_\lambda;
        if \sigma_i = 1 then a--;
        else b--;
    until a &lt; 0 or b &lt; 0;
    if a &lt; 0 then \bar\sigma = 0;
    else \bar\sigma = 1;
    for j ← i to n do
        \sigma_j = \bar\sigma;
        h ← RndInt_j;
        swap \sigma_j and \sigma_h;
    return \sigma
</p>
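          <p>The following Python rendering of Algorithm 3 is a sketch for reference; it mirrors the pseudocode above (budgets for the two symbols, an overwrite of the tail upon overshoot, then the Fisher–Yates completion swaps), without the random-bit optimisations of [BBHL15].</p>
          <preformat>
# Sketch of BBHL-shuffling(n, k): a uniform 0-1 string of length n with k ones.
import random

def bbhl_shuffle(n, k):
    lam = k / n
    sigma = []
    a, b = k, n - k                            # remaining budgets of ones / zeros
    while a >= 0 and b >= 0 and len(sigma) &lt; n:
        bit = 1 if random.random() &lt; lam else 0    # sigma_i &lt;- Bern_lambda
        sigma.append(bit)
        if bit == 1:
            a -= 1
        else:
            b -= 1
    if a &lt; 0 or b &lt; 0:                         # overshoot at position i
        i = len(sigma) - 1
        fill = 0 if a &lt; 0 else 1               # the symbol sigma-bar
        sigma[i:] = [fill] * (n - i)
        for j in range(i, n):                  # Fisher-Yates completion swaps
            h = random.randint(0, j)           # uniform position among 0..j
            sigma[j], sigma[h] = sigma[h], sigma[j]
    return sigma
          </preformat>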
          <p>Note that, if \lambda is almost a 2-adic number, namely \lambda = k/n = a 2^{-d} + \varepsilon for some integers a and d, and \varepsilon = O(1/\sqrt{n}), it is convenient to just use the source of randomness \mathrm{Bern}_{\lambda'} for the value \lambda' = a 2^{-d}, which speeds up the main part of the algorithm, at the price of slowing down the subleading part.</p>
          <p>4  Examples</p>
          <p>The drawback of the algorithm outlined in Section 3.3 is that the user needs to perform a preliminary analytic study of its given probability distributions f_i. There are obvious cases in which this work is not required: when all of the f_i's are distributions in our three basic families, i.e. binomial, geometric or Poissonian. In this case \theta_{\min}(F) is evaluated directly, the functions q_i(s) are given explicitly in Section 2.2, and are just, yet again, binomial, geometric or Poissonian (we provide the generating function for the basic cases, \mathrm{Bern}_b = \mathrm{Bino}_{1,b} and \mathrm{Geom}_b = \mathrm{Geom}_{1,b}; the generic case is deduced immediately via Proposition 2.6). It is well known that one can sample from these distributions in constant complexity, so that we know that we have a linear-time algorithm as soon as we verify that \mathrm{Var}_{f_i}/(E_{f_i}(1 - \theta)) = \Theta(1), which is straightforward in most cases.</p>
          <p>The paradigm of sum-constrained geometric random variables with distinct parameters applies, for example, to set partitions, enumerated by the Stirling numbers of the second kind. The same paradigm with Bernoulli variables applies, for example, to permutations with a given number of cycles, enumerated by the Stirling numbers of the first kind. In this section we describe in detail these two examples.</p>
          <p>4.1  Set partitions, Stirling numbers of the second kind and Automata</p>
          <p>Each of the P_n partitions B of a set of cardinality n has a number k(B) of parts in the range \{1, 2, \ldots, n\}. The Stirling numbers of the second kind S_2(n, k) are the corresponding enumeration, and a classical algorithmic problem is to sample B uniformly from the corresponding ensemble S_2(n, k) [FS09].</p>
          <p>This problem has acquired further interest since it has been shown, in [BN07], that accessible deterministic complete automata (ACDA) are in bijection with a certain subset of set partitions in S_2(kn + 1, n) (where k = \Theta(1) is the size of the alphabet, and n is the number of states), characterised by a property easy to test (the `backbone' should stay above the diagonal), and that this subset is asymptotically a finite fraction of the ensemble. As a result, up to a rejection scheme, a uniform sampling algorithm for set partitions provides a uniform sampling algorithm for accessible deterministic complete automata.</p>
          <p>Furthermore, in [BDS12] it is shown that minimal automata, which are at sight a subset of ACDA's, are
asymptotically dense in this ensemble (i.e., the ratio of the cardinalities of the sets converges to 1), so, yet again,
up to a rejection scheme, a uniform sampling algorithm for set partitions provides a uniform sampling algorithm
for minimal automata.</p>
          <p>Sampling from S_2(n + m, n), when m/n is \Theta(1), can be done by the Boltzmann Method [DFLS04] with complexity O(n^{3/2}), and we show here that our new method gives a linear complexity. The recursive description of S_2(n, k) provides a natural bijection with certain tableaux, which are described in [FS09, pag. 63], and, in more detail, in [BDS12], for which the so-called `backbone' relates to a list x = (x_1, \ldots, x_n) as in our setting, with \mu^{(f)}(x) \propto \prod_i \mathrm{Geom}_{b_i}(x_i) \, \delta_{x_1+\cdots+x_n, n-k}. As a result, a uniform set partition in S_2(n + m, n) can be sampled in two steps: first we sample the backbone x = (x_1, \ldots, x_n) with the measure above, then, for every 1 \leq i \leq n, we sample x_i independent random integers in \{1, \ldots, i\}, which describe the so-called `wiring part' of the tableau. The latter step has a complexity O(m \ln n), which is intrinsic to this problem and is in fact implied by the Shannon bound (i.e., the asymptotics of \ln S_2(\alpha n, n), as evinced by well-known saddle-point calculations, see e.g. [BN07]). As this latter step is invoked only once in the algorithm, it turns out that this part of the algorithm is optimal for the randomness resource. As, in our algorithm, the first part of sampling the backbone has complexity linear in n and off from optimal only by a multiplicative factor, we deduce that the whole algorithm is asymptotically optimal (although, admittedly, the convergence to optimality is slow, as it goes as 1/\ln n).</p>
          <p>The parameters b_i of the geometric variables are tuned as to have b_i/(b_i + 1) \propto i and \sum_i E(x_i) = m + o(\sqrt{m}), which is solved by
b_i = \frac{\omega i}{n - \omega i} ,  with \omega fixed by  1 + \frac{m}{n} = -\frac{\ln(1 - \omega)}{\omega} .  (45)
Then, using the fact, seen in Section 2.2, that \mathrm{Geom}_a(x) = \sum_s \mathrm{Geom}_{a/b}(s) \, \mathrm{Bino}_{s,b}(x), and choosing for simplicity \theta = \frac{1}{2}, we have
\mathrm{Geom}_{b_i}(x) = \sum_s \mathrm{Geom}_{2b_i}(s) \, \mathrm{Bino}_{s,1/2}(x) = \sum_s \mathrm{Geom}_{\frac{2\omega i}{n - \omega i}}(s) \, \mathrm{Bino}_{s,1/2}(x) ,  (46)
where \mathrm{Geom}_{2b_i} is the geometric law of average 2b_i, i.e. of success probability \frac{1}{1+2b_i} = \frac{n - \omega i}{n + \omega i}.</p>
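          <p>The tuning (45) is a one-dimensional root-finding problem, since -\ln(1-\omega)/\omega is increasing on (0, 1). A minimal Python sketch (plain bisection; the values of n and m below are arbitrary):</p>
          <preformat>
# Tuning of the set-partition sampler, eq. (45): solve 1 + m/n = -ln(1-w)/w.
import math

def tune_omega(n, m, tol=1e-12):
    target = 1 + m / n
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if -math.log1p(-mid) / mid &lt; target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, m = 1000, 1500
omega = tune_omega(n, m)
b = [omega * i / (n - omega * i) for i in range(1, n + 1)]
print(omega, sum(b))   # sum_i b_i = sum_i E(x_i), close to m by construction
          </preformat>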
        </sec>
        <sec id="sec-3-1-4">
          <title>However, a =</title>
          <p>r</p>
          <p>m
2 Pi bi(1 + bi)</p>
          <p>:
X bi(1 + bi) ' n
i</p>
          <p>Z 1
0
dx
(1
!x
!x)2 = n
ln(1</p>
          <p>!)
!
+</p>
          <p>1
1
!
m=n small, and as p 2mn e 2mn for m=n large.
for
! 1, so that a
Each of the n! permutations of Sn has a number k( ) of cycles in the range f1; 2; : : : ; ng. The Stirling numbers
of the rst kind S1(n; k) are the corresponding enumeration, and a classical algorithmic problem is to sample
uniformly from the corresponding ensemble S1(n; k) [FS09, pag. 121]. When n=k 1 is (1), this problem is
solved by Boltzmann sampling [DFLS04] with complexity O(n 32 ), and we show here that our new method gives
a linear complexity.</p>
          <p>The recursive description of S1(n; k), namely S1(n; k) ' S1(n 1; k 1) [ S1(n 1; k) f1; : : : ; n 1g, provides
a natural bijection with certain 0-1 matrices Mij , for which the diagonal relates to a list x = (x1; : : : ; xn) as in
our setting, with xi = 1 Mii and (f)(x) / Qi Bernbi (xi) x1+ +xn;n k. The parameters bi are tuned as to
have bi=(1 bi) / i 1 and m := n k = Pi E(xi) + o(pn), with solution
bi =</p>
          <p>!(i
n + !(i
1)
1)
;
1
m
n
=
ln(1 + !)
!
:
Then, using the fact, seen in Section 2.2, that Binoa(x) = Ps Bino ab (s) Binos;b(x), this expansion being positive
for a b, and choosing = 1+!! maxi bi, we have
Also, N = 2m. At this point we are ready to use our Algorithm 2, with this value of N , functions qi given by
Geom nn+!!ii , and acceptance factor
a(N ) =
g N (m)
g N (m) = 22m N
(N</p>
          <p>N ! m!
m)! (2m)!
and the part \Sample x with (mgs)(x)" is performed via the BBHL-shu ing(N; m) with parameter = 21 . The
acceptance rate a = E(a(N )) is given by (42), just with min = 0 traded for = 21 , which we have chosen as a
good compromise between e ciency and complexity of coding, and gives
Also, N = bm= c. At this point we are ready to use our Algorithm 2, with this value of N , functions qi given
by Bino (i 1)(1+!) , and acceptance factor
n+(i 1)!</p>
          <p>Binobi (x) =</p>
          <p>X Bino (i 1)(1+!) (s) Binos; (x)</p>
          <p>s n+(i 1)!
a(N ) =
g N (m)
g N (m)
= (1
)N N N ! (N
(N</p>
          <p>m)!
m)! N !
(47)
(48)
(49)
(50)
(51)</p>
          <p>for
(52)
(53)
(54)</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Acknowledgements</title>
        <p>A.S. is supported by the ANR MetaConc project (ANR-15-CE40-0014, France).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [BBHL15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hollender</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lumbroso</surname>
          </string-name>
          .
          <article-title>MergeShuffle: A Very Fast, Parallel Random Permutation Algorithm</article-title>
          . At https://arxiv.org/abs/1508.03167,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [BBHT17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-K.</given-names>
            <surname>Hwang</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.-H.</given-names>
            <surname>Tsai</surname>
          </string-name>
          .
          <article-title>Generating random permutations by coin tossing: classical algorithms, new analysis, and modern implementation</article-title>
          .
          <source>ACM Transactions on Algorithms</source>
          ,
          <volume>13</volume>
          :#24,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [BDS12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bassino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>David</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sportiello</surname>
          </string-name>
          .
          <article-title>Asymptotic enumeration of minimal automata</article-title>
          .
          <source>In: STACS</source>
          <year>2012</year>
          , Paris, France, LIPIcs – Leibniz International Proceedings in Informatics,
          <volume>14</volume>
          :
          <fpage>88</fpage>
          –
          <lpage>99</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [BN07]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bassino</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Nicaud</surname>
          </string-name>
          .
          <article-title>Enumeration and Random Generation of Accessible Automata</article-title>
          .
          <source>Theoretical Computer Science</source>
          ,
          <volume>381</volume>
          :
          <fpage>86</fpage>
          {
          <fpage>104</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [BS15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bassino</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sportiello</surname>
          </string-name>
          .
          <article-title>Linear-time generation of inhomogeneous random directed walks</article-title>
          .
          <source>In: ANALCO'15</source>
          , San Diego, California, pp.
          <fpage>51</fpage>
          –
          <lpage>65</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Dev12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Devroye</surname>
          </string-name>
          .
          <article-title>Simulating Size-constrained Galton-Watson Trees</article-title>
          .
          <source>SIAM Journal on Computing</source>
          ,
          <volume>41</volume>
          :
          <fpage>1</fpage>
          –
          <lpage>11</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [DFLS04]
          <string-name>
            <given-names>P.</given-names>
            <surname>Duchon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Flajolet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Louchard</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Schaeffer</surname>
          </string-name>
          .
          <article-title>Boltzmann Samplers for the Random Generation of Combinatorial Structures</article-title>
          .
          <source>Combinatorics Probability and Computing</source>
          ,
          <volume>13</volume>
          :
          <fpage>577</fpage>
          -
          <lpage>625</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [FS09]
          <string-name>
            <given-names>P.</given-names>
            <surname>Flajolet</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sedgewick</surname>
          </string-name>
          . Analytic Combinatorics. Cambridge University Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>