1. Introduction

International Conference on Theoretical Computer Science, September

Large Deviation Properties for Pattern Statistics in Primitive Rational Models

Massimiliano Goldwurm

Marco Vignati

0 0 Dipartimento di Matematica “Federigo Enriques”, Università degli Studi di Milano , via Saldini 50, 20133 Milano , Italy

2023

1 3 15

We present a large deviation property for the pattern statistics representing the number of occurrences of a symbol in words of given length generated at random according to a rational stochastic model. The result is obtained assuming that in the model the overall weighted transition matrix is primitive. In particular we obtain a rate function depending on the main eigenvalue and eigenvectors of that matrix. Under rather mild conditions, we show that the range of validity of our large deviation estimate can be extended to the interval (0,1), which represents in our context the largest possible open interval of validity of the property.

eol>large deviations limit distributions regular languages rational formal series pattern statistics

1. Introduction

given length is proportional to the total weight of the accepting transitions labelled by . This model is quite general, it includes as special cases the traditional Bernoullian and Markovian sources, widely used in the literature to study the number of occurrences of patterns in a random text [24, 25, 16, 5]. We recall that the research concerning pattern statistics has a broad range of motivations and applications [22]. Moreover, it is known that the rational stochastic model allows to generate random words of length in an arbitrary regular language under uniform distribution: this occurs when the finite automaton defining the model is unambiguous and all transitions have the same weight.

In order to fix our notation, consider a (nondeterministic) weighted finite state automaton over the binary alphabet {, } and, for every ∈ N, let be the number of occurrences of the symbol in a word of length generated at random according to the rational model defined by . The analysis of these sequences of random variables is of interest in several contexts. First of all they can represent the number of occurrences of patterns in a random word of length , generated by a Markovian source, when the set of patterns is given by a regular language [2, 24, 25]. Moreover, they are related to the evaluation of the coeficients of rational formal series (a traditional problem well-studied in the literature [28, 26, 23]) and to the analysis of several problems and properties of regular language. This fact clearly holds for the natural problem of estimating the number of words of given length in a regular language having occurrences of a given symbol [4, 13]. It also holds for the analysis of additive functions defined on regular languages [20] and for the descriptional complexity of languages and computational models [6]. Further, using the local limit properties of the sequences {}, for a wide class of rational series it can be proved that the maximum coeficient of the monomials of degree has an asymptotic growth of the order Θ(/2 ) for some > 0 and some integer ≥ − 1 [8, 3].

The asymptotic behaviour of {}, i.e. mean value, variance, limit distribution both in the global and in the local sense [17, 15], has been studied in the literature under several hypotheses on the corresponding automaton . It is known that if has a primitive transition matrix then has a Gaussian limit distribution [2, 24] and, under a suitable aperiodicity condition, it also satisfies a local limit theorem [ 2], which can be extended to all primitive cases by using a suitable notion of periodicity [3]. The limit distribution of in the global sense is known also when the transition matrix of consists of two primitive components [9], while the local limit properties in this case are recently studied in [18]. When the automaton has several strongly connected components a general analysis of the (global) limit distribution of can be found in [19].

Here we continue this line of research proving in Section 5 that if the transition matrix of is primitive, then satisfies a large deviation property with a rate function depending on the main eigenvalue and the associated eigenvectors. The corresponding proof is rather standard, it relies on traditional tools of analytic combinatorics and the result is implicitly included in the previous literature [21, 11, 15]. However, here our result is significant since it puts in evidence the role played by the main eigenvalue and eigenvectors of the matrix of weights in the definition of both the rate function and the interval of validity of the property. Moreover, in Section 6, we assume a mild condition on the transition matrix of the automaton and, under such hypothesis, we show that the range of validity of the large deviation property can be extended to the entire interval ( 0, 1 ), which represents in our context the largest possible open interval where the property may hold.

2. A quick overview on large deviations

Large deviation estimates usually refer to a sequence of random variables (r.v.’s), say {}, having increasing mean values; it consist of a bound, exponentially decreasing to 0, over the probability that deviates from () by an amount greater or equal to (), > 0. Typical situations occur when () ∼ our cases, here we start with the following fomal definition [11, 15].

for a constant > 0, and since this occurs in all Definition 1. constant >

Let {} be a sequence of random variables such that () ∼ 0, and let (0, 1) be an interval including . Assume () is a function defined for a over (0, 1) taking values in R, such that () > 0 for ̸= . We say that {} satisfies a large deviation property relative to the interval (0, 1) with rate function () if the following limits hold: lim →∞ lim →∞ 1 1 log Pr( ≤ ) = − () log Pr( ≥ ) = − () for 0 < ≤ for ≤ < 1 and Pr( ≥ Pr( ≤

This property is equivalent to require that Pr( ≤ ) = − ()+(), for 0 < ≤ , ) = − ()+(), for ≤

< 1. The first relation concerns the left tail . It is convenient to keep the two limits separated in the definition since the proofs of a large ), while the second one refers to the right tail Pr( ≥ ) of the distribution of deviation property often considers one tail at a time.

A classical example of large deviation property concerns the sequence of binomial random deviations (i.e. of the order √) from the mean, that is variables {,} of parameters and , where ∈ ( 0, 1 ) is fixed. In this case, (,) = and by the Central Limit Theorem, we know that √,(−1− ) converges in distribution to a standard Gaussian random variable ( 0, 1 ). This yields a limit probability concerning “normal” →∞ lim Pr (︀ |, − | ≥ √)︀ = Pr | ( 0, 1 )| ≥ √︀(1 − ) ︃( )︃ ∀ > 0 Such a property implies the following result for a larger deviation

Pr (|, − | ≥ ) = ( 1 ) ∀ > 0 which can also be obtained by applying the Law of Large Numbers. The following proposition states a large deviation property for {,} that improves the last relation, showing that the convergence to 0 is exponential with respect to and the range of validity coincides with the overall interval ( 0, 1 ). Its proof is a consequence of Cramer’s Theorem (see e.g. [11, 10]), a classical result we discuss later. However, here we prefer to briefly outline a simpler proof to give the flavour of this property and to compare it with our subsequent results. property in the interval ( 0, 1 ) with rate function () given by Proposition 1. Any sequence of binomial random variables {,} satisfies a large deviation () = log + (1 − ) log 1 − 1 − for every ∈ ( 0, 1 ). max{Pr(, = ) : ∈ N, 0 ≤ ≤ }. Then we have Proof. We first prove the limit on the left tail. To this end, let 0 < ≤ and let () = () ≤ Pr(, ≤ ) ≤ ( + 1)() ( 1 ) formula leads to

⌊⌋ () = (︀ Recall that the probability Pr(, = ) is increasing for integers such that 0 ≤ ≤ ; hence )︀ ⌊⌋(1 − )−⌊ ⌋ for every ∈ (0, ]. Thus, a direct application of Stirling’s () = exp log ︂{ ︂[ + (1 − ) log 1 − ]︂ 1 − + (log ) ︂} which replaced in ( 1 ) proves log Pr(, ≤

) = − () + (log ).

A similar reasoning holds for the right tail. In this case we have () ≤ Pr(, ≥ ) ≤ ( − + 1)() for every ∈ [, 1) lim→1− ′() = +∞. inequalities yields log Pr(, ≥ ) = − () + (log ). where () = (︀ )︀ ⌊⌋(1 − )−⌊ ⌋. As above, replacing this value in the previous ⌊⌋

The rate function () is strictly convex in the interval ( 0, 1 ), takes a unique minimal value at = , where () = 0, while lim→0+ () = Moreover, () grows vertically for approaching 0 or 1, since lim→0+ ′() = −∞ 1− 1 and lim→1− () = 1 .

□ and Example 1. Let = 5/8. Then, the shape of the rate function () of the large deviation property for the sequence of binomial random viariables {,} is described in Figure 1. = , 5 8 () : set R once we allow the rate function () to assume value +∞. A classical situation of this type is established by Cramér’s Theorem [11, 10], stating that if {} is a sequence of independent and identically distributed random variables, with bounded moment generating function (i.e. () = (1 ) < () = sup∈R[ − log ()], for every ∈ R. {}, where = ∑︀=1 , satisfies a large deviation property all over R with rate function ∞ for any ∈ R), then the sequence of partial sums

3. Symbol statistics for rational models

In order to define our stochastic model consider a formal series in the non-commutative variables , , that is a function : {, }* → R+, where R+ = [0, +∞) and {, }* is the free monoid of all words on the alphabet {, }. As usual, we denote by (, ) the value of at a word ∈ {, }* . Such a series is said to be rational if for some integer > 0 there exists a monoid morphism : {, }* → R+× and two (column) arrays , ∈ R+, such that (, ) = ′ () , for every ∈ {, }* [1, 27] ( 1 ). Note that in this case, if = 12 · · · with ∈ {, } for every = 1, 2, . . . , , then () = (1) (2) · · · (). Thus, as the morphism is generated by the matrices = () and = (), we say that the 4-tuple (, , , ) is a linear representation of . Clearly, such a 4-tuple can be considered as a finite state automaton over the alphabet {, }, with transitions weighted by positive real values. Therefore (resp. ) represents the matrix of the weights of all transitions labelled by (resp. ), while (resp. ) is the array of the weights of the initial (resp. final) states.

Throughout this work, denoting by {, } the family of all words of length in {, }* , we assume that the set { ∈ {, } : (, ) > 0} is non-empty for every ∈ N+ (so that ̸= 0 ̸= ), and that and are non-zero matrices (i.e., each of them has at least one positive entry). Moreover, for every ∈ N, we can easily compute the sum of all values of associated with words in {, }:

⎛ ⎞ ∑︁ (, ) = ′ ∑︁ () = ′ ⎝∏︁ ∑︁ ()⎠ = ′ ( + ) ∈{,} ∈{,} =1 ∈{,} Thus, we can consider the probability measure Pr over the set {, } given by ( 2 ) (, ) Pr() = ∑︀∈{,} (, ) =

′ () ′( + ) ∀ ∈ {, } Note that, if is the characteristic series of a language ⊆ { , }* then Pr is the uniform probability function over the set ∩ {, }. Also observe that the traditional Markovian models (to generate a word at random in {, }* ) occur when + is a stochastic matrix, is a stochastic array and ′ = (1, 1 . . . , 1).

Then, under the previous hypotheses, we can define the integer random variable (r.v.) = ||, where is a word chosen at random in {, } with probability Pr(), and || is the number of occurrences of in . As ̸= [0] ̸= , is a non-degenerate random variable, taking value in {0, 1, . . . , }. It is clear that the probability function of is defined by () := Pr( = ) = ∑︀∑∈︀{,∈{},,|}|(=,(), ) , ∈ {0, 1, . . . , } Since is rational also the previous probability can be expressed by using its linear representation. The denominator is clearly determined by ( 2 ). As far as the numerator is concerned, setting () = 1 and () = 0, for a variable and for every ∈ N one has

( + ) = ∏︁

∑︁ =1 ∈{,}

⎛ () () = ∑︁ ⎝ ∑︁ =0 ∈{,},||=

⎞ ()⎠ 1As is customary, we denote by ′ the transpose of an array ∈ R, i.e. a row array.

As a consequence, if []() denotes (according to tradition) the coeficient of the monomial of degree in a polynomial (), the probabilities ()’s can be written as () = [] ′( + ) ′( + ) , ∈ {0, 1, . . . , }

For the sake of brevity we say that is defined by the linear representation (, , , ). The moment generating function Ψ() of can be defined by means of the map ℎ() given by ℎ() = ′( + ) ,

Observe that, in principle, is the sum of Bernoullian r.v.’s, which however are neither independent nor identically distributed (and hence traditional Cramer’s Theorem cannot be applied in this case). More precisely, can be seen as a sum of the following form: = ∑︁ () , where () = =1 ︂{ 1 0 if = if = , = 1 · · · , ∈ {, } Clearly, the r.v.’s () ( = 1, 2, . . . , ) are not independent since each of them strictly depends on the state reached at the -th step and hence on all the previous transitions; also, they cannot have the same distribution as the weights of the transitions from the various states may be quite diferent. Therefore, in the general case, is a random variable very diferent from a Binomial r.v. ,, for any ∈ ( 0, 1 ), even if they both have the same range of values {0, 1, . . . , }.

4. Primitive models

In this section we summarize the main properties of when the matrix + is primitive. Recall that a matrix ∈ R+× is primitive if there exists a positive integer such that > 0 (i.e. all entries of are strictly positive). The main properties of these matrices are established by the following well-known theorem (see for instance [29, Sec 1.1]). Theorem 1. (Perron-Frobenius) If a matrix = [ ] ∈ R+× is primitive then it admits a real eigevalue > 0 such that: (i) | | < for any eigenvalue of diferent from ; (ii) can be associated with strictly positive left and right eigenvectors; (iii) is a simple root of the characteristic equation of , and hence the associated eigenvectors are unique up to constant multiples;

(iv) if a matrix = [ ] ∈ R+× satisfies ≤ (i.e. ≤ ∀, ) and is an eigenvalue of then | | ≤ . Moreover, | | = implies = . ( 3 ) ( 4 ) Usually is called the Perron-Frobenius eigenvalue of .

Then, assume + is primitive and let be its Perron-Frobenius eigenvalue. In this case it is known that the sequence {} has a Gaussian limit distribution [2]. Its properties (in particular mean value and variance) can be studied through the function = () implicitly defined by the equation

det( − − ) = 0 + , with respect to , such that ′ = 1. in the literature [2, 3] and are useful in our context: characteristic polynomial of + . with initial condition (0) = . Clearly () is eigenvalue of + for every ∈ C. Moreover, () is analytic in a neighbourhood of 0 and ′(0) ̸= 0 since is a simple root of the In the analysis of the asymptotic properties of {}, the following results have been obtained 1) E() = ++(), where || < 1, ∈ R and is a constant satisfying 0 < < by = ′(0) . Moreover ′(0) = ′, where ′ and are left and right eigenvectors of 1 given 2) Var() = + ( 1 ), where is a positive constant defined by = ′′(0) − ︁( ′(0) )︁ 2;

3) In a neighbourhood of 0, the function Ψ() satisfies a “quasi power” condition, that is an equation of the form Ψ() = () ︂( () )︂ (1 + ()) (|| < 1) where () is analytic in = 0 and (0) = 1. A consequence of this result is that − converges in distribution to a Gaussian random variable of mean 0 and variance 1.

Some further properties of the moment generating function Ψ() can be obtained in the case of real . First observe that for every ∈ R also the matrix + is primitive: therefore () is its Perron-Frobenius eigenvalue. By the properties of primitive matrices we know that Theorem 1). Moreover, all the powers of + satisfy a relation of the form () is a positive real function, analytic and strictly increasing for all ∈ R (statement (iv) in ( + ) = () · ′ (1 + ()) (|| < 1, ∀ ∈ R) where ′ and are left and right eigenvectors of + relative to (), normed so that ′ = 1 [29, Th. 1.2]. A first consequence is that applying relation ( 6 ) to all real , we obtain (for every ∈ R) Ψ() = E( ) = ′( + ) ′( + ) ( 6 ) √

5. Large deviations for primitive models

In this section we present a large deviation property for the sequence {} in the primitive case. To this end, assume that + is primitive and consider the random variable () defined by the linear representation (, , , ), for any ∈ R. Since also + is primitive, for any ∈ R, we can apply the results of the previous section to all sequences of random variables {()}. Reasoning as for relation ( 5 ), for any ∈ R we can consider the function () implicitely defined by the equation det( − + − ) = 0 , with initial condition (0) = (). Clearly () = ( + ) and hence () is analytic in a neighbourhood of 0 (for any ∈ R), it admits derivatives of any order around 0 and ′(0) = ′(), ′′(0) = ′′().

Applying property 1) of the previous section to the linear representation (, , , ), for every ∈ R we obtain E(()) = () + + (), where ∈ R and ∈ ( 0, 1 ) are constant and () is a real function given by () = ′((00)) = ′(()) , for all ∈ R. Clearly (0) = . Moreover, by the same property, we have () = ′( + ), ′() = ′ and hence

Analogously, applying property 2) of the previous section to () we get Var(()) = () + ( 1 ), for all ∈ R, where () is a positive constant given by () = ′′(0) (0) − ︂( ′(0) )︂ 2 (0) = ′() > 0 Therefore () is strictly increasing all over R and the following limits exist and are finite: = lim (), →−∞ = lim () →+∞ By relation (9), we have 0 ≤ < (0) < ≤ 1, which, together with relation (10), implies the following statement.

Lemma 1. Let and be defined by (11). Then, for every ∈ (, ) there exists a unique ∈ R such that

( ) = Moreover, < 0 whenever < , = 0 and > 0 when > .

Now we apply property 3) of the previous section to the random variable (): we get a “quasi power” property for the moment generating function of (), that is Ψ()() = () ︁( ((+)) )︁ (1 + ()), for some ∈ ( 0, 1 ), where () is also analytic in = 0 and ( 8 ) (9) (10) (11) (12) (0) = 1. As a consequence, for every ∈ R the sequence of random variables ︂{ √( )− ()() ︂} converges in distribution to a Gaussian random variable of mean 0 and variance 1, i.e. for every constant ∈ R we have lim Pr →∞

1 ∫︁ = √2 −∞ − 2/2 ,

The previous results allow us to prove a large deviation property for {}.

Theorem 2. Let {} be defined by a linear representation (, , , ) where + is primitive. Then {} satisfies a large deviation property in the interval (, ) with rate function ︂( ( ) )︂

() = − log , ∀ ∈ (, ) where and are defined in relation (11) and is given by equation (12).

Proof. We first study the right tail of {}. We have to prove that for every ∈ [, ) the following relation holds:

lim →+∞ 1

log Pr( ≥ ) = log By Markov inequality, for every > 0, we have

Pr( ≥ ) = Pr( ≥ ) ≤ and hence, by relation ( 7 ) we get

Pr( ≥ ) ≤ () ︁( () )︁ (1 + ()), which implies 1 log Pr( ≥ ) ≤ log ︁( () )︁ + (1/).

This bound can be further refined by taking the minimum with respect to > 0 of the first term in the right hand side. To this end let us define the function (14) (15) () = log ︂( () )︂

lim →+∞ 1

log Pr( ≥ ) ≤ log Note that (0) = 0, ′() = () − , and hence by Lemma 1 since ≥ , () takes a unique minimum value at = ≥ 0. This proves Also observe that () is convex since ′′() = ′() > 0 by relation (10).

An analogous lower bound for Pr( ≥ ) can be proved by considering the random variable ( ). Since [] ′( + ) = [] ′( + ) , by relations ( 3 ) and ( 4 ) we have

Pr( = ) =

Pr (︀ ( ) = )︀ Ψ( ) ∀ = 0, 1, . . . , Also note that E(( )) = ( ) + ( 1 ) = + ( 1 ) and by (13) we know that {( )} has a Gaussian limit distribution. This means that, for every > 0, Pr (︀ ( ) > ( + ) )︀ = ( 1 ) and then

Pr (︀ ≤ ( ) ≤ ( + ) )︀ = 1 2 + ( 1 ) Then, from this relation and the identities (16) and ( 7 ) we get log Pr( ≤ ) ≤ log

+ (1/) ︂( ( ) )︂ which yields an upper bound to the limit in (18). relation (17), with obvious changes.

The corresponding lower bound is obtained as in the analysis of the right tail, leading to 1 1 (17) (18) □ Pr( ≥ ) ≥ ≥ =

Pr (︀ ≤ ≤ ( + ) )︀ Pr (︀ ≤ ( ) ≤ ( + ) )︀ ︂( 1 2

︂) + ( 1 ) ( ) (+) ︂(

( ) )︂ (+) Ψ( )

= (1 + ( )) Thus, by the arbitrariness of , we have which yields the required lower bound and concludes the proof of relation (14).

Consider now the left tail. We have to prove that, for every ∈ (, ], log Pr( ≥ ) ≥ log

+ (1/) ︂( ( ) )︂

lim →+∞ 1 log Pr( ≤ ) = log ︂( ( ) )︂ we get where ≤ 0 is defined by equation (12).

The reasoning is similar to the previous case. The main diference is that here < ≤ and one has to use negative values of . Note that the function () given by (15) is well defined also in this case. For every ∈ (, ] and every < 0, by Markov inequality and relation ( 7 ) Pr( ≤ ) = Pr( ≥ ) ≤

E( ) By Lemma 1 the minimum of () = log ︁( () )︁ is taken at = ≤ 0 and this proves that

6. Large deviations in the interval (0,1)

A natural question arising at this point is whether the interval (, ) can be extended to ( 0, 1 ) as in the case of the sequences of binomial random variables considered in section 2.

Let us assume that + is a primitive matrix. Since and are non-null matrices with entries in R+, they admit a real non-negative eigenvalue that is greater or equal to the modulus of any other eigenvalue of the respective matrix. We denote such eigenvalues by and , respectively. Clearly, as is not primitive in general, it may occur = 0 or = | | for some eigenvalue of such that ̸= , and the same may happen for . However, by statement (iv) of Theorem 1, it is clear that < and < , where is the Perron-Frobenius eigenvalue of + .

Now, assume > 0 (which is equivalent to require that has a nonnull eigenvalue) and let and be left and right eigenvectors of with respect to , normed so that ′ = 1. Clearly and cannot be null. Moreover, for → −∞ , the matrix + tends to and hence the eigenvalue () converges to , while the matrix ′ tends ′, implying ′ → ′ = and similarly ′ → ′. As a consequence, for → −∞ , we have

′() = ′ = () = ( 1 ) Therefore, by equality ( 8 ) the last relation implies (for → −∞ ) and hence = 0.

Analogously, if > 0 we get = 1. In fact, assume → +∞ and let () = + − . Exchanging and in the previous argument and recalling that () and + have the same eigenvectors, we obtain ′ = + ( 1 ) and ′− = (− ) = ( 1 ). As a consequence, for → +∞, we have

Theorem 3. Let {} be defined by a linear representation (, , , ) where + is primitive and both and have a nonnull eigenvalue. Then {} satisfies a large deviation property in ︁( ( ) )︁ the interval ( 0, 1 ) with rate function () = − log , for any ∈ ( 0, 1 ), where is the unique real value such that ( ) = .

Under the same hypotheses of the previous theorem we can study the function (). Note that the function = , implicitly defined by ( ) − = 0, is well defined and analytic for ∈ ( 0, 1 ). Thus it is easy to see that

′() = − ( ) ′ + + ′ = and hence () is decreasing in (0, ) and increasing in (, 1), with a unique minimal value ( ) = 0. This also proves that →0+ lim ′() = −∞ ,

→1− and lim ′() = +∞ and hence () grows vertically for → 1− and for → 0+. Moreover, ′′() equals ′, which is always positive since is strictly increasing in ( 0, 1 ), and hence () is a convex function in ( 0, 1 ). we have → −∞ applying the same equation (19), we get

Finally, let us determine the behaviour () for → 0+ and for → 1− . Letting → 0+, and, arguing as for equation (19), we obtain ( ) = + ( 1 ). Moreover,

= ( ) = ( ) = ( 1 ) As a consequence, we can derive the following limit →0+ lim () = l→im0+ − log(( )) + log( ) + = log(/ ) ( ) = ( + ( 1 )) and = + ( 1 ), which implies

Analogously, let → 1− . Then → +∞ and, reasoning as for equation (20), we get →1− lim () = l→im1− − log(( )) + log( ) + = log(/ ) Example 2. As an example, consider the linear representation defined by the weighted finite automaton of Figure 2 (left hand side). In this case we have = 6 and () = 2− 1 (︁ 1 + 3 + √

︁) 1 + 54 + 92 . By using Mathematica and plotting function () = (21) and (22). log 6 + − log () in the interval ( 0, 1 ), with condition ′() = (), we obtain the graphic at the right hand side of the same figure. The numerical computation also confirms that the limit of () for → 0+ and → 1− are log 6 and log 2, respectively, as proved by relations

For a qualitative comparison with the rate function associated with the sequence of binomial r.v.’s {.}, when = 5/8, such a graphic should be compared with that one of Figure 1. (21) (22)

7. Conclusion

In conclusion we can see that the qualitative behaviour of rate function () is rather similar to that one of () discussed in section 2, that is the rate function of the large deviation property for the sequence {,} of binomial random variables. Note that the interval of validity of the property is the same. The constant = ′ , where () takes its minimal value 0, corresponds to success probability of ,. Moreover, the limits of () for → 0+ and → 1− , i.e. log(/ ) and log(/

) respectively, correspond to − log(1 − ) and − log .

Natural problems for further investigation concern the case when or are null; in this case Theorem 3 does not apply immediately but one may guess that a similar property holds. In particular a natural question is whether in this case the large deviation property still holds in the interval ( 0, 1 ). Other subjects for further studies are the validity of large deviation properties for non-primitive rational models, in particular for models consisting of two or more components, like those studied in [9, 18] and in [19].

(, 3-) () = log 6 + − log ( ) 1.5 1.0

Acknowledgments

This work has been supported by Università degli Studi di Milano, under project PSR 2021. series in commuting variables. Proc. 8th DLT Conference, Lecture Notes in Computer Science vol. 3340, pp. 114-126, Springer, 2004. [9] D. de Falco, M. Goldwurm, V. Lonati. Frequency of symbol occurrences in bicomponent stochastic models. Theoret. Comput. Sci., 327 ( 3 ):269–300, 2004. [10] A. Dembo and O. Zeitouni. Large Deviation Techinques and Applications. Corrected reprint of the second (1998) edition. Springer–Verlag, 2010. [11] F. den Hollander. Large Deviations. American Mathematical Society, 2000. [12] A. Denise and M. Régnier. Rare events and conditional events on random strings. Discrete

Math. Theor. Comput. Sci., 6:191–214, 2004. [13] A. Denise, O. Roques and M. Termier. Random generation of words of context-free languages according to the frequency of letters. In D. Gardy, A. Mokkadem (eds), Mathematics and computer Science, Trends in Mathematics, 113–125, Birkäuser, 2000. [14] L. Devroye. Universal limit laws for depths in random trees. SIAM Journal on Computing, 28:409-432, 1999. [15] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Cambridge Univ. Press, 2009. [16] P. Flajolet, W. Szpankowski and B. Vallée. Hidden word statistics. Journal of the ACM, 53:147–183, 2006. [17] B.V. Gnedenko. Theory of probability. Gordon and Breach Science Publ., 1997. [18] M. Goldwurm, J. Lin, M. Vignati. Local limit laws for symbol statistics in bicomponent rational models. Theoret. Comput. Sci., 970:114051, 2023, https://doi.org/10.1016/j.tcs.2023.114051. [19] M. Goldwurm and V. Lonati. Pattern statistics and Vandermonde matrices. Theoret. Comput.

Sci., 356:153–169, 2006. [20] P. Grabner and M. Rigo. Distribution of additive functions with respect to numeration systems on regular languages. Theory Comput. Systems, 40:205-223, 2007. [21] H.-K. Hwang. Large deviations for combinatorial distributions. I: Central limit theorems.

The Annal of Applied Probability, 6( 1 ):297–319, 1996. [22] P. Jacquet and W. Szpankowski. Analytic Pattern Matching: From DNA to Twitter Cambridge

University Press, 2015. [23] P. Massazza and R. Radicioni. On computing the coeficients of bivariate holonomic formal series. Theoret. Comput. Sci., 346: 418–438, 2005. [24] P. Nicodeme, B. Salvy, and P. Flajolet. Motif statistics. Theoret. Comput. Sci., 287( 2 ): 593–617, 2002. [25] M. Régnier and W. Szpankowski. On pattern frequency occurrences in a Markovian sequence. Algorithmica, 22 ( 4 ):621–649, 1998. [26] C. Reutenauer. Propriétés arithmétiques et topologiques de séries rationelles en variables non commutatives. Thèse Sc. Maths, Doctorat troisieme cycle, Université Paris VI, 1977. [27] A. Salomaa and M. Soittola. Automata-Theoretic Aspects of Formal Power Series. Springer–

Verlag, 1978. [28] M.-P. Schützenberger. Finite counting automata. Inform. and Control, 5: 91–107, 1962. [29] E. Seneta. Non-negative matrices and Markov chains. Springer–Verlag, New York Heidelberg Berlin, 1981.

[1]

Berstel and

Reutenauer . Rational series and their languages , Springer-Verlag, New York - Heidelberg - Berlin, 1988 .

[2]

Bertoni ,

Chofrut ,

Goldwurm ,

Lonati . On the number of occurrences of a symbol in words of regular languages . Theoret. Comput. Sci. , 302 : 431 - 456 , 2003 .

[3]

Bertoni ,

Chofrut ,

Goldwurm and

Lonati . Local limit properties for pattern statistics and rational models . Theory Comput. Systems , 39 : 209 - 235 , 2006 .

[4]

Bertoni ,

Massazza and

Radicioni . Random generation of words in regular languages with fixed occurrences of symbols . Proc. WORDS'03 ,

Fourth

Intern . Conf. on Combinatorics of Words , Turku (Finland) , 332 - 343 , 2003 .

[5]

Bourdon and

Régnier . Large deviation properties for patterns . J. of Discrete Algorithms , 24 : 2 - 11 , 2014 .

[6]

Broda ,

Machiavelo ,

Moreira , and

Reis . A hitchhiker's guide to descriptional complexity through analytic combinatorics . Theoret. Comput. Sci. , 528 : 85 - 100 , 2014 .

[7]

Broutin and

Devroye . Large deviations for the weighted height of an extended class of trees . Algorithmica , 46 : 271 - 297 , 2006 .

[8]

Chofrut ,

Goldwurm and

Lonati . On the maximum coeficients of rational formal