<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Conference on Theoretical Computer Science, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Large Deviation Properties for Pattern Statistics in Primitive Rational Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Massimiliano Goldwurm</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Vignati</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Matematica “Federigo Enriques”, Università degli Studi di Milano</institution>
          ,
          <addr-line>via Saldini 50, 20133 Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>3</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>We present a large deviation property for the pattern statistics representing the number of occurrences of a symbol in words of given length generated at random according to a rational stochastic model. The result is obtained assuming that in the model the overall weighted transition matrix is primitive. In particular we obtain a rate function depending on the main eigenvalue and eigenvectors of that matrix. Under rather mild conditions, we show that the range of validity of our large deviation estimate can be extended to the interval (0,1), which represents in our context the largest possible open interval of validity of the property.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;large deviations</kwd>
        <kwd>limit distributions</kwd>
        <kwd>regular languages</kwd>
        <kwd>rational formal series</kwd>
        <kwd>pattern statistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>given length is proportional to the total weight of the accepting transitions labelled by . This
model is quite general, it includes as special cases the traditional Bernoullian and Markovian
sources, widely used in the literature to study the number of occurrences of patterns in a random
text [24, 25, 16, 5]. We recall that the research concerning pattern statistics has a broad range
of motivations and applications [22]. Moreover, it is known that the rational stochastic model
allows to generate random words of length  in an arbitrary regular language under uniform
distribution: this occurs when the finite automaton defining the model is unambiguous and all
transitions have the same weight.</p>
      <p>In order to fix our notation, consider a (nondeterministic) weighted finite state automaton 
over the binary alphabet {, } and, for every  ∈ N, let  be the number of occurrences of the
symbol  in a word of length  generated at random according to the rational model defined by
. The analysis of these sequences of random variables is of interest in several contexts. First
of all they can represent the number of occurrences of patterns in a random word of length
, generated by a Markovian source, when the set of patterns is given by a regular language
[2, 24, 25]. Moreover, they are related to the evaluation of the coeficients of rational formal
series (a traditional problem well-studied in the literature [28, 26, 23]) and to the analysis of
several problems and properties of regular language. This fact clearly holds for the natural
problem of estimating the number of words of given length in a regular language having 
occurrences of a given symbol [4, 13]. It also holds for the analysis of additive functions defined
on regular languages [20] and for the descriptional complexity of languages and computational
models [6]. Further, using the local limit properties of the sequences {}, for a wide class of
rational series it can be proved that the maximum coeficient of the monomials of degree  has
an asymptotic growth of the order Θ(/2 ) for some  &gt; 0 and some integer  ≥ − 1 [8, 3].</p>
      <p>The asymptotic behaviour of {}, i.e. mean value, variance, limit distribution both in the
global and in the local sense [17, 15], has been studied in the literature under several hypotheses
on the corresponding automaton . It is known that if  has a primitive transition matrix
then  has a Gaussian limit distribution [2, 24] and, under a suitable aperiodicity condition, it
also satisfies a local limit theorem [ 2], which can be extended to all primitive cases by using a
suitable notion of periodicity [3]. The limit distribution of  in the global sense is known also
when the transition matrix of  consists of two primitive components [9], while the local limit
properties in this case are recently studied in [18]. When the automaton  has several strongly
connected components a general analysis of the (global) limit distribution of  can be found in
[19].</p>
      <p>
        Here we continue this line of research proving in Section 5 that if the transition matrix of
 is primitive, then  satisfies a large deviation property with a rate function depending
on the main eigenvalue and the associated eigenvectors. The corresponding proof is rather
standard, it relies on traditional tools of analytic combinatorics and the result is implicitly
included in the previous literature [21, 11, 15]. However, here our result is significant since
it puts in evidence the role played by the main eigenvalue and eigenvectors of the matrix of
weights in the definition of both the rate function and the interval of validity of the property.
Moreover, in Section 6, we assume a mild condition on the transition matrix of the automaton
and, under such hypothesis, we show that the range of validity of the large deviation property
can be extended to the entire interval (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ), which represents in our context the largest possible
open interval where the property may hold.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. A quick overview on large deviations</title>
      <p>Large deviation estimates usually refer to a sequence of random variables (r.v.’s), say {},
having increasing mean values; it consist of a bound, exponentially decreasing to 0, over the
probability that  deviates from () by an amount greater or equal to (),  &gt; 0.
Typical situations occur when () ∼
our cases, here we start with the following fomal definition [11, 15].</p>
      <p>for a constant  &gt;
0, and since this occurs in all
Definition 1.
constant  &gt;</p>
      <p>Let {} be a sequence of random variables such that () ∼
0, and let (0, 1) be an interval including  . Assume  () is a function defined
 for a
over (0, 1) taking values in R, such that  () &gt; 0 for  ̸=  . We say that {} satisfies a
large deviation property relative to the interval (0, 1) with rate function  () if the following
limits hold:
lim
→∞ 
lim
→∞ 
1
1
log Pr( ≤ ) = −  ()
log Pr( ≥ ) = −  ()
for 0 &lt;  ≤ 
for  ≤  &lt; 1
and Pr( ≥
Pr( ≤</p>
      <p>This property is equivalent to require that Pr( ≤
) = −  ()+(), for 0 &lt;  ≤  ,
) = −  ()+(), for  ≤</p>
      <p>&lt; 1. The first relation concerns the left tail
. It is convenient to keep the two limits separated in the definition since the proofs of a large
), while the second one refers to the right tail Pr( ≥
) of the distribution of
deviation property often considers one tail at a time.</p>
      <p>
        A classical example of large deviation property concerns the sequence of binomial random
deviations (i.e. of the order √) from the mean, that is
variables {,} of parameters  and , where  ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ) is fixed. In this case, (,) = 
and by the Central Limit Theorem, we know that √,(−1− ) converges in distribution to a
standard Gaussian random variable  (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ). This yields a limit probability concerning “normal”
→∞

lim Pr (︀ |, − | ≥ √)︀ = Pr | (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        )| ≥ √︀(1 − )
︃(
)︃
∀ &gt; 0
Such a property implies the following result for a larger deviation
      </p>
      <p>
        Pr (|, − | ≥ ) = (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
∀ &gt; 0
which can also be obtained by applying the Law of Large Numbers. The following proposition
states a large deviation property for {,} that improves the last relation, showing that the
convergence to 0 is exponential with respect to  and the range of validity coincides with the
overall interval (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ). Its proof is a consequence of Cramer’s Theorem (see e.g. [11, 10]), a
classical result we discuss later. However, here we prefer to briefly outline a simpler proof to
give the flavour of this property and to compare it with our subsequent results.
property in the interval (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ) with rate function () given by
Proposition 1. Any sequence of binomial random variables {,} satisfies a large deviation
() =  log


+ (1 − ) log 1 − 
1
− 
for every  ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ).
max{Pr(, = ) :  ∈ N, 0 ≤  ≤ }. Then we have
Proof. We first prove the limit on the left tail. To this end, let 0 &lt;  ≤  and let () =
() ≤ Pr(, ≤ ) ≤ ( + 1)()
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
formula leads to
      </p>
      <p>
        ⌊⌋
() = (︀
Recall that the probability Pr(, = ) is increasing for integers  such that 0 ≤  ≤ ; hence
 )︀ ⌊⌋(1 − )−⌊ ⌋ for every  ∈ (0, ]. Thus, a direct application of Stirling’s
() = exp   log
︂{
︂[


+ (1 − ) log 1 −  ]︂
1
− 
+ (log )
︂}
which replaced in (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) proves log Pr(, ≤
      </p>
      <p>) = − () + (log ).</p>
      <p>A similar reasoning holds for the right tail. In this case we have
() ≤ Pr(, ≥ ) ≤ ( −  + 1)()
for every  ∈ [, 1)
lim→1− ′() = +∞.
inequalities yields log Pr(, ≥ ) = − () + (log ).
where () = (︀  )︀ ⌊⌋(1 − )−⌊ ⌋. As above, replacing this value in the previous
⌊⌋</p>
      <p>
        The rate function () is strictly convex in the interval (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ), takes a unique minimal
value at  = , where () = 0, while lim→0+ () = 
Moreover, () grows vertically for  approaching 0 or 1, since lim→0+ ′() = −∞
1− 1  and lim→1− () =  1 .
      </p>
      <p>□
and
Example 1. Let  = 5/8. Then, the shape of the rate function () of the large deviation
property for the sequence of binomial random viariables {,} is described in Figure 1.
 = ,
5
8
() :
set R once we allow the rate function  () to assume value +∞. A classical situation of
this type is established by Cramér’s Theorem [11, 10], stating that if {} is a sequence of
independent and identically distributed random variables, with bounded moment generating
function (i.e.  () = (1 ) &lt;
() = sup∈R[ − log  ()], for every  ∈ R.
{}, where  = ∑︀=1 , satisfies a large deviation property all over R with rate function
∞ for any  ∈ R), then the sequence of partial sums</p>
    </sec>
    <sec id="sec-3">
      <title>3. Symbol statistics for rational models</title>
      <p>
        In order to define our stochastic model consider a formal series in the non-commutative variables
, , that is a function  : {, }* → R+, where R+ = [0, +∞) and {, }* is the free monoid
of all words on the alphabet {, }. As usual, we denote by (, ) the value of  at a word
 ∈ {, }* . Such a series  is said to be rational if for some integer  &gt; 0 there exists
a monoid morphism  : {, }* → R+×  and two (column) arrays ,  ∈ R+, such that
(, ) =  ′ () , for every  ∈ {, }* [1, 27] (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ). Note that in this case, if  = 12 · · · 
with  ∈ {, } for every  = 1, 2, . . . , , then  () =  (1) (2) · · ·  (). Thus, as the
morphism  is generated by the matrices  =  () and  =  (), we say that the 4-tuple
(, , ,  ) is a linear representation of . Clearly, such a 4-tuple can be considered as a finite
state automaton over the alphabet {, }, with transitions weighted by positive real values.
Therefore  (resp. ) represents the matrix of the weights of all transitions labelled by  (resp.
), while  (resp.  ) is the array of the weights of the initial (resp. final) states.
      </p>
      <p>Throughout this work, denoting by {, } the family of all words of length  in {, }* ,
we assume that the set { ∈ {, } : (, ) &gt; 0} is non-empty for every  ∈ N+ (so that
 ̸= 0 ̸=  ), and that  and  are non-zero matrices (i.e., each of them has at least one positive
entry). Moreover, for every  ∈ N, we can easily compute the sum of all values of  associated
with words in {, }:</p>
      <p>
        ⎛  ⎞
∑︁ (, ) =  ′ ∑︁  () =  ′ ⎝∏︁ ∑︁  ()⎠  =  ′ ( + ) 
∈{,} ∈{,} =1 ∈{,}
Thus, we can consider the probability measure Pr over the set {, } given by
(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
(, )
Pr() = ∑︀∈{,} (, )
=
      </p>
      <p>′ ()
 ′( + )
∀  ∈ {, }
Note that, if  is the characteristic series of a language  ⊆ { , }* then Pr is the uniform
probability function over the set  ∩ {, }. Also observe that the traditional Markovian
models (to generate a word at random in {, }* ) occur when  +  is a stochastic matrix,  is
a stochastic array and  ′ = (1, 1 . . . , 1).</p>
      <p>
        Then, under the previous hypotheses, we can define the integer random variable (r.v.)  =
||, where  is a word chosen at random in {, } with probability Pr(), and || is the
number of occurrences of  in . As  ̸= [0] ̸= ,  is a non-degenerate random variable,
taking value in {0, 1, . . . , }. It is clear that the probability function of  is defined by
() := Pr( = ) = ∑︀∑∈︀{,∈{},,|}|(=,(), ) ,  ∈ {0, 1, . . . , }
Since  is rational also the previous probability can be expressed by using its linear representation.
The denominator is clearly determined by (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ). As far as the numerator is concerned, setting
 () = 1 and  () = 0, for a variable  and for every  ∈ N one has
      </p>
      <p>( + ) = ∏︁</p>
      <p>∑︁
=1 ∈{,}</p>
      <p>⎛
 () () = ∑︁ ⎝ ∑︁
=0 ∈{,},||=</p>
      <p>⎞
 ()⎠ 
1As is customary, we denote by ′ the transpose of an array  ∈ R, i.e. a row array.</p>
      <p>As a consequence, if []() denotes (according to tradition) the coeficient of the monomial
of degree  in a polynomial (), the probabilities ()’s can be written as
() =
[] ′( + )
 ′( + )
,
 ∈ {0, 1, . . . , }</p>
      <p>For the sake of brevity we say that  is defined by the linear representation (, , ,  ). The
moment generating function Ψ() of  can be defined by means of the map ℎ() given by
ℎ() =  ′( + ) ,</p>
      <p>
        Observe that, in principle,  is the sum of  Bernoullian r.v.’s, which however are neither
independent nor identically distributed (and hence traditional Cramer’s Theorem cannot be
applied in this case). More precisely,  can be seen as a sum of the following form:

 = ∑︁ () , where () =
=1
︂{ 1
0
if  = 
if  =  ,  = 1 · · ·  ,  ∈ {, }
Clearly, the r.v.’s () ( = 1, 2, . . . , ) are not independent since each of them strictly depends
on the state reached at the -th step and hence on all the previous transitions; also, they cannot
have the same distribution as the weights of the transitions from the various states may be quite
diferent. Therefore, in the general case,  is a random variable very diferent from a Binomial
r.v. ,, for any  ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ), even if they both have the same range of values {0, 1, . . . , }.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Primitive models</title>
      <p>In this section we summarize the main properties of  when the matrix  +  is primitive.
Recall that a matrix  ∈ R+×  is primitive if there exists a positive integer  such that
  &gt; 0 (i.e. all entries of   are strictly positive). The main properties of these matrices are
established by the following well-known theorem (see for instance [29, Sec 1.1]).
Theorem 1. (Perron-Frobenius) If a matrix  = [ ] ∈ R+×  is primitive then it admits a real
eigevalue  &gt; 0 such that:
(i) | | &lt;  for any eigenvalue  of  diferent from  ;
(ii)  can be associated with strictly positive left and right eigenvectors;
(iii)  is a simple root of the characteristic equation of  , and hence the associated eigenvectors
are unique up to constant multiples;</p>
      <p>
        (iv) if a matrix  = [ ] ∈ R+×  satisfies  ≤  (i.e.  ≤  ∀, ) and  is an eigenvalue
of  then | | ≤  . Moreover, | | =  implies  =  .
(
        <xref ref-type="bibr" rid="ref3">3</xref>
        )
(
        <xref ref-type="bibr" rid="ref4">4</xref>
        )
Usually  is called the Perron-Frobenius eigenvalue of  .
      </p>
      <p>Then, assume  +  is primitive and let  be its Perron-Frobenius eigenvalue. In this case
it is known that the sequence {} has a Gaussian limit distribution [2]. Its properties (in
particular mean value and variance) can be studied through the function  = () implicitly
defined by the equation</p>
      <p>
        det( −  − ) = 0
 + , with respect to  , such that ′ = 1.
in the literature [2, 3] and are useful in our context:
characteristic polynomial of  + .
with initial condition (0) =  . Clearly () is eigenvalue of  +  for every  ∈ C.
Moreover, () is analytic in a neighbourhood of 0 and ′(0) ̸= 0 since  is a simple root of the
In the analysis of the asymptotic properties of {}, the following results have been obtained

1) E() =  ++(), where || &lt; 1,  ∈ R and  is a constant satisfying 0 &lt;  &lt;
by  = ′(0) . Moreover ′(0) = ′, where ′ and  are left and right eigenvectors of
1 given
2) Var() =  + (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), where  is a positive constant defined by
 = ′′(0)

−
︁( ′(0) )︁ 2;
      </p>
      <p>3) In a neighbourhood of 0, the function Ψ() satisfies a “quasi power” condition, that is an
equation of the form
Ψ() = ()
︂( () )︂ 

(1 + ())
(|| &lt; 1)
where () is analytic in  = 0 and (0) = 1. A consequence of this result is that − 
converges in distribution to a Gaussian random variable of mean 0 and variance 1.</p>
      <p>
        Some further properties of the moment generating function Ψ() can be obtained in the
case of real . First observe that for every  ∈ R also the matrix  +  is primitive: therefore
() is its Perron-Frobenius eigenvalue. By the properties of primitive matrices we know that
Theorem 1). Moreover, all the powers of  +  satisfy a relation of the form
() is a positive real function, analytic and strictly increasing for all  ∈ R (statement (iv) in
( + ) = () · ′ (1 + ())
(|| &lt; 1, ∀  ∈ R)
where ′ and  are left and right eigenvectors of  +  relative to (), normed so that
′ = 1 [29, Th. 1.2]. A first consequence is that applying relation (
        <xref ref-type="bibr" rid="ref6">6</xref>
        ) to all real , we obtain
(for every  ∈ R)
Ψ() = E( ) =
 ′( + )
 ′( + )
(
        <xref ref-type="bibr" rid="ref6">6</xref>
        )
√
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Large deviations for primitive models</title>
      <p>
        In this section we present a large deviation property for the sequence {} in the primitive case.
To this end, assume that  +  is primitive and consider the random variable () defined
by the linear representation (,  , ,  ), for any  ∈ R. Since also  +  is primitive,
for any  ∈ R, we can apply the results of the previous section to all sequences of random
variables {()}. Reasoning as for relation (
        <xref ref-type="bibr" rid="ref5">5</xref>
        ), for any  ∈ R we can consider the function
() implicitely defined by the equation
det( − + − ) = 0 ,
with initial condition (0) = (). Clearly () = ( + ) and hence () is analytic
in a neighbourhood of 0 (for any  ∈ R), it admits derivatives of any order around 0 and
′(0) = ′(), ′′(0) = ′′().
      </p>
      <p>
        Applying property 1) of the previous section to the linear representation (,  , ,  ), for
every  ∈ R we obtain E(()) =  () +  + (), where  ∈ R and  ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ) are
constant and  () is a real function given by  () = ′((00)) = ′(()) , for all  ∈ R. Clearly
 (0) =  . Moreover, by the same property, we have () = ′( + ), ′() = ′
and hence
      </p>
      <p>
        Analogously, applying property 2) of the previous section to () we get Var(()) =
 () + (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), for all  ∈ R, where  () is a positive constant given by
 () =
′′(0)
(0) −
︂( ′(0) )︂ 2
(0)
=  ′() &gt; 0
Therefore  () is strictly increasing all over R and the following limits exist and are finite:
 = lim  (),
→−∞
 = lim  ()
→+∞
By relation (9), we have 0 ≤  &lt;  (0) &lt;  ≤ 1, which, together with relation (10), implies
the following statement.
      </p>
      <p>Lemma 1. Let  and  be defined by (11). Then, for every  ∈ (,  ) there exists a unique
  ∈ R such that</p>
      <p>( ) = 
Moreover,   &lt; 0 whenever  &lt;  ,   = 0 and   &gt; 0 when  &gt;  .</p>
      <p>
        Now we apply property 3) of the previous section to the random variable (): we get
a “quasi power” property for the moment generating function of (), that is Ψ()() =
() ︁( ((+)) )︁  (1 + ()), for some  ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ), where () is also analytic in  = 0 and
(
        <xref ref-type="bibr" rid="ref8">8</xref>
        )
(9)
(10)
(11)
(12)
(0) = 1. As a consequence, for every  ∈ R the sequence of random variables ︂{ √( )− ()() ︂} 
converges in distribution to a Gaussian random variable of mean 0 and variance 1, i.e. for every
constant  ∈ R we have
lim Pr
→∞
      </p>
      <p>1 ∫︁ 
= √2
−∞
− 2/2 ,</p>
      <p>The previous results allow us to prove a large deviation property for {}.</p>
      <p>Theorem 2. Let {} be defined by a linear representation (, , ,  ) where  +  is primitive.
Then {} satisfies a large deviation property in the interval (,  ) with rate function
︂( ( ) )︂</p>
      <p>() = − log
,
∀ ∈ (,  )
where  and  are defined in relation (11) and   is given by equation (12).</p>
      <p>Proof. We first study the right tail of {}. We have to prove that for every  ∈ [,  ) the
following relation holds:</p>
      <p>lim
→+∞ 
1</p>
      <p>log Pr( ≥ ) = log
By Markov inequality, for every  &gt; 0, we have</p>
      <p>
        Pr( ≥ ) = Pr( ≥ ) ≤
and hence, by relation (
        <xref ref-type="bibr" rid="ref7">7</xref>
        ) we get
      </p>
      <p>Pr( ≥ ) ≤ () ︁( () )︁  (1 + ()),
which implies 1 log Pr( ≥ ) ≤ log ︁( () )︁ + (1/).</p>
      <p>This bound can be further refined by taking the minimum with respect to  &gt; 0 of the first
term in the right hand side. To this end let us define the function
(14)
(15)
 () = log
︂( () )︂</p>
      <p>lim
→+∞ 
1</p>
      <p>log Pr( ≥ ) ≤ log
Note that  (0) = 0,  ′() =  () − , and hence by Lemma 1 since  ≥  ,  () takes a
unique minimum value at  =   ≥ 0. This proves
Also observe that  () is convex since  ′′() =  ′() &gt; 0 by relation (10).</p>
      <p>
        An analogous lower bound for Pr( ≥ ) can be proved by considering the random
variable ( ). Since [] ′(   + ) =  [] ′( + ) , by relations (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) and (
        <xref ref-type="bibr" rid="ref4">4</xref>
        )
we have
      </p>
      <p>Pr( = ) =</p>
      <p>
        Pr (︀ ( ) =  )︀ Ψ( )
 
∀  = 0, 1, . . . , 
Also note that E(( )) =  ( ) + (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) =  + (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and by (13) we know that {( )}
has a Gaussian limit distribution. This means that, for every  &gt; 0, Pr (︀ ( ) &gt; ( + ) )︀ =
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and then
      </p>
      <p>
        Pr (︀  ≤ ( ) ≤ ( + ) )︀ =
1
2
+ (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
Then, from this relation and the identities (16) and (
        <xref ref-type="bibr" rid="ref7">7</xref>
        ) we get
log Pr( ≤ ) ≤ log
      </p>
      <p>+ (1/)
︂( ( ) )︂
  
which yields an upper bound to the limit in (18).
relation (17), with obvious changes.</p>
      <p>The corresponding lower bound is obtained as in the analysis of the right tail, leading to
1

1

(17)
(18)
□
Pr( ≥ ) ≥
≥
=</p>
      <p>Pr (︀  ≤  ≤ ( + ) )︀
Pr (︀  ≤ ( ) ≤ ( + ) )︀
︂( 1
2</p>
      <p>
        ︂)
+ (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) ( )
 (+)
︂(
      </p>
      <p>( ) )︂ 
  (+)
Ψ( )</p>
      <p>=
(1 + ( ))
Thus, by the arbitrariness of , we have
which yields the required lower bound and concludes the proof of relation (14).</p>
      <p>Consider now the left tail. We have to prove that, for every  ∈ (,  ],
log Pr( ≥ ) ≥ log</p>
      <p>+ (1/)
︂( ( ) )︂</p>
      <p>lim
→+∞ 
1
log Pr( ≤ ) = log
︂( ( ) )︂
  
we get
where   ≤ 0 is defined by equation (12).</p>
      <p>
        The reasoning is similar to the previous case. The main diference is that here  &lt;  ≤  and
one has to use negative values of . Note that the function () given by (15) is well defined
also in this case. For every  ∈ (,  ] and every  &lt; 0, by Markov inequality and relation (
        <xref ref-type="bibr" rid="ref7">7</xref>
        )
Pr( ≤ ) = Pr(
≥ )
≤
      </p>
      <p>E( )

By Lemma 1 the minimum of  () = log ︁( () )︁ is taken at  =   ≤ 0 and this proves that</p>
    </sec>
    <sec id="sec-6">
      <title>6. Large deviations in the interval (0,1)</title>
      <p>
        A natural question arising at this point is whether the interval (,  ) can be extended to (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        )
as in the case of the sequences of binomial random variables considered in section 2.
      </p>
      <p>Let us assume that  +  is a primitive matrix. Since  and  are non-null matrices with
entries in R+, they admit a real non-negative eigenvalue that is greater or equal to the modulus
of any other eigenvalue of the respective matrix. We denote such eigenvalues by   and  ,
respectively. Clearly, as  is not primitive in general, it may occur   = 0 or   = | | for
some eigenvalue  of  such that  ̸=  , and the same may happen for  . However, by
statement (iv) of Theorem 1, it is clear that   &lt;  and   &lt;  , where  is the Perron-Frobenius
eigenvalue of  + .</p>
      <p>Now, assume   &gt; 0 (which is equivalent to require that  has a nonnull eigenvalue) and
let  and  be left and right eigenvectors of  with respect to  , normed so that ′ = 1.
Clearly  and  cannot be null. Moreover, for  → −∞ , the matrix  +  tends to 
and hence the eigenvalue () converges to  , while the matrix ′ tends ′, implying
′ → ′ =   and similarly ′ → ′. As a consequence, for  → −∞ , we
have</p>
      <p>
        ′() = ′ = () = (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
Therefore, by equality (
        <xref ref-type="bibr" rid="ref8">8</xref>
        ) the last relation implies (for  → −∞
)
and hence  = 0.
      </p>
      <p>
        Analogously, if   &gt; 0 we get  = 1. In fact, assume  → +∞ and let () =  + − .
Exchanging  and  in the previous argument and recalling that () and  +  have
the same eigenvectors, we obtain ′ =   + (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and ′−  = (− ) = (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ). As a
consequence, for  → +∞, we have
      </p>
      <p>
        Theorem 3. Let {} be defined by a linear representation (, , ,  ) where  +  is primitive
and both  and  have a nonnull eigenvalue. Then {} satisfies a large deviation property in
︁( ( ) )︁
the interval (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ) with rate function () = − log    , for any  ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ), where   is the
unique real value such that  ( ) = .
      </p>
      <p>
        Under the same hypotheses of the previous theorem we can study the function (). Note
that the function  =  , implicitly defined by  ( ) −  = 0, is well defined and analytic for
 ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ). Thus it is easy to see that
      </p>
      <p>′() = −  ( ) ′ +   +  ′ =  
and hence () is decreasing in (0,  ) and increasing in (, 1), with a unique minimal value
( ) = 0. This also proves that
→0+
lim ′() = −∞
,</p>
      <p>
        →1−
and lim ′() = +∞
and hence () grows vertically for  → 1− and for  → 0+. Moreover, ′′() equals  ′,
which is always positive since   is strictly increasing in (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ), and hence () is a convex
function in (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ).
we have   → −∞
applying the same equation (19), we get
      </p>
      <p>
        Finally, let us determine the behaviour () for  → 0+ and for  → 1− . Letting  → 0+,
and, arguing as for equation (19), we obtain ( ) =   + (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ). Moreover,
      </p>
      <p>
        =  ( )  = (  )  = (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
As a consequence, we can derive the following limit
→0+
lim () = l→im0+ − log(( )) + log( ) +   = log(/ )
( ) =   (  + (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )) and   =   + (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), which implies
      </p>
      <p>Analogously, let  → 1− . Then   → +∞ and, reasoning as for equation (20), we get
→1−
lim () = l→im1− − log(( )) + log( ) +   = log(/ )
Example 2. As an example, consider the linear representation defined by the weighted
finite automaton of Figure 2 (left hand side). In this case we have 
= 6 and () =
2− 1 (︁ 1 + 3 + √</p>
      <p>
        ︁)
1 + 54 + 92 . By using Mathematica and plotting function () =
(21) and (22).
log 6 +  − log () in the interval (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ), with condition ′() = (), we obtain the graphic
at the right hand side of the same figure. The numerical computation also confirms that the
limit of () for  → 0+ and  → 1− are log 6 and log 2, respectively, as proved by relations
      </p>
      <p>For a qualitative comparison with the rate function associated with the sequence of binomial
r.v.’s {.}, when  = 5/8, such a graphic should be compared with that one of Figure 1.
(21)
(22)</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In conclusion we can see that the qualitative behaviour of rate function () is rather similar to
that one of () discussed in section 2, that is the rate function of the large deviation property

for the sequence {,} of binomial random variables. Note that the interval of validity of the
property is the same. The constant  = ′ , where () takes its minimal value 0, corresponds
to success probability  of ,. Moreover, the limits of () for  → 0+ and  → 1− , i.e.
log(/
) and log(/</p>
      <p>) respectively, correspond to − log(1 − ) and − log .</p>
      <p>
        Natural problems for further investigation concern the case when   or   are null; in this
case Theorem 3 does not apply immediately but one may guess that a similar property holds. In
particular a natural question is whether in this case the large deviation property still holds in the
interval (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ). Other subjects for further studies are the validity of large deviation properties for
non-primitive rational models, in particular for models consisting of two or more components,
like those studied in [9, 18] and in [19].
      </p>
      <p>(, 3-)
() = log 6 +   − log ( )
1.5
1.0</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>
        This work has been supported by Università degli Studi di Milano, under project PSR 2021.
series in commuting variables. Proc. 8th DLT Conference, Lecture Notes in Computer Science
vol. 3340, pp. 114-126, Springer, 2004.
[9] D. de Falco, M. Goldwurm, V. Lonati. Frequency of symbol occurrences in bicomponent
stochastic models. Theoret. Comput. Sci., 327 (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ):269–300, 2004.
[10] A. Dembo and O. Zeitouni. Large Deviation Techinques and Applications. Corrected reprint
of the second (1998) edition. Springer–Verlag, 2010.
[11] F. den Hollander. Large Deviations. American Mathematical Society, 2000.
[12] A. Denise and M. Régnier. Rare events and conditional events on random strings. Discrete
      </p>
      <p>Math. Theor. Comput. Sci., 6:191–214, 2004.
[13] A. Denise, O. Roques and M. Termier. Random generation of words of context-free
languages according to the frequency of letters. In D. Gardy, A. Mokkadem (eds), Mathematics
and computer Science, Trends in Mathematics, 113–125, Birkäuser, 2000.
[14] L. Devroye. Universal limit laws for depths in random trees. SIAM Journal on Computing,
28:409-432, 1999.
[15] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Cambridge Univ. Press, 2009.
[16] P. Flajolet, W. Szpankowski and B. Vallée. Hidden word statistics. Journal of the ACM,
53:147–183, 2006.
[17] B.V. Gnedenko. Theory of probability. Gordon and Breach Science Publ., 1997.
[18] M. Goldwurm, J. Lin, M. Vignati. Local limit laws for symbol statistics in bicomponent
rational models. Theoret. Comput. Sci., 970:114051, 2023, https://doi.org/10.1016/j.tcs.2023.114051.
[19] M. Goldwurm and V. Lonati. Pattern statistics and Vandermonde matrices. Theoret. Comput.</p>
      <p>Sci., 356:153–169, 2006.
[20] P. Grabner and M. Rigo. Distribution of additive functions with respect to numeration
systems on regular languages. Theory Comput. Systems, 40:205-223, 2007.
[21] H.-K. Hwang. Large deviations for combinatorial distributions. I: Central limit theorems.</p>
      <p>
        The Annal of Applied Probability, 6(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ):297–319, 1996.
[22] P. Jacquet and W. Szpankowski. Analytic Pattern Matching: From DNA to Twitter Cambridge
      </p>
      <p>
        University Press, 2015.
[23] P. Massazza and R. Radicioni. On computing the coeficients of bivariate holonomic formal
series. Theoret. Comput. Sci., 346: 418–438, 2005.
[24] P. Nicodeme, B. Salvy, and P. Flajolet. Motif statistics. Theoret. Comput. Sci., 287(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ): 593–617,
2002.
[25] M. Régnier and W. Szpankowski. On pattern frequency occurrences in a Markovian
sequence. Algorithmica, 22 (
        <xref ref-type="bibr" rid="ref4">4</xref>
        ):621–649, 1998.
[26] C. Reutenauer. Propriétés arithmétiques et topologiques de séries rationelles en variables
non commutatives. Thèse Sc. Maths, Doctorat troisieme cycle, Université Paris VI, 1977.
[27] A. Salomaa and M. Soittola. Automata-Theoretic Aspects of Formal Power Series. Springer–
      </p>
      <p>Verlag, 1978.
[28] M.-P. Schützenberger. Finite counting automata. Inform. and Control, 5: 91–107, 1962.
[29] E. Seneta. Non-negative matrices and Markov chains. Springer–Verlag, New York Heidelberg
Berlin, 1981.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Berstel</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Reutenauer</surname>
          </string-name>
          .
          <source>Rational series and their languages</source>
          , Springer-Verlag, New York - Heidelberg - Berlin,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bertoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chofrut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldwurm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lonati</surname>
          </string-name>
          .
          <article-title>On the number of occurrences of a symbol in words of regular languages</article-title>
          .
          <source>Theoret. Comput. Sci.</source>
          ,
          <volume>302</volume>
          :
          <fpage>431</fpage>
          -
          <lpage>456</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bertoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chofrut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldwurm</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Lonati</surname>
          </string-name>
          .
          <article-title>Local limit properties for pattern statistics and rational models</article-title>
          .
          <source>Theory Comput. Systems</source>
          ,
          <volume>39</volume>
          :
          <fpage>209</fpage>
          -
          <lpage>235</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bertoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Massazza</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Radicioni</surname>
          </string-name>
          .
          <article-title>Random generation of words in regular languages with fixed occurrences of symbols</article-title>
          .
          <source>Proc. WORDS'03</source>
          ,
          <string-name>
            <given-names>Fourth</given-names>
            <surname>Intern</surname>
          </string-name>
          .
          <source>Conf. on Combinatorics of Words</source>
          ,
          <source>Turku (Finland)</source>
          ,
          <fpage>332</fpage>
          -
          <lpage>343</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bourdon</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Régnier</surname>
          </string-name>
          .
          <article-title>Large deviation properties for patterns</article-title>
          .
          <source>J. of Discrete Algorithms</source>
          ,
          <volume>24</volume>
          :
          <fpage>2</fpage>
          -
          <lpage>11</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Broda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Machiavelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Moreira</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Reis</surname>
          </string-name>
          .
          <article-title>A hitchhiker's guide to descriptional complexity through analytic combinatorics</article-title>
          .
          <source>Theoret. Comput. Sci.</source>
          ,
          <volume>528</volume>
          :
          <fpage>85</fpage>
          -
          <lpage>100</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Broutin</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Devroye</surname>
          </string-name>
          .
          <article-title>Large deviations for the weighted height of an extended class of trees</article-title>
          .
          <source>Algorithmica</source>
          ,
          <volume>46</volume>
          :
          <fpage>271</fpage>
          -
          <lpage>297</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chofrut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldwurm</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Lonati</surname>
          </string-name>
          .
          <article-title>On the maximum coeficients of rational formal</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>