1. Introduction

On the Pseudorandomness of Parry-Bertrand Automatic Sequences

Pierre Popoli

Manon Stipulanti

0 0 Departement of Mathematics , ULiège, Liège , Belgium

The correlation measure is a testimony of the pseudorandomness of a sequence s and provides information about the independence of some parts of s and their shifts. Combined with the well-distribution measure, a sequence possesses good pseudorandomness properties if both measures are relatively small. In combinatorics on words, the famous -automatic sequences are quite far from being pseudorandom, as they have small factor complexity on the one hand and large well-distribution and correlation measures on the other. This paper investigates the pseudorandomness of a specific family of morphic sequences, including classical -automatic sequences. In particular, we show that such sequences have large evenorder correlation measures; hence, they are not pseudorandom. We also show that even- and odd-order correlation measures behave diferently when considering some simple morphic sequences.

eol>Combinatorics on words Correlation measures Abstract numeration systems Morphic sequences Automatic sequences

1. Introduction

Pseudorandom sequences are generated by algorithms and behave similarly to truly random sequences. To study the pseudorandomness of a sequence, a wide variety of measures have been introduced, called measures of complexity or measures of pseudorandomness. For instance, from a purely theoretical point of view, the famous Kolmogorov complexity of a sequence gives the length of the shortest program that generates it [1]; see also [2, 3]. An infinite sequence is said to be algorithmically random if its Kolmogorov complexity grows like the length of the sequence itself. It is well known that Kolmogorov complexity is not a computable function; for example, see [4] for a formal proof and history behind.

As a consequence, many other measures of complexity can be used, especially for practical issues. The series of papers [5, 6], initiated by Mauduit and Sárközy in 1997, introduces and studies several such measures. Among the most important ones, the well-distribution measure and the correlation measure are considered in various contexts. Let s = (s())≥ 0 be a sequence over the alphabet {0, 1}. For ∈ Z and , ∈ N, write (s, , , ) = ∑︀=− 0 1(− 1)s(+) and for ∈ N≥ 1, = (1, . . . , ) ∈ N such that 0 ≤ 1 < 2 < · · · < , write − 1 (s, , ) = ∑︁ (− 1)s(+1)+s(+2)· +s(+).

=0 The Nth well-distribution measure of the sequence s is defined by (s, ) = m,a,x | (s, , , )| = max ,, ⃒⃒⃒ =0 ⃒⃒⃒ ∑−︁1(− 1)s(+)⃒⃒⃒⃒ , ⃒⃒ where the maximum is taken over all , , ∈ N such that 0 ≤ ≤ + ( − 1) < , and the th correlation measure of order of s is given by ⃒⃒ ∑−︁1(− 1)s(+1)+s(+2)· +s(+)⃒⃒⃒ , (s, ) = ma,x | (s, , )| = ma,x ⃒⃒⃒ =0 ⃒⃒ where the maximum is taken over all vectors = (1, . . . , ) ∈ N and all integers such that 0 ≤ 1 < 2 < · · · < and + ≤ .

It is said that the sequence s possesses good properties of pseudorandomness if both its measures (s, ) and (s, ) are relatively small compared to . Indeed, Alon et al. [7], improving a former result of Cassaigne, Mauduit, and Sárközy [8], prove that the expected order of (s, ) and (s, ) when s is a truly random binary sequence is √ (log )( 1 ). Furthermore, it is crucial to simultaneously consider the two measures, since the smallness of one does not necessarily imply that of the other; see [5] for more details. However, it is suficient to show that the correlation measure of a sequence is large to tell it apart from a pseudorandom sequence. Notice that the combination of the well-distribution and the correlation measures gives birth to the well-distribution-correlation measure, which is however more complicated to study.

Usually, the methods to analyze correlation measures are borrowed from number theory; see [9, 10] and the references therein for general results. However, in this paper, we study a large family of sequences through the lens of their correlation measures by mixing tools from automata theory and general numeration systems. Such a family is that of automatic sequences; see Section 2 for precise definitions. In combinatorics on words, they are classical and have many beautiful properties. Among others, they are known to be bad pseudorandom sequences as explained hereafter. First, it is well known that their factor complexity, which counts the number of diferent blocks of each length appearing in the whole sequence, is bounded by a linear function and is, therefore, small. On the other hand, over an alphabet of size ℓ, the th term of the factor complexity of a random sequence is of order ℓ; see [11, Chapter 10] for more details. Second, Mérai and Winterhof [12] prove that the order-2 correlation of an automatic sequence is large, as stated below.

Theorem 1 ([12, Theorem 7]). Let ≥ 2 be an integer and let s be a -automatic binary sequence generated by a DFAO with states. Then 2(s, ) ≥ (+1) for all ≥ ( + 1).

The focus on correlation measures of automatic sequences dates back to the previous century. In the late 1920s, Mahler [13] analyzes the sequence ((− 1)s2())≥ 0, where s2 : N → N is the sum-of-digit function in base 2. In particular, he considers, for any non-negative integer , the quantity () = 1 (s2, , ) = 1 ∑−︁1(− 1)s2()+s2(+) =0 where = (0, ), and shows that the sequence ( ())≥ 0 converges to some real number (). It turns out that this quantity is related to the order-2 correlation measure of the 2automatic Thue–Morse sequence t = (t())≥ 0 as it is defined by t() = s2() mod 2 for all ≥ 0. We also refer to [14] for more historical background on t and specific values of the quantity (). Later on, Mauduit and Sárközy [6] prove that 2(t, ) ≥ 112 for ≥ 5. However, when Theorem 1 is applied to t, the lower bound is improved as 2(t, ) ≥ 16 for ≥ 6; see [12] for more details. In a similar context, Baake and Coons [15] recently study the correlations of t with specific values of the shifted vector , as well as their means with renormalization techniques. Mazáč [16] carries out an analogous approach for the correlations of the Rudin–Shapiro sequence, another distinguished automatic sequence. We also mention the work of Grant, Shallit, and Stoll [17] (and papers following this trend), where correlations over alphabets of size more than 2 are considered.

It is known that automatic sequences are morphic; see [11, Chapter 6] and Section 2 for precise definitions. In this more general framework, the factor complexity is bounded by a quadratic function, which, in turn, again shows that they are far from being pseudorandom; see [11, Chapter 10]. Consequently, our study is driven by the natural question to know whether the order-2 correlation measure of any morphic sequence is large. As a first step towards answering this question, we presently consider a subfamily of morphic sequences based on Parry–Bertrand numeration systems (see Section 2 for precise definitions). In the theory of numeration systems, they have gained the attention of the scientific community for the nice framework they provide and have especially been used in [18, 19, 20, 21, 22].

Our main result is Theorem 2 below (see again Section 2 for precise definitions). It improves Theorem 1 in at least two ways. First, we consider a much larger class of sequences as classical automatic sequences are -automatic for some integer . Second, we deal with all even-order correlations, whereas only the case of order 2 is considered in [12]. Notice that we also discuss the case of odd-order correlations for which such a theorem fails to hold; see Section 3.1. Note that, for two maps , such that takes only non-negative values, we write ≫ (resp., ≪ ) if there exists a positive constant such that | ()| ≥ () (resp., | ()| ≤ ()) for all suficiently large .

Theorem 2. Let ≥ 1 be an integer, ∈ R>1 be a Parry number and be the corresponding canonical Parry–Bertrand numeration. For all binary -automatic sequences s, we have 2(s, ) ≫ , where the constant depends on , , and the size of the automaton generating s.

The present paper is organized as follows. In Section 2, we recall the necessary background on combinatorics on words, automatic sequences, and numeration systems. Section 3 is divided into two parts. In Section 3.1, we deal with the case of linearly recurrent sequences. In particular, we highlight the fact that even- and odd-order correlation measures behave diferently, which will imply that there is no hope of extending Theorem 2 to odd-order correlations. In Section 3.2, we show with Theorem 2 that our specific morphic sequences have a large correlation measure of even orders. Finally, Section 4 presents possible future work.

2. Background 2.1. On Combinatorics on Words

As a general reference on words, we point out [23]. An alphabet is a finite set of elements called letters. A word on an alphabet is a sequence of letters from . It is either finite or infinite . In order to distinguish them, infinite words are written in bold. The length of a finite word, denoted between vertical bars, is the number of letters it is made of. The empty word is the only 0-length word. For all ≥ 0, we let denote the set of all length- words over . We let * denote the set of finite words over , including the empty word, and or N that of infinite words over . The first is equipped with the concatenation of words. Let and be finite alphabets. A morphism : * → * is a map satisfying () = () () for all , ∈ * . In particular, () = . For an integer ℓ ≥ 2, a morphism is ℓ-uniform if it maps each letter to a length-ℓ word. A 1-uniform morphism is called a coding. An infinite word x is morphic if there exist a morphism : * → * , a coding : * → * , and a letter ∈ such that x = ( ()), where () = lim→∞ ().

2.2. On Generalized Automatic Sequences

Abstract numeration systems were introduced at the beginning of the century by Lecomte and Rigo [24]; see also [25, Chapter 3] for a general presentation. Such a numeration system is defined by a triple = (, , <) where is an alphabet ordered by the total order < and is an infinite regular language over , i.e., accepted by a deterministic finite automaton. We say that is the numeration language of . When we genealogically (i.e., first by length, then using the dictionary order) order the words of , we obtain a one-to-one correspondence rep between N and . Then, the -representation of the non-negative integer is the ( + 1)st word of , and the inverse map, called the (e)valuation map, is denoted by val . Example 1. Consider the abstract numeration system built on the language * * over the ordered alphabet {, } with < . The first few words in the language are , , , , , , . For instance, rep ( 5 ) = and val () = 6.

Let = (, , <) be an abstract numeration system. An infinite word x is -automatic if there exists a deterministic finite automaton with output (DFAO) such that, for all ≥ 0, the th term x() of x is given by the output (rep ()) of . See [11, 26, 27] for general references.

2.3. Representation of Real Numbers

We now recall several definitions and results about representations of real numbers; see, for instance, [25, 28]. Let ∈ R>1 and let = {0, 1, . . . , ⌈ ⌉ − 1}. Every real number ∈ [0, 1) can be written as a series = ∑︀+=∞1 − , where ∈ for all ≥ 1. We let () denote the -expansion of obtained greedily. In an analogous way, the -expansion ( 1 ) of 1 is ( 1 ) = lim→1 (). We define the quasi-greedy -expansion * ( 1 ) of 1 as follows. Write ( 1 ) = (())≥ 1. If ( 1 ) = ( 1 ) · · · ()0 with () ̸= 0, then * ( 1 ) = (( 1 ) · · · ( − 1)(() − 1)); otherwise * ( 1 ) = ( 1 ). We define the -shift as the topological closure of the set = { () | ∈ [0, 1)}. A real number > 1 is a Parry number if ( 1 ) is ultimately periodic.

Example 2. Any integer > 1 is a Parry number. The golden ratio is a Parry number for which ( 1 ) = 11 and * ( 1 ) = (10) (since 1 = 1/ + 1/ 2). The square 2 of the golden ratio is also a Parry number with 2 ( 1 ) = 21 = * 2 ( 1 ).

Parry’s theorem [29] notably provides a purely combinatorial condition to check if an infinite sequence is in by comparing this sequence to * ( 1 ) using the lexicographic order. Proposition 1 ([28, Corollaries 2.71 and 2.72]). Let ∈ R>1 be a Parry number. Write ( 1 ) = ( 1 ) · · · ()(( + 1) · · · ( + )) (where , ≥ 0 are taken to be minimal). Then an infinite word belongs to if and only if it is the label of a path in the automaton = (, 0, , , ), where = {0, . . . , +− 1} and where the transition function is defined as (− 1, ) = 0, for all 1 ≤ < + and all 0 ≤ < (); (− 1, ()) = , for all 1 ≤ < + ; (+− 1, ( + )) = .

Every Parry number is canonically associated with a linear positional numeration system. We now expose this framework. Let = ( ())≥ 0 be an increasing sequence of integers with (0) = 1. Any integer can be decomposed uniquely in a greedy way as = ∑︀ =0 () with non-negative integer coeficients . The finite word rep () = · · · 0 is called the (greedy) -representation of . By convention, the representation of 0 is the empty word . The ifniteness of the digit-set for greedy representations is implied by the boundedness of the sequence sup≥ 0 ( + 1)/ (). A sequence satisfying all the above conditions defines a positional numeration systems. The numeration language is the set = {rep () | ≥ 0}. For any · · · 0 ∈ N* , we let val ( · · · 0) denote the integer ∑︀ =0 (). In addition, a positional numeration system is linear if satisfies a linear recurrence relation. Example 3. The Fibonacci sequence = ( ())≥ 0 with initial conditions (0) = 1 and ( 1 ) = 2 and ( + 2) = ( + 1) + () for all ≥ 0 gives rise to the famous Zeckendorf or Fibonacci linear positional numeration system [30]. The numeration language is = {} ∪ 1{0, 01}* .

Let ∈ R>1 be a Parry number. We define a particular linear positional numeration system = ( ())≥ 0 associated with as follows. Write ( 1 ) = ( 1 ) · · · ()(( + 1) · · · ( + )) (, are minimal and might be zero). We define (0) = 1, () = ( 1 ) ( − 1) + · · · + () (0) + 1 for all ∈ {1, . . . , + − 1} and () = ( 1 ) ( − 1) + · · · + ( + ) ( − − ) + ( − )

− ( 1 ) ( − − 1) − · · · − () ( − − ) for all ≥ + . We set = for the sake of readability.

Example 4. We resume Example 2. For the golden ratio, we obtain = 2, so (0) = 1, ( 1 ) = 1 · (0) + 1 = 2 and () = 1 · ( − 1) + 1 · ( − 2) for all ≥ 2. We find back the Zeckendorf numeration system of Example 3. For its square, we have = 1 = , so 2 (0) = 1, 2 ( 1 ) = 2 · 2 (0) + 1 = 3 and 2 () = 3 · 2 ( − 1) − 1 · 2 ( − 2) for all ≥ 2. Its first few elements are thus 1, 3, 8, 21, 55, the even-indexed Fibonacci numbers.

The numeration system has two useful properties. First, it possesses the Bertrand property, i.e., for all ∈ + , ∈ ⇔ 0 ∈ ; see [31] for the original version and [18] for the recently corrected one. In that case, any word in the set 0* of all -representations with leading zeroes is the label of a path in the automaton from Proposition 1 [18, Theorem 2]. Second, it satisfies the dominant root condition, i.e., there is some complex number such that () lim →+∞ = ; ( 1 ) see, for instance, [18, Theorem 2]. In the following, we will consider Parry–Bertrand automatic sequences, which are defined as -automatic for the abstract numeration system built on the language of for some Parry number . For the sake of conciseness, we call them automatic. Notice that when is an integer , the corresponding sequences are usually called -automatic.

Example 5. In the Zeckendorf numeration system from Example 3, if we write rep () = ℓ · · · 0 for each non-negative integer , then the sum-of-digit function s : N → N is defined by s () = ∑︀ℓ

=0 . When modded out by 2, it is -automatic (or Fibonacci-automatic) as the DFAO in Fig. 1 produces it. Note that it is [32, A095076].

3. Main Results

As stated in the introduction, the main purpose of this paper resides in Theorem 2, which extends Theorem 1 to a larger family of morphic sequences by considering all even-order correlations. Such a result will also imply that this family of morphic sequences cannot be qualified as pseudorandom, such as classical automatic sequences, from the point of view of correlations.

3.1. Linearly Recurrent Sequences

Before developing the arguments for Theorem 2, we first deal with so-called linearly recurrent sequences. A sequence s is recurrent if any factor of s occurs infinitely often in s. It is moreover uniformly recurrent if the distance between two consecutive occurrences of any factor in s is bounded by a constant that depends only on . For instance, the Thue-Morse sequence is uniformly recurrent. We say that s is linearly recurrent if the gap between two consecutive occurrences of any length- factor in s is bounded by · , where is a constant. See [33] for more examples.

Theorem 3. Let s be a linearly recurrent sequence. For all ≥ 1, we have 2(s, ) ≫ .

Proof. Let > 0 be such that two consecutive occurrences of any factor of s being at pos=itio(n0s),(1)sa.t.i.sfie(s⌊ |+ 1− ⌋ −|<1)o|f|s..LeBtyas≥ sum1bpetiosunfic,itehnetrlye leaxrigsetsan d≤ cons⌊ide+r1t⌋hesupcrhefixthat () = ( + ) for all ∈ {0, . . . , ⌊ + 1 ⌋ − 1} and + ⌊ + 1 ⌋ − 1 ≤ . Therefore, for = (0, ) and = ⌊ +1 ⌋, we have ∑︀=− 01(− 1)()+(+) = ∑︀− 1

=0 (− 1)2() = . This proves that 2(s, ) ≫ .

The case of even-order of correlations can be treated similarly by considering 2 consecutive occurrences of the prefix = (0)( 1 ) . . . ( 21 ⌊ + 1 ⌋ − 1), ≤ 2 ⌊ + 1 ⌋ such that () = ( + ) for all ≥ 0 and ∈ {0, . . . , ⌊ +1 ⌋ − 1}, and = (0, , . . . , (2 − 1)). Remark 1. Surprisingly, the case of odd-order correlations cannot be treated similarly. To highlight this, we consider the case of periodic sequences, defined as the infinite repetition = · · · of a (finite) pattern and which are automatic in any numeration system. More specifically, consider the simple periodic sequence s = (01). For all ≥ 1 and all ≥ 1, we have 2+1(s, ) = 1 since, for = (1, . . . , ), | (s, , )| equals 1 if is odd, 0 otherwise. Actually, this simple observation explains why the statement of our main result does not hold for odd-order correlations.

Note that there exist automatic sequences that are not uniformly recurrent (and therefore not linearly recurrent). For instance, consider the 3-automatic sequence ca = · · · defined as the fixed point of the uniform morphism ↦→ , ↦→ (sometimes referred to as the Cantor sequence; see, for instance, [34, Section 2.4.6]) and for which consecutive occurrences of the factor are separated by larger and larger gaps. Therefore, Theorem 2 covers a larger class of sequences than Theorem 3.

3.2. Proof of Theorem 2

Now we attack the proof of Theorem 2 for which we need some additional preparation. First, we count the number of words of each length in the numeration language with leading zeroes, since this language will appear later on.

Lemma 1. Let ∈ R>1 be a Parry number and consider the canonical Parry–Bertrand numeration, whose numeration language is . For all ≥ 0, the number of length- words in 0* is ().

Proof. First, for all ≥ 1, the length- words in correspond to the integers in [ ( − 1), ()), so there are () − ( − 1) of them. The number of length- words in 0* is thus equal to ︃( − 1 ∑︁ () − ( − 1) =1 Therefore, the lemma is proved. )︃ + 1 = ().

Since automata producing Parry–Bertrand automatic sequences recognize the language , we build up one from that of Proposition 1.

Lemma 2. Let ∈ R>1 be a Parry number. Then there exists a deterministic finite automaton recognizing the language .

Proof. Let = (, 0, , , ) be the DFA recognizing the language 0* from Proposition 1 with state-set = {0, . . . , +− 1}. The automaton ′ = (′, ′0, , ′, ′) defined as follows recognizes . We set ′ = {′0} ∪ where ′0 is a new state. We replace the initial state 0 of by ′0. The new transition function ′ is defined as ′(′0, ) = (0, ) for all non-zero letter , and ′(, ) = (, ) for all ∈ {0, . . . , + − 1} and all letter .

It is well known due to Cobham that -automatic sequences are characterized by -uniform morphisms; see [35] or [11, Theorem 6.3.2]. We have the following generalization for automatic sequences.

Theorem 4 ([26]). A word is morphic if and only if it is -automatic for some abstract numeration system .

The proof of the above theorem is constructive: given the morphisms producing the word s, one can build an abstract numeration system and a DFAO generating s, and vice versa. As our proof of Theorem 2 relies on this construction, we develop it now.

Fix a Parry number ∈ R>1 and consider the canonical Parry–Bertrand numeration . For the sake of conciseness, let be the numeration language and let denote the (minimal) alphabet over which is defined. Order the letters of as 0 < 1 < · · · < . Consider also a binary -automatic sequence s. Let = (, 0, , , ) be the DFA accepting as in Lemma 2 and let ℬ = (, 0, , ℬ, : → {0, 1}) be a DFAO generating the sequence s.

We explicitly give a pair of morphisms producing s. Define the Cartesian product automaton = × ℬ , which imitates the behavior of on the first component and ℬ on the second. Its set of states is × , the initial state is (0, 0), and the alphabet is . Its transition function Δ : ( × ) × * → × is defined by Δ((, ), ) = ( (, ), ℬ(, )) for all ∈ , ∈ and ∈ * . In particular, Δ((, ), ) ∈ × if and only if belongs to . Furthermore, if Δ((0, 0), rep ()) = (, ), then s() = (). Now let be a symbol not belonging to × . Define the morphism : ( × ∪ { })* → ( × ∪ { })* by ( ) = (0, 0) and, for all ∈ and ∈ , ((, )) = Δ((, ), 0) · · ·

Δ((, ), ) = ∏︁ Δ((, ), ), =0 where, if Δ((, ), ) is not defined for some , then it is replaced by (note that the product has to be understood as the concatenation). Observe that is not necessarily uniform.

Order the words in genealogically as 0 < 1 < · · · . Define u = ( ). Then, by [28, Lemma 2.25], the shifted sequence of u is the sequence of states reached in by the words in ordered genealogically, i.e., for all ∈ N, the ( + 1)st letter of u is equal to Δ((0, 0), ). Now define the morphism : ( × ∪ { })* → {0, 1}* by ( ) = and, for all ∈ and ∈ , ((, )) = (). Note that is non-erasing on × . By construction, we ihnavEexasm=pl e 6 (buelo)w=. ( ( )). To help the reader with the construction, we illustrate it

We now give the proof of Theorem 2. In particular, one side feature of the proof is a characterization of the distance between the repeating part in the sequence under study. Proof of Theorem 2. Let denote the numeration language and let be the (minimal) alphabet over which is defined. Let = (, 0, , , ) be the DFA accepting as in Lemma 2, ℬ = (, 0, , ℬ, : → {0, 1}) be a DFAO generating the sequence s, = × ℬ be the Cartesian product automaton, with transition function Δ : ( × ) × * → × , and be a symbol not belonging to × . Consider the morphisms : ( × ∪ { })* → ( × ∪ { })* and : ( × ∪ { })* → {0, 1}* as defined above and write u = ( ).

Let , ≥ 1 where is chosen later accordingly to . Consider two distinct words , such that Δ((0, 0), ) = Δ((0, 0), ). Then, for all length- words ∈ 0* , the states Δ((0, 0), ) and Δ((0, 0), ) are equal. The word (Δ((0, 0), )) = (Δ((0, 0), )) is made of the sequence of such states. Indeed, for ∈ {, }, we have (Δ((0, 0), )) =

Δ((0, 0), ).

∏︁ ∈0* ||= Applying the morphism then gives back the original sequence s. As is non-erasing on × , | (Δ((0, 0), ))| = ( ) by Lemma 1.

Order 2. Firstly, we study the case of the order-2 correlation, for the sake of simplicity and since the case of even-order correlations will follow the same lines. Consider two distinct words , ∈ such that Δ((0, 0), ) = Δ((0, 0), ). By the pigeonhole principle, we may choose , among the first || · | | words of . Without loss of generality, we may assume that is lexicographically smaller than and that is non-empty (recall Lemma 2). Then in the sequence s we have two identical blocks of length ( ) starting at positions , respectively associated with the word 0 and 0 in , i.e., = val (0 ) for ∈ {, }. The situation is depicted in Fig. 2. Therefore, we have s( + ) = s( + ) ( 2 ) for all 0 ≤ < ( ). We now make several observations. First, notice that the largest index in Eq. ( 2 ) is smaller than + ( ) by assumption on , . Then, note that, for ∈ {, }, we have = val (0 ) and val () ≤ | | · | | by choice of .

Let ≥ 1 be suficiently large. Choose ≥ 1 such that + ( ) ≤ < + ( +1) and set = (, ). Therefore, using Eq. ( 2 ), we have (− 1)s(+)+s(+) = ( ).

On the other hand, we have ( ) = Θ( ) by Eq. ( 1 ), which yields

+ ( + 1) ≤ ⌈ ⌉+1 · | | · | | + ( + 1) ≪ +1 for some non-zero constant depending on and the sequence s. Then Eqs. ( 3 ) and ( 4 ) lead to ( 3 ) ( 4 ) 2(s, ) ≥ ( ) ≫ ≫ + ( + 1) ≫ by choice of . This concludes the case of the order-2 correlation.

u u s

( )

Even orders. The general case of even-order correlations can be treated similarly. Indeed, let ≥ 1. Consider 2 pairwise distinct words 1, . . . , 2 ∈ such that Δ((0, 0), ) = Δ((0, 0), ) for all , ∈ {1, . . . , 2}. By the pigeonhole principle, we may choose them among the first (2 − 1) · | | · | | words of . Without loss of generality, we may assume that they are increasingly nested lexicographically speaking (and non-empty). Then in the sequence s we have 2 identical blocks of length ( ) starting at positions = val (0 ) for ∈ {1, . . . , 2}. Therefore, we have s( + ) = s( + ) for all 1 ≤ ≤ ≤ 2 and 0 ≤ < ( ). We now finish up the proof by following the same argument as in the previous case.

4. Discussion and Future Work

We can actually obtain a lower bound on correlations using the factor complexity, i.e., the map counting the number of diferent factors of each length within a given sequence. Consider a sequence s that is not ultimately periodic. Let s : N → N denote its factor complexity. Let , be integers with ≥ 1. Then, using Morse and Hedlund’s famous result [36], one can show that, within the prefix of length s( ) of s, there exists a length- factor that occurs at least times. Therefore, we obtain the lower bound 2(s, 2s( )) ≥ . In particular, for Parry–Bertrand automatic sequences, [20, Theorem 3.4] allows to obtain the behavior emphasized in Theorem 2.

However, the aforementioned lower bound does not reflect the expected behavior of when the studied sequence is random, or in some other simple cases. For instance, for the binary Champernowne sequence ch = 011011100101110111 · · · , we have 2(ch, ) > 418 (see [6, Theorem 1]), whereas ch( ) = 2 for all suficiently large .

Furthermore, Theorem 2 extends to numeration systems that are not Parry-Bertrand. In fact, scrutinizing its proof, we need a good control on the proportion of letters erased under the morphism . Indeed, the proportion of non-erased letters is precisely the length of the block that is represented in Fig. 2. So, in turn, a good knowledge on the growth rate of the numeration numeration language, i.e., the number of words of each length within the language, is required. In the next example, using the method presented in the proof of Theorem 2, we exhibit an automatic sequence having a quadratic factor complexity but a linear order-2 correlation. Example 6. Consider the linear positional numeration system = ( ())≥ 0 defined by (0) = 1 and ( + 1) = 3 () + 1 for all ≥ 0. It has several useful properties: its numeration language is given by the DFA in Fig. 3 and is Bertrand but not Parry; see [20, Example 2 and Lemma 2.5], as well as [37, page 131]. We also have () = Θ( ) where = 12 (3 + √13) ≈ 3.30278. Consider the fixed point x of the morphism ↦→ , ↦→ and the coding : ↦→ 0, ↦→ 1. The sequence (x) is produced by the DFAO in Fig. 3. It is binary and -automatic. In addition, its factor complexity is known to be quadratic (see [20, Theorem 3.3]).

Using the notation of Section 3.2, the morphism is defined by ↦→ , ↦→ , ↦→ , ↦→ . Note that it is not uniform. We tgoet t he (seq)ue=nce ,,,,,,,,. ..ofsta t·· e·s ,rewachhoesde bshyifttheindweoerddscoorfrespond=s {, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23, 30, 100, 101, 102, . . .} in the product automaton of Fig. 3. The morphism is defined by ↦→ , , ↦→ 0, and ↦→ 1. Applying to ( ) gives the sequence (x). To illustrate the method of the proof of Theorem 2, we can consider = 10 and = 100 for which (, ) = = (, ). Now we also conclude that 2( (x), ) ≫ .

As a consequence, we raise the following more general and natural question: Is any morphic sequence not pseudorandom as far as the correlation measures is concerned, i.e., is the correlation of any morphic sequence large or small? In the near future, we hope to revisit the method of the proof of Theorem 2 to extend it to a larger class of morphic sequences. In particular, as a ifrst step, we will need to find the appropriate conditions to describe this particular class. On the other hand, as shown in [12], a lower bound on the state complexity of binary sequences in terms of the order-2 correlation measure can be straightforwardly derived from Theorem 1. In our setting, can a refinement of the proof of Theorem 2, involving only the number of states of some automata, lead to similar bounds?

Acknowledgments

We thank the reviewers for providing useful remarks that improved our initial work, especially for thinking of the alternative proof mentioned at the beginning of Section 4, which encourages us to look for extensions of our Theorem 2 to a broader family of sequences. We also thank France Gheeraert and Michel Rigo for insightful discussions.

Pierre Popoli’s research is supported by ULiège’s Special Funds for Research, IPD-STEMA Program. Manon Stipulanti is an FNRS Research Associate supported by the Research grant 1.C.104.24F. [9] K. Gyarmati, On a pseudorandom property of binary sequences, The Ramanujan Journal 8 (2004) 289–302. doi:10.1007/s11139-004-0139-z. [10] K. Gyarmati, C. Mauduit, On the correlation of binary sequences, ii, Discrete Mathematics 312 (2012) 811–818. doi:https://doi.org/10.1016/j.disc.2011.09.013. [11] J.-P. Allouche, J. Shallit, Automatic sequences: Theory, applications, generalizations,

Cambridge University Press, Cambridge, 2003. [12] L. Mérai, A. Winterhof, On the pseudorandomness of automatic sequences, Cryptogr.

Commun. 10 (2018) 1013–1022. doi:10.1007/s12095-017-0260-7. [13] K. Mahler, The spectrum of an array and its application to the study of the translation properties of a simple class of arithmetical functions: Part two on the translation properties of a simple class of arithmetical functions, Journal of Mathematics and Physics 6 (1927) 158–163. doi:https://doi.org/10.1002/sapm192761158. [14] C. Mauduit, Multiplicative properties of the Thue-Morse sequence, Period. Math. Hungar.

43 (2001) 137–153. doi:10.1023/A:1015241900975. [15] M. Baake, M. Coons, Correlations of the Thue—Morse sequence, Indagationes Mathematicae (2023). doi:https://doi.org/10.1016/j.indag.2023.02.001. [16] J. Mazáč, Correlation functions of the Rudin–Shapiro sequence, Indagationes Mathematicae (2023). doi:https://doi.org/10.1016/j.indag.2023.03.003. [17] E. Grant, J. Shallit, T. Stoll, Bounds for the discrete correlation of infinite sequences on symbols and generalized Rudin-Shapiro sequences, Acta Arith. 140 (2009) 345–368. doi:10.4064/aa140-4-5. [18] E. Charlier, C. Cisternino, M. Stipulanti, Robustness of Pisot-regular sequences, Adv. in

Appl. Math. 125 (2021) Paper No. 102151, 22. doi:10.1016/j.aam.2020.102151. [19] E. Charlier, C. Cisternino, M. Stipulanti, A full characterization of Bertrand numeration systems, in: Developments in language theory, volume 13257 of Lecture Notes in Comput.

Sci., Springer, Cham, 2022, pp. 102–114. doi:10.1007/978-3-031-05578-2\_8. [20] A. Massuir, J. Peltomäki, M. Rigo, Automatic sequences based on Parry or Bertrand numeration systems, Adv. in Appl. Math. 108 (2019) 11–30. doi:10.1016/j.aam.2019. 03.003. [21] F. Point, On decidable extensions of Presburger arithmetic: from A. Bertrand numeration systems to Pisot numbers, J. Symbolic Logic 65 (2000) 1347–1374. doi:10.2307/2586704. [22] M. Stipulanti, Convergence of Pascal-like triangles in Parry-Bertrand numeration systems,

Theoret. Comput. Sci. 758 (2019) 42–60. doi:10.1016/j.tcs.2018.08.003. [23] M. Lothaire, Combinatorics on words, Cambridge Mathematical Library, Cambridge University Press, Cambridge, 1997. doi:10.1017/CBO9780511566097. [24] P. B. A. Lecomte, M. Rigo, Numeration systems on a regular language, Theory Comput.

Syst. 34 (2001) 27–44. doi:10.1007/s002240010014. [25] V. Berthé, M. Rigo (Eds.), Combinatorics, Automata, and Number Theory, volume 135 of Encyclopedia of Mathematics and its Applications, Cambridge University Press, 2010. doi:10.1017/CBO9780511777653. [26] M. Rigo, A. Maes, More on generalized automatic sequences, Journal of Automata,

Languages, and Combinatorics 7 (2002) 351–376. doi:10.25596/jalc-2002-351. [27] J. Shallit, A generalization of automatic sequences, Theoret. Comput. Sci. 61 (1988) 1–16.

doi:10.1016/0304-3975(88)90103-X. [28] M. Rigo, Formal languages, automata and numeration systems. 2, Networks and Telecommunications Series, ISTE, London; John Wiley & Sons, Inc., Hoboken, NJ, 2014. Applications to recognizability and decidability. [29] W. Parry, On the -expansions of real numbers, Acta Math. Acad. Sci. Hungar. 11 (1960) 401–416. doi:10.1007/BF02020954. [30] E. Zeckendorf, Représentation des nombres naturels par une somme de nombres de

Fibonacci ou de nombres de Lucas, Bull. Soc. Roy. Sci. Liège 41 (1972) 179–182. [31] A. Bertrand-Mathis, Comment écrire les nombres entiers dans une base qui n’est pas entière, Acta Math. Hungar. 54 (1989) 237–241. doi:10.1007/BF01952053. [32] N. J. A. Sloane, et al., The On-Line Encyclopedia of Integer Sequences, 2024. URL: https: //oeis.org. [33] J.-P. Allouche, J. Cassaigne, J. Shallit, L. Q. Zamboni, A taxonomy of morphic sequences, 2019. Preprint available at https://arxiv.org/pdf/1711.10807. [34] J. Shallit, The logical approach to automatic sequences—exploring combinatorics on words with Walnut, volume 482 of London Mathematical Society Lecture Note Series, Cambridge University Press, Cambridge, 2023. [35] A. Cobham, Uniform tag seqences, Mathematical Systems Theory 6 (1972) 164–192.

doi:10.1007/BF01706087. [36] M. Morse, G. A. Hedlund, Symbolic Dynamics, Amer. J. Math. 60 (1938) 815–866. [37] M. I. Hollander, Greedy numeration systems and regularity, Theory Comput. Syst. 31 (1998) 111–133. doi:10.1007/s002240000082.

[1]

A. N.

Kolmogorov , Three approaches to the quantitative definition of information, Internat . J. Comput. Math. 2 ( 1968 ) 157 - 168 . doi: 10 .1080/00207166808803030.

[2]

G. J.

Chaitin , On the length of programs for computing finite binary sequences , J. Assoc. Comput. Mach . 13 ( 1966 ) 547 - 569 . doi: 10 .1145/321356.321363.

[3]

R. J.

Solomonof , A formal theory of inductive inference . I, Information and Control 7 ( 1964 ) 1 - 22 .

[4]

P. M.

Vitányi , How incomputable is kolmogorov complexity? , Entropy 22 ( 2020 ). doi: 10 . 3390/e22040408.

[5]

Mauduit ,

Sárközy , On finite pseudorandom binary sequences. I. Measure of pseudorandomness, the Legendre symbol , Acta Arith . 82 ( 1997 ) 365 - 377 . doi: 10 .4064/ aa-82-4- 365 -377.

[6]

Mauduit ,

Sárközy , On finite pseudorandom binary sequences. II. The Champernowne, Rudin-Shapiro, and Thue-Morse sequences, a further construction , J. Number Theory 73 ( 1998 ) 256 - 276 . doi: 10 .1006/jnth. 1998 . 2286 .

[7]

Alon ,

Kohayakawa ,

Mauduit ,

C. G.

Moreira ,

Rödl , Measures of pseudorandomness for finite sequences: typical values , Proc. Lond. Math. Soc. (3) 95 ( 2007 ) 778 - 812 . doi: 10 .1112/plms/pdm027.

[8]

Cassaigne ,

Mauduit ,

Sárközy , On finite pseudorandom binary sequences. VII. The measures of pseudorandomness , Acta Arith . 103 ( 2002 ) 97 - 118 . doi: 10 .4064/aa103-2-1.