A Gray code for a regular language

              Elena Barcucci                      Antonio Bernini                        Renzo Pinzani
          elena.barcucci@unifi.it              antonio.bernini@unifi.it               renzo.pinzani@unifi.it
                               Dipartimento di Matematica e Informatica “U. Dini”,
                 Università degli Studi di Firenze, Viale G.B. Morgagni 65, 50134 Firenze, Italy.


                                                        Abstract
                       Given a sequence {am }m≥0 of strictly increasing positive integers such
                       that a0 =P1, any non-negative integer N can be uniquely represented
                       by N = i≥0 di ai , where di are non negative integers. The coefficients
                       di form a string over a finite alphabet and all these strings form a
                       language having different properties depending on the sequence. We
                       investigate on the possibility of defining a Gray code over the language
                       arising from particular choices of {am }m≥0 . We consider the sequence
                       defined by a two termed linear recurrence with constant coefficients
                       having some particular properties.


1    Introduction
   Given a class of combinatorial objects, it is a common problem to list them in some specified order. A
famous kind of list is the well-known Gray code [Gra53] where two successive objects differ according to some
fixed constraints. Very often successive objects in a Gray code are required to differ as little as possible,
depending on the nature of the objects we are dealing with. For instance, if the objects are strings (of the
same length) they form a Gray code if two successive strings differ only in a few positions or, more precisely, if
their Hamming distance dH [Ham50] is bounded by some positive integer q (dH = i ≤ q). Gray codes have been
constructed for several combinatorial structures (permutations, binary strings, Motzkin and Schröder words,
derangements, involutions) and used in various technical applications such as circuit testing, signal encoding,
data compression (see [BBPSV14, BBPSV15, BBPV15, BGPP07] and references therein). Even though the most
well-known Gray codes have been studied in the context of the mentioned topics, here we only recall that
recently [BBPSV14, BBPV17] Gray codes have been considered also in the framework of cross-bifix-free sets of
strings (two strings are bifix-free if they can not be overlapped in any way, for details see for example [BPP12]).
Moreover, in [BBBP17a], where cross-bifix-free sets of matrices are considered, the authors propose a Gray code
where the matrices are listed in such a way that any two successive matrices differ in only one digit. Note that
the problem of finding a Gray code for bidimensional structures (matrices) is also considered in [BBBP17b];
nevertheless the approach used in [BBBP17a] doesn’t seem to be not useful.

   In the present paper we consider a language of strings defined in [BR01] over a particular alphabet. These
strings are the representations of integers by means of a system of numeration [Fra82] derived from certain
sequences. Given a strictly increasing sequence of non-negative integers 1 = a0 < a1 < a2 < ..., it is possible to
represent each non-negative integer N by recording the quotient dn obtained by dividing N by the largest member
an of the sequence that is less than or equal to N , then dividing the remainder rn by an−1 and recording the

Copyright c by the paper’s authors. Copying permitted for private and academic purposes.
In: L. Ferrari, M. Vamvakari (eds.): Proceedings of the GASCom 2018 Workshop, Athens, Greece, 18–20 June 2018, published at
http://ceur-ws.org


                                                             87
quotient dn−1 and so on until you get to d0 , since dividing by 1 leaves a remainder of 0. The various quotients
form a string which is an element of the language. In [BR01] sequences are defined by a two termed linear
recurrence depending on two parameters k and h, and the arising language, strictly depending on k and h, is
seen as a combinatorial interpretation of the sequence ai = kai−1 + hai−2 . In this paper we aim at providing a
Gray code for the language derived from particular conditions on k and h.
   We point out that the considered language was introduced in order to provide a general solution to a previous
issue appeared in [BSS93], where the authors asked for a combinatorial interpretation of the recurrence fm+1 =
6fm − fm−1 , with f0 = 1, f1 = 7 (sequence M4423 of [SP96]). After some interesting answers (see [BBDD98,
PP90, Sul98]), in their paper [BR01] the authors gave a general combinatorial interpretation for the recurrences
of the form am = kam−1 + ham−2 , under some conditions on h and k which include the most frequently occurring
two-termed recurrences. We also noted, in the previous paragraph, that this language is strictly linked to the
system of numeration presented in [Fra82]. This is based on a very simple iterated division algorithm which is
the main tool which produces the representation for any non-negative integer N , in terms of any sequence of
the form 1 = u0 < u1 < u2 < . . . . Clearly, the particular choice of the sequence affects the representation
of N , and in some particular cases interesting and useful representations can be obtained. In the references
within [Fra82] several application of different systems of numeration can be found, ranging from compressing
and partitioning large dictionaries, ranking permutations with repetitions, up to designing error-insensitive codes
for data transmission.

   The definition of the Gray code in the present paper could be presented independently of the definition of
numeration systems and without showing which are the links between the considered language and the sequence.
Nevertheless, in order to provide a self-contained paper and for the sake of completeness, we prefer to present
concepts and preliminaries useful for our purpose. Section 2 and Section 3 are devoted to the presentation of
some main notion about numeration systems and properties of the considered recurrence relations. In Section
4 we recall the definition of the language introduced in [BR01] and finally we define a Gray code for listing its
elements.

2    Preliminaries
   Given a sequence {am }m≥0 of positive integers such that a0 = 1 and am < am+1 for each m ∈ N, let N
be any non-negative integer. Consider the largest term an of the sequence such that an ≤ N . More precisely,
an = max{am | am ≤ N } (for the particular case N = 0, see below). We divide N by an obtaining N = dn an +rn .
Obviously, for the remainder rn , it is clear that rn < an . If we divide rn by an−1 , we get rn = dn−1 an−1 + rn−1 ,
with rn−1 < an−1 . Then, iterating this procedure until the division by a0 = 1 (where of course the remainder is
0), we have:
                            N = dn an + rn                              0 ≤ rn < an ,

                              rn   = dn−1 an−1 + rn−1                   0 ≤ rn−1 < an−1 ,

                           rn−1    = dn−2 an−2 + rn−2                   0 ≤ rn−2 < an−2 ,

                             ···   = ············                       ············

                             ···   = ············                       ············

                              r3   = d2 a2 + r2                         0 ≤ r2 < a2 ,

                              r2   = d1 a1 + r1                         0 ≤ r1 < a1 ,

                              r1   = d0 a0 .
    The above relations imply that:
                           N = dn an + dn−1 an−1 + dn−2 an−2 + . . . . . . + d1 a1 + d0 a0 .                      (1)

   Expression (1) is the representation of N in the numeration system S = {a0 , a1 , a2 , . . . . . .}, and the string
dn dn−1 . . . d1 d0 is associated to the number N (in what follows the term “representation” equivalently refers to


                                                          88
the expression (1) or to its associated string). The method presented here can be applied to every non-negative
integer and in the case N = 0, clearly, all the coefficients di are 0 (in other words the representation of 0 is
simply the string 0). Moreover, we have


                              ri = di−1 ai−1 + di−2 ai−2 + . . . . . . + d1 a1 + d0 a0 < ai ,                      (2)

for each i ≥ 0.
                                             Pn
It is possible to show [Fra82] that if N =     i≥0 di ai with


                                  di ai + di−1 ai−1 + . . . + d1 a1 + d0 a0 < ai+1                                 (3)
                                                  Pn
for each i ≥ 0, then the representation N =         i≥0 di ai is unique.   For the sake of completeness, we recall the
complete theorem:

Theorem 2.1. Let 1 = a0 < al < a2 < . . . be any finite or infinite sequence of integers. Any non-negative
                                                                                                  n
                                                                                                  X
integer N has precisely one representation in the system S = {a0 , a1 , a2 , ...} of the form N =   di ai where the
                                                                                                    i≥0
di are non-negative integers satisfying (3).

    As an example, consider the well-known sequence of Pell numbers (sequence M1413 in [SP96]) pm =
1, 2, 5, 12, 29, . . . defined by p0 = 1, p1 = 2, pm = 2pm−1 + pm−2 . The representation of N = 16 is associ-
ated to the string 1020.


3    The language and the alphabet
   The next step is the definition of the language arising from the representations of non-negative numbers.
Given m > 0, we consider all the integers ` ∈ {0, 1, 2, . . . , am − 1}. According to the scheme of the previous
section, the representations of the integers j with am−1 ≤ j < am is j = dm−1 am−1 + dm−2 am−2 + . . . + d0 a0 (so
that the associated string is dm−1 dm−2 . . . d0 ), while, following the same scheme, the remaining integers have a
representation with less than m digits. For example: the representation of am−1 − 1 = dm−2 am−2 + . . . + d0 a0
has m − 1 digits. For our purpose (the construction of a Gray code), we require that all the representations of
the considered integers ` ∈ {0, 1, 2, . . . , am − 1} have m digits, so we pad the string on the left with 0’s until we
have m digits: the representation of am−1 − 1 becomes am−1 − 1 = 0am−1 + dm−2 am−2 + . . . + d0 a0 (therefore,
the associated string is 0dm−2 . . . d0 ).

    With this little adjustment, we now define the following sets:

    L0 = {ε} ,

  Lm = {dm−1 . . . d0 | the string dm−1 . . . d0 is the representation of each ` < am in the
          numeration system {am }m≥0 }.
Finally, we denote by L the language obtained by taking the union of all the sets Lm :

  L = m≥0 Lm .
         S


We remark that each element of Lm has precisely m digits, so that some string dm−1 . . . d0 can admit a prefix
constituted by a certain number of consecutive 0’s. Moreover, each Lm contains precisely am elements (which
are the representations of each ` ∈ {0, 1, . . . , am − 1}).


                                                            89
  Referring to the sequence of Pell numbers pm = {1, 2, 5, 12, 29, . . .} defined in Section 2, we have:

         L0   =   {ε}

         L1   =   {0, 1}

         L2   =   {00, 01, 10, 11, 20}

         L3   =   {000, 001, 010, 011, 020, 100, 101, 110, 111, 120, 200, 201}

         L4   =   {0000, 0001, 0010, 0011, 0020, 0100, 0101, 0110, 0111, 0120, 0200, 0201, 1000, 1001, 1010,

                  1011, 1020, 1100, 1101, 1110, 1111, 1120, 1200, 1201, 2000, 2001, 2010, 2011, 2020}

   The strings in L2 are, respectively, the representations of the integers ` ∈ {0, 1, 2, 3, 4}, being a2 = 5. Note
that L2 contains exactly 5 = a2 elements.

   It is not difficult to realize that the alphabet of the language L strictly depends on the sequence {am }m≥0 .
                                                                                                      Pi−1
In general it is possible to set an upper bound for the digits di . From (3), we deduce di ai < ai+1 − j=0 dj aj ,
so that, since the numbers are all integers:
                                                                  i−1
                                                                  X
                                         di ai ≤ ai+1 − 1 −             dj aj < ai+1 − 1 ,
                                                                  j=0

leading to
                                                                          
                                                           ai+1 − 1
                                                      di ≤                     .                                            (4)
                                                              ai
                                                                                                                 
                                                                                                        ai+1 − 1
  Therefore, the alphabet for Lm is given by {0, 1, . . . , s} with s =                max                              ,
                                                                                   i=0,1,...,m−1           ai
and, denoting by Σ the alphabet for L , we have Σ = {0, 1, . . . , t} with
                                                                
                                                    ai+1 − 1
                                         t = max                       .
                                               i        ai

In this paper we focus our attention on sequences defined by linear recurrences of the form am = kam−1 +ham−2 ,
with suitable initial conditions and some restrictions on k and h. More precisely, we consider sequences of the
form:                                       
                                            1 if m = 0
                                            
                                      am = k if m = 1
                                            
                                              kam−1 + ham−2 if m ≥ 2
                                            

where k ∈ N+ and h ∈ Z . Using standard techniques for solving recurrences (see for example [Aig07]) it is
possible to show the the general term of the sequence is
                                               √              !m+1                           √              !m+1
                              1           k+       k 2 + 4h                   1        k−        k 2 + 4h
                    am = √                                              −√
                            2
                           k + 4h                  2                        2
                                                                           k + 4h                2

and, following [BR01], we require the condition k 2 + 4h ≥ 0 which assures that am ≥ 0 and am+1 > am for each
m ≥ 0 (which are the hypothesis of Theorem 2.1).

   Actually, here we further restrict to a more limiting case for k and h. More precisely we consider the case
k ≥ h ≥ 0, which, of course, implies k 2 + 4h ≥ 0. From [BR01] it is possible to deduce that in this case the
alphabet Σ of the language L is Σ = {0, 1, . . . , k}. The language L is the set of words w ∈ Σ∗ such that


                                                                  90
    1. w = dr dr−1 . . . d1 d0 with di ∈ Σ;
    2. if di = k, then di−1 < h, for i = 1, 2, . . . , r;
    3. d0 6= k.
In the above list, point 2. is due to the fact that, if N < ai+1 is an integer such that di = k, then, if also di−1 = h,
it is N ≥ kai + hai−1 = ai+1 , which is in contrast with N < ai+1 . Moreover, since in the representation of N it
is r1 < a1 = k, we have d0 < k.
We recall [BR01] that the following unambiguous regular grammar generates L :
     S → T 0|T 1| · · · |T (h − 1)|Sh| · · · |S(k − 1)|,
     T → T 0|T 1| · · · |T (h − 1)|Sh| · · · |S(k)|.

4      A Gray code for Ln
  We introduce some notations (as in [BBPSV14]) in order to express the language L in an alternative recursive
way.

    • If α is a symbol and L is a list of words, α · L is the list obtained by concatenating α to each string of L;
    • if i and j are symbols, then ij · L is the list obtained by concatenating i to each string of i · L (or equivalently
      ij · L = i · (j · L));
    • if L is a list of word, L̄ is the list in the reverse order;
    • if L is a list of word, (L)i is L if i is even and L̄ if i is odd;
    • if L and M are two lists, L ◦ M is their concatenation;
                                               r
    • if Lj , Lj+1 , . . . , Lj+r are lists,   `=0 Lj+` is the list Lj ◦ . . . ◦ Lj+r .

    • if L is a list of words, then f irst(L) is the first element of L and last(L) is the last element of L.
With the above notation it is not difficult to realize that L can be defined as follows:

                                
                                
                                 {ε}                                                          if n = 0
                                
                                
                                
                                
                                
                                
                                {0, 1, . . . , k − 1}                                         if n = 1
                           Ln =                                                                                       (5)
                                
                                
                                
                                0 · Ln−1 ∪ 1 · Ln−1 ∪ . . . ∪ (k − 1) · Ln−1 ∪
                                
                                
                                
                                  k0 · Ln−2 ∪ k1 · Ln−2 ∪ . . . ∪ k(h − 1) · Ln−2
                                
                                                                                               if n ≥ 2 .
                                

   Note that from the recursive construction of Ln (n ≥ 2), if an element w ∈ Ln starts with k, then k is
followed by a symbol different from h, according to the definition of the language L in the previous section. If
w starts with a symbol different from k, then it can be concatenated to any element of Ln−1 .

   We propose the following definition providing a particular placement of the elements of L which reveals itself
to be a Gray code.
Definition 4.1. Given k ∈ N, h ∈ Z, with k ≥ h ≥ 0, we define the string list Ln over the alphabet Σ =
{0, 1, 2, . . . , k}:
                               
                               
                               
                               {ε}                                                            if n = 0
                               
                               
                               
                               
                           Ln = {0, 1, . . . , k − 1}                                          if n = 1               (6)
                               
                               
                               
                               
                               
                                k−1 i · (L
                               
                                                      )k+i ◦
                                                                   h−1              k+i
                                                                                           
                                         i=0       n−1              i=0 ki · (Ln−2 )           if n ≥ 2 .


                                                                   91
    We have:

Theorem 4.2. The string list Ln is a Gray code with Hamming distance equal to one.

Proof. We proceed by induction on n. If n = 1, then L1 is trivially seen to be a Gray code. Suppose that
Li is a Gray code for i = 2, . . . , n − 1 where n ≥ 2 and consider the list Ln . The sub-lists arising from each
parenthesis in the third case of (6) are Gray codes since the lists Ln−1 and Ln−2 (which are Gray codes by the
inductive hypothesis) are read alternatively from left to right and vice versa (depending on the parity of k + 1).
                                                                                         k+i
Therefore, we have to
                   k+i
                        check only the Hamming distance between last (k − 1) · (Ln−1 )        with i = k − 1 and
f irst k0 · (Ln−2 )      with i = 0.
    In the case of k even we have:

        last (k − 1) · (Ln−1 )k+k−1 = last (k − 1) · Ln−1 = (k − 1)last(Ln−1 ) = (k − 1)f irst(Ln−1 ) ,
                                                        

and

                            f irst k0 · (Ln−2 )k+0 = f irst (k0 · Ln−2 ) = k0f irst(Ln−2 ) .
                                                  

Easily, from (6), we have
                                            f irst(Ln−1 ) = 0f irst(Ln−2 ) ,
then the only different digit between (k − 1)f irst(Ln−1 ) and k0f irst(Ln−2 ) is the first one. So Ln is a Gray
code.
   The proof in the case of k odd can be conducted with similar arguments.

5     Further developments
   In the present paper we considered the recurrence relation defined by am = kam−1 + ham−2 with k ≥ h ≥ 0,
which is one of the two cases analysed in [BR01]. The other interesting case is k ≥ −h ≥ 0 which again leads to
a strictly increasing sequence defining a language L 0 over an alphabet Σ0 slightly different from Σ, with different
properties. We think that also in this case it is possible to define a Gray code for listing the elements of L 0 ,
with Hamming distance 1.
   It could be interesting to investigate on the possibilities to characterize the language in terms of restricted
words, following the line of [BBBP17b] or [Ber17] where sets of pattern avoiding words and recurrence relations
are considered.

Acknowledgements
   This work is partially supported by the GNCS 2018 project “Proprietà combinatorie e rilevamento di pattern
in strutture discrete lineari e bidimensionali”.

References
[Aig07] M. Aigner. Discrete Mathematics. American Mathematical Society, USA, 2007.

[BBBP17a] E. Barcucci, A. Bernini, S. Bilotta and R. Pinzani. Cross-bifix-free sets in two dimensions. Theoretical
    Computer Science, 664:29–38, 2017.

[BBBP17b] E. Barcucci, A. Bernini, S. Bilotta and R. Pinzani. Non-overlapping matrices. Theoretical Computer
    Science, 658:36–45, 2017.

[BBDD98] E. Barcucci, S. Brunetti, A. Del Lungo and F. Del Ristoro. A combinatorial interpretation of the
   sequence fn+1 = 6fn − fn−1 . Discrete Mathematics, 190:235–240, 1998.

[BR01] E. Barcucci and S. Rinaldi. Some linear recurrences and their combinatorial interpretation by means of
    regular languages. Theoretical Computer Science, 255:679–686, 2001.

[Ber17] A. Bernini. Restricted binary strings and generalized Fibonacci numbers. Lecture Notes in Computer
    Science, 10248:32–43, 2017.


                                                          92
[BBPSV14] A. Bernini, S. Bilotta, R. Pinzani, A. Sabri and V. Vajnovszki. Prefix partitioned Gray code for
    particular cross-bifix-free sets. Cryptography and Communication, 6:359–369, 2014.

[BBPSV15] A. Bernini, S. Bilotta, R. Pinzani, A. Sabri and V. Vajnovszki. Gray code orders for q-ary words
    avoiding a given factor. Acta Informatica, 52:573–592, 2015.
[BBPV15] A. Bernini, S. Bilotta, R. Pinzani and V. Vajnovszki. A trace partitioned Gray code for q-ary
    generalized Fibonacci strings. Journal of Discrete Mathematical Sciences and Cryptography, 18:751–761,
    2015.

[BBPV17] A. Bernini, S. Bilotta, R. Pinzani and V. Vajnovszki. A Gray code for cross-bifix-free sets. Mathe-
    matical Structures in Computer Science, 27:184–196, 2017.
[BGPP07] A. Bernini, E. Grazzini, E. Pergola and R. Pinzani. A general exhaustive generation algorithm for
   Gray structures. Acta Informatica, 44:361-376, 2007.

[BPP12] S. Bilotta, E. Pergola and R. Pinzani. A new approach to cross-bifix-free sets. IEEE Transactions on
    Information Theory, 58:4058–4063, 2012.
[BSS93] J. Bonin, L. Shapiro and R. Simion. Some q-analogues of the Schröder numbers arising from combina-
    torial statistics on lattice paths. Journal of Statistical Planning and Inference, 34:35–55, 1993.
[Fra82] A. S. Fraenkel. Systems of numeration. American Mathematical Monthly, 92:105–114, 1982.

[Gra53] F. Gray. Pulse code communication. US Patent 2(632), 058, 1953.
[Ham50] R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal,
    29:147–160, 1950.
[PP90] E. Pergola and R. Pinzani. A combinatorial interpretation of the area of Schröder paths. Electronic
    Journal of Combinatorics, 6:R40, 1990.
[SP96] N. J. A. Sloane and S. Plouffe. The Encyclopedia of Integer Sequences. Academic Press, New York, 1996.
[Sul98] R. A. Sulanke. Bijective recurrences concerning Schröder paths. Electronic Journal of Combinatorics,
     5:R47, 1998.


                                                     93