<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>G. Romana)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>On the Number of Equal-Letter Runs of the Bijective Burrows-Wheeler Transform</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elena Biagi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Cenzato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zsuzsanna Lipták</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Romana</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ca' Foscari University of Venice</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Helsinki</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Palermo</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Verona</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <issue>2022</issue>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The Bijective Burrows-Wheeler Transform (BBWT) is a variant of the famous BWT [Burrows and Wheeler, 1994]. The BBWT was introduced by Gil and Scott in 2012, and is based on the extended BWT of Mantaci et al. [TCS 2007] and on the Lyndon factorization of the input string. In the original paper, the compression achieved with the BBWT was shown to be competitive with that of the BWT, and it has been gaining interest in recent years. In this work, we present the first study of the number of runs  of the BBWT, which is a measure of its compression power. We exhibit an infinite family of strings on which  of the string and of its reverse difer by a multiplicative factor of Θ(log ), where  is the length of the string. We also present experimental results and statistics on () and (rev), as well as on the number of Lyndon factors in the Lyndon factorization of  and rev.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Bijective Burrows-Wheeler Transform</kwd>
        <kwd>BWT</kwd>
        <kwd>eBWT</kwd>
        <kwd>ombinatorics on words</kwd>
        <kwd>Lyndon factorization</kwd>
        <kwd>data compression</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>The Burrows-Wheeler-Transform (BWT) [1] is a fundamental invertible string transform orig</title>
        <p>
          inally introduced in 1994 by Michael Burrows and David J. Wheeler as a preprocessing step
for string compression. The BWT tends to be easier to compress than the original input and
supports eficient pattern matching tasks while keeping the transform in compressed form. Due
to this, this transform has become the cornerstone of several string compressors and compressed
† Funded by the European Union (ERC, REGINDEX, 101039208). Views and opinions expressed are however those of
the author(s) only and do not necessarily reflect those of the European Union or the European Research Council.
Neither the European Union nor the granting authority can be held responsible for them.
data structures [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
          ] and several commonly used bioinformatics tools such as Bowtie [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ] and
BWA [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>
          These properties are due to the so-called clustering efect of the BWT. In fact, if the input
text is highly repetitive, the final transform will tend to include few long runs of the same
character, the number of which is usually denoted . This motivated the interest for  as a
parameter to measure text repetitiveness, with several recent papers studying the suitability of
 as a repetitiveness measure as well as how  is related to other such measures [
          <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
          ]. Since
a string and its reverse are repetitive in the same way, we would expect the number of runs of
their transformations to be the same. However, Giuliani et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] showed that the parameter 
is not invariant w.r.t. string reversal; in fact, there are strings  for which () and (rev) difer
by a multiplicative factor of Θ(log ), where  is the length of the string.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>The Bijective Burrows-Wheeler Transform (BBWT) [11, 12] is another invertible transforma</title>
        <p>
          tion, which is a variation of the BWT. It is defined as the extended BWT (eBWT) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] of the
factors of the Lyndon factorization of the input string. As opposed to the BWT, the BBWT is
bijective: no two diferent words have the same BBWT, and every word is the BBWT of some
word.
        </p>
        <p>
          This transformation has met increasing interest in the last decade, with several papers
published on this topic. Recently Bannai et al. in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] presented the first linear time algorithm
for constructing the Bijective BWT, thus unlocking the eficient computation of this transform
for large inputs. In [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], it was further shown how to use in-place algorithms for constructing
the BBWT and converting the BWT to the BBWT in quadratic time, thus highlighting a strong
connection between these two transforms. Finally, in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], Bannai et al. presented the first
self-index based on the BBWT supporting eficient pattern-matching queries on the original
input, similar to the original BWT. This comes with a small additional log(| |) factor in the
backward search algorithm, where  is the pattern.
        </p>
        <p>
          In the original BBWT paper [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], the authors studied the suitability of this transform for text
compression. Their experimental results show that on the Calgary corpus, the compression
guaranteed by the BBWT is competitive. In particular, on average, the BBWT was about 1%
more compressible than the BWT. The BBWT properties were further studied in subsequent
papers, and it has inspired the definition of other bijective BWT variants [
          <xref ref-type="bibr" rid="ref12 ref17 ref18">12, 17, 18</xref>
          ]. However,
the efectiveness of the number  of runs of the BBWT as a repetitiveness measure has not
been studied before.
        </p>
        <sec id="sec-1-2-1">
          <title>In this paper, we present the first results on  as a repetitiveness measure, comparing the</title>
          <p>
            behaviour of  of a string and of its reverse. We define an infinite family of words for which 
of the string and its reverse difer by a multiplicative factor of Θ(log ), where  is the length
of the string, thus proving a parallel result on the BBWT to that of Giuliani et al. [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] on the
          </p>
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>BWT. This result shows the BBWT, as a measure of repetitiveness, exhibits the same defect as</title>
        <p>
          the BWT, namely that reversing the string may change it significantly, while repetitiveness is,
of course, preserved. The family of strings used in this paper derive from finite Fibonacci words,
as do those of [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], but the similarities end there; both the strings themselves and the proof
techniques employed are diferent.
        </p>
        <sec id="sec-1-3-1">
          <title>In the final part of the paper, we present experimental results on  , studying the multiplica</title>
          <p>tive and additive diference between a string and its reverse, as well as the relationship to the
number of factors of its Lyndon factorization.</p>
        </sec>
      </sec>
      <sec id="sec-1-4">
        <title>Due to space constraints, some proofs have been omitted and will be included in the full version of this paper. A preliminary version of some of the results in this paper was contained in the first author’s master thesis [19].</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Basics</title>
      <sec id="sec-2-1">
        <title>Let Σ be a finite ordered alphabet of size  . A string (or word) over Σ is a finite sequence</title>
        <p>= 1 · · ·  of characters  from Σ. We denote the length of string  as ||. The empty
string is the only string of length 0 and is denoted . For  ≥ 0, Σ denotes the set of all words
of length , and Σ* = ∪≥ 0Σ the set of all finite words over Σ.</p>
        <p>Let , , ,  ∈ Σ* such that  = . Then  is called a prefix ,  a substring, and  a sufix
of . A string  is a subsequence of  if  can be obtained from  by deleting some (possibly 0,
possibly all) characters from . A prefix (sufix, substring)  of a word  is a called proper if
 ̸= . For a string  and an integer  ≥ 1,  =  ·  · · ·  denotes the -fold concatenation of
. A string  is called primitive if  =  implies  =  and  = 1. If  is not primitive then
it is called a power. Every word  can be uniquely written as  =  for a primitive string ,
called root of . Given a string , we also define the infinite word  =  ·  ·  · · · We denote
the number of maximal equal-letter runs of a string  by runs().</p>
      </sec>
      <sec id="sec-2-2">
        <title>The lexicographic order on Σ* is defined as follows:  &lt;lex  if either  is a proper prefix of ,</title>
        <p>or if there exists  ∈ Σ* and ,  ∈ Σ,  &lt;  such that  is a prefix of  and  is a prefix of .
Another order relation on Σ* is the omega-order defined as follows: Let  =  and  = ℓ, , 
primitive. Then  &lt;  if  =  and  &lt; ℓ, or else if  &lt;lex . Note that the lexicographic and
the omega-order coincide on strings of the same length, but they can difer if one is a proper
prefix of the other, e.g. ab &lt;lex aba but aba &lt; ab.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Two words  and ′ are called conjugates (or rotations) if there exists , , possibly empty,</title>
        <p>s.t.  =  and ′ = . Conjugacy is an equivalence relation. We denote the conjugacy class
of a word  ∈ Σ, as [] = { |  and  are conjugates}. A word  is primitive if and only if
its conjugacy class has cardinality ||.</p>
      </sec>
      <sec id="sec-2-4">
        <title>A primitive word is called a Lyndon word if it is lexicographically strictly smaller than all</title>
        <p>of its rotations. A necklace is a Lyndon word or a power of a Lyndon word. For a primitive
word , we denote by () the unique conjugate which is a Lyndon word (its Lyndon rotation).
Every string  can be uniquely written as  = 1 · 2 · · ·  such that  are Lyndon words for
 = 1, . . . , , and 1 ≥ 2 ≥ . . . ≥  [20]. This is called ’s Lyndon factorization. A string 
is Lyndon if and only if its Lyndon factorization consists of one factor only. The multiset of
Lyndon factors in the Lyndon factorization of  is denoted ℒ().</p>
      </sec>
      <sec id="sec-2-5">
        <title>The Burrows-Wheeler Transform (BWT) [1] of a string  is a permutation of the characters of</title>
        <p>, whose th character is the last character of the th rotation of , where the rotations are taken
w.r.t. lexicographic order. For example, BWT(banana) = nnbaaa, see Fig. 1. The number of
runs of the BWT is denoted () = runs(BWT()), e.g. (banana) = 3.</p>
      </sec>
      <sec id="sec-2-6">
        <title>The extended BWT (eBWT) [13] is a generalization of the BWT to a multiset of primitive strings</title>
        <p>ℳ: eBWT(ℳ) is a permutation of the characters of the strings in ℳ, whose th character is
the last character of the th rotation, where the rotations of all strings in ℳ are listed w.r.t.
omega-order. For an example, see Fig. 2. Note that for all strings , eBWT({}) = BWT().</p>
        <sec id="sec-2-6-1">
          <title>Next we list some known properties of the eBWT.</title>
          <p>Lemma 1</p>
          <p>1. Let  ∈ . Then BWT() is a subsequence of eBWT().
2. Let  be a multiset of primitive strings, and ′ a conjugate of some  ∈ . Then the number
of runs of eBWT( ∪ {′}) equals the number of runs of eBWT().
3. Given an integer  &gt; 0 and a primitive word , let  be the multiset consisting of  copies
of . Then BWT() = eBWT().</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The bijective BWT</title>
      <p>Let  = 1 · 2 · · ·  be the Lyndon factorization of string . Then BBWT() = eBWT(ℳ),
where ℳ = ℒ() = {1, 2, . . . , } is the multiset of Lyndon factors of . As an example,
BWT(banana) = nnbaaa, while BBWT(banana) = annbaa, since the Lyndon factorization
of banana is b · an · an · a, and thus ℒ(banana) = {a, an, an, b}, see Fig. 1.
sorted rotations BWT
abanan n
anaban n
ananab b
banana a
nabana a
nanaba a</p>
      <p>Proof First assume that BWT() = BBWT() holds. Let  = , where  is primitive and  ≥ 1,
and let  be the conjugate of  which is a necklace. Then clearly,  =  with  = (), and
ℒ() consists of  copies of . Thus:</p>
      <p>BBWT() d=ef. eBWT(ℒ()) Lem=ma 1 BWT() ,  conj. BWT() = BBWT().</p>
      <p>=
By bijectivity of the BBWT, this implies that  equals its own necklace rotation , and is thus a
necklace.</p>
      <p>Conversely, if  =  is a necklace, then ℒ() consists of  copies of , and BBWT() =
eBWT(ℒ()) = BWT(), again by Lemma 1. □</p>
      <p>Let us denote by () = runs(BBWT()), the number of runs of the BBWT of . It is known
that the number of runs () of the BWT of a binary string  is always even. This is because
necessarily the first character must be b, and the last character must be a. This is not the case
of the BBWT since a and b are the smallest and the greatest Lyndon factor, respectively, that a
binary word can have, and therefore, BBWT may start and end with either a or b. In the next
lemma, we give the conditions that a or b appear as a Lyndon factor.</p>
      <p>Lemma 3 Let  ∈ {a, b}* be a binary string, with a &lt; b. Then, a ∈ ℒ() if and only if a is
the last letter of . Symmetrically, b ∈ ℒ() if and only if b is the first letter of .
Proof We prove just the first statement on a, and the other case is treated symmetrically. For
the first direction, observe that a is the smallest Lyndon factor (both in lexicographical and 
order) that any binary word can have. Since the Lyndon factorization requires that any Lyndon
factor must be greater or equal than the following in  (if any), a ∈ ℒ() implies that either the
next letter in  is another a, or there are no more letters in . For the other direction, suppose by
contradiction that  ends with an a and a ∈/ ℒ(). Then, there exist ℓ ≥ 0,  ≥ 1,  ∈ {a, b}*
such that the word  = aℓba ∈ ℒ(), that is the Lyndon factor containing the last a of .
However, one can verify that aℓ+1b &lt; aℓba, which contradicts that  is a Lyndon word. □</p>
      <sec id="sec-3-1">
        <title>We next characterize when () is odd.</title>
        <p>Lemma 4 Let  ∈ {a, b}* be a binary word. It holds that () is odd if and only if  starts and
ends with the same letter.</p>
        <p>However, it turns out that there is a simple connection between () and (rev), namely
that they must have the same parity:
Lemma 5 For every binary string , the diference between the number of runs of the BBWT of 
and the BBWT of its reverse is even.</p>
        <p>Proof If  starts and ends with the same character, then so does rev, and by Lemma 4, both have
an odd number of runs. If  starts and ends with diferent characters, then so does rev, and by
Lemma 4, both have an even number of runs. □</p>
        <p>The BWT of a word achieves maximal compression when it has as many runs as the size
of the alphabet. In case of binary alphabets, it was shown in [21] that the perfect clustering
efect of the BWT is a characterization for the family of standard Sturmian words. These words
can be constructed through a directive sequence, an infinite sequence of integers {}≥ 0 such
that 0 ≥ 0 and  &gt; 0 for all  &gt; 0. The standard Sturmian words generated by this sequence
are the words of the sequence {}≥ 0 such that 0 = b, 1 = a, and +1 = − 1 − 1 for

 &gt; 1. For instance, the directive sequence 1, 1, 1, 1, . . . generates the well-known sequence of
ifnite Fibonacci words : 0 = b, 1 = a, 2 = ab, 3 = aba, 4 = abaab, 5 = abaababa, 6 =
abaababaabaab, 7 = abaababaabaababaababa, . . .</p>
        <p>Theorem 1 ([21]) BWT() = baℓ, for some , ℓ ≥ 1, if and only if  is the power of a rotation
of a standard Sturmian word.</p>
        <sec id="sec-3-1-1">
          <title>We give a similar characterization for the BBWT. As opposed to the BWT, here it is not</title>
          <p>necessary that the transform begins with b and ends with a.</p>
          <p>Lemma 6 The number of runs of the BBWT of a string  is 2 if and only if (i)  has the form baℓ
for some , ℓ ≥ 1, or (ii)  is the Lyndon rotation of a standard Sturmian word or a power of the
Lyndon rotation of a standard Sturmian word.
diference of  as  () = () − (rev).</p>
          <p>In the rest of the paper, we will look at the relationship between () and (rev). To
this end, we define the runs-ratio of string  as  () = max( ((re)v) , ((re)v) ), and the
runs</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Fibonacci words</title>
      <sec id="sec-4-1">
        <title>In this section, we will show that the Lyndon rotation of a Fibonacci word of order  has the</title>
        <p>following interesting property: the number of runs of the BBWT of its reverse has 2( − 2)
runs, while the BBWT of the word itself has only 2 runs. Thus,   of these words is Θ(log ),
where  is the length of the word.</p>
        <sec id="sec-4-1-1">
          <title>Fibonacci words were defined in the previous section. It follows directly from the defini</title>
          <p>tion that for all  ≥ 0, || = , where {}≥ 0 is the well-known Fibonacci sequence</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>1, 1, 2, 3, 5, 8, 13, . . . Fibonacci words have been studied extensively, see [22] for an overview.</title>
        <sec id="sec-4-2-1">
          <title>Some of the properties of Fibonacci words also follow from properties that have been shown for all standard words, see e.g. [23, 24, 25, 26, 21]. We next list some of these properties:</title>
          <p>Proposition 1 (Some known properties of the Fibonacci words) Let  be the Fibonacci
word of order  ≥ 0. Then there exists a palindrome  with the following properties:
1. for all  ≥ 2: if  is even, then  = ab, and if  is odd, then  = ba, where  is a
palindrome (in particular, 2 = ).
2. for all  ≥ 4,
3. for all  ≥ 2, ab is a Lyndon word.
4. for all : ()rev is a conjugate of .</p>
          <p>• if  is even, then  = − 1ba− 2ab = − 2ab− 1ab, and
• if  is odd, then  = − 1ab− 2ba = − 2ba− 1ba.
5. for  ≥ 2: BWT() has two runs; in particular, BWT() = b− 2 a− 1 .
6. for  ≥ 2: ab and ba, the so-called central words, are adjacent in the BW-matrix of
.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>From these we can easily deduce further properties. Recall that () is the Lyndon rotation of a primitive string .</title>
        <p>Corollary 1 Let  be the palindrome from Prop. 1, i.e.  = ab for  even, and  = ba
for  odd. Then
1. () = ab,
2.  = − 1ba− 2 = − 2ab− 1, if  even,
3.  = − 1ab− 2 = − 2ba− 1, if  odd.</p>
        <sec id="sec-4-3-1">
          <title>Since Lyndon rotations of Fibonacci words will be of central importance, we list the first few</title>
          <p>here: (0) = b, (1) = a, (2) = ab, (3) = aab, (4) = aabab, (5) = aabaabab.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>Next we study the Lyndon factorization of the reverse of the Lyndon rotation of a Fibonacci</title>
          <p>word . We will show that it consists of the Lyndon rotations of  for  = 0, . . . ,  − 2, where
the words of even order are listed increasingly (w.r.t. their order), followed by those of odd order
listed decreasingly, followed by one additional factor (1) = a. Formally:
Lemma 7 Let  = ()rev be the reverse of the Lyndon rotation of the Fibonacci word of order
. Then the Lyndon factorization of  is as follows:
1.  = (0) · (2) · (4) · · · (− 2) · (− 3) · (− 5) · · · (3) · (1) · (1) if 
is even, and
2.  = (0) · (2) · (4) · · · (− 3) · (− 2) · (− 4) · · · (3) · (1) · (1) if 
is odd.</p>
          <p>Example 1 In the following we list  with its Lyndon factorization, for  = 2, . . . , 8 :
• 2 = (2)rev = b · a,
• 3 = (3)rev = b · a · a,
• 4 = (4)rev = b · ab · a · a,
• 5 = (5)rev = b · ab · aab · a · a,
• 6 = (6)rev = b · ab · aabab · aab · a · a,
• 7 = (7)rev = b · ab · aabab · aabaabab · aab · a · a,
• 8 = (8)rev = b · ab · aabab · aabaababaabab · aabaabab · aab · a · a.</p>
        </sec>
        <sec id="sec-4-3-3">
          <title>For the proof of Lemma 7, we will first need two other lemmas. The first one gives a simple</title>
          <p>recursion for :
Lemma 8 Let  = (())rev. Then the following recursion holds for  ≥ 2 :
•  = − 2− 1 if  is even, and
•  = − 1− 2 if  is odd.</p>
          <p>Proof First note that since () = ab, and  is a palindrome, therefore  = ba. Let
 ≥ 2 be even. By Corollary 1,  = ba = b− 2ab− 1a = − 2− 1. Similarly, if  is odd,
then  = ba = − 1ab− 2 = − 1− 2. □</p>
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>The second lemma gives a factorization of the Lyndon rotations of the .</title>
        <p>Lemma 9 Let  ≥ 4. Then the following holds:
• () = (− 3)(− 5) · · · (3)(1)(1)(0)(2)(4) · · · (− 2), if  is even,
and
• () = (− 2)(− 4) · · · (3)(1)(1)(0)(2)(4) · · · (− 3), if  is odd.</p>
        <sec id="sec-4-4-1">
          <title>Now we are ready to prove Lemma 7.</title>
          <p>Proof (of Lemma 7) The proof is by induction on . For the base cases, 2 = b · a, and 3 =
b · a · a, as claimed. Now let  &gt; 3. If  is even, then  = − 2− 1 = b− 2ab− 1a. Thus,
 = (0)(2) · · · (− 4)(− 5) · · · (3)(1)(1) ·
⏟ − 2 by⏞the I.H.</p>
          <p>(0)(2) · · · (− 4)(− 3) · · · (3)(1)(1)
⏟ − 1 by⏞the I.H.
= (0)(2) · · · (− 4)·
(− 5) · · · (3)(1)(1)(0)(2) · · · (− 4) ·
⏟
(− 2) b⏞y Lemma 9
(− 3) · · · (3)(1)(1)
= (0) · (2) · (4) · · · (− 2) · (− 3) · (− 5) · · · (3) · (1) · (1).
The claim for  odd follows analogously.
□
Example 2 For example, 8 = 6 · 7 = babaababaabaa · babaababaabaababaabaa =
b · ab · aabab · aabaababaabab · aabaabab · aab · a · a, where the new Lyndon factor is
underlined; its factorization from Lemma 9 is (6) = aabaababaabab = aab · a · a · b · ab · aabab.
Corollary 2 (from Lemma 7) Let  be the reverse of the Lyndon rotation of the Fibonacci
word of order . Then ℒ() = {(0), (1), (2), . . . , (− 2)} ∪{(1)}, i.e. the factor
(1) = a appears with multiplicity 2 in the factorization, while all other factors appear exactly
once.</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>The final piece we need to prove the main theorem of this section will be Lemma 10, which</title>
          <p>gives the number of runs of the eBWT of the set of Fibonacci words  = {0, 1, . . . , }. A
crucial role will be played by the central words, which we list up to order 7 in Table 1.
Lemma 10 Let  &gt; 0 and  be the set of Fibonacci words of order up to , i.e.  =
{0, 1, . . . , }. Then eBWT() has 2 runs.</p>
          <p>Proof We prove the claim by induction on . For  = 1, 2, eBWT({a, b}) = ab, which has 2
runs, and eBWT({a, b, ab}) = abab, which has 4 runs, as claimed.</p>
          <p>Now let  &gt; 2. We will observe what happens to eBWT(− 1) when we insert  into − 1 and
will show that exactly 2 new runs are created, see Fig. 2.</p>
          <p />
          <p>Let eBWT(− 1) be given, and consider what happens when we add . Denote the two central
words of order  as  = ab and  = ba. Note that  =  if  is even, and  =  if
 is odd. Clearly,  &lt;  for all . Moreover, it can be proven that for  &gt; 2, the following
relationship holds between central words of subsequent order:  &gt; − 1 if  is even, and
 &lt; − 1 if  is odd.</p>
          <p>Now, when considering eBWT(− 1), the two words  and  are inserted together, i.e. they
are adjacent in the eBWT-matrix of . This is because they have the common prefix , of
length || − 2, which, for  &gt; 3, is longer than the longest word previously present (of length
|− 1| = − 1 &lt;  − 2); for  = 3 the claim can be seen by direct inspection (see Fig. 2, left).</p>
          <p>Moreover, the two central words will be inserted immediately adjacent to one of the two central
words of order  − 1. This is because one of the two central words is always equal to  ( for 
even,  for  odd), and thus, − 1 is a proper prefix of  or − 1 is a proper prefix of , and no
longer prefix of these words can be present. Thus we have  =  &gt; − 1 = − 1 if  is even,
and  =  &lt; − 1 = − 1 if  is odd. Therefore,  and  will be inserted immediately after
− 1 = − 1 if  is even, and immediately before − 1 = − 1 if  is odd. It follows by induction
that − 1 was inserted immediately after (immediately before) − 2 if  is even (if  is odd). This
in turn implies that the word just before (just after)  and  is − 2, for  even (for  odd).</p>
          <p>Now consider the number of runs of eBWT(− 1). By the induction hypothesis, eBWT(− 1)
has 2 − 2 runs. Inserting the two central words creates two new runs. This is because they end in
b and a, in this order, and they are inserted between the two previous Fibonacci words: if  is even,
then they are inserted between − 1, which ends in a, and − 2, which ends in b; if  is odd, then
between − 2, which ends in a, and − 1, which ends in b.</p>
          <p>Note that, since  is a standard word, BWT() has the form baℓ, and by Lemma 1, this will
be a subsequence of eBWT(). This implies that, if the two central words of order  are inserted
between, say, position  and position  + 1 of eBWT(− 1), then all rotations of  ending in b
will be inserted before , and all rotations ending in a will be inserted after  + 1. Inserting a b will
create a new run only if it is inserted between two a’s; likewise, inserting an a will create a new run
only if it is inserted between two b’s. It is not dificult to prove (by induction) that all a-runs before
the position of − 1 have length 1, and that all b-runs after the position of − 1 have length 1.</p>
          <p>Thus, exactly two new runs are created. This completes the proof. □
Theorem 2 Let  = () be the Lyndon rotation of the th Fibonacci word . Then   () =
Θ(log ||).</p>
          <p>Proof First note that since  is a Lyndon word, BBWT() = BWT() by Lemma 2. Since it
is a rotation of , BWT() = BWT(). By Prop. 1, BWT() has two runs, so BBWT() has
two runs, thus,  () = 2.</p>
          <p>Now let  = ()rev. By Corollary 2, the set of Lyndon factors of  is − 2 = {0, . . . , − 2}.
It follows from Lemma 1 that the number of runs of a multiset depends only on the set of the
elements (and not on the multiplicity of each element). By Lemma 10, the number of runs of
eBWT(− 2) is 2( − 2), and therefore,  () = 2( − 2).</p>
          <p>Finally, using the fact that || = || =  grows exponentially in , it follows that   () =
2(2− 2) = Θ() = Θ(log ||). □</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental results</title>
      <sec id="sec-5-1">
        <title>In our experiments, we studied the  parameter of a string  and its reverse, looking at both</title>
        <p>multiplicative (  ) and additive diference (   ). We also studied the number of Lyndon factors
in the Lyndon factorization of . We considered only those strings  which are lexicographically
strictly smaller than their reverse. We refer to such strings also as forward strings , as opposed
to strings which are strictly larger than their reverse: we refer to these as reverse strings. This is
to avoid repeating the same experiment twice (as we compare  of a string and of its reverse).</p>
        <sec id="sec-5-1-1">
          <title>Note that this experimental setup excludes palindromes.</title>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>We computed the BBWT of all forward strings for lengths between 3 and 25, over a binary</title>
        <p>alphabet. We also ran the same experiment over a ternary alphabet, for strings of length up to
15 (data not shown).
5.1. Multiplicative diference in</p>
        <p>of a string and its reverse
In the first step of our analysis,   () was calculated for each forward string . We report
the maximum   for each length over a binary alphabet in Table 2 1. Our results show that
1The statistics reported in the summary do not include palindrome sequences since their  () is always 1. The
number of palindromes of length  is 2⌈/2⌉.
increasing the sequence length , the average runs-ratio as the proportion of sequence pairs
having  () = 1 decreases. On the other hand, large sequence lengths generate large maximum
 () values. This suggests that by increasing , we observe more sequences for which the
 () is very close to 1 and only a small subset for which () is very diferent from (rev),
i.e., we observe extremal words with larger  () values. In addition, for every  up to 21, the
most frequent value of  () is 1, i.e. probability that () = (rev) is high (see Fig. 3 for
 = 21).</p>
        <p>As for the extremal words, we noticed that several of them are Lyndon rotations of standard
words (data not shown). In fact, these strings always have () = 2, and thus they tend to
generate large  (). In particular, for  () ≥ 4 and  ≤ 23, all extremal cases are Lyndon
rotations of standard words. However, no precise pattern is visible since also non-standard
words can be extremal words. On the other hand, if we consider the smallest  values, which
present an increase in  (), for  () ≥ 3, the extremal words are the Lyndon rotation of
the Fibonacci word of length  and its reverse complement. For instance,  = 8 is the smallest
 which allows to obtain  () = 3, and the two extremal cases are aabaabab and ababbabb.
As for  () ≤ 2, the situation appears more complex, since there are several extremal cases
that reach the same runs-ratio.
5.2. Number of Lyndon factors and</p>
      </sec>
      <sec id="sec-5-3">
        <title>To conclude our experiment, we computed the number of Lyndon factors for all  and put that</title>
        <p>in relation to (). In Fig. 4 we note that there are no strings with both a high number of</p>
        <sec id="sec-5-3-1">
          <title>Lyndon factors and a high number of runs as the top right corners of both graphs for forward</title>
          <p>(a) and reverse (b) are empty. The two plots are quite similar, with the exception of the number
of Lyndon factors that appear to be higher in reverse strings (see also Fig. 3). In addition, the
-axis for forward and reverse is the same, indicating that values of  span the same range on
both forward and reverse strings.</p>
        </sec>
      </sec>
      <sec id="sec-5-4">
        <title>We believe that the behavior described above might be related to the way we define ; since</title>
        <p>is always lexicographically strictly smaller than its reverse, if rev has a run of b at the beginning
and a run of a at the end the resulting Lyndon factorization will contain several length-one</p>
      </sec>
      <sec id="sec-5-5">
        <title>Lyndon factors. For instance, the Lyndon factorization of bbbbababaababbaaaa results in</title>
        <p>eight length-one Lyndon factors: b · b · b · b · ab · ab · aababb · a · a · a · a. However, also in
this case, no clear pattern arose from our experiments. In fact, there are cases where the Lyndon
factorization of  leads to () which is much smaller than (rev).</p>
        <sec id="sec-5-5-1">
          <title>On strings over a binary alphabet, we can observe in the right plot in Fig. 4 that many strings</title>
          <p>are found on the vertical line indicating 0 as the diference in the number of runs, and the two
measures analysed do not seem to strongly correlate. Results were inconclusive also when
performing the same analysis counting only distinct Lyndon factors. Further details on the
experimental results will be given in the full version of the paper.
survey of string orderings and their application to the Burrows-Wheeler transform, Theor.</p>
          <p>Comput. Sci. 710 (2018) 52–65.
[19] E. Biagi, On comparing the Bijective Burrows-Wheeler-Transform of a word and its reverse,</p>
        </sec>
        <sec id="sec-5-5-2">
          <title>Master’s thesis, University of Verona, Italy, 2022.</title>
          <p>[20] K. T. Chen, R. H. Fox, R. Lyndon, Free diferential calculus, iv. the quotient groups of the
lower central series, Annals of Mathematics 68 (1958) 81.
[21] S. Mantaci, A. Restivo, M. Sciortino, Burrows-Wheeler transform and Sturmian words, Inf.</p>
          <p>Process. Lett. 86 (2003) 241–246.
[22] A. de Luca, A combinatorial property of the Fibonacci words, Inf. Process. Lett. 12 (1981)
193–195.
[23] A. de Luca, F. Mignosi, Some combinatorial properties of Sturmian words, Theor. Comput.</p>
          <p>Sci. 136 (1994) 361–285.
[24] A. de Luca, Sturmian words: Structure, combinatorics, and their arithmetics, Theor.</p>
          <p>Comput. Sci. 183 (1997) 45–82.
[25] J. Berstel, A. de Luca, Sturmian words, Lyndon words and trees, Theor. Comp. Sci. 178
(1997) 171–203.
[26] J. Borel, C. Reutenauer, On Christofel classes, RAIRO Theor. Inf. Appl. 40 (2006) 15–27.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Burrows</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Wheeler</surname>
          </string-name>
          ,
          <article-title>A block-sorting lossless data compression algorithm</article-title>
          ,
          <source>Technical Report, DIGITAL System Research Center</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gagie</surname>
          </string-name>
          , G. Navarro,
          <string-name>
            <given-names>N.</given-names>
            <surname>Prezza</surname>
          </string-name>
          ,
          <article-title>Fully functional sufix trees and optimal text searching in BWT-runs bounded space</article-title>
          ,
          <source>ACM</source>
          <volume>67</volume>
          (
          <year>2020</year>
          ) 2:
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          :
          <fpage>54</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mäkinen</surname>
          </string-name>
          , G. Navarro,
          <article-title>Succinct sufix arrays based on run-length encoding</article-title>
          ,
          <source>Nord. J. Comput</source>
          .
          <volume>12</volume>
          (
          <year>2005</year>
          )
          <fpage>40</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Langmead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Trapnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Salzberg</surname>
          </string-name>
          ,
          <article-title>Ultrafast and memory-eficient alignment of short DNA sequences to the human genome</article-title>
          ,
          <source>Genome Biology</source>
          <volume>10</volume>
          (
          <year>2009</year>
          )
          <article-title>R25</article-title>
          . doi:
          <volume>10</volume>
          . 1186/gb-2009
          <source>-10-3-r25.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Langmead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Salzberg</surname>
          </string-name>
          ,
          <article-title>Fast gapped-read alignment with Bowtie 2</article-title>
          ,
          <string-name>
            <surname>Nature</surname>
            <given-names>Methods</given-names>
          </string-name>
          9
          <article-title>(</article-title>
          <year>2012</year>
          )
          <fpage>357</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Durbin</surname>
          </string-name>
          ,
          <article-title>Fast and accurate long-read alignment with Burrows-Wheeler transform</article-title>
          ,
          <source>Bioinform</source>
          .
          <volume>26</volume>
          (
          <year>2010</year>
          )
          <fpage>589</fpage>
          -
          <lpage>595</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Akagi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funakoshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Inenaga</surname>
          </string-name>
          ,
          <article-title>Sensitivity of string compressors and repetitiveness measures</article-title>
          ,
          <source>Inf. Comput</source>
          .
          <volume>291</volume>
          (
          <year>2023</year>
          )
          <fpage>104999</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Giuliani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Inenaga</surname>
          </string-name>
          , Zs. Lipták, G. Romana,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sciortino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Urbina</surname>
          </string-name>
          ,
          <article-title>Bit catastrophes for the Burrows-Wheeler Transform</article-title>
          ,
          <source>in: Proc. of Developments in Language Theory - 27th International Conference, DLT</source>
          <year>2023</year>
          , volume
          <volume>13911</volume>
          of Lecture Notes in Computer Science, LNCS,
          <year>2023</year>
          , pp.
          <fpage>86</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Navarro</surname>
          </string-name>
          ,
          <article-title>Indexing highly repetitive string collections, part I: repetitiveness measures</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>54</volume>
          (
          <year>2022</year>
          )
          <volume>29</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          :
          <fpage>31</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Giuliani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Inenaga</surname>
          </string-name>
          , Zs. Lipták,
          <string-name>
            <given-names>N.</given-names>
            <surname>Prezza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sciortino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tofanello</surname>
          </string-name>
          ,
          <article-title>Novel results on the number of runs of the Burrows-Wheeler-Transform</article-title>
          ,
          <source>in: Proc. of 47th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM</source>
          <year>2021</year>
          , volume
          <volume>12607</volume>
          of Lecture Notes in Computer Science, LNCS,
          <year>2021</year>
          , pp.
          <fpage>249</fpage>
          -
          <lpage>262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <article-title>A bijective string sorting transform</article-title>
          ,
          <source>CoRR abs/1201</source>
          .3077 (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kufleitner</surname>
          </string-name>
          ,
          <article-title>On bijective variants of the Burrows-Wheeler Transform</article-title>
          ,
          <source>in: Proc. of the Prague Stringology Conference</source>
          <year>2009</year>
          ,
          <year>2009</year>
          , pp.
          <fpage>65</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mantaci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Restivo</surname>
          </string-name>
          , G. Rosone,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sciortino</surname>
          </string-name>
          ,
          <article-title>An extension of the Burrows-Wheeler Transform</article-title>
          ,
          <source>Theor. Comput. Sci</source>
          .
          <volume>387</volume>
          (
          <year>2007</year>
          )
          <fpage>298</fpage>
          -
          <lpage>312</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bannai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kärkkäinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Köppl</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Piatkowski, Constructing the Bijective and the Extended Burrows-Wheeler Transform in linear time</article-title>
          ,
          <source>in: Proc. of 32nd Annual Symposium on Combinatorial Pattern Matching, CPM</source>
          <year>2021</year>
          , volume
          <volume>191</volume>
          of LIPIcs,
          <year>2021</year>
          , pp.
          <volume>7</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          :
          <fpage>16</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Köppl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hendrian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shinohara</surname>
          </string-name>
          , In-place
          <string-name>
            <surname>Bijective</surname>
          </string-name>
          Burrows-Wheeler Transforms,
          <source>in: Proc. of 31st Annual Symposium on Combinatorial Pattern Matching, CPM</source>
          <year>2020</year>
          , volume
          <volume>161</volume>
          of LIPIcs,
          <year>2020</year>
          , pp.
          <volume>21</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          :
          <fpage>15</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bannai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kärkkäinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Köppl</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Piatkowski, Indexing the Bijective BWT</article-title>
          ,
          <source>in: Proc. of 30th Annual Symposium on Combinatorial Pattern Matching, CPM</source>
          <year>2019</year>
          , volume
          <volume>128</volume>
          of LIPIcs,
          <year>2019</year>
          , pp.
          <volume>17</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          :
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Daykin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. F.</given-names>
            <surname>Smyth</surname>
          </string-name>
          ,
          <article-title>A bijective variant of the Burrows-Wheeler Transform using V-order</article-title>
          ,
          <source>Theor. Comput. Sci</source>
          .
          <volume>531</volume>
          (
          <year>2014</year>
          )
          <fpage>77</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Daykin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Groult</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guesnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lecroq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lefebvre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Léonard</surname>
          </string-name>
          , É.
          <string-name>
            <surname>Prieur-Gaston</surname>
          </string-name>
          , A
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>