=Paper= {{Paper |id=Vol-3284/4242 |storemode=property |title=When a Dollar in a Fully Clustered Word Makes a BWT |pdfUrl=https://ceur-ws.org/Vol-3284/4242.pdf |volume=Vol-3284 |authors=Sara Giuliani,Zsuzsanna LiptΓ k,Francesco Masillo |dblpUrl=https://dblp.org/rec/conf/ictcs/GiulianiLM22 }} ==When a Dollar in a Fully Clustered Word Makes a BWT== https://ceur-ws.org/Vol-3284/4242.pdf
When a Dollar in a Fully Clustered Word Makes a
BWT
Sara Giuliani1,* , Zsuzsanna LiptΓ‘k1 and Francesco Masillo1
1
    University of Verona, Italy


                                         Abstract
                                         The Burrows-Wheeler Transform (BWT) is a powerful transform widely used in string compression
                                         and string processing. It produces a reversible permutation of the characters of the input string, often
                                         allowing for easier compression of the string, while also enabling fast pattern matching. While the BWT
                                         is defined for every word, not every word is the BWT of some word. A characterization of BWT images
                                         was given in [Likhomanov and Shur, CSR, 2011], based on the standard permutation of the string.
                                             Often an end-of-file character $ is added to mark the end of a string. Given a string 𝑀, it is an
                                         interesting combinatorial question where a $ can be inserted to make 𝑀 the BWT of some string 𝑣$. This
                                         question was answered in [Giuliani et al. Theor. Comput. Sci. 2021], where an efficient algorithm was
                                         presented for computing all such positions (called nice positions), and a characterization of nice positions
                                         was given, based on pseudo-cycles in the standard permutation of 𝑀.
                                             In this paper, we give a stronger characterization of nice positions: We show that these can be
                                         characterized using only essential pseudo-cycles, which constitute a small subset of all possible pseudo-
                                         cycles. We present an algorithm to compute all essential pseudo-cycles of a word 𝑀.
                                             In the second part of the paper, we study nice positions of fully clustered words: these are words 𝑀
                                         whose number of runs equals the number of distinct characters occurring in 𝑀. Fully clustered words are
                                         of particular interest due to their extreme compressibility, and words whose BWT is fully clustered have
                                         been studied extensively. We are interested in the number of nice positions of fully clustered words, as
                                         well as in the number of fully clustered words with π‘˜ nice positions, for fixed π‘˜ and word length.

                                         Keywords
                                         combinatorics on words, Burrows-Wheeler-Transform, permutations, string algorithms, fully clustered
                                         words




Proceedings of the 23rd Italian Conference on Theoretical Computer Science, Rome, Italy, September 7-9, 2022
*
 Corresponding author.
" sara.giuliani_01@univr.it (S. Giuliani); zsuzsanna.liptak@univr.it (Zs. LiptΓ‘k); francesco.masillo@univr.it
(F. Masillo)
                                       Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
1. Introduction
The Burrows-Wheeler Transform (BWT) [1] is a widely used reversible transformation on
strings, which is central in the context of lossless compression and string processing. It is
a permutation of the input word that tends to group together equal characters. Usually, the
more repetitive the string, the longer the groups of consecutive equal characters, called runs.
This is known as clustering effect [2], and allows for easier compression of the input [3, 4].
Due to its power with very repetitive strings and to its reversibility, the BWT is fundamental
in fields where large amounts of textual data need to be processed, such as bioinformatics,
where BWT-based tools are among those most frequently used [5, 6, 7, 8]. Understanding the
underlying combinatorics is crucial in applications, since they are necessary for advancements
on data structures based on the BWT, such as the FM-index [9] or the more recent π‘Ÿ-index [10].
    The BWT has also been studied intensively from a combinatorics on words point of view,
with [11] showing that it was a special case of the Gessel-Reutenauer bijection [12]. Many
papers examined the clustering effect of the BWT, most recently [13, 14, 15, 16], while other
transforms with similar properties were investigated in [17, 18, 19]. While the BWT is defined
for every word, not every word is the BWT of some word; characterizations of BWT images are
known [20, 21].
    In this paper, we continue the study of the following combinatorial problem, introduced
in [22]: Given a word 𝑀, where, if at all, can a sentinel character $ be inserted into 𝑀 to make
it the BWT of some word ending with $? We call such positions nice positions. The question
is motivated by the fact that in the application context, often such a sentinel character $ is
appended to mark the end of the input string.
    In [22], an π’ͺ(𝑛 log 𝑛)-time algorithm for computing all nice positions was given, as well as a
characterization using so-called pseudo-cycles. In this paper, we give a stronger characterization
of nice positions, restricting the set of pseudo-cycles to a subset which can be exponentially
smaller than the complete set. We also present an π’ͺ(𝑛2 )-time algorithm that computes nice
positions via this restricted set of pseudo-cycles. Although the new algorithm is slower than
the previous one in the worst case, it gives more information in output, thus opening up new
possibilities of insights into the problem.
    We then apply this algorithm to the study of nice positions in fully clustered words. These
are words whose number of equal-letter-runs equals the number of distinct characters, such as
𝑀 = bbbbcaa. Words whose BWT is fully clustered are of special interest because of their high
compressibility via the BWT; e.g. such a word is abbacbb, with BWT 𝑀. Over a binary alphabet,
it is known that a word has fully clustered BWT if and only if it is conjugate of a standard word
or of a power of a standard word [20]. Over a ternary alphabet, there exists a characterization
via morphisms, of words whose BWT has the form c𝑖 b𝑗 aπ‘˜ (there called Type I) [23], while
a combinatorial property (circular palindromic richness) was shown to be a necessary but
not sufficient condition for such words [24, 25]. This result extends to larger alphabets, i.e.
                                          π‘›πœŽβˆ’1
words whose BWT has the form π‘Žπ‘›πœŽπœŽ π‘ŽπœŽβˆ’1         . . . π‘Žπ‘›1 1 , with alphabet Ξ£ = {π‘Ž1 , π‘Ž2 , . . . , π‘ŽπœŽ } and
π‘Ž1 < π‘Ž2 < . . . < π‘ŽπœŽ . Finally, a characterization of words with fully clustered BWT, over
arbitrary alphabets, was given in terms of discrete interval exchanges in [26].
    We present both theoretical and experimental results on nice positions in fully clustered
words, as well as on the number of words with π‘˜ nice positions, for fixed π‘˜. We studied binary
and ternary alphabets. Due to space restrictions, we present only the results on the binary
alphabet, as well as having to omit some proofs. Both will be included in the full version.
   Overview of paper: In Section 2, we introduce the necessary notation and basic facts. In
Section 3, we give the new stronger characterization of nice positions, and in Section 3.1, we
describe the algorithm exploiting the new characterization. In Section 4, we present some
properties and conjectures related to nice positions for fully clustered words for the unary and
binary alphabets. We close with a brief outline of future work in Section 5.


2. Preliminaries
Strings and permutations. Given a finite ordered alphabet Ξ£, a string (or word) over Ξ£ is a
finite sequence 𝑀 = 𝑀0 Β· Β· Β· π‘€π‘›βˆ’1 of characters from Ξ£. We denote by 𝜎 the size of Ξ£, by |𝑀| = 𝑛
the length of a word 𝑀0 . . . π‘€π‘›βˆ’1 , and by πœ€ the empty word, the only string of length 0. Note that
we index strings from 0. We denote by alph(𝑀) = {π‘Ž | there exists 1 ≀ 𝑖 ≀ |𝑀| : 𝑀𝑖 = π‘Ž} the
set of characters occurring in 𝑀. For two words 𝑒, 𝑣, 𝑒𝑣 denotes their concatenation, given by
𝑒𝑣 = 𝑒0 Β· Β· Β· 𝑒|𝑒|βˆ’1 𝑣0 Β· Β· Β· 𝑣|𝑣|βˆ’1 . If 𝑀 = π‘₯𝑣𝑦, then π‘₯ is called a prefix, 𝑣 a substring, and 𝑦 a suffix
of 𝑀. Given a word 𝑒 and a positive integer 𝑐, 𝑒𝑐 = 𝑒 Β· Β· Β· 𝑒 denotes the 𝑐-fold concatenation of
𝑒. A word 𝑣 is called a power if it can be written as 𝑣 = 𝑒𝑐 for some 𝑐 > 1, otherwise it is called
primitive. Often a sentinel character, denoted $, is appended to mark the end of a string, where
$ does not occur elsewhere in the string, and is assumed to be smaller than all π‘Ž ∈ Ξ£. Clearly,
every word of the form 𝑣$ is primitive.
   Two words 𝑀, 𝑀′ are called conjugates (or rotations) if there exist 𝑒, 𝑣, possibly empty, such that
𝑀 = 𝑒𝑣 and 𝑀′ = 𝑣𝑒. Given a word 𝑀 = 𝑀0 Β· Β· Β· π‘€π‘›βˆ’1 , the 𝑖’th rotation of 𝑀, for 0 ≀ 𝑖 ≀ π‘›βˆ’1, is
𝑀𝑖 Β· Β· Β· π‘€π‘›βˆ’1 𝑀0 Β· Β· Β· π‘€π‘–βˆ’1 . A word of length 𝑛 is primitive if and only if it has 𝑛 distinct conjugates.
The set of all words over Ξ£ is totally ordered by the lexicographic order: Let 𝑣, 𝑀 ∈ Ξ£* , then
𝑣 ≀ 𝑀 if 𝑣 is a prefix of 𝑀, or there exists an index 𝑗 s.t. for all 𝑖 < 𝑗, 𝑣𝑖 = 𝑀𝑖 , and 𝑣𝑗 < 𝑀𝑗
according to the order on Ξ£.
   A run in a word 𝑀 is a maximal substring consisting of the same character. A fully clustered
                                                                                    π‘–π‘Ÿβˆ’1
word is a word 𝑀 with |alph(𝑀)| runs. Given a word 𝑀 = 𝑏𝑖00 𝑏𝑖11 Β· Β· Β· π‘π‘Ÿβˆ’1              with π‘Ÿ runs, we define
its pattern as pat(𝑀) = 𝑏0 𝑏1 Β· Β· Β· π‘π‘Ÿβˆ’1 . For example, the fully clustered word 𝑀 = bbccccca has
pattern pat(𝑀) = bca.
   Let 𝑛 be a positive integer. A permutation πœ‹ of 𝑛 is a bijection from the set(οΈ€{0, . . . , 𝑛 βˆ’ 1} to it-)οΈ€
                                                                                              0   1 ... π‘›βˆ’1
self. A common way to represent permutations is the two-line notation: πœ‹ = πœ‹(0)                  πœ‹(1) ... πœ‹(π‘›βˆ’1) .
(Note that we index permutations from 0.) A cycle of πœ‹ is a minimal subset 𝐢 of the elements
of the permutation such that πœ‹(𝐢) = 𝐢. A cycle of length 1 is called a fixpoint, one of length
2 is called a transposition. It is a basic result on permutations that every permutation can be
uniquely decomposed into disjoint cycles, giving rise to(οΈ€another common                     representation of
permutations, the cycle representation. For example, πœ‹ = 2 3 0 5 4 1 has cycle representation
                                                                         0 1 2 3 4 5
                                                                                     )οΈ€

(0 2)(1 3 5)(4), where (0 2) is a transposition, (1 3 5) is a cycle of length 3, and (4) is a fixpoint.
A permutation that consists of one cycle only is called cyclic. For more details on permutations,
see [27]. Given a word 𝑀, the standard permutation of 𝑀, denoted πœ‹π‘€ , is the permutation defined
by: πœ‹π‘€ (𝑖) < πœ‹π‘€ (𝑗) if and only if either   (οΈ€ 0 1 2𝑀3𝑖 4<5 )︀𝑀𝑗 , or 𝑀𝑖 = 𝑀𝑗 and 𝑖 < 𝑗. For example, the
standard permutation of banana is 3 0 4 1 5 2 = (0 3 1)(2 4 5).
Burrows-Wheeler-Transform. Given a word 𝑣 of length 𝑛, the Burrows-Wheeler-Transform
(BWT) [1] of 𝑣 is the concatenation of the final characters of its rotations in lexicographic order.
This is often visualized using the so-called BW-matrix, consisting of the lexicographically sorted
rotations of 𝑣: bwt(𝑣) is the last column of this matrix, read from top to bottom. For example,
the BWT of banana is nnbaaa. It follows from the definition that two words 𝑒, 𝑣 have the same
BWT if and only if they are conjugates.
    While the BWT is defined for every word, not every word 𝑀 is the BWT of some word.
Likhomanov and Shur have shown that a word is the BWT of some word 𝑣 if and only if
the number of cycles of its standard permutation equals the greatest common divisor of the
runlengths of 𝑀 [21] (see also [20, 23] for earlier, more(οΈ€ restricted)οΈ€ versions of this statement).
For example, the standard permutation of nnbaaa is 04 15 23 30 41 52 = (0 4 1 5 2 3). It has one
cycle and this equals gcd(2, 1, 3); indeed, nnbaaa is the BWT of the word banana. Conversely,
banana’s standard permutation has two cycles, but the gcd of its runlengths is 1, and therefore,
it is not the BWT of any word.

Nice positions and pseudo-cycles. When is a word the BWT of some word of the form 𝑣$,
where $ is the sentinel character? It is clear that such a word must have exactly one occurrence
of $. Thus, it follows from the result of Likhomanov and Shur that its standard permutation
must be cyclic.
    The following question was studied in [22]: Given a word 𝑀, where can $ be inserted into 𝑀
such that the resulting word is the BWT of some 𝑣$? Such positions are called nice, formally:
Given 𝑀 = 𝑀0 Β· Β· Β· π‘€π‘›βˆ’1 , a position 𝑖 is nice, 0 ≀ 𝑖 ≀ 𝑛, if the (𝑛 + 1)-length word 𝑀′ is the
BWT of some word of the form 𝑣$, where 𝑀𝑗′ = 𝑀𝑗 for 𝑗 < 𝑖, 𝑀𝑖′ = $, and 𝑀𝑗′ = π‘€π‘—βˆ’1 for 𝑗 β‰₯ 𝑖.
For example, the word nnbaaa has two nice positions, 1 and 3: n$nbaaa = bwt(abanan$) and
nnb$aaa = bwt(anaban$), while banana has none: nowhere can a $ be inserted to make it
the BWT of some word.
    Let πœ‹ be a permutation of 𝑛. A non-empty subset 𝑆 βŠ† {0, 1, . . . , π‘›βˆ’1} is called a pseudo-cycle
if it can be partitioned into two sets 𝑆left and 𝑆right , where 𝑆left < 𝑆right (elementwise), such that
πœ‹(𝑆) = {π‘₯ βˆ’ 1 | π‘₯ ∈ 𝑆left } βˆͺ 𝑆right . The critical interval of pseudo-cycle 𝑆 is 𝑅𝑆 = [π‘Ž + 1, 𝑏],
where π‘Ž = max 𝑆left and 𝑏 = min 𝑆right . If 𝑆left is empty, we set π‘Ž = βˆ’1, and if 𝑆right is empty,
we set 𝑏 = 𝑛. An example of a pseudo-cycle is given in Fig. 1. When $ is inserted in the
critical interval, 𝑆 becomes a cycle, hence the resulting standard permutation is not cyclic. The
following characterization of nice positions was given in [22]:
Theorem 1 (Thm. 6 in [22]). Let 𝑀 be a word of length 𝑛 over Ξ£, and 0 ≀ 𝑖 ≀ 𝑛. Then 𝑖 is nice if
and only if there is no pseudo-cycle 𝑆 w.r.t. the standard permutation πœ‹π‘€ whose critical interval
contains 𝑖.
   We will denote by 𝑅𝑀 the critical set of 𝑀, the union of the critical intervals of all pseudo-
cycles. By Thm. 1 then 𝑖 is nice if and only if 𝑖 ̸∈ 𝑅𝑀 . We will further say that pseudo-cycle 𝑆
blocks position 𝑖 if 𝑖 ∈ 𝑅𝑆 . We will need one more result from [22]. Recall that here we index
permutations from 0, while in [22], they are indexed from 1.
Lemma 1 (Thm. 5 in [22]). Let 𝑀 be the BWT of some word, and 𝑐 the number of cycles of πœ‹π‘€ .
Then 𝑐 is a nice position.
 0     1      2     3     4      5     6      7              0     1    2        3      4      5    6   7   8
                                                             c     b    c        c      a      $    b   a   a
 c     b      c     c     a      b     a      a




 a     a      a     b     b      c     c     c                     a    a        a                  c   c   c
                                                             $                          b      b
 0     1      2     3     4      5     6     7               0     1    2        3      4      5    6   7   8
                                                                            (οΈ€ 0 1 2 3 4 5 6 7 )οΈ€
Figure 1: Standard permutation of (︀𝑀 = cbccabaa)οΈ€ with πœ‹(𝑀) = 5 3 6 7 0 4 1 2 (on the left) and
of 𝑀′ = cbcca$baa with πœ‹(𝑀′ ) = 06 14 27 38 41 50 65 72 83 (on the right). In particular, the pseudo-cycle
𝑆 = {3, 7} of πœ‹ (with 𝑆left in green, 𝑆right in blue) and its critical interval 𝑅 = [4, 7] (in red) are shown.
Inserting $ in position 5 in the word transforms the pseudo-cycle of πœ‹(𝑀) in a cycle in πœ‹(𝑀′ ).


3. New Results on Pseudo-cycles
In this section we will give a stricter characterization of nice positions via a subset of all possible
pseudo-cycles. We need some new definitions.
Definition 1. Let 𝑆 be a pseudo-cycle, 𝑆 = 𝑆left βˆͺ 𝑆right . If 𝑆left = βˆ… then 𝑆 is called right-only,
if 𝑆right = βˆ… then 𝑆 is called left-only. If 𝑆left ΜΈ= βˆ…, then we refer to π‘Ž = max 𝑆left as the boundary
of 𝑆. A right-only pseudo-cycle is called minimal if no proper subset is a cycle.
Example 1. Consider the word cbccabaa. The pseudo-cycle 𝑆 = {1, 2, 3, 6, 7}, with πœ‹(𝑆) =
{1, 2, 3, 6, 7} is a right-only pseudo-cycle with critical interval 𝑅𝑆 = [0, 1], while 𝑆 β€² = {5}, with
πœ‹(𝑆 β€² ) = {4} is a left-only pseudo-cycle with with boundary 4 and critical interval 𝑅𝑆 β€² = [6, 8].
Finally, 𝑆 β€²β€² = {2, 3, 6, 7}, with πœ‹(𝑆 β€²β€² ) = {1, 2, 6, 7} has boundary 3 and critical interval 𝑅𝑆 β€²β€² =
                                                      β€²β€² = {2, 3} and 𝑆 β€²β€²
[4, 6]. This last pseudo-cycle can be divided in 𝑆left                 right = {6, 7}.

Lemma 2. Let 𝑆 and 𝑆 β€² be two pseudo-cycles with boundary 𝑖. Then 𝑆 ∩ 𝑆 β€² is also a pseudo-cycle
with boundary 𝑖.
Proof. If 𝑆 βŠ† 𝑆 β€² or 𝑆 β€² βŠ† 𝑆, then the statement is trivial. Else, we have to show that (1)
𝑖 ∈ 𝑆 ∩ 𝑆 β€² , and (2) if π‘₯ ∈ 𝑆 ∩ 𝑆 β€² then π‘₯ βˆ’ 1 ∈ πœ‹(𝑆 ∩ 𝑆 β€² ) if π‘₯ ≀ 𝑖, and π‘₯ ∈ πœ‹(𝑆 ∩ 𝑆 β€² ) if π‘₯ > 𝑖. (1)
follows from the fact that both 𝑆 and 𝑆 β€² have boundary 𝑖. Ad (2): Let π‘₯ ∈ 𝑆 ∩ 𝑆 β€² , π‘₯ ≀ 𝑖. Since
𝑆 is a pseudo-cycle with boundary 𝑖, it follows that πœ‹ βˆ’1 (π‘₯ βˆ’ 1) ∈ 𝑆. Similarly, since 𝑆 β€² is a
pseudo-cycle with boundary 𝑖, πœ‹ βˆ’1 (π‘₯ βˆ’ 1) ∈ 𝑆 β€² , thus πœ‹ βˆ’1 (π‘₯ βˆ’ 1) ∈ 𝑆 ∩ 𝑆 β€² . Now let π‘₯ ∈ 𝑆 ∩ 𝑆 β€² ,
π‘₯ > 𝑖. Then it follows that πœ‹ βˆ’1 (π‘₯) ∈ 𝑆 and πœ‹ βˆ’1 (π‘₯) ∈ 𝑆 β€² , thus πœ‹ βˆ’1 (π‘₯) ∈ 𝑆 ∩ 𝑆 β€² .

  Note that not every 𝑖 is the boundary of some pseudo-cycle. But with Lemma 2, we can now
define a unique pseudo-cycle for each 𝑖 which is.
Definition 2. Let 0 ≀ 𝑖 ≀ 𝑛 βˆ’ 1. A pseudo-cycle 𝑆 = 𝑆left βˆͺ 𝑆right , with 𝑆left ΜΈ= βˆ… is called
𝑖-essential if 𝑖 is the boundary of 𝑆 and every pseudo-cycle 𝑆 β€² with boundary 𝑖 is a superset of 𝑆.
A pseudo-cycle is called essential if it is 𝑖-essential for some 𝑖.
   Clearly, for every 𝑖, there exists at most one 𝑖-essential pseudo-cycle. This implies that
altogether there are at most 𝑛 βˆ’ 2 essential pseudo-cycles.
Example 2. Consider again the word cbccabaa. 𝑆 = {3, 7}, with πœ‹(𝑆) = {2, 7} is the 3-
essential pseudo-cycle with critical interval 𝑅 = [4, 7]. In this pseudo-cycle we have 𝑆left = {3}
and 𝑆right = {7}.

Proposition 1. Given an 𝑖-essential pseudo-cycle 𝑆, its critical interval 𝑅𝑆 is maximal w.r.t. the
boundary 𝑖. In other words, if 𝑆 β€² is a pseudo-cycle with the same boundary 𝑖, then 𝑅𝑆 β€² βŠ† 𝑅𝑆 .

Proof. Follows immediately from Lemma 2.

Corollary 1. Let 𝑀 be a word. Then 𝑅𝑀 equals the union of critical intervals of those pseudo-cycles
which are either minimal right-only or essential.

3.1. Algorithm
In this section we will describe an alternative algorithm for finding all nice positions of a given
word 𝑀. Even though this algorithm is slower than the one in [22], it is of interest due to its
greater informativeness w.r.t. pseudo-cycles. The algorithm takes advantage of Corollary 1.
This is useful from a computational point of view, as can be deduced from the next lemma.

Lemma 3. The set of essential pseudo-cycles can be exponentially smaller than the complete set of
pseudo-cycles.

Proof. Let 𝑀 = bβŒˆπ‘›/2βŒ‰ aβŒŠπ‘›/2βŒ‹ , thus |𝑀| = 𝑛. The total number of pseudo-cycles is 2|𝑀|π‘Ž = 2βŒŠπ‘›/2βŒ‹ ,
since every subset of the index set of 𝑀 containing at least one a and excluding the first b is a
pseudo-cycle with 𝑆left and 𝑆right not empty. On the other hand, there are only βŒŠπ‘›/2βŒ‹ essential
pseudo-cycles, namely sets of the form 𝑆 = {𝑖, βŒŠπ‘›/2βŒ‹ + 𝑖} for 𝑖 ∈ {1, 2, 3, . . . , βŒŠπ‘›/2βŒ‹}.

   From Cor. 1 we know that it suffices to compute right-only pseudo-cycles and essential
pseudo-cycles to identify nice positions of a word. However, our experiments show that on 98%
of words with 0 nice positions, all positions are blocked by left-only and right-only pseudo-cycles
(data not shown). Based on experimental evidence from [22] we know that a large fraction of
words have 0 nice positions: e.g. for binary words of length 𝑛 = 20, around 65% of words have
0 nice positions, while over a ternary alphabet, 61% of words of length 20 have 0 nice positions.
Since right-only and left-only pseudo-cycles can be computed in linear time, our algorithm does
this as a first step (Phase 1 and Phase 2). Only if at this point there are still positions which
have not been blocked does it proceed to computing essential pseudo-cycles (Phase 3).
   We next give a description of the three phases of the algorithm.
   Phase 1 consists in computing the cycle decomposition of πœ‹π‘€ , since the cycles of the permu-
tation are exactly the minimal right-only pseudo-cycles.
   Phase 2 starts with a filtering step: It removes from the index set all indices which cannot be
part of any left-only pseudo-cycle. The idea is that 0 cannot be part of a left-only pseudo-cycle
because, since then βˆ’1 would be in the image of the pseudo-cycle. This cannot happen because
both the pseudo-cycle and its image must be a subset of the set of indices {0, 1, . . . , 𝑛 βˆ’ 1}.
By using πœ‹ on 0 we have the information that πœ‹(0) + 1 must not be in the pseudo-cycle itself,
because otherwise it would imply also the previous statement. If we continue to apply πœ‹
iteratively we can block every index connected to 0, avoiding them in the following operations.
 Algorithm 1: Pseudo-cycles finder
  Input: Word 𝑀
  Output: Set of right-only, left-only, and essential pseudo-cycles
1 𝑛 ← |𝑀|
2 πœ‹ ← 𝑠𝑑𝑑𝑃 π‘’π‘Ÿπ‘š(𝑀)
3 πœ‹ βˆ’1 ← 𝑖𝑛𝑣𝑃 π‘’π‘Ÿπ‘š(πœ‹)
4 setRight ← π‘Ÿπ‘–π‘”β„Žπ‘‘π‘ƒ 𝐢𝑦𝑐𝑙𝑒𝑠(πœ‹)                    // cycle decomposition
5 setLeft ← 𝑙𝑒𝑓 𝑑𝑃 𝐢𝑦𝑐𝑙𝑒𝑠(πœ‹, πœ‹ βˆ’1 )       // compute left-only pseudo-cycles
6 if |𝑅𝑀 | = 𝑛 + 1 then       // if 𝑅𝑀 contains all indices, exit
7     return setRight βˆͺ setLeft
8 setEss ← 𝑒𝑠𝑠𝑃 𝐢𝑦𝑐𝑙𝑒𝑠(πœ‹, πœ‹ βˆ’1 )        // compute 𝑖-essential pseudo-cycles
9 return setRight βˆͺ setLeft βˆͺ setEss



With the remaining set of indices the left-only pseudo-cycles are built. We use a procedure
resembling the cycle decomposition of the standard permutation. Starting from position 𝑖, we
traverse the permutation and we insert πœ‹(𝑖) + 1 in 𝑆. From this index we continue until we
loop back to the starting index 𝑖, namely we build a β€œshifted” cycle. This procedure is repeated
until no indices are left to create disjoint left-only pseudo-cycles.
   Phase 3 computes 𝑖-essential pseudo-cycles which have a non-empty right part, since essen-
tial pseudo-cycles which are left-only have already been computed in Phase 2. Let 𝐡 be the set
of indices for which such a pseudo-cycle has already been computed. For each 𝑖 > 0, 𝑖 ̸∈ 𝐡,
the algorithm computes the unique 𝑖-essential pseudo-cycle, if it exists. During this step, the
algorithm traverses the permutation iteratively starting from 𝑗 = πœ‹ βˆ’1 (𝑖 βˆ’ 1) and using this
trajectory:                          ⎧
                                     βŽͺabort
                                     βŽͺ                       if 𝑗 = 0
                                     βŽͺ
                                     βŽ¨πœ‹ βˆ’1 (𝑗 βˆ’ 1)           if 𝑗 < i
                                     βŽͺ
                               𝑗←                                                             (1)
                                     βŽͺ
                                     βŽͺ
                                     βŽͺ
                                         βˆ’1
                                       πœ‹ (𝑗)                 if 𝑗 > i
                                     ⎩stop - ps.-c. closed if 𝑗 = i
                                     βŽͺ

   The algorithm inserts the indices that it touches either in 𝑆left (if the index is on the left of
the boundary) or in 𝑆right (if the index is on the right of the boundary). It stops either because
it returns to the starting point (boundary) during the traversal of the permutation or because
it fails the construction of such pseudo-cycle, ending up in position 0 (which cannot be part
of 𝑆left , otherwise we would have -1 in πœ‹(𝑆)). An example of the construction of 𝑖-essential
pseudo-cycles can be seen in Figure 2.
Proposition 2. Algorithm 1 outputs exactly the essential pseudo-cycles and the minimal right-only
pseudo-cycles of the input word. Therefore, the algorithm computes all pseudo-cycles which are
necessary to identify all nice positions.
Proposition 3. Algorithm 1 runs in π’ͺ(𝑛2 ) time.
  The following example shows that this bound is tight: In fact, there is an infinite number of
words on which the algorithm takes Θ(𝑛2 ) time. The reason is that the running time of Phase
 0      1     2     3     4     5     6      7           0     1     2     3     4     5    6      7
 b      d     c     a     e     b     d      c           b     d     c     a     e     b    d      c




 a      b     b     c     c     d     d      e           a     b     b     c     c    d     d      e
 0      1     2     3     4     5     6      7           0     1     2     3     4    5     6      7



 0      1     2     3     4     5     6      7           0    1     2     3     4     5     6     7
 b      d     c     a     e     b     d      c           b    d     c     a     e     b     d     c




 a      b     b     c     c     d     d      e           a    b     b     c     c     d     d     e
 0      1     2     3     4     5     6      7           0    1     2     3     4     5     6     7


Figure 2: An example for the construction of the 𝑖-essential pseudo-cycle of the word bdcaebdc with
boundary 𝑖 = 3 (Section 3.1). 𝑆left in green, 𝑆right in blue, critical interval in red.


3 is proportional to the sum of the lengths of the essential pseudo-cycles of the input word.
E.g. for 𝜎 = 2 and 𝑛 = 20, consider 𝑀 = bbaaaaaaaaabbbbbbbba: the sum of the sizes of the
essential pseudo-cycles of 𝑀 is 302. We can produce such example words for every even length
by adding an a and a b, respectively, to the first run of a’s and to the second run of b’s.


4. On Nice Positions in Fully Clustered Words
In this section, we present properties of nice positions in fully clustered words. Recall that a fully
clustered word is one which has one run per character, such as cccabb. We will study these
questions for different alphabets, where we restrict our attention to words 𝑀 s.t. alph(𝑀) = Ξ£,
i.e. words in which every character appears at least once. We are interested in the number β„Ž(𝑀)
of nice positions of a word 𝑀, and in the number of words 𝐻𝑛 (π‘˜) which have π‘˜ nice positions,
for given length 𝑛.
   As an example, for 𝜎 = 2, there are 10 fully clustered words of length 6, 6 of which have 1
nice position, 2 have 2 nice positions, and 2 have 3 (see Table 1). Of the 60 fully clustered words
for 𝜎 = 3, 5 have 0 nice positions, 26 have 1, 16 have 2, and 13 have 3 (data not shown).
   First we gather some useful properties using pseudo-cycles:
Lemma 4. Let 𝑀 be any word and πœ‹π‘€ its standard permutation. The following hold:
     1. If 𝑛 βˆ’ 1 is a fixpoint, then no 𝑖 < 𝑛 is nice.
             word       β„Ž(𝑀)    nice positions         word        β„Ž(𝑀)   nice positions
             abbbbb        1    6                      baaaaa         1   1
             aabbbb        1    6                      bbaaaa         3   2, 4, 6
             aaabbb        1    6                      bbbaaa         2   3, 5
             aaaabb        1    6                      bbbbaa         2   2, 4
             aaaaab        1    6                      bbbbba         3   1, 3, 5
Table 1
Nice positions of fully clustered binary words of length 6. We report the number of nice positions and
the nice positions for each word.


   2. If πœ‹(1) = 0, then no 𝑖 > 1 is nice.

   3. 𝑛 is nice if and only if πœ‹π‘€ has no left-only pseudo-cycle.

   4. 0 is not nice.
Proof. 1. If (𝑛 βˆ’ 1) is a fixpoint, then {𝑛 βˆ’ 1} is a right-only pseudo-cycle, so it blocks all
elements from 0 to 𝑛 βˆ’ 1. 2. If πœ‹(1) = 0, then {1} is a left-only pseudo-cycle, blocking any
𝑖 > 1. 3. If 𝑛 is nice, then it cannot be in the critical interval of any pseudo-cycle. But the
pseudo-cycles whose critical interval contains 𝑛 are exactly the left-only pseudo-cycles, since if
there is a non-empty right part, then its minimum must be smaller than 𝑛. 4. The entire set
{0, . . . , 𝑛 βˆ’ 1} is a right-only pseudo-cycle, so its minimum 0 blocks all 𝑖 ≀ 0.

Lemma 5. Let 𝑀 be any word, |𝑀| = 𝑛. If πœ‹π‘€ is the identity permutation, then β„Ž(𝑀) = 1. In
particular, 𝑛 is the only nice position.
Proof. πœ‹π‘€ = (0)(1) Β· Β· Β· (𝑛 βˆ’ 1) (in cycle representation), consisting of 𝑛 fixpoints. Since 𝑛 βˆ’ 1
is a fixpoint, by Lemma 4, no 𝑖 < 𝑛 can be nice. Clearly, πœ‹π‘€ has no left-only pseudo-cycles, so
again by Lemma 4, 𝑛 is nice.

  Since the standard permutation of any word over a unary alphabet is the identity permutation,
these words have exactly one nice position:
Corollary 2. If 𝑀 is a word over an alphabet of size 𝜎 = 1, then β„Ž(𝑀) = 1.

4.1. Number of Nice Positions over Alphabet Size 2
Our first result says that every fully clustered word over a binary alphabet has at least one nice
position.
Proposition 4. If 𝑀 begins with an a, then β„Ž(𝑀) = 1. If 𝑀 begins with a b then β„Ž(𝑀) β‰₯ 1.
Proof. First, let 𝑀 = a𝑖 bπ‘›βˆ’π‘– , for some 1 ≀ 𝑖 ≀ 𝑛 βˆ’ 1. Then πœ‹π‘€ = 𝑖𝑑, thus, by Lemma 5, there is
exactly one nice position, namely 𝑛.
   Now let 𝑀 = b𝑖 aπ‘›βˆ’π‘– , for some 1 ≀ 𝑖 ≀ 𝑛 βˆ’ 1. It is known that a word of this form is the BWT
of a standard word, or of a power of a standard word [20], with the first case if gcd(𝑖, 𝑛 βˆ’ 𝑖) = 1.
By the result of Likhomanov and Shur [21], the number of cycles 𝑐 of πœ‹π‘€ equals the greatest
common divisor of the runlengths, namely 𝑐 = gcd(𝑖, 𝑛 βˆ’ 𝑖). By Lemma 1 then, 𝑐 is nice.
Proposition 5. Let 𝑀 = baπ‘›βˆ’1 . Then β„Ž(𝑀) = 1.

Proof. The standard permutation of 𝑀 is πœ‹π‘€ = (0, 𝑛 βˆ’ 1, 𝑛 βˆ’ 2, . . . , 2, 1), in cycle notation.
By Lemma 1, 1 is nice, since πœ‹π‘€ has one cycle. 0 is not nice by Lemma 4. Every 𝑖 > 0 is a
singleton left-only pseudo-cycle, since πœ‹(𝑖) = 𝑖 βˆ’ 1, while all other pseudo-cycles are subsets
of {1, . . . , 𝑛 βˆ’ 1}, and thus left-only. Every left-only pseudo-cycle {𝑖} is 𝑖-essential, blocking
[𝑖 + 1, 𝑛], i.e. the only position which is nice is 1 (Fig. 3).


  0       1       2      3       4       5         6       0      1      2       3      4         5       6
  b       a       a      a       a       a         a       b      a      a       a      a         a       a




  a       a      a       a       a       a         b       a      a      a      a      a      a       b
  0       1      2       3       4       5         6       0      1      2      3      4      5       6
(a) 𝑆1 = {0, 1, 2, 3, 4, 5, 6},                          (b) 𝑆2 = {1}, πœ‹(𝑆2 ) = {0}, 𝑅2 = [2, 7],
    πœ‹(𝑆1 ) = {0, 1, 2, 3, 4, 5, 6}, 𝑅1 = [0, 0].             boundary 1.
Figure 3: Two pseudo-cycles 𝑆1 , 𝑆2 in the word baaaaaa and their critical intervals 𝑅1 , 𝑅2 , which
block all positions of the word except for 1. (𝑆left in green, 𝑆right in blue, critical intervals in red.)



Proposition 6. Let 𝑀 = b𝑖 aπ‘›βˆ’π‘– where 𝑐 = gcd(𝑖, 𝑛 βˆ’ 𝑖) > 1. Then 𝑀 has at least two nice
positions, namely 𝑐 and 𝑐 + 2.

Proposition 7. Let 𝑀 = b𝑗+1 a𝑗 . Then β„Ž(𝑀) = 2, in particular, 1 and 𝑛 are nice.

Proof. First, since 𝑀 is a BWT, the number of cycles 𝑐 of πœ‹π‘€ is nice, by Lemma 1. Since the
runlengths 𝑗 and 𝑗 + 1 are relatively prime, 𝑐 = 1, so 1 is nice. Let 0 < 𝑖 ≀ 𝑗. The 𝑖-essential
pseudo-cycle is 𝑆𝑖 = {𝑖, 𝑖 + 𝑗}, since πœ‹(𝑖) = 𝑖 + 𝑗 and πœ‹(𝑗) = 𝑖 + 𝑗 βˆ’ (𝑗 + 1) = 𝑖 βˆ’ 1. The
critical interval of 𝑆𝑖 is 𝑅𝑖 = [𝑖+1, 𝑖+𝑗]. The union of these critical intervals is [2, π‘›βˆ’1] βŠ† 𝑅𝑀 ,
since 𝑛 = |𝑀| = 2𝑗 + 1. Finally, no 𝑖-essential pseudo-cycles exist for 𝑖 > 𝑗. Therefore, there
are no left-only pseudo-cycles, and thus, by Lemma 4, 𝑛 is nice (see Figure 4).


4.2. Number of Words with π‘˜ Nice Positions over Alphabet Size 2
We now turn to the number of words with π‘˜ nice positions.

Definition 3. For 𝑛 > 0, π‘˜ β‰₯ 0, let π»π‘›πœŽ (π‘˜) denote the number of fully clustered words with
exactly π‘˜ nice positions over an alphabet of size 𝜎.

   From Prop. 4 it follows that 𝐻𝑛2 (0) = 0, and from Prop. 4 and 5 it follows that 𝐻𝑛2 (1) β‰₯ 𝑛,
since the 𝑛 βˆ’ 1 words beginning with a and the word baπ‘›βˆ’1 have 1 nice position. For a better
understanding of the words with π‘˜ nice positions, we ran experiments on fully clustered words
up to length 100, and studied their pseudo-cycles. These led to the following conjectures:
 0    1    2    3     4    5    6       0    1    2    3    4    5    6     0    1    2    3    4    5    6
 b    b    b    b     a    a    a       b    b    b    b    a    a   a      b    b    b    b    a    a    a




 a    a    a   b     b    b    b        a    a   a    b    b    b    b      a    a    a   b    b    b    b
 0    1    2   3     4    5    6        0    1    2   3    4    5    6      0    1    2   3    4    5    6




(a) 𝑆1 = {1, 4},                       (b) 𝑆2 = {2, 5},                    (c) 𝑆3 = {3, 6},
    πœ‹(𝑆1 ) = {0, 4}, 𝑅1 = [2, 4],          πœ‹(𝑆2 ) = {1, 5}, 𝑅2 = [3, 5],       πœ‹(𝑆3 ) = {2, 6}, 𝑅3 = [4, 6],
    boundary 1.                            boundary 2.                         boundary 3.
Figure 4: Three pseudo-cycles 𝑆1 , 𝑆2 , 𝑆3 of the same form in the word bbbbaaa, and their critical
intervals 𝑅1 , 𝑅2 , 𝑅3 . (𝑆left in green, 𝑆right in blue, critical intervals in red.)



Conjecture 1. For all 𝑛, 𝐻𝑛2 (1) = 𝑛.

Conjecture 2. For 𝑛 even, 𝑛 β‰₯ 8, 𝐻𝑛2 (2) = 0. For 𝑛 odd, 𝐻𝑛2 (2) = 1.

Conjecture 3. For every 𝑛, 𝐻𝑛2 (⌈ 𝑛2 βŒ‰) = 2.

   The two words with βŒˆπ‘›/2βŒ‰ nice positions are bbaπ‘›βˆ’2 and bπ‘›βˆ’1 a. This is because both have
βŒŠπ‘›/2βŒ‹ pseudo-cycles whose critical intervals contain exactly one position each. From [22], we
know that for words that are BWT images, the parity of nice positions is the same as the parity
of the number of cycles. In these two cases, the pseudo-cycles block every position which have
the opposite parity, leaving all the remaining positions available.
   All our conjectures are confirmed by the histograms we produced for words up to length
100 and π‘˜ nice positions, π‘˜ up to 19 (data not shown). Furthermore, the words show a regular
behaviour with distinct π‘˜β€™s. In particular, we can see that the 8 longest words (4 of even length,
4 of odd length) are disconnected from all the others. Moreover, the greater π‘˜, the farther
this group of 8 words is from the others. The first π‘˜ where this group is visible is π‘˜ = 6,
and we identify the 4 words of even length as follows: two primitive words (𝑀1 = b7 a13 ,
𝑀2 = b15 a7 ), two power words (𝑀3 = b14 a6 , 𝑀4 = b8 a14 ); and the 4 primitive words of odd
length (𝑀5 = b8 a15 , 𝑀6 = b16 a7 , 𝑀7 = b9 a16 , 𝑀8 = b17 a8 ).
   These words give an idea of the regularity of 𝐻𝑛2 (π‘˜) across different π‘˜. Consider 𝑀1 = b7 a13 ,
and the same word with 4 more a’s and 2 more b’s, namely 𝑀1β€² = b9 a17 . In both cases there are
|𝑀1 |b βˆ’1 number of pseudo-cycles which have the same form, and starting from the one with the
smallest boundary, they are shifted by one position to the right. Moreover, the critical interval
of each of these pseudo-cycles has length |𝑀1 |𝑏 βˆ’ 1 (in total they cover 2 Β· |𝑀1 |𝑏 βˆ’ 1 positions).
On the other hand, there are |𝑀1 |βˆ’2Β·|𝑀2
                                         1 |𝑏 βˆ’1
                                                 additional pseudo-cycles blocking just one position
each, e.g. 5 and 7 in 𝑀 resp. 𝑀 . For this reason, β„Ž(𝑀1 ) = |𝑀1 | βˆ’ |𝑀1 |βˆ’2Β·|𝑀
                               β€²
                                                                           2
                                                                              1 |𝑏 βˆ’1
                                                                                      βˆ’ 2 Β· |𝑀1 |𝑏 βˆ’ 1 and
                    |𝑀′ |βˆ’2Β·|𝑀′ | βˆ’1
β„Ž(𝑀1β€² ) = |𝑀1β€² | βˆ’ 1 2 1 𝑏 βˆ’ 2 Β· |𝑀1β€² |𝑏 βˆ’ 1, and therefore β„Ž(𝑀1β€² ) = β„Ž(𝑀1 ) + 1. We observed
this phenomenon holds adding iteratively the same number of a’s and b’s (up to length 100).
             π‘˜      3    4    5    6     7    8    9   10    11   12   13   14   15    16
             π‘›π‘˜    10   16   22   28    34   40   46   52    58   64   70   76   82    88
Table 2
Conj. 4: For 𝜎 = 2, there are no binary words of length greater than π‘›π‘˜ with π‘˜ nice positions.


Further, the same happens adding 2 b’s and 4 a’s to 𝑀4 , 𝑀5 , 𝑀7 , and adding 4 b’s and 2 a’s to
𝑀2 , 𝑀3 , 𝑀6 , 𝑀8 . Our experiments suggest that for fixed π‘˜, there are no words with greater length
then those above, see Table 2. On the basis of these observations, we conjecture that the set
β„± 2 (π‘˜) of binary words with π‘˜ nice positions is finite. Formally:

Conjecture 4. For every π‘˜ β‰₯ 3, there exists a length π‘›π‘˜ such that no word of length greater than
π‘›π‘˜ has exactly π‘˜ nice positions.

   Finally, we noticed from our experiments that the smallest nice position is always a divisor of
𝑛. Let 𝑑 divide 𝑛. As we have seen before, 𝑀 = b𝑖 a𝑗 is the BWT of a word 𝑒𝑑 with 𝑒 a standard
word and 𝑑 = gcd(𝑗, 𝑖) [20]. Thus 𝑑 is nice by Lemma 1. The standard permutation of 𝑀 has
𝑑 cycles, and the largest minimum of a cycle is 𝑑 βˆ’ 1, blocking all positions 𝑖 ≀ 𝑑 βˆ’ 1. Thus,
                                                          𝑛
given 𝑛, 𝑑 where 𝑑 is a divisor of 𝑛, e.g. the word b𝑑 a( 𝑑 βˆ’1)𝑑 has smallest nice position 𝑑. We
conjecture that the converse is true also:

Conjecture 5. An integer 𝑑 can occur as a smallest nice position for some fully clustered word of
length 𝑛 if and only if 𝑑 divides 𝑛.


5. Conclusion and Ongoing Work
In this paper we continued the study of a combinatorial problem introduced in [22]: which
are the positions where a dollar can be inserted into a word to make it a BWT (so-called nice
positions). We strengthened the characterization via pseudo-cycles of nice positions given there.
Pseudo-cycles are particular subsets of the indices; here we showed that a much smaller subset
of pseudo-cycles suffices to characterize the set of nice positions of a word.
   We further presented an algorithm that returns the smallest set of pseudo-cycles needed to
identify all nice positions of the input word, and using this algorithm, we studied nice positions
of fully clustered words over a binary alphabet. We presented properties on the number of nice
positions for distinct fully clustered words, and we gave some conjectures on the number of
words with fixed number of nice positions, for which we have experimental evidence.
   As future work we are planning to extend the study to larger alphabets. We already have
extensive experimental data for 𝜎 = 3, and plan to prove related properties as for the binary
alphabet. For example, w.r.t. Conj. 4 we could characterize a consistent phenomenon we observe
in the histograms for the ternary alphabets. For π‘˜ β‰₯ 3 we reach a plateau after a certain length,
which we believe to be strongly related to the conjectured π‘›π‘˜ . Moreover the words appearing
in this plateau seem to be consisting of only two types, namely a𝑖 c𝑗 bβ„“ and c𝑖 b𝑗 aβ„“ .
References
 [1] M. Burrows, D. J. Wheeler, A block-sorting lossless data compression algorithm, Technical
     Report, DIGITAL System Research Center, 1994.
 [2] G. Rosone, M. Sciortino, The Burrows-Wheeler transform between data compression
     and combinatorics on words, in: 9th Conference on Computability in Europe (CiE 2013),
     Springer, 2013, pp. 353–364.
 [3] G. Manzini, An analysis of the Burrows-Wheeler transform, J. ACM 48 (2001) 407–430.
 [4] G. Navarro, Indexing highly repetitive string collections, part I: repetitiveness measures,
     ACM Computing Surveys 54 (2021) 29:1–29:31.
 [5] H. Li, R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform,
     Bioinformatics 26 (2010) 589–595.
 [6] B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Ultrafast and memory-efficient alignment
     of short DNA sequences to the human genome, Genome Biology 10 (2009) R25.
 [7] T. W. Lam, R. Li, A. Tam, S. Wong, E. Wu, S. M. Yiu, High throughput short read alignment
     via bi-directional BWT, in: 2009 IEEE International Conference on Bioinformatics and
     Biomedicine, 2009, pp. 31–36.
 [8] C. Boucher, T. Gagie, A. Kuhnle, B. Langmead, G. Manzini, T. Mun, Prefix-free parsing for
     building big BWTs, Algorithms Mol. Biol. 14 (2019) 13:1–13:15.
 [9] P. Ferragina, G. Manzini, Indexing compressed text, J. ACM 52 (2005) 552–581.
[10] T. Gagie, G. Navarro, N. Prezza, Optimal-time text indexing in BWT-runs bounded space,
     in: Proc. of 39th ACM-SIAM Symposium on Discrete Algorithms (SODA 2018), 2018, pp.
     1459–1477.
[11] M. Crochemore, J. DΓ©sarmΓ©nien, D. Perrin, A note on the Burrows-Wheeler transformation,
     Theor. Comput. Sci. 332 (2005) 567–572.
[12] I. M. Gessel, C. Reutenauer, Counting permutations with given cycle structure and descent
     set, Journal of Combinatorial Theory 64 (1993) 189–215.
[13] S. Mantaci, A. Restivo, G. Rosone, M. Sciortino, L. Versari, Measuring the clustering effect
     of BWT via RLE, Theor. Comput. Sci. 698 (2017) 79–87.
[14] S. Brlek, A. Frosini, I. Mancini, E. Pergola, S. Rinaldi, Burrows-Wheeler transform of words
     defined by morphisms, in: 30th International Workshop on Combinatorial Algorithms
     (IWOCA 2019), volume 11638 of Lecture Notes in Computer Science, Springer, 2019, pp.
     393–404.
[15] S. Giuliani, S. Inenaga, Zs. LiptΓ‘k, N. Prezza, M. Sciortino, A. Toffanello, Novel results on
     the number of runs of the Burrows-Wheeler-Transform, in: Proc. of 47th International
     Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM
     2021), volume 12607 of LNCS, 2021, pp. 249–262.
[16] A. Frosini, I. Mancini, S. Rinaldi, G. Romana, M. Sciortino, Logarithmic equal-letter runs
     for BWT of purely morphic words, in: 26th International Conference on Developments in
     Language Theory (DLT 2022), volume 13257 of Lecture Notes in Computer Science, Springer,
     2022, pp. 139–151.
[17] R. Giancarlo, A. Restivo, M. Sciortino, From first principles to the Burrows and Wheeler
     transform and beyond, via combinatorial optimization, Theoretical Computer Science 387
     (2007) 236–248.
[18] I. M. Gessel, A. Restivo, C. Reutenauer, A bijection between words and multisets of
     necklaces, European Journal of Combinatorics 33 (2012) 1537–1546.
[19] J. W. Daykin, R. Groult, Y. Guesnet, T. Lecroq, A. Lefebvre, M. LΓ©onard, Γ‰. Prieur-Gaston,
     A survey of string orderings and their application to the Burrows–Wheeler transform,
     Theoretical Computer Science 710 (2018) 52–65.
[20] S. Mantaci, A. Restivo, M. Sciortino, Burrows–Wheeler transform and Sturmian words,
     Information Processing Letters 86 (2003) 241–246.
[21] K. M. Likhomanov, A. M. Shur, Two Combinatorial Criteria for BWT Images, in: Proceeding
     of the 6th International Computer Science Symposium in Russia (CSR 2011), 2011, pp.
     385–396.
[22] S. Giuliani, Zs. LiptΓ‘k, F. Masillo, R. Rizzi, When a dollar makes a BWT, Theoretical
     Computer Science 857 (2021) 123–146.
[23] J. Simpson, S. J. Puglisi, Words with simple Burrows-Wheeler Transforms, Electronic
     Journal of Combinatorics 15 (2008).
[24] A. Restivo, G. Rosone, Burrows-Wheeler transform and palindromic richness, Theor.
     Comput. Sci. 410 (2009) 3018–3026.
[25] A. Restivo, G. Rosone, Balancing and clustering of words in the Burrows–Wheeler trans-
     form, Theoretical Computer Science 412 (2011) 3019–3032.
[26] S. Ferenczi, L. Q. Zamboni, Clustering words and interval exchanges, Journal of Integer
     Sequences 16 (2013) 3.
[27] M. BΓ³na, Combinatorics of Permutations, Second Edition, Discrete mathematics and its
     applications, CRC Press, 2012.