<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ordering Regular Languages: A Danger Zone</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanna D'Agostino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Martincigh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Policriti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Udine</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ordering the collection of states of a given automaton starting from an order of the underlying alphabet is a natural move towards a computational treatment of the language accepted by the automaton. Along this path, Wheeler graphs have been recently introduced as an extension/adaptation of the Burrows-Wheeler Transform (the now famous BWT, originally defined on strings) to graphs. These graphs constitute an important data-structure for languages, since they allow a very efficient storage mechanism for the transition function of an automaton, while providing a fast support to all sorts of substring queries. This is possible as a consequence of a property-the so-called path coherence-valid on Wheeler graphs and consisting in an ordering on nodes that “propagates” to (collections of) strings. By looking at a Wheeler graph as an automaton, the ordering on strings corresponds to the co-lexicographic order of the words entering each state. This leads naturally to consider the class of regular languages accepted by Wheeler automata, i.e. the Wheeler languages. It has been shown that, as opposed to the general case, the classic determinization by powerset construction is polynomial on Wheeler languages. As a consequence, most of the classical problems turn out to be “easy”that is, solvable in polynomial time-on Wheeler languages. Moreover, deciding whether a DFA is Wheeler and deciding whether a DFA accepts a Wheeler language is polynomial. Our contribution here is to put an upper bound to easy problems. For instance, whenever we generalize by switching to general NFAs or by not fixing an order of the underlying alphabet, the above mentioned problems become “hard”-that is NP-complete or even PSPACE-complete.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Adding an order to a class of structures (graphs, groups, monoids...) is a
natural move. Moreover, ordering is a basic data-structuring mechanism, strongly
favoring computational manipulations of the considered class.</p>
      <p>
        Over the class of finite automata, order has been added e.g. in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], where
the order must propagate along equally labeled transitions. Much more recently,
Copyright © 2021 for the individual papers by the papers’ authors. Copyright ©
2021 for the volume as a collection by its editors. This volume and its papers are
published under the Creative Commons License Attribution 4.0 International (CC
BY 4.0).
in an effort to find a common denominator to a number of different algorithmic
techniques, Gagie et al. proposed (in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]) a new simple strategy for enforcing
and using order on a given automaton. Starting from an underlying order of
the alphabet, the order on the states (formally given in Definition 1) must: i)
agree with the order of the labels of their incoming edges, and ii) be coherent
on target/source nodes, for pairs of arcs with equal labels. It turns out that
this kind of automata, called Wheeler automata, (a) admit an efficient index
data structure for searching subpaths labeled with a given query pattern, and
(b) enable a representation of the graph in a space proportional to that of the
edges’ labels since the topology can be encoded with just O(1) bits per node
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] (as well as enabling more advanced compression mechanisms, see [
        <xref ref-type="bibr" rid="ref11 ref3">3,11</xref>
        ]).
This is in contrast with the fact that general graphs require a logarithmic (in
the graph’s size) number of bits per edge to be represented, as well as with
recent results showing that in general, the subpath search problem can not be
solved in subquadratic time, unless the strong exponential time hypothesis is
false [
        <xref ref-type="bibr" rid="ref10 ref4 ref5 ref6 ref7">4,5,6,7,10</xref>
        ].
      </p>
      <p>q0
a
d
q1
q4
c
c
c
q2
f
q3
c
f
q5
Fig. 1. A WDFAs A recognizing the language Ld = ac + dc f . Condition (i) of
Definition 1 implies input consistency and induces the partial order q1 &lt; q2; q3 &lt; q4 &lt;
q5. From condition (ii) it follows that (q1; c) (q4; c), thus q2 &lt; q3. Therefore, the
only order that could make A Wheeler is q0 &lt; q1 &lt; q2 &lt; q3 &lt; q4 &lt; q5. The reader can
verify that condition (ii) holds for each pair of equally labeled edges.</p>
      <p>
        In Figure 1 is depicted an example of a Wheeler automaton (see also
Definition 1). Notice that the minimum DFA recognizing L is not input consistent,
hence not Wheeler; we will return on this point later. One is naturally led to
consider the class of Wheeler languages, i.e. the regular languages recognized
by some Wheeler automaton, as well as to raise the question of whether it is
possible to decide, just by looking at a (generic, possibly not Wheeler) DFA,
whether the language it recognizes is Wheeler. Wheeler languages have been
studied extensively in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], where it is shown that we can decide in polynomial
time whether a DFA recognizes a Wheeler language. Unfortunately, trying to
use the same strategy on a NFA language does not work. In fact, in this paper
we show that the problem for NFAs becomes PSPACE-complete.
      </p>
      <p>Consider the example in Figure 2 and confront it with Figure 1. Even tough
apparently we made an insignificant change to the automaton, that is we have
simply swapped character-labels d and b, the new language recognized by the
automaton turns out to be not Wheeler. This example illustrates that the order
of the underlying alphabet plays a significant role when determining whether
an automaton, and even a language, is Wheeler. Notice that swapping d and
b can be seen as considering a different order on : from a c d f to
a d c f . Again a natural question is raised: when presented with an
automaton (or a language) that is not Wheeler, is there a different way to order
the alphabet so that the automaton (language) becomes Wheeler? Being able
to answer in polynomial time to this this question would be very helpful for
applications. Unfortunately, as shown in Section 3, this problem turns out to be
NP-complete even for DFAs.</p>
      <p>
        WNFAs have the following useful property: turning a WNFA into a DFA
using the (classic) powerset construction actually results in a WDFA with at
most twice the number of states of the original WNFA. In other words, the
blow-up of states that we can observe when converting NFAs into DFAs, does
not occur for Wheeler non-deterministic automata. Nevertheless, it is known (see
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) that a blow-up of states can occur even when we switch from the minimum
DFA recognizing a language L to the minimum WDFA recognizing L. As a last
contribution, in this paper we give an answer to a question put forward in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
We provide an algorithm to compute the minimum WDFA starting from the
minimum DFA. Our algorithm works in exponential time in the worst case and,
since the dimension of the output might be exponential in the dimension of the
input, is optimal for the task.
      </p>
      <p>Due to space constraints, we present only the sketches of the proofs of our
results. The full proofs, complete with all the missing details, can be found in
the Appendix.
2</p>
      <p>Wheeler languages
First of all, we fix some notation. Let denote a finite alphabet endowed with
a total order ( ; ). We denote by the set of finite words over , with "
being the empty word, and we extend the order over to the co-lexicographic
order ( ; ), where if and only the reverse of , i.e. read from the
right to the left, precedes lexicographically the reverse of . Given two words
; 2 , we denote by a the property that is a suffix of . For a language
L , we denote by Pref(L) the set of prefixes of strings in L. We denote by
A = (Q; q0; ; F; ) a finite automaton (NFA), with Q as set of states, q0 initial
state, : Q ! 2Q transition function, and F Q final states. The dimension
of A, denoted by jAj, is defined as the number of states of A. An automaton is
deterministic (DFA) if j (q; a)j 1, for all q 2 Q and a 2 . As customary, we
extend to operate on strings as follows: for all q 2 Q, a 2 and 2
(q; ") = fqg;
(q; a) =</p>
      <p>[
v2 (q; )
(v; a):
We denote by L(A) = f 2 : (q0; ) \ F 6= ;g the language accepted by the
automaton A. We make the assumption that every automaton is basic, that is,
every state is reachable from the initial state and every state can reach at least
one final state. Notice that this assumption is not restrictive, since removing
from a NFA every state not reachable from q0 and every state from which is
impossible to reach a final state can be done in linear time and does not change
the accepted language. It immediately follows that:
– there might be only one state without incoming edges, namely q0;
– every word that can be read starting from q0 belongs to Pref(L).
Lastly, given a finite set Q, totally ordered by the relation &lt;, we say that I Q
is an interval if and only if for all q; q0; q00 2 Q with q q0 q00, if q; q00 2 I then
q0 2 I. We denote by I := [qmin; qmax] a generic interval, where qmin (qmax) is
the minimum (maximum) element of I with respect to &lt;.</p>
      <p>
        The class of Wheeler automata has been recently introduced in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. An
automaton in this class has the property that there exists a total order on its
states that is propagated along equally labeled transition. Moreover, the order
must be compatible with the underlying order of the alphabet:
Definition 1 (Wheeler Automaton). A Wheeler NFA (WNFA) A is a NFA
(Q; q0; ; F; ) endowed with a binary relation &lt;, such that: (Q; &lt;) is a linear
order having the initial state q0 as minimum, q0 has no in-going edges, and the
following two (Wheeler) properties are satisfied. Let v1 2
(u2; a2):
(u1; a1) and v2 2
(i) a1 a2 ! v1 &lt; v2
(ii) (a1 = a2 ^ u1 &lt; u2) ! v1
      </p>
      <p>v2.</p>
      <p>A Wheeler DFA (WDFA) is a WNFA in which the cardinality of (u; a) is always
less than or equal to one.</p>
      <p>In Figure 1 is depicted an example of a WDFA.</p>
      <p>Remark 1. A consequence of Wheeler property (i) is that A is input-consistent,
that is all transitions entering a given state u 2 Q have the same label: if
u 2 (v; a) and u 2 (w; b), then a = b. Therefore the function : Q ! that
associate to each state the unique label of its incoming edges is well defined. For
the state q0, the only one without incoming edges, we set (q0) := #.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] it is shown that WDFAs have a property called path coherence: let
A = (Q; q0; ; F; ) be a WDFA according to the order (Q; &lt;). Then for every
interval of states I = [qi; qj ] and for all 2 , the set J of states reachable
starting from any state of I by reading is also an interval. Path coherence
allows us to transfer the order &lt; over the states of Q to the co-lexicographic
order over the words entering the states: for each state q define the set Iq =
f : (q0; ) = q)g. With abuse of notation, we extend to subsets of as
follows: U V , with U; V , if and only if for each 2 U; 2 V
such that f ; g 2= U \ V . Then, two states q and p with Iq 6= Ip satisfy q &lt; p
if and only Iq Ip (again proved in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). An immediate consequence of this fact
is that a WDFA admits an unique order of its states that makes it Wheeler and
this order is univocally determined by the co-lexicographic order of any word
entering its states. This result is important for two different reasons. First of
all, it makes possible to decide in polynomial time whether a DFA is Wheeler:
for each state q, pick a word q entering it and order the states reflecting the
co-lexicographic order of the words { q : q 2 Qg; then check if the order satisfies
the Wheeler conditions. Secondly, it is the key to adapt Myhill-Nerode Theorem
to Wheeler automata. We recall the following defintion.
      </p>
      <p>2
Definition 2 (Myhill-Nerode equivalence). Let L
Given a word , we define the right context of
as</p>
      <p>be a language.
1L := f 2
:</p>
      <p>
        The (classic) Myhill-Nerode Theorem, among many other things, establishes
a bijection between equivalence classes of L and the states of the minimum
DFA recognizing L. This minimum automaton is also unique up to isomorphism
and a similar result, fully proved in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], holds for Wheeler languages as well.
      </p>
      <p>In order to state such an analogous of Myhill-Nerode Theorem for Wheeler
languages, the equivalence L is replaced by the equivalence cL defined below.</p>
      <p>Definition 3. The input consistent, convex refinement
follows. cL if and only if
c of
L</p>
      <p>L is defined as
– L ,
– and
– for all
end with the same character, and 2
2 Pref(L), if min( ; ) max( ; ), then
L</p>
      <p>L .
: : :</p>
      <p>L</p>
      <p>The Myhill-Nerode Theorem for Wheeler languages proves that there exists a
minimum (in the number of states) WDFA recognizing L. As in the classic case,
states of the minimum automaton are, in fact, cL-equivalence classes, this time
consisting of intervals of words. Also such WDFA is unique up to isomorphism.</p>
      <p>
        A further important consequence, especially for testing Wheelerness, is stated
in the following Lemma (again proved in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]).
      </p>
      <p>Lemma 1. A regular language L is Wheeler if and only if all monotone
sequences in (Pref(L); ) become eventually constant modulo L. In other words,
for all sequences ( i)i 0 in Pref(L) with
1
2
: : : i
or
1
2
i
: : :
there exists an n such that h
k, for all h; k</p>
      <p>
        Lemma 1 shows how it is possible to recognize whether a language L is
Wheeler simply by verifying a property on the words of Pref(L): trying to find a
WDFA that recognizes L is no longer needed to decide the Whelerness of L. As
it turns out, we can verify whether the property depicted in Lemma 1 is satisfied
just by looking at any DFA recognizing L, as shown in Theorem 1 (see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]).
Theorem 1. Let A be a DFA such that L = L(A), with initial state q0 and
dimension n = jAj.
      </p>
      <p>L is not Wheeler if and only if there exist ; and in , with a ; , such
that:
6 L and they label paths from q0 to states u and v, respectively;
labels two cycles, one starting from u and one starting from v;</p>
      <p>or ; .</p>
    </sec>
    <sec id="sec-2">
      <title>The length of the words ; and satisfying the above can be bounded:</title>
      <p>4. j j; j j
j j</p>
      <p>n3 + 2n2 + n + 2.</p>
      <p>Since in this work we make an extensive use of Theorem 1, here is a simple
example on how and why it works. Consider the automata depicted in Figure 3.
As shown in Figure 1 and 2, the language Ld is Wheeler whereas Lb is not, as
can be easily proved using Theorem 1. In fact, consider the automaton Ab. By
setting := a, := b and := c, one can verify conditions 1-3 of the theorem
are satisfied. Notice that condition a 6 Lb b follows immediately from the fact
that (q0; a) 6= (q0; b) in the minimum DFA Ab.
c
q1
c
q2
f
q3
f
q3
(a) The minimum DFA Ad recognizing
Ld = ac + dc f .
(b) The minimum DFA Ab recognizing
Lb = ac + bc f .</p>
      <p>Fig. 3. The minimum DFAs recognizing the languages Ld (Wheeler) and Lb (not
Wheeler).</p>
      <p>If we try to transpose the same reasoning to the automaton Ad by setting
= a, = d and = c, condition 3 of Theorem 1 is no longer satisfied. We can
not find 3 words satisfying conditions 1-3 of Theorem 1, therefore Lb is Wheeler.</p>
      <p>
        The polynomial bound given by condition 4 of Theorem 1 allows us to design
an algorithm that decides whether a given DFA recognizes a Wheeler language:
using dynamic programming (see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), it is possible to keep track of all the
relevant paths and cycles inside the DFA and check, in polynomial time, whether
there exists three words satisfying the conditions of the theorem.
      </p>
      <p>Things change if, instead of a DFA, we are given a NFA. Trying to exploit
the same idea used for DFAs does not work: the problem of deciding whether
two words and read by a NFA are Myhill-Nerode equivalent is
PSAPCEcomplete, whereas is polynomial on DFAs (simply compute the minimum DFA
using Hopcroft’s algorithm and check whether the two states reached by and
coincide). Even worse, the straightforward attempt of building the minimum
DFA recognizing the NFA’s language might lead to a blow-up of the sates,
resulting in a exponential time (and exponential space) algorithm.</p>
      <p>We show that the problem of deciding whether a NFA recognizes a Wheeler
language is indeed hard, but doesn’t require exponential time to be solved: the
problem turns out to be PSPACE-complete. To show this, we first need to adapt
Theorem 1 to work on NFAs, as described in the following corollary.
Corollary 1. Let A = (Q; q0; ; F; ) be a NFA of dimension n := jAj. Then
L := L(A) is not Wheeler if and only if there exist three words ; ; , with
a ; , such that
1.
2.</p>
      <p>i 6 L j for all 0 i; j 2n;
labels two cycles, one starting from a state p 2
state r 2 (q0; );
(q0; ) and one from a
or</p>
      <p>; .</p>
    </sec>
    <sec id="sec-3">
      <title>The length of the words ; and satisfying the above can be bounded:</title>
      <p>Proof (Sketch). (=)) Let D be the minimum DFA recognizing the language L,
with initial state q^0. Recall that D might have up to 2n states. From Theorem
1 we know that L is not Wheeler if and only if there exist ^; ^ and ^ satisfying
conditions 1-4 of Theorem 1. Although there is not a perfect correspondence
between cycles in A and cycles in D, whenever we find in D a path ^ (respectively,
^) and a cycle ^ both ending in the same state u^ (v^), we can find in A a path
= ^^i ( = ^^j ) and a cycle 1 = ^k1 ( 2 = ^k2 ) both ending in the same
state u (v), for some i; j; k1; k2 n. Note that the word ^(n+1)k1k2 labels two
cycles starting from u and v. Moreover, we have
j
^(n+1)k1k2
j
(n + 1)j^k1k2 j
(n + 1)j^j
(i + 1)j^j &gt; j^^ij;
and similarly j^(n+1)k1k2 j &gt; j^^j j. Since j^j 2 O(23n), we have j^(n+1)k1k2 j 2
O(n3 23n) and therefore the words ; and := ^(n+1)k1k2 satisfy conditions
2-4 of this corollary. Finally, condition 1 is satisfied for all i; j 0 because when
reading i and j on D, they reach different states.
((=) As we did while proving the opposite direction, given three words ; ;
in A satisfying condition 1-3, we can always find, for some i; j; k 2n, three
words ^ = i, ^ = j and ^ = k in D that almost satisfy condition 1-3 of
Theorem 1. The only requirement that might be missing is that ^ 6 L ^. The
upper bound 2n in condition 1 ensures that this requirement is fulfilled, hence
we can apply Theorem 1 to conclude that L is not Wheeler.</p>
      <p>Despite the fact the the bound in condition 4 has become exponential by
switching to NFAs, it is still possible to check in polynomial space (but
exponential time) whether there are three words ; and satisfying the conditions
of Corollary 1. Thus we can prove the following:
Proposition 1. Given a NFA A, deciding whether the language L := L(A) is
Wheeler is PSPACE-complete.</p>
      <p>Proof (Sketch). To prove that the problem is in PSPACE (=NPSPACE), we
guess (using non-determinism) three words ; and satisfying the conditions
of Corollary 1. Since this words are too long to be stored, we first guess their
length bit by bit and then we guess their characters one by one, starting from the
last one and proceeding from right to left. Meanwhile, for all 2 f ; ; g and
for all state t of A, we follow backwards the edges of A labeled as the character of
that we are guessing. This way, we can compute the set of states from which is
possible to reach t by reading . This allows us to find, for all i; j 2n, the sets
Qi := (q0; i) and Qj := (q0; j ). We can then test whether i 6 L j by
confronting the language accepted by the automaton A with set of initial states
Qi and the language accepted by the automaton A with set of initial states Qi .
This can be done in polynomial space since the problem of deciding whether two
NFAs recognize the same language is known to be PSPACE-complete.</p>
      <p>To prove the hardness of the problem, we show a polynomial reduction from
the universality problem for NFA, i.e. the problem of deciding whether the
language accepted by a NFA A over the alphabet is such that L(A) = .
Let A = (Q; q0; ; F; ) be a NFA and let L := L(A). We can assume, without
loss of generality, that q0 2 F , otherwise A would not accept the empty word and
we could immediately derive that L 6= . Starting from A, we build in constant
time a NFA A00 over the alphabet [ fa; b; cg that recognizes the language
L00 := a (Lc)</p>
      <p>L + b ( + c) ;
where a; b; c are three characters not in and such that a b c (the order of
the characters of is irrelevant in this proof). If we prove that L = if and
only if L00 is Wheeler, the reduction is complete and the thesis follows. Notice
that, since " 2 L, the following property holds:</p>
      <p>L =
() (Lc)</p>
      <p>L = ( + c) :
(1)
Let us prove that L = implies that L00 is not Wheeler. If L = , then by
(1) we have L00 = (a + b)( + c) . The minimum DFA recognizing L00 has only
one cycle, therefore from Theorem 1 it follows that L00 is Wheeler.
We next prove that L00 not Wheeler implies L = . Notice that a 1L00 =
(Lc) L so that, by (1) if L 6= , then a 1L00 = (Lc) L 6= ( + c) . However
we have b 1L00 = (
all n, both a L00 acn+ancd) b, thLu0s0 bacn6 Lh0o0ldb.. MThoerreeofvoerre, tihtecamnobnoetpornoevseedqutheantc,efor
ac
bc
ac2
bc2
acn
bcn
: : :
is not constant modulo
Wheeler.
3</p>
      <p>Generalized Wheelerness</p>
      <sec id="sec-3-1">
        <title>L00 and from Lemma 1 it follows that L(A00) is not</title>
        <p>As we have already pointed out in the introduction, changing the underlying
order of the alphabet might turn a Wheeler language into a not Wheeler one
and vice versa. For instance, consider again the Wheeler languages Ld and the
regular (but not Wheeler) language Lb depicted in Figure 3. If we change the
order of from a c d f to a d c f , the Wheeler language Ld
turns into a non-Wheeler language (isomorphic to Lb under the isomorphism
' between alphabets that fixes characters a; c; f and sends d into '(d) = b).
Hence, by not fixing an apriori order of the alphabet we enlarge the class of
languages.</p>
        <p>Definition 4 (Generalized Wheelerness). A NFA A over the alphabet is
called a Generalized Wheeler Automaton (GWNFA) if and only if there exists
an ordering of the elements of that makes A Wheeler.</p>
        <p>A language L is called generalized Wheeler (for short GW) if and only if there
exists a GWNFA that recognizes L.</p>
        <p>
          Let A be a WDFA. Then, every word that labels a cycle in A is primitive
(see [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]), that is there exists no 6= " and i &gt; 1 such that = i. A direct
consequence of this property is that Wheeler languages form a subclass of star-free
languages, i.e. the class of languages that can be defined by a regular expression
not containing the Kleene star. Since star-free expressions, and thus star-free
languages, are closed under permutations of the alphabet, even GW languages
must be a subclass of star-free languages. Here we show that the inclusion is
strict, therefore GW languages must be studied separately.
        </p>
        <p>Proposition 2. If j j 2, then the set fL : L is GWg is a proper subset
of fL : L is star-freeg.</p>
        <p>Proof. Let a; b be two distinct characters of , and consider the language L =
a(aba) a + ba(aba) b. It is possible to prove that L is star-free, but L is not GW:
consider the sequence ( i)i 2 with 2n = a(aba)n and 2n+1 = ba(aba)n. Since,
for all i, the word i is a prefix of i+1, independently from how a and b are
ordered we have
aaba
baaba
aabaaba
baabaaba
a(aba)i
ba(aba)i
: : :
Moreover, for all i we have a(aba)i 6 L ba(aba)i, since a(aba)i a belongs to L
but ba(aba)i a does not. We can then apply Lemma 1 to conclude that L is not
Wheeler. Since this result does not depend on the order of the alphabet, L is
not GW.</p>
        <p>
          We mentioned that we can decide in polynomial time whether a DFA is
Wheeler. On the contrary, deciding whether a NFA is Wheeler is NP-complete,
even when we bound the outdegree of each state of the NFA to be at most 5
(see [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]). As one may expect, deciding whether a NFA is a GWNFA is not easier
than deciding whether a NFA is Wheeler. In fact, we show in Proposition 3 that
the problem is NP-complete. We actually prove a stronger result in Proposition
4: even deciding whether a DFA is a GWNFA is NP-complete. Since the proof
of Proposition 4 is cumbersome, we decided to present also it weaker version,
i.e. Proposition 3, which shows a more natural reduction from the problem of
deciding whether a NFA is Wheeler. Both proofs can be found in the Appendix.
It is worth noticing that the proof of Proposition 3 can be adapted to work
even on DFAs, hence giving an alternative way to prove that deciding whether a
DFA is a GWNFA is NP-complete. Nonetheless, Proposition 4 is still stronger,
since it also proves that deciding whether a DFA recognizes a GW language is
NP-complete.
        </p>
        <p>Proposition 3 (GWNFA hardness). Let A be a NFA. Deciding whether A
is a GWNFA is NP-complete.</p>
        <p>Proposition 4 (GWDFA and GW languages hardness). Let L be a
language and A be a DFA. Both the problems of deciding whether A is a GWNFA
and deciding whether L is GW are NP-complete.</p>
        <p>
          DFA to WDFA
As discussed in Section 1, a Wheeler automaton can be represented more
compactly than a generic finite automaton. Therefore, if we are given a Wheeler
language L represented as a DFA A that recognizes L, we may be tempted to
look for a WDFA A0 that recognizes the same language to achieve a better, i.e.
more compact, representation. Unfortunately, this approach might not work in
our favor: in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], a family (Lm)m 1 of Wheeler languages with the property that
the dimension of the minimum WDFA recognizing Lm is exponential in the
dimension of the minimum DFA recognizing Lm is presented. Note that we can
always assume that, whenever we are given a DFA or a WDFA, the automaton
is minimum. In fact, both tasks of minimizing a DFA and minimizing a WDFA
can be done in polynomial time (see [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for DFAs and [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] for WDFAs). Here we
answer (positively) to the open question put forward in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] whether there
exists an algorithm to compute the minimum WDFA starting from the minimum
DFA. Our algorithm works in exponential time in the worst case and, since the
dimension of the output might be exponential in the dimension of the input, is
optimal for the task.
        </p>
        <p>Let L = L(A) be the language recognized by the given (minimum) DFA
A. Recall that there is a 1 to 1 correspondence between the cL-classes and the
states of the minimum WDFA recognizing L. First of all, our algorithm identifies
a representative for each cL-class; to be able do this in exponential time, we
first need to put a bound on the length of such representatives.</p>
        <p>Lemma 2. Let A be the minimum DFA recognizing the Wheeler language L =
L(A) over the alphabet , and let C1; :::; Cm be the pairwise distinct equivalence
classes of cL. Then, for each 1 i m, there exists a word i 2 Ci such that
j ij &lt; n + n2, where n := jAj.</p>
        <p>Proof (Sketch). Suppose by contradiction that there exists a class Ci such that
for all 2 Ci it holds j j n + n2, and let 2 Ci be a word of minimum
length. Since A has n states, there must exists a factor of that corresponds to
a cycle in the run of on A. Let be the word obtained by erasing such factor
fmroinmima.liStyinocef iatnfodllowesntdhsaitn t h6ecLsa m,tehsetraetfoeroeftAhe,rweemhuastveexistsLa w.oFrdromththaet
is not Myhill-Nerode equivalent to and that is included (co-lexicographically)
between and . We can further assume that the length of is greater than n2,
hence we can identify a common factor of and that labels two cycles in the
runs of and on A. We can then apply Theorem 1 to conclude that L is not
Wheeler, a contradiction.</p>
        <p>The algorithm works as follow: first of all, it generates the list containing
every word of length less than n + n2 that can be reand+onn2 Awo.rTdhs.isTchaennb,ewdeoonredeinr
exponential time since the list contains at most j j
the list co-lexicographically and we apply the following proposition.
Proposition 5 (Minimum DFA to minimum WDFA). Let A be the
minimum automaton recognizing L = L(A) with jAj = n, over the alphabet with
j j = . Let d := n + n2 and define</p>
        <p>Pref(L) d := f
2 Pref(L) : j j
dg:
Assume that we are given the elements of Pref(L) d in co-lexicographic order,
i.e. Pref(L) d = f 1; :::; kg with i i+1. Then it is possible to build the
minimun WDFA recognizing L in O(k + n2 m log m) time, where m is the
dimension of the output.</p>
        <p>
          Proof (Sketch). From Lemma 2 we know that each cL-class has at least one
representative in Pref(L) d. By considering the list [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]L; : : : ; [ k]L we can spot
such representatives: i represents a cL-class if and only if i and i+1 belong
to different L-classes or they belong to the same class but their last character
is different.
        </p>
        <p>Each representative of a cL-class become a state of our WDFA. To compute
the edges of the automaton, we simply check, for all representative and for all
c 2 , the cL-class of c.
5</p>
        <p>Conclusions
Having an order on the set of states of an automaton that is consistent with a
given order of the underlying alphabet has important implication from a
practical point of view. In fact, deterministic or even non-deterministic finite state
automata enjoying this property—Wheeler automata—can be used to efficiently
analyse the language they accept by standard tools.</p>
        <p>In this work we established a few limitations along the directions that one
can imagine to take in order to extend the nice properties of Wheeler languages.</p>
        <p>More specifically, we proved that adding as a degree of freedom the possibility
to re-order the underlying alphabet produces a significantly more complex
(NPcomplete) class of languages. In addition, the polynomial test we have for testing
whether a language is Wheeler when the latter is presented by a deterministic
automaton, turns out much more complex (PSPACE-complete) if the language
is presented by a non-deterministic one. The picture is completed by explicitly
giving an algorithm that turns a DFA into a Wheeler automaton, whenever this
is possible.</p>
        <p>Proving the above limitations should clarify the role played by apparently
secondary aspects of the definition of Wheeler language. This kind of study
should help to better understand the nature of Wheeler languages, with the
ultimate goal of singling out any feature that may admit some sort of extension.</p>
        <p>Consider, in more general terms, the following open problem: is there a class
of automata A properly extending the class of Wheeler automata and such that
their deterministic equivalent DA is such that jDAj 2 poly(jAj)?</p>
        <p>Appendix
Proof of Corollary 1. Let D = (Q^; q^0; ^; F^; ) be the minimum DFA
recognizing L. Clearly D has at most 2n states.
((=) From condition 2 it follows that Pref(L), so consider the following
list of 2n + 1 states of D:
^(q^0;
0); ^(q^0;
1); : : : ; ^(q^0;
Since D has at most 2n states, there must exist two integers 0 h &lt; k 2n
such that ^(q^0; h) = ^(q^0; k). Therefore k h labels a cycle starting from
^(q^0; h). Similarly, there exist 0 h0 &lt; k0 2n such that k0 h0 labels a cycle
starting from ^(q^0; h0 ). The words
^ :=
^ :=
h
h0
^ := lcm(k h;k0 h0) 2n ;
where the factor 2n in the definition of ^ ensures that j^j; j^j j^j, satisfy
condition 2 of Theorem 1. Condition 1 of Theorem 1 follows automatically from
conditions 1 of this corollary. Lastly, condition 3 of Theorem 1 follows from
conditions 3 of this corollary and the fact that a ; : Thus we can apply
Theorem 1 to conclude that L is not Wheeler.
(=)) Since L = L(D) is not Wheeler, let ^; ^; ^ be as in Theorem 1. The DFA
D has at most 2n states, hence the length of ^ is bounded by the constant
23n + 2 22n + 2n + 2. We have ^^ Pref(L), so let t0 = q0; t1; : : : ; tm be a run
of ^^n over A. We set u := j^j and g := j^j, and consider the list of n + 1 states
tu; tu+g; tu+2g; : : : ; tu+ng = tm
Since A has n states, there must exist two integers 0 h &lt; k n such that
tu+hg = tu+kg. That is, there exists a state p := tu+hg such that p 2 q0; ^^h
and ^k h labels a cycle starting from p. We can repeat the same argument for a
run of ^^n over A to find a state r and two integers h0; k0 such that r 2 (q0; ^^h0 )
and ^k0 h0 labels a cycle starting from r. We can then define the words
:= ^^h
:= ^^h0
:= ^lcm(k h;k0 h0) n
which satisfy the conditions 2 and 3.</p>
        <p>Condition 4 is satisfied since j^j 23n +2 22n +2n +2 and lcm(k h; k0 h0) &lt; n2.
Finally, condition 1 is satisfied for all i; j 0. Indeed, for all l the words ^ and
^^l lead to the same state of D, thus ^</p>
        <p>i = L^^^s^i,l.aSnidmisliamrliyla,rfloyr, a8ljl l9swje saulcsoh hthavaet
^ L ^^l. Since 8i 9si such that</p>
        <p>j = ^^sj , the thesis follows from ^ 6 L ^. tu</p>
        <p>Proof of Proposition 1. First of all we need to prove that the problem
is in PSPACE. We will show instead that its complement is in NPSPACE,
then the thesis follows from Savitch’s Theorem, which states that NPSPACE
= PSPACE, and the fact that PSPACE is closed under complementation. We
prove that checking the conditions in Corollary 1 is in NPSPACE. We can use
non-determinism to guess, bit by bit, the length of ; and and store this
guessed information in three counters u; v; g respectively, using O(n) space for
each. We also need to guess the ending states p; r 2 Q of ; . Then we start
guessing the characters of ; and starting from their last one and proceeding
backwards toward their first one, checking condition 3 of Corollary 1 in constant
space. Whenever we guess a character of (respectively, ; ) we decrease by
one the counter u (v; g), so we know when the guessing stops. If condition 3 is
satisfied, we calculate (we will show later how) the sets (q0; ), (q0; ), (p; ),
and (r; ) and check condition 2, that is, whether p 2 (q0; ), r 2 (q0; ),
p 2 (p; ), and r 2 (r; ). If condition 2 is satisfied, we consider condition 1: for
fixed i; j 2n consider the automata Ai and Aj obtained from the NFA L(A)
by considering as initial states the sets (q0; i), (q0; j ), respectively. Notice
that we have i 6 L j if and only if L(Ai ) 6= L(Aj ) and checking whether
L(Ai ) = L(Aj ) can be done in polynomial space, since deciding whether two
NFAs recognize the same language is a well-known PSPACE-complete problem.</p>
        <p>To conclude the proof, we claim that we are able to calculate in polynomial
space, for all q 2 Q and for all 2 f ; ; g , the set (q; ). While guessing
character by character, we can compute, for each state q, the set Q ;q of the
states from which is possible to reach q reading . To build Q ;q we start from
the set Q0 := fqg and we follow backwards the edges entering q and labeled
as the last character of . We call Q1 this new set of states and we repeat the
process by following backwards the edges entering each state of Q1 and labeled
as the second to last character of . Proceeding inductively, we compute the
sets Q0; Q1; : : : Qu = Q ;q; notice that to calculate Qk+1 we only need Qk and
the k-th to last character of , thus we can update (instead of storing) the set
Qk. We can do the same for and to compute, for each q 2 Q, the sets Q ;q
and Q ;q. Once we have stored, for all q 2 Q and for all 2 f ; ; g, the sets
Q ;q, our claim follows easily. For instance, to compute (q0; ) we build, for
all t 2 Q, the set</p>
        <p>Q(t) := fq 2 Q : t 2 (q;
)g =</p>
        <p>[
p2Q ;t</p>
        <p>Q ;p:</p>
      </sec>
      <sec id="sec-3-2">
        <title>Then we have t 2 (q0; ) if an only if q0 2 Q(t).</title>
        <p>To prove the completeness of the problem, we will show a polynomial
reduction from the universality problem for NFA, i.e. the problem of deciding whether
the language accepted by a NFA A over the alphabet is such that L(A) = .</p>
        <p>Let A = (Q; q0; ; F; ) be a NFA and let L = L(A). We can assume without
loss of generality that q0 2 F , otherwise A would not accept the empty word
and we could immediately derive that L 6= . Let a; b; c be three characters
not in and such that a b c with respect to the lexicographical order
(the order of the characters of is irrelevant in this proof). First, we build the
automaton A0 starting from A by adding an edge (qf ; q0; c) for each final state
qf 2 F , see the top part of Figure 1. Notice that A0 recognizes the language
L0 = L(A0) = (Lc) L, and it is straightforward to prove that L = if and
only if L0 = ( + c) : if L = , let be a word in ( + c) containing n
occurrences of c. Then = 0 c 2 c : : : n 1 c n for some 1; : : : ; n 2 .
Hence 2 ( c) = L0. On the other hand, if L 6= let be a word in
n L. Then c 2= L0.</p>
        <p>A0
; c</p>
        <p>N</p>
        <p>A</p>
        <p>We build a second automaton A00 as depicted in Figure 4. Let L00 = L(A00)
be the language recognized by A00. We claim that L = if and only if L00 is
Wheeler.
(=)) If L = , we have already proved that L0 = ( + c) . Hence we have
L00 = (a + b) ( + c) . The minimum DFA recognizing L00 has only one loop,
therefore by Theorem 1 L00 is Wheeler.
((=) If L 6= , let be a word in n L. Notice that 6= " since we assumed
that " 2 L. Every possible run of over A must lead to a non-accepting state,
hence c 2= L0. This implies that for all i 0 we have a ci c 2= L00 (notice
that the only edge labeled c leaving q0 ends in q0). On the other hand, for all
j 0 we have bcj c 2 L00, hence for all i; j 0 we have aci 6 L00 bcj . Thus
the following monotone sequence in Pref(L00)
ac
bc
acc
bcc
acn
bcn
: : :
is not eventually constant modulo
Wheeler.</p>
      </sec>
      <sec id="sec-3-3">
        <title>L00 . From Lemma 1 it follows that L00 is not</title>
        <p>tu
Proof of Lemma 2. Suppose by contradiction that there exists a class Ci such
that for all 2 Ci it holds j j n + n2, and let 2 Ci be a word of minimum
length. Consider the first n + 1 states q0 = t0; :::; tn of A visited by reading the
first n characters of . Since A has only n states, there must exist 0 i; j n
with i &lt; j such that ti = tj . Let 0 be the prefix of of length i (if i = 0 then
0 = "), let be the factor of of length j i labeling the path ti; :::; tj , and
let be the suffix of such that = 0 . By construction, the words and
mi:n=im a0liteyndofin tihtefoslalomwesstthaatet, h e6ncLce . L . Moreover, from j j &lt; j j and the
Suppose that , the other case being completely symmetrical. Since and
share the same suffix , they end with the same character. This means that
the words and , which are Myhill-Nerode equivalent but not cL equivalent,
were not split into two distinct cL-classes due to input-consistency, therefore
there must exists a word such that and 6 L . Formally, assume
bbyy cdoenfitnriatdioicntioofn thcLa,titfowroaullldwfoorldlosw suchcL th,ata contradictiiotnh.olds L . Then,
Let be a word such that and 6 L . From a ; it follows that
a , so we can write = 0 for some 0 2 . Recall that by construction
= 0 with j 0 j n, hence j j n2. Consider the last n2 + 1 states r0; :::; rn2
of A visited by reading the word , and the last n2 + 1 states p0; :::; pn2 visited
by reading the word . Since A has only n states, there must exist 0 i; j n2
with i &lt; j such that (ri; pi) = (rj ; pj ). Notice that it can’t be ri = pi, otherwise
from the determinism of A it would follow rn2 = pn2 ; from the minimality of A
iLtewto u00ldbethtehne fsoulffilowx of Lof l,enagctohntnr2adicjt,ioann.d let be the factor of of length
j i labeling the path ri; :::; rj . Since j j n2, there exists 0 2 such that
= 0 00. We can then rewrite ; and as
Let k be an integer such that j kj is greater than j 0 0j and j 0 0j. Set := 0 0;
from it follows that 0 0 0 0. If k set := 0 0,
otherwise set := 0 0. In both cases, the hypothesis of Theorem 1 are satisfied,
since k labels two cycles starting from the states ri and pi, that we have proved
to be distinct. We can conclude that L is not Wheeler, a contradiction, and the
thesis follows. tu
Proof of Proposition 5. Consider the pairwise distinct equivalence classes
C1; : : : ; Cm of the equivalence cL. Clearly, the minimum Wheeler automaton
recognizing L has m states. We can assume without loss of generality that the
equivalence classes are co-lexicographically ordered, i.e. Ci Ci+1 for all i &lt; m.
For sake of simplicity, given a word 2 we will write [ ] to indicate its
equivalence class modulo L, and we will write [ ]W to indicate its equivalence
class modulo cL.</p>
        <p>
          Let K := f1; :::; kg and, for all i, let Ki := fj 2 K : [ j ]W = Cig. Since
each Ci is convex in Pref(L), each Ki must be convex in K, that is each Ki
is an interval. Therefore the list of equivalence classes [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]W ; :::; [ k]W must be
partitioned in consecutive runs of the same class, each class appearing in one and
only one run. From Lemma 2 we know that each equivalence class has at least
one representative in Pref(L) d, hence the list [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]W ; :::; [ k]W must contain
exactly m runs.
        </p>
        <p>
          For all 1 j &lt; k, we have [ j ]W 6= [ j+1]W if and only if
where last( ) denotes the last character of . This means that we can identify
the m runs of the equivalence cL just by looking at the two lists 1; :::; k and
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]; :::; [ k]: whenever last( j ) 6= last( j+1) or [ j ] 6= [ j+1], we know that a
new cL run must start at j+1.
        </p>
        <p>In O(k) we are able to determine the m runs and to pick a representative for
each of them, i.e. we can find m indexes i1; :::; im such that for all 1 j m
it holds ij 2 Ci. We call the set fai1 ; :::; aim g a fingerprint of the language L,
i.e. a set of words that has cardinality m such that distinct elements of the set
belong to distinct cL-classes.</p>
        <p>We show how to build the minimum Wheeler DFA recognizing L, starting
from any fingerprint of L and the standard minimum DFA recognizing L. Let
f 1; :::; mg be a fingerprint of L and let A be the minimum DFA recognizing
L. We can assume without loss of generality that 1 ::: m. We build the
automaton AW = (Q; 1; ; F; ), where the set of states is Q = f 1; :::; mg
and the set of final states is F = f j : j 2 Lg. The transition function can
be computed as follow. For all 1 j m and for all c 2 , check whether
j c 2 Pref(L). If j c 2= Pref(L), there are no edges labeled c that exit from
j . If instead j c 2 Pref(L), locate j c using a binary search. There are three
possible cases.
1. j c 1. Then ( j ; c) = 1.
2. m j c. Then ( j ; c) = m.
3. There exists s such that s j c s+1. It can not be the case that both
j c 6 L s and j c 6 L s+1, since f 1; :::; mg is a fingerprint of L. Hence
we distinguish three cases.
(a)
(b)
(c)
s L j c 6 L s+1. Then ( j ; c) = s.
s 6 L j c L s+1. Then ( j ; c) = s+1.</p>
        <p>s L j c L s+1. Since f 1; :::; mg is a fingerprint of L, it is either
c = last( j c) = last( s), in which case ( j ; c) = s, or c = last( s+1),
in which case ( j ; c) = s+1.</p>
        <p>Proof of Proposition 3. The problem is in NP, since we can use non-determinism
to guess the order of the alphabet and then check whether such order makes the
NFA Wheeler.</p>
        <p>To prove the hardness, we show a polynomial reduction from the problem
of deciding whether a NFA is Wheeler. Let A be a NFA with initial state q0,
over the alphabet = fa1 : : : ; a g ordered by the relation a1 a . We
want to build a new automaton A0 such that A0 is a GWNFA if and only if A
is Wheeler. A0 will be an automaton of size jAj + O( ) over the alphabet 0 of
size O( ).</p>
        <p>The automaton A0 will be built starting from A and adding extra states and
transitions. We define the new alphabet as
0 = fa1; : : : ; a ; x1; : : : ; x
1; e; f g;
with xi; e; f 2= , and we add two final states qe and qf . We then build 1
gadgets, one for each pair of consecutive characters (ai; ai+1) of , each one
connected to A[fqe; qf g as depicted in Figure 5. This completes the construction
of the automaton A0. Notice that A is input-consistent, but in general it is not
deterministic.</p>
        <p>
          We want to show that A is Wheeler according to the order ( ; ) if and only
if A0 is a GWNFA.
(=)) Define the order 0 over 0 by setting
We show that A0 is Wheeler according to ( 0; 0) by ordering its states. Since
A is Wheeler, there already exists an order of its states that makes A Wheeler.
Therefore, we simply need to extend this order to the states of A0. Recall that
in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] it has been proved that a WNFA is Wheeler if and only if, for each pair of
states q; p such that Iq 6= Ip, either Iq Ip or Ip Iq holds, where by definition
Iq Ip if and only if, for all 2 Iq and for all 2 Ip such that f ; g 6 Iq \ Ip,
we have . Therefore we check, for each pair of states q and p in A0, that
either Iq Ip (implying q &lt; p) or Ip Iq (implying p &lt; q) holds. Note that
when Iq and Ip are disjoint, the condition Iq Ip translates to the following:
for all 2 Iq and for all 2 Ip, . In the discussion that follows we never
compare two states that belong to A, since the order between them is already
established.
        </p>
        <p>First of all, we will order, for each i, the states with incoming edge xi. Since
xi 2= , the only states to compare are the one belonging to the gadget Gi, i.e.
qi4; qi5; qi6; qi7. Consider the languages</p>
        <p>I4 := Iq4 = xi(xiaixi)</p>
        <p>i
I5 := Iq5 = ai+1xi(xiaixi)</p>
        <p>i
I6 := Iq6 = xixi(aixixi)</p>
        <p>i
I7 := Iq7 = ai+1xixi(aixixi) :</p>
        <p>i
Since ai+1 0 xi we have I4; I5 I6; I7. Moreover, consider any two words
2 I4 and 2 I5. If j j &lt; j j, we have a hence . If j j j j
instead, then the last j j characters of must be aixi(xiaixi)m, where m is such
that = ai+1xi(xiaixi)m. From ai 0 ai+1 we still get , hence I4 I5.
Similarly we can prove that I6 I7. It immediately follows that we need to
order the states as follows: qi4 &lt; qi5 &lt; qi6 &lt; qi7.</p>
        <p>Secondly, for each 1 i we need to sort the states with incoming edges
labeled ai. Note that the automaton A might contain states with incoming label
ai and we need to consider such states as well. For i = , the task is easy. The
only gadget with a state labeled a is G 1, and such state is q3 1. Notice that
Iq3 1 = fa g. Let q be a state of A with (q) = a . Since every word in Iq ends
with a , it trivially follows that Iq3 1 Iq. If Iq 6= fa g, we are forced to set
q3 1 &lt; q. If instead Iq = fa3 g1,&lt;thqe. order of q and q3 1 does not matter. For
sake of consistency, we set q
For i = 1, the only gadget with states labeled a1 is G1, and such states are q11
and q12. Let q be a state of A with (q) = a1 and consider the languages
I1 := Iq1 = x1x1a1(x1x1a1)</p>
        <p>1
I2 := Iq2 = a2x1x1a1(x1x1a1) :</p>
        <p>1</p>
        <p>For all 2 Iq, we have either = a1 or = 0aja1 for some 0 2 and some
1 j . In both cases, must precede co-lexicographically every word of I1
and I2, thus Iq I1; I2. To compare I1 and I2, consider any two words 2 I1
and 2 I2. If j j &lt; j j, we have a hence . If j j j j instead, then
the last j j characters of must be a1x1x1a1(x1x1a1)m, where m is such that
= a2x1x1a1(x1x1a1)m. From a1 &lt;0 a2 we still get , hence I1 I2. It
follows that q11 and q12 must follow q (and all the other states of A with label
a1), and we must set q11 &lt; q12.</p>
        <p>Lastly, if 1 &lt; i &lt; we should compare the sates qi3 1; qi1 and qi2 with the sates
in Qi := fq 2 A : (q) = aig. Applying both of the reasoning discussed in
the cases i = 1 and i = , we can conclude that the state qi3 1 precedes all the
states in Qi and that the states qi1 and qi2 follow all the states in Qi and must
be ordered as qi1 &lt; qi2.</p>
        <p>The order of the states of A0 that we described makes A0 Wheeler with
respect to ( 0; 0), hence making it a GWNFA.
((=) If A0 is a GWNFA, then there exists an order 0 over 0 that makes A0
Wheeler. Since A is a sub-automaton of A0, it follows that even A is Wheeler
according to 0. Let e be the restriction of 0 over the alphabet ; we want to
show that e is the same order as . Assume by contradiction that e 6= . If for
all 1 i &lt; we have ai e ai+1, then e = , a contradiction. Hence there exists
1 i &lt; such that ai+1 e ai. Since 0 extends e , this implies that ai+1 0 ai.
We will show that A0 is not Wheeler according to 0, a contradiction. Define
the words := xi; := ai+1xi and := xiaixi. From a and ai+1 0 ai
we have ; . The word labels two cycles in A0 starting from two distinct
states, i.e. qi4 and qi5. Moreover, and label two paths that start from the
initial state q0 and end in qi4 and qi5 respectively. Since qi4 and qi5 are not
MyhillNerode equivalent, we can apply Theorem 1 to conclude that A0 is not Wheeler
according to 0, a contradiction. Thus e and coincide.</p>
        <p>We have shown that A is Wheeler according to 0, and that 0 extends .
Therefore we can conclude that A is Wheeler according to . tu
Definition 5 (Betweenness). Input: a list Y of n distinct elements Y =
y1; :::; yn and k &lt; n3 ordered triples (a1; b1; c1); :::; (ak; bk; ck), where each
element of each triple belongs to Y . Elements belonging to the same triple are
distinct.</p>
        <p>Output: yes/no answer. The answer is “yes” if and only if there exists a total
order &lt; of Y such that, for each k, either ak &lt; bk &lt; ck or ak &gt; bk &gt; ck.
Proof of Proposition 4. We can prove that both problems are in NP using an
argument similar to the one employed in the proof of Proposition 3.</p>
        <p>To prove the hardness, we show a polynomial reduction from the
betweenness problem to both of the problems described; we will use exactly the same
reduction for both problems. We start from an instance I = (Y; K) of the
betweenness problem, where Y is the set Y = fy1 : : : ; yng and K P(T 3) is a
collection of k triples (a1; b1; c1); : : : ; (ak; bk; ck), for some 1 &lt; k &lt; n3. We build
a DFA A of size O(n + k), over an alphabet of size O(n + k). The alphabet is
= Y [ fx1 : : : ; xk; e; f g, where we introduce a new character xi for each triple
(ai; bi; ci) 2 K and two extra “ending” characters e and f . To build G, we start
with the initial state q0 connected with n states q1; : : : ; qn through the edges
(q0; yj ; qj ) for each 1 j n. We also add two sinks qe and qf , the only final
states. We add the states qi1j ; qi3j ; qi5j (see Figure 6) and the transitions
ci =
ym
qj
qm
qi5j
xi
xi</p>
        <p>xi
qi6m
xi
bi
qi1j
qi2m
bi
xi
qi3j
e
f
xi
qi4m
qe
qf</p>
        <p>Fig. 6. The gadget Gi related to the i-th triple.
(qj ; xi) = qi1j ;
(qi1j ; xi) = qi3j ;
(qi3j ; bi) = qi5j ;
(qi5j ; xi) = qi1j
and (qi1j ; e) = qe. We repeat the same process with ci: given the integer m such
that ci = ym, we add the states qi2m; qi4m; qi6m and the transitions
(qm; xi) = qi2m;
(qi2m; xi) = qi4m;
(qi4m; bi) = qi6m;
(qi6m; xi) = qi2m
and (qi2m; f ) = qf : Lastly, we remove the states among q1; : : : ; qn that don’t
have outgoing edges. More formally, we define the sets A := fa1; : : : ; akg and
C := fc1; : : : ; ckg and we remove from G all the states qj such that yj 2= A [ C.
We show that the instance I = (Y; K) of the betweennes problem is satisfiable
if and only if A is a GWNFA, if and only if L(A) is GW.
(=)) Since I = (Y; K) is satisfiable, there exists an ordering : Y ! f1; :::; ng
of the elements of Y satisfying I. We order as follows:
1(1)
:::
1(n)
x1
xk
e
f:
This ordering induce a partial order on the states of A, where states with
different incoming labels are ordered by such labels. Therefore we only need to order
the states of A with the same incoming label.</p>
        <p>For each 1 i k, the only states of A with incoming label xi are qi1j; qi3j; qi2m; qi4m,
where j and m are integers such that ai = yj and ci = ym. Since, by
construction, the order satisfies the instance I, then only two cases can occur:
either (ai) &lt; (bi) &lt; (ci), or (ci) &lt; (bi) &lt; (ai). In the first case, we set
qi1j &lt; qi2m &lt; qi3j &lt; qi4m. To realize that this is in fact the correct order of the
states, consider the following languages:</p>
        <p>I1 := Iqi1j = f 2</p>
        <p>: (q0; ) = qi1jg = aixi(xibixi)
I2 := Iqi2m = cixi(xibixi)
I3 := Iqi3j = aixixi(bixixi)</p>
        <p>I4 := Iqi4m = cixixi(bixixi) :
Since, by construction, we have ai; bi; ci xi, it follows that I1; I2 I3; I4.
Moreover, from (ai) &lt; (bi) &lt; (ci) we also have that I1 I2 and I3 I4,
which completes the ordering. Symmetrically, if (ci) &lt; (bi) &lt; (ai) then we
set qi2m &lt; qi1j &lt; qi4m &lt; qi3j.</p>
        <p>We still need to order the states of A whose incoming labels belong to Y . For
each 1 p n, the states with incoming label yp belong to the sets V5 := fqi5j :
bi = ypg or V6 := fqi6m : bi = ypg or Vp, where Vp = fqpg if qp is a state of A
(i.e. if yp 2 A [ C) and Vp = ; otherwise. If Vp = fqpg, we set qp as the smallest
state. We then sort the states of V5 and V6 by their first subscript; when the first
subscript is equal, i.e. the two states that we want to confront are qi5j and qi6m
(with ai = yj and ci = ym), then we set qi5j &lt; qi6m if (ai) &lt; (bi) &lt; (ci) and
we set qi6m &lt; qi5j if (ci) &lt; (bi) &lt; (ai). This can be deduced by confronting
the following languages:</p>
        <p>I5 := Iqi5j = aixixibi(xixibi)</p>
        <p>I6 := Iqi60m = ci0 xi0 xi0 bi0 (xi0 xi0 bi0 ) :
If i &lt; i0 then, by construction, we have xi xi0 , hence I5 I6. Symmetrically,
if i0 &lt; i then we have I6 I5. Lastly, if i = i0 then the order between I5 and I6
is determined solely by ai and ci0 = ci.</p>
        <p>Since there is only one state with incoming label e and only one state with
incoming label f , we have finished. The order described makes A Wheeler, thus
A is a GWNFA and L(A) is GW.
((=) Assume that the instance I = (Y; K) of the betweenness problem is
unsatisfiable. We prove that L(A) is not GW. Assume by contradiction that L(A)
is GW, then there exists an ordering 0 of the elements of such that L(A)
is Wheeler according to said order. Recall that Y and consider the
order := 0jY . Since (Y; K) is unsatisfiable, must violate one of the
constraints, i.e. there exists an 1 i k such that either (ai); (ci) &lt; (bi) or
(bi) &lt; (ai); (ci). Define the words := aixi; := cixi and := xibixi; then
it is either ; or ; (here the co-lexicographic order is calculated
with respect to 0). By construction, labels two cycles in A starting from two
distinct states, qi1j and qi2m, which are not Myhill-Nerode equivalent. Moreover,
and label two paths that start from the initial state q0 and end in qi1j and
qi2m respectively. We can then apply the Theorem 1 to conclude that L(A) is
not Wheeler according to 0, a contradiction. Therefore L(A) is not GW, which
automatically implies that A is not a GWNFA. tu</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alanko</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>D'Agostino</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Policriti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prezza</surname>
          </string-name>
          , N.:
          <article-title>Regular languages meet prefix sorting</article-title>
          .
          <source>In: Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms</source>
          . pp.
          <fpage>911</fpage>
          -
          <lpage>930</lpage>
          (
          <year>2020</year>
          ). https://doi.org/10.1137/1.9781611975994.55, https: //epubs.siam.org/doi/abs/10.1137/1.9781611975994.55
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alanko</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>D'Agostino</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Policriti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prezza</surname>
          </string-name>
          , N.:
          <article-title>Wheeler Languages</article-title>
          . CoRR arXiv:
          <year>2002</year>
          .
          <volume>10303</volume>
          (
          <year>Feb 2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Alanko</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gagie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navarro</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seelbach</surname>
            <given-names>Benkner</given-names>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          :
          <article-title>Tunneling on wheeler graphs</article-title>
          .
          <source>In: 2019 Data Compression Conference (DCC)</source>
          . pp.
          <fpage>122</fpage>
          -
          <lpage>131</lpage>
          (
          <year>2019</year>
          ). https://doi.org/10.1109/DCC.
          <year>2019</year>
          .00020
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Backurs</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Indyk</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Which regular expression patterns are hard to match?</article-title>
          <source>In: 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)</source>
          . pp.
          <fpage>457</fpage>
          -
          <lpage>466</lpage>
          (
          <year>2016</year>
          ). https://doi.org/10.1109/FOCS.
          <year>2016</year>
          .56
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Equi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grossi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makinen</surname>
          </string-name>
          , V.:
          <article-title>On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree</article-title>
          .
          <source>In: ICALP 2019 - 46th International Colloquium on Automata, Languages and Programming</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . Patras,
          <string-name>
            <surname>Greece</surname>
          </string-name>
          (Jul
          <year>2019</year>
          ), https://hal.inria.fr/hal-02338498
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Equi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mäkinen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomescu</surname>
            ,
            <given-names>A.I.</given-names>
          </string-name>
          :
          <article-title>Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless seth fails</article-title>
          . In: Bureš,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Dondi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Gamper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Guerrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Jurdziński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Pahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Sikora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Wong</surname>
          </string-name>
          , P.W. (eds.) SOFSEM 2021:
          <article-title>Theory and Practice of Computer Science</article-title>
          . pp.
          <fpage>608</fpage>
          -
          <lpage>622</lpage>
          . Springer International Publishing,
          <string-name>
            <surname>Cham</surname>
          </string-name>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gibney</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoppenworth</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thankachan</surname>
            ,
            <given-names>S.V.</given-names>
          </string-name>
          :
          <article-title>Simple reductions from formula-sat to pattern matching on labeled graphs and subtree isomorphism</article-title>
          . In: Le,
          <string-name>
            <given-names>H.V.</given-names>
            ,
            <surname>King</surname>
          </string-name>
          , V. (eds.) 4th Symposium on Simplicity in Algorithms, SOSA 2021, Virtual Conference,
          <source>January 11-12</source>
          ,
          <year>2021</year>
          . pp.
          <fpage>232</fpage>
          -
          <lpage>242</lpage>
          . SIAM (
          <year>2021</year>
          ). https://doi.org/10.1137/1.9781611976496.26, https://doi.org/10.1137/ 1.9781611976496.26
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gibney</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thankachan</surname>
            ,
            <given-names>S.V.</given-names>
          </string-name>
          :
          <article-title>On the hardness and inapproximability of recognizing wheeler graphs</article-title>
          .
          <source>In: 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11</source>
          ,
          <year>2019</year>
          , Munich/Garching, Germany. pp.
          <volume>51</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>51</lpage>
          :
          <fpage>16</fpage>
          (
          <year>2019</year>
          ). https://doi.org/10.4230/LIPIcs.ESA.
          <year>2019</year>
          .
          <volume>51</volume>
          , https://doi.org/10.4230/ LIPIcs.ESA.
          <year>2019</year>
          .51
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hopcroft</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          :
          <article-title>An n log n algorithm for minimizing states in a finite automaton</article-title>
          .
          <source>Tech. rep.</source>
          , Stanford University (
          <year>January 1971</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Potechin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shallit</surname>
          </string-name>
          , J.:
          <article-title>Lengths of words accepted by nondeterministic finite automata</article-title>
          .
          <source>Information Processing Letters</source>
          <volume>162</volume>
          ,
          <issue>105993</issue>
          (
          <year>2020</year>
          ). https://doi.org/https://doi.org/10.1016/j.ipl.
          <year>2020</year>
          .
          <volume>105993</volume>
          , https: //www.sciencedirect.com/science/article/pii/S0020019020300806
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Prezza</surname>
          </string-name>
          , N.:
          <article-title>On locating paths in compressed tries</article-title>
          .
          <source>In: Proceedings of the ThirtySecond Annual ACM-SIAM Symposium on Discrete Algorithms</source>
          . p.
          <fpage>744</fpage>
          -
          <lpage>760</lpage>
          . Society for Industrial and Applied Mathematics, USA (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Shyr</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thierrin</surname>
          </string-name>
          , G.:
          <article-title>Ordered automata and associated languages</article-title>
          .
          <source>Tamkang J. Math (5)</source>
          ,
          <fpage>9</fpage>
          -
          <lpage>20</lpage>
          (
          <year>1974</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Travis</surname>
            <given-names>Gagie</given-names>
          </string-name>
          , Giovanni Manzini e Sirén, J.:
          <article-title>Wheeler graphs: A framework for bwtbased data structures</article-title>
          .
          <source>Theoretical computer science 698</source>
          ,
          <fpage>67</fpage>
          -
          <lpage>78</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>