-

Ordering Regular Languages: A Danger Zone

Giovanna D'Agostino

Davide Martincigh

Alberto Policriti

0 0 University of Udine , Italy

Ordering the collection of states of a given automaton starting from an order of the underlying alphabet is a natural move towards a computational treatment of the language accepted by the automaton. Along this path, Wheeler graphs have been recently introduced as an extension/adaptation of the Burrows-Wheeler Transform (the now famous BWT, originally defined on strings) to graphs. These graphs constitute an important data-structure for languages, since they allow a very efficient storage mechanism for the transition function of an automaton, while providing a fast support to all sorts of substring queries. This is possible as a consequence of a property-the so-called path coherence-valid on Wheeler graphs and consisting in an ordering on nodes that “propagates” to (collections of) strings. By looking at a Wheeler graph as an automaton, the ordering on strings corresponds to the co-lexicographic order of the words entering each state. This leads naturally to consider the class of regular languages accepted by Wheeler automata, i.e. the Wheeler languages. It has been shown that, as opposed to the general case, the classic determinization by powerset construction is polynomial on Wheeler languages. As a consequence, most of the classical problems turn out to be “easy”that is, solvable in polynomial time-on Wheeler languages. Moreover, deciding whether a DFA is Wheeler and deciding whether a DFA accepts a Wheeler language is polynomial. Our contribution here is to put an upper bound to easy problems. For instance, whenever we generalize by switching to general NFAs or by not fixing an order of the underlying alphabet, the above mentioned problems become “hard”-that is NP-complete or even PSPACE-complete.

Adding an order to a class of structures (graphs, groups, monoids...) is a natural move. Moreover, ordering is a basic data-structuring mechanism, strongly favoring computational manipulations of the considered class.

Over the class of finite automata, order has been added e.g. in [ 12 ], where the order must propagate along equally labeled transitions. Much more recently, Copyright © 2021 for the individual papers by the papers’ authors. Copyright © 2021 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). in an effort to find a common denominator to a number of different algorithmic techniques, Gagie et al. proposed (in [ 13 ]) a new simple strategy for enforcing and using order on a given automaton. Starting from an underlying order of the alphabet, the order on the states (formally given in Definition 1) must: i) agree with the order of the labels of their incoming edges, and ii) be coherent on target/source nodes, for pairs of arcs with equal labels. It turns out that this kind of automata, called Wheeler automata, (a) admit an efficient index data structure for searching subpaths labeled with a given query pattern, and (b) enable a representation of the graph in a space proportional to that of the edges’ labels since the topology can be encoded with just O(1) bits per node [ 13 ] (as well as enabling more advanced compression mechanisms, see [ 3,11 ]). This is in contrast with the fact that general graphs require a logarithmic (in the graph’s size) number of bits per edge to be represented, as well as with recent results showing that in general, the subpath search problem can not be solved in subquadratic time, unless the strong exponential time hypothesis is false [ 4,5,6,7,10 ].

q0 a d q1 q4 c c c q2 f q3 c f q5 Fig. 1. A WDFAs A recognizing the language Ld = ac + dc f . Condition (i) of Definition 1 implies input consistency and induces the partial order q1 < q2; q3 < q4 < q5. From condition (ii) it follows that (q1; c) (q4; c), thus q2 < q3. Therefore, the only order that could make A Wheeler is q0 < q1 < q2 < q3 < q4 < q5. The reader can verify that condition (ii) holds for each pair of equally labeled edges.

In Figure 1 is depicted an example of a Wheeler automaton (see also Definition 1). Notice that the minimum DFA recognizing L is not input consistent, hence not Wheeler; we will return on this point later. One is naturally led to consider the class of Wheeler languages, i.e. the regular languages recognized by some Wheeler automaton, as well as to raise the question of whether it is possible to decide, just by looking at a (generic, possibly not Wheeler) DFA, whether the language it recognizes is Wheeler. Wheeler languages have been studied extensively in [ 2 ], where it is shown that we can decide in polynomial time whether a DFA recognizes a Wheeler language. Unfortunately, trying to use the same strategy on a NFA language does not work. In fact, in this paper we show that the problem for NFAs becomes PSPACE-complete.

Consider the example in Figure 2 and confront it with Figure 1. Even tough apparently we made an insignificant change to the automaton, that is we have simply swapped character-labels d and b, the new language recognized by the automaton turns out to be not Wheeler. This example illustrates that the order of the underlying alphabet plays a significant role when determining whether an automaton, and even a language, is Wheeler. Notice that swapping d and b can be seen as considering a different order on : from a c d f to a d c f . Again a natural question is raised: when presented with an automaton (or a language) that is not Wheeler, is there a different way to order the alphabet so that the automaton (language) becomes Wheeler? Being able to answer in polynomial time to this this question would be very helpful for applications. Unfortunately, as shown in Section 3, this problem turns out to be NP-complete even for DFAs.

WNFAs have the following useful property: turning a WNFA into a DFA using the (classic) powerset construction actually results in a WDFA with at most twice the number of states of the original WNFA. In other words, the blow-up of states that we can observe when converting NFAs into DFAs, does not occur for Wheeler non-deterministic automata. Nevertheless, it is known (see [ 1 ]) that a blow-up of states can occur even when we switch from the minimum DFA recognizing a language L to the minimum WDFA recognizing L. As a last contribution, in this paper we give an answer to a question put forward in [ 1 ]. We provide an algorithm to compute the minimum WDFA starting from the minimum DFA. Our algorithm works in exponential time in the worst case and, since the dimension of the output might be exponential in the dimension of the input, is optimal for the task.

Due to space constraints, we present only the sketches of the proofs of our results. The full proofs, complete with all the missing details, can be found in the Appendix. 2

Wheeler languages First of all, we fix some notation. Let denote a finite alphabet endowed with a total order ( ; ). We denote by the set of finite words over , with " being the empty word, and we extend the order over to the co-lexicographic order ( ; ), where if and only the reverse of , i.e. read from the right to the left, precedes lexicographically the reverse of . Given two words ; 2 , we denote by a the property that is a suffix of . For a language L , we denote by Pref(L) the set of prefixes of strings in L. We denote by A = (Q; q0; ; F; ) a finite automaton (NFA), with Q as set of states, q0 initial state, : Q ! 2Q transition function, and F Q final states. The dimension of A, denoted by jAj, is defined as the number of states of A. An automaton is deterministic (DFA) if j (q; a)j 1, for all q 2 Q and a 2 . As customary, we extend to operate on strings as follows: for all q 2 Q, a 2 and 2 (q; ") = fqg; (q; a) =

[ v2 (q; ) (v; a): We denote by L(A) = f 2 : (q0; ) \ F 6= ;g the language accepted by the automaton A. We make the assumption that every automaton is basic, that is, every state is reachable from the initial state and every state can reach at least one final state. Notice that this assumption is not restrictive, since removing from a NFA every state not reachable from q0 and every state from which is impossible to reach a final state can be done in linear time and does not change the accepted language. It immediately follows that: – there might be only one state without incoming edges, namely q0; – every word that can be read starting from q0 belongs to Pref(L). Lastly, given a finite set Q, totally ordered by the relation <, we say that I Q is an interval if and only if for all q; q0; q00 2 Q with q q0 q00, if q; q00 2 I then q0 2 I. We denote by I := [qmin; qmax] a generic interval, where qmin (qmax) is the minimum (maximum) element of I with respect to <.

The class of Wheeler automata has been recently introduced in [ 13 ]. An automaton in this class has the property that there exists a total order on its states that is propagated along equally labeled transition. Moreover, the order must be compatible with the underlying order of the alphabet: Definition 1 (Wheeler Automaton). A Wheeler NFA (WNFA) A is a NFA (Q; q0; ; F; ) endowed with a binary relation <, such that: (Q; <) is a linear order having the initial state q0 as minimum, q0 has no in-going edges, and the following two (Wheeler) properties are satisfied. Let v1 2 (u2; a2): (u1; a1) and v2 2 (i) a1 a2 ! v1 < v2 (ii) (a1 = a2 ^ u1 < u2) ! v1

v2.

A Wheeler DFA (WDFA) is a WNFA in which the cardinality of (u; a) is always less than or equal to one.

In Figure 1 is depicted an example of a WDFA.

Remark 1. A consequence of Wheeler property (i) is that A is input-consistent, that is all transitions entering a given state u 2 Q have the same label: if u 2 (v; a) and u 2 (w; b), then a = b. Therefore the function : Q ! that associate to each state the unique label of its incoming edges is well defined. For the state q0, the only one without incoming edges, we set (q0) := #.

In [ 13 ] it is shown that WDFAs have a property called path coherence: let A = (Q; q0; ; F; ) be a WDFA according to the order (Q; <). Then for every interval of states I = [qi; qj ] and for all 2 , the set J of states reachable starting from any state of I by reading is also an interval. Path coherence allows us to transfer the order < over the states of Q to the co-lexicographic order over the words entering the states: for each state q define the set Iq = f : (q0; ) = q)g. With abuse of notation, we extend to subsets of as follows: U V , with U; V , if and only if for each 2 U; 2 V such that f ; g 2= U \ V . Then, two states q and p with Iq 6= Ip satisfy q < p if and only Iq Ip (again proved in [ 1 ]). An immediate consequence of this fact is that a WDFA admits an unique order of its states that makes it Wheeler and this order is univocally determined by the co-lexicographic order of any word entering its states. This result is important for two different reasons. First of all, it makes possible to decide in polynomial time whether a DFA is Wheeler: for each state q, pick a word q entering it and order the states reflecting the co-lexicographic order of the words { q : q 2 Qg; then check if the order satisfies the Wheeler conditions. Secondly, it is the key to adapt Myhill-Nerode Theorem to Wheeler automata. We recall the following defintion.

2 Definition 2 (Myhill-Nerode equivalence). Let L Given a word , we define the right context of as

be a language. 1L := f 2 :

The (classic) Myhill-Nerode Theorem, among many other things, establishes a bijection between equivalence classes of L and the states of the minimum DFA recognizing L. This minimum automaton is also unique up to isomorphism and a similar result, fully proved in [ 2 ], holds for Wheeler languages as well.

In order to state such an analogous of Myhill-Nerode Theorem for Wheeler languages, the equivalence L is replaced by the equivalence cL defined below.

Definition 3. The input consistent, convex refinement follows. cL if and only if c of L

L is defined as – L , – and – for all end with the same character, and 2 2 Pref(L), if min( ; ) max( ; ), then L

L . : : :

The Myhill-Nerode Theorem for Wheeler languages proves that there exists a minimum (in the number of states) WDFA recognizing L. As in the classic case, states of the minimum automaton are, in fact, cL-equivalence classes, this time consisting of intervals of words. Also such WDFA is unique up to isomorphism.

A further important consequence, especially for testing Wheelerness, is stated in the following Lemma (again proved in [ 2 ]).

Lemma 1. A regular language L is Wheeler if and only if all monotone sequences in (Pref(L); ) become eventually constant modulo L. In other words, for all sequences ( i)i 0 in Pref(L) with 1 2 : : : i or 1 2 i : : : there exists an n such that h k, for all h; k

Lemma 1 shows how it is possible to recognize whether a language L is Wheeler simply by verifying a property on the words of Pref(L): trying to find a WDFA that recognizes L is no longer needed to decide the Whelerness of L. As it turns out, we can verify whether the property depicted in Lemma 1 is satisfied just by looking at any DFA recognizing L, as shown in Theorem 1 (see [ 2 ]). Theorem 1. Let A be a DFA such that L = L(A), with initial state q0 and dimension n = jAj.

L is not Wheeler if and only if there exist ; and in , with a ; , such that: 6 L and they label paths from q0 to states u and v, respectively; labels two cycles, one starting from u and one starting from v;

or ; .

The length of the words ; and satisfying the above can be bounded:

4. j j; j j j j

n3 + 2n2 + n + 2.

Since in this work we make an extensive use of Theorem 1, here is a simple example on how and why it works. Consider the automata depicted in Figure 3. As shown in Figure 1 and 2, the language Ld is Wheeler whereas Lb is not, as can be easily proved using Theorem 1. In fact, consider the automaton Ab. By setting := a, := b and := c, one can verify conditions 1-3 of the theorem are satisfied. Notice that condition a 6 Lb b follows immediately from the fact that (q0; a) 6= (q0; b) in the minimum DFA Ab. c q1 c q2 f q3 f q3 (a) The minimum DFA Ad recognizing Ld = ac + dc f . (b) The minimum DFA Ab recognizing Lb = ac + bc f .

Fig. 3. The minimum DFAs recognizing the languages Ld (Wheeler) and Lb (not Wheeler).

If we try to transpose the same reasoning to the automaton Ad by setting = a, = d and = c, condition 3 of Theorem 1 is no longer satisfied. We can not find 3 words satisfying conditions 1-3 of Theorem 1, therefore Lb is Wheeler.

The polynomial bound given by condition 4 of Theorem 1 allows us to design an algorithm that decides whether a given DFA recognizes a Wheeler language: using dynamic programming (see [ 1 ]), it is possible to keep track of all the relevant paths and cycles inside the DFA and check, in polynomial time, whether there exists three words satisfying the conditions of the theorem.

Things change if, instead of a DFA, we are given a NFA. Trying to exploit the same idea used for DFAs does not work: the problem of deciding whether two words and read by a NFA are Myhill-Nerode equivalent is PSAPCEcomplete, whereas is polynomial on DFAs (simply compute the minimum DFA using Hopcroft’s algorithm and check whether the two states reached by and coincide). Even worse, the straightforward attempt of building the minimum DFA recognizing the NFA’s language might lead to a blow-up of the sates, resulting in a exponential time (and exponential space) algorithm.

We show that the problem of deciding whether a NFA recognizes a Wheeler language is indeed hard, but doesn’t require exponential time to be solved: the problem turns out to be PSPACE-complete. To show this, we first need to adapt Theorem 1 to work on NFAs, as described in the following corollary. Corollary 1. Let A = (Q; q0; ; F; ) be a NFA of dimension n := jAj. Then L := L(A) is not Wheeler if and only if there exist three words ; ; , with a ; , such that 1. 2.

i 6 L j for all 0 i; j 2n; labels two cycles, one starting from a state p 2 state r 2 (q0; ); (q0; ) and one from a or

; .

The length of the words ; and satisfying the above can be bounded:

Proof (Sketch). (=)) Let D be the minimum DFA recognizing the language L, with initial state q^0. Recall that D might have up to 2n states. From Theorem 1 we know that L is not Wheeler if and only if there exist ^; ^ and ^ satisfying conditions 1-4 of Theorem 1. Although there is not a perfect correspondence between cycles in A and cycles in D, whenever we find in D a path ^ (respectively, ^) and a cycle ^ both ending in the same state u^ (v^), we can find in A a path = ^^i ( = ^^j ) and a cycle 1 = ^k1 ( 2 = ^k2 ) both ending in the same state u (v), for some i; j; k1; k2 n. Note that the word ^(n+1)k1k2 labels two cycles starting from u and v. Moreover, we have j ^(n+1)k1k2 j (n + 1)j^k1k2 j (n + 1)j^j (i + 1)j^j > j^^ij; and similarly j^(n+1)k1k2 j > j^^j j. Since j^j 2 O(23n), we have j^(n+1)k1k2 j 2 O(n3 23n) and therefore the words ; and := ^(n+1)k1k2 satisfy conditions 2-4 of this corollary. Finally, condition 1 is satisfied for all i; j 0 because when reading i and j on D, they reach different states. ((=) As we did while proving the opposite direction, given three words ; ; in A satisfying condition 1-3, we can always find, for some i; j; k 2n, three words ^ = i, ^ = j and ^ = k in D that almost satisfy condition 1-3 of Theorem 1. The only requirement that might be missing is that ^ 6 L ^. The upper bound 2n in condition 1 ensures that this requirement is fulfilled, hence we can apply Theorem 1 to conclude that L is not Wheeler.

Despite the fact the the bound in condition 4 has become exponential by switching to NFAs, it is still possible to check in polynomial space (but exponential time) whether there are three words ; and satisfying the conditions of Corollary 1. Thus we can prove the following: Proposition 1. Given a NFA A, deciding whether the language L := L(A) is Wheeler is PSPACE-complete.

Proof (Sketch). To prove that the problem is in PSPACE (=NPSPACE), we guess (using non-determinism) three words ; and satisfying the conditions of Corollary 1. Since this words are too long to be stored, we first guess their length bit by bit and then we guess their characters one by one, starting from the last one and proceeding from right to left. Meanwhile, for all 2 f ; ; g and for all state t of A, we follow backwards the edges of A labeled as the character of that we are guessing. This way, we can compute the set of states from which is possible to reach t by reading . This allows us to find, for all i; j 2n, the sets Qi := (q0; i) and Qj := (q0; j ). We can then test whether i 6 L j by confronting the language accepted by the automaton A with set of initial states Qi and the language accepted by the automaton A with set of initial states Qi . This can be done in polynomial space since the problem of deciding whether two NFAs recognize the same language is known to be PSPACE-complete.

To prove the hardness of the problem, we show a polynomial reduction from the universality problem for NFA, i.e. the problem of deciding whether the language accepted by a NFA A over the alphabet is such that L(A) = . Let A = (Q; q0; ; F; ) be a NFA and let L := L(A). We can assume, without loss of generality, that q0 2 F , otherwise A would not accept the empty word and we could immediately derive that L 6= . Starting from A, we build in constant time a NFA A00 over the alphabet [ fa; b; cg that recognizes the language L00 := a (Lc)

L + b ( + c) ; where a; b; c are three characters not in and such that a b c (the order of the characters of is irrelevant in this proof). If we prove that L = if and only if L00 is Wheeler, the reduction is complete and the thesis follows. Notice that, since " 2 L, the following property holds:

L = () (Lc)

L = ( + c) : (1) Let us prove that L = implies that L00 is not Wheeler. If L = , then by (1) we have L00 = (a + b)( + c) . The minimum DFA recognizing L00 has only one cycle, therefore from Theorem 1 it follows that L00 is Wheeler. We next prove that L00 not Wheeler implies L = . Notice that a 1L00 = (Lc) L so that, by (1) if L 6= , then a 1L00 = (Lc) L 6= ( + c) . However we have b 1L00 = ( all n, both a L00 acn+ancd) b, thLu0s0 bacn6 Lh0o0ldb.. MThoerreeofvoerre, tihtecamnobnoetpornoevseedqutheantc,efor ac bc ac2 bc2 acn bcn : : : is not constant modulo Wheeler. 3

Generalized Wheelerness

L00 and from Lemma 1 it follows that L(A00) is not

As we have already pointed out in the introduction, changing the underlying order of the alphabet might turn a Wheeler language into a not Wheeler one and vice versa. For instance, consider again the Wheeler languages Ld and the regular (but not Wheeler) language Lb depicted in Figure 3. If we change the order of from a c d f to a d c f , the Wheeler language Ld turns into a non-Wheeler language (isomorphic to Lb under the isomorphism ' between alphabets that fixes characters a; c; f and sends d into '(d) = b). Hence, by not fixing an apriori order of the alphabet we enlarge the class of languages.

Definition 4 (Generalized Wheelerness). A NFA A over the alphabet is called a Generalized Wheeler Automaton (GWNFA) if and only if there exists an ordering of the elements of that makes A Wheeler.

A language L is called generalized Wheeler (for short GW) if and only if there exists a GWNFA that recognizes L.

Let A be a WDFA. Then, every word that labels a cycle in A is primitive (see [ 2 ]), that is there exists no 6= " and i > 1 such that = i. A direct consequence of this property is that Wheeler languages form a subclass of star-free languages, i.e. the class of languages that can be defined by a regular expression not containing the Kleene star. Since star-free expressions, and thus star-free languages, are closed under permutations of the alphabet, even GW languages must be a subclass of star-free languages. Here we show that the inclusion is strict, therefore GW languages must be studied separately.

Proposition 2. If j j 2, then the set fL : L is GWg is a proper subset of fL : L is star-freeg.

Proof. Let a; b be two distinct characters of , and consider the language L = a(aba) a + ba(aba) b. It is possible to prove that L is star-free, but L is not GW: consider the sequence ( i)i 2 with 2n = a(aba)n and 2n+1 = ba(aba)n. Since, for all i, the word i is a prefix of i+1, independently from how a and b are ordered we have aaba baaba aabaaba baabaaba a(aba)i ba(aba)i : : : Moreover, for all i we have a(aba)i 6 L ba(aba)i, since a(aba)i a belongs to L but ba(aba)i a does not. We can then apply Lemma 1 to conclude that L is not Wheeler. Since this result does not depend on the order of the alphabet, L is not GW.

We mentioned that we can decide in polynomial time whether a DFA is Wheeler. On the contrary, deciding whether a NFA is Wheeler is NP-complete, even when we bound the outdegree of each state of the NFA to be at most 5 (see [ 8 ]). As one may expect, deciding whether a NFA is a GWNFA is not easier than deciding whether a NFA is Wheeler. In fact, we show in Proposition 3 that the problem is NP-complete. We actually prove a stronger result in Proposition 4: even deciding whether a DFA is a GWNFA is NP-complete. Since the proof of Proposition 4 is cumbersome, we decided to present also it weaker version, i.e. Proposition 3, which shows a more natural reduction from the problem of deciding whether a NFA is Wheeler. Both proofs can be found in the Appendix. It is worth noticing that the proof of Proposition 3 can be adapted to work even on DFAs, hence giving an alternative way to prove that deciding whether a DFA is a GWNFA is NP-complete. Nonetheless, Proposition 4 is still stronger, since it also proves that deciding whether a DFA recognizes a GW language is NP-complete.

Proposition 3 (GWNFA hardness). Let A be a NFA. Deciding whether A is a GWNFA is NP-complete.

Proposition 4 (GWDFA and GW languages hardness). Let L be a language and A be a DFA. Both the problems of deciding whether A is a GWNFA and deciding whether L is GW are NP-complete.

DFA to WDFA As discussed in Section 1, a Wheeler automaton can be represented more compactly than a generic finite automaton. Therefore, if we are given a Wheeler language L represented as a DFA A that recognizes L, we may be tempted to look for a WDFA A0 that recognizes the same language to achieve a better, i.e. more compact, representation. Unfortunately, this approach might not work in our favor: in [ 1 ], a family (Lm)m 1 of Wheeler languages with the property that the dimension of the minimum WDFA recognizing Lm is exponential in the dimension of the minimum DFA recognizing Lm is presented. Note that we can always assume that, whenever we are given a DFA or a WDFA, the automaton is minimum. In fact, both tasks of minimizing a DFA and minimizing a WDFA can be done in polynomial time (see [ 9 ] for DFAs and [ 1 ] for WDFAs). Here we answer (positively) to the open question put forward in [ 1 ] whether there exists an algorithm to compute the minimum WDFA starting from the minimum DFA. Our algorithm works in exponential time in the worst case and, since the dimension of the output might be exponential in the dimension of the input, is optimal for the task.

Let L = L(A) be the language recognized by the given (minimum) DFA A. Recall that there is a 1 to 1 correspondence between the cL-classes and the states of the minimum WDFA recognizing L. First of all, our algorithm identifies a representative for each cL-class; to be able do this in exponential time, we first need to put a bound on the length of such representatives.

Lemma 2. Let A be the minimum DFA recognizing the Wheeler language L = L(A) over the alphabet , and let C1; :::; Cm be the pairwise distinct equivalence classes of cL. Then, for each 1 i m, there exists a word i 2 Ci such that j ij < n + n2, where n := jAj.

Proof (Sketch). Suppose by contradiction that there exists a class Ci such that for all 2 Ci it holds j j n + n2, and let 2 Ci be a word of minimum length. Since A has n states, there must exists a factor of that corresponds to a cycle in the run of on A. Let be the word obtained by erasing such factor fmroinmima.liStyinocef iatnfodllowesntdhsaitn t h6ecLsa m,tehsetraetfoeroeftAhe,rweemhuastveexistsLa w.oFrdromththaet is not Myhill-Nerode equivalent to and that is included (co-lexicographically) between and . We can further assume that the length of is greater than n2, hence we can identify a common factor of and that labels two cycles in the runs of and on A. We can then apply Theorem 1 to conclude that L is not Wheeler, a contradiction.

The algorithm works as follow: first of all, it generates the list containing every word of length less than n + n2 that can be reand+onn2 Awo.rTdhs.isTchaennb,ewdeoonredeinr exponential time since the list contains at most j j the list co-lexicographically and we apply the following proposition. Proposition 5 (Minimum DFA to minimum WDFA). Let A be the minimum automaton recognizing L = L(A) with jAj = n, over the alphabet with j j = . Let d := n + n2 and define

Pref(L) d := f 2 Pref(L) : j j dg: Assume that we are given the elements of Pref(L) d in co-lexicographic order, i.e. Pref(L) d = f 1; :::; kg with i i+1. Then it is possible to build the minimun WDFA recognizing L in O(k + n2 m log m) time, where m is the dimension of the output.

Proof (Sketch). From Lemma 2 we know that each cL-class has at least one representative in Pref(L) d. By considering the list [ 1 ]L; : : : ; [ k]L we can spot such representatives: i represents a cL-class if and only if i and i+1 belong to different L-classes or they belong to the same class but their last character is different.

Each representative of a cL-class become a state of our WDFA. To compute the edges of the automaton, we simply check, for all representative and for all c 2 , the cL-class of c. 5

Conclusions Having an order on the set of states of an automaton that is consistent with a given order of the underlying alphabet has important implication from a practical point of view. In fact, deterministic or even non-deterministic finite state automata enjoying this property—Wheeler automata—can be used to efficiently analyse the language they accept by standard tools.

In this work we established a few limitations along the directions that one can imagine to take in order to extend the nice properties of Wheeler languages.

More specifically, we proved that adding as a degree of freedom the possibility to re-order the underlying alphabet produces a significantly more complex (NPcomplete) class of languages. In addition, the polynomial test we have for testing whether a language is Wheeler when the latter is presented by a deterministic automaton, turns out much more complex (PSPACE-complete) if the language is presented by a non-deterministic one. The picture is completed by explicitly giving an algorithm that turns a DFA into a Wheeler automaton, whenever this is possible.

Proving the above limitations should clarify the role played by apparently secondary aspects of the definition of Wheeler language. This kind of study should help to better understand the nature of Wheeler languages, with the ultimate goal of singling out any feature that may admit some sort of extension.

Consider, in more general terms, the following open problem: is there a class of automata A properly extending the class of Wheeler automata and such that their deterministic equivalent DA is such that jDAj 2 poly(jAj)?

Appendix Proof of Corollary 1. Let D = (Q^; q^0; ^; F^; ) be the minimum DFA recognizing L. Clearly D has at most 2n states. ((=) From condition 2 it follows that Pref(L), so consider the following list of 2n + 1 states of D: ^(q^0; 0); ^(q^0; 1); : : : ; ^(q^0; Since D has at most 2n states, there must exist two integers 0 h < k 2n such that ^(q^0; h) = ^(q^0; k). Therefore k h labels a cycle starting from ^(q^0; h). Similarly, there exist 0 h0 < k0 2n such that k0 h0 labels a cycle starting from ^(q^0; h0 ). The words ^ := ^ := h h0 ^ := lcm(k h;k0 h0) 2n ; where the factor 2n in the definition of ^ ensures that j^j; j^j j^j, satisfy condition 2 of Theorem 1. Condition 1 of Theorem 1 follows automatically from conditions 1 of this corollary. Lastly, condition 3 of Theorem 1 follows from conditions 3 of this corollary and the fact that a ; : Thus we can apply Theorem 1 to conclude that L is not Wheeler. (=)) Since L = L(D) is not Wheeler, let ^; ^; ^ be as in Theorem 1. The DFA D has at most 2n states, hence the length of ^ is bounded by the constant 23n + 2 22n + 2n + 2. We have ^^ Pref(L), so let t0 = q0; t1; : : : ; tm be a run of ^^n over A. We set u := j^j and g := j^j, and consider the list of n + 1 states tu; tu+g; tu+2g; : : : ; tu+ng = tm Since A has n states, there must exist two integers 0 h < k n such that tu+hg = tu+kg. That is, there exists a state p := tu+hg such that p 2 q0; ^^h and ^k h labels a cycle starting from p. We can repeat the same argument for a run of ^^n over A to find a state r and two integers h0; k0 such that r 2 (q0; ^^h0 ) and ^k0 h0 labels a cycle starting from r. We can then define the words := ^^h := ^^h0 := ^lcm(k h;k0 h0) n which satisfy the conditions 2 and 3.

Condition 4 is satisfied since j^j 23n +2 22n +2n +2 and lcm(k h; k0 h0) < n2. Finally, condition 1 is satisfied for all i; j 0. Indeed, for all l the words ^ and ^^l lead to the same state of D, thus ^

i = L^^^s^i,l.aSnidmisliamrliyla,rfloyr, a8ljl l9swje saulcsoh hthavaet ^ L ^^l. Since 8i 9si such that

j = ^^sj , the thesis follows from ^ 6 L ^. tu

Proof of Proposition 1. First of all we need to prove that the problem is in PSPACE. We will show instead that its complement is in NPSPACE, then the thesis follows from Savitch’s Theorem, which states that NPSPACE = PSPACE, and the fact that PSPACE is closed under complementation. We prove that checking the conditions in Corollary 1 is in NPSPACE. We can use non-determinism to guess, bit by bit, the length of ; and and store this guessed information in three counters u; v; g respectively, using O(n) space for each. We also need to guess the ending states p; r 2 Q of ; . Then we start guessing the characters of ; and starting from their last one and proceeding backwards toward their first one, checking condition 3 of Corollary 1 in constant space. Whenever we guess a character of (respectively, ; ) we decrease by one the counter u (v; g), so we know when the guessing stops. If condition 3 is satisfied, we calculate (we will show later how) the sets (q0; ), (q0; ), (p; ), and (r; ) and check condition 2, that is, whether p 2 (q0; ), r 2 (q0; ), p 2 (p; ), and r 2 (r; ). If condition 2 is satisfied, we consider condition 1: for fixed i; j 2n consider the automata Ai and Aj obtained from the NFA L(A) by considering as initial states the sets (q0; i), (q0; j ), respectively. Notice that we have i 6 L j if and only if L(Ai ) 6= L(Aj ) and checking whether L(Ai ) = L(Aj ) can be done in polynomial space, since deciding whether two NFAs recognize the same language is a well-known PSPACE-complete problem.

To conclude the proof, we claim that we are able to calculate in polynomial space, for all q 2 Q and for all 2 f ; ; g , the set (q; ). While guessing character by character, we can compute, for each state q, the set Q ;q of the states from which is possible to reach q reading . To build Q ;q we start from the set Q0 := fqg and we follow backwards the edges entering q and labeled as the last character of . We call Q1 this new set of states and we repeat the process by following backwards the edges entering each state of Q1 and labeled as the second to last character of . Proceeding inductively, we compute the sets Q0; Q1; : : : Qu = Q ;q; notice that to calculate Qk+1 we only need Qk and the k-th to last character of , thus we can update (instead of storing) the set Qk. We can do the same for and to compute, for each q 2 Q, the sets Q ;q and Q ;q. Once we have stored, for all q 2 Q and for all 2 f ; ; g, the sets Q ;q, our claim follows easily. For instance, to compute (q0; ) we build, for all t 2 Q, the set

Q(t) := fq 2 Q : t 2 (q; )g =

[ p2Q ;t

Q ;p:

Then we have t 2 (q0; ) if an only if q0 2 Q(t).

To prove the completeness of the problem, we will show a polynomial reduction from the universality problem for NFA, i.e. the problem of deciding whether the language accepted by a NFA A over the alphabet is such that L(A) = .

Let A = (Q; q0; ; F; ) be a NFA and let L = L(A). We can assume without loss of generality that q0 2 F , otherwise A would not accept the empty word and we could immediately derive that L 6= . Let a; b; c be three characters not in and such that a b c with respect to the lexicographical order (the order of the characters of is irrelevant in this proof). First, we build the automaton A0 starting from A by adding an edge (qf ; q0; c) for each final state qf 2 F , see the top part of Figure 1. Notice that A0 recognizes the language L0 = L(A0) = (Lc) L, and it is straightforward to prove that L = if and only if L0 = ( + c) : if L = , let be a word in ( + c) containing n occurrences of c. Then = 0 c 2 c : : : n 1 c n for some 1; : : : ; n 2 . Hence 2 ( c) = L0. On the other hand, if L 6= let be a word in n L. Then c 2= L0.

A0 ; c

We build a second automaton A00 as depicted in Figure 4. Let L00 = L(A00) be the language recognized by A00. We claim that L = if and only if L00 is Wheeler. (=)) If L = , we have already proved that L0 = ( + c) . Hence we have L00 = (a + b) ( + c) . The minimum DFA recognizing L00 has only one loop, therefore by Theorem 1 L00 is Wheeler. ((=) If L 6= , let be a word in n L. Notice that 6= " since we assumed that " 2 L. Every possible run of over A must lead to a non-accepting state, hence c 2= L0. This implies that for all i 0 we have a ci c 2= L00 (notice that the only edge labeled c leaving q0 ends in q0). On the other hand, for all j 0 we have bcj c 2 L00, hence for all i; j 0 we have aci 6 L00 bcj . Thus the following monotone sequence in Pref(L00) ac bc acc bcc acn bcn : : : is not eventually constant modulo Wheeler.

L00 . From Lemma 1 it follows that L00 is not

tu Proof of Lemma 2. Suppose by contradiction that there exists a class Ci such that for all 2 Ci it holds j j n + n2, and let 2 Ci be a word of minimum length. Consider the first n + 1 states q0 = t0; :::; tn of A visited by reading the first n characters of . Since A has only n states, there must exist 0 i; j n with i < j such that ti = tj . Let 0 be the prefix of of length i (if i = 0 then 0 = "), let be the factor of of length j i labeling the path ti; :::; tj , and let be the suffix of such that = 0 . By construction, the words and mi:n=im a0liteyndofin tihtefoslalomwesstthaatet, h e6ncLce . L . Moreover, from j j < j j and the Suppose that , the other case being completely symmetrical. Since and share the same suffix , they end with the same character. This means that the words and , which are Myhill-Nerode equivalent but not cL equivalent, were not split into two distinct cL-classes due to input-consistency, therefore there must exists a word such that and 6 L . Formally, assume bbyy cdoenfitnriatdioicntioofn thcLa,titfowroaullldwfoorldlosw suchcL th,ata contradictiiotnh.olds L . Then, Let be a word such that and 6 L . From a ; it follows that a , so we can write = 0 for some 0 2 . Recall that by construction = 0 with j 0 j n, hence j j n2. Consider the last n2 + 1 states r0; :::; rn2 of A visited by reading the word , and the last n2 + 1 states p0; :::; pn2 visited by reading the word . Since A has only n states, there must exist 0 i; j n2 with i < j such that (ri; pi) = (rj ; pj ). Notice that it can’t be ri = pi, otherwise from the determinism of A it would follow rn2 = pn2 ; from the minimality of A iLtewto u00ldbethtehne fsoulffilowx of Lof l,enagctohntnr2adicjt,ioann.d let be the factor of of length j i labeling the path ri; :::; rj . Since j j n2, there exists 0 2 such that = 0 00. We can then rewrite ; and as Let k be an integer such that j kj is greater than j 0 0j and j 0 0j. Set := 0 0; from it follows that 0 0 0 0. If k set := 0 0, otherwise set := 0 0. In both cases, the hypothesis of Theorem 1 are satisfied, since k labels two cycles starting from the states ri and pi, that we have proved to be distinct. We can conclude that L is not Wheeler, a contradiction, and the thesis follows. tu Proof of Proposition 5. Consider the pairwise distinct equivalence classes C1; : : : ; Cm of the equivalence cL. Clearly, the minimum Wheeler automaton recognizing L has m states. We can assume without loss of generality that the equivalence classes are co-lexicographically ordered, i.e. Ci Ci+1 for all i < m. For sake of simplicity, given a word 2 we will write [ ] to indicate its equivalence class modulo L, and we will write [ ]W to indicate its equivalence class modulo cL.

Let K := f1; :::; kg and, for all i, let Ki := fj 2 K : [ j ]W = Cig. Since each Ci is convex in Pref(L), each Ki must be convex in K, that is each Ki is an interval. Therefore the list of equivalence classes [ 1 ]W ; :::; [ k]W must be partitioned in consecutive runs of the same class, each class appearing in one and only one run. From Lemma 2 we know that each equivalence class has at least one representative in Pref(L) d, hence the list [ 1 ]W ; :::; [ k]W must contain exactly m runs.

For all 1 j < k, we have [ j ]W 6= [ j+1]W if and only if where last( ) denotes the last character of . This means that we can identify the m runs of the equivalence cL just by looking at the two lists 1; :::; k and [ 1 ]; :::; [ k]: whenever last( j ) 6= last( j+1) or [ j ] 6= [ j+1], we know that a new cL run must start at j+1.

In O(k) we are able to determine the m runs and to pick a representative for each of them, i.e. we can find m indexes i1; :::; im such that for all 1 j m it holds ij 2 Ci. We call the set fai1 ; :::; aim g a fingerprint of the language L, i.e. a set of words that has cardinality m such that distinct elements of the set belong to distinct cL-classes.

We show how to build the minimum Wheeler DFA recognizing L, starting from any fingerprint of L and the standard minimum DFA recognizing L. Let f 1; :::; mg be a fingerprint of L and let A be the minimum DFA recognizing L. We can assume without loss of generality that 1 ::: m. We build the automaton AW = (Q; 1; ; F; ), where the set of states is Q = f 1; :::; mg and the set of final states is F = f j : j 2 Lg. The transition function can be computed as follow. For all 1 j m and for all c 2 , check whether j c 2 Pref(L). If j c 2= Pref(L), there are no edges labeled c that exit from j . If instead j c 2 Pref(L), locate j c using a binary search. There are three possible cases. 1. j c 1. Then ( j ; c) = 1. 2. m j c. Then ( j ; c) = m. 3. There exists s such that s j c s+1. It can not be the case that both j c 6 L s and j c 6 L s+1, since f 1; :::; mg is a fingerprint of L. Hence we distinguish three cases. (a) (b) (c) s L j c 6 L s+1. Then ( j ; c) = s. s 6 L j c L s+1. Then ( j ; c) = s+1.

s L j c L s+1. Since f 1; :::; mg is a fingerprint of L, it is either c = last( j c) = last( s), in which case ( j ; c) = s, or c = last( s+1), in which case ( j ; c) = s+1.

Proof of Proposition 3. The problem is in NP, since we can use non-determinism to guess the order of the alphabet and then check whether such order makes the NFA Wheeler.

To prove the hardness, we show a polynomial reduction from the problem of deciding whether a NFA is Wheeler. Let A be a NFA with initial state q0, over the alphabet = fa1 : : : ; a g ordered by the relation a1 a . We want to build a new automaton A0 such that A0 is a GWNFA if and only if A is Wheeler. A0 will be an automaton of size jAj + O( ) over the alphabet 0 of size O( ).

The automaton A0 will be built starting from A and adding extra states and transitions. We define the new alphabet as 0 = fa1; : : : ; a ; x1; : : : ; x 1; e; f g; with xi; e; f 2= , and we add two final states qe and qf . We then build 1 gadgets, one for each pair of consecutive characters (ai; ai+1) of , each one connected to A[fqe; qf g as depicted in Figure 5. This completes the construction of the automaton A0. Notice that A is input-consistent, but in general it is not deterministic.

We want to show that A is Wheeler according to the order ( ; ) if and only if A0 is a GWNFA. (=)) Define the order 0 over 0 by setting We show that A0 is Wheeler according to ( 0; 0) by ordering its states. Since A is Wheeler, there already exists an order of its states that makes A Wheeler. Therefore, we simply need to extend this order to the states of A0. Recall that in [ 1 ] it has been proved that a WNFA is Wheeler if and only if, for each pair of states q; p such that Iq 6= Ip, either Iq Ip or Ip Iq holds, where by definition Iq Ip if and only if, for all 2 Iq and for all 2 Ip such that f ; g 6 Iq \ Ip, we have . Therefore we check, for each pair of states q and p in A0, that either Iq Ip (implying q < p) or Ip Iq (implying p < q) holds. Note that when Iq and Ip are disjoint, the condition Iq Ip translates to the following: for all 2 Iq and for all 2 Ip, . In the discussion that follows we never compare two states that belong to A, since the order between them is already established.

First of all, we will order, for each i, the states with incoming edge xi. Since xi 2= , the only states to compare are the one belonging to the gadget Gi, i.e. qi4; qi5; qi6; qi7. Consider the languages

I4 := Iq4 = xi(xiaixi)

i I5 := Iq5 = ai+1xi(xiaixi)

i I6 := Iq6 = xixi(aixixi)

i I7 := Iq7 = ai+1xixi(aixixi) :

i Since ai+1 0 xi we have I4; I5 I6; I7. Moreover, consider any two words 2 I4 and 2 I5. If j j < j j, we have a hence . If j j j j instead, then the last j j characters of must be aixi(xiaixi)m, where m is such that = ai+1xi(xiaixi)m. From ai 0 ai+1 we still get , hence I4 I5. Similarly we can prove that I6 I7. It immediately follows that we need to order the states as follows: qi4 < qi5 < qi6 < qi7.

Secondly, for each 1 i we need to sort the states with incoming edges labeled ai. Note that the automaton A might contain states with incoming label ai and we need to consider such states as well. For i = , the task is easy. The only gadget with a state labeled a is G 1, and such state is q3 1. Notice that Iq3 1 = fa g. Let q be a state of A with (q) = a . Since every word in Iq ends with a , it trivially follows that Iq3 1 Iq. If Iq 6= fa g, we are forced to set q3 1 < q. If instead Iq = fa3 g1,<thqe. order of q and q3 1 does not matter. For sake of consistency, we set q For i = 1, the only gadget with states labeled a1 is G1, and such states are q11 and q12. Let q be a state of A with (q) = a1 and consider the languages I1 := Iq1 = x1x1a1(x1x1a1)

1 I2 := Iq2 = a2x1x1a1(x1x1a1) :

For all 2 Iq, we have either = a1 or = 0aja1 for some 0 2 and some 1 j . In both cases, must precede co-lexicographically every word of I1 and I2, thus Iq I1; I2. To compare I1 and I2, consider any two words 2 I1 and 2 I2. If j j < j j, we have a hence . If j j j j instead, then the last j j characters of must be a1x1x1a1(x1x1a1)m, where m is such that = a2x1x1a1(x1x1a1)m. From a1 <0 a2 we still get , hence I1 I2. It follows that q11 and q12 must follow q (and all the other states of A with label a1), and we must set q11 < q12.

Lastly, if 1 < i < we should compare the sates qi3 1; qi1 and qi2 with the sates in Qi := fq 2 A : (q) = aig. Applying both of the reasoning discussed in the cases i = 1 and i = , we can conclude that the state qi3 1 precedes all the states in Qi and that the states qi1 and qi2 follow all the states in Qi and must be ordered as qi1 < qi2.

The order of the states of A0 that we described makes A0 Wheeler with respect to ( 0; 0), hence making it a GWNFA. ((=) If A0 is a GWNFA, then there exists an order 0 over 0 that makes A0 Wheeler. Since A is a sub-automaton of A0, it follows that even A is Wheeler according to 0. Let e be the restriction of 0 over the alphabet ; we want to show that e is the same order as . Assume by contradiction that e 6= . If for all 1 i < we have ai e ai+1, then e = , a contradiction. Hence there exists 1 i < such that ai+1 e ai. Since 0 extends e , this implies that ai+1 0 ai. We will show that A0 is not Wheeler according to 0, a contradiction. Define the words := xi; := ai+1xi and := xiaixi. From a and ai+1 0 ai we have ; . The word labels two cycles in A0 starting from two distinct states, i.e. qi4 and qi5. Moreover, and label two paths that start from the initial state q0 and end in qi4 and qi5 respectively. Since qi4 and qi5 are not MyhillNerode equivalent, we can apply Theorem 1 to conclude that A0 is not Wheeler according to 0, a contradiction. Thus e and coincide.

We have shown that A is Wheeler according to 0, and that 0 extends . Therefore we can conclude that A is Wheeler according to . tu Definition 5 (Betweenness). Input: a list Y of n distinct elements Y = y1; :::; yn and k < n3 ordered triples (a1; b1; c1); :::; (ak; bk; ck), where each element of each triple belongs to Y . Elements belonging to the same triple are distinct.

Output: yes/no answer. The answer is “yes” if and only if there exists a total order < of Y such that, for each k, either ak < bk < ck or ak > bk > ck. Proof of Proposition 4. We can prove that both problems are in NP using an argument similar to the one employed in the proof of Proposition 3.

To prove the hardness, we show a polynomial reduction from the betweenness problem to both of the problems described; we will use exactly the same reduction for both problems. We start from an instance I = (Y; K) of the betweenness problem, where Y is the set Y = fy1 : : : ; yng and K P(T 3) is a collection of k triples (a1; b1; c1); : : : ; (ak; bk; ck), for some 1 < k < n3. We build a DFA A of size O(n + k), over an alphabet of size O(n + k). The alphabet is = Y [ fx1 : : : ; xk; e; f g, where we introduce a new character xi for each triple (ai; bi; ci) 2 K and two extra “ending” characters e and f . To build G, we start with the initial state q0 connected with n states q1; : : : ; qn through the edges (q0; yj ; qj ) for each 1 j n. We also add two sinks qe and qf , the only final states. We add the states qi1j ; qi3j ; qi5j (see Figure 6) and the transitions ci = ym qj qm qi5j xi xi

xi qi6m xi bi qi1j qi2m bi xi qi3j e f xi qi4m qe qf

Fig. 6. The gadget Gi related to the i-th triple. (qj ; xi) = qi1j ; (qi1j ; xi) = qi3j ; (qi3j ; bi) = qi5j ; (qi5j ; xi) = qi1j and (qi1j ; e) = qe. We repeat the same process with ci: given the integer m such that ci = ym, we add the states qi2m; qi4m; qi6m and the transitions (qm; xi) = qi2m; (qi2m; xi) = qi4m; (qi4m; bi) = qi6m; (qi6m; xi) = qi2m and (qi2m; f ) = qf : Lastly, we remove the states among q1; : : : ; qn that don’t have outgoing edges. More formally, we define the sets A := fa1; : : : ; akg and C := fc1; : : : ; ckg and we remove from G all the states qj such that yj 2= A [ C. We show that the instance I = (Y; K) of the betweennes problem is satisfiable if and only if A is a GWNFA, if and only if L(A) is GW. (=)) Since I = (Y; K) is satisfiable, there exists an ordering : Y ! f1; :::; ng of the elements of Y satisfying I. We order as follows: 1(1) ::: 1(n) x1 xk e f: This ordering induce a partial order on the states of A, where states with different incoming labels are ordered by such labels. Therefore we only need to order the states of A with the same incoming label.

For each 1 i k, the only states of A with incoming label xi are qi1j; qi3j; qi2m; qi4m, where j and m are integers such that ai = yj and ci = ym. Since, by construction, the order satisfies the instance I, then only two cases can occur: either (ai) < (bi) < (ci), or (ci) < (bi) < (ai). In the first case, we set qi1j < qi2m < qi3j < qi4m. To realize that this is in fact the correct order of the states, consider the following languages:

I1 := Iqi1j = f 2

: (q0; ) = qi1jg = aixi(xibixi) I2 := Iqi2m = cixi(xibixi) I3 := Iqi3j = aixixi(bixixi)

I4 := Iqi4m = cixixi(bixixi) : Since, by construction, we have ai; bi; ci xi, it follows that I1; I2 I3; I4. Moreover, from (ai) < (bi) < (ci) we also have that I1 I2 and I3 I4, which completes the ordering. Symmetrically, if (ci) < (bi) < (ai) then we set qi2m < qi1j < qi4m < qi3j.

We still need to order the states of A whose incoming labels belong to Y . For each 1 p n, the states with incoming label yp belong to the sets V5 := fqi5j : bi = ypg or V6 := fqi6m : bi = ypg or Vp, where Vp = fqpg if qp is a state of A (i.e. if yp 2 A [ C) and Vp = ; otherwise. If Vp = fqpg, we set qp as the smallest state. We then sort the states of V5 and V6 by their first subscript; when the first subscript is equal, i.e. the two states that we want to confront are qi5j and qi6m (with ai = yj and ci = ym), then we set qi5j < qi6m if (ai) < (bi) < (ci) and we set qi6m < qi5j if (ci) < (bi) < (ai). This can be deduced by confronting the following languages:

I5 := Iqi5j = aixixibi(xixibi)

I6 := Iqi60m = ci0 xi0 xi0 bi0 (xi0 xi0 bi0 ) : If i < i0 then, by construction, we have xi xi0 , hence I5 I6. Symmetrically, if i0 < i then we have I6 I5. Lastly, if i = i0 then the order between I5 and I6 is determined solely by ai and ci0 = ci.

Since there is only one state with incoming label e and only one state with incoming label f , we have finished. The order described makes A Wheeler, thus A is a GWNFA and L(A) is GW. ((=) Assume that the instance I = (Y; K) of the betweenness problem is unsatisfiable. We prove that L(A) is not GW. Assume by contradiction that L(A) is GW, then there exists an ordering 0 of the elements of such that L(A) is Wheeler according to said order. Recall that Y and consider the order := 0jY . Since (Y; K) is unsatisfiable, must violate one of the constraints, i.e. there exists an 1 i k such that either (ai); (ci) < (bi) or (bi) < (ai); (ci). Define the words := aixi; := cixi and := xibixi; then it is either ; or ; (here the co-lexicographic order is calculated with respect to 0). By construction, labels two cycles in A starting from two distinct states, qi1j and qi2m, which are not Myhill-Nerode equivalent. Moreover, and label two paths that start from the initial state q0 and end in qi1j and qi2m respectively. We can then apply the Theorem 1 to conclude that L(A) is not Wheeler according to 0, a contradiction. Therefore L(A) is not GW, which automatically implies that A is not a GWNFA. tu

1. Alanko , J., D'Agostino , G. , Policriti , A. , Prezza , N.: Regular languages meet prefix sorting . In: Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms . pp. 911 - 930 ( 2020 ). https://doi.org/10.1137/1.9781611975994.55, https: //epubs.siam.org/doi/abs/10.1137/1.9781611975994.55

2. Alanko , J., D'Agostino , G. , Policriti , A. , Prezza , N.: Wheeler Languages . CoRR arXiv: 2002 . 10303 ( Feb 2020 )

3. Alanko , J. , Gagie , T. , Navarro , G. , Seelbach

Benkner

, L. : Tunneling on wheeler graphs . In: 2019 Data Compression Conference (DCC) . pp. 122 - 131 ( 2019 ). https://doi.org/10.1109/DCC. 2019 .00020

4. Backurs , A. , Indyk , P. : Which regular expression patterns are hard to match? In: 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS) . pp. 457 - 466 ( 2016 ). https://doi.org/10.1109/FOCS. 2016 .56

5. Equi , M. , Grossi , R. , Makinen , V.: On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree . In: ICALP 2019 - 46th International Colloquium on Automata, Languages and Programming . pp. 1 - 15 . Patras, Greece (Jul 2019 ), https://hal.inria.fr/hal-02338498

6. Equi , M. , Mäkinen , V. , Tomescu , A.I. : Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless seth fails . In: Bureš, T. , Dondi , R. , Gamper , J. , Guerrini , G. , Jurdziński , T. , Pahl , C. , Sikora , F. , Wong , P.W. (eds.) SOFSEM 2021: Theory and Practice of Computer Science . pp. 608 - 622 . Springer International Publishing, Cham ( 2021 )

7. Gibney , D. , Hoppenworth , G. , Thankachan , S.V. : Simple reductions from formula-sat to pattern matching on labeled graphs and subtree isomorphism . In: Le, H.V. , King , V. (eds.) 4th Symposium on Simplicity in Algorithms, SOSA 2021, Virtual Conference, January 11-12 , 2021 . pp. 232 - 242 . SIAM ( 2021 ). https://doi.org/10.1137/1.9781611976496.26, https://doi.org/10.1137/ 1.9781611976496.26

8. Gibney , D. , Thankachan , S.V. : On the hardness and inapproximability of recognizing wheeler graphs . In: 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11 , 2019 , Munich/Garching, Germany. pp. 51 : 1 - 51 : 16 ( 2019 ). https://doi.org/10.4230/LIPIcs.ESA. 2019 . 51 , https://doi.org/10.4230/ LIPIcs.ESA. 2019 .51

9. Hopcroft , J.E. : An n log n algorithm for minimizing states in a finite automaton . Tech. rep. , Stanford University ( January 1971 )

10. Potechin , A. , Shallit , J.: Lengths of words accepted by nondeterministic finite automata . Information Processing Letters 162 , 105993 ( 2020 ). https://doi.org/https://doi.org/10.1016/j.ipl. 2020 . 105993 , https: //www.sciencedirect.com/science/article/pii/S0020019020300806

11. Prezza , N.: On locating paths in compressed tries . In: Proceedings of the ThirtySecond Annual ACM-SIAM Symposium on Discrete Algorithms . p. 744 - 760 . Society for Industrial and Applied Mathematics, USA ( 2021 )

12. Shyr , H. , Thierrin , G.: Ordered automata and associated languages . Tamkang J. Math (5) , 9 - 20 ( 1974 )

13. Travis

Gagie

, Giovanni Manzini e Sirén, J.: Wheeler graphs: A framework for bwtbased data structures . Theoretical computer science 698 , 67 - 78 ( 2017 )