Incremental NFA Minimization
Christian Bianchini¹, Alberto Policriti¹,*, Brian Riccardi¹ and Riccardo Romanello¹
¹ University of Udine, Italy


Abstract
Finite state automata are fundamental objects in Theoretical Computer Science and find their application
in Text Processing, Compiler Design, Artificial Intelligence and many other areas. The problem of
minimizing such objects can be traced back to the ‘50s and since then it has been the arena for developing
new algorithmic ideas. There are two main paradigms to tackle the problem: top down, which builds
a descending chain of equivalences by subsequent refinements, and bottom up, which builds an
ascending chain of equivalences by aggregation of classes. The former approach leads to a fast 𝒪(𝑛 log 𝑛)
algorithm, whereas the latter is currently quadratic for any practical application. Nevertheless, the
bottom up algorithm enjoys the property of being incremental, i.e. the minimization process can be
stopped at any time obtaining a language-equivalent partially minimized automaton. In this work we
correct a small mistake in the algorithm given by Almeida et al. in 2014 and we propose a simple, DFS-like
and truly quadratic incremental algorithm for minimizing deterministic automata. Furthermore, we
adapt the idea to the nondeterministic case obtaining an incremental algorithm which computes the
maximum bisimulation relation in time 𝒪(𝑛²𝑟|Σ|), where 𝑛 is the number of states, Σ is the alphabet
and 𝑟 is the degree of nondeterminism, improving by a factor of 𝑟 the running time of the fastest known
aggregation based algorithm for this problem.

Keywords
Automata, Bisimulation, Minimization, Incremental


1. Introduction
Finite state automata are fundamental objects in Theoretical Computer Science and find their
application in Text Processing, Compiler Design, Artificial Intelligence and many other areas.
   The minimization of an automaton is the process of constructing a new (language-equivalent)
automaton which is minimal in the number of states. This problem can be traced back to the
‘50s by the work of Moore [1]. A fundamental result in Automata Theory is the Myhill-Nerode
Theorem [2], establishing that in the deterministic case this minimal automaton is in fact the
minimum, up to isomorphism. In the wider setting of nondeterministic automata there is no
analog result and finding any state-minimal automaton is PSPACE-complete [3]. For this reason,
a practical alternative is the minimization with respect to bisimulation. Bisimilarity is indeed
a valid choice since in the deterministic case two states are bisimilar if and only if they are
Myhill-Nerode equivalent.

Proceedings of the 23rd Italian Conference on Theoretical Computer Science, Rome, Italy, September 7-9, 2022
* Corresponding author.
Email: bianchini.christian@spes.uniud.it (C. Bianchini); alberto.policriti@uniud.it (A. Policriti); riccardi.brian@spes.uniud.it (B. Riccardi); riccardo.romanello@uniud.it (R. Romanello)
URL: http://users.dimi.uniud.it/~alberto.policriti/home (A. Policriti); https://riccardoromanello.github.io (R. Romanello)
ORCID: 0000-0002-2855-1221 (R. Romanello)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Christian Bianchini et al. CEUR Workshop Proceedings 1–13

   Thus, the problem of minimizing automata reduces to the problem of computing bisimilarity
between states, which in turn is equivalent to determining the coarsest partition of a set stable
with respect to some binary relation. The two main paradigms to compute the aforementioned
partition are top down and bottom up.
   Top down algorithms start with the partition that separates states between final and non final
and subsequently refine the partition until it is stable. By a careful choice of which block of the
partition to split at each refining step, Paige and Tarjan [4] devised an algorithm that computes
the maximum bisimulation equivalence in time 𝒪(𝑚 log 𝑛), where 𝑛 is the number of states and
𝑚 is the number of transitions. The iconic Hopcroft’s Algorithm [5] (which Paige and Tarjan’s
solution is based on) deals with the special case of deterministic automata. Furthermore, it has
recently been proved that the bisimilarity computation requires Ω(𝑚 log 𝑛) time assuming top
down algorithms [6].
   On the contrary, bottom up solutions start with the finest partition — the one where each
state constitutes a singleton — and proceed by subsequently merging two blocks found to be
equivalent. For this reason, the technique is also known in the literature as partition-aggregation.
The main advantage of this paradigm is that the algorithm is incremental, i.e. it proceeds in
subsequent stages where at the end of each merging step the resulting automaton is language-
equivalent to the input one. In this way, the minimization process can be stopped at any
time and can be resumed later. The first algorithm of this kind is due to Watson [7]. After a
series of improvements Watson and Daciuk [8] reduced the running time to 𝒪(𝑛²|Σ|𝛼(𝑛)) for
deterministic automata with 𝑛 states and alphabet Σ, where 𝛼(𝑛) is related to the inverse of
Ackermann’s function [9] and can be treated as a constant for any practical value of 𝑛.¹ The
main idea is to propagate the definition of bisimilarity: if states 𝑝 and 𝑞 are equivalent, then
their transitions by the same character must also lead to equivalent states. This is done by a
recursive function Equiv which resembles an equivalence algorithm by Hopcroft and Karp [10].
A subsequent work by Almeida et al. [11] aimed at simplifying the algorithm by Watson and
Daciuk while maintaining its running time. Unfortunately, there is a small mistake in their version of
Equiv which leads to an Ω(𝑛³) algorithm in the worst case.
   The above algorithms focus on the minimization of deterministic automata. The
nondeterministic case was tackled by Björklund and Cleophas [12], adapting ideas from Watson and
Daciuk. They devised an incremental algorithm for computing bisimilarity in time 𝒪(𝑛²𝑟²|Σ|),
where 𝑟 is the degree of nondeterminism.
   In this work we correct the algorithm by Almeida et al. providing a simplified version of
the one by Watson and Daciuk and maintaining the quadratic running time. The solution is
based on the concept of associated graph whose purpose is twofold: to distill the behaviour of
the aforementioned algorithms by interpreting them as graph colorings and to design our own
incremental procedure. Having established the clear connection between minimization and
graph coloring it is natural to generalize the algorithm for solving the bisimilarity problem on
nondeterministic automata. Furthermore, the proposed solution improves by a factor of 𝑟 the
running time of Björklund and Cleophas [13].
¹ It holds $\alpha(n) \le 5$ for $n \le 2^{2^{16}}$.




   The paper is organized as follows: in the next section we give the basics about partitions,
relations, and automata. In Section 3 we briefly describe the algorithm by Almeida et al. and
we point out the mistake. In Section 4 we introduce our procedure on the deterministic case.
Finally, in Section 5 we lift the algorithm to the nondeterministic case. Some conclusions are
drawn in Section 6.


2. Preliminaries
2.1. Relations and Partitions
A binary relation from set 𝐴 to set 𝐵 is a subset 𝜌 ⊆ 𝐴 × 𝐵. Its size will be denoted by |𝜌|. We
say that 𝑎 ∈ 𝐴 is in relation with 𝑏 ∈ 𝐵 whenever (𝑎, 𝑏) ∈ 𝜌, and we denote this by writing 𝑎𝜌𝑏.
If we consider binary relations over 𝐴 (i.e. subsets of 𝐴 × 𝐴), the identity
relation is defined as 𝜄_𝐴 = {(𝑎, 𝑎) : 𝑎 ∈ 𝐴}.
   An equivalence relation (or equivalence) is a relation which is reflexive, symmetric and
transitive. Given 𝑎 ∈ 𝐴, the equivalence class of 𝑎 is the set [𝑎] = {𝑏 : 𝑎𝜌𝑏}. The quotient set
𝐴/𝜌 = {[𝑎] : 𝑎 ∈ 𝐴} forms a partition of 𝐴.

2.2. Languages and Automata
An alphabet is a non-empty set Σ of symbols. A string is a finite sequence 𝑤 = 𝑤₁ … 𝑤ₙ of
symbols. Σ* is the set of all finite length strings of symbols in Σ and we call a subset 𝐿 ⊆ Σ* a
language.
   A nondeterministic finite state automaton (NFA) is 𝒩 = ⟨𝑄, Σ, 𝑞₀, 𝛿, 𝐹⟩ where 𝑄 is a
non-empty finite set of states, Σ is the alphabet, 𝑞₀ ∈ 𝑄 is the initial state, 𝛿 : 𝑄 × Σ → 2^𝑄 is the
transition function and 𝐹 ⊆ 𝑄 is the set of final states. The degree of nondeterminism is
𝑟 = max {|𝛿(𝑞, 𝑥)| : 𝑥 ∈ Σ, 𝑞 ∈ 𝑄}. We say that 𝒩 is complete if |𝛿(𝑞, 𝑥)| ≥ 1 for every state 𝑞 and
symbol 𝑥. In what follows we will assume complete automata: this is without loss of generality
since it is always possible to complete an automaton by adding one state and suitable transitions.
As usual, the transition function can be recursively extended to strings, i.e. 𝛿* : 𝑄 × Σ* → 2^𝑄,
still denoted by 𝛿.
   We say that state 𝑞 ∈ 𝑄 accepts a string 𝑤 ∈ Σ* if 𝛿(𝑞, 𝑤) ∩ 𝐹 ≠ ∅. The set of strings
accepted by 𝑞 is denoted by 𝐿(𝑞). The language accepted by the automaton is 𝐿(𝒩) = 𝐿(𝑞₀).
A minimal automaton accepting 𝐿 has the minimum number of states amongst all automata
accepting 𝐿.
   A deterministic finite state automaton (DFA) is an NFA 𝒟 with the added condition that for
each symbol 𝑥 and each state 𝑞, |𝛿(𝑞, 𝑥)| = 1.
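
The definitions above can be made concrete with a small sketch. The class name `NFA`, its field names, and the sink label `"sink"` are illustrative assumptions, not notation from the paper; the transition function is stored as a dictionary from (state, symbol) to a set of successors.

```python
from dataclasses import dataclass


@dataclass
class NFA:
    """Hypothetical container for <Q, Sigma, q0, delta, F> (names are assumptions)."""
    states: set
    alphabet: set
    initial: object
    delta: dict      # (state, symbol) -> set of successor states
    finals: set

    def succ(self, q, x):
        return self.delta.get((q, x), set())

    def degree(self):
        """Degree of nondeterminism r = max |delta(q, x)|."""
        return max(len(self.succ(q, x)) for q in self.states for x in self.alphabet)

    def is_complete(self):
        return all(len(self.succ(q, x)) >= 1 for q in self.states for x in self.alphabet)

    def is_deterministic(self):
        return all(len(self.succ(q, x)) == 1 for q in self.states for x in self.alphabet)

    def completed(self, sink="sink"):
        """Complete the automaton by adding one non-final sink state.

        Assumes the label `sink` is not already used as a state name.
        """
        if self.is_complete():
            return self
        states = self.states | {sink}
        delta = {key: set(targets) for key, targets in self.delta.items()}
        for q in states:
            for x in self.alphabet:
                if not delta.get((q, x)):
                    delta[(q, x)] = {sink}
        return NFA(states, self.alphabet, self.initial, delta, self.finals)

    def accepts(self, word, start=None):
        """Check whether `word` is in L(start); start defaults to the initial state."""
        current = {self.initial if start is None else start}
        for x in word:
            current = set().union(*(self.succ(q, x) for q in current))
        return bool(current & self.finals)
```

Completing an automaton this way leaves its language unchanged, since the sink is non-final and has no way back to the other states.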
   For an automaton 𝒩 we define the equivalence relation ∼ ⊆ 𝑄 × 𝑄 as:

                                   𝑝 ∼ 𝑞 ⇐⇒ 𝐿(𝑝) = 𝐿(𝑞)

If 𝑝 ≁ 𝑞 we say they are distinguishable and if exactly one of the two is final we say they are
trivially distinguishable.
   Given an NFA 𝒩 and an equivalence 𝜌 over its states, its quotient is defined as 𝒩/𝜌 =
⟨𝑄/𝜌, Σ, [𝑞₀], 𝛿_𝜌, 𝐹/𝜌⟩ where 𝛿_𝜌([𝑞], 𝑥) = {[𝑞′] | 𝑞′ ∈ 𝛿(𝑞, 𝑥)}. The Myhill-Nerode Theorem
[2] establishes that the quotient automaton 𝒟/∼ is well defined and is the unique (up to
isomorphism) minimal automaton recognizing 𝐿(𝒟).
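
As an illustration of the quotient construction, the following sketch builds 𝒩/𝜌 from plain Python containers. The function name `quotient` and the parameter `class_of` (a dictionary sending each state to a chosen representative of its class) are assumptions made for this example. The code takes the union of the transitions of all members of a class, which agrees with the definition whenever 𝜌 is compatible with 𝛿, as is the case for a bisimulation equivalence.

```python
def quotient(states, alphabet, initial, delta, finals, class_of):
    """Return (Q/rho, Sigma, [q0], delta_rho, F/rho).

    delta maps (state, symbol) to a set of states; class_of maps each state
    to the representative of its equivalence class (an assumed encoding).
    """
    q_states = {class_of[q] for q in states}
    q_delta = {}
    for (q, x), targets in delta.items():
        # delta_rho([q], x) = {[q'] | q' in delta(q, x)}
        q_delta.setdefault((class_of[q], x), set()).update(class_of[t] for t in targets)
    q_finals = {class_of[q] for q in finals}
    return q_states, alphabet, class_of[initial], q_delta, q_finals
```

For instance, merging two equivalent final states 1 and 2 of a three-state automaton yields a two-state quotient whose transitions all point to the merged class.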

Definition 1. Given an NFA, a bisimulation is a binary relation 𝐵 over its states such that, for every
pair (𝑝, 𝑞) ∈ 𝐵:
B1. 𝑝 ∈ 𝐹 ⇐⇒ 𝑞 ∈ 𝐹,

B2. ∀𝑥 ∈ Σ ∀𝑝′ ∈ 𝛿(𝑝, 𝑥) ∃𝑞′ ∈ 𝛿(𝑞, 𝑥) such that (𝑝′, 𝑞′) ∈ 𝐵,

B3. ∀𝑥 ∈ Σ ∀𝑞′ ∈ 𝛿(𝑞, 𝑥) ∃𝑝′ ∈ 𝛿(𝑝, 𝑥) such that (𝑝′, 𝑞′) ∈ 𝐵.

Two states 𝑝 and 𝑞 are said to be bisimilar if there exists a bisimulation which contains (𝑝, 𝑞). The set of
all bisimulations over the states of 𝒩 is denoted by B𝒩.
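
Conditions B1 to B3 translate almost literally into a checker. The sketch below is only an illustration of the definition; the function name `is_bisimulation` and the dictionary encoding of 𝛿 are assumptions.

```python
def is_bisimulation(B, alphabet, delta, finals):
    """Check conditions B1-B3 for a set B of state pairs.

    delta maps (state, symbol) to a set of successor states.
    """
    def succ(q, x):
        return delta.get((q, x), set())

    for (p, q) in B:
        # B1: p and q agree on being final.
        if (p in finals) != (q in finals):
            return False
        for x in alphabet:
            # B2: every move of p is answered by some move of q.
            if not all(any((p2, q2) in B for q2 in succ(q, x)) for p2 in succ(p, x)):
                return False
            # B3: every move of q is answered by some move of p.
            if not all(any((p2, q2) in B for p2 in succ(p, x)) for q2 in succ(q, x)):
                return False
    return True
```

Such a checker is quadratic in the size of 𝐵 in the worst case and is meant for validating small examples, not for computing bisimilarity.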

   The union of two bisimulations is also a bisimulation. In fact, the following generalization of
this observation is a well-known result.

Lemma 1. B𝒩 is closed under union and has a unique largest element ℬ, which will be called
bisimilarity. ℬ is an equivalence relation and relates all and only bisimilar states. Moreover, ℬ ⊆ ∼ and, if
𝒩 is deterministic, ℬ coincides with ∼.

  The above properties — combined with efficient algorithms to compute it — justify the use of
ℬ as an approximation of ∼ for nondeterministic automata.

2.3. Partition Aggregation
Given an automaton 𝒩, our goal is to compute the bisimilarity relation ℬ over its set of states, so
that the resulting quotient 𝒩/ℬ can be returned as the minimized version of 𝒩. The
partition-aggregation strategy will compute an ascending chain 𝜄 ⊆ 𝐵₁ ⊆ … ⊆ 𝐵ₙ = ℬ of
bisimulation-equivalences.
   A partition-aggregation algorithm proceeds by a sequence of merging steps where at each
step 𝑖 the bisimulation 𝐵ᵢ is computed. Since each 𝐵ᵢ is a bisimulation, the minimization process
can be stopped at any step obtaining a language-equivalent automaton with no more states than
the input one; the minimization process can be resumed later from this intermediate automaton.
In this sense, the algorithm is incremental. This property is not shared with top down algorithms
that proceed by partition-refinement — such as Hopcroft’s Algorithm and its many successors —
where only the final result is a bisimulation.


3. The Algorithm Proposed by Almeida et al.
This section is devoted to a brief description of the algorithm proposed by Almeida et al. It
uses the union-find [14, 9] data structure to manage the partition of states, so that finding and
merging classes with the Find and Union primitives can be done in 𝒪(𝛼(𝑛)) amortized time.
   Pairs of states are recursively considered until their equivalence is established. Intermediate
results are cached, so that queries on pairs of states already found to be (non-)equivalent can be
answered in constant time.
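
A minimal union-find sketch with the three primitives used by the algorithms is given below. The class and method names are illustrative assumptions; the combination of union by rank and path compression is the standard one that yields the inverse-Ackermann amortized bound.

```python
class UnionFind:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, items):
        # Make(x) for every item: each element starts in its own class.
        self.parent = {x: x for x in items}
        self.rank = {x: 0 for x in items}

    def find(self, x):
        # Locate the root, then compress the path from x to it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx            # attach the shallower tree
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx

    def classes(self):
        groups = {}
        for x in self.parent:
            groups.setdefault(self.find(x), set()).add(x)
        return list(groups.values())
```

In the minimization setting the items are the states of the automaton and each call to `union` merges two classes of states found to be equivalent.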




Algorithm 1 Aggregation-based minimization by Almeida et al.
 1: function MinimizeAlmeida(𝑄, 𝛿, 𝐹)
 2:    for all 𝑞 ∈ 𝑄 do
 3:        Make(𝑞)
 4:    𝐸 ← (𝐹 × 𝐹ᶜ) ∪ (𝐹ᶜ × 𝐹)
 5:
 6:    for all (𝑝, 𝑞) ∈ 𝑄 × 𝑄 do
 7:        𝑓𝑝 ← Find(𝑝)
 8:        𝑓𝑞 ← Find(𝑞)
 9:        if 𝑓𝑝 ≠ 𝑓𝑞 ∧ (𝑝, 𝑞) ∉ 𝐸 then
10:            𝐸 ← ∅
11:            𝐻 ← ∅
12:            if Equiv(𝑝, 𝑞) then
13:                for all (𝑝′, 𝑞′) ∈ 𝐸 do
14:                    Union(𝑝′, 𝑞′)
15:            else
16:                𝐸 ← 𝐸 ∪ 𝐻
17:
18:    𝒫 ← {ClassOf(𝑝) : 𝑝 ∈ 𝑄}
19:    return 𝒫
20: end function
21: function Equiv(𝑝, 𝑞)
22:    if (𝑝, 𝑞) ∈ 𝐸 then
23:        return ⊥
24:    if (𝑝, 𝑞) ∈ 𝐻 then
25:        return ⊤
26:
27:    𝐻 ← 𝐻 ∪ {(𝑝, 𝑞), (𝑞, 𝑝)}
28:    for all 𝑥 ∈ Σ do
29:        (𝑝′, 𝑞′) ← (Find(𝛿(𝑝, 𝑥)), Find(𝛿(𝑞, 𝑥)))
30:        if 𝑝′ ≠ 𝑞′ ∧ (𝑝′, 𝑞′) ∉ 𝐸 then
31:            𝐸 ← 𝐸 ∪ {(𝑝′, 𝑞′), (𝑞′, 𝑝′)}
32:            if ¬Equiv(𝑝′, 𝑞′) then
33:                return ⊥
34:
35:    𝐻 ← 𝐻 ∖ {(𝑝, 𝑞), (𝑞, 𝑝)}
36:    𝐸 ← 𝐸 ∪ {(𝑝, 𝑞), (𝑞, 𝑝)}
37:    return ⊤
38: end function


   At lines 2–4, the identity relation is constructed and pairs of trivially non-equivalent states
are added to the memoization table 𝐸. In the main loop at lines 6–16, we iterate over all pairs
of states to check for equivalence. If a pair is either in the same class (i.e. is a pair of states
already found to be equivalent) or is in the memoization table (i.e. is a pair of distinguishable
states), the minimization continues to the next iteration. Otherwise, two empty collections 𝐸
and 𝐻 are prepared, respectively the set of “wondering” pairs of states and the history pairs. 𝐸
and 𝐻 are considered global variables and can be accessed from Equiv. The recursive function
Equiv is responsible for checking if states 𝑝, 𝑞 are equivalent and, if so, pairs in 𝐸 are merged.
Otherwise, all visited pairs are set to be distinguishable and this information is used to update
𝐸. At the end the partition into equivalence classes is returned.
   The underlying idea of Equiv(𝑝, 𝑞) is to recursively check the transitions from 𝑝 and 𝑞 on all
symbols. If two states are found to be cached as distinguishable, the recursion stops returning
⊥. If they are found to be in visit, it is useless to continue the visit and nothing can be said (i.e.
Equiv returns ⊤ postponing the decision to the upper-level of the recursion). These preliminary
checks are at lines 22–25. Next, each 𝑥-transition is checked recursively in the loop at 28–33,
stopping when the states are found to be distinguishable. At the end, if 𝑝 and 𝑞 are not found to
be distinguishable, the pair is removed from the history 𝐻 and added to the “wondering” pairs
𝐸 and ⊤ is returned.
   A detailed proof of the algorithm’s correctness can be found in [11].
   Regarding the complexity analysis, the authors claim that the algorithm terminates in time
𝒪(𝑛²|Σ|𝛼(𝑛)). This comes from the assumption that each pair visited during the recursion
of Equiv will be skipped on the subsequent iterations of MinimizeAlmeida (cf. [11, Lemma
4.9]). The assumption is wrong and a family of counterexamples can be constructed such that
MinimizeAlmeida terminates in time Θ(𝑛³|Σ|𝛼(𝑛)) (see Fig. 1).




[Figure 1: automaton diagram, not recoverable from the extracted text]
Figure 1: A small counterexample to [11, Lemma 4.9]: pair (6, 7) is visited twice. Consider Equiv(2, 3).
Pair (6, 7) is visited a first time by reading 0. Symbol 1 leads to (4, 5) which stops the recursion
without inserting (6, 7) in 𝐸. At some subsequent iteration, (6, 7) is visited again via Equiv(6, 7).
By generalization of this automaton (𝐴₁), automata 𝐴ₙ of 8𝑛 − 1 states can be provided such that
𝑛 pairs are visited (roughly) 𝑛² times each.


4. Deterministic Case
In this section we present our idea to correct the previously presented algorithm, discussing
its correctness and complexity. The main point is to run the recursive check on a richer data
structure, the associated graph introduced below, whereby the running time of the overall
algorithm is 𝒪(𝑛²|Σ|𝛼(𝑛)).

4.1. The Associated Graph
The reason why Algorithm 1 is not quadratic on some automata is the fact that whenever a pair
of distinguishable states is found the recursion stops losing reusable information gathered on
elements of 𝐸. A graph associated to the automaton clarifies how pairs of states evolve when
they are found to either be equivalent or distinguishable.

Definition 2. Given a DFA 𝒟 its associated graph 𝒢 = (𝑉, 𝐴) is defined as:

                                         𝑉 = 𝑄 × 𝑄,
                                         𝐴 = {⟨𝑝, 𝑞⟩ → ⟨𝛿(𝑝, 𝑥), 𝛿(𝑞, 𝑥)⟩ | 𝑝, 𝑞 ∈ 𝑄, 𝑥 ∈ Σ} .

⟨𝑝, 𝑞⟩ is distinguishable if 𝑝 and 𝑞 are distinguishable and equivalent otherwise.

   Coloring 𝒢 with distinguishable vertices black and equivalent vertices white, the problem of
computing ∼ can be seen as the problem of correctly coloring the associated graph.
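
To make the coloring view concrete, the sketch below builds the associated graph of a small DFA and computes a reference coloring by backward reachability: a vertex ⟨𝑝, 𝑞⟩ is Black exactly when it can reach a trivially distinguishable vertex. This is a plain, non-incremental baseline written for illustration and is not the algorithm proposed in this paper; the function names and the dictionary encoding of 𝛿 are assumptions.

```python
from itertools import product


def associated_graph(states, alphabet, delta):
    """Build G = (V, A) for a complete DFA; delta maps (state, symbol) -> state."""
    V = set(product(states, repeat=2))
    A = {((p, q), (delta[p, x], delta[q, x])) for (p, q) in V for x in alphabet}
    return V, A


def black_vertices(V, A, finals):
    """Return the set of distinguishable (Black) vertices of G."""
    preds = {}
    for u, v in A:
        preds.setdefault(v, set()).add(u)
    # Trivially distinguishable vertices are Black from the start.
    black = {(p, q) for (p, q) in V if (p in finals) != (q in finals)}
    stack = list(black)
    while stack:
        v = stack.pop()
        for u in preds.get(v, ()):
            if u not in black:
                black.add(u)      # u reaches a Black vertex, hence is Black
                stack.append(u)
    return black
```

Every vertex outside the returned set is White, so the White vertices are exactly the pairs in ∼.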
   The algorithm by Almeida et al. can be described as follows: it starts by coloring trivially
distinguishable vertices in black, trivially equivalent vertices (those of the form ⟨𝑝, 𝑝⟩) in white,
and the remaining vertices in grey. At each iteration of the main loop, it considers a grey vertex
⟨𝑝, 𝑞⟩ and starts a visit of 𝒢 from it. If the visit reaches a black vertex ⟨𝑝′, 𝑞′⟩, the recursion
stops and all vertices in the path from ⟨𝑝, 𝑞⟩ to ⟨𝑝′, 𝑞′⟩ (saved in 𝐻) are colored in black.
Otherwise, if all paths lead either to white or grey vertices, all visited vertices are colored white.
The main issue with Algorithm 1 is that when a black vertex is encountered all information about
vertices in 𝐸 ∖ 𝐻 gets lost.

4.2. The Algorithm for Deterministic Automata
We now present the minimization algorithm based on the observations above.




Algorithm 2 Proposed algorithm for deterministic automata.
 1: function MinimizeDfa(𝑄, Σ, 𝛿, 𝐹)
 2:    for all ⟨𝑝, 𝑞⟩ ∈ 𝑄 × 𝑄 do
 3:        if 𝑝, 𝑞 are triv. distinguishable then
 4:            Color(⟨𝑝, 𝑞⟩) ← Black
 5:        else if 𝑝 = 𝑞 then
 6:            Color(⟨𝑝, 𝑞⟩) ← White
 7:        else
 8:            Color(⟨𝑝, 𝑞⟩) ← Grey
 9:    for all ⟨𝑝, 𝑞⟩ ∈ 𝑄 × 𝑄 do
10:        if Color(⟨𝑝, 𝑞⟩) = Grey then
11:            ℋ ← EmptyGraph
12:            𝑒𝑞 ← Equiv(⟨𝑝, 𝑞⟩)
13:            if ¬𝑒𝑞 then
14:                ℋ ← Reverse(ℋ)
15:                Visit(ℋ, ℎ)
16:            for all ⟨𝑝′, 𝑞′⟩ ∈ WhiteV(ℋ) do
17:                Union(𝑝′, 𝑞′)
18: end function
19: function Equiv(⟨𝑝, 𝑞⟩)                  ◁ ℋ and ℎ global
20:    if ⟨𝑝, 𝑞⟩ ∈ ℋ then
21:        return ⊤
22:    else if Color(⟨𝑝, 𝑞⟩) = Black then
23:        ℎ ← ⟨𝑝, 𝑞⟩
24:        return ⊥
25:    else if Color(⟨𝑝, 𝑞⟩) = White then
26:        return ⊤
27:    else                                  ◁ here ⟨𝑝, 𝑞⟩ is Grey and fresh
28:        Color(⟨𝑝, 𝑞⟩) ← White
29:
30:        for all 𝑥 ∈ Σ do                  ◁ in lex. order
31:            ⟨𝑝𝑥, 𝑞𝑥⟩ ← ⟨𝛿(𝑝, 𝑥), 𝛿(𝑞, 𝑥)⟩
32:            AddArc(ℋ, ⟨𝑝, 𝑞⟩, ⟨𝑝𝑥, 𝑞𝑥⟩)
33:            𝑒𝑞 ← Equiv(⟨𝑝𝑥, 𝑞𝑥⟩)
34:            if ¬𝑒𝑞 then return ⊥
35:
36:        return ⊤
37: end function


   The general structure of MinimizeDfa is the same as MinimizeAlmeida rewritten in terms of
colorings. The only difference is that instead of maintaining two sets 𝐸 and 𝐻 we maintain the
global variable ℋ which represents the visited portion of 𝒢. The idea is that when Equiv(⟨𝑝, 𝑞⟩)
returns to the main loop, after line 15, vertices in ℋ will be correctly colored, either in White
or Black. In Algorithm 1 this information was lost while in Algorithm 2 ℋ is used to determine
extra black vertices. Helper procedures Reverse and Visit perform, respectively, arc-reverse of
a graph and the Black-coloring of ℋ starting from the source vertex ℎ.
   Let us analyze the version of Equiv(⟨𝑝, 𝑞⟩) in Algorithm 2. At lines 20–27 some base cases
are checked. In particular, if ⟨𝑝, 𝑞⟩ is Black, then it is stored in the global variable ℎ and ⊥ is
returned. Otherwise, if ⟨𝑝, 𝑞⟩ is White we return ⊤ to continue the downstream inspection.
Finally, in case ⟨𝑝, 𝑞⟩ is Grey, it is colored White and at lines 30–34 the for loop tries to
continue the recursive visit by reading each symbol in lexicographic order. Before each recursive
call ℋ is updated by adding arc ⟨𝑝, 𝑞⟩ → ⟨𝑝𝑥 , 𝑞𝑥 ⟩ — we assume vertex ⟨𝑝𝑥 , 𝑞𝑥 ⟩ is added, if not
already present.
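
The following Python sketch shows one possible reading of Algorithm 2; it is an illustration under stated assumptions rather than a reference implementation. The names `minimize_dfa`, `arcs` and `hit` are hypothetical. The union-find is simplified (no ranks), and the membership test on ℋ is folded into the White test, on the assumption that every vertex expanded during the current call has already been colored White.

```python
BLACK, WHITE, GREY = "black", "white", "grey"


def minimize_dfa(states, alphabet, delta, finals):
    """delta maps (state, symbol) -> state; the DFA is assumed complete."""
    states = list(states)
    symbols = sorted(alphabet)            # fixed symbol order
    parent = {q: q for q in states}       # simplified union-find, no ranks

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    # Initial coloring of the associated graph.
    color = {}
    for p in states:
        for q in states:
            if (p in finals) != (q in finals):
                color[p, q] = BLACK
            elif p == q:
                color[p, q] = WHITE
            else:
                color[p, q] = GREY

    for u0 in list(color):
        if color[u0] != GREY:
            continue
        arcs = []    # arcs of the visited subgraph H
        hit = []     # the Black vertex h that stopped the visit, if any

        def equiv(u):
            if color[u] == BLACK:
                hit.append(u)
                return False
            if color[u] == WHITE:     # equivalent, or already visited in this call
                return True
            color[u] = WHITE          # tentative; may be turned Black below
            p, q = u
            for x in symbols:
                v = (delta[p, x], delta[q, x])
                arcs.append((u, v))
                if not equiv(v):
                    return False
            return True

        if not equiv(u0):
            # Reverse H and blacken every vertex that can reach h inside H.
            preds = {}
            for u, v in arcs:
                preds.setdefault(v, []).append(u)
            stack = [hit[0]]
            while stack:
                v = stack.pop()
                for u in preds.get(v, ()):
                    if color[u] != BLACK:
                        color[u] = BLACK
                        stack.append(u)
        # Merge the classes of every White vertex of H.
        for arc in arcs:
            for (p, q) in arc:
                if color[p, q] == WHITE:
                    parent[find(p)] = find(q)

    classes = {}
    for q in states:
        classes.setdefault(find(q), set()).add(q)
    return {frozenset(c) for c in classes.values()}
```

The recursion depth can grow with the number of pairs, so for larger automata an explicit stack (or a raised recursion limit) would be needed. Stopping the outer loop early still leaves `parent` describing a partition whose merged states are equivalent, which is the incremental property discussed in Section 2.3.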

  Even though we do not give detailed proofs for space reasons, below we outline the main
arguments for complexity and correctness.
  As far as complexity is concerned it is clear that, summing over all the iterations of the main
loop, line 9, the associated graph is visited at most thrice: during the “forward” recursion pass
and, optionally, during Reverse and Visit. In fact, every vertex starts Grey and gets White
during the forward pass of an Equiv-call. Possibly, if the call returns ⊥, some White vertices
become Black. Altogether, considering the cost of maintaining the union-find data-structure,
we have the following (notice that |𝒢| = 𝑛² + 𝑛²|Σ|):

Theorem 1. The MinimizeDfa algorithm terminates in time 𝒪(𝑛²|Σ|𝛼(𝑛)).

  To prove correctness we will show the following invariant at line 16: all vertices in ℋ are
correctly colored either Black or White — Lemma 2 below.




   We start by showing some properties of the coloring performed by Equiv and Visit. First of
all, notice that, since we are in the deterministic case, for every 𝑢 = ⟨𝑝, 𝑞⟩ ∈ 𝑉 and 𝑤 ∈ Σ* there
is a unique path in 𝒢 starting from 𝑢 and spelling 𝑤: denote by 𝛿(𝑢, 𝑤) = ⟨𝛿(𝑝, 𝑤), 𝛿(𝑞, 𝑤)⟩
the last vertex of this path.
Definition 3. Given 𝑢 ∈ 𝑉 and 𝑤 ∈ Σ* , we say that 𝑤 for 𝑢 is:
       1. simple if 𝑢 ⇝ 𝛿(𝑢, 𝑤) in 𝒢 is simple,
       2. avalanche² if it is simple and vertex 𝛿(𝑢, 𝑤) is Black.
If there exists 𝑤 avalanche for 𝑢, denote by av(𝑢) the lexicographically smallest such 𝑤 and name
𝑢 ⇝ 𝛿(𝑢, av(𝑢)) the avalanche path of 𝑢.
   If Equiv(𝑢0 ) is called in the main loop, line 12, the visit checks all simple words for 𝑢0 in
lexicographic order, either until all words are explored or — if it exists — until av(𝑢0 ) is found.
Any vertex that can reach the avalanche path should be colored in Black.
Proposition 1. Consider ℋ upon return of Equiv(𝑢0 ) at line 12. If there exists 𝑢 ∈ ℋ distin-
guishable, then av(𝑢0 ) exists. Furthermore, if all White vertices in 𝒢 ∖ ℋ are equivalent, then for
every 𝑢 ∈ ℋ, if 𝑢 is distinguishable we have that 𝑢 ⇝ 𝛿(𝑢0 , av(𝑢0 )) exists in ℋ.
Proof. Suppose 𝑢 ∈ ℋ is distinguishable. First of all, we prove that there exists 𝑤 ∈ Σ* such
that 𝛿(𝑢0 , 𝑤) is Black — recall av(𝑢0 ) is the lexicographically smallest such 𝑤. Since 𝑢 ∈ ℋ,
there exists 𝑤0 such that 𝑢 = 𝛿(𝑢0 , 𝑤0 ). Since 𝑢 is distinguishable, there exists 𝑤1 such that
𝛿(𝑢, 𝑤1 ) is trivially distinguishable (i.e. Black from the start). Thus, 𝛿(𝑢0 , 𝑤0 𝑤1 ) is Black.
   Let ℎ = 𝛿(𝑢₀, av(𝑢₀)), 𝑢 ∈ ℋ distinguishable and 𝜋 be the 𝒢-path leading 𝑢 to some trivially
distinguishable 𝑣. We claim 𝜋 must cross the avalanche path of 𝑢₀ (call it 𝛼). Suppose not. Since
ℎ is the only Black vertex in ℋ, 𝑣 ∉ ℋ. Hence, 𝜋 must traverse some arc 𝑢′ → 𝑢′′ with 𝑢′ ∈ ℋ
and 𝑢′′ ∉ ℋ. By assumption on 𝜋 and construction of Equiv it follows that 𝑢′ ∉ 𝛼 and 𝑢′′ is
White. Since 𝜋 leads 𝑢′′ to 𝑣 it follows that 𝑢′′ is distinguishable, contradicting the hypothesis
on White vertices in 𝒢 ∖ ℋ.
Lemma 2. The following hold at the end of each iteration of loop 9–17:
D1. {⟨𝑝, 𝑞⟩ | Color(⟨𝑝, 𝑞⟩) = Black} ∩ ∼ = ∅,
D2. {⟨𝑝, 𝑞⟩ | Color(⟨𝑝, 𝑞⟩) = White} ⊆ ∼.
Proof. Before entering the loop both properties hold by initialization.
D1. Assume (D1) and (D2) hold before Equiv(𝑢0 ). It is sufficient to prove that at the end of
     the iteration for every ⟨𝑝, 𝑞⟩ ∈ ℋ we have that ⟨𝑝, 𝑞⟩ is Black if and only if ⟨𝑝, 𝑞⟩ is
     distinguishable.
          (→) If 𝑢 = ⟨𝑝, 𝑞⟩ ∈ ℋ is Black, then it must have been colored by Visit. Therefore,
          𝑒𝑞 = ⊥ and before Reverse there was 𝑢 ⇝ ℎ in ℋ. Since ℎ was Black before the
          Equiv-call, by (D1) it follows ℎ distinguishable. Thus, ⟨𝑝, 𝑞⟩ is distinguishable.
          (←) If 𝑢 = ⟨𝑝, 𝑞⟩ ∈ ℋ is distinguishable, then by Prop. 1 it follows that after Reverse
          and Visit pair ⟨𝑝, 𝑞⟩ has been colored in Black.
² The word “catastrophically” leads to the Black vertex.




D2. It follows from (D1) and the fact that all vertices in ℋ are either Black or White.


Theorem 2. Algorithm 2 is correct and incremental.

Proof. Both correctness and incrementality follow from Lemma 2 and the fact that — upon
termination — all vertices of 𝒢 are either Black or White. In particular, (D2) of Lemma 2
proves that Union is always correct.


5. Nondeterministic Case
Algorithm 2 is not directly applicable to the nondeterministic case, the reason being that reaching
a pair of non-bisimilar states (a sufficient condition for Algorithm 2 to color a vertex Black)
is no longer sufficient to declare a pair of states distinguishable.
   To tackle this issue we first turn the associated graph into a bipartite graph. In the definition
below, for each state 𝑝 we introduce the shadow state 𝑝̄ as a distinct copy of the real 𝑝.

Definition 4. Let 𝒩 be a complete NFA. The associated graph 𝒢(𝒩 ) is a bipartite directed graph
with vertices 𝑉0 ∪ 𝑉1 and arcs 𝐴0 ∪ 𝐴1 , defined as:

        𝑉0 = 𝑄 × 𝑄,
        𝑉1 = {⟨𝑝̄, 𝑞, 𝑥⟩, ⟨𝑝, 𝑞̄, 𝑥⟩ | 𝑝, 𝑞 ∈ 𝑄, 𝑥 ∈ Σ},
        𝐴0 = {⟨𝑝, 𝑞⟩ → ⟨𝑝̄′, 𝑞, 𝑥⟩, ⟨𝑝, 𝑞⟩ → ⟨𝑝, 𝑞̄′, 𝑥⟩ | 𝑝′ ∈ 𝛿(𝑝, 𝑥), 𝑞′ ∈ 𝛿(𝑞, 𝑥)},
        𝐴1 = {⟨𝑝̄′, 𝑞, 𝑥⟩ → ⟨𝑝′, 𝑞′⟩, ⟨𝑝, 𝑞̄′, 𝑥⟩ → ⟨𝑝′, 𝑞′⟩ | 𝑝′ ∈ 𝛿(𝑝, 𝑥), 𝑞′ ∈ 𝛿(𝑞, 𝑥)}.


⟨𝑝, 𝑞⟩ in “left" 𝑉0 will be called equivalent or distinguishable as in Def. 2.

   The bisimilarity between 𝑝 and 𝑞 (Def. 1) can be checked in two steps: 0) choose 𝑥 ∈ Σ and
𝑝′ ∈ 𝛿(𝑝, 𝑥), and 1) respond with suitable 𝑞 ′ ∈ 𝛿(𝑞, 𝑥). The idea is to mimic step 0) traversing
arcs of 𝐴0 and step 1) traversing arcs of 𝐴1 . The triplet ⟨𝑝′ , 𝑞, 𝑥⟩ ∈ 𝑉1 indicates that we have
chosen symbol 𝑥, state 𝑝′ ∈ 𝛿(𝑝, 𝑥), and we are expecting to respond with some 𝑞 ′ ∈ 𝛿(𝑞, 𝑥) (𝑞
provides the information on the state that must respond). 𝐴1 arcs do something similar.
   Then we need to tune up nondeterminism and Black coloring of vertices in 𝒢. Observe that
𝑢 ∈ 𝑉0 needs only one Black child to be colored in Black, while it needs all children White to
be colored in White. Dually, 𝑢 ∈ 𝑉1 behaves the same but with reversed colors. A check will
be performed using the variable Doubts(𝑢) which, roughly speaking, counts how many Black
neighbours are needed to mark 𝑢 as Black.
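
The coloring rule just described can be illustrated with a non-incremental worklist sketch: a 𝑉0 vertex turns Black as soon as one child is Black, while a 𝑉1 vertex turns Black only when all of its children are Black. This is a reference computation written for illustration and is not Algorithm 3. The function names, the counter `pending` (which plays a role loosely similar to Doubts), and the tags `"L"` and `"R"` used in place of shadow states are assumptions of this example; the NFA is assumed complete.

```python
def nfa_associated_graph(states, alphabet, delta):
    """Arcs A0 (V0 -> V1) and A1 (V1 -> V0); delta maps (state, symbol) -> set."""
    A0, A1 = set(), set()
    for p in states:
        for q in states:
            for x in alphabet:
                for p2 in delta[p, x]:
                    left = ("L", p2, q, x)       # p moved to p2; q must answer
                    A0.add(((p, q), left))
                    for q2 in delta[q, x]:
                        A1.add((left, (p2, q2)))
                for q2 in delta[q, x]:
                    right = ("R", p, q2, x)      # q moved to q2; p must answer
                    A0.add(((p, q), right))
                    for p2 in delta[p, x]:
                        A1.add((right, (p2, q2)))
    return A0, A1


def bisimilar_pairs(states, alphabet, delta, finals):
    """Return the set of pairs <p, q> that are never colored Black."""
    A0, A1 = nfa_associated_graph(states, alphabet, delta)
    preds, pending = {}, {}
    for u, v in A0 | A1:
        preds.setdefault(v, []).append(u)
    for u, _ in A1:
        pending[u] = pending.get(u, 0) + 1       # children of u not yet Black
    black = {(p, q) for p in states for q in states
             if (p in finals) != (q in finals)}
    work = list(black)
    while work:
        v = work.pop()
        for u in preds.get(v, ()):
            if u in black:
                continue
            if len(u) == 2:                      # u in V0: one Black child suffices
                black.add(u)
                work.append(u)
            else:                                # u in V1: all children must be Black
                pending[u] -= 1
                if pending[u] == 0:
                    black.add(u)
                    work.append(u)
    return {(p, q) for p in states for q in states if (p, q) not in black}
```

On the classical pair of automata for a(b + c) and ab + ac, the initial states accept the same language but are not bisimilar, which this sketch reports.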

5.1. The Algorithm for Nondeterministic Automata
We present Algorithm 3 for the nondeterministic case, whose essential ingredients are those of
Algorithm 2. Details are left to the reader.
  First of all, notice that we are actually dealing with four colors: ⊥ (never been explored),
Grey (in visit), Black (distinguishable) and White (equivalent).




Algorithm 3 Proposed algorithm for nondeterministic automata, adapted from the DFA case.
 1: {⊥, Black, White, Grey} ← {−1, 0, 1, 2}
 2:
 3: function MinimizeNfa(𝑄, Σ, 𝛿, 𝐹)
 4:    for all ⟨𝑝, 𝑞⟩ ∈ 𝑄 × 𝑄 do
 5:        if 𝑝, 𝑞 are triv. distinguishable then
 6:            Color(⟨𝑝, 𝑞⟩) ← Black
 7:        else if 𝑝 = 𝑞 then
 8:            Color(⟨𝑝, 𝑞⟩) ← White
 9:        else
10:            Color(⟨𝑝, 𝑞⟩) ← ⊥
11:    for all ⟨𝑝, 𝑞⟩ ∈ 𝑄 × 𝑄 do
12:        ℋ ← EmptyGraph
13:        Equiv(⟨𝑝, 𝑞⟩, 0)
14:        for all 𝑢 ∈ 𝑉0 ∩ ℋ do
15:            ⟨𝑝′, 𝑞′⟩ ← 𝑢
16:            if Color(𝑢) ≠ Black then
17:                ◁ 𝑢 is either Grey or White
18:                Color(𝑢) ← White
19:                Union(𝑝′, 𝑞′)
20: end function
21:
22: procedure Relax(𝑣)                              ◁ ℋ is global
23:    for 𝑢 ∈ Adj(ℋ, 𝑣) do
24:        Doubts(𝑢) ← Doubts(𝑢) − 1
25:        if Doubts(𝑢) = 0 then
26:            Color(𝑢) ← Black
27:            Relax(𝑢)
28: end procedure
29: function Equiv(𝑢, 𝑠)                            ◁ ℋ is global
30:    if Color(𝑢) ≠ ⊥ then
31:        return Color(𝑢)
32:
33:    Color(𝑢) ← Grey
34:    Doubts(𝑢) ← 0
35:
36:    for 𝑣 ∈ Adj(𝒢, 𝑢) ∧ Color(𝑢) ≠ 𝑠 do
37:        𝑐𝑜𝑙 ← Equiv(𝑣, 1 − 𝑠)
38:        if 𝑐𝑜𝑙 = 𝑠 then
39:            Color(𝑢) ← 𝑐𝑜𝑙
40:            Doubts(𝑢) ← 0
41:        else if 𝑐𝑜𝑙 = Grey then
42:            AddArc(ℋ, 𝑣, 𝑢)
43:            Doubts(𝑢) ← Doubts(𝑢) + 1
44:
45:    if Color(𝑢) = Grey then
46:        if Doubts(𝑢) = 0 then
47:            Color(𝑢) ← 1 − 𝑠
48:        else if 𝑠 = 0 then
49:            Doubts(𝑢) ← 1
50:
51:    if Color(𝑢) = Black then
52:        Relax(𝑢)
53:    return Color(𝑢)
54: end function
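Line 19 of Algorithm 3 merges the classes of 𝑝′ and 𝑞′ through Union. The data structure is not spelled out at this point; the following is a minimal sketch of a standard union-find with union by rank and path halving, which is one common way to obtain the inverse-Ackermann amortized cost. The class name `UnionFind` and its method names are illustrative.

```python
class UnionFind:
    """Disjoint-set forest over a fixed collection of hashable elements."""

    def __init__(self, elements):
        self.parent = {e: e for e in elements}
        self.rank = {e: 0 for e in elements}

    def find(self, x):
        # Path halving: every visited node is re-attached to its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: hang the shallower tree below the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1


# Hypothetical usage: states 0..3, merging 0 with 1 and 2 with 3.
uf = UnionFind(range(4))
uf.union(0, 1)
uf.union(2, 3)
```

Two states belong to the same class exactly when `find` returns the same representative for both.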


   The procedure MinimizeNfa is structurally the same as MinimizeDfa, the twist being the
usage and maintenance of ℋ. Function Equiv takes two inputs: the current vertex 𝑢 and
the “side” 𝑠 ∈ {0, 1} of the bipartition. If 𝑢 has already been encountered we return its color.
Otherwise, it is colored Grey with zero Doubts. At lines 36–43 each successor 𝑣 of 𝑢 is
recursively visited. In case 𝑠 = 0 (𝑢 ∈ 𝑉0), if 𝑣 is recursively found Black, then 𝑢 can be safely
marked Black. Otherwise, there is not yet enough information to safely assign a Black/White
color to 𝑢. In particular, if 𝑣 is Grey we add arc 𝑣 → 𝑢 to ℋ (notice that it is reversed w.r.t. the
arc 𝑢 → 𝑣 of 𝒢) and we increment Doubts(𝑢): the Blackness of 𝑢 depends on the (possible) future
Blackness of 𝑣. Case 𝑠 = 1 is dual.
   Upon leaving the loop, at lines 45–49, if 𝑢 is still Grey we consider two cases: if there are no
doubts (i.e. Doubts(𝑢) = 0), every successor of 𝑢 has the same color 1 − 𝑠 (White for 𝑢 ∈ 𝑉0,
Black for 𝑢 ∈ 𝑉1) and this color can be safely assigned to 𝑢; otherwise, in case 𝑢 ∈ 𝑉0 we set
Doubts(𝑢) = 1, since a single Black successor suffices to make 𝑢 Black.
   Proceeding at lines 51–53, if 𝑢 has become Black this information is propagated (Relaxed) to its
neighbours in ℋ. Notice that here we explicitly define the procedure Relax: in Algorithm
2 the purpose of the corresponding procedure Visit was to color Black all vertices reachable
from some distinguishable vertex. In Algorithm 3 we also need to take into account the doubts of
each vertex 𝑣, coloring Black only vertices with no remaining doubts.
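The doubt-driven propagation of lines 22–28 can be illustrated as follows. This is a sketch rather than a transcription: it uses an explicit stack instead of recursion (an implementation choice made here to avoid deep recursion), and the dictionaries `h_adj`, `doubts` and `color` are hypothetical stand-ins for ℋ, Doubts and Color.

```python
BLACK, WHITE, GREY = 0, 1, 2


def relax(v, h_adj, doubts, color):
    """Propagate "v became Black" along the arcs of H.

    h_adj[w] lists the vertices u with an arc w -> u in H, i.e. the vertices
    whose Blackness depends on w. A vertex turns Black only when its doubt
    counter reaches exactly zero.
    """
    stack = [v]
    while stack:
        w = stack.pop()
        for u in h_adj.get(w, []):
            doubts[u] -= 1
            if doubts[u] == 0:
                color[u] = BLACK
                stack.append(u)


# Hypothetical dependency graph: a waits on v; b waits on v and a; c needs two
# Black neighbours but only b can ever provide one.
h_adj = {"v": ["a", "b"], "a": ["b"], "b": ["c"]}
doubts = {"a": 1, "b": 2, "c": 2}
color = {"v": BLACK, "a": GREY, "b": GREY, "c": GREY}
relax("v", h_adj, doubts, color)
```

In this example `a` and `b` end up Black, while `c` keeps one doubt and stays Grey.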
   The complexity is again easy to determine. Observe that if the automaton has 𝑛
states and 𝑚 transitions, then |𝒢| ≤ 7𝑛𝑚. Equiv performs the equivalent of a visit of 𝒢 and Relax
visits every arc of ℋ at most once. Hence, their cost over the whole execution of Algorithm 3 is
bounded by the size of 𝒢. Considering the cost of maintaining the union-find data structures
and recalling that 𝑟 is the degree of nondeterminism, we have the following:

Theorem 3. MinimizeNfa terminates in time 𝒪(𝑛𝑚𝛼(𝑛)) ⊆ 𝒪(𝑛²|Σ|𝑟𝛼(𝑛)).
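
The inclusion between the two bounds can be made explicit. The derivation below assumes that the degree of nondeterminism is 𝑟 = max |𝛿(𝑝, 𝑥)| over all states 𝑝 and symbols 𝑥, and that 𝑚 counts the triples ⟨𝑝, 𝑥, 𝑝′⟩ with 𝑝′ ∈ 𝛿(𝑝, 𝑥):

```latex
m \;=\; \sum_{p \in Q} \sum_{x \in \Sigma} |\delta(p, x)|
  \;\le\; n \, |\Sigma| \, r
\qquad\Longrightarrow\qquad
n \, m \, \alpha(n) \;\le\; n^{2} \, |\Sigma| \, r \, \alpha(n).
```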

  Let us briefly discuss correctness.

Proposition 2. Let 𝑝 ≠ 𝑞 and 𝑢 = ⟨𝑝, 𝑞⟩ be White. Then, for every arc 𝑢 → 𝑣 ∈ 𝐴0, vertex 𝑣 is
either Grey or White.

Proof. Since 𝑝 ≠ 𝑞, there are only two places at which 𝑢 = ⟨𝑝, 𝑞⟩ changed from Grey to White:

Line 18. In this case, upon termination of Equiv(𝑢, 0) the pair is Grey and Doubts(𝑢) = 1
      (line 49). In the loop of its Equiv-call every neighbour was recursively found to be either
      Grey or White — or 𝑢 would have been colored in Black. Finally, it cannot be the case
      that some Grey neighbour 𝑣 became Black afterward: Relax(𝑣) would have colored 𝑢
      in Black by setting Doubts(𝑢) = 0.
Line 47. In this case, inside the loop of Equiv(𝑢, 0) none of 𝑢’s neighbours were found to be
      either Black or Grey. Thus, they must all be White.

Therefore, every neighbour of 𝑢 is either Grey or White.

Proposition 3. At line 19 (end of the main loop), for every 𝑣 ∈ 𝑉1 if 𝑣 is either Grey or White,
then there exists 𝑣 → 𝑢′ ∈ 𝐴1 such that 𝑢′ is White.

Proof. If 𝑣 is White, then it must have been colored at line 39 after finding a White neighbour.
If 𝑣 is Grey, upon termination of Equiv at line 13 some of its neighbours must have been found
to be Grey. Since every Grey left node is colored in White before the end of the iteration, it
follows again that 𝑣 has some White neighbour.

Lemma 3. The following hold at line 19 (end of the main loop):

N1. ℛ𝑊 ⊆ ℬ, where ℛ𝑊 = {⟨𝑝, 𝑞⟩ | Color(⟨𝑝, 𝑞⟩) = White};

N2. ℛ𝐵 ∩ ℬ = ∅, where ℛ𝐵 = {⟨𝑝, 𝑞⟩ | Color(⟨𝑝, 𝑞⟩) = Black}.

Proof.

N1. By Lemma 1 it is sufficient to prove that ℛ𝑊 is a bisimulation.
         First, notice that pairs violating (B1) are colored in Black from the start. Consider
         ⟨𝑝, 𝑞⟩ ∈ ℛ𝑊 . If 𝑝 = 𝑞, (B2) and (B3) trivially hold. Otherwise, let 𝑥 ∈ Σ and 𝑝′ ∈ 𝛿(𝑝, 𝑥).
         From Prop. 2 it follows that 𝑣 = ⟨𝑝′ , 𝑞, 𝑥⟩ is either Grey or White. From Prop. 3 it
         follows that some neighbour 𝑢′ of 𝑣 is in ℛ𝑊 . By Def. 4 we have 𝑢′ = ⟨𝑝′ , 𝑞 ′ ⟩ for some
         𝑞 ′ ∈ 𝛿(𝑞, 𝑥). Thus, (B2) holds for ⟨𝑝, 𝑞⟩. The very same argument can be used to prove
         that (B3) holds for ⟨𝑝, 𝑞⟩. Hence, ℛ𝑊 is a bisimulation.

N2. The result follows from (N1) and the fact that vertices in 𝑉0 ∩ ℋ are either Black or White.
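
The property used in (N1), namely that a set of pairs is a bisimulation, can be checked mechanically on small instances. The sketch below assumes that (B1)–(B3) of Def. 1 are the usual conditions (agreement on finality, and the two symmetric transfer conditions); the function name and the dictionary encoding of 𝛿 are illustrative.

```python
def is_bisimulation(rel, sigma, delta, final):
    """Check the assumed conditions (B1)-(B3) for every pair in rel."""

    def succ(s, x):
        return delta.get((s, x), set())

    for p, q in rel:
        # (B1): p and q agree on being final.
        if (p in final) != (q in final):
            return False
        for x in sigma:
            # (B2): every move of p is matched by some move of q.
            for p1 in succ(p, x):
                if not any((p1, q1) in rel for q1 in succ(q, x)):
                    return False
            # (B3): every move of q is matched by some move of p.
            for q1 in succ(q, x):
                if not any((p1, q1) in rel for p1 in succ(p, x)):
                    return False
    return True


# Hypothetical automaton in which every state is final and loops on 'a'.
delta = {(0, "a"): {1}, (1, "a"): {1}, (2, "a"): {1, 2}}
ok = is_bisimulation({(0, 2), (1, 1), (1, 2)}, ["a"], delta, {0, 1, 2})
too_small = is_bisimulation({(0, 2)}, ["a"], delta, {0, 1, 2})
```

Here `ok` holds, while `too_small` fails because the pair ⟨0, 2⟩ alone has no pair in the relation to land on after reading `a`.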




Theorem 4. MinimizeNfa is correct and incremental.

Proof. The inclusion ℛ𝑊 ⊆ ℬ is (N1). For the converse inclusion, one can easily check
that, upon termination, every pair ⟨𝑝, 𝑞⟩ ∈ 𝑉0 is either Black or White. Therefore, by (N2),

                  ℬ = 𝑉0 ∩ ℬ = (ℛ𝐵 ∪ ℛ𝑊 ) ∩ ℬ = ∅ ∪ (ℛ𝑊 ∩ ℬ) ⊆ ℛ𝑊 .

Incrementality follows again from (N1).
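
On small automata, the statement ℛ𝑊 = ℬ can be cross-checked against a naive computation of ℬ. The sketch below is not Algorithm 3: it is a plain greatest-fixpoint iteration that starts from all pairs agreeing on finality and discards pairs violating the transfer conditions, again assuming the usual form of (B1)–(B3). All names are illustrative.

```python
from itertools import product


def naive_bisimilarity(states, sigma, delta, final):
    """Largest bisimulation, computed by repeatedly discarding bad pairs."""

    def succ(s, x):
        return delta.get((s, x), set())

    # Start from every pair that agrees on finality.
    rel = {(p, q) for p, q in product(states, repeat=2)
           if (p in final) == (q in final)}
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            forth = all(any((p1, q1) in rel for q1 in succ(q, x))
                        for x in sigma for p1 in succ(p, x))
            back = all(any((p1, q1) in rel for p1 in succ(p, x))
                       for x in sigma for q1 in succ(q, x))
            if not (forth and back):
                rel.discard((p, q))
                changed = True
    return rel


# Hypothetical automaton: 0 -a-> {1, 2}, 1 -b-> {3}, 2 -b-> {3}, 3 final.
delta = {(0, "a"): {1, 2}, (1, "b"): {3}, (2, "b"): {3}}
bisim = naive_bisimilarity([0, 1, 2, 3], ["a", "b"], delta, {3})
```

In this example the only non-trivial bisimilar states are 1 and 2, so an incremental run is expected to merge exactly that pair.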


6. Conclusions
Bisimilarity is a fundamental (equivalence) relation among the states of finite automata, finding
applications and variants in a number of different areas. Algorithms for computing bisimilarity
are classic and can be subdivided into two categories: top-down and bottom-up. The former
(partition refinement) approach starts with a coarse partition and refines it until the result is
produced, while the latter (partition aggregation) starts from the equivalence relation whose
classes are singletons and merges classes as long as possible.
   Although algorithms belonging to the bottom-up category are, to the best of our knowledge,
still asymptotically slower than their top-down counterparts, aggregation-based techniques
enjoy the property of being incremental: automata resulting at intermediate stages of the
computation are partially minimized yet language-equivalent to the input one.
   Moreover, partition aggregation algorithms, even though less celebrated than partition
refinement ones (introduced by Hopcroft and generalized by Paige and Tarjan), are interesting
for (at least) two reasons. The first is theoretical: if two methods compute the same relation
(just one from “above” and the other from “below”), why is there a complexity gap? Is there some
(hidden) cost involved in maintaining incrementality? The second is practical: some application
contexts can greatly benefit from having a partially minimized equivalent automaton, especially
when the alternative involves long sequences of refinement steps.
   In this work, while fixing a minor mistake in the algorithm by Almeida et al., we reduced
bisimilarity computation to a coloring problem on an associated graph. We then extended the
algorithm to the nondeterministic case, obtaining a complexity improvement on the best known
bound for this case. The time complexity of both algorithms carries the 𝛼(𝑛) factor, which we
aim to shave off in future work. As a further line of research, it will be interesting to investigate
the effect of applying the technique introduced here to the color refinement algorithm (a.k.a.
Weisfeiler-Leman-1 algorithm, see [15]), currently implemented using an algorithm by Cardon
and Crochemore (see [16]) belonging to the top-down/partition-refinement category.


References
 [1] E. F. Moore, Gedanken-Experiments on Sequential Machines, Princeton University Press,
     1956, pp. 129–154. doi:10.1515/9781400882618-006.
 [2] J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages,
     and Computation, Pearson Education, 2014.
 [3] A. R. Meyer, L. J. Stockmeyer, The equivalence problem for regular expressions with
     squaring requires exponential space, in: SWAT, 1972.




 [4] R. Paige, R. E. Tarjan, Three partition refinement algorithms, SIAM J. Comput. 16 (1987)
     973–989.
 [5] J. E. Hopcroft, An n log n algorithm for minimizing states in a finite automaton, 1971.
 [6] J. F. Groote, J. Martens, E. P. de Vink, Lowerbounds for bisimulation by partition refinement
     (2022). URL: https://arxiv.org/abs/2203.07158.
 [7] B. W. Watson, Taxonomies and toolkits of regular language algorithms, 1995.
 [8] B. W. Watson, J. Daciuk, An efficient incremental DFA minimization algorithm, Natural
     Language Engineering 9 (2003) 49–64.
 [9] R. E. Tarjan, Efficiency of a good but not linear set union algorithm, J. ACM 22 (1975)
     215–225. doi:10.1145/321879.321884.
[10] J. E. Hopcroft, A linear algorithm for testing equivalence of finite automata, volume 114,
     Defense Technical Information Center, 1971.
[11] M. Almeida, N. Moreira, R. Reis, Incremental DFA minimisation, RAIRO - Theoretical
     Informatics and Applications 48 (2014) 173–186. doi:10.1051/ita/2013045.
[12] J. Björklund, L. G. Cleophas, Minimization of finite state automata through partition
     aggregation, in: LATA, 2017.
[13] J. Björklund, L. Cleophas, Aggregation-based minimization of finite state automata, 2020.
     doi:10.1007/s00236-019-00363-5.
[14] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to algorithms, 3rd edition,
     2009, pp. 170–173.
[15] M. Grohe, K. Kersting, M. Mladenov, P. Schweitzer, Color refinement and its applications,
     2021.
[16] A. Cardon, M. Crochemore, Partitioning a graph in O(|A| log2 |V|), Theor. Comput. Sci. 19
     (1982) 85–98. doi:10.1016/0304-3975(82)90016-0.

