<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Extracting Weighted Finite Automata from RNNs via iterative partitioning and spectral learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sandamali Yashodhara Wickramasinghe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacob M. Howe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laure Daviaud</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City St George's, University of London</institution>
          ,
          <addr-line>Northampton Square, London EC1V 0HB</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of East Anglia</institution>
          ,
          <addr-line>Research Park, Norwich NR4 7TJ</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>26</volume>
      <issue>2025</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This paper proposes an algorithm to extract a Weighted Finite Automaton (WFA) from a Recurrent Neural Network (RNN), by adapting the L* algorithm in combination with spectral learning using Singular Value Decomposition (SVD). The method introduces a discrete abstraction of the RNN's hidden state space to enable approximate equivalence queries within the L* framework. Through these queries, counterexamples are identified and used to iteratively refine the prefix and suffix sets that define the Hankel matrix, thereby improving the accuracy of successive WFA hypotheses. This work lays the groundwork for integrating discrete clustering with spectral learning in the context of WFA extraction from RNNs, an area that remains underexplored, with potential applications in sequence modelling and analysis. Future work will focus on implementing and empirically validating the proposed approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Recurrent Neural Networks</kwd>
        <kwd>Weighted Finite Automata</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Interpretability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This research explores the extraction of Weighted Finite Automata from Recurrent Neural Networks.
Recurrent Neural Networks (RNNs) are often referred to as black boxes due to their lack of interpretability.
Understanding the behaviour of RNNs is important to enhance their interpretability and stability, and
this understanding can be achieved by extracting symbolic models such as automata. Most research on
extracting knowledge from RNNs has focused on Deterministic Finite Automata (DFA).</p>
      <p>
        DFA extraction is suitable for RNNs trained as binary classifiers (producing Boolean outputs). In
contrast, for RNNs with real-valued outputs, such as probabilities, extracting Weighted Finite Automata
is more appropriate [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        One approach to extract automata from RNNs is through compositional techniques, where a WFA
is derived by dividing the RNN state space into clusters and identifying the transitions and weight
matrices to construct the automaton [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Another approach involves exact learning methods, such as
the L* algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], where an automaton is extracted by posing queries about the language learned
by the RNN. Angluin [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduced the L* algorithm in 1987 for inferring a DFA from an unknown
target system, which was later extended to extract DFAs from RNNs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In the L* algorithm, two types
of queries are employed in the extraction process: membership queries and equivalence queries. A
membership query asks whether a specific string belongs to the target language, while an equivalence
query checks whether the language recognised by the extracted automaton is equivalent to that of the
target.
      </p>
      <p>
        The work in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] introduced a novel approach for extracting a DFA from an RNN using the L*
algorithm and demonstrated strong performance in accurately representing the language learned by the
RNN. Later on, the work was extended to extract PDFAs (Probabilistic Deterministic Finite Automata)
from RNNs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Even though the L* algorithm was designed to extract a DFA, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a
variation of it for WFA extraction, namely the spectral learning algorithm [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Building upon this, [9]
conducted a study where a WFA is extracted without requiring access to the internal mechanisms of
the RNN or the original training data. The work presented in [10] also used spectral learning to extract
WFAs from second-order RNNs.
      </p>
      <p>The use of exact learning methods for extracting WFAs from RNNs remains relatively underexplored.
While membership queries are straightforward in this setting, the main challenge lies in performing
equivalence queries, as a formal equivalence between a WFA and an RNN is undecidable [11].</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], this was addressed by using an abstraction of the RNN’s hidden state space while performing
equivalence queries. By clustering similar hidden states into discrete regions, a finite-state abstraction
can be constructed and compared against the behaviour of the extracted automaton. This method
enables an approximate but feasible equivalence check. The same abstraction-based approach was later
extended to WFA extraction in the study presented by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Our proposed contributions are twofold: first, we aim to integrate spectral learning with Singular
Value Decomposition (SVD) to perform membership queries in the extraction of a WFA from an
RNN using the L* algorithm; second, we propose addressing the challenge of equivalence queries by
iteratively constructing a discrete abstraction of the RNN state space to enable effective comparison.
This framework builds upon the foundational work of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and extends the ideas explored in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] toward
the extraction of WFAs.
      </p>
      <p>In contrast to prior approaches such as [9, 12], which construct WFAs using spectral learning
over fixed sets of prefixes and suffixes, we propose an iterative spectral learning framework. This
approach incrementally expands the set of prefixes and suffixes based on the learning process, allowing
for dynamic refinement until equivalence is achieved, potentially improving both adaptability and
efficiency.</p>
      <p>
        Additionally, we propose incorporating a discrete abstraction network during equivalence queries
by incrementally partitioning the RNN hidden state space. This builds upon the preliminary ideas
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and can be further refined using advanced partitioning strategies, including hyperplane-based
segmentation [13], k-means clustering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and the K-DCP method introduced in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        While the use of the weighted L* algorithm for WFA extraction has been explored in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], its integration
with spectral learning [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and a discrete state abstraction remains under-explored. Furthermore, the
iterative construction of an abstraction network using the aforementioned partitioning techniques has
not been applied in this context. Our proposed framework aims to bridge these gaps and provide a new
direction for extracting WFAs from RNNs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        The association between automata and RNNs has been studied in the context of both binary classification
problems [
        <xref ref-type="bibr" rid="ref5">14, 5</xref>
        ] and quantitative tasks [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref6">9, 10, 6, 1, 3, 2, 12</xref>
        ]. The earliest works, such as [15, 16], laid
the foundation for understanding the relationship between automata and RNNs. In 1996, Omlin and
Giles [17] demonstrated that the state space of an RNN trained on a regular grammar tends to cluster
in a way that resembles the structure of a finite automaton. Subsequent studies continued to explore
this line of research, including [18, 14], further validating the connection between RNN dynamics and
automata representations.
      </p>
      <p>
        The work presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
is considered state-of-the-art in DFA extraction and employs the L*
algorithm, an exact learning approach. This method was extended to quantitative RNNs in a probabilistic
setting in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where probabilistic deterministic finite automata (PDFAs) were extracted. Subsequently,
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] further advanced this line of research by extracting WFAs using a regression-based method to
perform equivalence checks.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a weighted variation of the L* algorithm was introduced specifically for WFA extraction with
the use of spectral learning for WFA introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Unlike the original L* algorithm by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which
constructs an observation table through membership queries to a teacher (i.e., the target system), this
variation replaces the observation table with a Hankel matrix populated by real-valued entries. A WFA
is then derived from this matrix using spectral learning techniques, with rank factorisation and Singular
Value Decomposition (SVD) being core components of the process. Building on [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [10] demonstrated
that second-order RNNs with linear activation functions are natural extensions of, and computationally
equivalent to, WFAs, establishing a fundamental connection between the two models.
      </p>
      <p>
        The works in [9] and [12] focus on extracting WFAs from RNNs producing real-valued outputs over
sequential data without accessing internal states of the black-box models. These studies adopt the
spectral learning framework introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], treating the RNN as an oracle and analysing its outputs
to infer the automaton. However, they do not incorporate equivalence queries in the learning process.
      </p>
      <p>
        A compositional approach is presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which diverges from the exact learning approach.
Their Decision Guided Extraction algorithm leverages the target RNN’s decision-making process and
contextual information to guide the extraction. This method has shown efficiency in larger-scale
tasks, addressing scalability challenges in approximating RNN behaviour. Similarly, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] employed
a compositional strategy using k-means clustering to extract WFAs from RNNs trained on natural
language data.
      </p>
      <p>
        While exact learning methods offer strong accuracy guarantees, they often face scalability and
computational complexity limitations. In contrast, compositional approaches, though potentially less
precise, provide more scalable solutions. Therefore, combining these methods could yield a more
effective framework for automata extraction from RNNs, enhancing both accuracy and scalability
[
        <xref ref-type="bibr" rid="ref1 ref5 ref6">1, 6, 5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Background</title>
      <p>This section introduces the technical notation that will be used throughout the paper. A finite alphabet
(a finite set of elements) Σ is fixed, and the set of all finite strings (i.e. finite sequences of elements)
over Σ is denoted by Σ*. The empty string, of length 0, is denoted by ε. The length of a string u ∈ Σ* is
denoted by |u|. Given two strings u and v, the concatenation of the two strings is w = uv. In that case,
u is called a prefix of w and v a suffix of w. A set S of strings is said to be prefix closed if for all strings
u ∈ S, all the prefixes of u are also in S.</p>
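      <p>As a small concrete illustration of the prefix-closed property, the following sketch checks it directly from the definition (the helper names are ours and purely illustrative):</p>

```python
def prefixes(u: str) -> set:
    # all prefixes of u, including the empty string and u itself
    return {u[:i] for i in range(len(u) + 1)}

def is_prefix_closed(strings: set) -> bool:
    # a set is prefix closed iff every prefix of each of its strings is in the set
    return all(prefixes(u) <= strings for u in strings)

print(is_prefix_closed({"", "a", "ab"}))   # True
print(is_prefix_closed({"ab"}))            # False: "a" and "" are missing
```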
      <p>
        Definition 1. Hankel matrix [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: Let f be a function f : Σ* → ℝ. The Hankel matrix of f is a bi-infinite
matrix ℋ ∈ ℝ^(Σ* × Σ*), whose entries are defined as ℋ(u, v) = f(uv) for any u, v ∈ Σ*.
      </p>
      <p>Rows of a Hankel matrix can thus be seen as indexed by prefixes and columns by suffixes. Only
finite sub-blocks of ℋ are of interest, and a sub-block can be defined using a basis ℬ = (P, S), where
P, S ⊆ Σ*. The sub-block ℋ_ℬ of ℋ defined by ℬ is the sub-matrix of ℋ restricted to the rows indexed by
P and the columns indexed by S. We will mainly be interested in bases ℬ = (P, S) that are prefix closed,
that is to say P, S ⊆ Σ* are prefix-closed, and complete, meaning that the rank of the sub-block ℋ_ℬ
equals the rank of ℋ.</p>
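      <p>As a numerical illustration of Definition 1 and a finite sub-block, the sketch below builds a sub-block for the hypothetical function f(w) = 0.5^|w| over Σ = {a, b}; both the function and the basis are our illustrative choices:</p>

```python
import numpy as np

# Hypothetical target f : Sigma* -> R; its Hankel matrix has rank 1.
def f(w: str) -> float:
    return 0.5 ** len(w)

# Prefix-closed basis B = (P, S): the empty string plus all strings of length 1.
P = ["", "a", "b"]
S = ["", "a", "b"]

# Sub-block entries H_B(u, v) = f(uv): rows indexed by prefixes, columns by suffixes.
H = np.array([[f(u + v) for v in S] for u in P])

print(H)
print(np.linalg.matrix_rank(H))   # 1, so f is computable by a 1-state WFA
```

<p>The finite rank here anticipates Theorem 1 in Section 4.2: a function is WFA-computable exactly when its Hankel matrix has finite rank.</p>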
      <p>
        Definition 2. Weighted Finite Automaton (WFA): A WFA A is a tuple ⟨Σ, Q, α₀, α∞, {A_σ}_{σ∈Σ}⟩, where
Σ is the finite alphabet, Q is the finite set of states, α₀ and α∞ are vectors of size |Q| called the initial and
final vectors, and A_σ is the transition matrix corresponding to σ ∈ Σ, with size |Q| × |Q| and real-valued
entries. A WFA can also be represented linearly as a triplet ⟨α₀, (A_σ)_{σ∈Σ}, α∞⟩ [
        <xref ref-type="bibr" rid="ref8">8, 12</xref>
        ].
      </p>
      <p>The weight of a string x = x₁x₂ . . . xₙ in a WFA A is computed as A(x) = α₀ᵀ A_x α∞ =
α₀ᵀ A_{x₁} A_{x₂} . . . A_{xₙ} α∞.</p>
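      <p>This weight computation can be sketched as a product of transition matrices sandwiched between the initial and final vectors; the 2-state WFA below uses arbitrary illustrative weights, not values from the paper:</p>

```python
import numpy as np
from functools import reduce

# Illustrative 2-state WFA over Sigma = {a, b} (placeholder weights).
alpha0 = np.array([1.0, 0.0])        # initial vector
alphainf = np.array([1.0, 1.0])      # final vector
A = {
    "a": np.array([[0.5, 0.1], [0.0, 0.3]]),
    "b": np.array([[0.2, 0.0], [0.4, 0.6]]),
}

def wfa_weight(x: str) -> float:
    # A(x) = alpha0^T . A_{x1} A_{x2} ... A_{xn} . alphainf
    M = reduce(np.matmul, (A[c] for c in x), np.eye(2))
    return float(alpha0 @ M @ alphainf)

print(wfa_weight(""))    # weight of the empty string: alpha0 . alphainf = 1.0
print(wfa_weight("ab"))  # alpha0^T A_a A_b alphainf = 0.2
```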
      <p>Definition 3. Recurrent Neural Network (RNN): A Recurrent Neural Network is defined as a tuple
R = (h₀, g, ρ), where h₀ ∈ ℝ^d is the initial hidden state with d ∈ ℕ, g : ℝ^d × Σ → ℝ^d is the
transition function, and ρ : ℝ^d → ℝ is the output function mapping a hidden state to a real value. For an
input string w ∈ Σ*, the RNN R produces a real-valued output, defining a function f_R : Σ* → ℝ that
maps strings to real values.</p>
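      <p>The definition can be made concrete with a minimal numerical sketch; the tanh transition, the hidden dimension d = 2 and all weight values below are illustrative assumptions, not the paper's architecture:</p>

```python
import numpy as np

# Minimal illustrative RNN R = (h0, g, rho) with hidden dimension d = 2 over Sigma = {a, b}.
d = 2
h0 = np.zeros(d)                                  # initial hidden state
W = {"a": np.array([[0.9, 0.0], [0.1, 0.5]]),     # per-symbol weights (placeholders)
     "b": np.array([[0.3, 0.2], [0.0, 0.7]])}
bias = np.array([0.1, -0.1])
out = np.array([1.0, 1.0])

def g(h, c):
    # transition function g : R^d x Sigma -> R^d
    return np.tanh(W[c] @ h + bias)

def rho(h):
    # output function rho : R^d -> R
    return float(out @ h)

def f_R(w: str) -> float:
    # the function f_R : Sigma* -> R computed by the RNN on an input string
    h = h0
    for c in w:
        h = g(h, c)
    return rho(h)

print(f_R(""), f_R("ab"))
```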
    </sec>
    <sec id="sec-4">
      <title>4. Extracting WFAs from RNNs</title>
      <p>This section describes the proposed algorithm for extracting a WFA from an RNN using the * algorithm
(see Algorithm 1).</p>
      <sec id="sec-4-1">
        <title>4.1. L* Algorithm for WFA learning from an RNN</title>
        <p>
          Angluin’s L* algorithm learns a minimal automaton using a minimally adequate teacher, a theoretical
construct capable of answering queries about an unknown target system. The algorithm relies on two
types of queries: membership queries and equivalence queries. In the context of learning a WFA from a
target function f : Σ* → ℝ, the learner interacts with the teacher as follows [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]:
        </p>
        <p>In membership queries, the learner queries the teacher for the value f(w) of a specific string w, and
records the result. Based on the collected information, the learner constructs a hypothesised WFA A.
Equivalence queries are then used to verify whether the hypothesised WFA A correctly computes the
target function f. If it does not, the teacher provides a counterexample: a string w on which the behaviour
of A differs from f (i.e. A(w) ≠ f(w)). The learner uses this counterexample to refine the hypothesis.</p>
        <p>In this setting, the RNN is a model that produces real-valued outputs and serves as the teacher,
meaning that both membership and equivalence queries are directed to the RNN. The responses to
membership queries are recorded in a Hankel matrix (see Definition 1), which can then be used to derive
a minimal WFA through SVD-based rank factorisation (as described in Section 4.2).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Hankel matrix to WFA</title>
        <p>Theorem 1. Let f : Σ* → ℝ be a function over Σ*. Then, f is computable by a WFA if and only if the
rank of its Hankel matrix is finite. In that case, this rank is equal to the minimum number of states of any
WFA computing f [19, 20].</p>
        <p>
          As shown in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a WFA can be constructed from the Hankel matrix ℋ of a function f : Σ* → ℝ using
a sub-block ℋ_ℬ ∈ ℝ^(p×s), defined by a basis ℬ = (P, S) with |P| = p prefixes and |S| = s suffixes,
where P is prefix-closed and ℬ spans the row and column spaces of ℋ (see Algorithm 1, lines 2–5).
        </p>
        <p>For a sub-block ℋ_ℬ of rank n, a minimal WFA is computed via SVD, where ℋ_ℬ = U Λ Vᵀ, with
U ∈ ℝ^(p×n) and V ∈ ℝ^(s×n) orthogonal, and Λ ∈ ℝ^(n×n) a diagonal matrix containing the singular values of
ℋ_ℬ, yielding a rank factorisation ℋ_ℬ = F B with F = U Λ ∈ ℝ^(p×n) and B = Vᵀ ∈ ℝ^(n×s).</p>
        <p>
          Additionally, for each σ ∈ Σ, let ℋ_σ ∈ ℝ^(p×s) be the sub-block defined by ℋ_σ(u, v) = ℋ(u, σv) for u ∈ P
and v ∈ S; let h_{P,ε} ∈ ℝ^p be the vector with coordinates h_{P,ε}(u) = ℋ(u, ε), and let h_{ε,S} ∈ ℝ^s be the
vector with coordinates h_{ε,S}(v) = ℋ(ε, v). The WFA A = ⟨α₀, (A_σ)_{σ∈Σ}, α∞⟩ is then defined with
initial weights α₀ᵀ = h_{ε,S}ᵀ B⁺, final weights α∞ = F⁺ h_{P,ε}, and transition matrices A_σ = F⁺ ℋ_σ B⁺,
where F⁺ and B⁺ are the Moore–Penrose pseudo-inverses of F and B respectively [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
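        <p>The construction of this subsection can be sketched end to end as follows. The ground-truth function f is given by a hypothetical 2-state WFA with placeholder weights, and the basis of strings of length at most two is our illustrative choice; the recovery itself follows the SVD-based steps described above:</p>

```python
import numpy as np
from itertools import product

# Ground-truth f given by a hypothetical 2-state WFA (placeholder weights).
a0 = np.array([1.0, 0.0]); ainf = np.array([1.0, 0.5])
T = {"a": np.array([[0.6, 0.2], [0.0, 0.4]]),
     "b": np.array([[0.1, 0.3], [0.5, 0.0]])}

def f(w: str) -> float:
    M = np.eye(2)
    for c in w:
        M = M @ T[c]
    return float(a0 @ M @ ainf)

sigma = "ab"
# Prefix-closed basis: all strings of length <= 2, in lexicographic order.
basis = [""] + ["".join(t) for ln in (1, 2) for t in product(sigma, repeat=ln)]
P = S = basis

H = np.array([[f(u + v) for v in S] for u in P])                     # H_B(u, v) = f(uv)
Hs = {c: np.array([[f(u + c + v) for v in S] for u in P]) for c in sigma}
hPe = np.array([f(u) for u in P])                                    # h_{P,eps}
heS = np.array([f(v) for v in S])                                    # h_{eps,S}

# SVD-based rank factorisation H_B = F B, truncated at the numerical rank n.
U, lam, Vt = np.linalg.svd(H)
n = int(np.sum(lam > 1e-10))
F = U[:, :n] * lam[:n]            # F = U Lambda
B = Vt[:n, :]                     # B = V^T
Fp, Bp = np.linalg.pinv(F), np.linalg.pinv(B)

alpha0 = heS @ Bp                          # initial weights:  alpha0^T = h_{eps,S}^T B+
alphainf = Fp @ hPe                        # final weights:    alphainf = F+ h_{P,eps}
A = {c: Fp @ Hs[c] @ Bp for c in sigma}    # transitions:      A_sigma = F+ H_sigma B+

def wfa(w: str) -> float:
    M = np.eye(n)
    for c in w:
        M = M @ A[c]
    return float(alpha0 @ M @ alphainf)

for w in ["", "a", "ba", "abba"]:
    print(w, f(w), wfa(w))       # the extracted WFA reproduces f on all strings
```

<p>Because this basis is complete (the sub-block attains the rank of the full Hankel matrix), the recovered automaton agrees with f everywhere, illustrating Theorem 1.</p>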
        <p>The prefixes P and suffixes S required to construct the basis can be derived from strings generated
over the alphabet on which the RNN was trained. These sets can be generated in lexicographical order,
expanding systematically, or sampled from a uniform distribution over the alphabet [12].</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Equivalence queries and the abstraction of the state space</title>
        <p>When using the L* algorithm for extracting a WFA from an RNN, a key challenge is performing
equivalence queries due to the differing natures of RNNs (a continuous state space in ℝ^d) and WFAs
(finite discrete states), which complicates establishing exact equivalence. To address this, a discrete
abstraction Ā of the RNN’s hidden state space can be used to compare its behaviour with the WFA A
during the extraction process.</p>
        <p>Definition 4. Suppose Ā is a hypothesised abstraction of the RNN state space. Given an RNN R with
state space ℝ^d, a partitioning function φ : ℝ^d → Q̄ maps each hidden state to a discrete state in the set
Q̄ = {q̄₁, q̄₂, . . . , q̄_|Q̄|}, where |Q̄| is the number of states (clusters) in the abstraction Ā. The
abstraction Ā represents a partitioning of the RNN state space, where each state q̄ ∈ Q̄ corresponds to
a cluster of hidden states.</p>
        <p>Equivalence queries perform two tasks: (1) return a counterexample string w ∈ Σ* if the WFA A is
incorrect (i.e., A(w) ≠ Ā(w)), and add its prefixes to P and suffixes to S to refine the basis ℬ = (P, S)
(see Algorithm 1, lines 7–8); (2) refine the abstraction Ā by partitioning ℝ^d into finer clusters (see
Algorithm 1, lines 10–11).</p>
        <p>To construct the abstraction Ā, the RNN’s state space ℝ^d can be iteratively partitioned into clusters
during equivalence queries (see Definition 4). Initially, the entire state space is treated as a single cluster.
If, while performing equivalence queries, A(w) ≠ Ā(w) for a string w, the abstraction Ā should be
refined, and clustering methods can be applied.</p>
        <p>
          The refinement of the abstraction Ā could be realised using compositional methods, such
as decision-guided extraction using stepwise predictions [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and k-means based clustering of hidden
states [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] or hyperplane-based partitioning [13] during equivalence queries, where the hidden state
space is iteratively divided into partitions. This iterative refinement of Ā continues until the WFA A
accurately approximates the RNN’s behaviour, with Ā grouping similar hidden states and separating
dissimilar ones to facilitate effective comparison and identify counterexamples for refining ℬ.
        </p>
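        <p>A minimal sketch of the partitioning function φ and its iterative refinement is given below; the greedy farthest-point initialisation and plain k-means steps stand in for the more advanced partitioning strategies cited above, and all names and values are illustrative:</p>

```python
import numpy as np

class Abstraction:
    """Discrete abstraction of an RNN hidden state space (illustrative sketch)."""

    def __init__(self, d: int):
        self.centroids = np.zeros((1, d))   # initially the whole space is one cluster

    def phi(self, h):
        # partitioning function phi: index of the nearest cluster centroid
        return int(np.argmin(np.linalg.norm(self.centroids - h, axis=1)))

    def refine(self, states, steps=20):
        # double the number of clusters; greedy farthest-point initialisation
        k = 2 * len(self.centroids)
        C = [states[0]]
        for _ in range(k - 1):
            dists = np.min([np.linalg.norm(states - c, axis=1) for c in C], axis=0)
            C.append(states[int(np.argmax(dists))])
        C = np.array(C)
        for _ in range(steps):              # plain k-means (Lloyd) iterations
            labels = np.argmin(
                np.linalg.norm(states[:, None, :] - C[None, :, :], axis=2), axis=1)
            for j in range(k):
                if np.any(labels == j):
                    C[j] = states[labels == j].mean(axis=0)
        self.centroids = C

# toy usage: hidden states forming two well-separated blobs
rng = np.random.default_rng(0)
states = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
abstr = Abstraction(d=2)
abstr.refine(states)                        # one refinement: 1 cluster -> 2 clusters
print(len(abstr.centroids))
print(abstr.phi(np.array([0.0, 0.0])), abstr.phi(np.array([3.0, 3.0])))
```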
        <p>The termination of the algorithm is intended to be guided by an error tolerance parameter, which
specifies the permissible discrepancy between the RNN and the extracted WFA during equivalence
queries. This parameter influences the granularity of the resulting WFA: a lower tolerance is expected to
yield a finer approximation, while a higher tolerance produces a coarser model. In theory, termination
occurs when no further counterexamples are identified, indicating agreement between the WFA and
the RNN on the tested strings, or when resource limits, such as a predefined time limit [13], are reached,
since convergence is inherently approximate. Because WFAs and RNNs are fundamentally different
constructs, exact equivalence cannot be established. In practice, therefore, a test set can be used
to evaluate the extracted WFA, providing an empirical measure of how closely it approximates the
behaviour of the RNN under the chosen tolerance.</p>
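        <p>The tolerance-guided loop described above can be sketched end to end as follows; the stand-in teacher function, the bounded exhaustive equivalence check and all names are our illustrative assumptions rather than the paper's implementation:</p>

```python
import numpy as np
from functools import reduce
from itertools import product

sigma, tol = "ab", 1e-6

def rnn(w: str) -> float:
    # stand-in for the black-box teacher: f(w) = 0.5^{#a(w)} + 0.3^{#b(w)} (Hankel rank 2)
    return 0.5 ** w.count("a") + 0.3 ** w.count("b")

def build_wfa(P, S):
    # membership queries fill the Hankel sub-block; SVD yields a hypothesis WFA
    H = np.array([[rnn(u + v) for v in S] for u in P])
    U, lam, Vt = np.linalg.svd(H)
    n = max(1, int(np.sum(lam > 1e-10)))
    F, B = U[:, :n] * lam[:n], Vt[:n, :]
    Fp, Bp = np.linalg.pinv(F), np.linalg.pinv(B)
    A = {c: Fp @ np.array([[rnn(u + c + v) for v in S] for u in P]) @ Bp
         for c in sigma}
    a0, ainf = np.array([rnn(v) for v in S]) @ Bp, Fp @ np.array([rnn(u) for u in P])
    return lambda w: float(a0 @ reduce(np.matmul, (A[c] for c in w), np.eye(n)) @ ainf)

def counterexample(wfa, max_len=4):
    # approximate equivalence query: bounded exhaustive search up to the tolerance
    for ln in range(max_len + 1):
        for t in product(sigma, repeat=ln):
            w = "".join(t)
            if abs(wfa(w) - rnn(w)) > tol:
                return w
    return None

P, S = [""], [""]
for _ in range(10):                         # bounded refinement loop
    cex = counterexample(build_wfa(P, S))
    if cex is None:
        break                               # no discrepancy above tol: terminate
    # refine the basis with the counterexample's prefixes and suffixes
    P += [cex[:i] for i in range(1, len(cex) + 1) if cex[:i] not in P]
    S += [cex[i:] for i in range(len(cex)) if cex[i:] not in S]

print(cex is None, len(P), len(S))          # converged after refining the basis
```

<p>On this toy teacher, the first hypothesis (built from the single-entry sub-block) is rank 1 and fails on the string aa; after its prefixes and suffixes are added, the basis becomes complete and no further counterexample is found within the tolerance.</p>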
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper explores an algorithm to extract a Weighted Finite Automaton (WFA) from a Recurrent
Neural Network (RNN) with real-valued outputs, using a variation of the L* algorithm. The approach
combines spectral learning with Singular Value Decomposition (SVD) to handle membership queries
and introduces a discrete abstraction of the RNN’s hidden state space to approximate equivalence
queries. The goal is to investigate whether such a method can yield an accurate approximation of the
RNN.</p>
      <p>A key factor influencing the quality of the extracted WFA is the dynamic selection of prefixes and
sufixes used to build the Hankel matrix. Counterexamples generated during equivalence queries are
added to the basis, enabling iterative refinement that improves the accuracy of each successive WFA
hypothesis. Consequently, the algorithm does not require access to the full training dataset; only the
alphabet of the language for which the RNN was trained is needed, along with access to the RNN hidden
state space, making the process more efficient.</p>
      <p>To the best of our knowledge, the integration of spectral learning with the weighted L* algorithm
using a discrete abstraction for equivalence queries to extract WFAs has not been previously explored. This
work provides a foundation for interpretable WFA extraction from RNNs, with potential applications
in sequence modelling and analysis. In future, we aim to extend the implementation and theoretical
framework beyond the proposed work.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Work supported by the EPSRC grant EP/T018313/1.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>[9] S. Ayache, R. Eyraud, N. Goudian, Explaining Black Boxes on Sequential Data using Weighted Automata, International Conference on Grammatical Inference 93 (2018) 81–103. URL: http://arxiv.org/abs/1810.05741.</p>
      <p>[10] G. Rabusseau, T. Li, D. Precup, Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning, 2019. doi:10.48550/arXiv.1807.01406.</p>
      <p>[11] R. Marzouk, C. de la Higuera, Distance and equivalence between finite state machines and recurrent neural networks: Computational results, arXiv preprint arXiv:2004.00478 (2020). doi:10.48550/arXiv.2004.00478.</p>
      <p>[12] R. Eyraud, S. Ayache, Distillation of weighted automata from recurrent neural networks using a spectral approach, Machine Learning 113 (2024) 3233–3266. doi:10.1007/s10994-021-05948-1.</p>
      <p>[13] S. Y. Wickramasinghe, J. M. Howe, L. Daviaud, Extracting Deterministic Finite Automata from RNNs via Hyperplane Partitioning and Learning, in: Proceedings of the 17th International Joint Conference on Computational Intelligence, 2025. To appear.</p>
      <p>[14] Q. Wang, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, C. L. Giles, An Empirical Evaluation of Rule Extraction from Recurrent Neural Networks, Neural Computation 30 (2018) 2568–2591. doi:10.1162/neco_a_01111.</p>
      <p>[15] A. Cleeremans, D. Servan-Schreiber, J. L. McClelland, Finite State Automata and Simple Recurrent Networks, Neural Computation 1 (1989) 372–381. doi:10.1162/neco.1989.1.3.372.</p>
      <p>[16] C. L. Giles, C. B. Miller, D. Chen, H. H. Chen, G. Z. Sun, Y. C. Lee, Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks, Neural Computation 4 (1992) 393–405. doi:10.1162/neco.1992.4.3.393.</p>
      <p>[17] C. W. Omlin, C. Giles, Extraction of rules from discrete-time recurrent neural networks, Neural Networks 9 (1996) 41–52. doi:10.1016/0893-6080(95)00086-0.</p>
      <p>[18] H. Jacobsson, Rule Extraction from Recurrent Neural Networks: A Taxonomy and Review, Neural Computation 17 (2005) 1223–1263. doi:10.1162/0899766053630350.</p>
      <p>[19] J. Carlyle, A. Paz, Realizations by stochastic finite automata, Journal of Computer and System Sciences 5 (1971) 26–40. doi:10.1016/S0022-0000(71)80005-3.</p>
      <p>[20] M. Fliess, Matrices de Hankel, Journal de Mathématiques Pures et Appliquées 53 (1974).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] T. Okudono, M. Waga, T. Sekiyama, I. Hasuo, Weighted Automata Extraction from Recurrent Neural Networks via Regression on State Spaces, Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020) 5306–5314. doi:10.1609/aaai.v34i04.5977.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Z. Wei, X. Zhang, M. Sun, Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages, 2022, pp. 370–385. doi:10.1007/978-3-031-17244-1_22.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] X. Zhang, X. Du, X. Xie, L. Ma, Y. Liu, M. Sun, Decision-Guided Weighted Automata Extraction from Recurrent Neural Networks, Proceedings of the AAAI Conference on Artificial Intelligence 35 (2021) 11699–11707. doi:10.1609/aaai.v35i13.17391.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Angluin, Learning regular sets from queries and counterexamples, Information and Computation 75 (1987) 87–106. doi:10.1016/0890-5401(87)90052-6.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] G. Weiss, Y. Goldberg, E. Yahav, Extracting automata from recurrent neural networks using queries and counterexamples (extended version), Machine Learning 113 (2024) 2877–2919. doi:10.1007/s10994-022-06163-2.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] G. Weiss, Y. Goldberg, E. Yahav, Learning Deterministic Weighted Automata with Queries and Counterexamples, Advances in Neural Information Processing Systems 32 (2019).</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Balle, M. Mohri, Learning Weighted Automata, in: International Conference on Algebraic Informatics, Springer International Publishing, 2015, pp. 1–21. doi:10.1007/978-3-319-23021-4_1.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] B. Balle, X. Carreras, F. M. Luque, A. Quattoni, Spectral learning of weighted automata, Machine Learning 96 (2014) 33–63. doi:10.1007/s10994-013-5416-x.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>