<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Extracting Weighted Finite Automata from RNNs via iterative partitioning and spectral learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sandamali Yashodhara Wickramasinghe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacob M. Howe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laure Daviaud</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City St George's, University of London</institution>
          ,
          <addr-line>Northampton Square, London EC1V 0HB</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of East Anglia</institution>
          ,
          <addr-line>Research Park, Norwich NR4 7TJ</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>26</volume>
      <issue>2025</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This paper proposes an algorithm to extract a Weighted Finite Automaton (WFA) from a Recurrent Neural Network (RNN), by adapting the L* algorithm in combination with spectral learning using Singular Value Decomposition (SVD). The method introduces a discrete abstraction of the RNN's hidden state space to enable approximate equivalence queries within the L* framework. Through these queries, counterexamples are identified and used to iteratively refine the prefix and suffix sets that define the Hankel matrix, thereby improving the accuracy of successive WFA hypotheses. This work lays the groundwork for integrating discrete clustering with spectral learning in the context of WFA extraction from RNNs, an area that remains underexplored, with potential applications in sequence modelling and analysis. Future work will focus on implementing and empirically validating the proposed approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Recurrent Neural Networks</kwd>
        <kwd>Weighted Finite Automata</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Interpretability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This research explores the extraction of Weighted Finite Automata from Recurrent Neural Networks.
Recurrent Neural Networks (RNNs) are often referred to as black boxes due to their lack of interpretability.
Understanding the behaviour of RNNs is important to enhance their interpretability and stability, and
this understanding can be achieved by extracting symbolic models such as automata. Most research on
extracting knowledge from RNNs has focused on Deterministic Finite Automata (DFA).</p>
      <p>
        DFA extraction is suitable for RNNs trained as binary classifiers (producing Boolean outputs). In
contrast, for RNNs with real-valued outputs, such as probabilities, extracting Weighted Finite Automata
is more appropriate [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        One approach to extract automata from RNNs is through compositional techniques, where a WFA
is derived by dividing the RNN state space into clusters and identifying the transitions and weight
matrices to construct the automaton [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Another approach involves exact learning methods, such as
the L* algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], where an automaton is extracted by posing queries about the language learned
by the RNN. Angluin [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduced the L* algorithm in 1987 for inferring a DFA from an unknown
target system, which was later extended to extract DFAs from RNNs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In the L* algorithm, two types
of queries are employed in the extraction process: membership queries and equivalence queries. A
membership query asks whether a specific string belongs to the target language, while an equivalence
query checks whether the language recognised by the extracted automaton is equivalent to that of the
target.
      </p>
      <p>
        The work in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] introduced a novel approach for extracting a DFA from an RNN using the L*
algorithm and demonstrated strong performance in accurately representing the language learned by the
RNN. Later on, the work was extended to extract PDFAs (Probabilistic Deterministic Finite Automata)
from RNNs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Even though the L* algorithm was designed to extract a DFA, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a
variation of it for WFA extraction, namely the spectral learning algorithm [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Building upon this, [9]
conducted a study where a WFA is extracted without requiring access to the internal mechanisms of
the RNN or the original training data. The work presented in [10] also used spectral learning to extract
WFAs from second-order RNNs.
      </p>
      <p>The use of exact learning methods for extracting WFAs from RNNs remains relatively underexplored.
While membership queries are straightforward in this setting, the main challenge lies in performing
equivalence queries, as a formal equivalence between a WFA and an RNN is undecidable [11].</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], this was addressed by using an abstraction of the RNN’s hidden state space while performing
equivalence queries. By clustering similar hidden states into discrete regions, a finite-state abstraction
can be constructed and compared against the behaviour of the extracted automaton. This method
enables an approximate but feasible equivalence check. The same abstraction-based approach was later
extended to WFA extraction in the study presented by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Our proposed contributions are twofold: first, we aim to integrate spectral learning with Singular
Value Decomposition (SVD) to perform membership queries in the extraction of a WFA from an
RNN using the L* algorithm; second, we propose addressing the challenge of equivalence queries by
iteratively constructing a discrete abstraction of the RNN state space to enable effective comparison.
This framework builds upon the foundational work of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and extends the ideas explored in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] toward
the extraction of WFAs.
      </p>
      <p>In contrast to prior approaches such as [9, 12], which construct WFAs using spectral learning
over fixed sets of prefixes and suffixes, we propose an iterative spectral learning framework. This
approach incrementally expands the set of prefixes and suffixes based on the learning process, allowing
for dynamic refinement until equivalence is achieved, potentially improving both adaptability and
efficiency.</p>
      <p>
        Additionally, we propose incorporating a discrete abstraction network during equivalence queries
by incrementally partitioning the RNN hidden state space. This builds upon the preliminary ideas
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and can be further refined using advanced partitioning strategies, including hyperplane-based
segmentation [13], k-means clustering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and the K-DCP method introduced in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        While the use of the weighted L* algorithm for WFA extraction has been explored in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], its integration
with spectral learning [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and a discrete state abstraction remains under-explored. Furthermore, the
iterative construction of an abstraction network using the aforementioned partitioning techniques has
not been applied in this context. Our proposed framework aims to bridge these gaps and provide a new
direction for extracting WFAs from RNNs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        The association between automata and RNNs has been studied in the context of both binary classification
problems [
        <xref ref-type="bibr" rid="ref5">14, 5</xref>
        ] and quantitative tasks [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref6">9, 10, 6, 1, 3, 2, 12</xref>
        ]. The earliest works, such as [15, 16], laid
the foundation for understanding the relationship between automata and RNNs. In 1996, Omlin and
Giles [17] demonstrated that the state space of an RNN trained on a regular grammar tends to cluster
in a way that resembles the structure of a finite automaton. Subsequent studies continued to explore
this line of research, including [18, 14], further validating the connection between RNN dynamics and
automata representations.
      </p>
      <p>
        The work presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
is considered state-of-the-art in DFA extraction and employs the L*
algorithm, an exact learning approach. This method was extended to quantitative RNNs in a probabilistic
setting in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where probabilistic deterministic finite automata (PDFAs) were extracted. Subsequently,
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] further advanced this line of research by extracting WFAs using a regression-based method to
perform equivalence checks.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a weighted variation of the L* algorithm was introduced specifically for WFA extraction with
the use of spectral learning for WFA introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Unlike the original L* algorithm by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which
constructs an observation table through membership queries to a teacher (i.e., the target system), this
variation replaces the observation table with a Hankel matrix populated by real-valued entries. A WFA
is then derived from this matrix using spectral learning techniques, with rank factorisation and Singular
Value Decomposition (SVD) being core components of the process. Building on [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [10] demonstrated
that second-order RNNs with linear activation functions are natural extensions of, and computationally
equivalent to, WFAs, establishing a fundamental connection between the two models.
      </p>
      <p>
        The works in [9] and [12] focus on extracting WFAs from RNNs producing real-valued outputs over
sequential data without accessing internal states of the black-box models. These studies adopt the
spectral learning framework introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], treating the RNN as an oracle and analysing its outputs
to infer the automaton. However, they do not incorporate equivalence queries in the learning process.
      </p>
      <p>
        A compositional approach is presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which diverges from the exact learning approach.
Their Decision Guided Extraction algorithm leverages the target RNN’s decision-making process and
contextual information to guide the extraction. This method has shown efficiency in larger-scale
tasks, addressing scalability challenges in approximating RNN behaviour. Similarly, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] employed
a compositional strategy using k-means clustering to extract WFAs from RNNs trained on natural
language data.
      </p>
      <p>
        While exact learning methods offer strong accuracy guarantees, they often face scalability and
computational complexity limitations. In contrast, compositional approaches, though potentially less
precise, provide more scalable solutions. Therefore, combining these methods could yield a more
effective framework for automata extraction from RNNs, enhancing both accuracy and scalability
[
        <xref ref-type="bibr" rid="ref1 ref5 ref6">1, 6, 5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Background</title>
      <p>This section introduces the technical notation that will be used throughout the paper. A finite alphabet
(a finite set of elements) Σ is fixed, and the set of all finite strings (i.e. finite sequences of elements)
over Σ is denoted by Σ*. The empty string, of length 0, is denoted by ε. The length of a string u ∈ Σ* is
denoted by |u|. Given two strings u and v, the concatenation of the two strings is w = uv. In that case,
u is called a prefix of w and v a suffix of w. A set S of strings is said to be prefix closed if for all strings
u ∈ S, all the prefixes of u are also in S.</p>
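      <p>As a small concrete illustration of the prefix-closed property, the following sketch checks it directly from the definition (the helper names are ours and purely illustrative):</p>

```python
def prefixes(u: str) -> set:
    # all prefixes of u, including the empty string and u itself
    return {u[:i] for i in range(len(u) + 1)}

def is_prefix_closed(strings: set) -> bool:
    # a set is prefix closed iff every prefix of each of its strings is in the set
    return all(prefixes(u) <= strings for u in strings)

print(is_prefix_closed({"", "a", "ab"}))   # True
print(is_prefix_closed({"ab"}))            # False: "a" and "" are missing
```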
      <p>
        Definition 1. Hankel matrix [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: Let f be a function f : Σ* → ℝ. The Hankel matrix of f is a bi-infinite
matrix ℋ ∈ ℝ^(Σ* × Σ*), whose entries are defined as ℋ(u, v) = f(uv) for any u, v ∈ Σ*.
      </p>
      <p>Rows of a Hankel matrix can thus be seen as indexed by prefixes and columns by suffixes. Only
finite sub-blocks of ℋ are of interest, and a sub-block can be defined using a basis ℬ = (P, S), where
P, S ⊆ Σ*. The sub-block ℋ_ℬ of ℋ defined by ℬ is the sub-matrix of ℋ restricted to the rows indexed by
P and the columns indexed by S. We will mainly be interested in bases ℬ = (P, S) that are prefix closed,
that is to say P, S ⊆ Σ* are prefix-closed, and complete, meaning that the rank of the sub-block ℋ_ℬ
equals the rank of ℋ.</p>
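      <p>As a numerical illustration of Definition 1 and a finite sub-block, the sketch below builds a sub-block for the hypothetical function f(w) = 0.5^|w| over Σ = {a, b}; both the function and the basis are our illustrative choices:</p>

```python
import numpy as np

# Hypothetical target f : Sigma* -> R; its Hankel matrix has rank 1.
def f(w: str) -> float:
    return 0.5 ** len(w)

# Prefix-closed basis B = (P, S): the empty string plus all strings of length 1.
P = ["", "a", "b"]
S = ["", "a", "b"]

# Sub-block entries H_B(u, v) = f(uv): rows indexed by prefixes, columns by suffixes.
H = np.array([[f(u + v) for v in S] for u in P])

print(H)
print(np.linalg.matrix_rank(H))   # 1, so f is computable by a 1-state WFA
```

<p>The finite rank here anticipates Theorem 1 in Section 4.2: a function is WFA-computable exactly when its Hankel matrix has finite rank.</p>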
      <p>
        Definition 2. Weighted Finite Automaton (WFA): A WFA A is a tuple ⟨Σ, Q, α₀, α∞, {A_σ}_{σ∈Σ}⟩, where
Σ is the finite alphabet, Q is the finite set of states, α₀ and α∞ are vectors of size |Q| called the initial and
final vectors, and A_σ is the transition matrix corresponding to σ ∈ Σ, with size |Q| × |Q| and real-valued
entries. A WFA can also be represented linearly as a triplet ⟨α₀, (A_σ)_{σ∈Σ}, α∞⟩ [
        <xref ref-type="bibr" rid="ref8">8, 12</xref>
        ].
      </p>
      <p>The weight of a string x = x₁x₂ . . . xₙ in a WFA A is computed as A(x) = α₀ᵀ A_x α∞ =
α₀ᵀ A_{x₁} A_{x₂} . . . A_{xₙ} α∞.</p>
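      <p>This weight computation can be sketched as a product of transition matrices sandwiched between the initial and final vectors; the 2-state WFA below uses arbitrary illustrative weights, not values from the paper:</p>

```python
import numpy as np
from functools import reduce

# Illustrative 2-state WFA over Sigma = {a, b} (placeholder weights).
alpha0 = np.array([1.0, 0.0])        # initial vector
alphainf = np.array([1.0, 1.0])      # final vector
A = {
    "a": np.array([[0.5, 0.1], [0.0, 0.3]]),
    "b": np.array([[0.2, 0.0], [0.4, 0.6]]),
}

def wfa_weight(x: str) -> float:
    # A(x) = alpha0^T . A_{x1} A_{x2} ... A_{xn} . alphainf
    M = reduce(np.matmul, (A[c] for c in x), np.eye(2))
    return float(alpha0 @ M @ alphainf)

print(wfa_weight(""))    # weight of the empty string: alpha0 . alphainf = 1.0
print(wfa_weight("ab"))  # alpha0^T A_a A_b alphainf = 0.2
```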
      <p>Definition 3. Recurrent Neural Network (RNN): A Recurrent Neural Network is defined as a tuple
R = (h₀, g, ρ), where h₀ ∈ ℝ^d is the initial hidden state with d ∈ ℕ, g : ℝ^d × Σ → ℝ^d is the
transition function, and ρ : ℝ^d → ℝ is the output function mapping a hidden state to a real value. For an
input string w ∈ Σ*, the RNN R produces a real-valued output, defining a function f_R : Σ* → ℝ that
maps strings to real values.</p>
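      <p>The definition can be made concrete with a minimal numerical sketch; the tanh transition, the hidden dimension d = 2 and all weight values below are illustrative assumptions, not the paper's architecture:</p>

```python
import numpy as np

# Minimal illustrative RNN R = (h0, g, rho) with hidden dimension d = 2 over Sigma = {a, b}.
d = 2
h0 = np.zeros(d)                                  # initial hidden state
W = {"a": np.array([[0.9, 0.0], [0.1, 0.5]]),     # per-symbol weights (placeholders)
     "b": np.array([[0.3, 0.2], [0.0, 0.7]])}
bias = np.array([0.1, -0.1])
out = np.array([1.0, 1.0])

def g(h, c):
    # transition function g : R^d x Sigma -> R^d
    return np.tanh(W[c] @ h + bias)

def rho(h):
    # output function rho : R^d -> R
    return float(out @ h)

def f_R(w: str) -> float:
    # the function f_R : Sigma* -> R computed by the RNN on an input string
    h = h0
    for c in w:
        h = g(h, c)
    return rho(h)

print(f_R(""), f_R("ab"))
```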
    </sec>
    <sec id="sec-4">
      <title>4. Extracting WFAs from RNNs</title>
      <p>This section describes the proposed algorithm for extracting a WFA from an RNN using the * algorithm
(see Algorithm 1).</p>
      <sec id="sec-4-1">
        <title>4.1. L* Algorithm for WFA learning from an RNN</title>
        <p>
          Angluin’s L* algorithm learns a minimal automaton using a minimally adequate teacher, a theoretical
construct capable of answering queries about an unknown target system. The algorithm relies on two
types of queries: membership queries and equivalence queries. In the context of learning a WFA from a
target function f : Σ* → ℝ, the learner interacts with the teacher as follows [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]:
        </p>
        <p>In membership queries, the learner queries the teacher for the value f(w) of a specific string w, and
records the result. Based on the collected information, the learner constructs a hypothesised WFA A.
Equivalence queries are then used to verify whether the hypothesised WFA A correctly computes the
target function f. If it does not, the teacher provides a counterexample: a string w on which the behaviour
of A differs from f (i.e. A(w) ≠ f(w)). The learner uses this counterexample to refine the hypothesis.</p>
        <p>In this setting, the RNN is a model that produces real-valued outputs and serves as the teacher,
meaning that both membership and equivalence queries are directed to the RNN. The responses to
membership queries are recorded in a Hankel matrix (see Definition 1), which can then be used to derive
a minimal WFA through SVD-based rank factorisation (as described in Section 4.2).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Hankel matrix to WFA</title>
        <p>Theorem 1. Let f : Σ* → ℝ be a function over Σ*. Then, f is computable by a WFA if and only if the
rank of its Hankel matrix is finite. In that case, this rank is equal to the minimum number of states of any
WFA computing f [19, 20].</p>
        <p>
          As shown in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a WFA can be constructed from the Hankel matrix ℋ of a function f : Σ* → ℝ using
a sub-block ℋ_ℬ ∈ ℝ^(p×s), defined by a basis ℬ = (P, S) with |P| = p prefixes and |S| = s suffixes,
where P is prefix-closed and ℬ spans the row and column spaces of ℋ (see Algorithm 1, lines 2–5).
        </p>
        <p>For a sub-block ℋ_ℬ of rank n, a minimal WFA is computed via SVD, where ℋ_ℬ = U Λ Vᵀ, with
U ∈ ℝ^(p×n) and V ∈ ℝ^(s×n) orthogonal, and Λ ∈ ℝ^(n×n) a diagonal matrix containing the singular values of
ℋ_ℬ, yielding a rank factorisation ℋ_ℬ = F B with F = U Λ ∈ ℝ^(p×n) and B = Vᵀ ∈ ℝ^(n×s).</p>
        <p>
          Additionally, for each σ ∈ Σ, let ℋ_σ ∈ ℝ^(p×s) be the sub-block defined by ℋ_σ(u, v) = ℋ(u, σv) for u ∈ P
and v ∈ S; let h_{P,ε} ∈ ℝ^p be the vector with coordinates h_{P,ε}(u) = ℋ(u, ε), and let h_{ε,S} ∈ ℝ^s be the
vector with coordinates h_{ε,S}(v) = ℋ(ε, v). The WFA A = ⟨α₀, (A_σ)_{σ∈Σ}, α∞⟩ is then defined with
initial weights α₀ᵀ = h_{ε,S}ᵀ B⁺, final weights α∞ = F⁺ h_{P,ε}, and transition matrices A_σ = F⁺ ℋ_σ B⁺,
where F⁺ and B⁺ are the Moore–Penrose pseudo-inverses of F and B respectively [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
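        <p>The construction of this subsection can be sketched end to end as follows. The ground-truth function f is given by a hypothetical 2-state WFA with placeholder weights, and the basis of strings of length at most two is our illustrative choice; the recovery itself follows the SVD-based steps described above:</p>

```python
import numpy as np
from itertools import product

# Ground-truth f given by a hypothetical 2-state WFA (placeholder weights).
a0 = np.array([1.0, 0.0]); ainf = np.array([1.0, 0.5])
T = {"a": np.array([[0.6, 0.2], [0.0, 0.4]]),
     "b": np.array([[0.1, 0.3], [0.5, 0.0]])}

def f(w: str) -> float:
    M = np.eye(2)
    for c in w:
        M = M @ T[c]
    return float(a0 @ M @ ainf)

sigma = "ab"
# Prefix-closed basis: all strings of length <= 2, in lexicographic order.
basis = [""] + ["".join(t) for ln in (1, 2) for t in product(sigma, repeat=ln)]
P = S = basis

H = np.array([[f(u + v) for v in S] for u in P])                     # H_B(u, v) = f(uv)
Hs = {c: np.array([[f(u + c + v) for v in S] for u in P]) for c in sigma}
hPe = np.array([f(u) for u in P])                                    # h_{P,eps}
heS = np.array([f(v) for v in S])                                    # h_{eps,S}

# SVD-based rank factorisation H_B = F B, truncated at the numerical rank n.
U, lam, Vt = np.linalg.svd(H)
n = int(np.sum(lam > 1e-10))
F = U[:, :n] * lam[:n]            # F = U Lambda
B = Vt[:n, :]                     # B = V^T
Fp, Bp = np.linalg.pinv(F), np.linalg.pinv(B)

alpha0 = heS @ Bp                          # initial weights:  alpha0^T = h_{eps,S}^T B+
alphainf = Fp @ hPe                        # final weights:    alphainf = F+ h_{P,eps}
A = {c: Fp @ Hs[c] @ Bp for c in sigma}    # transitions:      A_sigma = F+ H_sigma B+

def wfa(w: str) -> float:
    M = np.eye(n)
    for c in w:
        M = M @ A[c]
    return float(alpha0 @ M @ alphainf)

for w in ["", "a", "ba", "abba"]:
    print(w, f(w), wfa(w))       # the extracted WFA reproduces f on all strings
```

<p>Because this basis is complete (the sub-block attains the rank of the full Hankel matrix), the recovered automaton agrees with f everywhere, illustrating Theorem 1.</p>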
        <p>The prefixes P and suffixes S required to construct the basis can be derived from strings generated
over the alphabet on which the RNN was trained. These sets can be generated in lexicographical order,
expanding systematically, or sampled from a uniform distribution over the alphabet [12].</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Equivalence queries and the abstraction of the state space</title>
        <p>When using the L* algorithm for extracting a WFA from an RNN, a key challenge is performing
equivalence queries due to the differing natures of RNNs (a continuous state space in ℝ^d) and WFAs
(finite discrete states), which complicates establishing exact equivalence. To address this, a discrete
abstraction Ā of the RNN’s hidden state space can be used to compare its behaviour with the WFA A
during the extraction process.</p>
        <p>Definition 4. Suppose Ā is a hypothesised abstraction of the RNN state space. Given an RNN R with
state space ℝ^d, a partitioning function φ : ℝ^d → Q̄ maps each hidden state to a discrete state in the set
Q̄ = {q̄₁, q̄₂, . . . , q̄_|Q̄|}, where |Q̄| is the number of states (clusters) in the abstraction Ā. The
abstraction Ā represents a partitioning of the RNN state space, where each state q̄ ∈ Q̄ corresponds to
a cluster of hidden states.</p>
        <p>Equivalence queries perform two tasks: (1) return a counterexample string w ∈ Σ* if the WFA A is
incorrect (i.e., A(w) ≠ Ā(w)), and add its prefixes to P and suffixes to S to refine the basis ℬ = (P, S)
(see Algorithm 1, lines 7–8); (2) refine the abstraction Ā by partitioning ℝ^d into finer clusters (see
Algorithm 1, lines 10–11).</p>
        <p>To construct the abstraction Ā, the RNN’s state space ℝ^d can be iteratively partitioned into clusters
during equivalence queries (see Definition 4). Initially, the entire state space is treated as a single cluster.
If, while performing equivalence queries, A(w) ≠ Ā(w) for a string w, the abstraction Ā should be
refined, and clustering methods can be applied.</p>
        <p>
          The refinement of the abstraction Ā could be realised using compositional methods, such
as decision-guided extraction using stepwise predictions [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and k-means based clustering of hidden
states [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] or hyperplane-based partitioning [13] during equivalence queries, where the hidden state
space is iteratively divided into partitions. This iterative refinement of Ā continues until the WFA A
accurately approximates the RNN’s behaviour, with Ā grouping similar hidden states and separating
dissimilar ones to facilitate effective comparison and identify counterexamples for refining ℬ.
        </p>
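        <p>A minimal sketch of the partitioning function φ and its iterative refinement is given below; the greedy farthest-point initialisation and plain k-means steps stand in for the more advanced partitioning strategies cited above, and all names and values are illustrative:</p>

```python
import numpy as np

class Abstraction:
    """Discrete abstraction of an RNN hidden state space (illustrative sketch)."""

    def __init__(self, d: int):
        self.centroids = np.zeros((1, d))   # initially the whole space is one cluster

    def phi(self, h):
        # partitioning function phi: index of the nearest cluster centroid
        return int(np.argmin(np.linalg.norm(self.centroids - h, axis=1)))

    def refine(self, states, steps=20):
        # double the number of clusters; greedy farthest-point initialisation
        k = 2 * len(self.centroids)
        C = [states[0]]
        for _ in range(k - 1):
            dists = np.min([np.linalg.norm(states - c, axis=1) for c in C], axis=0)
            C.append(states[int(np.argmax(dists))])
        C = np.array(C)
        for _ in range(steps):              # plain k-means (Lloyd) iterations
            labels = np.argmin(
                np.linalg.norm(states[:, None, :] - C[None, :, :], axis=2), axis=1)
            for j in range(k):
                if np.any(labels == j):
                    C[j] = states[labels == j].mean(axis=0)
        self.centroids = C

# toy usage: hidden states forming two well-separated blobs
rng = np.random.default_rng(0)
states = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
abstr = Abstraction(d=2)
abstr.refine(states)                        # one refinement: 1 cluster -> 2 clusters
print(len(abstr.centroids))
print(abstr.phi(np.array([0.0, 0.0])), abstr.phi(np.array([3.0, 3.0])))
```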
        <p>The termination of the algorithm is intended to be guided by an error tolerance parameter, which
specifies the permissible discrepancy between the RNN and the extracted WFA during equivalence
queries. This parameter influences the granularity of the resulting WFA: a lower tolerance is expected to
yield a finer approximation, while a higher tolerance produces a coarser model. In theory, termination
occurs when no further counterexamples are identified, indicating agreement between the WFA and
the RNN on the tested strings, or when resource limits, such as a predefined time limit [13], are reached,
since convergence is inherently approximate. Because WFAs and RNNs are fundamentally different
constructs, exact equivalence cannot be established. In practice, therefore, a test set can be used
to evaluate the extracted WFA, providing an empirical measure of how closely it approximates the
behaviour of the RNN under the chosen tolerance.</p>
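        <p>The tolerance-guided loop described above can be sketched end to end as follows; the stand-in teacher function, the bounded exhaustive equivalence check and all names are our illustrative assumptions rather than the paper's implementation:</p>

```python
import numpy as np
from functools import reduce
from itertools import product

sigma, tol = "ab", 1e-6

def rnn(w: str) -> float:
    # stand-in for the black-box teacher: f(w) = 0.5^{#a(w)} + 0.3^{#b(w)} (Hankel rank 2)
    return 0.5 ** w.count("a") + 0.3 ** w.count("b")

def build_wfa(P, S):
    # membership queries fill the Hankel sub-block; SVD yields a hypothesis WFA
    H = np.array([[rnn(u + v) for v in S] for u in P])
    U, lam, Vt = np.linalg.svd(H)
    n = max(1, int(np.sum(lam > 1e-10)))
    F, B = U[:, :n] * lam[:n], Vt[:n, :]
    Fp, Bp = np.linalg.pinv(F), np.linalg.pinv(B)
    A = {c: Fp @ np.array([[rnn(u + c + v) for v in S] for u in P]) @ Bp
         for c in sigma}
    a0, ainf = np.array([rnn(v) for v in S]) @ Bp, Fp @ np.array([rnn(u) for u in P])
    return lambda w: float(a0 @ reduce(np.matmul, (A[c] for c in w), np.eye(n)) @ ainf)

def counterexample(wfa, max_len=4):
    # approximate equivalence query: bounded exhaustive search up to the tolerance
    for ln in range(max_len + 1):
        for t in product(sigma, repeat=ln):
            w = "".join(t)
            if abs(wfa(w) - rnn(w)) > tol:
                return w
    return None

P, S = [""], [""]
for _ in range(10):                         # bounded refinement loop
    cex = counterexample(build_wfa(P, S))
    if cex is None:
        break                               # no discrepancy above tol: terminate
    # refine the basis with the counterexample's prefixes and suffixes
    P += [cex[:i] for i in range(1, len(cex) + 1) if cex[:i] not in P]
    S += [cex[i:] for i in range(len(cex)) if cex[i:] not in S]

print(cex is None, len(P), len(S))          # converged after refining the basis
```

<p>On this toy teacher, the first hypothesis (built from the single-entry sub-block) is rank 1 and fails on the string aa; after its prefixes and suffixes are added, the basis becomes complete and no further counterexample is found within the tolerance.</p>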
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper explores an algorithm to extract a Weighted Finite Automaton (WFA) from a Recurrent
Neural Network (RNN) with real-valued outputs, using a variation of the L* algorithm. The approach
combines spectral learning with Singular Value Decomposition (SVD) to handle membership queries
and introduces a discrete abstraction of the RNN’s hidden state space to approximate equivalence
queries. The goal is to investigate whether such a method can yield an accurate approximation of the
RNN.</p>
      <p>A key factor influencing the quality of the extracted WFA is the dynamic selection of prefixes and
sufixes used to build the Hankel matrix. Counterexamples generated during equivalence queries are
added to the basis, enabling iterative refinement that improves the accuracy of each successive WFA
hypothesis. Consequently, the algorithm does not require access to the full training dataset; only the
alphabet of the language for which the RNN was trained is needed, along with access to the RNN hidden
state space, making the process more efficient.</p>
      <p>To the best of our knowledge, the integration of spectral learning with the weighted L* algorithm
using a discrete abstraction for equivalence queries to extract WFAs has not been previously explored. This
work provides a foundation for interpretable WFA extraction from RNNs, with potential applications
in sequence modelling and analysis. In future, we aim to extend the implementation and theoretical
framework beyond the proposed work.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Work supported by the EPSRC grant EP/T018313/1.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>[9] S. Ayache, R. Eyraud, N. Goudian, Explaining Black Boxes on Sequential Data using Weighted Automata, International Conference on Grammatical Inference 93 (2018) 81–103. URL: http://arxiv.org/abs/1810.05741.</p>
      <p>[10] G. Rabusseau, T. Li, D. Precup, Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning, 2019. doi:10.48550/arXiv.1807.01406.</p>
      <p>[11] R. Marzouk, C. de la Higuera, Distance and equivalence between finite state machines and recurrent neural networks: Computational results, arXiv preprint arXiv:2004.00478 (2020). doi:10.48550/arXiv.2004.00478.</p>
      <p>[12] R. Eyraud, S. Ayache, Distillation of weighted automata from recurrent neural networks using a spectral approach, Machine Learning 113 (2024) 3233–3266. doi:10.1007/s10994-021-05948-1.</p>
      <p>[13] S. Y. Wickramasinghe, J. M. Howe, L. Daviaud, Extracting Deterministic Finite Automata from RNNs via Hyperplane Partitioning and Learning, in: Proceedings of the 17th International Joint Conference on Computational Intelligence, 2025. To appear.</p>
      <p>[14] Q. Wang, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, C. L. Giles, An Empirical Evaluation of Rule Extraction from Recurrent Neural Networks, Neural Computation 30 (2018) 2568–2591. doi:10.1162/neco_a_01111.</p>
      <p>[15] A. Cleeremans, D. Servan-Schreiber, J. L. McClelland, Finite State Automata and Simple Recurrent Networks, Neural Computation 1 (1989) 372–381. doi:10.1162/neco.1989.1.3.372.</p>
      <p>[16] C. L. Giles, C. B. Miller, D. Chen, H. H. Chen, G. Z. Sun, Y. C. Lee, Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks, Neural Computation 4 (1992) 393–405. doi:10.1162/neco.1992.4.3.393.</p>
      <p>[17] C. W. Omlin, C. Giles, Extraction of rules from discrete-time recurrent neural networks, Neural Networks 9 (1996) 41–52. doi:10.1016/0893-6080(95)00086-0.</p>
      <p>[18] H. Jacobsson, Rule Extraction from Recurrent Neural Networks: A Taxonomy and Review, Neural Computation 17 (2005) 1223–1263. doi:10.1162/0899766053630350.</p>
      <p>[19] J. Carlyle, A. Paz, Realizations by stochastic finite automata, Journal of Computer and System Sciences 5 (1971) 26–40. doi:10.1016/S0022-0000(71)80005-3.</p>
      <p>[20] M. Fliess, Matrices de Hankel, Journal de Mathématiques Pures et Appliquées 53 (1974).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] T. Okudono, M. Waga, T. Sekiyama, I. Hasuo, Weighted Automata Extraction from Recurrent Neural Networks via Regression on State Spaces, Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020) 5306–5314. doi:10.1609/aaai.v34i04.5977.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Z. Wei, X. Zhang, M. Sun, Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages, 2022, pp. 370–385. doi:10.1007/978-3-031-17244-1_22.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] X. Zhang, X. Du, X. Xie, L. Ma, Y. Liu, M. Sun, Decision-Guided Weighted Automata Extraction from Recurrent Neural Networks, Proceedings of the AAAI Conference on Artificial Intelligence 35 (2021) 11699–11707. doi:10.1609/aaai.v35i13.17391.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Angluin, Learning regular sets from queries and counterexamples, Information and Computation 75 (1987) 87–106. doi:10.1016/0890-5401(87)90052-6.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] G. Weiss, Y. Goldberg, E. Yahav, Extracting automata from recurrent neural networks using queries and counterexamples (extended version), Machine Learning 113 (2024) 2877–2919. doi:10.1007/s10994-022-06163-2.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] G. Weiss, Y. Goldberg, E. Yahav, Learning Deterministic Weighted Automata with Queries and Counterexamples, Advances in Neural Information Processing Systems 32 (2019).</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Balle, M. Mohri, Learning Weighted Automata, in: International Conference on Algebraic Informatics, Springer International Publishing, 2015, pp. 1–21. doi:10.1007/978-3-319-23021-4_1.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] B. Balle, X. Carreras, F. M. Luque, A. Quattoni, Spectral learning of weighted automata, Machine Learning 96 (2014) 33–63. doi:10.1007/s10994-013-5416-x.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>