<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Passive Learning of Lattice Automata from Recurrent Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jaouhar Slimi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tristan Le Gall</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Augustin Lemesle</string-name>
        </contrib>
        <aff>Université Paris-Saclay, France</aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>We present a passive automata learning algorithm that can extract automata from recurrent networks with very large or even infinite alphabets. Our method combines overapproximations from the field of Abstract Interpretation and passive automata learning from the field of Grammatical Inference. We evaluate our algorithm by first comparing it with the state-of-the-art automata extraction algorithm from Recurrent Neural Networks trained on Tomita grammars. Then, we extend these experiments to regular languages with infinite alphabets, which we propose as a novel benchmark.</p>
      </abstract>
      <kwd-group>
        <kwd>Automata Learning</kwd>
        <kwd>Recurrent Neural Networks</kwd>
        <kwd>Abstract Interpretation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Multiple techniques relying on clustering were proposed for learning DFA [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ], Weighted Finite Automata (WFA) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Hidden Markov Models (HMM) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and Probabilistic Finite Automata (PFA) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The type of automaton influences the complexity of the solution and is mainly motivated by the application; for instance, DFAs are well suited to deterministic RNN behaviors, whereas WFAs and PFAs are better alternatives to capture their probabilistic dynamics. A different approach by [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] relies on the classical Gold algorithm [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] by creating a prefix tree automaton (PTA) from the execution traces, then merging its states to yield a DFA. The authors associate RNN hidden-state values with each state of the PTA and then use them during the merge algorithm, merging only states whose associated hidden states are within some threshold distance of each other. In this paper, we adapt the latter method to learn automata over an infinite alphabet.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Key Notions</title>
      <p>
        Elman Networks A category of simple RNN architectures, defined by a function f(xₜ, hₜ₋₁) that computes the hidden state hₜ, followed by a classifier function g(hₜ) that computes the output yₜ. f and g are non-linear functions parameterized by their weights W_f and W_g. For readability purposes, we assume that both the input vector xₜ and the hidden state hₜ₋₁ are vectors in ℝᵈ, at any time step t = 1, 2, …¹. In our paper, for simplicity, we consider only binary classification networks, i.e. yₜ ∈ 𝔹. The computation carried out by the RNN after processing an input vector xₜ then has the following form: (xₜ, hₜ₋₁) −f→ hₜ −g→ yₜ.
      </p>
      <p>¹ If the dimensions are not the same, we can always resize them to the highest dimension.</p>
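      <p>For concreteness, a minimal Python sketch of such a network is given below; the weight names W_x, W_h, w_g and the tanh/sigmoid choices are illustrative assumptions, not the networks used in our experiments.</p>
      <preformat>
import numpy as np

# Minimal Elman-style binary classifier (illustrative sketch).
rng = np.random.default_rng(0)
d = 8                                   # dimension of inputs and hidden states
W_x, W_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))
w_g = rng.normal(size=d)

def f(x, h_prev):
    """Hidden-state update h_t = f(x_t, h_{t-1})."""
    return np.tanh(W_x @ x + W_h @ h_prev)

def g(h):
    """Classifier y_t = g(h_t), with a binary output."""
    return int(1 / (1 + np.exp(-w_g @ h)) > 0.5)

def run(xs):
    """Process a sequence and return the trace of (x_t, h_t, y_t) triplets."""
    h, trace = np.zeros(d), []
    for x in xs:
        h = f(x, h)
        trace.append((x, h.copy(), g(h)))
    return trace
      </preformat>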
      <p>
        • A relation ⊑ on a set Λ is a partial order if:
1. ∀λ ∈ Λ, λ ⊑ λ
2. ∀λ₁, λ₂ ∈ Λ, (λ₁ ⊑ λ₂) ∧ (λ₂ ⊑ λ₁) ⇒ λ₁ = λ₂
3. ∀λ₁, λ₂, λ₃ ∈ Λ, (λ₁ ⊑ λ₂) ∧ (λ₂ ⊑ λ₃) ⇒ λ₁ ⊑ λ₃
• (Λ, ⊑) is a complete lattice if:
1. ⊑ is a partial order on the set Λ
2. any subset Ω ⊆ Λ has a greatest lower bound ⊓Ω (i.e. the set {λ ∈ Λ | ∀ω ∈ Ω, λ ⊑ ω} has a greatest element) and a least upper bound ⊔Ω (i.e. the set {λ ∈ Λ | ∀ω ∈ Ω, ω ⊑ λ} has a smallest element)
• Then we define:
· ⊥ = ⊔∅ = ⊓Λ
· ⊤ = ⊓∅ = ⊔Λ
· Atoms(Λ) as the set of the minimal elements of Λ ∖ {⊥}; in other words, a ∈ Λ is an atom if ∀λ ∈ Λ, λ ⊑ a ⇒ λ = a ∨ λ = ⊥
Atomistic Lattice A lattice (Λ, ⊑) is atomistic if: ∀λ ∈ Λ : λ = ⊔{a ∈ Atoms(Λ) | a ⊑ λ}. In this paper, and for the sake of simplicity, (Λ, ⊑) is defined by the box abstraction.
The Box Abstraction Any set Ω ⊆ ℝᵈ can be abstracted by a box (d-tuple of intervals) α(Ω) = ⟨I₁ … I_d⟩, such that each I_k is an interval bounding the projection of Ω on dimension k. It is a classical abstract domain, and there is a Galois connection [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] between (℘(ℝᵈ), ⊆) and (Λ, ⊑). This lattice is atomistic: its atoms are the singletons ⟨Jx₁, x₁K … Jx_d, x_dK⟩ for any vector ⟨x₁ … x_d⟩ ∈ ℝᵈ, hence the isomorphism between Atoms(Λ) and ℝᵈ. Because of this isomorphism, we identify any x ∈ ℝᵈ with α(x) ∈ Λ. That is why, in the definition of the language recognized by a Lattice Automaton, we write the condition "x ⊑ λ", while strictly speaking the condition should be "α(x) ⊑ λ" since x ∈ ℝᵈ.
      </p>
      <p>Partition A finite partition on Atoms(Λ) is a function Π : {1 … n} → ℘(Atoms(Λ)) such that ⋃_{i=1…n} Π(i) = Atoms(Λ) and i ≠ j ⇒ Π(i) ∩ Π(j) = ∅. Since Λ is atomistic, for any i ∈ {1 … n}, we can identify the set of atoms Π(i) with ⊔{a | a ∈ Π(i)}, which defines the "maximal element" (also noted Π(i) by abuse of notation) of this class. Therefore, we can define a finite partition of Atoms(Λ) either as we just did, or as a function Π : {1 … n} → Λ satisfying the following property: ∀a ∈ Atoms(Λ) there exists a unique i ∈ {1 … n} such that a ⊑ Π(i). We have the property that classes of the partition are stable under the operation ⊔: ∀λ₁, λ₂ : λ₁ ⊑ Π(i) ∧ λ₂ ⊑ Π(i) ⇒ λ₁ ⊔ λ₂ ⊑ Π(i). Partitions are needed to properly define Lattice Automata; see [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] for details.</p>
      <p>Lattice Automata Lattice Automata are similar to DFA, but with transitions labelled by the elements of the atomistic lattice Λ rather than elements of a finite alphabet Σ. Formally, a Lattice Automaton is a tuple A = ⟨Λ, Π, Q, I, F, Δ, Γ⟩ such that:
• Λ is an atomistic lattice
• Π is a partition of Atoms(Λ)
• Q is a finite set of states
• I ⊆ Q are the initial states
• F ⊆ Q are the final states
• Δ ⊆ Q × Λ × Q is the set of transitions
• Γ : Q → Λ is the hidden states function</p>
      <p>[Figure 1: An example ILA with four states (state 0 is the start state); transitions are labelled by the intervals J−∞, 0J, J0, +∞K and J−∞, +∞K.]</p>
      <p>We also require the following two properties to ensure that the number of transitions is finite, even if the alphabet is infinite:
1. For any transition (q, λ, q′) ∈ Δ, there is i ∈ {1 … n} such that: ∀a ∈ Atoms(Λ), a ⊑ λ ⇒ a ∈ Π(i). In other words, a transition shall not mix atoms belonging to different classes of the partition. Thus, we can define a function Π⁻¹ : Λ → {1 … n} that associates a label λ of a transition to its class of the partition.
2. For any couple of states (q, q′) and any class of the partition i ∈ {1 … n}, there is at most one transition (q, λ, q′) ∈ Δ such that Π⁻¹(λ) = i.</p>
      <p>In this work, since in our examples and experiments (Λ, ⊑) is given by the box abstraction, we will simply call these automata Interval Lattice Automata (ILA).
Language recognition Like a DFA, an ILA recognizes a language, i.e. a set of words over the input alphabet Atoms(Λ) ≡ ℝᵈ. A word x₁ … xₙ (xᵢ ∈ ℝᵈ for all i) is accepted by A if there is a sequence q₀ −λ₁→ q₁ −λ₂→ … −λₙ→ qₙ such that:
• ∀i ∈ {1 … n}, (qᵢ₋₁, λᵢ, qᵢ) ∈ Δ and xᵢ ⊑ λᵢ
• q₀ ∈ I and qₙ ∈ F</p>
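      <p>Language recognition translates directly into code; accepts below is a sketch over the ILA encoding above, where a word is a sequence of input vectors.</p>
      <preformat>
def accepts(A: ILA, word) -> bool:
    """Follow every transition whose label covers the atom of each input;
    accept if a final state is reached after the whole word."""
    current = set(A.initial)
    for x in word:
        a = atom(x)
        current = {q2 for (q1, lam, q2) in A.transitions
                   if q1 in current and a.leq(lam)}
        if not current:                 # no transition covers this input
            return False
    return bool(current &amp; A.final)
      </preformat>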
      <sec id="sec-2-1">
        <title>Representation of the hidden states</title>
        <p>The function Γ does not play any role in the definition of the language recognized by an ILA. Its purpose is to associate any state of the automaton with a set of hidden states of the RNN, abstracted by an element of Λ. Unlike the elements labelling the transitions, we do not require Γ(q) to belong to a single class of the partition Π; it can be anything (including ⊤). We included this function in the definition of the automaton since it has a role in the learning algorithm.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Adding transitions</title>
        <p>Algorithms on ILA are similar to the ones on DFA. However, one must ensure that the two properties stated above remain true. For example, when we want to add a transition (q, λ, q′) to an ILA, we must first check whether there already exists a transition between the two states q and q′ labelled by some λ′ belonging to the same class as λ. If so, that transition is relabelled by λ ⊔ λ′, as written in Algorithm 1.</p>
        <p>Algorithm 1 Add Transition
1: Input: An automaton A = ⟨Λ, Π, Q, I, F, Δ, Γ⟩ and (q, λ, q′) ∈ Q × Λ × Q such that ∃i ∈ {1 … n} : λ ⊑ Π(i)
2: Output: A′ = ⟨Λ, Π, Q′, I′, F′, Δ′, Γ′⟩
3: Initialize: Q′ ← Q; I′ ← I; F′ ← F; Δ′ ← Δ; Γ′ ← Γ
4: i ← Π⁻¹(λ)
5: if ∃(q, λ′, q′) ∈ Δ′ such that Π⁻¹(λ′) = i then
6:   Δ′ ← Δ′ ∖ {(q, λ′, q′)} ∪ {(q, λ ⊔ λ′, q′)};
7: else
8:   Δ′ ← Δ′ ∪ {(q, λ, q′)};
9: end if
10: return A′;</p>
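        <p>A Python transcription of Algorithm 1 (mutating the automaton in place for brevity, unlike the functional presentation above) may look as follows.</p>
        <preformat>
def add_transition(A: ILA, q: int, lam: Box, q2: int) -> None:
    """If a transition between q and q2 already exists in the same partition
    class, join its label with lam; otherwise add a fresh transition."""
    i = pi_inv(lam)
    for t in list(A.transitions):
        p, mu, p2 = t
        if p == q and p2 == q2 and pi_inv(mu) == i:
            A.transitions.remove(t)
            A.transitions.add((q, lam.join(mu), q2))
            return
    A.transitions.add((q, lam, q2))
        </preformat>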
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Learning Lattice Automata from RNN execution traces</title>
      <p>
        The proposed algorithm is an extension of Gold’s algorithm for the case of ILA. We first create
an Interval Prefix Tree Automaton (IPTA) from a set of traces of the RNN. Then, we launch the
merging phase in which we merge states according to a similarity score inspired by [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Building the Interval Prefix Tree Automaton (IPTA)</title>
        <p>Let us consider a finite set of RNN execution traces T = { s⁽ʲ⁾ = (x₁⁽ʲ⁾, h₁⁽ʲ⁾, y₁⁽ʲ⁾) … (x_ℓⱼ⁽ʲ⁾, h_ℓⱼ⁽ʲ⁾, y_ℓⱼ⁽ʲ⁾) | j ≥ 1 }, where sequences are of varying lengths ℓⱼ.</p>
        <p>We build the IPTA starting from an automaton with a single initial state q₀ and applying the function Add Sequence(A, s) for every sequence s ∈ T. The function Add Sequence(A, s) takes a sequence s = (x₁, h₁, y₁), (x₂, h₂, y₂) … (x_ℓ, h_ℓ, y_ℓ) and adds transitions and states to the automaton A in the following way. We consider the initial state q₀ and the first triplet (x₁, h₁, y₁) of the sequence:
• If there already exists in A a state q₁ and a transition (q₀, λ₁, q₁) such that Π⁻¹(λ₁) = Π⁻¹(x₁), then we modify the transition into (q₀, λ′₁, q₁), with λ′₁ = λ₁ ⊔ Jx₁, x₁K, and update the Γ function to Γ(q₁) ← Γ(q₁) ⊔ Jh₁, h₁K; moreover, q₁ becomes a final state if y₁ = 1
• otherwise, we create a new state q₁ and a transition (q₀, Jx₁, x₁K, q₁), with Γ(q₁) ← Jh₁, h₁K, q₁ being a final state if y₁ = 1
The process is then repeated with q₁ and the second triplet (x₂, h₂, y₂), and so on, until the end of the sequence. This construction ensures that any word s = (x₁, h₁, y₁), (x₂, h₂, y₂) … (x_ℓ, h_ℓ, y_ℓ) such that y_ℓ = 1 will also be accepted by the IPTA, and that Jhᵢ, hᵢK ⊑ Γ(qᵢ). For an example, Figure 2 shows the resulting IPTA from the set of traces T:
s₁ = (1.4, h₁₁, 1), (−1.07, h₂₁, 1), (1.08, h₃₁, 1), (−7.06, h₄₁, 1), (9.03, h₅₁, 1), …
s₂ = (3.39, h₁₂, 1), (−3.2, h₂₂, 1), (7.91, h₃₂, 1), (−3.45, h₄₂, 1), (2.1, h₅₂, 1), …
s₃ = (1.9, h₁₃, 1), (3.56, h₂₃, 1), (3.14, h₃₃, 0), (−33.2, h₄₃, 0), …
s₄ = (2.3, h₁₄, 1), (2.29, h₂₄, 1), (2.06, h₃₄, 0), (−0.51, h₄₄, 0)
…</p>
        <p>[Figure 2: The IPTA built from T; among its transitions are ones labelled J1.4, 3.39K, J2.29, 3.56K, J2.06, 3.14K and J−33.2, −0.51K.]</p>
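        <p>A sketch of Add Sequence over the encoding above (scalar inputs and hidden states for brevity; new_state is an assumed fresh-identifier supplier):</p>
        <preformat>
def add_sequence(A: ILA, seq, new_state) -> None:
    """Walk from the initial state; reuse the successor in the same partition
    class when one exists (joining the label and Γ), else create a new state."""
    q = next(iter(A.initial))
    for (x, h, y) in seq:
        a, hb = atom((x,)), atom((h,))
        hit = next((t for t in A.transitions
                    if t[0] == q and pi_inv(t[1]) == pi_inv(a)), None)
        if hit is not None:
            _, lam, q2 = hit
            A.transitions.remove(hit)
            A.transitions.add((q, lam.join(a), q2))
        else:
            q2 = new_state()
            A.states.add(q2)
            A.transitions.add((q, a, q2))
        A.gamma[q2] = A.gamma[q2].join(hb) if q2 in A.gamma else hb
        if y == 1:
            A.final.add(q2)
        q = q2
        </preformat>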
        <p>As shown in the example, some merging is already involved when building the IPTA since, for each state of the IPTA, there are at most n outgoing transitions (one for each class of the partition). Consequently, the language accepted by the IPTA is larger than the set of traces T, and there is the implicit assumption that the set of traces is coherent w.r.t. the chosen partition Π. It can be checked by verifying that there cannot be two sequences s = (x₁, h₁, y₁) … (x_ℓ, h_ℓ, y_ℓ) and s′ = (x′₁, h′₁, y′₁) … (x′_ℓ, h′_ℓ, y′_ℓ) such that: s and s′ are prefixes of sequences of T, ∀ 1 ≤ i ≤ ℓ, Π⁻¹(xᵢ) = Π⁻¹(x′ᵢ), and y_ℓ ≠ y′_ℓ.</p>
        <p>If that property does not hold, it means the partition is too coarse to even build the IPTA, and
that our method cannot yield a faithful representation of the behavior of the RNN. In that case,
we can try again with a finer partition.</p>
        <p>Merging the states The second step of the algorithm is to merge states according to their similarity score (a real number between 0 and 2), as long as it is possible. Our method is described in Algorithm 2. In our examples and experiments, the similarity score of two states q₁ and q₂ is defined as follows:
• If only one of the two states q₁ and q₂ belongs to F, then the score is 2
• If both states (or neither) belong to F, then the score is:
Similarity score(q₁, q₂) = 1 − cos(c(Γ(q₁)), c(Γ(q₂))) = 1 − (c(Γ(q₁)) · c(Γ(q₂))) / (‖c(Γ(q₁))‖₂ ‖c(Γ(q₂))‖₂)
where c(Γ(q)) denotes the center of the box Γ(q).</p>
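        <p>The score can be computed from the box centers with NumPy; center and similarity_score are our illustrative names, and the sketch assumes every state carries a Γ box with a nonzero center.</p>
        <preformat>
import numpy as np

def center(box: Box) -> np.ndarray:
    """Center of a box: the midpoint of each interval."""
    return np.array([(lo + hi) / 2 for (lo, hi) in box.intervals])

def similarity_score(A: ILA, q1: int, q2: int) -> float:
    """1 − cosine of the Γ-box centers; 2 if exactly one state is final."""
    if (q1 in A.final) != (q2 in A.final):
        return 2.0
    c1, c2 = center(A.gamma[q1]), center(A.gamma[q2])
    return 1.0 - float(c1 @ c2 / (np.linalg.norm(c1) * np.linalg.norm(c2)))
        </preformat>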
        <p>Therefore, while there is at least one couple of states q₁ ≠ q₂ such that their similarity score is lower than a hyperparameter threshold, the two states are merged. It means that any transition that goes to (or originates from) q₁ or q₂ will go to (or originate from) the merged state. In this process, if two transitions (q, λ, q′) and (q, λ′, q′) belong to the same partition class, they will be merged. For example, when we merge the two states 0 and 2 of the IPTA depicted in Figure 2, we obtain the ILA depicted in Figure 3.</p>
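        <p>Since Algorithm 2 is not reproduced here, the following loop is only our simplified reading of the merging phase: pick any pair of states scoring below the threshold, redirect their transitions to a single state, and let add_transition join same-class labels.</p>
        <preformat>
def merge_states(A: ILA, threshold: float) -> None:
    """Merging-phase sketch, assuming every state has a Γ box."""
    merged = True
    while merged:
        merged = False
        for q1 in sorted(A.states):
            for q2 in sorted(A.states):
                if q1 &lt; q2 and similarity_score(A, q1, q2) &lt; threshold:
                    old = list(A.transitions)
                    A.transitions.clear()
                    A.states.discard(q2)
                    if q2 in A.initial:
                        A.initial.discard(q2); A.initial.add(q1)
                    if q2 in A.final:
                        A.final.discard(q2); A.final.add(q1)
                    A.gamma[q1] = A.gamma[q1].join(A.gamma.pop(q2))
                    ren = lambda s: q1 if s == q2 else s
                    for (p, lam, p2) in old:
                        add_transition(A, ren(p), lam, ren(p2))
                    merged = True
                    break
            if merged:
                break
        </preformat>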
        <p>At the end of the algorithm, we obtain an ILA A that recognizes a language larger than the set of traces T given as its input. Indeed, if s = (x₁, h₁, y₁) … (x_ℓ, h_ℓ, y_ℓ) is a trace in T, then there exists a sequence of states and transitions q₀ −λ₁→ q₁ −λ₂→ … −λ_ℓ→ q_ℓ in A such that: q₀ ∈ I; ∀ i = 1 … ℓ we have (qᵢ₋₁, λᵢ, qᵢ) ∈ Δ, xᵢ ⊑ λᵢ and Jhᵢ, hᵢK ⊑ Γ(qᵢ); and if y_ℓ = 1 then q_ℓ ∈ F.</p>
        <p>However, this ILA A may also accept words that are not accepted by the original RNN, and if the set of traces is too small, it may also reject words that are accepted by the original RNN. This is why we need experimental results to assess the faithfulness of the resulting automaton w.r.t. the original RNN.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>
        Experimental setting Our experiments were run on a Dell Inc. Precision 3591 computer, equipped with an Intel® Core™ Ultra 7 165H × 22 CPU. Python 3.12.3 was used to implement the algorithms and for data synthesis tasks. We also used PyRAT [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] to define the abstractions.
      </p>
      <p>
        We evaluated our algorithm on two benchmarks. First, we ran experiments with RNNs trained on Tomita languages, to compare our findings with [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and demonstrate that our approach for inferring ILA is on par with the state of the art. The fidelity of ILAs to the original RNN is close to that of DFAs, though slightly lower (Figure 4), and since ILAs are more complex than DFAs, our ILAs have a larger number of states. In that sense, our merging approach might benefit from further improvements to reduce the number of states. Then, we propose a novel benchmark by extending the Tomita languages to accept sequences of floats as inputs. For example, the first language accepts only sequences of numbers between 0 and 10. The complete definition is given in Table 1.
      </p>
      <p>Table 1
Tomita 2.0 language | Language description
1 | J0, 10K⋆
2 | (J0, 10K J−10, 0J)⋆
3 | No odd number of consecutive letters ∈ J−10, 0J after an odd number of consecutive letters ∈ J0, 10K
4 | No substring containing 3 consecutive letters ∈ J−10, 0J
5 | Even number of letters ∈ J−10, 0J and even number of letters ∈ J0, 10K
6 | Difference of the number of letters ∈ J0, 10K and letters ∈ J−10, 0J is a multiple of 3
7 | J−10, 0J⋆ J0, 10K⋆ J−10, 0J⋆ J0, 10K⋆</p>
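      <p>For instance, membership in the first Tomita 2.0 language is a one-line check (the function name is ours):</p>
      <preformat>
def tomita20_lang1(word) -> bool:
    """Tomita 2.0 language 1: every letter lies in J0, 10K."""
    return all(0 &lt;= x &lt;= 10 for x in word)
      </preformat>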
      <p>Tables 2 and 3 summarize our results. Our implementation takes roughly a few seconds to infer an automaton from a set of execution traces. The fidelity score is measured as follows: Fidelity(ℛ, A, X) = ∑_{i=1..N} |ℛ(xᵢ) − A(xᵢ)| for a given RNN ℛ, ILA A and sequence of inputs X = x₁ … x_N.</p>
      <p>We also analyze type I and type II errors to evaluate our algorithm:
• A type I error occurs because of abstraction imprecision, when a word is rejected by the RNN but accepted by its automaton (also known as a false alarm).
• A type II error reflects a failure to capture the RNN semantics, which can be due to an insufficient sample size used to build the IPTA.</p>
      <p>[Table 2: For each Tomita language: RNN accuracy, ILA size, ILA fidelity, type I error, type II error.]</p>
      <p>[Table 3: For each Tomita 2.0 language: RNN accuracy, ILA size, ILA fidelity, type I error, type II error.]</p>
      <p>
        The fidelity scores of ILAs extracted from RNNs trained on Tomita 2.0 languages are roughly 80%, which is acceptable at this stage. The distribution of error scores suggests that our interval abstraction is precise enough (except for language 4, where the type I error can reach 7.72%); however, the type II error reflects an insufficient sample size, especially for language 2. Increasing the size of our trace set T (which was set to 1000 sequences of maximum length 20 for each language) will likely improve the scores. Our preliminary results can be improved by further benchmarking the merge parameter and analyzing errors. Experimenting with different state-merging algorithms that perform better than Gold's algorithm, such as RPNI or EDSM [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], may also lead to better results.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>We presented in this paper a passive learning algorithm that is capable of inferring an automaton from a set of traces of an RNN. Unlike previous methods, we do not require the inputs of the RNN to belong to a finite alphabet. Our algorithm ensures that we obtain an overapproximation of the set of traces. Our experiments demonstrate a capacity to infer ILA from RNNs trained on regular languages over possibly infinite alphabets, while also indicating that there remains significant potential for further improvements (increasing fidelity and reducing ILA size) and for extending our benchmarks.</p>
      <p>In future work, we aim to extend this proof of concept, especially for the verification and the explainability of RNNs trained for practical real-world applications, e.g. time series forecasting, where the robustness of the RNN is of paramount importance. The automata-based approach is a generalizable formal method offering a synergy between interpretability and verification for recurrent networks. The properties that can be formulated and verified go beyond adversarial robustness; an explanation can itself be regarded as a property we seek to verify.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work was supported by the French Agence Nationale de la Recherche (ANR) through SAIF
(ANR-23-PEIA-0006) as part of the France 2030 programme.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checking. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>SegRNN: Segment recurrent neural network for long-term time series forecasting</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Schrittwieser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Antonoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lockhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hassabis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Graepel</surname>
          </string-name>
          , et al.,
          <article-title>Mastering atari, go, chess and shogi by planning with a learned model</article-title>
          ,
          <source>Nature</source>
          <volume>588</volume>
          (
          <year>2020</year>
          )
          <fpage>604</fpage>
          -
          <lpage>609</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jacobsson</surname>
          </string-name>
          ,
          <article-title>Rule extraction from recurrent neural networks: A taxonomy and review</article-title>
          ,
          <source>Neural Computation</source>
          <volume>17</volume>
          (
          <year>2005</year>
          )
          <fpage>1223</fpage>
          -
          <lpage>1263</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bollig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Neider</surname>
          </string-name>
          ,
          <article-title>A survey of model learning techniques for recurrent neural networks</article-title>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Supervised and semi-supervised text categorization using lstm for region embeddings</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>526</fpage>
          -
          <lpage>534</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Angluin</surname>
          </string-name>
          ,
          <article-title>Learning regular sets from queries and counterexamples</article-title>
          ,
          <source>Information and Computation</source>
          <volume>75</volume>
          (
          <year>1987</year>
          )
          <fpage>87</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mayr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yovine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Visca</surname>
          </string-name>
          ,
          <article-title>Property checking with interpretable error characterization for recurrent neural networks</article-title>
          ,
          <source>Machine Learning and Knowledge Extraction</source>
          <volume>3</volume>
          (
          <year>2021</year>
          )
          <fpage>205</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Muškardin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Aichernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Pill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tappler</surname>
          </string-name>
          ,
          <article-title>Learning finite state models from recurrent neural networks</article-title>
          ,
          <source>in: International Conference on Integrated Formal Methods</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>229</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Muškardin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tappler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Pill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Aichernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pock</surname>
          </string-name>
          ,
          <article-title>On the relationship between rnn hidden state vectors and semantic ground truth</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Segre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>AdaAX: Explaining recurrent neural networks by learning automata with adaptive states</article-title>
          (
          <year>2022</year>
          )
          <fpage>574</fpage>
          -
          <lpage>584</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Ororbia</surname>
          </string-name>
          <string-name>
            <surname>II</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>A comparative study of rule extraction for recurrent neural networks</article-title>
          ,
          <source>arXiv preprint arXiv:1801.05420</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , M. Sun,
          <article-title>Weighted automata extraction and explanation of recurrent neural networks for natural language tasks</article-title>
          ,
          <source>Journal of Logical and Algebraic Methods in Programming</source>
          <volume>136</volume>
          (
          <year>2024</year>
          )
          <fpage>100907</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Juefei-Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>Luna: A model-based universal analysis framework for large language models</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          <volume>50</volume>
          (
          <year>2024</year>
          )
          <fpage>1921</fpage>
          -
          <lpage>1948</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Towards interpreting recurrent neural networks through probabilistic abstraction</article-title>
          (
          <year>2020</year>
          )
          <fpage>499</fpage>
          -
          <lpage>510</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W.</given-names>
            <surname>Merrill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tsilivis</surname>
          </string-name>
          ,
          <article-title>Extracting finite automata from rnns using state merging</article-title>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Gold</surname>
          </string-name>
          ,
          <article-title>Language identification in the limit</article-title>
          ,
          <source>Information and Control</source>
          <volume>10</volume>
          (
          <year>1967</year>
          )
          <fpage>447</fpage>
          -
          <lpage>474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cousot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cousot</surname>
          </string-name>
          ,
          <article-title>Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints</article-title>
          ,
          <source>in: Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages</source>
          ,
          <year>1977</year>
          , pp.
          <fpage>238</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Le Gall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jeannet</surname>
          </string-name>
          ,
          <article-title>Lattice automata: A representation for languages on infinite alphabets, and some applications to verification</article-title>
          , in:
          <source>International Static Analysis Symposium</source>
          , Springer,
          <year>2007</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lemesle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Le Gall</surname>
          </string-name>
          ,
          <article-title>Neural network verification with pyrat</article-title>
          ,
          <source>arXiv preprint arXiv:2410.23903</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Soubki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heinz</surname>
          </string-name>
          ,
          <article-title>Benchmarking state-merging algorithms for learning regular languages</article-title>
          , in:
          <source>International Conference on Grammatical Inference, PMLR</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>