<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RNN Generalization to Omega-Regular Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Charles Pert</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dalal Alrajeh</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Russo</string-name>
        </contrib>
        <aff>Imperial College London</aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>4963</volume>
      <fpage>0009</fpage>
      <lpage>0009</lpage>
      <abstract>
        <p>Büchi automata (BAs) recognize ω-regular languages defined by formal specifications like linear temporal logic (LTL) and are commonly used in the verification of reactive systems. However, BAs face scalability challenges when handling and manipulating complex system behaviors. As neural networks are increasingly used to address these scalability challenges in areas like model checking, investigating their ability to generalize beyond training data becomes necessary. This work presents the first study investigating whether recurrent neural networks (RNNs) can generalize to ω-regular languages derived from LTL formulas. We train RNNs on ultimately periodic ω-word sequences to replicate target BA behavior and evaluate how well they generalize to out-of-distribution sequences. Through experiments on LTL formulas corresponding to deterministic automata of varying structural complexity, from 3 to over 100 states, we show that RNNs achieve high accuracy on their target ω-regular languages when evaluated on sequences up to 8× longer than training examples, with 92.6% of tasks achieving perfect or near-perfect generalization. These results establish the feasibility of neural approaches for learning complex ω-regular languages, suggesting their potential as components in neurosymbolic verification methods.</p>
      </abstract>
      <kwd-group>
        <kwd>recurrent neural networks</kwd>
        <kwd>omega-regular languages</kwd>
        <kwd>linear temporal logic</kwd>
        <kwd>büchi automata</kwd>
        <kwd>length generalization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Linear Temporal Logic (LTL) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] formulas specify properties of system execution traces through
ω-regular languages, which are composed of infinitely long sequences called ω-words. While regular
languages are recognized by finite automata, ω-regular languages require Büchi automata (BAs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
See Figure 1 for an example of a deterministic Büchi automaton (DBA). For a detailed introduction to
these concepts, we refer the reader to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        While BAs provide exact solutions, they can become computationally expensive to handle and
manipulate when representing complex behaviors. Recently, neurosymbolic methods have been used in
verification contexts such as model checking [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], areas that traditionally rely on BAs. Characterizing
whether neural networks can recognize ω-regular languages is a step toward enabling the development
of additional approaches. Recent studies have shown that recurrent neural networks (RNNs) have the
ability to generalize to the recognition of regular languages [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5, 6, 7, 8</xref>
        ], but this has not yet been shown
specifically for ω-regular languages. Related works [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] have used graph neural networks to analyze
BA properties like emptiness checking and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] demonstrated that Transformers [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] can generate
satisfying ω-words for LTL formulas, yet the problem of generalization to the recognition of ω-regular
languages remains open.
      </p>
      <p>
        Extending the existing work on regular language generalization to ω-regular languages requires
handling two practical challenges: (1) encoding: representing infinite sequences with finite-length
sequences; (2) labeling sequences: computing acceptance labels for large batches of sequences. While
finite automata accept a word when it terminates in an accepting state, BAs require ω-words to traverse
accepting states infinitely often, meaning RNNs must learn to recognize eventually periodic behavior
instead of just reaching an accepting state. We address challenge (1) by using ultimately periodic (UP)
ω-words, which uniquely characterize their ω-regular languages [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This representation enables us to
investigate whether RNNs can approximate the symbolic acceptance mechanisms of BAs.
      </p>
      <p>[Figure 1: An example DBA with states 0-3; edges are labeled by Boolean formulas over the propositions a and b (e.g. a ∧ b, ¬a, ¬a ∨ b, a ∧ ¬b, b, ⊤, ¬b).]</p>
      <p>
        While existing model checking tools like Spot [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] can compute acceptance labels for specific
sequences, this approach quickly becomes impractical for the large datasets required for neural network
training and does not address sequence generation. Instead, we use Spot to construct a BA from each
LTL formula, generate sequences using the BA representation and simulate acceptance by directly
processing the generated sequences through the BA. We restrict our approach to DBAs to simplify this
acceptance check, limiting our study to recurrence properties [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. While this means that this study
does not cover persistence properties, recurrence properties encompass a broad class of behaviors and
are commonly used in verification. Using BAs as data generators provides some control over sampling
diversity, which is essential since random sampling can induce data imbalance problems when accepted
(or rejected) traces are rare.
      </p>
      <p>
        This work establishes the feasibility of RNNs generalizing to the recognition of ω-regular languages
while identifying challenges for future research. Our investigation complements ongoing
neurosymbolic advances in verification, for example, neural certificates for model checking [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], neural circuit
synthesis [16], as well as neural specification mining, learning finite automata or LTL  [17] from
traces [18, 19, 20, 21].
      </p>
      <p>The main contributions of this work are:
• We present the first empirical evidence that RNNs achieve high accuracy on ω-regular languages
when evaluated at lengths up to 8× longer than the sequences in their training distribution (code
available at https://github.com/pertcj/omega-generalization).
• We provide an analysis of RNN generalization, showing that generalization is not limited to toy
ω-regular languages and is robust across DBAs with over 100 states. We also show that model
complexity is correlated with the complexity of the BA recognizing the ω-regular language.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <p>
        Practical training of RNNs on ω-regular languages requires a finite representation of ω-words and an
efficient method for computing acceptance labels for generated sequences given the infinite acceptance
condition of BAs. Our approach resolves the finite encoding problem by only using UP ω-words of an
ω-regular language, w = uv<sup>ω</sup>, where u is a finite prefix and v is an infinitely repeating suffix; these ω-words
uniquely characterize the language [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. We encode w as the finite word u$v [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The alphabet size of a DBA is
2<sup>|AP|</sup> + 1, where |AP| is the number of propositions present in the LTL formula used to construct the
DBA. Each symbol in the alphabet represents either an assignment to all propositions or the separator
symbol $.
      </p>
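      <p>For concreteness, the following Python sketch builds this alphabet and one-hot encodes a UP ω-word as u$v; the helper names are ours and not taken from the released implementation.</p>
      <preformat>
from itertools import product

import numpy as np

def build_alphabet(propositions):
    """All 2^|AP| truth assignments over the propositions, plus '$'."""
    assignments = [frozenset(p for p, bit in zip(propositions, bits) if bit)
                   for bits in product([0, 1], repeat=len(propositions))]
    return assignments + ["$"]

def encode_up_word(u, v, alphabet):
    """One-hot encode the UP omega-word u.v^omega as the finite word u$v."""
    index = {sym: i for i, sym in enumerate(alphabet)}
    word = list(u) + ["$"] + list(v)
    one_hot = np.zeros((len(word), len(alphabet)), dtype=np.float32)
    one_hot[np.arange(len(word)), [index[s] for s in word]] = 1.0
    return one_hot

alphabet = build_alphabet(["a", "b"])            # 2^2 + 1 = 5 symbols
u = [frozenset({"a"}), frozenset({"a", "b"})]    # finite prefix u
v = [frozenset({"b"})]                           # infinitely repeated suffix v
x = encode_up_word(u, v, alphabet)               # shape (4, 5)
      </preformat>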
      <p>
        Importantly, the encoding u$v establishes a bijection between UP ω-words in the target ω-regular
language and words in a derived regular language. The DBA can be algorithmically reconstructed from
the finite automaton recognizing this regular language [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Therefore, while our RNNs are learning
regular languages (and leveraging their established ability [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">6, 7, 5</xref>
        ]), they are learning canonical regular
representations of the target DBA that preserve all structural information for reconstruction.
      </p>
      <p>To determine the label of a u$v sequence, we simulate DBA behavior. Simulating the finite prefix u
yields the state in which u terminates. For the suffix v, we compute the state transition matrix induced
by reading v and use matrix exponentiation to determine reachability. The sequence is accepted if
repeated application of v can reach a cycle containing an accepting state.</p>
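      <p>A hedged sketch of this check follows; for simplicity it replaces the matrix-exponentiation step with an equivalent direct iteration, which is valid because the automaton is deterministic, so repeated application of v must enter a cycle within |Q| steps. The interface (delta, accepting, init) is an assumption, not the released code.</p>
      <preformat>
def accepts(delta, accepting, init, u, v):
    """Label u$v by simulating the DBA on the omega-word u.v^omega.

    delta(state, symbol): deterministic transition function (assumed).
    accepting: set of accepting states; init: initial state.
    """
    # Simulate the finite prefix u to find the state in which it terminates.
    q = init
    for sym in u:
        q = delta(q, sym)

    # Apply v repeatedly. For each start state, record whether an
    # accepting state is entered while reading v once.
    seen, hits = {}, []
    while q not in seen:
        seen[q] = len(hits)
        s, hit = q, False
        for sym in v:
            s = delta(s, sym)
            hit = hit or (s in accepting)
        hits.append(hit)
        q = s                  # end state starts the next application of v

    # The run settles into the cycle beginning at the revisited state q;
    # the omega-word is accepted iff that cycle visits an accepting state.
    return any(hits[seen[q]:])
      </preformat>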
      <p>We illustrate how we sample u$v sequences with fixed length ℓ from a DBA. We first sample the
position k of $ uniformly between 1 and ℓ − 1 and then sample u with length k − 1 and v with length
ℓ − k. We sample u and v by uniformly sampling valid paths in the DBA. Many DBAs exhibit strong
acceptance biases that skew random sampling toward rejection or acceptance. For example, the presence
of accepting or rejecting sink states (states that can only transition back to themselves) can dominate
uniform sampling. We address this through targeted sampling strategies to improve the balance of our
sampled sequences: (1) we oversample sequences, determine their labels, and selectively filter them,
targeting a balanced class distribution; (2) when sampling accepted sequences, we exclude transitions
to rejecting sink states during sampling (and vice versa for rejected sequences); (3) when sampling
rejected sequences, we prevent transitions to accepting states within the suffix v because if v contains a
state transition to an accepting state, the resulting ω-word is likely to be accepted. It is still possible to
sample equivalent rejected sequences as this constraint is not applied to u. By controlling the sampling
of accepted and rejected sequences separately, the resulting dataset is more balanced compared to
uniform sampling of sequences.</p>
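      <p>The following sketch illustrates only the oversample-and-filter strategy (1); sample_path and accepts are hypothetical helpers, and the sink-state masking of strategies (2) and (3) is omitted.</p>
      <preformat>
import random

def sample_balanced_batch(dba, length, batch_size):
    """Oversample u$v sequences, then filter toward a 50/50 class balance."""
    pos, neg = [], []
    while len(pos) &lt; batch_size // 2 or len(neg) &lt; batch_size // 2:
        # Position k of '$' is uniform in 1..length-1, so |u| = k - 1 and
        # |v| = length - k, both sampled as valid paths in the DBA.
        k = random.randint(1, length - 1)
        u, end_state = sample_path(dba, steps=k - 1, start=dba.init)  # hypothetical
        v, _ = sample_path(dba, steps=length - k, start=end_state)    # hypothetical
        (pos if accepts(dba, u, v) else neg).append((u, v))
    batch = pos[:batch_size // 2] + neg[:batch_size // 2]
    random.shuffle(batch)
    return batch
      </preformat>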
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>Our experiments aim to answer the following research questions:</p>
      <p>RQ1: Can RNNs generalize to the recognition of sequences from ω-regular languages when trained
only on short UP ω-words?
RQ2: Do structural properties of the underlying DBAs influence (a) generalization performance and
(b) learned model complexity?</p>
      <p>To answer these questions, we use the ω-regular languages associated with two well-known LTL
benchmarks. The first benchmark (alaska_lift) [22] consists of two encodings of safe lift behaviors
(we use the variants with bug-fixes [23]) parameterized by the number of floors: encoding (a) uses a
linear number of propositions per floor and encoding (b) uses a logarithmic number of propositions per
floor. The second benchmark (acacia_example) [24] consists of 25 formulas specifying the behavior
of arbiters and traffic light controllers [25]. For this benchmark, we use the negated versions of the
formulas, which in our setting only flip the labels of the generated sequences.</p>
      <p>
        For each LTL formula, we generate its corresponding DBA using Spot [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. During training, we
generate sequences on-the-fly with uniformly sampled lengths between 2 and 64. Once sampled, we
transform sequences into one-hot encoded symbol embeddings for RNN processing. Our test data
consists of 512 sampled sequences for each length between 2 and 512. We apply a 10-minute timeout for
the construction of each DBA. We excluded automata with more than 200 states due to computationally
expensive sequence generation, not RNN training. This constraint caused the exclusion of lift formulas
for 4 or more floors and 2 formulas from acacia_example. The test data was balanced for the lift
formulas and for 20 of the acacia_example formulas. However, the training data was not balanced for
the lift formulas and the 3 acacia_example formulas with unbalanced test data.
      </p>
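      <p>For the construction step, Spot's Python bindings can be invoked along the following lines; the translate options shown are our assumption about how to request a deterministic, state-based Büchi automaton, as the paper does not list its exact invocation.</p>
      <preformat>
import spot

formula = "G(request -> F grant)"             # an example recurrence property
aut = spot.translate(formula, "ba", "deterministic")
assert spot.is_deterministic(aut)             # we keep only DBAs (Section 1)
print(aut.num_states())                       # structural complexity measure
      </preformat>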
      <p>
        Each experiment trains a single-layer vanilla RNN [26] with a hidden dimension of 256 (consistent
with [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), batch size 256 for 1 × 10<sup>5</sup> steps. We use linear warmup for the learning rate [27] from 1 × 10<sup>−8</sup>
to 1 × 10<sup>−3</sup> at 20% of the training steps, the AMSGrad optimizer [28], and ℓ<sub>2</sub> regularization with weight
5 × 10<sup>−4</sup>. We minimize cross-entropy loss. Results are from a single seed due to computational budget.
All experiments were conducted using an NVIDIA RTX 6000 Ada GPU. We measure in-distribution (ID)
accuracy, the mean accuracy of the test data in the training length range, and out-of-distribution (OOD)
accuracy, the mean accuracy of the test data outside of the training length range.
      </p>
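      <p>A minimal PyTorch sketch of this configuration is shown below; class and variable names are ours, the training loop and on-the-fly sampling are omitted, and Adam's coupled weight decay realizes the ℓ<sub>2</sub> penalty.</p>
      <preformat>
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    """Single-layer vanilla RNN with a binary accept/reject head."""
    def __init__(self, alphabet_size, hidden=256):
        super().__init__()
        self.rnn = nn.RNN(alphabet_size, hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):          # x: (batch, length, alphabet_size)
        _, h = self.rnn(x)         # h: (num_layers, batch, hidden)
        return self.head(h[-1])    # logits over {reject, accept}

model = RNNClassifier(alphabet_size=5)
steps, warmup = 100_000, 20_000    # 1e5 steps; warmup ends at 20% of them
opt = torch.optim.Adam(model.parameters(), lr=1e-3,
                       amsgrad=True, weight_decay=5e-4)
# Linear learning-rate warmup from 1e-8 up to the base rate 1e-3.
sched = torch.optim.lr_scheduler.LinearLR(
    opt, start_factor=1e-8 / 1e-3, total_iters=warmup)
loss_fn = nn.CrossEntropyLoss()
      </preformat>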
      <p>To address RQ1, we assess whether RNNs can learn to classify ω-words by measuring performance on
the test data, reporting ID accuracy (lengths 2-64) and OOD accuracy (lengths 65-512) in a summary table.
To address RQ2a, we plot the relationship between the number of states in the DBAs and generalization
performance. To address RQ2b, we plot the number of states against the parameter norm (ℓ<sub>2</sub> norm)
of the trained models (post-training). We report the Pearson correlation coefficients [29] and their
statistical significance between the number of states and both OOD accuracy and the trained models’
parameter norms to assess their linear correlations.</p>
      <sec id="sec-3-1">
        <title>3.1. Results</title>
        <p>In Table 1, we present the results answering RQ1. The majority of tasks (92.6%) had perfect or
near-perfect generalization. Two tasks had OOD accuracies of 81.5% and 76.8%.</p>
        <p>In Figure 2, we present the plots comparing the number of states with OOD accuracy (Figure 2a) and
the models’ parameter norms (Figure 2b). Figure 2a reveals that generalization performance remains
consistently high across automata of varying complexity, with 85.2% of tasks achieving perfect accuracy
regardless of state count, indicating that RNN capacity is robust to the structural complexity of the
underlying BA. OOD accuracy showed no significant correlation with the number of states in the DBA
(r = 0.115, p = 0.567), suggesting that generalization performance does not necessarily degrade with
increasing DBA complexity. Figure 2b shows a strong positive correlation between the number of states
in the DBA and the parameter norm of the trained models (r = 0.880, p &lt; 0.001), indicating that model
complexity aligns with the complexity of the target ω-regular languages.</p>
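        <p>The reported coefficients can be computed from per-task measurements with SciPy, for example as follows (the arrays are placeholders, not our results):</p>
        <preformat>
import numpy as np
from scipy.stats import pearsonr

# Placeholder arrays with one entry per task (dummy values).
n_states = np.array([3.0, 18.0, 105.0])
ood_accuracy = np.array([1.0, 1.0, 0.99])
r, p = pearsonr(n_states, ood_accuracy)
print(f"r = {r:.3f}, p = {p:.3f}")
        </preformat>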
        <p>[Figure 2: Lift and Acacia tasks. (a) OOD accuracy (acc., %) vs. number of states: accuracy remains consistently high across DBAs with large numbers of states. (b) Parameter (param.) norm of the trained RNNs vs. number of states: the norm shows a strong positive correlation with the number of states in the DBAs.]</p>
        <p>We now examine the two tasks that exhibited poor generalization to understand their failure modes.
During training, both tasks demonstrate unstable validation patterns that predict their generalization
failures (see Figure 3a). Acacia 13 achieves 100% validation accuracy initially but then degrades to
random chance (50%), while Acacia 22 oscillates between 100% and 50% throughout training, indicating
that neither model converges to a stable representation of the ground-truth ω-regular language. This
instability corresponds to poor length generalization, as shown in Figure 3b, where both tasks achieve
perfect accuracy on shorter OOD sequences before experiencing sharp degradation to 50% at longer
lengths (degradation begins at lengths 345 and 300 for Acacia 13 and Acacia 22, respectively). This
degradation suggests that these models converged to local minima, achieving perfect in-distribution
accuracy without learning the ground-truth ω-regular language. The DBAs of both failure cases
contained accepting sink states. If a sequence reaches such a state, the RNN must encode this information
when processing all subsequent symbols. Further investigation is needed to confirm this hypothesis.</p>
        <p>[Figure 3: (a) Validation (val.) accuracy vs. train step (0 to 1 × 10<sup>5</sup>) for Acacia 13 and Acacia 22.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>Our experiments demonstrate that RNNs generalize to the recognition of ω-regular languages from
short UP ω-words. Across 27 tasks with a diverse range of system behaviors, we achieved perfect or
near-perfect generalization in 92.6% of cases when testing on sequences up to 8 times longer than
training examples. This finding remained consistent across automata ranging from 3 to 105 states,
demonstrating that the approach scales to realistic verification problems. These results provide a
foundation for developing differentiable Büchi automata as components within neurosymbolic systems.
These components might behave as monitors in reinforcement learning or enable gradient-based search
in model checking.</p>
      <p>
        This work has several limitations that future research should address. Sampling sequences from
complex automata (&gt;200 states) proved impractical despite our attempts to speed up the process.
Our experiments were also restricted to DBAs as a first step; notably, DBAs cannot represent all
ω-regular languages [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Further investigation of the failure cases is necessary to develop improved
training methods. Despite these limitations, this work provides strong evidence that neural approaches
could further enhance neurosymbolic methods by offering more scalable, differentiable alternatives
to traditional automata-theoretic methods. Exploring interpretability by adapting existing automata
extraction techniques (e.g. [30, 31]) may enable verification of the learned representations required for
safety-critical applications.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the UK EPSRC grant 2760033. The authors would like to thank Frederik
Kelbel for reading the paper and the reviewers for their constructive feedback.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Claude Sonnet 4 in order to: Paraphrase and
reword. After using this tool, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pnueli</surname>
          </string-name>
          ,
          <article-title>The Temporal Logic of Programs</article-title>
          ,
          <source>in: 18th Annual Symposium on Foundations of Computer Science</source>
          (sfcs
          <year>1977</year>
          ),
          <year>1977</year>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>57</lpage>
          . doi:
          <volume>10</volume>
          .1109/SFCS.
          <year>1977</year>
          .
          <volume>32</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Büchi</surname>
          </string-name>
          ,
          <article-title>On a Decision Method in Restricted Second Order Arithmetic</article-title>
          ,
          <year>1990</year>
          , pp.
          <fpage>425</fpage>
          -
          <lpage>435</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-1-
          <fpage>4613</fpage>
          -8928-6\_
          <fpage>23</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Baier</surname>
          </string-name>
          , Christel and Katoen,
          <string-name>
            <surname>Joost-Pieter</surname>
          </string-name>
          ,
          <source>Principles of Model Checking</source>
          ,
          <year>2008</year>
          . URL: https://mitpress. mit.edu/9780262026499/principles-of-model-checking/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Mirco</given-names>
            <surname>Giacobbe</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kroening</surname>
          </string-name>
          and
          <string-name>
            <given-names>Abhinandan</given-names>
            <surname>Pal</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Tautschnig</surname>
          </string-name>
          ,
          <source>Neural Model Checking, in: The Thirty-eighth Annual Conference on Neural Information Processing Systems</source>
          ,
          <year>2024</year>
          . URL: https://openreview.net/forum?id=dJ9KzkQ0oH.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Alexandra</given-names>
            <surname>Butoi</surname>
          </string-name>
          and
          <article-title>Ghazal Khalighinejad and Anej Svete and Josef Valvoda and Ryan Cotterell and Brian DuSell, Training Neural Networks as Recognizers of Formal Languages</article-title>
          ,
          <source>in: The Thirteenth International Conference on Learning Representations</source>
          ,
          <year>2025</year>
          . URL: https://openreview. net/forum?id=aWLQTbfFgV.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Delétang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ruoss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grau-Moya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Genewein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Wenliang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Catt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cundy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Legg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Veness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Ortega</surname>
          </string-name>
          ,
          <article-title>Neural Networks and the Chomsky Hierarchy</article-title>
          ,
          <source>in: 11th International Conference on Learning Representations</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Svete</surname>
          </string-name>
          ,
          <article-title>Anej and Chan, Robin and Cotterell, Ryan, On Eficiently Representing Regular Languages as RNNs, in: Findings of the Association for Computational Linguistics: ACL 2024, Association for Computational Linguistics</article-title>
          ,
          <year>2024</year>
          , pp.
          <fpage>4118</fpage>
          -
          <lpage>4135</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .findings-acl.
          <volume>244</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2024</year>
          .findings-acl.
          <volume>244</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Merrill</surname>
            , William and Weiss, Gail and Goldberg, Yoav and Schwartz, Roy and Smith,
            <given-names>Noah A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yahav</surname>
          </string-name>
          ,
          <string-name>
            <surname>Eran</surname>
            ,
            <given-names>A Formal</given-names>
          </string-name>
          <article-title>Hierarchy of RNN Architectures, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</article-title>
          , Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>443</fpage>
          -
          <lpage>459</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .acl-main.
          <volume>43</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          . acl-main.
          <volume>43</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Stammet</surname>
          </string-name>
          ,
          <article-title>Christophe and Dotti, Prisca and Ultes-Nitsche, Ulrich and Fischer, Andreas, Analyzing Büchi Automata with Graph Neural Networks</article-title>
          ,
          <source>arXiv preprint arXiv:2206.09619</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Stammet</surname>
          </string-name>
          ,
          <article-title>Christophe and Ultes-Nitsche, Ulrich and Fischer, Andreas, Universality of Büchi Automata: Analysis with Graph Neural Networks</article-title>
          ,
          <source>IEEE Access 11</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. U.</given-names>
            <surname>Kreber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Rabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Finkbeiner</surname>
          </string-name>
          ,
          <article-title>Teaching Temporal Logics to Neural Networks</article-title>
          , in: International Conference on Learning Representations,
          <year>2021</year>
          . URL: https://openreview.net/forum?id=dOcQK-f4byz.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , L. u. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          , Attention Is All You Need,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          ,
          <year>2017</year>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/ 3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Calbrix</surname>
          </string-name>
          , Hugues and Nivat, Maurice and Podelski, Andreas, Ultimately Periodic Words of Rational -Languages, in
          <source>: Mathematical Foundations of Programming Semantics</source>
          ,
          <year>1994</year>
          , pp.
          <fpage>554</fpage>
          -
          <lpage>566</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Duret-Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Renault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Colange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Renkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Aisse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schlehuber-Caissier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Medioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dubois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gillard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lauko</surname>
          </string-name>
          ,
          <source>From Spot 2</source>
          .0 to Spot 2.10:
          <article-title>What's New?</article-title>
          ,
          <source>in: Proceedings of the 34th International Conference on Computer Aided Verification (CAV'22)</source>
          , volume
          <volume>13372</volume>
          of Lecture Notes in Computer Science,
          <year>2022</year>
          , pp.
          <fpage>174</fpage>
          -
          <lpage>187</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -13188-2\_9.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Manna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pnueli</surname>
          </string-name>
          ,
          <article-title>A Hierarchy of Temporal Properties (invited paper,</article-title>
          <year>1989</year>
          ), in: Proceedings of
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>