Predicting pseudo-random number generator output with sequential analysis

Dmytro Proskurin 1,†, Maksim Iavich 2,†, Tetiana Okhrimenko 1,∗,†, Okoro Chukwukaelonma 1,† and Tetiana Hryniuk 1,†

1 National Aviation University, 1 Liubomyra Huzara ave., 03058 Kyiv, Ukraine
2 Caucasus University, 1 Paata Saakadze str., 0102 Tbilisi, Georgia

CSDP-2024: Cyber Security and Data Protection, June 30, 2024, Lviv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
proskurin.d@stud.nau.edu.ua (D. Proskurin); miavich@cu.edu.ge (M. Iavich); t.okhrimenko@npp.nau.edu.ua (T. Okhrimenko); kaelo@gmail.com (O. Chukwukaelonma); t.hryniuk@ukr.net (T. Hryniuk)
ORCID: 0000-0002-2835-4279 (D. Proskurin); 0000-0002-3109-7971 (M. Iavich); 0000-0001-9036-6556 (T. Okhrimenko); 0000-0002-1247-9854 (O. Chukwukaelonma); 0000-0003-0123-5241 (T. Hryniuk)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
This study examines the predictive capabilities of neural network models, specifically Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Convolutional Neural Networks (CNNs), as well as a hybrid architecture that combines convolutional and recurrent layers, for forecasting the outputs of various pseudo-random number generators (PRNGs). The investigation covers a diverse set of PRNG algorithms, including the Linear Congruential Generator (LCG), Mersenne Twister (MT), Xorshift, and Middle Square. The study evaluates the accuracy of these models in predicting single and continuous outputs generated by the mentioned PRNGs. The findings show the superior predictive performance of hybrid models, attributed to their ability to capture long-term dependencies, a crucial factor in decoding the structure of PRNG sequences. Additionally, the impact of model optimization techniques, including dropout and L2 regularization, on predictive accuracy is explored. This examination not only underscores the potential of neural networks in identifying deterministic patterns within PRNG outputs but also offers insights into optimal model selection and configuration. The implications of this work are significant, paving new avenues in cryptography and secure random number generation by highlighting the predictability of PRNGs under advanced neural network models.

Keywords
random numbers, RNN, CNN, LSTM, GRU, hybrid model, PRNG

1. Introduction

In the ever-evolving landscape of machine learning, the ability to accurately predict future events based on sequential data stands as a cornerstone of numerous technological advancements and applications. From forecasting stock market trends to decoding human language, the significance of effective sequence prediction cannot be overstated. Central to this domain are Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, which have emerged as powerful tools in the machine learning arsenal for handling sequential data.

RNNs, known for their unique architecture that allows information to persist, have been instrumental in modeling time-dependent data [1]. However, their application is often marred by challenges such as the vanishing gradient problem, which hinders the learning of long-range dependencies [2]. LSTMs, a special kind of RNN designed specifically to overcome these limitations, have set new benchmarks in sequence prediction tasks thanks to their sophisticated internal mechanisms, demonstrating remarkable success where traditional RNNs falter [2, 3].

This paper embarks on a comprehensive exploration of RNNs and LSTMs in the context of sequence prediction. We delve into the architectural intricacies of these models, their strengths and weaknesses, and their performance across various sequence prediction scenarios. Our study is particularly focused on datasets generated by different Pseudo-Random Number Generators (PRNGs), offering a unique lens through which the capabilities of these models can be examined and understood. Through rigorous experimentation and analysis, we aim to shed light on the nuances of sequence prediction and provide insights that could guide future applications and research in this area of machine learning.
2. Background and related work

Recent advancements in sequence prediction have been significantly influenced by the development and refinement of Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. These models have shown remarkable proficiency in handling sequential data, particularly in domains where understanding temporal dynamics is crucial.

1. LSTM for Time Series Prediction: Studies have demonstrated the effectiveness of LSTM models in time series forecasting, a domain traditionally dominated by statistical methods like ARIMA. Unlike these methods, LSTMs can capture complex nonlinear relationships in time series data [2, 4]. Researchers have successfully applied LSTM models to forecast stock prices, energy demand, and weather patterns, achieving higher accuracy than traditional models, especially in scenarios with long-term dependencies and high volatility.

2. RNNs in Natural Language Processing (NLP): RNNs have been pivotal in advancing NLP. Their ability to process sequential text data has led to breakthroughs in machine translation, text generation, and sentiment analysis [1, 2, 5]. The sequential processing capability of RNNs allows them to maintain context in text, a critical factor in understanding human language [5]. However, vanilla RNNs often struggle with long-term dependencies [6], leading to the adoption of LSTMs and GRUs (Gated Recurrent Units) in more complex NLP tasks.

3. Sequence-to-Sequence Learning: The sequence-to-sequence learning framework, often implemented using LSTMs, has revolutionized tasks like machine translation. This approach involves training models on pairs of input and output sequences, enabling the model to learn mappings from one sequence to another. This framework has been crucial in developing models that can translate entire sentences with context, rather than translating on a word-by-word basis [2].

4. Challenges and Limitations: Despite their successes, RNNs and LSTMs are not without challenges. The vanishing gradient problem in RNNs, where the model loses its ability to learn long-range dependencies, has been partially addressed by LSTMs but still poses limitations [4]. Additionally, the training of these models can be computationally intensive, requiring significant resources for large datasets.

5. Future Directions: Ongoing research is exploring more efficient and effective variants of RNNs and LSTMs, such as attention mechanisms and Transformer models [4]. These developments aim to address existing limitations while enhancing the models' ability to process longer sequences and maintain context over extended periods.
3. Model architecture overview

Neural networks are artificial intelligence models that mimic human brain function [2]. A neural network connects processing units, similar to neurons, rather than manipulating zeros and ones like a digital model does [2]. The result depends on how the connections are organized and weighted. Neural networks are algorithms modeled after the human brain that recognize patterns. Sensory data is interpreted using machine perception, which labels or clusters raw information. They recognize numerical patterns in vectors, into which real-world data such as images, sounds, text, and time series must be converted [7]. Artificial Neural Networks (ANNs) are computing systems modeled after biological neural systems, including the human brain [8].

Convolutional Neural Networks (CNNs) are similar to standard artificial neural networks (ANNs) in that they use neurons that improve themselves through learning [8]. CNNs have made remarkable achievements and are now widely used in deep learning. Convolutional neural networks have revolutionized computer vision, enabling previously unthinkable feats like facial recognition, driverless automobiles, self-service supermarkets, and intelligent medical treatments. CNNs also differ from typical ANNs by focusing on picture pattern recognition. This allows image-specific properties to be encoded into the architecture, making the network better suited for image-focused tasks while also lowering the number of parameters needed to set up the model [8, 9].

Hybrid Neural Networks (HNNs), which integrate the strengths of multiple neural networks, are becoming increasingly popular in computer vision applications, including picture captioning and action identification. However, there has been limited research on the effective use of hybrid architectures for time series data, particularly for trend forecasting purposes [10]. HNNs use their internal structure to limit the interactions between process variables so that they align with physical models. Compared to regular neural networks, such coupled models are more accurate, dependable, and generalizable [11].

Recurrent Neural Networks (RNNs) represent a paradigm shift in neural networks, specifically designed to recognize patterns in sequences of data [12]. Unlike traditional feedforward neural networks, RNNs possess a unique feature: the output from the previous step is fed back into the input of the current step. This looping mechanism allows RNNs to maintain an internal state that captures information about the sequence they have processed so far, making them ideal for tasks like speech recognition, language modeling, and time series forecasting [2, 12].

The core architecture of an RNN involves a hidden layer where the activation at a given time step is a function of the input at the same step and the activation of the hidden layer at the previous step [6]. This recurrent nature allows the network to maintain a form of memory [12]. However, RNNs are often challenged by long-term dependencies due to issues like vanishing and exploding gradients during backpropagation [2, 3], where the network becomes unable to learn and retain information from earlier time steps in the sequence [4].
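In standard notation (ours; not drawn from the cited works), this recurrence can be written as

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = W_{hy} h_t + b_y,$$

where x_t is the input at step t, h_t the hidden state, and y_t the output. The repeated multiplication by W_{hh} during backpropagation through time is what drives gradients toward zero or infinity over long sequences, producing the vanishing and exploding gradient behaviour noted above.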
Long Short-Term Memory networks, a special kind of RNN, were developed to overcome the limitations of traditional RNNs. LSTMs are adept at learning long-term dependencies, thanks to their unique internal structure [3]. Unlike standard RNNs, LSTMs have a complex architecture with a series of gates: the forget gate, input gate, and output gate [3, 4]. These gates regulate the flow of information into and out of the cell, deciding what to keep in memory and what to discard, thereby addressing the vanishing gradient problem [4].

Forget Gate: Determines what information is discarded from the cell state [4, 13].
Input Gate: Updates the cell state with new information from the current input [13].
Output Gate: Determines the next hidden state and output based on the current input and the updated cell state [13].

This architecture allows LSTMs to make more precise decisions about what information to store, modify, and output. As a result, LSTMs have been successfully applied in various complex sequence modeling tasks, including machine translation, speech synthesis, and even generative models for music composition [3, 4, 13].
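For reference, the usual formulation of these gates in the LSTM literature [3, 4] is

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)}\\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(new hidden state)}
\end{aligned}
$$

Because the cell state c_t is updated additively rather than through repeated matrix multiplication, gradient information can propagate across many time steps, which is why LSTMs mitigate the vanishing gradient problem described earlier.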
While both RNNs and LSTMs are designed for sequence processing, the key difference lies in their ability to handle long-term dependencies [4, 14]. Standard RNNs, while simpler and computationally less intensive, struggle with retaining information over longer sequences. LSTMs, with their intricate gating mechanism, excel in scenarios where understanding long-range contextual information is crucial [4]. The choice between RNNs and LSTMs often boils down to the specific requirements of the task at hand, the complexity of the sequences involved, and the computational resources available. LSTMs are generally preferred for more complex tasks with longer sequences [3], while RNNs might suffice for simpler tasks with shorter temporal dependencies [1].

4. Methodology

There are a large number of pseudorandom generators that differ in their characteristics, construction methods, and areas of possible application [15–19]. In our study, we employed datasets generated by four distinct PRNG algorithms, each offering unique challenges and characteristics for sequence prediction using RNN and LSTM models. These datasets serve as a testing ground to evaluate and compare the performance of different neural network architectures in sequence prediction tasks.

4.1. Linear congruential generator dataset

Description: The LCG is one of the oldest and simplest PRNG algorithms [20]. It generates random numbers using a linear equation [20]. The simplicity of its algorithm makes it a good baseline for evaluating the predictive capabilities of RNN and LSTM models.

Characteristics: The sequence generated by an LCG can exhibit patterns due to its linear nature. These patterns, while not immediately apparent, can be learned over time, making it an interesting case for sequence prediction models [20]. Despite their potential statistical issues, LCGs have the advantage of offering all the auxiliary qualities, such as seekability, numerous streams, and k-dimensional equidistribution [20].
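The linear recurrence behind an LCG is X_{n+1} = (a·X_n + c) mod m. As an illustration only (the paper does not state which multiplier, increment, and modulus were used), a minimal generator with the classic Numerical Recipes constants looks like this:

```python
def lcg(seed: int, n: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
    """Return n values of the linear congruential recurrence X_{k+1} = (a*X_k + c) mod m.

    The constants a, c, m are illustrative (Numerical Recipes parameters);
    the parameters used for the paper's dataset are not stated.
    """
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out

sample = lcg(seed=8956482, n=10)  # seed value taken from Section 5.1
```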
4.2. Mersenne twister dataset

Description: The Mersenne Twister, specifically the MT19937 variant, is known for its long period and high-quality outputs. It is widely used in various applications due to its reliability and speed [21].

Characteristics: MT generates sequences that are far more complex and less predictable than LCG [20]. This complexity provides a challenging scenario for RNNs and LSTMs, testing their ability to model and predict more intricate and seemingly random sequences. In addition to its inability to produce the all-zero state, the Mersenne Twister also finds it difficult to act randomly in its nearly all-zero state [20].

4.3. Xorshift dataset

Description: Xorshift is a class of PRNGs that operates using XOR (exclusive or) and bit-shifting operations [20]. It is known for its simplicity and speed, often used in scenarios where the speed of random number generation is critical [20].

Characteristics: Despite its simplicity, Xorshift can produce high-quality random sequences [20]. The nature of its operations makes it an interesting case for studying how well neural network models can adapt to and predict outputs from bit-level algorithms [22]. A bitwise xor operation is a type of permutation that involves flipping certain bits in the target; performing it again reverses the effect [20]. The conventional understanding of Xorshift would advise us to concentrate on lengthening the bits' period [20].
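As an illustration, Marsaglia's 32-bit xorshift with the shift triple (13, 17, 5) can be written as follows; the exact variant and word size used to build the dataset are not specified in the paper, so this is a representative sketch rather than the generator actually used:

```python
def xorshift32(seed: int, n: int):
    """Return n values of a 32-bit xorshift generator (shift constants 13, 17, 5).

    These constants come from Marsaglia's original xorshift paper; the exact
    variant used for the paper's dataset is an assumption.
    """
    x = seed & 0xFFFFFFFF  # keep the state in a 32-bit word
    out = []
    for _ in range(n):
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        out.append(x)
    return out

sample = xorshift32(seed=8956482, n=10)
```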
Evaluation metrics Long Short-Term Memory networks (LSTMs), and a custom To rigorously assess the effectiveness of our models in Hybrid model, each designed to handle the intricacies of predicting PRNG outputs, we employed a set of PRNG-generated data in distinct ways. comprehensive evaluation metrics. These metrics are crucial for quantifying the accuracy of our predictions and 6.1. Convolutional neural networks facilitating a direct comparison between the different neural Application: Primarily used for single-value output network architectures utilized in our study. Our evaluation prediction, CNNs are adept at identifying patterns within a framework is centered around the Mean Squared Error fixed-size window of the sequence. This model excels at (MSE) and a specially devised Model Performance Score. capturing local dependencies and spatial hierarchies in data, making it suitable for analyzing individual segments of the 7.1. Mean squared error PRNG output. MSE serves as the cornerstone of our evaluation strategy. It calculates the average squared difference between the actual and predicted values, offering a precise measure of 46 the prediction error’s magnitude. By squaring the errors, Tangent). These functions were chosen for their MSE gives more weight to larger errors, making it distinct characteristics in handling nonlinearities particularly sensitive to outliers and significant prediction in the data. inaccuracies.  Number of Neurons: The neuron counts tested In the context of predicting PRNG outputs, MSE were 8, 16, and 32. This range allowed us to provides a clear and direct measure of how closely the explore the models’ capacity to learn and model’s predictions align with the actual sequence of generalize from the data, balancing complexity numbers generated by the PRNGs. A lower MSE indicates with computational efficiency. higher prediction accuracy, reflecting a model’s ability to  Epochs: All models were trained for [1000] epochs, effectively capture and replicate the underlying patterns of providing ample opportunity for learning and the PRNG sequence. convergence.  Model Layers: We varied the depth of the models 7.2. Model performance score by testing configurations with [1, 2, 4] layers. This Recognizing the need for a standardized metric that allows variation aimed to understand how model depth for an intuitive understanding of model performance, we influences learning and prediction accuracy. introduced the Model Performance Score. This metric normalizes the MSE to a scale ranging from 0 to 1, where 0 Output Lengths: For continuous value prediction, represents the poorest performance (highest MSE) and 1 output lengths of [1–4] were tested. This range was denotes perfect prediction accuracy (zero MSE). selected to assess the models’ ability to forecast multiple The Model Performance Score is calculated by inversely steps in the PRNG sequence. scaling the MSE against a predetermined maximum error threshold. This approach ensures that the performance 8.1.1. Impact of dropout and L2 regularization score is adjusted for the scale of the data and the expected One of the most notable findings from our experiments was variation in prediction accuracy, allowing for a fair the impact of dropout and L2 regularization techniques on comparison across different models and datasets. model learning capabilities. 
5. Dataset preparation

In our study on "Predicting PRNG Output with Sequential Analysis", we meticulously prepared a dataset to analyze the predictability of various Pseudorandom Number Generators (PRNGs), focusing on four widely recognized algorithms: Linear Congruential Generator (LCG) (Fig. 2), MiddleSquare (Fig. 4), Xorshift (Fig. 3), and Mersenne Twister (MT) (Fig. 1). Each of these PRNGs was chosen for its unique approach to generating sequences of pseudorandom numbers, providing a diverse test bed for our predictive models.

Figure 1: MT dataset distribution
Figure 2: LCG dataset distribution
Figure 3: Xorshift dataset distribution
Figure 4: MiddleSquare dataset distribution

5.1. Data generation parameters

The dataset was generated using the following parameters to ensure consistency across all PRNGs:

Sample Size: Each PRNG was used to generate a sequence of 10,000 numbers (n = 10000), creating a substantial dataset for training and evaluation.
Seed Value: A common seed value of 8956482 was applied to initialize each PRNG, ensuring that the starting point of the pseudorandom sequence was consistent across different generators.
Word Size: For PRNGs where applicable, such as MiddleSquare, a word size of 8 bits was selected, balancing the need for computational efficiency with the desire for sequence complexity.
Sequence Length: The output was segmented into sequences of length 10, which were then used as individual data points for the subsequent analysis. This sequence length was chosen to provide enough data for recognizing patterns without overwhelming the analytical models.

5.2. Dataset splitting

Once generated, the dataset was divided into three distinct sets to facilitate the training, testing, and validation of our predictive models:

Training Set: Used to train the models, allowing them to learn and adapt to the patterns inherent in the pseudorandom sequences generated by each PRNG.
Testing Set: Employed to assess the performance of the models on unseen data, providing an unbiased evaluation of their predictive capabilities.
Validation Set: Utilized during the model tuning phase to fine-tune parameters and prevent overfitting, ensuring that the models generalize well to new data.

This careful preparation and partitioning of the dataset were critical in establishing a robust foundation for our investigation into the predictability of PRNG outputs through sequential analysis. By standardizing the generation parameters and thoughtfully splitting the data, we aimed to create a fair and consistent testing environment for each of the predictive models applied in our study.
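The windowing and splitting described above can be sketched as follows. The value scaling and the 70/15/15 split ratios are assumptions (the paper does not report them), and `generate` stands for any of the generator functions sketched in Section 4.

```python
import numpy as np

SEED, N, SEQ_LEN = 8956482, 10_000, 10

def make_dataset(generate, seed=SEED, n=N, seq_len=SEQ_LEN):
    """Segment a PRNG output stream into (window, next value) training pairs."""
    stream = np.asarray(generate(seed, n), dtype=np.float64)
    stream /= stream.max()                              # scale values to [0, 1] (assumed preprocessing)
    X = np.stack([stream[i:i + seq_len] for i in range(n - seq_len)])
    y = stream[seq_len:]                                # single-value target: the next output
    return X[..., None], y                              # add a feature axis for sequence models

def split(X, y, train=0.7, val=0.15):
    """Chronological train/validation/test split (ratios are assumed, not from the paper)."""
    i = int(len(X) * train)
    j = int(len(X) * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])
```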
6. Model configuration

In our exploration of "Predicting PRNG Output with Sequential Analysis", we employed a comprehensive approach by leveraging various neural network architectures. Each model was selected based on its ability to process sequential data, a core characteristic of PRNG outputs. Our analysis incorporated Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and a custom Hybrid model, each designed to handle the intricacies of PRNG-generated data in distinct ways.

6.1. Convolutional neural networks

Application: Primarily used for single-value output prediction, CNNs are adept at identifying patterns within a fixed-size window of the sequence. This model excels at capturing local dependencies and spatial hierarchies in data, making it suitable for analyzing individual segments of the PRNG output.

6.2. Recurrent neural networks

Application: RNNs were employed for both single-value and continuous-value output predictions. Unlike CNNs, RNNs have a memory mechanism that allows them to process entire sequences of data, making them ideal for understanding the temporal dynamics and dependencies within PRNG outputs.

6.3. Long short-term memory networks

Application: Like RNNs, LSTMs were utilized for both single-value and continuous-value output predictions. LSTMs are a special kind of RNN capable of learning long-term dependencies. They are particularly effective at avoiding the vanishing gradient problem, enabling them to capture patterns over longer sequences of PRNG outputs.

6.4. Hybrid model

Configuration: The Hybrid model represents an innovative approach, integrating the strengths of CNNs and LSTMs into a single architecture. It comprises:

• CNN Layer: For extracting local features within the subsequence of the PRNG output.
• LSTM Layer: To capture long-term dependencies and temporal patterns in the data, building upon the features extracted by the CNN layer.
• Dense Layer: Serving as the output layer, it synthesizes the information processed by the CNN and LSTM layers to make predictions.

Application: Designed for versatility, the Hybrid model is equipped to handle both single-value and continuous-value outputs, offering a robust solution for predicting PRNG outputs by leveraging the complementary strengths of convolutional and recurrent layers.
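A minimal Keras sketch of the CNN-LSTM-Dense stack just described is given below. The filter count, kernel size, and optimizer are illustrative assumptions; the paper fixes only the neuron counts, activations, layer counts, and epochs explored in Section 8.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_hybrid(seq_len=10, neurons=16, activation="tanh", output_len=1):
    """CNN layer -> LSTM layer -> Dense output, as outlined in Section 6.4.

    output_len=1 corresponds to the single-output scenario; larger values
    correspond to the continuous-output (Hybrid-C) configuration.
    """
    model = keras.Sequential([
        keras.Input(shape=(seq_len, 1)),
        layers.Conv1D(filters=neurons, kernel_size=3, padding="same",
                      activation=activation),           # local feature extraction
        layers.LSTM(neurons, activation=activation),     # long-term dependencies
        layers.Dense(output_len),                         # prediction head
    ])
    model.compile(optimizer="adam", loss="mse")           # MSE matches Section 7.1
    return model

model = build_hybrid(neurons=16, activation="tanh", output_len=3)  # e.g. the best MiddleSquare Hybrid-C setting
```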
The strategic selection and configuration of these models underpin our analytical methodology. By employing a diverse array of architectures, each with its unique advantages, our study aims to comprehensively evaluate the predictability of PRNG outputs. The Hybrid model underscores our commitment to innovation, integrating multiple neural network paradigms to enhance predictive accuracy and insight into the sequential nature of PRNG-generated data.

7. Evaluation metrics

To rigorously assess the effectiveness of our models in predicting PRNG outputs, we employed a set of comprehensive evaluation metrics. These metrics are crucial for quantifying the accuracy of our predictions and facilitating a direct comparison between the different neural network architectures utilized in our study. Our evaluation framework is centered around the Mean Squared Error (MSE) and a specially devised Model Performance Score.

7.1. Mean squared error

MSE serves as the cornerstone of our evaluation strategy. It calculates the average squared difference between the actual and predicted values, offering a precise measure of the prediction error's magnitude. By squaring the errors, MSE gives more weight to larger errors, making it particularly sensitive to outliers and significant prediction inaccuracies.

In the context of predicting PRNG outputs, MSE provides a clear and direct measure of how closely the model's predictions align with the actual sequence of numbers generated by the PRNGs. A lower MSE indicates higher prediction accuracy, reflecting a model's ability to effectively capture and replicate the underlying patterns of the PRNG sequence.

7.2. Model performance score

Recognizing the need for a standardized metric that allows for an intuitive understanding of model performance, we introduced the Model Performance Score. This metric normalizes the MSE to a scale ranging from 0 to 1, where 0 represents the poorest performance (highest MSE) and 1 denotes perfect prediction accuracy (zero MSE).

The Model Performance Score is calculated by inversely scaling the MSE against a predetermined maximum error threshold. This approach ensures that the performance score is adjusted for the scale of the data and the expected variation in prediction accuracy, allowing for a fair comparison across different models and datasets.

This normalized score simplifies the interpretation of our results, providing a straightforward metric to gauge model effectiveness. It allows stakeholders to quickly assess the relative performance of each model in predicting PRNG outputs without delving into the complexities of raw MSE values.

Together, these evaluation metrics form the foundation of our analytical approach, enabling a nuanced analysis of model performance. MSE offers a detailed view of the prediction accuracy, while the Model Performance Score provides a high-level, comparative perspective. By incorporating both metrics, our study ensures a balanced and comprehensive evaluation of how well each neural network architecture can predict the seemingly unpredictable output of pseudorandom number generators.
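A compact reading of these two metrics is sketched below; the maximum error threshold used for normalization is not reported in the paper, so it appears here as an explicit parameter.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted PRNG outputs (Section 7.1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def performance_score(y_true, y_pred, max_error):
    """Model Performance Score: MSE inversely scaled to [0, 1] (Section 7.2).

    max_error is the predetermined maximum error threshold mentioned in the
    paper; its concrete value is not reported, so it must be supplied.
    """
    return max(0.0, 1.0 - mse(y_true, y_pred) / max_error)
```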
8. Experiment variables and observations

We conducted an extensive series of experiments to evaluate the predictive capabilities of various neural network configurations. These experiments were meticulously designed to explore the impact of different model parameters on the accuracy of PRNG output predictions. Below, we detail the variables involved in these experiments and highlight some critical observations related to model performance.

8.1. Experiment variables

To systematically assess the effects of various hyperparameters on model performance, we tested a wide array of combinations, encompassing:

• Activation Functions: We experimented with two popular activation functions, ReLU (Rectified Linear Unit) and tanh (Hyperbolic Tangent). These functions were chosen for their distinct characteristics in handling nonlinearities in the data.
• Number of Neurons: The neuron counts tested were 8, 16, and 32. This range allowed us to explore the models' capacity to learn and generalize from the data, balancing complexity with computational efficiency.
• Epochs: All models were trained for 1000 epochs, providing ample opportunity for learning and convergence.
• Model Layers: We varied the depth of the models by testing configurations with 1, 2, and 4 layers. This variation aimed to understand how model depth influences learning and prediction accuracy.
• Output Lengths: For continuous value prediction, output lengths of 1–4 were tested. This range was selected to assess the models' ability to forecast multiple steps in the PRNG sequence.

8.1.1. Impact of dropout and L2 regularization

One of the most notable findings from our experiments was the impact of dropout and L2 regularization techniques on model learning capabilities. Contrary to common practice in machine learning, where these techniques are employed to enhance model generalization and prevent overfitting, our experiments revealed that models without dropout and L2 regularization demonstrated superior performance in learning and predicting PRNG outputs. The introduction of these regularization techniques led to models that were unable to adequately learn from the training data and, consequently, failed to predict accurately.

This observation suggests a unique aspect of predicting PRNG outputs: the data generated by PRNGs, while seemingly random, follows deterministic algorithms. The addition of regularization techniques, which are designed to introduce randomness and constraints into the learning process, may interfere with the model's ability to capture the underlying deterministic patterns of PRNG sequences.

The results of these experiments provide valuable insights into the design and optimization of neural network models for predicting PRNG outputs. Specifically, they underscore the importance of tailoring model configurations to the specific characteristics of the data and the task at hand. In the context of PRNG prediction, minimizing external sources of randomness and constraint (e.g., through dropout and L2 regularization) appears to be crucial for enabling models to learn and replicate the deterministic patterns that govern PRNG behavior.
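For completeness, the regularized variants referred to above would typically be expressed in Keras as follows; the dropout rate and L2 coefficient are illustrative values rather than settings reported here.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_regularized_lstm(seq_len=10, neurons=32, activation="relu",
                           dropout_rate=0.2, l2_coeff=1e-4):
    """LSTM variant with dropout and L2 regularization (Section 8.1.1).

    In our experiments such regularized models learned the deterministic PRNG
    patterns noticeably worse than their unregularized counterparts.
    """
    return keras.Sequential([
        keras.Input(shape=(seq_len, 1)),
        layers.LSTM(neurons, activation=activation,
                    kernel_regularizer=regularizers.l2(l2_coeff)),
        layers.Dropout(dropout_rate),
        layers.Dense(1),
    ])
```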
9. Experiment results analysis

Our exhaustive investigation into predicting PRNG output through sequential analysis yielded compelling findings, elucidated through the analysis of the top-performing models for each PRNG. Here, we detail the significant outcomes for both single-output and continuous-output scenarios across different PRNGs: Xorshift, MT (Mersenne Twister), LCG (Linear Congruential Generator), and MiddleSquare.

9.1. Single-output scenario analysis

For single-output predictions, our experiments produced the following results.

Xorshift: The RNN model with 32 neurons, 5 layers, and the ReLU activation function emerged as the top performer, achieving a mean score of 0.9898 (Table 1). However, both Hybrid and CNN models came close to the same success rate, suggesting that the characteristics of the Xorshift sequence are not particularly difficult to capture.

Table 1
Xorshift, single output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Mean score
Xorshift   RNN      32   relu   1000   5   0.989848
Xorshift   CNN      32   relu   1000   3   0.983555
Xorshift   Hybrid   32   relu   1000   2   0.982930
Xorshift   CNN      16   relu   1000   2   0.981882
Xorshift   Hybrid    8   relu   1000   2   0.980837
Xorshift   CNN      32   relu   1000   5   0.979213
Xorshift   CNN       8   relu   1000   2   0.978671
Xorshift   CNN      32   relu   1000   2   0.977584
Xorshift   CNN      16   relu   1000   5   0.977577

50% of all models were able to reach the 90% success threshold (Fig. 5). Nevertheless, further improvement could push this number higher.

Figure 5: Xorshift, single output, all models results

MT: The CNN model with 8 neurons, 3 layers, and the ReLU activation function led the pack with a mean score of 0.9832 (Table 2), indicating that CNN's feature extraction capabilities are effective at decoding the MT's output patterns. 28% of all models were able to reach the 90% success threshold (Fig. 6).

Table 2
MT, single output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Mean score
MT   CNN     8   relu   1000   3   0.983227
MT   CNN    32   relu   1000   3   0.980932
MT   RNN    32   tanh   1000   5   0.978619
MT   CNN    32   relu   1000   2   0.978589
MT   CNN    16   relu   1000   2   0.977052
MT   LSTM   32   relu   1000   5   0.976916
MT   RNN    32   relu   1000   5   0.973991
MT   CNN    32   tanh   1000   2   0.973030
MT   CNN    32   relu   1000   5   0.972651
MT   CNN    16   relu   1000   5   0.972521

Figure 6: MT, single output, all models results

LCG: The Hybrid model, combining CNN and LSTM architectures with tanh activation, showed superior performance, especially with 32 neurons and 5 layers, reaching a mean score of 0.9831 (Table 3). This underscores the Hybrid model's robustness in capturing both local and long-range dependencies in LCG sequences. 60% of all models were able to reach the 90% success threshold (Fig. 7).

Table 3
LCG, single output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Mean score
LCG   Hybrid   32   tanh   1000   5   0.983155
LCG   Hybrid    8   tanh   1000   2   0.982143
LCG   Hybrid    8   tanh   1000   3   0.980809
LCG   Hybrid   32   tanh   1000   2   0.980579
LCG   Hybrid   16   tanh   1000   2   0.979036
LCG   CNN       8   relu   1000   2   0.978362
LCG   Hybrid   16   tanh   1000   3   0.978311
LCG   Hybrid   32   tanh   1000   3   0.977490
LCG   RNN      32   tanh   1000   3   0.976433
LCG   CNN      32   relu   1000   5   0.975615

Figure 7: LCG, single output, all models results

MiddleSquare: The Hybrid model with tanh activation, 16 neurons, and 3 layers stood out with a mean score of 0.9883 (Table 4), highlighting the model's effectiveness in navigating the complex, squared calculations intrinsic to the MiddleSquare algorithm. 64% of all models were able to reach the 90% success threshold (Fig. 8).

Table 4
MiddleSquare, single output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Mean score
MiddleSquare   Hybrid   16   tanh   1000   3   0.988377
MiddleSquare   CNN      32   relu   1000   3   0.987493
MiddleSquare   CNN      32   relu   1000   2   0.986276
MiddleSquare   Hybrid   32   tanh   1000   3   0.983554
MiddleSquare   Hybrid   16   tanh   1000   5   0.983326
MiddleSquare   Hybrid    8   tanh   1000   2   0.982954
MiddleSquare   CNN      32   relu   1000   5   0.980597
MiddleSquare   Hybrid   32   tanh   1000   5   0.980450
MiddleSquare   CNN       8   relu   1000   5   0.980440
MiddleSquare   Hybrid   16   tanh   1000   2   0.980428

Figure 8: MiddleSquare, single output, all models results

These results underscore the nuanced relationship between PRNG algorithms and neural network architectures, suggesting that no single model architecture is universally superior. Instead, the optimal choice depends on the specific characteristics and mechanisms of the PRNG being predicted. While the models have been fine-tuned to achieve high predictive accuracy, the graphical analysis indicates that there is an inherent limitation to the exactness of these predictions.

Figure 9: Prediction vs actual values, MiddleSquare with the best result (0.988377)

A further performance plot (Fig. 9) illustrates the correlation between the predicted and actual values of the PRNG sequence. The near-perfect linear alignment along the 45-degree line suggests that the model's predictions are highly correlated with the actual PRNG outputs. The tight clustering of the points around this line demonstrates the model's effectiveness in capturing the underlying pattern of the PRNG sequence. However, the slight deviation of points from the line implies that while the model can predict the general trend and distribution of the PRNG outputs, it cannot replicate the sequence with absolute precision.

The scatter plot showing predictions versus actual values (Fig. 10) for the Hybrid model using tanh activation, 16 neurons, and 3 layers reveals a close correspondence between predicted and actual values. However, the dispersion of points away from the line of perfect agreement (where predicted values equal actual values) suggests that while the model can approximate the PRNG's output with high fidelity, it cannot achieve complete accuracy. The variance from the line of perfect prediction could be attributed to the deterministic yet complex nature of PRNGs, which inherently limits the predictability even with sophisticated models.

Figure 10: Prediction vs actual values, MiddleSquare with the best result (0.988377)
9.2. Continuous-output scenario analysis

The continuous-output models demonstrated even higher predictive accuracy, with the Hybrid model configured for continuous predictions (Hybrid-C) achieving remarkable success.

For the MiddleSquare PRNG, the Hybrid-C model with tanh activation, 16 neurons, 3 layers, and an output length of 3 achieved a near-perfect mean score of 0.9955 (Table 5). Only 29% of all models were able to break the 90% success milestone (Fig. 11).

Table 5
MiddleSquare, continuous output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Output length   Mean score
MiddleSquare   Hybrid-C   16   tanh   1000   3   3   0.995479
MiddleSquare   Hybrid-C   16   tanh   1000   2   2   0.992588
MiddleSquare   Hybrid-C    8   tanh   1000   5   2   0.990003
MiddleSquare   Hybrid-C    8   relu   1000   5   1   0.988989
MiddleSquare   Hybrid-C   32   relu   1000   5   3   0.988566
MiddleSquare   Hybrid-C    8   tanh   1000   3   1   0.988066
MiddleSquare   Hybrid-C   16   relu   1000   2   2   0.986359
MiddleSquare   Hybrid-C   16   tanh   1000   5   3   0.985923
MiddleSquare   Hybrid-C   16   tanh   1000   5   2   0.985797
MiddleSquare   Hybrid-C    8   tanh   1000   5   1   0.984813

Figure 11: MiddleSquare, continuous output, all models results

For the LCG PRNG, the Hybrid-C model with tanh activation, 8 neurons, 5 layers, and an output length of 2 achieved a near-perfect mean score of 0.992055 (Table 6). 20% of all models were able to break the 90% success threshold (Fig. 12).

Table 6
LCG, continuous output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Output length   Mean score
LCG   Hybrid-C    8   tanh   1000   5   2   0.992055
LCG   Hybrid-C    8   tanh   1000   5   3   0.989730
LCG   Hybrid-C   16   tanh   1000   5   5   0.987614
LCG   Hybrid-C    8   tanh   1000   3   5   0.986818
LCG   Hybrid-C    8   tanh   1000   2   5   0.985174
LCG   Hybrid-C   16   tanh   1000   3   2   0.984808
LCG   Hybrid-C   32   tanh   1000   3   2   0.984462
LCG   Hybrid-C   16   tanh   1000   2   1   0.984411
LCG   Hybrid-C   32   tanh   1000   3   1   0.983866
LCG   Hybrid-C   32   tanh   1000   2   1   0.983706

Figure 12: LCG, continuous output, all models results

For the Xorshift PRNG, the Hybrid-C model with relu activation, 16 neurons, 2 layers, and an output length of 2 achieved a near-perfect mean score of 0.987906 (Table 7). 15% of all models were able to break the 90% success threshold (Fig. 13).

Table 7
Xorshift, continuous output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Output length   Mean score
Xorshift   Hybrid-C   16   relu   1000   2   2   0.987906
Xorshift   Hybrid-C   16   relu   1000   2   5   0.985753
Xorshift   Hybrid-C    8   relu   1000   5   5   0.985715
Xorshift   Hybrid-C    8   relu   1000   2   2   0.984238
Xorshift   Hybrid-C   32   relu   1000   2   2   0.983437
Xorshift   Hybrid-C   16   relu   1000   2   3   0.981247
Xorshift   Hybrid-C    8   relu   1000   2   1   0.980434
Xorshift   Hybrid-C   32   relu   1000   2   1   0.978930
Xorshift   RNN-C      32   tanh   1000   3   1   0.977685
Xorshift   Hybrid-C   32   relu   1000   2   3   0.976177

Figure 13: Xorshift, continuous output, all models results

For the MT PRNG, the Hybrid-C model with relu activation, 32 neurons, 2 layers, and an output length of 2 achieved a near-perfect mean score of 0.985006 (Table 8). 12% of all models were able to break the 90% success threshold (Fig. 14).

Table 8
MT, continuous output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Output length   Mean score
MT   Hybrid-C   32   relu   1000   2   2   0.985006
MT   RNN-C      32   relu   1000   5   1   0.981523
MT   RNN-C      32   tanh   1000   5   1   0.980135
MT   Hybrid-C   16   tanh   1000   2   3   0.976324
MT   LSTM-C     32   relu   1000   5   1   0.976245
MT   Hybrid-C    8   tanh   1000   2   1   0.975258
MT   RNN-C      32   tanh   1000   3   1   0.974210
MT   RNN-C      32   relu   1000   3   1   0.972664
MT   RNN-C      32   relu   1000   2   1   0.965829
MT   RNN-C      32   tanh   1000   2   1   0.963914

Figure 14: MT, continuous output, all models results

The examination of continuous-output models reveals a notable enhancement in predictive performance compared to single-output models. This is particularly evident in the context of predicting sequences generated by the MiddleSquare PRNG.

The performance plot illustrating the correlation between predicted and actual values (Fig. 15) for the continuous-output model shows an even tighter linear alignment than the single-output model. This near-perfect correlation, along with a high success score of 0.9955, reflects the model's exceptional predictive accuracy. The dense clustering of points along the diagonal suggests that the model can reliably predict the MiddleSquare PRNG's output with high confidence, and such precision is indicative of the model's ability to capture both the immediate and contextual dependencies within the PRNG's sequence.

Figure 15: Prediction vs actual values, MiddleSquare with the best result (0.9955)

The scatter plot for the continuous-output Hybrid model, which integrates CNN and LSTM architectures (Hybrid-C), showcases a substantial concentration of points closely aligned with the line of perfect prediction (Fig. 16). The model, employing tanh activation with 16 neurons across 3 layers, exhibits a remarkable ability to track the actual values throughout the sequence. This tight clustering indicates a substantial reduction in prediction errors and a strong alignment with the true PRNG sequence, suggesting a deeper understanding of the underlying patterns by the model.

Figure 16: Prediction vs actual values, MiddleSquare with the best result (0.9955)

The continuous-output model's superior performance, as evidenced by the closer proximity of predicted to actual values and the higher success score, highlights the benefit of utilizing sequential context in PRNG output prediction. The ability to forecast the sequence with a success score reaching 0.9955 marks a significant milestone, suggesting that models incorporating sequence history can more effectively decode the deterministic yet complex structure of PRNG outputs.

This analysis implies that continuous-output models hold great promise for applications where forecasting accuracy over sequences is critical. The insights gleaned from this research can inform the development of more secure PRNGs, capable of withstanding sophisticated sequential analysis. Future work will likely explore the expansion of this approach to more complex and higher-dimensional sequences, potentially integrating additional layers of complexity and exploring the impact on model performance.

9.3. Model performance across PRNGs

Our study's findings highlight the nuanced nature of PRNG output prediction, with different models excelling for specific generators. This variation underscores the importance of model selection tailored to the characteristics of the PRNG being analyzed. For instance, the best-performing model for the Xorshift generator might leverage its unique XOR and shift operations, whereas the optimal model for the Mersenne Twister (MT) would need to account for its complex bit manipulation and tempering techniques.

Remarkably, the single-output models consistently achieved a 98% success rate across various PRNGs, demonstrating a high level of accuracy in predicting the next output value based solely on the preceding values. This success rate is indicative of the models' ability to decipher the underlying deterministic patterns that govern PRNG outputs.

Even more impressive, the continuous-output model, which utilizes sequences of values to predict subsequent outputs, reached a 99% success rate. This improvement suggests that incorporating more context in the form of continuous output sequences enables the models to better capture the PRNGs' inherent algorithms, leading to more accurate predictions.

9.4. Implications for PRNG analysis and security

The success of our models in predicting PRNG outputs with such high accuracy has profound implications for the fields of cryptography and random number generation. While PRNGs are designed to produce sequences that are difficult to predict, our results suggest that advanced neural network models can uncover and exploit hidden patterns within these sequences. This finding calls for ongoing efforts to enhance the unpredictability and security of PRNGs, ensuring they remain robust against sophisticated analytical techniques.
10. Conclusions

This research delves into the predictability of PRNGs using advanced neural network models. Our study demonstrates that the tested architectures possess a remarkable ability to predict the outputs of various PRNGs, with enhanced accuracy observed in continuous-output prediction scenarios, which showed superior performance in capturing long-term dependencies within PRNG sequences and affirmed their suitability for complex sequence prediction tasks.

Our findings illuminate the nuanced dynamics of PRNG predictability and the potential vulnerabilities inherent within commonly used generators. By leveraging neural networks, we not only uncover the deterministic patterns masked as randomness but also push the boundaries of understanding in cryptographic security and random number generation.

Future research should explore the integration of more complex neural architectures and the application of these findings in real-world scenarios, such as secure communications and cryptographic key generation. The implications of our work suggest a pivotal shift towards more secure and unpredictable PRNG designs, bolstering the defenses against adversarial predictions and enhancing the integrity of cryptographic systems.

11. Future research directions

These findings have significantly advanced our understanding of the capabilities and limitations of current PRNG technologies when subjected to advanced neural network-based predictive models. The high success rates achieved by these models, particularly the 99% success rate with continuous-output models, not only demonstrate the feasibility of predicting PRNG outputs but also underscore the intricate patterns that deterministic algorithms generate, patterns that sophisticated models can uncover. This study opens several avenues for future research, aimed at both improving PRNG designs and developing more advanced predictive models:

Advanced PRNG algorithms: There is a clear need for the development of new PRNG algorithms that incorporate mechanisms specifically designed to counteract the capabilities of neural network-based predictive models. Future research should focus on exploring algorithmic complexities that can more effectively obscure deterministic patterns.

Neural network enhancements: Our research has shown that certain neural network architectures are more adept at predicting PRNG outputs than others. Investigating the development of novel neural network models or hybrid architectures that can more efficiently process and predict complex sequences is an exciting frontier. This includes exploring deeper networks, attention mechanisms, and other advanced features that could further improve prediction accuracy.

Cross-disciplinary approaches: Combining insights from cryptography, machine learning, and complexity theory could yield innovative approaches to both PRNG design and predictive modeling. Interdisciplinary research might uncover new principles for creating sequences that are inherently more difficult to predict, as well as models that are more adept at understanding complex patterns.

Real-world application scenarios: Applying our findings to real-world scenarios, where PRNGs are used under various constraints and for different purposes, will be essential. This includes testing PRNGs in environments with high-security requirements, such as blockchain technologies, secure communications, and digital signatures.

Ethical considerations and security implications: As research progresses in predicting PRNG outputs, it is imperative to consider the ethical implications and potential security risks associated with disseminating advanced predictive models. Developing guidelines and best practices for responsible research and application in this area is crucial.

Enhancing PRNG security: The ability of neural networks to predict PRNG outputs with such accuracy highlights an urgent need for the cryptographic community to re-evaluate and enhance the design and implementation of PRNGs. Ensuring that PRNGs can withstand analysis by advanced predictive models is crucial for maintaining the security and integrity of cryptographic systems, which rely heavily on the unpredictability of these generators.

Acknowledgment

This work was supported by the Shota Rustaveli National Science Foundation of Georgia (SRNSFG) [NFR-22-14060] as well as the Ministry of Education and Science of Ukraine (grant №0122U002361 "Intelligent system of secure packet data transmission based on reconnaissance UAV").
References

[1] K. Cho, et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, arXiv:1406.1078 (2014).
[2] I. Sutskever, O. Vinyals, Q. Le, Sequence to Sequence Learning with Neural Networks, Advances in Neural Information Processing Systems 27 (2014).
[3] S. Hochreiter, J. Schmidhuber, Long Short-Term Memory, Neural Computation 9(8) (1997) 1735–1780.
[4] F. Gers, J. Schmidhuber, F. Cummins, Learning to Forget: Continual Prediction with LSTM, Neural Computation 12(10) (2000) 2451–2471.
[5] A. Graves, A.-R. Mohamed, G. Hinton, Speech Recognition with Deep Recurrent Neural Networks, IEEE International Conference on Acoustics, Speech and Signal Processing (2013).
[6] A. Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks (2015).
[7] M. Islam, G. Chen, S. Jin, An Overview of Neural Network, American J. Neural Netw. Appl. 5(1) (2019) 7–11. doi: 10.11648/j.ajnna.20190501.12.
[8] K. O'Shea, R. Nash, An Introduction to Convolutional Neural Networks (2015).
[9] Z. Li, et al., A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects (2021).
[10] T. Lin, T. Guo, K. Aberer, Hybrid Neural Networks for Learning the Trend in Time Series.
[11] D. Psichogios, L. Ungar, A Hybrid Neural Network-First Principles Approach to Process Modeling.
[12] A. Vaswani, et al., Attention Is All You Need, Advances in Neural Information Processing Systems 30 (2017).
[13] J. Brownlee, Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python, Machine Learning Mastery (2018).
[14] V. Desai, R. Patil, D. Rao, Using Layer Recurrent Neural Network to Generate Pseudo Random Number Sequences, Int. J. Comput. Sci. 9 (2012) 324–334.
[15] V. Maksymovych, et al., Hardware Modified Additive Fibonacci Generators Using Prime Numbers, Advances in Computer Science for Engineering and Education VI, LNDECT 181 (2023). doi: 10.1007/978-3-031-36118-0_44.
[16] V. Maksymovych, O. Harasymchuk, M. Shabatura, Modified Generators of Poisson Pulse Sequences Based on Linear Feedback Shift Registers, Advances in Intelligent Systems and Computing, AISC 1247 (2021) 317–326.
[17] V. Maksymovych, O. Harasymchuk, I. Opirskyy, The Designing and Research of Generators of Poisson Pulse Sequences on Base of Fibonacci Modified Additive Generator, International Conference on Theory and Applications of Fuzzy Systems and Soft Computing, ICCSEEA 2018: Advances in Intelligent Systems and Computing 754 (2019) 43–53.
[18] R. Hamza, A Novel Pseudo Random Sequence Generator for Image-Cryptographic Applications, J. Info. Secur. Appl. 35 (2017) 119–127.
[19] O. Harasymchuk, Generator of Pseudorandom Bit Sequence with Increased Cryptographic Security, Metallurgical and Mining Industry: Sci. Tech. J. 6(5) (2014) 24–28.
[20] M. O'Neill, PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation (2014).
[21] M. Matsumoto, T. Nishimura, Dynamic Creation of Pseudorandom Number Generators (2015).
[22] B. Widynski, Middle-Square Weyl Sequence RNG (2017).
[23] K. Okada, et al., Learned Pseudo-Random Number Generator: WGAN-GP for Generating Statistically Robust Random Numbers, PLoS One 18(6) (2023). doi: 10.1371/journal.pone.0287025.