Predicting pseudo-random number generator output with sequential analysis

Dmytro Proskurin 1,†, Maksim Iavich 2,†, Tetiana Okhrimenko 1,∗,†, Okoro Chukwukaelonma 1,† and Tetiana Hryniuk 1,†

1 National Aviation University, 1 Liubomyra Huzara ave., 03058 Kyiv, Ukraine
2 Caucasus University, 1 Paata Saakadze str., 0102 Tbilisi, Georgia

CSDP-2024: Cyber Security and Data Protection, June 30, 2024, Lviv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
proskurin.d@stud.nau.edu.ua (D. Proskurin); miavich@cu.edu.ge (M. Iavich); t.okhrimenko@npp.nau.edu.ua (T. Okhrimenko); kaelo@gmail.com (O. Chukwukaelonma); t.hryniuk@ukr.net (T. Hryniuk)
ORCID: 0000-0002-2835-4279 (D. Proskurin); 0000-0002-3109-7971 (M. Iavich); 0000-0001-9036-6556 (T. Okhrimenko); 0000-0002-1247-9854 (O. Chukwukaelonma); 0000-0003-0123-5241 (T. Hryniuk)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
This study examines the predictive capabilities of neural network models, specifically Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Convolutional Neural Networks (CNNs), as well as a hybrid architecture that combines convolutional and recurrent layers, for forecasting the outputs of various pseudo-random number generators (PRNGs). The investigation covers a diverse set of PRNG algorithms, including the Linear Congruential Generator (LCG), Mersenne Twister (MT), Xorshift, and Middle Square. The study evaluates the accuracy of these models in predicting single and continuous outputs generated by the mentioned PRNGs. The findings show the superior predictive performance of hybrid models, attributed to their ability to capture long-term dependencies, a crucial factor in decoding the structure of PRNG sequences. Additionally, the impact of model optimization techniques, including dropout and L2 regularization, on predictive accuracy is explored. This examination not only underscores the potential of neural networks in identifying deterministic patterns within PRNG outputs but also offers insights into optimal model selection and configuration. The implications of this work are significant, paving new avenues in cryptography and secure random number generation by highlighting the predictability of PRNGs under advanced neural network models.

Keywords
random numbers, RNN, CNN, LSTM, GRU, hybrid model, PRNG

1. Introduction

In the ever-evolving landscape of machine learning, the ability to accurately predict future events based on sequential data stands as a cornerstone of numerous technological advancements and applications. From forecasting stock market trends to decoding human language, the significance of effective sequence prediction cannot be overstated. Central to this domain are Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, which have emerged as powerful tools in the machine learning arsenal for handling sequential data.

RNNs, known for their unique architecture that allows information to persist, have been instrumental in modeling time-dependent data [1]. However, their application is often marred by challenges such as the vanishing gradient problem, which hinders the learning of long-range dependencies [2]. LSTMs, a special kind of RNN designed specifically to overcome these limitations, have set new benchmarks in sequence prediction tasks thanks to their sophisticated internal mechanisms, demonstrating remarkable success where traditional RNNs falter [2, 3].

This paper embarks on a comprehensive exploration of RNNs and LSTMs in the context of sequence prediction. We delve into the architectural intricacies of these models, their strengths and weaknesses, and their performance across various sequence prediction scenarios. Our study is particularly focused on datasets generated by different Pseudo-Random Number Generators (PRNGs), offering a unique lens through which the capabilities of these models can be examined and understood. Through rigorous experimentation and analysis, we aim to shed light on the nuances of sequence prediction and provide insights that could guide future applications and research in this area of machine learning.
2. Background and related work

Recent advancements in sequence prediction have been significantly influenced by the development and refinement of Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. These models have shown remarkable proficiency in handling sequential data, particularly in domains where understanding temporal dynamics is crucial.

1. LSTM for Time Series Prediction: Studies have demonstrated the effectiveness of LSTM models in time series forecasting, a domain traditionally dominated by statistical methods like ARIMA. Unlike these methods, LSTMs can capture complex nonlinear relationships in time series data [2, 4]. Researchers have successfully applied LSTM models to forecast stock prices, energy demand, and weather patterns, achieving higher accuracy than traditional models, especially in scenarios with long-term dependencies and high volatility.

2. RNNs in Natural Language Processing (NLP): RNNs have been pivotal in advancing NLP. Their ability to process sequential text data has led to breakthroughs in machine translation, text generation, and sentiment analysis [1, 2, 5]. The sequential processing capability of RNNs allows them to maintain context in text, a critical factor in understanding human language [5]. However, vanilla RNNs often struggle with long-term dependencies [6], leading to the adoption of LSTMs and GRUs (Gated Recurrent Units) in more complex NLP tasks.

3. Sequence-to-Sequence Learning: The sequence-to-sequence learning framework, often implemented using LSTMs, has revolutionized tasks like machine translation. This approach involves training models on pairs of input and output sequences, enabling the model to learn mappings from one sequence to another. This framework has been crucial in developing models that can translate entire sentences with context, rather than translating on a word-by-word basis [2].

4. Challenges and Limitations: Despite their successes, RNNs and LSTMs are not without challenges. The vanishing gradient problem in RNNs, where the model loses its ability to learn long-range dependencies, has been partially addressed by LSTMs but still poses limitations [4]. Additionally, the training of these models can be computationally intensive, requiring significant resources for large datasets.

5. Future Directions: Ongoing research is exploring more efficient and effective variants of RNNs and LSTMs, such as attention mechanisms and Transformer models [4]. These developments aim to address existing limitations while enhancing the models' ability to process longer sequences and maintain context over extended periods.
3. Model architecture overview

Neural networks are artificial intelligence models that mimic human brain function [2]. A neural network connects processing units, similar to neurons, rather than manipulating zeros and ones like a digital model does [2]. The result depends on how the connections are organized and weighted. Neural networks are algorithms modeled after the human brain that recognize patterns. Sensory data is interpreted using machine perception, which labels or clusters raw information. They recognize numerical patterns in vectors, into which real-world data such as images, sounds, text, and time series must be converted [7]. Artificial Neural Networks (ANNs) are computing systems modeled after biological neural systems, including the human brain [8].

Convolutional Neural Networks (CNNs) are similar to standard artificial neural networks (ANNs) in that they use neurons that improve themselves through learning [8]. CNNs have made remarkable achievements and are now widely used in deep learning. Convolutional neural networks have revolutionized computer vision, enabling previously unthinkable feats like facial recognition, driverless automobiles, self-service supermarkets, and intelligent medical treatments. CNNs also differ from typical ANNs by focusing on picture pattern recognition. This allows image-specific properties to be encoded into the architecture, making the network better suited for image-focused tasks while also lowering the number of parameters needed to set up the model [8, 9].

Hybrid Neural Networks (HNNs), which integrate the strengths of multiple neural networks, are becoming increasingly popular in computer vision applications, including picture captioning and action identification. However, there has been limited research on the effective use of hybrid architectures for time series data, particularly for trend forecasting purposes [10]. HNNs use their internal structure to limit the interactions between process variables so that they align with physical models. Compared to regular neural networks, such coupled models are more accurate, dependable, and generalizable [11].

Recurrent Neural Networks (RNNs) represent a paradigm shift in neural networks, specifically designed to recognize patterns in sequences of data [12]. Unlike traditional feedforward neural networks, RNNs possess a unique feature: the output from the previous step is fed back into the input of the current step. This looping mechanism allows RNNs to maintain an internal state that captures information about the sequence they have processed so far, making them ideal for tasks like speech recognition, language modeling, and time series forecasting [2, 12].

The core architecture of an RNN involves a hidden layer where the activation at a given time step is a function of the input at the same step and the activation of the hidden layer at the previous step [6]. This recurrent nature allows the network to maintain a form of memory [12]. However, RNNs are often challenged by long-term dependencies due to issues like vanishing and exploding gradients during backpropagation [2, 3], where the network becomes unable to learn and retain information from earlier time steps in the sequence [4].
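In standard notation (ours; not drawn from the cited works), this recurrence can be written as

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = W_{hy} h_t + b_y,$$

where x_t is the input at step t, h_t the hidden state, and y_t the output. The repeated multiplication by W_{hh} during backpropagation through time is what drives gradients toward zero or infinity over long sequences, producing the vanishing and exploding gradient behaviour noted above.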
Long Short-Term Memory networks, a special kind of RNN, were developed to overcome the limitations of traditional RNNs. LSTMs are adept at learning long-term dependencies, thanks to their unique internal structure [3]. Unlike standard RNNs, LSTMs have a complex architecture with a series of gates: the forget gate, input gate, and output gate [3, 4]. These gates regulate the flow of information into and out of the cell, deciding what to keep in memory and what to discard, thereby addressing the vanishing gradient problem [4].

Forget Gate: Determines what information is discarded from the cell state [4, 13].
Input Gate: Updates the cell state with new information from the current input [13].
Output Gate: Determines the next hidden state and output based on the current input and the updated cell state [13].

This architecture allows LSTMs to make more precise decisions about what information to store, modify, and output. As a result, LSTMs have been successfully applied in various complex sequence modeling tasks, including machine translation, speech synthesis, and even generative models for music composition [3, 4, 13].
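For reference, the usual formulation of these gates in the LSTM literature [3, 4] is

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)}\\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(new hidden state)}
\end{aligned}
$$

Because the cell state c_t is updated additively rather than through repeated matrix multiplication, gradient information can propagate across many time steps, which is why LSTMs mitigate the vanishing gradient problem described earlier.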
While both RNNs and LSTMs are designed for sequence processing, the key difference lies in their ability to handle long-term dependencies [4, 14]. Standard RNNs, while simpler and computationally less intensive, struggle with retaining information over longer sequences. LSTMs, with their intricate gating mechanism, excel in scenarios where understanding long-range contextual information is crucial [4]. The choice between RNNs and LSTMs often boils down to the specific requirements of the task at hand, the complexity of the sequences involved, and the computational resources available. LSTMs are generally preferred for more complex tasks with longer sequences [3], while RNNs might suffice for simpler tasks with shorter temporal dependencies [1].

4. Methodology

There are a large number of pseudorandom generators that differ in their characteristics, construction methods, and areas of possible application [15–19]. In our study, we employed datasets generated by four distinct PRNG algorithms, each offering unique challenges and characteristics for sequence prediction using RNN and LSTM models. These datasets serve as a testing ground to evaluate and compare the performance of different neural network architectures in sequence prediction tasks.

4.1. Linear congruential generator dataset

Description: The LCG is one of the oldest and simplest PRNG algorithms [20]. It generates random numbers using a linear equation [20]. The simplicity of its algorithm makes it a good baseline for evaluating the predictive capabilities of RNN and LSTM models.

Characteristics: The sequence generated by an LCG can exhibit patterns due to its linear nature. These patterns, while not immediately apparent, can be learned over time, making it an interesting case for sequence prediction models [20]. Despite their potential statistical issues, LCGs have the advantage of offering all the auxiliary qualities, such as seekability, numerous streams, and k-dimensional equidistribution [20].
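The linear recurrence behind an LCG is X_{n+1} = (a·X_n + c) mod m. As an illustration only (the paper does not state which multiplier, increment, and modulus were used), a minimal generator with the classic Numerical Recipes constants looks like this:

```python
def lcg(seed: int, n: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
    """Return n values of the linear congruential recurrence X_{k+1} = (a*X_k + c) mod m.

    The constants a, c, m are illustrative (Numerical Recipes parameters);
    the parameters used for the paper's dataset are not stated.
    """
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out

sample = lcg(seed=8956482, n=10)  # seed value taken from Section 5.1
```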
4.2. Mersenne twister dataset

Description: The Mersenne Twister, specifically the MT19937 variant, is known for its long period and high-quality outputs. It is widely used in various applications due to its reliability and speed [21].

Characteristics: MT generates sequences that are far more complex and less predictable than LCG [20]. This complexity provides a challenging scenario for RNNs and LSTMs, testing their ability to model and predict more intricate and seemingly random sequences. In addition to its inability to produce the all-zero state, the Mersenne Twister also finds it difficult to act randomly in its nearly all-zero state [20].

4.3. Xorshift dataset

Description: Xorshift is a class of PRNGs that operates using XOR (exclusive or) and bit-shifting operations [20]. It is known for its simplicity and speed, often used in scenarios where the speed of random number generation is critical [20].

Characteristics: Despite its simplicity, Xorshift can produce high-quality random sequences [20]. The nature of its operations makes it an interesting case for studying how well neural network models can adapt to and predict outputs from bit-level algorithms [22]. A bitwise xor operation is a type of permutation that involves flipping certain bits in the target; performing it again reverses the effect [20]. The conventional understanding of Xorshift would advise us to concentrate on lengthening the bits' period [20].
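As an illustration, Marsaglia's 32-bit xorshift with the shift triple (13, 17, 5) can be written as follows; the exact variant and word size used to build the dataset are not specified in the paper, so this is a representative sketch rather than the generator actually used:

```python
def xorshift32(seed: int, n: int):
    """Return n values of a 32-bit xorshift generator (shift constants 13, 17, 5).

    These constants come from Marsaglia's original xorshift paper; the exact
    variant used for the paper's dataset is an assumption.
    """
    x = seed & 0xFFFFFFFF  # keep the state in a 32-bit word
    out = []
    for _ in range(n):
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        out.append(x)
    return out

sample = xorshift32(seed=8956482, n=10)
```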
Evaluation metrics Long Short-Term Memory networks (LSTMs), and a custom To rigorously assess the effectiveness of our models in Hybrid model, each designed to handle the intricacies of predicting PRNG outputs, we employed a set of PRNG-generated data in distinct ways. comprehensive evaluation metrics. These metrics are crucial for quantifying the accuracy of our predictions and 6.1. Convolutional neural networks facilitating a direct comparison between the different neural Application: Primarily used for single-value output network architectures utilized in our study. Our evaluation prediction, CNNs are adept at identifying patterns within a framework is centered around the Mean Squared Error fixed-size window of the sequence. This model excels at (MSE) and a specially devised Model Performance Score. capturing local dependencies and spatial hierarchies in data, making it suitable for analyzing individual segments of the 7.1. Mean squared error PRNG output. MSE serves as the cornerstone of our evaluation strategy. It calculates the average squared difference between the actual and predicted values, offering a precise measure of 46 the prediction error’s magnitude. By squaring the errors, Tangent). These functions were chosen for their MSE gives more weight to larger errors, making it distinct characteristics in handling nonlinearities particularly sensitive to outliers and significant prediction in the data. inaccuracies.  Number of Neurons: The neuron counts tested In the context of predicting PRNG outputs, MSE were 8, 16, and 32. This range allowed us to provides a clear and direct measure of how closely the explore the models’ capacity to learn and model’s predictions align with the actual sequence of generalize from the data, balancing complexity numbers generated by the PRNGs. A lower MSE indicates with computational efficiency. higher prediction accuracy, reflecting a model’s ability to  Epochs: All models were trained for [1000] epochs, effectively capture and replicate the underlying patterns of providing ample opportunity for learning and the PRNG sequence. convergence.  Model Layers: We varied the depth of the models 7.2. Model performance score by testing configurations with [1, 2, 4] layers. This Recognizing the need for a standardized metric that allows variation aimed to understand how model depth for an intuitive understanding of model performance, we influences learning and prediction accuracy. introduced the Model Performance Score. This metric normalizes the MSE to a scale ranging from 0 to 1, where 0 Output Lengths: For continuous value prediction, represents the poorest performance (highest MSE) and 1 output lengths of [1–4] were tested. This range was denotes perfect prediction accuracy (zero MSE). selected to assess the models’ ability to forecast multiple The Model Performance Score is calculated by inversely steps in the PRNG sequence. scaling the MSE against a predetermined maximum error threshold. This approach ensures that the performance 8.1.1. Impact of dropout and L2 regularization score is adjusted for the scale of the data and the expected One of the most notable findings from our experiments was variation in prediction accuracy, allowing for a fair the impact of dropout and L2 regularization techniques on comparison across different models and datasets. model learning capabilities. 
5. Dataset preparation

In our study on "Predicting PRNG Output with Sequential Analysis", we meticulously prepared a dataset to analyze the predictability of various Pseudorandom Number Generators (PRNGs), focusing on four widely recognized algorithms: Linear Congruential Generator (LCG) (Fig. 2), MiddleSquare (Fig. 4), Xorshift (Fig. 3), and Mersenne Twister (MT) (Fig. 1). Each of these PRNGs was chosen for its unique approach to generating sequences of pseudorandom numbers, providing a diverse test bed for our predictive models.

Figure 1: MT dataset distribution
Figure 2: LCG dataset distribution
Figure 3: Xorshift dataset distribution
Figure 4: MiddleSquare dataset distribution

5.1. Data generation parameters

The dataset was generated using the following parameters to ensure consistency across all PRNGs:

Sample Size: Each PRNG was used to generate a sequence of 10,000 numbers (n = 10000), creating a substantial dataset for training and evaluation.
Seed Value: A common seed value of 8956482 was applied to initialize each PRNG, ensuring that the starting point of the pseudorandom sequence was consistent across different generators.
Word Size: For PRNGs where applicable, such as MiddleSquare, a word size of 8 bits was selected, balancing the need for computational efficiency with the desire for sequence complexity.
Sequence Length: The output was segmented into sequences of length 10, which were then used as individual data points for the subsequent analysis. This sequence length was chosen to provide enough data for recognizing patterns without overwhelming the analytical models.

5.2. Dataset splitting

Once generated, the dataset was divided into three distinct sets to facilitate the training, testing, and validation of our predictive models:

Training Set: Used to train the models, allowing them to learn and adapt to the patterns inherent in the pseudorandom sequences generated by each PRNG.
Testing Set: Employed to assess the performance of the models on unseen data, providing an unbiased evaluation of their predictive capabilities.
Validation Set: Utilized during the model tuning phase to fine-tune parameters and prevent overfitting, ensuring that the models generalize well to new data.

This careful preparation and partitioning of the dataset were critical in establishing a robust foundation for our investigation into the predictability of PRNG outputs through sequential analysis. By standardizing the generation parameters and thoughtfully splitting the data, we aimed to create a fair and consistent testing environment for each of the predictive models applied in our study.
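The windowing and splitting described above can be sketched as follows. The value scaling and the 70/15/15 split ratios are assumptions (the paper does not report them), and `generate` stands for any of the generator functions sketched in Section 4.

```python
import numpy as np

SEED, N, SEQ_LEN = 8956482, 10_000, 10

def make_dataset(generate, seed=SEED, n=N, seq_len=SEQ_LEN):
    """Segment a PRNG output stream into (window, next value) training pairs."""
    stream = np.asarray(generate(seed, n), dtype=np.float64)
    stream /= stream.max()                              # scale values to [0, 1] (assumed preprocessing)
    X = np.stack([stream[i:i + seq_len] for i in range(n - seq_len)])
    y = stream[seq_len:]                                # single-value target: the next output
    return X[..., None], y                              # add a feature axis for sequence models

def split(X, y, train=0.7, val=0.15):
    """Chronological train/validation/test split (ratios are assumed, not from the paper)."""
    i = int(len(X) * train)
    j = int(len(X) * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])
```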
6. Model configuration

In our exploration of "Predicting PRNG Output with Sequential Analysis", we employed a comprehensive approach by leveraging various neural network architectures. Each model was selected based on its ability to process sequential data, a core characteristic of PRNG outputs. Our analysis incorporated Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and a custom Hybrid model, each designed to handle the intricacies of PRNG-generated data in distinct ways.

6.1. Convolutional neural networks

Application: Primarily used for single-value output prediction, CNNs are adept at identifying patterns within a fixed-size window of the sequence. This model excels at capturing local dependencies and spatial hierarchies in data, making it suitable for analyzing individual segments of the PRNG output.

6.2. Recurrent neural networks

Application: RNNs were employed for both single-value and continuous-value output predictions. Unlike CNNs, RNNs have a memory mechanism that allows them to process entire sequences of data, making them ideal for understanding the temporal dynamics and dependencies within PRNG outputs.

6.3. Long short-term memory networks

Application: Like RNNs, LSTMs were utilized for both single-value and continuous-value output predictions. LSTMs are a special kind of RNN capable of learning long-term dependencies. They are particularly effective at avoiding the vanishing gradient problem, enabling them to capture patterns over longer sequences of PRNG outputs.

6.4. Hybrid model

Configuration: The Hybrid model represents an innovative approach, integrating the strengths of CNNs and LSTMs into a single architecture. It comprises:

• CNN Layer: For extracting local features within the subsequence of the PRNG output.
• LSTM Layer: To capture long-term dependencies and temporal patterns in the data, building upon the features extracted by the CNN layer.
• Dense Layer: Serving as the output layer, it synthesizes the information processed by the CNN and LSTM layers to make predictions.

Application: Designed for versatility, the Hybrid model is equipped to handle both single-value and continuous-value outputs, offering a robust solution for predicting PRNG outputs by leveraging the complementary strengths of convolutional and recurrent layers.
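A minimal Keras sketch of the CNN-LSTM-Dense stack just described is given below. The filter count, kernel size, and optimizer are illustrative assumptions; the paper fixes only the neuron counts, activations, layer counts, and epochs explored in Section 8.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_hybrid(seq_len=10, neurons=16, activation="tanh", output_len=1):
    """CNN layer -> LSTM layer -> Dense output, as outlined in Section 6.4.

    output_len=1 corresponds to the single-output scenario; larger values
    correspond to the continuous-output (Hybrid-C) configuration.
    """
    model = keras.Sequential([
        keras.Input(shape=(seq_len, 1)),
        layers.Conv1D(filters=neurons, kernel_size=3, padding="same",
                      activation=activation),           # local feature extraction
        layers.LSTM(neurons, activation=activation),     # long-term dependencies
        layers.Dense(output_len),                         # prediction head
    ])
    model.compile(optimizer="adam", loss="mse")           # MSE matches Section 7.1
    return model

model = build_hybrid(neurons=16, activation="tanh", output_len=3)  # e.g. the best MiddleSquare Hybrid-C setting
```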
The strategic selection and configuration of these models underpin our analytical methodology. By employing a diverse array of architectures, each with its unique advantages, our study aims to comprehensively evaluate the predictability of PRNG outputs. The Hybrid model underscores our commitment to innovation, integrating multiple neural network paradigms to enhance predictive accuracy and insight into the sequential nature of PRNG-generated data.

7. Evaluation metrics

To rigorously assess the effectiveness of our models in predicting PRNG outputs, we employed a set of comprehensive evaluation metrics. These metrics are crucial for quantifying the accuracy of our predictions and facilitating a direct comparison between the different neural network architectures utilized in our study. Our evaluation framework is centered around the Mean Squared Error (MSE) and a specially devised Model Performance Score.

7.1. Mean squared error

MSE serves as the cornerstone of our evaluation strategy. It calculates the average squared difference between the actual and predicted values, offering a precise measure of the prediction error's magnitude. By squaring the errors, MSE gives more weight to larger errors, making it particularly sensitive to outliers and significant prediction inaccuracies.

In the context of predicting PRNG outputs, MSE provides a clear and direct measure of how closely the model's predictions align with the actual sequence of numbers generated by the PRNGs. A lower MSE indicates higher prediction accuracy, reflecting a model's ability to effectively capture and replicate the underlying patterns of the PRNG sequence.

7.2. Model performance score

Recognizing the need for a standardized metric that allows for an intuitive understanding of model performance, we introduced the Model Performance Score. This metric normalizes the MSE to a scale ranging from 0 to 1, where 0 represents the poorest performance (highest MSE) and 1 denotes perfect prediction accuracy (zero MSE).

The Model Performance Score is calculated by inversely scaling the MSE against a predetermined maximum error threshold. This approach ensures that the performance score is adjusted for the scale of the data and the expected variation in prediction accuracy, allowing for a fair comparison across different models and datasets.

This normalized score simplifies the interpretation of our results, providing a straightforward metric to gauge model effectiveness. It allows stakeholders to quickly assess the relative performance of each model in predicting PRNG outputs without delving into the complexities of raw MSE values.

Together, these evaluation metrics form the foundation of our analytical approach, enabling a nuanced analysis of model performance. MSE offers a detailed view of the prediction accuracy, while the Model Performance Score provides a high-level, comparative perspective. By incorporating both metrics, our study ensures a balanced and comprehensive evaluation of how well each neural network architecture can predict the seemingly unpredictable output of pseudorandom number generators.
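A compact reading of these two metrics is sketched below; the maximum error threshold used for normalization is not reported in the paper, so it appears here as an explicit parameter.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted PRNG outputs (Section 7.1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def performance_score(y_true, y_pred, max_error):
    """Model Performance Score: MSE inversely scaled to [0, 1] (Section 7.2).

    max_error is the predetermined maximum error threshold mentioned in the
    paper; its concrete value is not reported, so it must be supplied.
    """
    return max(0.0, 1.0 - mse(y_true, y_pred) / max_error)
```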
8. Experiment variables and observations

We conducted an extensive series of experiments to evaluate the predictive capabilities of various neural network configurations. These experiments were meticulously designed to explore the impact of different model parameters on the accuracy of PRNG output predictions. Below, we detail the variables involved in these experiments and highlight some critical observations related to model performance.

8.1. Experiment variables

To systematically assess the effects of various hyperparameters on model performance, we tested a wide array of combinations, encompassing:

• Activation Functions: We experimented with two popular activation functions, ReLU (Rectified Linear Unit) and tanh (Hyperbolic Tangent). These functions were chosen for their distinct characteristics in handling nonlinearities in the data.
• Number of Neurons: The neuron counts tested were 8, 16, and 32. This range allowed us to explore the models' capacity to learn and generalize from the data, balancing complexity with computational efficiency.
• Epochs: All models were trained for 1000 epochs, providing ample opportunity for learning and convergence.
• Model Layers: We varied the depth of the models by testing configurations with 1, 2, and 4 layers. This variation aimed to understand how model depth influences learning and prediction accuracy.
• Output Lengths: For continuous value prediction, output lengths of 1–4 were tested. This range was selected to assess the models' ability to forecast multiple steps in the PRNG sequence.

8.1.1. Impact of dropout and L2 regularization

One of the most notable findings from our experiments was the impact of dropout and L2 regularization techniques on model learning capabilities. Contrary to common practice in machine learning, where these techniques are employed to enhance model generalization and prevent overfitting, our experiments revealed that models without dropout and L2 regularization demonstrated superior performance in learning and predicting PRNG outputs. The introduction of these regularization techniques led to models that were unable to adequately learn from the training data and, consequently, failed to predict accurately.

This observation suggests a unique aspect of predicting PRNG outputs: the data generated by PRNGs, while seemingly random, follows deterministic algorithms. The addition of regularization techniques, which are designed to introduce randomness and constraints into the learning process, may interfere with the model's ability to capture the underlying deterministic patterns of PRNG sequences.

The results of these experiments provide valuable insights into the design and optimization of neural network models for predicting PRNG outputs. Specifically, they underscore the importance of tailoring model configurations to the specific characteristics of the data and the task at hand. In the context of PRNG prediction, minimizing external sources of randomness and constraint (e.g., through dropout and L2 regularization) appears to be crucial for enabling models to learn and replicate the deterministic patterns that govern PRNG behavior.
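For completeness, the regularized variants referred to above would typically be expressed in Keras as follows; the dropout rate and L2 coefficient are illustrative values rather than settings reported here.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_regularized_lstm(seq_len=10, neurons=32, activation="relu",
                           dropout_rate=0.2, l2_coeff=1e-4):
    """LSTM variant with dropout and L2 regularization (Section 8.1.1).

    In our experiments such regularized models learned the deterministic PRNG
    patterns noticeably worse than their unregularized counterparts.
    """
    return keras.Sequential([
        keras.Input(shape=(seq_len, 1)),
        layers.LSTM(neurons, activation=activation,
                    kernel_regularizer=regularizers.l2(l2_coeff)),
        layers.Dropout(dropout_rate),
        layers.Dense(1),
    ])
```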
9. Experiment results analysis

Our exhaustive investigation into predicting PRNG output through sequential analysis yielded compelling findings, elucidated through the analysis of the top-performing models for each PRNG. Here, we detail the significant outcomes for both single-output and continuous-output scenarios across different PRNGs: Xorshift, MT (Mersenne Twister), LCG (Linear Congruential Generator), and MiddleSquare.

9.1. Single-output scenario analysis

For single-output predictions, our experiments produced the following results.

Xorshift: The RNN model with 32 neurons, 5 layers, and the ReLU activation function emerged as the top performer, achieving a mean score of 0.9898 (Table 1). However, both Hybrid and CNN models came close to the same success rate, suggesting that the characteristics of the Xorshift sequence are not particularly difficult to capture.

Table 1
Xorshift, single output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Mean score
Xorshift   RNN      32   relu   1000   5   0.989848
Xorshift   CNN      32   relu   1000   3   0.983555
Xorshift   Hybrid   32   relu   1000   2   0.982930
Xorshift   CNN      16   relu   1000   2   0.981882
Xorshift   Hybrid    8   relu   1000   2   0.980837
Xorshift   CNN      32   relu   1000   5   0.979213
Xorshift   CNN       8   relu   1000   2   0.978671
Xorshift   CNN      32   relu   1000   2   0.977584
Xorshift   CNN      16   relu   1000   5   0.977577

50% of all models were able to reach the 90% success threshold (Fig. 5). Nevertheless, further improvement could push this number higher.

Figure 5: Xorshift, single output, all models results

MT: The CNN model with 8 neurons, 3 layers, and the ReLU activation function led the pack with a mean score of 0.9832 (Table 2), indicating that CNN's feature extraction capabilities are effective at decoding the MT's output patterns. 28% of all models were able to reach the 90% success threshold (Fig. 6).

Table 2
MT, single output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Mean score
MT   CNN     8   relu   1000   3   0.983227
MT   CNN    32   relu   1000   3   0.980932
MT   RNN    32   tanh   1000   5   0.978619
MT   CNN    32   relu   1000   2   0.978589
MT   CNN    16   relu   1000   2   0.977052
MT   LSTM   32   relu   1000   5   0.976916
MT   RNN    32   relu   1000   5   0.973991
MT   CNN    32   tanh   1000   2   0.973030
MT   CNN    32   relu   1000   5   0.972651
MT   CNN    16   relu   1000   5   0.972521

Figure 6: MT, single output, all models results

LCG: The Hybrid model, combining CNN and LSTM architectures with tanh activation, showed superior performance, especially with 32 neurons and 5 layers, reaching a mean score of 0.9831 (Table 3). This underscores the Hybrid model's robustness in capturing both local and long-range dependencies in LCG sequences. 60% of all models were able to reach the 90% success threshold (Fig. 7).

Table 3
LCG, single output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Mean score
LCG   Hybrid   32   tanh   1000   5   0.983155
LCG   Hybrid    8   tanh   1000   2   0.982143
LCG   Hybrid    8   tanh   1000   3   0.980809
LCG   Hybrid   32   tanh   1000   2   0.980579
LCG   Hybrid   16   tanh   1000   2   0.979036
LCG   CNN       8   relu   1000   2   0.978362
LCG   Hybrid   16   tanh   1000   3   0.978311
LCG   Hybrid   32   tanh   1000   3   0.977490
LCG   RNN      32   tanh   1000   3   0.976433
LCG   CNN      32   relu   1000   5   0.975615

Figure 7: LCG, single output, all models results

MiddleSquare: The Hybrid model with tanh activation, 16 neurons, and 3 layers stood out with a mean score of 0.9883 (Table 4), highlighting the model's effectiveness in navigating the complex, squared calculations intrinsic to the MiddleSquare algorithm. 64% of all models were able to reach the 90% success threshold (Fig. 8).

Table 4
MiddleSquare, single output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Mean score
MiddleSquare   Hybrid   16   tanh   1000   3   0.988377
MiddleSquare   CNN      32   relu   1000   3   0.987493
MiddleSquare   CNN      32   relu   1000   2   0.986276
MiddleSquare   Hybrid   32   tanh   1000   3   0.983554
MiddleSquare   Hybrid   16   tanh   1000   5   0.983326
MiddleSquare   Hybrid    8   tanh   1000   2   0.982954
MiddleSquare   CNN      32   relu   1000   5   0.980597
MiddleSquare   Hybrid   32   tanh   1000   5   0.980450
MiddleSquare   CNN       8   relu   1000   5   0.980440
MiddleSquare   Hybrid   16   tanh   1000   2   0.980428

Figure 8: MiddleSquare, single output, all models results

These results underscore the nuanced relationship between PRNG algorithms and neural network architectures, suggesting that no single model architecture is universally superior. Instead, the optimal choice depends on the specific characteristics and mechanisms of the PRNG being predicted. While the models have been fine-tuned to achieve high predictive accuracy, the graphical analysis indicates that there is an inherent limitation to the exactness of these predictions.

Figure 9: Prediction vs actual values, MiddleSquare with the best result (0.988377)

A further performance plot (Fig. 9) illustrates the correlation between the predicted and actual values of the PRNG sequence. The near-perfect linear alignment along the 45-degree line suggests that the model's predictions are highly correlated with the actual PRNG outputs. The tight clustering of the points around this line demonstrates the model's effectiveness in capturing the underlying pattern of the PRNG sequence. However, the slight deviation of points from the line implies that while the model can predict the general trend and distribution of the PRNG outputs, it cannot replicate the sequence with absolute precision.

The scatter plot showing predictions versus actual values (Fig. 10) for the Hybrid model using tanh activation, 16 neurons, and 3 layers reveals a close correspondence between predicted and actual values. However, the dispersion of points away from the line of perfect agreement (where predicted values equal actual values) suggests that while the model can approximate the PRNG's output with high fidelity, it cannot achieve complete accuracy. The variance from the line of perfect prediction could be attributed to the deterministic yet complex nature of PRNGs, which inherently limits the predictability even with sophisticated models.

Figure 10: Prediction vs actual values, MiddleSquare with the best result (0.988377)
9.2. Continuous-output scenario analysis

The continuous-output models demonstrated even higher predictive accuracy, with the Hybrid model configured for continuous predictions (Hybrid-C) achieving remarkable success.

For the MiddleSquare PRNG, the Hybrid-C model with tanh activation, 16 neurons, 3 layers, and an output length of 3 achieved a near-perfect mean score of 0.9955 (Table 5). Only 29% of all models were able to break the 90% success milestone (Fig. 11).

Table 5
MiddleSquare, continuous output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Output length   Mean score
MiddleSquare   Hybrid-C   16   tanh   1000   3   3   0.995479
MiddleSquare   Hybrid-C   16   tanh   1000   2   2   0.992588
MiddleSquare   Hybrid-C    8   tanh   1000   5   2   0.990003
MiddleSquare   Hybrid-C    8   relu   1000   5   1   0.988989
MiddleSquare   Hybrid-C   32   relu   1000   5   3   0.988566
MiddleSquare   Hybrid-C    8   tanh   1000   3   1   0.988066
MiddleSquare   Hybrid-C   16   relu   1000   2   2   0.986359
MiddleSquare   Hybrid-C   16   tanh   1000   5   3   0.985923
MiddleSquare   Hybrid-C   16   tanh   1000   5   2   0.985797
MiddleSquare   Hybrid-C    8   tanh   1000   5   1   0.984813

Figure 11: MiddleSquare, continuous output, all models results

For the LCG PRNG, the Hybrid-C model with tanh activation, 8 neurons, 5 layers, and an output length of 2 achieved a near-perfect mean score of 0.992055 (Table 6). 20% of all models were able to break the 90% success threshold (Fig. 12).

Table 6
LCG, continuous output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Output length   Mean score
LCG   Hybrid-C    8   tanh   1000   5   2   0.992055
LCG   Hybrid-C    8   tanh   1000   5   3   0.989730
LCG   Hybrid-C   16   tanh   1000   5   5   0.987614
LCG   Hybrid-C    8   tanh   1000   3   5   0.986818
LCG   Hybrid-C    8   tanh   1000   2   5   0.985174
LCG   Hybrid-C   16   tanh   1000   3   2   0.984808
LCG   Hybrid-C   32   tanh   1000   3   2   0.984462
LCG   Hybrid-C   16   tanh   1000   2   1   0.984411
LCG   Hybrid-C   32   tanh   1000   3   1   0.983866
LCG   Hybrid-C   32   tanh   1000   2   1   0.983706

Figure 12: LCG, continuous output, all models results

For the Xorshift PRNG, the Hybrid-C model with relu activation, 16 neurons, 2 layers, and an output length of 2 achieved a near-perfect mean score of 0.987906 (Table 7). 15% of all models were able to break the 90% success threshold (Fig. 13).

Table 7
Xorshift, continuous output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Output length   Mean score
Xorshift   Hybrid-C   16   relu   1000   2   2   0.987906
Xorshift   Hybrid-C   16   relu   1000   2   5   0.985753
Xorshift   Hybrid-C    8   relu   1000   5   5   0.985715
Xorshift   Hybrid-C    8   relu   1000   2   2   0.984238
Xorshift   Hybrid-C   32   relu   1000   2   2   0.983437
Xorshift   Hybrid-C   16   relu   1000   2   3   0.981247
Xorshift   Hybrid-C    8   relu   1000   2   1   0.980434
Xorshift   Hybrid-C   32   relu   1000   2   1   0.978930
Xorshift   RNN-C      32   tanh   1000   3   1   0.977685
Xorshift   Hybrid-C   32   relu   1000   2   3   0.976177

Figure 13: Xorshift, continuous output, all models results

For the MT PRNG, the Hybrid-C model with relu activation, 32 neurons, 2 layers, and an output length of 2 achieved a near-perfect mean score of 0.985006 (Table 8). 12% of all models were able to break the 90% success threshold (Fig. 14).

Table 8
MT, continuous output results
Scenario   Model type   Neurons   Activation function   Epochs   Layers   Output length   Mean score
MT   Hybrid-C   32   relu   1000   2   2   0.985006
MT   RNN-C      32   relu   1000   5   1   0.981523
MT   RNN-C      32   tanh   1000   5   1   0.980135
MT   Hybrid-C   16   tanh   1000   2   3   0.976324
MT   LSTM-C     32   relu   1000   5   1   0.976245
MT   Hybrid-C    8   tanh   1000   2   1   0.975258
MT   RNN-C      32   tanh   1000   3   1   0.974210
MT   RNN-C      32   relu   1000   3   1   0.972664
MT   RNN-C      32   relu   1000   2   1   0.965829
MT   RNN-C      32   tanh   1000   2   1   0.963914

Figure 14: MT, continuous output, all models results

The examination of continuous-output models reveals a notable enhancement in predictive performance compared to single-output models. This is particularly evident in the context of predicting sequences generated by the MiddleSquare PRNG.

The performance plot illustrating the correlation between predicted and actual values (Fig. 15) for the continuous-output model shows an even tighter linear alignment than the single-output model. This near-perfect correlation, along with a high success score of 0.9955, reflects the model's exceptional predictive accuracy. The dense clustering of points along the diagonal suggests that the model can reliably predict the MiddleSquare PRNG's output with high confidence, and such precision is indicative of the model's ability to capture both the immediate and contextual dependencies within the PRNG's sequence.

Figure 15: Prediction vs actual values, MiddleSquare with the best result (0.9955)

The scatter plot for the continuous-output Hybrid model, which integrates CNN and LSTM architectures (Hybrid-C), showcases a substantial concentration of points closely aligned with the line of perfect prediction (Fig. 16). The model, employing tanh activation with 16 neurons across 3 layers, exhibits a remarkable ability to track the actual values throughout the sequence. This tight clustering indicates a substantial reduction in prediction errors and a strong alignment with the true PRNG sequence, suggesting a deeper understanding of the underlying patterns by the model.

Figure 16: Prediction vs actual values, MiddleSquare with the best result (0.9955)

The continuous-output model's superior performance, as evidenced by the closer proximity of predicted to actual values and the higher success score, highlights the benefit of utilizing sequential context in PRNG output prediction. The ability to forecast the sequence with a success score reaching 0.9955 marks a significant milestone, suggesting that models incorporating sequence history can more effectively decode the deterministic yet complex structure of PRNG outputs.

This analysis implies that continuous-output models hold great promise for applications where forecasting accuracy over sequences is critical. The insights gleaned from this research can inform the development of more secure PRNGs, capable of withstanding sophisticated sequential analysis. Future work will likely explore the expansion of this approach to more complex and higher-dimensional sequences, potentially integrating additional layers of complexity and exploring the impact on model performance.

9.3. Model performance across PRNGs

Our study's findings highlight the nuanced nature of PRNG output prediction, with different models excelling for specific generators. This variation underscores the importance of model selection tailored to the characteristics of the PRNG being analyzed. For instance, the best-performing model for the Xorshift generator might leverage its unique XOR and shift operations, whereas the optimal model for the Mersenne Twister (MT) would need to account for its complex bit manipulation and tempering techniques.

Remarkably, the single-output models consistently achieved a 98% success rate across various PRNGs, demonstrating a high level of accuracy in predicting the next output value based solely on the preceding values. This success rate is indicative of the models' ability to decipher the underlying deterministic patterns that govern PRNG outputs.

Even more impressive, the continuous-output model, which utilizes sequences of values to predict subsequent outputs, reached a 99% success rate. This improvement suggests that incorporating more context in the form of continuous output sequences enables the models to better capture the PRNGs' inherent algorithms, leading to more accurate predictions.

9.4. Implications for PRNG analysis and security

The success of our models in predicting PRNG outputs with such high accuracy has profound implications for the fields of cryptography and random number generation. While PRNGs are designed to produce sequences that are difficult to predict, our results suggest that advanced neural network models can uncover and exploit hidden patterns within these sequences. This finding calls for ongoing efforts to enhance the unpredictability and security of PRNGs, ensuring they remain robust against sophisticated analytical techniques.
10. Conclusions

This research delves into the predictability of PRNGs using advanced neural network models. Our study demonstrates that the tested architectures possess a remarkable ability to predict the outputs of various PRNGs, with enhanced accuracy observed in continuous-output prediction scenarios, which showed superior performance in capturing long-term dependencies within PRNG sequences and affirmed their suitability for complex sequence prediction tasks.

Our findings illuminate the nuanced dynamics of PRNG predictability and the potential vulnerabilities inherent within commonly used generators. By leveraging neural networks, we not only uncover the deterministic patterns masked as randomness but also push the boundaries of understanding in cryptographic security and random number generation.

Future research should explore the integration of more complex neural architectures and the application of these findings in real-world scenarios, such as secure communications and cryptographic key generation. The implications of our work suggest a pivotal shift towards more secure and unpredictable PRNG designs, bolstering the defenses against adversarial predictions and enhancing the integrity of cryptographic systems.

11. Future research directions

These findings have significantly advanced our understanding of the capabilities and limitations of current PRNG technologies when subjected to advanced neural network-based predictive models. The high success rates achieved by these models, particularly the 99% success rate with continuous-output models, not only demonstrate the feasibility of predicting PRNG outputs but also underscore the intricate patterns that deterministic algorithms generate, patterns that sophisticated models can uncover. This study opens several avenues for future research, aimed at both improving PRNG designs and developing more advanced predictive models:

Advanced PRNG algorithms: There is a clear need for the development of new PRNG algorithms that incorporate mechanisms specifically designed to counteract the capabilities of neural network-based predictive models. Future research should focus on exploring algorithmic complexities that can more effectively obscure deterministic patterns.

Neural network enhancements: Our research has shown that certain neural network architectures are more adept at predicting PRNG outputs than others. Investigating the development of novel neural network models or hybrid architectures that can more efficiently process and predict complex sequences is an exciting frontier. This includes exploring deeper networks, attention mechanisms, and other advanced features that could further improve prediction accuracy.

Cross-disciplinary approaches: Combining insights from cryptography, machine learning, and complexity theory could yield innovative approaches to both PRNG design and predictive modeling. Interdisciplinary research might uncover new principles for creating sequences that are inherently more difficult to predict, as well as models that are more adept at understanding complex patterns.

Real-world application scenarios: Applying our findings to real-world scenarios, where PRNGs are used under various constraints and for different purposes, will be essential. This includes testing PRNGs in environments with high-security requirements, such as blockchain technologies, secure communications, and digital signatures.

Ethical considerations and security implications: As research progresses in predicting PRNG outputs, it is imperative to consider the ethical implications and potential security risks associated with disseminating advanced predictive models. Developing guidelines and best practices for responsible research and application in this area is crucial.

Enhancing PRNG security: The ability of neural networks to predict PRNG outputs with such accuracy highlights an urgent need for the cryptographic community to re-evaluate and enhance the design and implementation of PRNGs. Ensuring that PRNGs can withstand analysis by advanced predictive models is crucial for maintaining the security and integrity of cryptographic systems, which rely heavily on the unpredictability of these generators.

Acknowledgment

This work was supported by the Shota Rustaveli National Science Foundation of Georgia (SRNSFG) [NFR-22-14060] as well as the Ministry of Education and Science of Ukraine (grant №0122U002361 "Intelligent system of secure packet data transmission based on reconnaissance UAV").
References

[1] K. Cho, et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, arXiv:1406.1078 (2014).
[2] I. Sutskever, O. Vinyals, Q. Le, Sequence to Sequence Learning with Neural Networks, Advances in Neural Information Processing Systems 27 (2014).
[3] S. Hochreiter, J. Schmidhuber, Long Short-Term Memory, Neural Computation 9(8) (1997) 1735–1780.
[4] F. Gers, J. Schmidhuber, F. Cummins, Learning to Forget: Continual Prediction with LSTM, Neural Computation 12(10) (2000) 2451–2471.
[5] A. Graves, A.-R. Mohamed, G. Hinton, Speech Recognition with Deep Recurrent Neural Networks, IEEE International Conference on Acoustics, Speech and Signal Processing (2013).
[6] A. Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks (2015).
[7] M. Islam, G. Chen, S. Jin, An Overview of Neural Network, American J. Neural Netw. Appl. 5(1) (2019) 7–11. doi: 10.11648/j.ajnna.20190501.12.
[8] K. O'Shea, R. Nash, An Introduction to Convolutional Neural Networks (2015).
[9] Z. Li, et al., A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects (2021).
[10] T. Lin, T. Guo, K. Aberer, Hybrid Neural Networks for Learning the Trend in Time Series.
[11] D. Psichogios, L. Ungar, A Hybrid Neural Network-First Principles Approach to Process Modeling.
[12] A. Vaswani, et al., Attention Is All You Need, Advances in Neural Information Processing Systems 30 (2017).
[13] J. Brownlee, Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python, Machine Learning Mastery (2018).
[14] V. Desai, R. Patil, D. Rao, Using Layer Recurrent Neural Network to Generate Pseudo Random Number Sequences, Int. J. Comput. Sci. 9 (2012) 324–334.
[15] V. Maksymovych, et al., Hardware Modified Additive Fibonacci Generators Using Prime Numbers, Advances in Computer Science for Engineering and Education VI, LNDECT 181 (2023). doi: 10.1007/978-3-031-36118-0_44.
[16] V. Maksymovych, O. Harasymchuk, M. Shabatura, Modified Generators of Poisson Pulse Sequences Based on Linear Feedback Shift Registers, Advances in Intelligent Systems and Computing, AISC 1247 (2021) 317–326.
[17] V. Maksymovych, O. Harasymchuk, I. Opirskyy, The Designing and Research of Generators of Poisson Pulse Sequences on Base of Fibonacci Modified Additive Generator, International Conference on Theory and Applications of Fuzzy Systems and Soft Computing, ICCSEEA 2018: Advances in Intelligent Systems and Computing 754 (2019) 43–53.
[18] R. Hamza, A Novel Pseudo Random Sequence Generator for Image-Cryptographic Applications, J. Info. Secur. Appl. 35 (2017) 119–127.
[19] O. Harasymchuk, Generator of Pseudorandom Bit Sequence with Increased Cryptographic Security, Metallurgical and Mining Industry: Sci. Tech. J. 6(5) (2014) 24–28.
[20] M. O'Neill, PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation (2014).
[21] M. Matsumoto, T. Nishimura, Dynamic Creation of Pseudorandom Number Generators (2015).
[22] B. Widynski, Middle-Square Weyl Sequence RNG (2017).
[23] K. Okada, et al., Learned Pseudo-Random Number Generator: WGAN-GP for Generating Statistically Robust Random Numbers, PLoS One 18(6) (2023). doi: 10.1371/journal.pone.0287025.