Automation of DDoS attack investigation in industrial control systems using Bayesian networks on Python

Valeriy Lakhno1,*,†, Miroslav Lakhno2,†, Olena Kryvoruchko3,†, Serhii Kaminskyi3,† and Vadym Makaiev1,†

1 National University of Life and Environmental Sciences of Ukraine, 15 Heroiv Oborony str., 03041, Kyiv, Ukraine
2 e-Docs.UA, 21A Degtyarivska str., 04119, Kyiv, Ukraine
3 State University of Trade and Economics, 19 Kyoto str., 02156 Kyiv, Ukraine

CPITS-II 2024: Workshop on Cybersecurity Providing in Information and Telecommunication Systems II, October 26, 2024, Kyiv, Ukraine
* Corresponding author.
† These authors contributed equally.
lva964@nubip.edu.ua (V. Lakhno); valss725@gmail.com (M. Lakhno); olena_909@ukr.net (O. Kryvoruchko); s.kaminskyj@knute.edu.ua (S. Kaminskyi); makaiev.vadym@gmail.com (V. Makaiev)
0000-0001-9695-4543 (V. Lakhno); 0000-0001-6979-6076 (M. Lakhno); 0000-0002-7661-9227 (O. Kryvoruchko); 0000-0002-4884-1517 (S. Kaminskyi); 0009-0008-5561-4508 (V. Makaiev)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073

Abstract
This paper investigates the possibility of using Bayesian Networks (BN) to analyze and confirm the involvement of a specific computer in a DDoS attack on industrial control systems (ICS). The primary focus is on developing a Python software product that automates the calculation of probabilistic estimates from the collected evidence to confirm various hypotheses about the seized computer’s involvement in a DDoS attack. Automation of the analysis through the developed Python software product will eliminate subjective errors and bias, speed up data processing, and ensure objective conclusions based on the available evidence. The hypotheses and corresponding evidence related to the use of BN for modeling complex relationships between events during the execution of DDoS attacks from the suspect computer are considered. It is shown that the proposed approach facilitates more in-depth and accurate analysis of cybercrimes related to DDoS attacks and can significantly improve the investigation processes and decision-making in ensuring the security of ICS.

Keywords
industrial control systems, DDoS attacks, investigation, evidence analysis, Bayesian network, Python

1. Introduction

In the modern digital world, where more aspects of life are transitioning online, cybercrime and cybersecurity have become urgent problems hindering societal development. These problems require adequate solutions through the collective efforts of specialists in various fields, from IT to law, since many cybercrimes, such as DDoS attacks on computer systems and networks (CSN), can have significant consequences for individuals, organizations, and even states [1, 2]. The scenarios used by cybercriminals are quite creative and constantly evolving, making cybercrime increasingly sophisticated and complex.

As demonstrated in [3, 4], DDoS attacks pose a significant danger to industrial control systems (ICS). These systems are often used in enterprises and critical infrastructure such as energy, water supply, transport, and manufacturing. Attacks on ICS, including DDoS attacks, can lead to severe consequences, such as operational disruptions, economic losses, and threats to human safety. For instance, in 2013, an attack targeted the U.S. water supply systems [5]. The attack could have caused equipment failures controlling water distribution and wastewater treatment, posing a serious public health threat. Cybersecurity specialists managed to prevent such a scenario at an early stage of the attack’s development. In 2016, a DDoS attack targeted the railway management systems in Sweden [6]. The attack caused system disruptions, leading to train delays and cancellations. In 2017, a DDoS attack on a semiconductor manufacturer caused failures in their production management system, resulting in significant production delays and economic losses. Even this brief overview demonstrates that DDoS attacks pose a serious threat to ICS, disrupting their normal operation and causing significant negative consequences. These attacks can halt production processes, lead to economic losses, and even pose safety threats [7]. Therefore, in this paper, we investigate the possibility of developing a Python software product that, based on the mathematical apparatus of Bayesian Networks (BN), helps automate the analysis and calculation of probabilistic estimates from collected evidence to confirm or refute hypotheses. Such a tool will be extremely useful for the effective investigation of DDoS attacks, facilitating the work of specialists and improving the accuracy of conclusions.

A key role in investigating unauthorized interference in CSN, such as organizing DDoS attacks, is played by the search for evidence in the non-material (digital) environment. From a software-technical perspective, the elements of CSN during an investigation at the site of a potential cyberattack, such as a DDoS attack, require extreme caution, considering factors such as the large volume of electronic information, the presence of intellectual property rights on parts of the information, hidden data inaccessible to the regular computer user, and the risks of data loss due to careless actions or the potential programmed automatic execution of data destruction algorithms.

As demonstrated in [8, 9], using the BN apparatus to prove the involvement of a specific computer in a DDoS attack is a powerful tool. Bayesian Networks (BN) allow for modeling complex cause-and-effect relationships between various aspects of digital evidence and drawing substantiated conclusions based on available data. This is especially important for establishing the fact of a specific computer’s involvement in carrying out a DDoS attack, which requires analyzing numerous factors and probabilities. As shown in [8], BN can effectively integrate data from various sources, including network activity logs, system configurations, and user information, significantly enhancing the accuracy and reliability of investigations. Thus, the use of BN in cybercrime investigations opens new prospects for improving the efficiency and reliability of identifying participants in DDoS attacks. All the above has prompted our interest in this topic.

2. Methods and models

A crucial aspect of finding and securing digital (electronic) evidence is adhering to the “best evidence rule” [10–14]. Compliance with this principle depends on using specialized knowledge in collecting electronic evidence, which IT specialists possess. This helps safeguard data from accidental deletion or damage and prevents cases of programmed self-destruction of files, for example, when an incorrect password is entered into the directory. Given the above, when searching for digital evidence, it is important to consider the identified evidence, such as tools for executing DDoS attacks. Suppose, during the investigation of a DDoS attack, a computer suspected of carrying out the attacks was seized. During the analysis of this computer’s contents, specialized programs (Low Orbit Ion Cannon, HULK, PyLoris, Tor’s Hammer, etc.) or scripts for launching DDoS attacks may be found. The work history or logs may contain records of launching tools commonly used for DDoS attacks and connections to command servers used to manage botnets.

The development of the research outlined in [8] involves creating a practical Python-based software product. This product will automate the calculation of probabilistic estimates of collected evidence to confirm hypotheses based on the mathematical apparatus of Bayesian Networks. This program will significantly simplify the work of both IT specialists and forensic investigators involved in investigating DDoS attacks, providing accurate and reliable results comparable to those obtained with the GeNIe package.

Python is one of the most popular programming languages due to its simplicity and readability, allowing for the quick and efficient development of complex algorithms. Additionally, Python has a rich set of libraries and frameworks for statistical analysis, machine learning, and working with Bayesian Networks. For example, libraries such as pgmpy (used in our product), scikit-learn, PyMC3, and networkx provide powerful tools for building, training, and visualizing BN. This greatly simplifies the development process and allows focusing on solving specific tasks rather than creating tools from scratch.

The development environment used was PyCharm, one of the most powerful and convenient development environments for Python, offering many tools that simplify the writing, debugging, and testing of code. It is worth noting that Python and PyCharm run on all major operating systems (Windows, macOS, Linux), ensuring the possibility of developing and using the program across different platforms. In our view, using Python and PyCharm to develop a software product automating evidence analysis with BN provides the optimal combination of convenience, power, and flexibility. This allows the creation of efficient, reliable, and easily maintainable solutions for cybersecurity tasks, including investigating DDoS attacks on ICS.

The main hypothesis (H), see Table 1 and Fig. 1, according to assumption (H_DDOS_Target), is that the seized computer could have been used to carry out a DDoS attack on the target CSN. This hypothesis may include at least two sub-hypotheses. H1 is that the seized computer was used to gain access to the target CSN, and H2 is that the seized computer was used to organize the DDoS attack. Evidence (E) for each sub-hypothesis might include, for example, the presence of the target CSN’s IP address on the seized computer or the matching of the seized computer’s IP address with the attacker’s IP address identified by the provider.

Presenting the BN structure as shown in Fig. 1 offers many advantages. For example, visualization helps to more easily understand the complex probabilistic relationships between hypotheses and evidence. The connections between nodes (hypotheses and evidence) are visible, facilitating understanding of the structure and logic of reasoning. The graphical representation allowed us to intuitively evaluate the influence of each piece of evidence on the sub-hypotheses and the main hypothesis. In general, such a software product will help experts and users better understand the basis of their decisions and how various pieces of evidence affect the hypothesis’s probability. This will contribute to more reasoned and confident decisions in investigating such crimes. It is worth noting that graphical representation makes the information accessible to a wide audience, including those who may not have in-depth knowledge of mathematics and statistics. This facilitates discussion and explanation of conclusions among team members and stakeholders. Additionally, visualization helps identify gaps in the data and dependencies that may require further investigation or data collection, contributing to a more comprehensive and detailed analysis of the situation.

For implementing the Python program, we structured sub-hypotheses and corresponding evidence for the main hypothesis (see Table 1).

From a legal perspective, seized objects (computer equipment and its components) are considered potential sources of evidence, and any unprofessional actions involving them may result in the loss or inadmissibility of such evidence. In this regard, a well-justified position emphasizes the need for advanced specialized training for investigators involved in cybercrime investigations, aligned with modern challenges and the future development of the information technology sector.

Table 1
Structuring Sub-hypotheses and Evidence in the Python Product for Automating Analysis and Probabilistic Estimates Calculation of Collected Evidence for Hypotheses Confirmation Based on the Mathematical Apparatus of Bayesian Networks

Main Hypothesis H: A seized computer was used to launch a DDoS attack on the target computer

Sub-hypothesis H1: The seized computer was used to access the target computer
Evidence for Sub-hypothesis H1:
E1: The IP address of the target computer was found on the seized computer.
E2: The URL address of the target computer was found on the seized computer.
E3: The IP address of the target computer matches the access IP address (as specified by the provider).
E4: Log entries of access to the target computer at the relevant time were found.

Sub-hypothesis H2: The seized computer was used to conduct the DDoS attack
Evidence for Sub-hypothesis H2:
E5: Evidence of the suspect's qualifications was found.
E6: The IP address of the seized computer matches the attacker’s IP address at the time of the attack.
E7: DDoS tools were found on the seized computer.
E8: Evidence of the user creating DDoS tools was found.
E9: Log entries of searching for DDoS tools on the Internet were found.
E10: Log entries of downloading DDoS tools from the Internet were found.
E11: A botnet control program was found.
E12: Evidence of the user creating the botnet control program was found.
E13: Log entries of a DDoS attack launched on the target computer through the botnet were found.
E14: Log entries of connecting to the botnet were found.
E15: The IP address of the seized computer matches the botnet control IP address at the time of the attack.

Figure 1: Structure of a Bayesian network, visualizing the main hypothesis, sub-hypotheses, and corresponding evidence

Fig. 2 shows a general view of our software product with a results output block displaying the probabilistic assessments of the collected evidence to support various hypotheses (the main hypothesis, that the seized computer (CSN) was used to launch a DDoS attack on the target computer, along with the two sub-hypotheses described earlier). In addition to this output format, the obtained results can be visualized more clearly in the form of histograms, as shown in Fig. 3.

This format of visualizing conclusions in the form of histograms, obtained for the probabilities of various evidence during the investigation of DDoS attacks from the suspect’s computer, makes the process of analyzing evidence more convenient and easier to interpret.

Automation largely eliminates subjective errors and bias that can occur during manual analysis of evidence. The use of a Bayesian network (BN) allows for more precise consideration of the probabilities of various events and their interrelations, which often leads to objective conclusions. It is important to note that automated systems, such as the one proposed in this work, significantly accelerate the process of analyzing large volumes of data.

This is especially important in time-constrained environments during cybercrime investigations, as the use of Bayesian networks allows for the effective representation of complex dependencies between various pieces of evidence and hypotheses. Additionally, automation enables the use of advanced algorithms and analytical methods that may not be available during manual data processing. This leads to higher-quality and deeper evidence analysis, increasing the chances of successfully investigating crimes related to the implementation of DDoS attacks on ICS.
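The probabilistic assessments of H, H1, and H2 and their bar-style presentation can be illustrated with a minimal sketch. It is not the authors' implementation: the described product uses pgmpy, whereas this sketch uses plain enumeration in standard-library Python so that it runs without dependencies. It assumes that edges run from H to H1 and H2 and from each sub-hypothesis to its evidence nodes, following the grouping in Table 1. All probability values and the names `posterior` and `text_histogram` are hypothetical placeholders chosen for illustration.

```python
from itertools import product

# All numbers below are hypothetical placeholders, not values from the paper.
P_H = 0.5                                  # prior P(H = true)
P_SUB = {"H1": {True: 0.8, False: 0.1},    # P(H1 = true | H)
         "H2": {True: 0.9, False: 0.05}}   # P(H2 = true | H)
P_FOUND = {True: 0.8, False: 0.1}          # P(E found | state of its sub-hypothesis)

# Grouping from Table 1: E1..E4 belong to H1, E5..E15 belong to H2.
PARENT = {f"E{i}": ("H1" if i <= 4 else "H2") for i in range(1, 16)}


def posterior(observed):
    """Return P(H), P(H1), P(H2) given observed evidence {name: found?}.

    Evidence nodes that are not in `observed` sum out to 1 and are skipped.
    """
    totals = {"H": 0.0, "H1": 0.0, "H2": 0.0}
    norm = 0.0
    # Enumerate every joint state of the three hypothesis nodes.
    for h, h1, h2 in product((True, False), repeat=3):
        states = {"H": h, "H1": h1, "H2": h2}
        weight = P_H if h else 1.0 - P_H
        for sub in ("H1", "H2"):
            p = P_SUB[sub][h]
            weight *= p if states[sub] else 1.0 - p
        for name, found in observed.items():
            p = P_FOUND[states[PARENT[name]]]
            weight *= p if found else 1.0 - p
        norm += weight
        for node, state in states.items():
            if state:
                totals[node] += weight
    return {node: total / norm for node, total in totals.items()}


def text_histogram(probabilities, width=40):
    """Render probabilities as text bars, a stand-in for a plotted histogram."""
    lines = []
    for node, p in probabilities.items():
        bar = "#" * round(p * width)
        lines.append(f"{node:<3} {bar:<{width}} {p:.3f}")
    return "\n".join(lines)


# Example: E1, E6 and E7 were found; every other item is left unobserved.
result = posterior({"E1": True, "E6": True, "E7": True})
print(text_histogram(result))
```

With real casework, the placeholder tables would be replaced by conditional probabilities elicited from experts or taken from a reference model such as [8], and per-evidence likelihoods would normally differ from item to item rather than share one value.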
The increasing the chances of successfully investigating crimes 284 related to the implementation of DDoS attacks on ICS (industrial control systems). Figure 2: General view of the conclusions obtained during the calculation of probabilistic assessments of the collected evidence to support various hypotheses 285 Figure 3: Visualization of conclusions in the form of histograms, obtained for the probabilities of various pieces of evidence during the investigation of DDoS attacks from the suspect’s computer The development of a software product in Python using one presented above, provide quantitative probabilistic Bayesian networks, in our view, ensures the standardization assessments that assist investigators and experts in making of analysis methods. This allows practicing specialists in the well-informed decisions. In particular, modeling various field of cybercrime investigations to apply a unified scenarios and their probabilistic evaluations enables more approach to various investigations, simplifying the training accurate forecasting of outcomes and the development of and preparation of specialists and ensuring consistency in strategies for investigating such crimes in the future. methods and approaches. Automated systems, similar to the 286 Finally, automation ensures the transparency of the analysis References process, allowing the results to be easily reproduced and verified. This is critically important for the legal validity of [1] P. Anakhov, et al., Evaluation Method of the Physical conclusions and their presentation in court. Compatibility of Equipment in a Hybrid Information The prospect of further research lies in the addition of Transmission Network, Journal of Theoretical and dialogue windows for expert interaction to the developed Applied Information Technology 100(22) (2022) 6635– software product. This will significantly enhance the 6644. usability of the computational core based on the Bayesian [2] V. 
Zhebka, et al., Optimization of Machine Learning network, which is particularly important for investigating Method to Improve the Management Efficiency of applied cases related to DDoS attacks on industrial control Heterogeneous Telecommunication Network, in: systems (ICS). Expert dialogue windows will provide an Workshop on Cybersecurity Providing in Information intuitive and user-friendly interface, simplifying data entry and Telecommunication Systems, vol. 3288 (2022) and system interaction. This is crucial because experts 149–155. investigating DDoS attacks are often not programming [3] A. A. Cárdenas, et al., Attacks against process control specialists. A simple and clear interface will allow them to systems: risk assessment, detection, and response, 6th effectively use the software product without requiring deep ACM Symposium on Information, Computer and programming knowledge. Moreover, the introduction of Communications Security (2011) 355–366. dialogue windows will significantly reduce the time needed [4] Z. Jadidi, et al., Automated detection-in-depth in for data entry and processing. Experts will be able to industrial control systems, Int. J. Adv. Manufacturing interact with the system more quickly and efficiently, Technol. 118(7) (2022) 2467–2479. thereby accelerating the investigation process. [5] N. Tuptuk, et al., A systematic review of the state of cyber-security in water systems, Water, 13(1) (2021) 3. Conclusions 81. [6] C. Cheh, Protecting critical infrastructure systems In this paper, the following main results were obtained: using cyber, physical, and socio-technical models, It is shown that the use of Bayesian Networks (BN) in Doctoral dissertation, University of Illinois at Urbana- the developed Python software product will automate the Champaign (2019). process of analyzing collected evidence, eliminating [7] V. 
Astapenya, et al., Conflict Model of Radio subjective errors and bias often arising in the manual Engineering Systems under the Threat of Electronic processing of data during cybercrime investigations. Warfare, in: Workshop on Cybersecurity Providing in It is demonstrated that automating the analysis will Information and Telecommunication Systems, CPITS, significantly reduce the time required to process large vol. 3654 (2024) 290–300. amounts of data, which is especially important in time- [8] H. Tse, K.-P. Chow, M. Kwan, A Generic Bayesian limited conditions when investigating cybercrimes, Belief Model for Similar Cyber Crimes, 9th particularly DDoS attacks. International Conference on Digital Forensics (DF) It is established that for the task of establishing (2013) 243–255. doi: 10.1007/978-3-642-41148-9_17. responsibility for carrying out DDoS attacks, BN allows for [9] G. Yan, et al., Towards a Bayesian network game accounting for the probabilities of various events and their framework for evaluating ddos attacks and defense, relationships, leading to more accurate and objective ACM conference on Computer and communications conclusions. This is critically important for the legal security (2012) 553–566. justification of conclusions and their presentation in court. [10] K. L. Hui, S. H. Kim, Q.H. Wang, Marginal deterrence It is demonstrated that developing a Python-based in the enforcement of law: Evidence from distributed software product ensures the unification of analysis denial of service attack (2013). methods, allowing a consistent approach to different [11] P. Das, P. Sarkar, The Importance of Digital Forensics investigations, and simplifying the training and preparation in the Admissibility of Digital Evidence, NUJS J. Regul. of specialists. Stud. 7(60) (2022). It is shown that automation ensures the transparency of [12] O. 
Kryvoruchko, et al., Analysis of technical the analysis process, allowing for easy reproduction and indicators of efficiency and quality of intelligent verification of results, and enhancing trust in conclusions systems, Journal of Theoretical and Applied and their legal significance. Information Technology, 101(24) (2023). The presented approach and the developed software [13] A. Adranova, et al., Methodology forming for the product can be effectively used to model various scenarios approaches to the cyber security of information and their probabilistic assessments, allowing for more systems management, J. Theor. Appl. Inf. Technol. accurate predictions of cybercrime consequences and 98(12) (2020) 1993–2005. developing strategies for their investigation in the future. [14] H. Hnatiienko, et al., Prioritizing Cybersecurity The work demonstrates that the proposed automation of Measures with Decision Support Methods Using cybercrime analysis using BN is an important step in Incomplete Data, in: 21th International Scientific and improving the investigation and decision-making processes, Practical Conference “Information Technologies and particularly in the context of DDoS attacks on ICS. Security”, vol. 3241 (2021) 169–180. 287