Modeling the detection process of polymorphic malware based on the Lotka-Volterra model⋆

Maksym Chaikovskyi1,∗,†, Inna Chaikovska1,†, Tomas Sochor2,†, Inna Martyniuk1,† and Oleksii Lyhun1,†

1 Khmelnytskyi National University, Instytuts’ka Str. 11, 29000, Khmelnytskyi, Ukraine
2 Prigo University, Havirov, Czech Republic

Abstract
The article proposes the use of the Lotka-Volterra model ("predator-prey" model) for modeling the process of detecting polymorphic malware. It is proposed to consider α as the probability that the number of polymorphic viruses will increase; β as the probability that polymorphic viruses of different levels of complexity will be detected using the selected methods, technologies and tools; γ as the probability that some of the selected methods, technologies and tools will not be effective in detecting polymorphic viruses of different levels of complexity as a result of the appearance of new varieties; δ as the probability that polymorphic viruses of different levels of complexity will require the combined use of the selected methods, technologies and tools, as well as the latest approaches; x as a quantitative measure of polymorphic viruses at time t; y as a quantitative measure of the available technologies, methods and tools for detecting polymorphic viruses at time t. The influence of the input indicators on the maximum rate of spread and detection of polymorphic viruses in this oscillatory process was studied. This approach confirms the feasibility of using a set of methods to detect polymorphic malware: string search algorithms, intelligent data analysis, sandbox analysis, machine learning, the method of developing structural functions, and probabilistic logical networks.

Keywords
polymorphic malware, detection probability of polymorphic malware, Lotka-Volterra model

1. Introduction
The use of tools and techniques to detect polymorphic malware can be compared to the classic predator-prey model.
The Lotka-Volterra model ("predator-prey" model) describes a population consisting of two species that interact with each other. Prey die out at a rate proportional to the number of encounters between predators and prey, which in turn is proportional to the size of both populations. Predators reproduce at a rate that is proportional to the amount of prey they eat. The system of equations that describes such a population is called the Lotka-Volterra model. According to the conditions of the model, the prey feed on plants, and the predators feed on the prey. We will use this model to simulate the process of detecting polymorphic malware. Polymorphic viruses in a computer system will act as the "prey", and tools and methods for detecting polymorphic malware will act as the "predator".

AdvAIT-2024: 1st International Workshop on Advanced Applied Information Technologies, December 5, 2024, Khmelnytskyi, Ukraine - Zilina, Slovakia
∗ Corresponding author.
† These authors contributed equally.
max.chaikovskyi@gmail.com (M. Chaikovskyi); inna.chaikovska@gmail.com (I. Chaikovska); tomas.sochor@osu.cz (T. Sochor); inmartunyk@ukr.net (I. Martyniuk); oleksii.lyhun@gmail.com (O. Lyhun)
0000-0002-9596-6697 (M. Chaikovskyi); 0000-0001-7482-1010 (I. Chaikovska); 0000-0002-1704-1883 (T. Sochor); 0009-0007-7751-8974 (I. Martyniuk); 0009-0004-5727-5096 (O. Lyhun)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

2. Literature review
The problem of detecting malicious software is highly relevant, and a significant amount of research is devoted to it. The work [1] provides a comprehensive, up-to-date review of research on malware detection models. The paper [2] proposes an intelligent agent system for detecting DDoS attacks using automatic feature extraction and selection.
In [3], it is stated that malicious traffic in computer systems can be detected, and the security of computer networks improved, by using the results of malware analysis and detection with machine learning algorithms to calculate the difference in correlation symmetry. The use of machine learning is also proposed in [15]. The study [4] proposed an approach that takes advantage of the deep transfer methodology and includes a fine-tuning method and various combination strategies to improve detection and classification performance without the need to develop training models from scratch. In [6], malicious software is detected with the help of convolutional neural networks (CNN), and in [9] with the help of machine learning algorithms. The work [10] compares methods of detecting malicious programs based on static, dynamic and hybrid analysis. The work [12] proposes a new systematic approach to identifying modern malware using dynamic deep learning-based methods combined with heuristic approaches to classify and detect five modern malware families: adware, Radware, rootkit, SMS malware, and ransomware. The work [13] proposes an integrated framework for implementing IoT with blockchain technology to guarantee a high level of security and a validation process based on the integration of blockchain consensus algorithms (PBFT and Tangle). In [14], a conceptual model of multi-computer systems was developed, designed to ensure the functioning of anti-virus baits and traps for detecting malicious programs. The works [17, 18] proposed a novel detection approach that generates structural features by computing a stream of byte chunks using compression ratio, entropy, the Jaccard similarity coefficient and the Chi-square statistic test. The paper [20] presents an approach to the detection of metamorphic viruses based on the analysis of their obfuscation features. In [21], the K-NN algorithm was used to detect malicious software.
In [22], a support vector machine (SVM) model was used to detect malicious software. Dynamic malware analysis with reinforcement learning was carried out in [24]. In [25], a method for determining the effectiveness of a distributed system for detecting anomalous manifestations is proposed. In [26], a method for detecting unknown metamorphic viruses is proposed, based on the analysis of potentially suspicious behavior of programs on the host, and in [27], a method for detecting metamorphic viruses based on the search for equivalent functional blocks. The paper [28] addresses the challenges associated with App-DDoS detection and presents a highly effective and adaptable solution for detecting various types of App-DDoS attacks. Thus, a wide range of methods for detecting malware is available. One model that can be applied here is the Lotka-Volterra model ("predator-prey" model) [5, 23], which is widely used in various areas: space research [7], biology [8, 11], many fields of engineering [16], medicine [19], and security assessment of cyber-physical systems [29]. Applying this model to the process of detecting polymorphic malware is likewise appropriate and relevant, which is why this study is devoted to it.

3. Methodology
Consider the classic Lotka-Volterra model and its adaptation to the process of detecting polymorphic malware.

3.1. The classic Lotka-Volterra model
In general, the model of interspecific interaction looks as follows:

\[
\begin{cases}
\dfrac{dx}{dt} = (\alpha - \beta y)\,x, \\[4pt]
\dfrac{dy}{dt} = (-\gamma + \delta x)\,y,
\end{cases}
\tag{1}
\]

where x is the number of prey; y is the number of predators; t is time; α, β, γ, δ are coefficients that reflect the interaction between the species.

3.2. Adaptation of the model to study the process of detection of polymorphic malware
When the model is adapted to simulate the polymorphic malware detection process, α, β, γ, δ can be interpreted as follows: α is the probability that the number of polymorphic viruses will increase; β is the probability that polymorphic viruses of different levels of complexity will be detected using the selected methods, technologies and tools; γ is the probability that some of the selected methods, technologies and tools will not be effective in detecting polymorphic viruses of different levels of complexity as a result of the appearance of new varieties; δ is the probability that polymorphic viruses of different levels of complexity will require the combined use of the selected methods, technologies and tools, as well as the latest approaches; x is a quantitative measure of polymorphic viruses at time t; y is a quantitative measure of the available technologies, methods and tools for detecting polymorphic viruses at time t.

It immediately follows from system (1) that if there are no polymorphic viruses (x = 0), then the number of methods, technologies and tools needed for their detection decreases exponentially at the rate γ:

\[
\dot{y} = -\gamma y \;\Rightarrow\; y = C_1 e^{-\gamma t}, \quad C_1 \in \mathbb{R}.
\tag{2}
\]

A similar situation is obtained in the complete absence of methods, technologies and tools for detecting polymorphic viruses (y = 0):

\[
\dot{x} = \alpha x \;\Rightarrow\; x = C_2 e^{\alpha t}, \quad C_2 \in \mathbb{R}.
\tag{3}
\]

Equation (3) is sometimes called the Malthus equation. In this case the growth of polymorphic viruses is exponential with a predetermined constant rate α.

It is worth noting that the Lotka-Volterra model makes several assumptions:
1. There is a constant appearance of polymorphic viruses.
2. Polymorphic viruses, as well as their detection technologies, are in the computer system.
3. Only the presence of polymorphic viruses and their detection technologies in the computer system is taken into account.

Let us find the singular (stationary) points of the system:

\[
\begin{cases}
(\alpha - \beta y)\,x = 0 \\
(-\gamma + \delta x)\,y = 0
\end{cases}
\;\rightarrow\;
\begin{cases}
\alpha x = \beta x y \\
\gamma y = \delta x y
\end{cases}
\;\rightarrow\;
\begin{cases}
y(0) = \dfrac{\alpha}{\beta} \\[6pt]
x(0) = \dfrac{\gamma}{\delta}
\end{cases}
\tag{4}
\]

It is clear that when x(0) = 0 and y(0) = 0, the singular point is exactly (0, 0), but this case is not interesting: at the initial moment of time there are no polymorphic viruses and no technologies for their detection, and, logically, none appear later. The nonzero case is much more interesting. Depending on the parameters, the position of the singular point changes; it corresponds to the numbers of viruses and detection technologies at which both indicators remain unchanged and balanced. If the initial condition does not coincide with the singular point, the phase curves are located around it, forming an endless cyclic oscillation, which is exactly what Lotka and Volterra described. That is, the number of polymorphic viruses grows while the number of effective methods for their detection falls, then vice versa, and so on for an unlimited amount of time (within reasonable limits, of course).

3.3. Stages of the proposed integrated approach to detection, analysis and classification of polymorphic malware
This approach is the second stage in the proposed comprehensive approach to detection, analysis and classification of polymorphic malware (Figure 1).

Figure 1: A comprehensive approach to detection, analysis and classification of malicious software.

4. Experiments
Consider the implementation of the "predator-prey" model for modeling the process of detecting polymorphic malware using the Lotka-Volterra equation solver [30]. The following scale is used to assign values to x, y and β (Table 1). The β indicator was formed on the basis of previous studies on the effectiveness of the combined use of the methods listed in the table for detecting polymorphic malware.

Table 1
Point scale for input parameters

| Point | x | y | β scale |
|---|---|---|---|
| up to 1 | Polymorphic viruses of the 1st level of complexity | 1 method used (string search algorithm) | 0.1 |
| 2 | Polymorphic viruses of the 2nd and lower levels of complexity | 2 methods used (string search algorithm + 1 of the methods: intelligent data analysis, sandbox analysis, machine learning, structural function development method) | 0.3 |
| 3 | Polymorphic viruses of the 3rd and lower levels of complexity | 3 methods used (string search algorithm + 2 of the methods: intelligent data analysis, sandbox analysis, machine learning, structural function development method) | 0.4 |
| 4 | Polymorphic viruses of the 4th and lower levels of complexity | 4 methods used (string search algorithm + 3 of the methods: intelligent data analysis, sandbox analysis, machine learning, structural function development method) | 0.5 |
| 5 | Polymorphic viruses of the 5th and lower levels of complexity | 5 methods used (string search algorithm, intelligent data analysis, sandbox analysis, machine learning, structural function development method) | 0.6 |
| 6 or more | Polymorphic viruses of the 6th and lower levels of complexity | 6 or more methods used (string search algorithm, intelligent data analysis, sandbox analysis, machine learning, structural function development method, probabilistic logic networks) | 0.9 |

4.1. Experiment 1
Experiment 1 (2 methods were used to detect polymorphic viruses) involves the following input parameters (Figures 2, 3): α=0.2; β=0.3 (2 methods were used to detect polymorphic viruses); γ=0.7; δ=0.3; x=1; y=1; max_time = 100 (seconds); t = 1.

It can be seen in Figures 2 and 3 that the process is oscillatory. The initial values of the number of polymorphic viruses and of the methods of their detection are both set to 1 point on the point scale.
Under these input values, the number of polymorphic viruses soon begins to increase, while the number and efficiency of polymorphic virus detection methods decrease. According to system (1), x grows while y is below α/β, and y grows while x is above γ/δ. Once the number of polymorphic viruses (in point expression) exceeds γ/δ = 0.7/0.3 ≈ 2.33, the number of methods used to detect them begins to grow. When y rises above α/β = 0.2/0.3 ≈ 0.67, partial detection of polymorphic viruses occurs and their number begins to decrease. After the number of polymorphic viruses falls back below 2.33, the number of detection methods also begins to decrease along with the number of viruses. Both quantities continue to decrease until y falls to α/β ≈ 0.67. At this moment, the number of polymorphic viruses begins to increase again, followed after a certain period of time by the methods of their detection. This process repeats with a certain period, which can be clearly observed in the figures. The number of polymorphic viruses and the number of their detection methods fluctuate around the values x = 2.33 and y = 0.67, respectively. The periodicity of the process is clearly visible on the phase curve (x(t), y(t)), which is a closed line. The leftmost point of this curve is the point at which the number of polymorphic viruses reaches its minimum value, and the rightmost point is the point at which it reaches its maximum. Between these points, the number of effective detection methods first decreases to the lowest point of the phase curve and then increases to its highest point. The phase curve encloses the point x = 2.33, y = 0.67. At this point the system is in a stationary state (dx/dt = 0, dy/dt = 0). If the system is at this point at the initial moment, then x(t) and y(t) remain constant over time; in all other cases an oscillatory process is observed.
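The stationary point and the closed phase curve described above can be checked numerically. The following is a minimal sketch, not the solver [30] used in the experiments: it integrates system (1) with a hand-written fourth-order Runge-Kutta scheme for the Experiment 1 parameters. The function names, the step size dt = 0.01 and the helper `first_integral` are illustrative assumptions. The conserved quantity δx − γ ln x + βy − α ln y is a standard property of the classic Lotka-Volterra system, not something stated in the text; it is used here only to confirm that the numerical trajectory stays on one closed curve.

```python
import math


def lotka_volterra_rhs(x, y, alpha, beta, gamma, delta):
    """Right-hand side of system (1): returns (dx/dt, dy/dt)."""
    return (alpha - beta * y) * x, (-gamma + delta * x) * y


def rk4_trajectory(x0, y0, alpha, beta, gamma, delta, max_time=100.0, dt=0.01):
    """Integrate system (1) with the classical 4th-order Runge-Kutta scheme.

    dt is an assumed step size chosen for this sketch.
    """
    p = (alpha, beta, gamma, delta)
    x, y = x0, y0
    xs, ys = [x], [y]
    for _ in range(int(round(max_time / dt))):
        k1x, k1y = lotka_volterra_rhs(x, y, *p)
        k2x, k2y = lotka_volterra_rhs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, *p)
        k3x, k3y = lotka_volterra_rhs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, *p)
        k4x, k4y = lotka_volterra_rhs(x + dt * k3x, y + dt * k3y, *p)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        xs.append(x)
        ys.append(y)
    return xs, ys


def first_integral(x, y, alpha, beta, gamma, delta):
    """Quantity that is constant along exact solutions of system (1)."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


# Experiment 1 parameters and initial values taken from the text.
params = dict(alpha=0.2, beta=0.3, gamma=0.7, delta=0.3)

# Stationary point from formula (4): x = gamma/delta, y = alpha/beta.
x_eq = params["gamma"] / params["delta"]
y_eq = params["alpha"] / params["beta"]

xs, ys = rk4_trajectory(1.0, 1.0, **params)

# Drift of the conserved quantity; a small value means the curve is closed.
v0 = first_integral(1.0, 1.0, **params)
drift = max(abs(first_integral(x, y, **params) - v0) for x, y in zip(xs, ys))

print(f"stationary point: x = {x_eq:.2f}, y = {y_eq:.2f}")
print(f"max x on trajectory: {max(xs):.2f}")
print(f"drift of conserved quantity: {drift:.2e}")
```

Starting the same routine at (x_eq, y_eq) instead of (1, 1) should leave both values unchanged, which is the stationary state discussed above. For the Experiment 1 inputs the maximum of x should come out between 4.5 and 5 points.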
Based on these initial values, the maximum value of polymorphic malware detection methods (in terms of points) will be 2.33 points. It can be seen that the selected virus detection methods are not effective and allow viruses to spread to a level of almost 5 points.

Figure 2: Temporal functions of the "predator-prey" system (x-axis: time, y-axis: point scale), experiment 1.
Figure 3: Phase portrait of the predator-prey system, experiment 1.

4.2. Experiment 2
Experiment 2 (5 methods were used to detect polymorphic viruses) involves the following input parameters (Figures 4, 5): α=0.2; β=0.6 (5 methods were used to detect polymorphic viruses); γ=0.2; δ=0.3; x=1; y=1; max_time = 100 (seconds); t = 1.

Figure 4: Temporal functions of the "predator-prey" system (x-axis: time, y-axis: point scale), experiment 2.
Figure 5: Phase portrait of the predator-prey system, experiment 2.

It can be seen that these five virus detection methods are effective and limit the spread of viruses to slightly more than 2 points.

4.3. Experiment 3
Experiment 3 (6 methods were used to detect polymorphic viruses) involves the following input parameters (Figures 6, 7): α=0.5; β=0.9 (6 methods were used to detect polymorphic viruses); γ=0.3; δ=0.7; x=1; y=1; max_time = 100 (seconds); t = 1. The six selected virus detection methods are effective and limit the spread of viruses to slightly more than 1 point.

Figure 6: Temporal functions of the "predator-prey" system (x-axis: time, y-axis: point scale), experiment 3.
Figure 7: Phase portrait of the predator-prey system, experiment 3.

5. Conclusions
The study proposes the use of the Lotka-Volterra model for modeling the process of detecting polymorphic malware.
It is proposed to consider α as the probability that the number of polymorphic viruses will increase; β as the probability that polymorphic viruses of different levels of complexity will be detected using the selected methods, technologies and tools; γ as the probability that some of the selected methods, technologies and tools will not be effective in detecting polymorphic viruses of different levels of complexity as a result of the appearance of new varieties; δ as the probability that polymorphic viruses of different levels of complexity will require the combined use of the selected methods, technologies and tools, as well as the latest approaches; x as a quantitative measure of polymorphic viruses at time t; y as a quantitative measure of the available technologies, methods and tools for detecting polymorphic viruses at time t. The influence of the input indicators on the maximum rate of spread and detection of polymorphic viruses in this oscillatory process was studied. This approach confirms the feasibility of using a set of six methods to detect polymorphic malware: string search algorithms, intelligent data analysis, sandbox analysis, machine learning, the method of developing structural functions, and probabilistic logical networks.

Declaration on Generative AI
During the preparation of this work, the authors used Grammarly for grammar and spelling checks and DeepL Translate to translate some phrases into English. After using these tools/services, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.

References
[1] F. A. Aboaoja, A. Zainal, F. A. Ghaleb, B. A. S. Al-rimy, T. A. E. Eisa, A. A. H. Elnour, Malware Detection Issues, Challenges, and Future Directions: A Survey. Applied Sciences, 12(17), (2022). doi: 10.3390/app12178482
[2] R. Abu Bakar, X. Huang, M. S. Javed, S. Hussain, M. F. Majeed, An Intelligent Agent-Based Detection System for DDoS Attacks Using Automatic Feature Extraction and Selection. Sensors, 23, (2023), 3333. doi: 10.3390/s23063333
[3] M. S. Akhtar, T. Feng, Malware Analysis and Detection Using Machine Learning Algorithms. Symmetry (Basel), 14(11), (2022), 2304. doi: 10.3390/sym14112304
[4] S. B. Atitallah, M. Driss, I. Almomani, A novel detection and multi-classification approach for IoT-malware using random forest voting of finetuning convolutional neural networks. Sensors, 22(11), (2022) 4302. doi: 10.3390/s22114302
[5] B. Bonnard, J. Rouot, Feedback classification and optimal control with applications to the controlled Lotka–Volterra model. Optimization, (2024) 1–24. doi: 10.1080/02331934.2024.2392209
[6] A. Chakraborty, K. Kriti, Yateendra, M. S. Bennet Praba, Polymorphic Malware Detection by Image Conversion Technique. International Journal of Engineering and Advanced Technology (IJEAT), 9(3), (2020) 2898–2903. doi: 10.35940/ijeat.B4999.029320
[7] Y. Chen, J. Ni, Y. C. Ong, Lotka–Volterra models for extraterrestrial self-replicating probes. The European Physical Journal Plus, 137, 1109 (2022). doi: 10.1140/epjp/s13360-022-03320-3
[8] M. Clenet, F. Massol, J. Najim, Equilibrium and surviving species in a large Lotka–Volterra system of differential equations. Journal of Mathematical Biology, 87 (2023), 13.
[9] R. Chiwariro, L. Pullagura, Malware Detection and Classification Using Machine Learning Algorithms. International Journal for Research in Applied Science & Engineering Technology (IJRASET), 11 (2023) 1727–1738. doi: 10.22214/ijraset.2023.55255
[10] A. Damodaran, F. D. Troia, C. A. Visaggio, T. H. Austin, M. Stamp, A comparison of static, dynamic, and hybrid analysis for malware detection. J Comput Virol Hack Tech, 13 (2017) 1–12. doi: 10.1007/s11416-015-0261-z
[11] J. Davis, D. Olivença, S. Brown, E. Voit, Methods of quantifying interactions among populations using Lotka-Volterra models. Frontiers in Systems Biology, 2, (2022) 1021897. doi: 10.3389/fsysb.2022.1021897
[12] A. Djenna, A. Bouridane, S. Rubab, I. M. Marou, Artificial intelligence-based malware detection, analysis, and mitigation. Symmetry, 15(3), (2023), 677. doi: 10.3390/sym15030677
[13] O. Emam, H. Fahmy, M. Mamdouh, Securing IoT Systems using Blockchain Algorithms. Communications on Applied Electronics, 7(34), (2020) 10–17. doi: 10.5120/cae2020652871
[14] A. Kashtalian, S. Lysenko, O. Savenko, A. Nicheporuk, T. Sochor, V. Avsiyevych, Multi-computer malware detection systems with metamorphic functionality. Radioelectronic and Computer Systems, 1 (2024) 152–175. doi: 10.32620/reks.2024.1.13
[15] A. J. Kurian, A. Santhosh, M. Subin, Enhanced malware detection framework leveraging machine learning algorithms. International Research Journal of Modernization in Engineering Technology and Science, 06(03) (2024) 3597–3603.
[16] Y. Lin, Q. Din, M. Rafaqat, A. A. Elsadany, Y. Zeng, Dynamics and Chaos Control for a Discrete-Time Lotka-Volterra Model. IEEE Access, 8, (2020) 126760–126775.
[17] Y. T. Ling, N. F. M. Sani, M. T. Abdullah, N. A. W. A. Hamid, Metamorphic malware detection using structural features and nonnegative matrix factorization with hidden markov model. Journal of Computer Virology and Hacking Techniques, 18 (2022) 183–203.
[18] Y. T. Ling, N. F. M. Sani, M. T. Abdullah, N. A. W. A. Hamid, Structural Features with Nonnegative Matrix Factorization for Metamorphic Malware Detection. Computers & Security, 104, 2 (2021) 102216. doi: 10.1016/j.cose.2021.102216
[19] A. Manikandan, Investigative Study of the Behavior of Lotka-Volterra Model of COVID-19. International Journal of Science and Research (IJSR), 10(11), (2021) 556–558.
[20] G. Markowsky, O. Savenko, S. Lysenko, A. Nicheporuk, The technique for metamorphic viruses' detection based on its obfuscation features analysis. CEUR-WS, 2104 (2018) 680–687.
[21] E. Masabo, et al., Structural Feature Engineering approach for detecting polymorphic malware, in: Proceedings of the 15-th IEEE Intl Conf on Dependable, Autonomic and Secure Computing, 15-th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress, DASC/PiCom/DataCom/CyberSciTech, 2017, pp. 716–721.
[22] C. B. Nwagwu, O. E. Taylor, N. D. Nwiabu, A Model for Detection of Malwares on Edge Devices. International Journal Of Engineering And Computer Science, 13(07), (2024) 26274–26283.
[23] L. Poley, J. W. Baron, T. Galla, Generalized Lotka-Volterra model with hierarchical interactions. Physical Review E, 107, (2023) 024313. doi: 10.1103/PhysRevE.107.024313
[24] K. Potter, R. Shad, Dynamic Malware Analysis with Reinforcement Learning. Journal of Cyber Security, July 16, (2024). doi: 10.2139/ssrn.4897267
[25] B. Savenko, A. Kashtalian, A method for determining the effectiveness of a distributed system for detecting abnormal manifestations. Computer Systems and Information Technologies, 2 (2022) 14–22. doi: 10.31891/csit-2022-2-2 (in Ukrainian)
[26] O. Savenko, S. Lysenko, A. Nicheporuk, B. Savenko, Approach for the Unknown Metamorphic Virus Detection, in: Proceedings of the 8-th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS, Bucharest, Romania, 2017, pp. 71–76. doi: 10.1109/IDAACS.2017.8095052
[27] O. Savenko, S. Lysenko, A. Nicheporuk, B. Savenko, Metamorphic Viruses’ Detection Technique Based on the Equivalent Functional Block Search. CEUR-WS, 1844 (2017) 555–569.
[28] D. M. Sharif, H. Beitollahi, Detection of application-layer DDoS attacks using machine learning and genetic algorithms. Computers & Security, 135, (2023) 103511.
[29] S. Yevseiev, S. Pohasii, S. Milevskyi, O. Milov, Y. Melenti, I. Grod, D. Berestov, R. Fedorenko, O. Kurchenko, Development of a method for assessing the security of cyber-physical systems based on the Lotka–Volterra model. Eastern-European Journal of Enterprise Technologies, 5(9(113)), (2021) 30–47. doi: 10.15587/1729-4061.2021.241638
[30] Lotka-Volterra equation solver. URL: https://fusion809.github.io/LotkaVolterra/