=Paper=
{{Paper
|id=Vol-3736/paper24
|storemode=property
|title=Obfuscation technologies of high-level source code using artificial intelligence
|pdfUrl=https://ceur-ws.org/Vol-3736/paper24.pdf
|volume=Vol-3736
|authors=Igor Golovko,Oleg Savenko,Petro Vizhevskiy,Olexandr Klein,Abdel-Badeeh M. Salem
|dblpUrl=https://dblp.org/rec/conf/icyberphys/GolovkoSVKS24
}}
==Obfuscation technologies of high-level source code using artificial intelligence==
Obfuscation technologies of high-level source code using artificial intelligence ⋆

Igor Golovko 1,†, Oleg Savenko 1,†, Petro Vizhevskyi 1,†, Olexandr Klein 1,†, Abdel-Badeeh M. Salem 2,†

1 Khmelnytskyi National University, 11 Institutska Street, Khmelnytskyi, 29000, Ukraine

2 Ain Shams University, Egypt

ICyberPhyS-2024: 1st International Workshop on Intelligent & CyberPhysical Systems, June 28, 2024, Khmelnytskyi, Ukraine. ∗ Corresponding author. † These authors contributed equally.

Email: i85.golovko@gmail.com (I. Golovko); savenko_oleg_st@ukr.net (O. Savenko); petro.vizhevskyi@gmail.com (P. Vizhevskyi); olexandrkleyn@gmail.com (O. Klein); abmsalem@yahoo.com (Abdel-Badeeh M. Salem)

ORCID: 0009-0004-2173-5126 (I. Golovko); 0000-0002-4104-745X (O. Savenko); 0009-0009-4851-0839 (P. Vizhevskyi); 0000-0002-1896-943X (O. Klein); 0000-0003-0268-6539 (Abdel-Badeeh M. Salem)

© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Abstract

This article explores "Obfuscation Technologies of Source Code," focusing on recent advances in methods for safeguarding intellectual property in software. It analyzes several key obfuscation techniques, including Identifier Renaming, Control Flow Obfuscation, and the insertion of Dead or Junk Code. Each technique is described in terms of its implementation, its benefits, and the specific aspects of software security it enhances. The research further proposes integrating Artificial Intelligence (AI) into the obfuscation process. AI is used to dynamically optimize obfuscation patterns and to predict the most effective techniques for a specific software environment, an improvement over traditional methods that often require manual intervention and are prone to errors. The article supports these proposals with a theoretical framework that models the effectiveness of obfuscation strategies using machine learning algorithms. These models assess the resilience of obfuscated code against reverse engineering and provide a quantitative basis for the improvements in security. The discussion covers current practice and sets the stage for future research and application in software security, which makes it a resource for developers and cybersecurity experts working on the robustness of software protection.

Keywords

Code Obfuscation, Software Security, Artificial Intelligence in Security, Source Code Protection Strategies.

1. Introduction

In the digital era, where software has become an integral part of nearly every industry, the protection of intellectual property takes on special significance. Code obfuscation, as a method of safeguarding software from reverse engineering, plays a pivotal role in ensuring the security and confidentiality of the developed product. This process involves transforming the original code of the program into a form [1] that makes reverse engineering difficult while still preserving its functionality.

In contemporary programming, one of the greatest challenges is the protection of commercial secrets and innovations that are critical to a company's competitiveness. The leakage or unauthorized copying of software can lead to substantial financial losses and undermine the company's market position. The advancement of internet technologies and multimedia has heightened the need for research in the field of protection and security. Every organization that possesses its own intellectual property faces the challenge of protecting its assets, such as software, from piracy or the injection of malicious code [2, 3].
Code obfuscation is one of the advanced techniques in the domain of software protection. It transforms the initial code of the program in such a way that the code becomes difficult to understand yet loses none of its functional properties [4, 5]. This complicates the extraction and reuse of important algorithms and procedures that are part of the software product, and thus preserves the confidentiality of the data processing performed by the program, which is critically important for the business of software purchasers. Even if obfuscated code can be deciphered by a persistent attacker, combining obfuscation with other methods, such as code modification detection or protection updates, limits the time available to achieve malicious objectives.

Intellectual property can be protected both legally and technically. Legal protection involves obtaining copyrights and signing contracts against the creation of duplicates, whereas technical protection requires developers to implement protective mechanisms directly in the software. Together, these strategies form a multi-layered approach to program protection, which is crucial for ensuring long-term security in the digital age.

2. Main Obfuscation Techniques

Before turning to the obfuscation process, let us briefly look at a very simplified model of the compilation process of high-level programming languages (C#/Java). First, a few basic definitions are needed:

- Source code: the program text written in a high-level language such as C# or Java.
- IL code: Intermediate Language code, a stack-based assembly language that serves as the output of the compilation of high-level .NET/Java languages.
- JIT: the Just-In-Time compiler, a component of the runtime environment that compiles IL code to native machine code at run time.

Let us assume that we have written code in C# and want to obfuscate it. The scheme below describes this process.

Figure 1: Simplified compilation scheme with obfuscation.
The scheme shows the post-compilation obfuscation step, which helps to protect the source code. Let us look at which obfuscation techniques can be used for that. Code obfuscation includes a series of techniques that make the software code more complex to analyze and understand while preserving its functionality [6]. The principal obfuscation techniques commonly employed in software engineering are:

- Identifier Renaming changes the names of variables, classes, methods, and other identifiers to non-informative or random names. This complicates the understanding of the program, since the semantic information that could help in deciphering the purpose of code components is lost.
- Control Flow Obfuscation modifies the logic of program execution in such a way that the program retains its functionality but the code becomes less comprehensible. For example, fake loops or unnecessary conditional statements make the execution flow of the code less clear [7, 8].
- Insertion of Dead or Junk Code adds code that does not affect the final behavior of the program but complicates its structure. This code can include non-executable instructions or functions that lead nowhere.
- String Encryption encrypts text strings in the code, such as error messages, URLs, or other sensitive data. It prevents the easy extraction of information from executable files.
- Resource Obfuscation protects program resources such as images, audio files, and other assets by encrypting or modifying them.

In more detail, Control Flow Obfuscation introduces changes in the program's control flow to complicate code analysis [27-30]. Instead of direct and obvious execution, the program is reorganized in such a way that the logic of its execution becomes more complex and less predictable for observers or analytical tools.
This may include introducing false loops and dead code blocks, changing the execution order of instructions, or using conditional statements that appear illogical. Two further variants are:

- Interleaving Code Paths (mixing control flows): merging several functions or execution paths into one, which complicates the separation and analysis of individual components.
- Transparent Branches: conditional statements that always execute or never execute, which misleads analytical tools.

These methods can be used individually or in combination to achieve a higher level of code security. The choice of specific obfuscation methods depends on the requirements and the context of the program's use.

2.1. Identifier Renaming Method in Code Obfuscation

Identifier renaming is one of the most popular and effective code obfuscation techniques. This method changes the names of variables, functions, classes, and other identifiers to names that carry no semantic load. Its goal is to confuse or mislead anyone who tries to reverse engineer or unlawfully use the code.

Renaming replaces semantically meaningful names with random or unintelligible ones. For example, a variable that stores an intermediate result of a calculation, originally named tempResult, might be renamed to a1 or x47 [9]. This makes it harder to understand what the code does and reduces the possibility of analyzing it on an intuitive level.

Manual and Automatic: Identifier renaming can be carried out manually or automatically using specialized obfuscation tools. Automatic renaming programs analyze the code and replace names either with generated random character sequences or with names from a predefined set of non-informative names [10].

Static Renaming: During static renaming, each unique identifier is assigned a new name that remains unchanged throughout the project. This is a simpler but less flexible approach.
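Static renaming as described above can be sketched as a fixed one-to-one mapping from original names to generated names. The following is a minimal illustration, not the behavior of any particular obfuscation tool: the class `StaticRenamer`, the method `buildMapping`, and the `a0`, `a1`, ... naming scheme are all hypothetical, and a real tool would also have to avoid collisions with names already present in the program.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not taken from a real obfuscator.
final class StaticRenamer {

    // Assigns every distinct original identifier exactly one generated name.
    // Repeated occurrences of an identifier reuse the name chosen the first
    // time, so the mapping stays fixed for the whole project.
    static Map<String, String> buildMapping(List<String> identifiers) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String original : identifiers) {
            if (!mapping.containsKey(original)) {
                // Assumed naming scheme: "a" followed by a running counter.
                mapping.put(original, "a" + mapping.size());
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, String> mapping =
                buildMapping(List.of("tempResult", "totalPrice", "tempResult"));
        System.out.println(mapping);
    }
}
```

Because the counter only advances when a new identifier is seen, two different identifiers can never receive the same generated name, which is the uniqueness property that static renaming relies on.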
$$S \to F \to P \tag{1}$$

where: S is the set of original code identifiers; P is the set of obfuscated code identifiers; F is the transformation function. Thus, for each value x in the set S, there exists a unique value y in the set P that satisfies the condition F(x) = y.

Dynamic Renaming: In dynamic renaming, new names can change depending on the context in which the identifier is used (C is the context function), which adds a further layer of complexity for those attempting to understand the program logic [11].

$$S \to F \to C \to P, \qquad C(F(x)) = y' \;\to\; C(y) = y' \tag{2}$$

Identifier renaming is an important obfuscation strategy used to protect software code from unauthorized access and analysis. Despite some limitations, this technique is one of the most common approaches in the software industry due to its effectiveness and ease of implementation.

2.2. Insertion of Dead or Junk Code

The technique of inserting "dead" or "junk" code is an integral part of the strategies used to complicate the reverse engineering of software. This method adds code segments that do not contribute to the program's outcome but increase the complexity of the software structure, which makes unauthorized interpretation or analysis more laborious. Such code may include inert instructions or purposeless functions that do not affect the primary functionality of the program.

Examples of "Dead" Code:

1. Superfluous variables and computations:

 int a = 10;
 int b = a * 2; // An extraneous variable and computation that remain unused

2. Non-functional loops:

 for (int i = 0; i < 10; i++) {
     // A loop that performs no meaningful action within the program
 }

3. Always-true conditional statements:

 if (true) {
     // This block of code will invariably execute
 }

Developers may employ automated obfuscation tools to intersperse such code randomly within the source code, which makes it harder for pattern recognition strategies to identify and excise the redundant code.
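The three kinds of dead code listed above can be combined into a small before/after sketch. The class `DeadCodeDemo` and both method names are hypothetical and chosen only for illustration; the point is that the padded variant returns the same result as the original. Note that an optimizing compiler or JIT may remove such trivially dead code again, so practical tools tend to use less obvious constructs.

```java
// Hypothetical illustration of junk-code insertion; names are assumptions.
final class DeadCodeDemo {

    // Original method: sums the elements of an array.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Same observable behavior, padded with the three kinds of dead code.
    static int sumWithJunk(int[] values) {
        int a = 10;
        int b = a * 2;                   // superfluous computation, never read
        for (int i = 0; i < 10; i++) {
            // loop with an empty body, no effect on the result
        }
        int total = 0;
        if (true) {                      // always-true condition
            for (int v : values) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] sample = {1, 2, 3};
        // Both calls are expected to print the same value.
        System.out.println(sum(sample));
        System.out.println(sumWithJunk(sample));
    }
}
```

Comparing the outputs of the two variants on the same inputs is also the simplest functional check that an inserted block really is dead.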
These tools also ensure that the integration of additional code does not disrupt the core logic or performance of the application. This approach improves the following indicators:

- Increased Analytical Complexity. The insertion of "dead" code obscures the structural clarity of the software and thus hinders straightforward analysis by potential attackers.
- Versatility. The method is applicable across various programming languages and software architectures, which makes it useful in diverse development contexts.

2.3. Control Flow Obfuscation

Control Flow Obfuscation is a technique that complicates the understanding of a program's logic by altering the order of operations and instructions and by introducing additional conditional transitions and loops [12, 13]. It seeks to obscure the true execution path of a program and thereby hinder analysis and reverse engineering [14, 15].

One variant of Control Flow Obfuscation is interleaving code paths. This technique modifies the execution structure of a program so that logically independent blocks of code are interwoven. It complicates the understanding of the program, because both analytical tools and humans struggle to separate the individual execution streams. Several functional parts of the code are intertwined into a single entangled execution flow that is difficult to split back into its primary components. This can be achieved by crossing conditional operators, loops, and functions across different parts of the program.
For example, consider the interleaving of conditional operators and loops:

 if (conditionA) {
     // Block A1
     if (conditionB) {
         // Block B1
     }
     // Block A2
 } else {
     if (conditionB) {
         // Block B2
     }
     // Block A3
 }

 for (int i = 0; i < n; i++) {
     for (int j = 0; j < m; j++) {
         if (i == j) {
             // Mixed operation
         }
     }
 }

In these examples, the blocks of code and conditions are intertwined in such a way that the logical and execution flows of conditions A and B, as well as the loops over i and j, interact with each other in a complex manner, which makes the code harder to analyze.

Implementing this method can be challenging, as it requires a deep understanding of the program's logic and of the potential impact on performance. Developers must ensure that changes in control flow do not violate the business logic of the application or degrade its performance. Automated obfuscation tools can aid in the implementation of this method, but thorough testing remains crucial.

Constructing a mathematical model for this method is not straightforward. Such a model needs concepts from graph theory [17] and complexity theory to analyze and evaluate the impact of obfuscation on code comprehension. The model includes the following aspects:

1. Definition of the Control Flow Graph (CFG) [17]. The basis for analyzing any program code is its Control Flow Graph, where nodes represent blocks of instructions (such as functions or basic instruction blocks) and edges show the flow of control between those blocks. The CFG makes it possible to visualize and analyze the structure of the program [16]. Let G = (N, E), where each node n ∈ N corresponds to a basic block and each edge e = (n_i, n_j) ∈ E corresponds to a possible transfer of control from block n_i to block n_j. The CFG provides a graphical representation of the possible paths of the execution flow. It differs from syntax-oriented intermediate representations (IRs) such as the abstract syntax tree (AST), which show grammatical structure. Consider the while loop shown below.
Figure 2: The loop operator in different representations.

The CFG reflects the essence of the loop: it is a control flow construct. The cyclic edge goes from stm1 back to the condition at the beginning of the loop. The AST, on the other hand, captures the syntax; it is acyclic, but it keeps all the pieces needed to restore the source code of the loop. For conditional statements, the CFG looks as presented in Figure 3.

Figure 3: The condition operator in different representations.

In this example, the CFG displays the control flow construct for the conditional statement. Either stm1 or stm2 will be executed, but not both.

2. Functional mixing of streams. For each node in the CFG, a function can be defined that describes the mixing of control flows. This function can take into account variable factors such as the nesting depth of conditional statements or the number of dependencies between different parts of the code. Let $D_{ij}$ represent the dependency between node $i$ and node $j$. It can be expressed as the weight of an edge in the graph, where the weight indicates the strength of the dependency (for example, the number of variables shared between the blocks). Let $N_i$ (nesting of conditional operators) denote the nesting depth of conditional operators in node $i$. It can be expressed as the number of conditional statements that directly or indirectly affect the execution of a block of code. Then the complexity of mixing flows can be estimated as follows: (3)

where: $S$ is the total complexity of mixing control flows in the program; $n$ is the number of nodes in the CFG.

This function attempts to quantify the complexity of a program in terms of obfuscation, taking into account dependencies and the nesting of conditional statements. It can be supplemented by other factors, such as:

- Frequency of use of variables: how often the variables that affect node $i$ occur in other nodes.
For each node $i$ in the CFG, we define $F_i$, which indicates the frequency of use of variables in this node. It can take into account both local and global variables used in the block.

- Function side effects: $E_i$ evaluates the impact of the functions called at node $i$ on other parts of the program. This can include state changes that are not obvious from the local context of the node, such as changes to global variables or calls to other functions.

Therefore, the final model can be represented as: (4)

where: $\alpha$ and $\beta$ are weighting factors that regulate the influence of the frequency of use of variables and of the side effects of functions, respectively; $n$ is the number of nodes in the CFG.

The variable usage frequency $F_i$ shows how heavily a node depends on certain variables, which can make it difficult to understand the data flows in the program. The function side effects $E_i$ make it possible to evaluate how much the changes made by the functions affect the global state of the program, which also increases the overall complexity of the code.

This model can be used to evaluate the effectiveness of obfuscation in terms of its ability to complicate code analysis. It quantifies how changes in the structure of the application affect the ability of analysts or attackers to understand its logic and detect vulnerabilities.

3. Quantification of complexity. Metrics can be used to quantify the complexity of mixed control flows. One such metric is McCabe cyclomatic complexity [18, 19], proposed by Thomas McCabe in 1976, which measures the number of linearly independent paths through a program's control flow graph. It is one of the key indicators of the complexity of an application from the point of view of testing and maintenance.
Formula for calculating cyclomatic complexity:

$$V(G) = E - N + 2P \tag{5}$$

where: $E$ is the number of edges in the graph; $N$ is the number of nodes in the graph; $P$ is the number of connected components (usually $P = 1$ for programs with a single entry point).

Therefore, the change in cyclomatic complexity due to the introduction of obfuscation can be given by the function:

$$\Delta V(G) = V(G') - V(G) \tag{6}$$

where $V(G)$ and $V(G')$ are the cyclomatic complexities of the original and obfuscated graphs, respectively.

Such a model helps to evaluate how effective obfuscation is in terms of increasing the complexity of the program. If $\Delta V(G)$ is significant, it can be assumed that obfuscation makes a significant contribution to protecting the program from unauthorized analysis and modification. This model can serve as a tool when selecting and configuring obfuscation techniques, as well as when evaluating their impact on the overall security of a software product.

A second metric is the number of intersections in the CFG, where a higher number of intersections may indicate a more complex obfuscation structure.

4. Predicting the impact of obfuscation. Statistical methods can be used to predict the effectiveness of obfuscation:

- Building regression models that predict the effort required to understand obfuscated code from the aforementioned complexity metrics.
- Simulating different attack scenarios on obfuscated code to evaluate its resistance to reverse engineering.

5. Risk assessment. This is the analysis of the possible risks associated with obfuscation, including the probability of successful reverse engineering or of obfuscation detection [20]. It may involve probability theory and statistics. Let $V$ be the set of nodes in the CFG and $E$ be the set of edges. The function $f: V \to \mathbb{R}$ evaluates the "weight" of each node in terms of its impact on the overall complexity.
Then the complexity of the code $C$ can be expressed as:

$$C = \sum_{v \in V} f(v) + \lambda \cdot |E| \tag{7}$$

where $\lambda$ is a parameter that controls the effect of the number of intersections.

Such a model makes it possible to evaluate, analyze, and optimize code obfuscation, and it gives software protection a quantitative basis.

3. Improvement of the obfuscation process with AI

As the previous section shows, many steps require the engineer to decide independently on the obfuscation method, to conduct performance testing, and so on. This is not always the most efficient or error-free approach, particularly for engineers with limited experience in code obfuscation. In such cases, artificial intelligence (AI) can significantly enhance the effectiveness of obfuscation techniques, even for engineers with minimal experience [21].

Obfuscation mechanisms based on machine learning can be applied in the .NET obfuscation sphere to model obfuscation strategies, that is, to use machine learning algorithms to generate and optimize obfuscation rules that can be applied to .NET code. The model can learn from existing examples of obfuscated code to identify the most effective techniques. The machine learning model can predict the effectiveness of various obfuscation methods using the following process:

1. Model Training. The model is trained (using machine learning algorithms such as random forest [22, 23] or gradient boosting [24, 25]) on examples of code that have been obfuscated with different methods. It learns the characteristics of the code (e.g., structure, execution flows, variable usage) that change as a result of each obfuscation method.

2. Obfuscation Assessment. Using a set of metrics such as resistance to reverse engineering, impact on performance, or effects on automated code analysis tools, the model evaluates the effectiveness of the obfuscation [28-31].

3. Prediction.
After analyzing the input code, the model can use the learned relationships between code features and obfuscation effectiveness to predict which methods will be most effective for new code.

Thus, the model makes it possible to identify optimal obfuscation strategies for specific use cases, which provides better protection and minimizes negative impacts on software functionality. The mathematical model for assessing the effectiveness of obfuscation using machine learning can be constructed as follows:

1. Data:

- X: a set of code features (e.g., number of operators, depth of nesting, types of operators).
- Y: the target (dependent variable) that determines the effectiveness of obfuscation (e.g., time required for reverse engineering).

2. Loss Function. It is defined so as to minimize the difference between the predicted and the actual effectiveness of obfuscation.

3. Model. A machine learning model f(X) learns the relationship between code features and obfuscation effectiveness.

4. Optimization. Optimization methods adjust the model parameters so that they best explain the effectiveness of obfuscation.

$$\min_{\theta} \sum_{i} \left( y_i - f(X_i; \theta) \right)^2 \tag{8}$$

where: $\theta$ are the parameters of the model that we aim to optimize [26, 27]; $X_i$ are the features of the i-th code example; $y_i$ is the effectiveness of obfuscation for the i-th example; and the sum is calculated over all examples in the training set.

The term $\min_{\theta}$ signifies an optimization process whose goal is to find the parameter values $\theta$ that minimize the sum of the squared differences between the observed values $y_i$ and the values predicted by $f(X_i; \theta)$. The lowest possible value of this sum of squared errors indicates the best fit of the model to the data. This process is central to regression analysis, where the model is fitted so that the predicted values are as close as possible to the actual data values.
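To make the least-squares objective concrete, the sketch below fits the simplest possible model, a straight line $f(x; \theta) = \theta_0 + \theta_1 x$ over a single feature, using the closed-form ordinary least-squares solution. This is a deliberate simplification: the text names random forest and gradient boosting, which are not implemented here. The class `LeastSquaresSketch`, its method names, and the sample numbers (a made-up feature such as nesting depth against a made-up effectiveness score) are assumptions for illustration only and are not data from the experiments.

```java
// Minimal illustration of minimizing a sum of squared errors.
// Linear model with one feature; all names and data are hypothetical.
final class LeastSquaresSketch {

    // Returns {theta0, theta1} that minimize sum_i (y_i - theta0 - theta1 * x_i)^2.
    // Requires at least two distinct x values, otherwise the slope is undefined.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x_i - meanX) * (y_i - meanY)
        double sxx = 0.0; // sum of (x_i - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double theta1 = sxy / sxx;
        double theta0 = meanY - theta1 * meanX;
        return new double[] {theta0, theta1};
    }

    // Value of the objective for given parameters.
    static double sumSquaredError(double[] theta, double[] x, double[] y) {
        double sse = 0.0;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - (theta[0] + theta[1] * x[i]);
            sse += residual * residual;
        }
        return sse;
    }

    public static void main(String[] args) {
        double[] feature = {1, 2, 3, 4};  // made-up feature values
        double[] target = {3, 5, 7, 9};   // made-up effectiveness scores
        double[] theta = fit(feature, target);
        System.out.println("theta0=" + theta[0] + ", theta1=" + theta[1]);
        System.out.println("SSE=" + sumSquaredError(theta, feature, target));
    }
}
```

The sample points lie exactly on a line, so the fitted parameters reproduce it and the remaining error is essentially zero. With real, noisy measurements the same procedure would return the line with the smallest, but non-zero, sum of squared errors.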
This modeling also supports automation: integrating machine learning makes it possible to automate the obfuscation process and to adapt it to the specific needs and characteristics of the software as well as to potential threats. Additionally, it enables dynamic code obfuscation: machine learning methods can help develop systems that adapt obfuscation to the context of software usage and to changes in the external environment.

4. Experiments

To verify the effectiveness of the model, we use the following metrics:

- Resistance to Analysis: an assessment of the code's ability to resist reverse engineering attempts.

$$S = 1 - \frac{K}{M} \tag{9}$$

where: K is the number of successful analyses; M is the total number of attempts.

- Change in Performance: the impact of obfuscation on the speed of the program.

$$P = \frac{R - L}{L} \tag{10}$$

where: R is the execution time after obfuscation; L is the execution time before obfuscation.

- Preservation of Functionality:

$$F = \frac{N}{M} \tag{11}$$

where: N is the number of dysfunctional functions; M is the total number of functions. A lower value of F therefore means that more of the functionality is preserved.

- Pattern Detection:

$$D = 1 - \frac{N}{M} \tag{12}$$

where: N is the number of detected patterns; M is the total number of patterns.

This metric is important because a main aspect of effective obfuscation is complicating or masking the logic or structure of the code so that it cannot be easily analyzed or recognized by static analysis tools, which often use patterns to identify typical constructions in program code. The expression gives the share of patterns that were not detected during the analysis; the higher the value of D, the more effective the obfuscation is at avoiding pattern detection.

- Code Complexity:

$$C = R - L \tag{13}$$

where: R is the cyclomatic complexity after obfuscation; L is the cyclomatic complexity before obfuscation.

This metric helps assess the complexity of understanding and testing the code.
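The metrics (9) to (13) are simple ratios and differences, so they can be checked with a few lines of code. The sketch below is only an illustration: the class `ObfuscationMetrics` and its method names are hypothetical, the counts passed in `main` (for instance 20 successful analyses out of 100 attempts) serve as sample inputs, and the edge and node counts used for the cyclomatic complexity are invented to show the arithmetic of $V(G) = E - N + 2P$.

```java
// Hypothetical helper that evaluates metrics (9)-(13); names are assumptions.
final class ObfuscationMetrics {

    // (9) S = 1 - K / M
    static double resistance(int successfulAnalyses, int attempts) {
        return 1.0 - (double) successfulAnalyses / attempts;
    }

    // (10) P = (R - L) / L
    static double performanceChange(double timeAfter, double timeBefore) {
        return (timeAfter - timeBefore) / timeBefore;
    }

    // (11) F = N / M, the share of dysfunctional functions (lower is better)
    static double dysfunctionalShare(int dysfunctional, int totalFunctions) {
        return (double) dysfunctional / totalFunctions;
    }

    // (12) D = 1 - N / M, the share of patterns that stayed undetected
    static double undetectedShare(int detectedPatterns, int totalPatterns) {
        return 1.0 - (double) detectedPatterns / totalPatterns;
    }

    // McCabe cyclomatic complexity V(G) = E - N + 2P
    static int cyclomatic(int edges, int nodes, int components) {
        return edges - nodes + 2 * components;
    }

    // (13) C = R - L
    static int complexityChange(int complexityAfter, int complexityBefore) {
        return complexityAfter - complexityBefore;
    }

    public static void main(String[] args) {
        System.out.println("S = " + resistance(20, 100));
        System.out.println("P = " + performanceChange(210.0, 200.0));
        System.out.println("F = " + dysfunctionalShare(10, 1000));
        System.out.println("D = " + undetectedShare(30, 50));

        // Invented CFG sizes: a single while loop (3 nodes, 3 edges) before
        // obfuscation and a padded graph (5 nodes, 7 edges) afterwards.
        int before = cyclomatic(3, 3, 1);
        int after = cyclomatic(7, 5, 1);
        System.out.println("C = " + complexityChange(after, before));
    }
}
```

Keeping each metric as a separate pure function makes it straightforward to compute the same quantities for a control group and an experimental group and to compare them side by side.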
High cyclomatic complexity indicates a high level of code complexity, which can increase the risk of errors and make the code harder to understand. In the context of code obfuscation, the goal is to increase this complexity and thereby make the code less understandable for analysis or reverse engineering. The initial complexity is therefore an important reference point for assessing the effectiveness of obfuscation. An increase in cyclomatic complexity after obfuscation typically indicates that the obfuscation has added control paths, which potentially increases the security of the program by complicating reverse engineering attempts.

For reverse engineering and code analysis, we use two tools:

1. Ildasm.exe [32].

2. dotPeek [33].

To verify the effectiveness of the model, 100 DLL/EXE files written in C# and compiled with MSBuild for .NET 8 were used. These files are divided into two groups (50/50): control (without AI) and experimental (with AI). Standard obfuscation methods are applied to the control group without using AI. The average values of each metric are calculated over 50 iterations for the control group (without AI) and for the experimental group (with AI). The following parameters are analyzed:

Resistance to Analysis (S):

- Total number of reverse engineering attempts: 100.
- Number of successful analyses: 20. A successful analysis means the full reproduction of the program's behavior after decompiling the IL code with Ildasm/dotPeek and transferring it to a new program that fully retains the behavior of the original program and reproduces the same results.
- Percentage of unsuccessful attempts: $S = 1 - 20/100 = 0.80$ or 80%.

Change in Performance (P): average program execution time: 200 ms.

Preservation of Functionality (F):

- Total number of functions: 1000.
- Number of dysfunctional functions: 0.

Pattern Detection (D):

- Total number of patterns: 50.
- Number of detected patterns: 30.
- Percentage of undetected patterns: $D = 1 - 30/50 = 0.40$ or 40%.

Code Complexity (C): cyclomatic complexity of 150, which means 150 different paths that potentially need to be checked to ensure full coverage during testing and which makes the code more complex to fully understand and support.

The results of the experiment are shown in Table 1.

{| class="wikitable"
|+ Table 1: The results of the experiment
! Metric !! Description !! Control group (without AI) !! Experimental group (with AI) !! Comment
|-
| Resistance to Analysis || An assessment of the code's ability to resist reverse engineering attempts. || Number of successful analyses: 20. Percentage of unsuccessful attempts: $S = 1 - 20/100 = 0.80$ or 80% || Number of successful analyses: 5. Percentage of unsuccessful attempts: $S = 1 - 5/100 = 0.95$ or 95% || Resistance to analysis increases when AI is used.
|-
| Change in Performance || The impact of obfuscation on the speed of the program. || Average program execution time: 210 ms. Change in Performance: $P = (210 - 200)/200 = 0.05$ or 5% increase || Average program execution time: 210 ms. Change in Performance: $P = (210 - 200)/200 = 0.05$ or 5% increase || The execution time of the program has changed compared to the original program without obfuscation, but this is also true for the control group (without AI).
|-
| Preservation of Functionality || A measure of the preservation of the original functionality of the code after obfuscation. || Number of dysfunctional functions: 10. Percentage of dysfunctional functions: $F = 10/1000 = 0.01$ or 1% || Number of dysfunctional functions: 12. Percentage of dysfunctional functions: $F = 12/1000 = 0.012$ or 1.2% || The number of dysfunctional functions is slightly higher than in the control group (without AI). This percentage can be reduced if the AI model is allowed to learn on its own or if the training period is extended.
|-
| Pattern Detection || The ability of obfuscation tools to avoid pattern detection. || Number of detected patterns: 30. Percentage of undetected patterns: $D = 1 - 30/50 = 0.40$ or 40% || Number of detected patterns: 5. Percentage of undetected patterns: $D = 1 - 5/50 = 0.90$ or 90% ||
|}

5. Conclusions

The experimental results support the integration of AI in the obfuscation process and indicate its potential to enhance software security. Each of the metrics assesses a specific aspect of obfuscation, and comparing them with and without AI makes it possible to measure the real impact of artificial intelligence on obfuscation. The comparison also helps to identify potential issues, such as increased execution time or loss of functionality, that require additional attention and optimization. This approach allows the use of AI for optimizing obfuscation to be tuned more precisely under real conditions and thus ensures a higher level of software security. AI can analyze large volumes of data and choose the best places and ways to apply obfuscation in order to maximize code complexity. From the metrics of the experiment, the following conclusions can be drawn:

1. Enhanced Efficacy of AI-Driven Obfuscation. In the experiments, the share of unsuccessful reverse engineering attempts rose from 80% with traditional methods to 95% with AI-driven obfuscation, and the share of undetected patterns rose from 40% to 90%. This indicates that AI can effectively increase the complexity and security of obfuscated code and make unauthorized analysis more challenging.

2. Performance and Functionality Consideration. While the use of AI in obfuscation shows promising results in enhancing security, it is important to also consider its impact on software performance and functionality.
The experiments highlighted minimal impact on execution times and functionality, suggesting that AI-driven obfuscation can be implemented without significantly compromising the software's operational efficiency.

This approach to security can significantly reduce the costs and resources associated with resolving security issues after a product is released. Future research should explore additional AI models and techniques that could further enhance this aspect of software security. As these technologies become more sophisticated and available, we can expect changes in how companies approach the security of their software products. This change could encourage more industries to adopt obfuscation best practices, thereby increasing the overall level of security across sectors.

References

[1] K. D. Cooper, L. Torczon. Engineering a Compiler, Morgan Kaufmann; 3rd edition, 2023, 848 p.

[2] S. A. Ebad, A. A. Darem, J. H. Abawajy. Measuring software obfuscation quality – A systematic literature review, IEEE Access 9 (2021) 99024–99038.

[3] P. Ahire, J. Abraham. Mechanisms for source code obfuscation in C: Novel techniques and implementation. In Proceedings of the 2020 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 12–14 March 2020, IEEE: New York, NY, USA, 2020, pp. 52–59.

[4] A. B. M. Sultan, A. A. A. Ghani, N. M. Ali, N. I. Admodisastro. Hybrid obfuscation technique to protect source code from prohibited software reverse engineering, IEEE Access 8 (2020) 187326–187342.

[5] S. Bhansali, A. Aris, A. Acar, H. Oz, A. S. Uluagac. A first look at code obfuscation for webassembly. In Proceedings of the 15th ACM Conference on Security and Privacy in Wireless and Mobile Networks, San Antonio, Texas, USA, 16–19 May 2022, pp. 140–145.

[6] Y. Li, Z. Sha, X. Xiong, Y. Zhao. Code Obfuscation Based on Inline Split of Control Flow Graph. In Proceedings of the 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 28–30 June 2021, IEEE: New York, NY, USA, 2021, pp. 632–638.

[7] C. K. Behera, G. Sanjog, D. L. Bhaskari. Control Flow Graph Matching for Detecting Obfuscated Programs, Software Engineering (2019) 267–275.

[8] Y.-C. Chen, H.-Y. Chen, T. Takahashi, B. Sun, T.-N. Lin. Impact of Code Deobfuscation and Feature Interaction in Android Malware Detection, IEEE Access 9 (2021) 123208–123219.

[9] B. Liu, W. Feng, Q. Zheng, J. Li, D. Xu. Software obfuscation with non-linear mixed boolean-arithmetic expressions. In Proceedings of the Information and Communications Security: 23rd International Conference, ICICS 2021, Chongqing, China, 19–21 November 2021, pp. 276–292.

[10] C. Catalano, P. Afrune, et al. Security Testing Reuse Enhancing Active Cyber Defence in Public Administration. In Proceedings of the 2021 Italian Conference on Cybersecurity, 7–9 April 2021, Salerno, Italy, pp. 120–132.

[11] C. Catalano, A. Chezzi, M. Angelelli, F. Tommasi. Deceiving AI-based malware detection through polymorphic attacks, Computers in Industry 143 (2022) 103751.

[12] H. Ahmed, M. F. Hyder, M. F. Haque, P. C. Santos. Exploring compiler optimization space for control flow obfuscation, 139 (2024) 103704.

[13] M. Gervasi, N. G. Totaro, A. Fornaio, D. Caivano. Big Data Value Graph: enhancing security and generating new Value from Big Data. In Proceedings of the 2023 Italian Conference on Cybersecurity, 3–5 May 2023, Bari, Italy, 2023.

[14] M. Schloegel, T. Blazytko, M. Contag, C. Aschermann, J. Basler, T. Holz, A. Abbasi. Technical Report: Hardening Code Obfuscation Against Automated Attacks. arXiv (2021), arXiv:2106.08913.

[15] P. Rajba, W. Mazurczyk. Data hiding using code obfuscation. In Proceedings of the 16th International Conference on Availability, Reliability and Security, Vienna, Austria, 17–20 August 2021, pp. 1–10.

[16] H. Yao, S. Zhang, R. Hong, Y. Zhang, C. Xu, Q. Tian. Deep representation learning with part loss for person re-identification, IEEE Trans. Image Process. 28 (6) (2019) 2860–2871.

[17] Q. Liu, S. Ji, C. Liu, C. Wu. A Practical Black-Box Attack on Source Code Authorship Identification Classifiers. In Proceedings of the IEEE Transactions on Information Forensics and Security, 15 June 2021, vol. 16, pp. 3620–3633.

[18] J. Mayaka, J. C. Jung. Complexity reduction of the Engineered Safety Features Component Control System, 331 (2018) 194–203.

[19] M. A. Subandri, R. Sarno. Cyclomatic Complexity for Determining Product Complexity Level in COCOMO II, 124 (2017) 478–486.

[20] C. Basile, D. Canavese, L. Regano, P. Falcarin, B. De Sutter. A meta-model for software protections and reverse engineering attacks, Journal of Systems and Software 150 (2019) 3–21.

[21] I. Obeidat, M. AlZubi. Developing a faster pattern matching algorithms for intrusion detection system. International Journal of Computing 18 (3) (2019) 278–284. doi:10.47839/ijc.18.3.1520

[22] S. Kang, S. Lee, Y. Kim, S. K. Mok, E. S. Cho. Obfus: An obfuscation tool for software copyright and vulnerability protection. In Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy, Virtual, 26–28 April 2021, pp. 309–311.

[23] H. Chen, M. Pendleton, L. Njilla, S. Xu. A survey on Ethereum systems security: Vulnerabilities attacks and defenses, ACM Comput. Surv. (CSUR) 53 (3) (2020) 1–43.

[24] F. Feyzi, S. Parsa. A program slicing-based method for effective detection of coincidentally correct test cases, Computing 100 (9) (2018) 927–969.

[25] M. Zhang, P. Zhang, X. Luo, X. Feng. Source code obfuscation for smart contracts. In Proceedings of the 2020 27th Asia-Pacific Software Engineering Conference (APSEC), 1–4 December 2020, Singapore, Singapore, pp. 513–514.

[26] G. James, et al. An Introduction to Statistical Learning: with Applications in R. Springer, 2nd edition, 2021.

[27] K. Hajarnis, J. Dalal, R. Bawale, J. Abraham, A. Matange. A Comprehensive Solution for Obfuscation Detection and Removal Based on Comparative Analysis of Deobfuscation Tools. In Proceedings of the 2021 International Conference on Smart Generation Computing, Communication and Networking, Pune, India, 2021, pp. 1–7.

[28] O. Savenko, A. Sachenko, S. Lysenko, G. Markowsky, N. Vasylkiv. Botnet detection approach based on the distributed systems. International Journal of Computing 19 (2) (2020) 190–198.

[29] A. Kashtalian, S. Lysenko, O. Savenko, A. Nicheporuk, T. Sochor, V. Avsiyevych. Multi-computer malware detection systems with metamorphic functionality. Radioelectronic and Computer Systems 1 (2024) 152–175. doi:10.32620/reks.2024.1.13

[30] G. Markowsky, O. Savenko, S. Lysenko, A. Nicheporuk. The technique for metamorphic viruses' detection based on its obfuscation features analysis. CEUR-WS 2104 (2018) 680–687.

[31] O. Savenko, S. Lysenko, A. Nicheporuk, B. Savenko. Approach for the Unknown Metamorphic Virus Detection. In Proceedings of the 8th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, Bucharest, Romania, 21–23 September 2017, pp. 71–76.

[32] Ildasm.exe (IL Disassembler) tool. URL: https://learn.microsoft.com/en-us/dotnet/framework/tools/ildasm-exe-il-disassembler.

[33] dotPeek. Free .NET Decompiler and Assembly Browser. URL: https://www.jetbrains.com/decompiler.
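
Supplementary sketch. The metric values reported in Table 1 follow directly from the raw counts, and the short Python sketch below recomputes them as a sanity check. It is only an illustration: the names `GroupCounts` and `metrics` are hypothetical and do not come from the authors' tooling, and the denominators (100 analysis attempts, a 200 ms baseline, 1000 functions, 50 patterns) are read off the formulas in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupCounts:
    """Raw counts for one group, taken from the formulas in Table 1."""
    successful_analyses: int      # reverse-engineering analyses that succeeded
    analysis_attempts: int        # total analysis attempts
    obfuscated_time_ms: float     # average execution time after obfuscation
    original_time_ms: float       # average execution time before obfuscation
    dysfunctional_functions: int  # functions broken by obfuscation
    total_functions: int          # functions in the program
    detected_patterns: int        # obfuscation patterns that were detected
    total_patterns: int           # obfuscation patterns applied


def metrics(g: GroupCounts) -> dict:
    """Return S, P, F and D as fractions, using the formulas from Table 1."""
    return {
        # S: share of unsuccessful analysis attempts
        "S": 1 - g.successful_analyses / g.analysis_attempts,
        # P: relative change in execution time
        "P": (g.obfuscated_time_ms - g.original_time_ms) / g.original_time_ms,
        # F: share of dysfunctional functions
        "F": g.dysfunctional_functions / g.total_functions,
        # D: share of undetected patterns
        "D": 1 - g.detected_patterns / g.total_patterns,
    }


# Counts as reported for the control (without AI) and experimental (with AI) groups.
control = GroupCounts(20, 100, 210, 200, 10, 1000, 30, 50)
experimental = GroupCounts(5, 100, 210, 200, 12, 1000, 5, 50)

for name, group in (("control", control), ("experimental", experimental)):
    rounded = {key: round(value, 3) for key, value in metrics(group).items()}
    print(name, rounded)
```

Keeping the raw counts separate from the derived ratios makes it easy to rerun the comparison with other sample sizes without touching the formulas.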