Research on methods for assessing the complexity of program code when generating input test data

Konstantin Serdyukov, Novosibirsk State Technical University, Novosibirsk, Russia, zores@live.ru
Tatyana Avdeenko, Novosibirsk State Technical University, Novosibirsk, Russia, tavdeenko@mail.ru

Abstract—This article compares methods for determining code complexity when generating data sets for software testing. It presents the results of a study on evaluating a single path of the program code; the work is not finished yet and will be further expanded to select data for testing many paths. To solve the problem of generating test data sets, it is proposed to use a genetic algorithm with various metrics for determining the complexity of program code. A new metric is proposed for determining code complexity based on changing the weights of nested operations. The article presents and compares the input test data generated for the passage along the critical path. For each metric considered in the article, conclusions are drawn that identify its specifics depending on the selected data.

Keywords—test data generation, genetic algorithm, code complexity metrics

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

I. INTRODUCTION

Software engineering is a comprehensive, systematic approach to the development and maintenance of software. When developing programs, the following stages are most often distinguished: analysis, design, programming and testing. At the analysis stage, software requirements are determined and documented. At the design stage, the appearance of the program is detailed, its internal functionality is determined, the product structure is developed, and requirements for subsequent testing are introduced. Writing the source code of the program in one of the programming languages is done at the programming stage.

One of the most important steps in developing software products is testing. Important goals of testing are checking the compliance of the developed program with the specified requirements, adherence to logic in the data processing, and obtaining correct final results. Therefore, for testing it is very important to generate input test data on the basis of which the program will be checked for errors and for compliance with the specified requirements. To estimate the quality of the input data, a code coverage indicator is used, that is, the percentage of the entire program that the test sets can "cover". It is determined by the ratio of the tested operations to the total number of operations in the code.

Some software testing processes are improving quite slowly. The development of most types of test scenarios is usually done manually, without any automation systems. Because of this, the testing process becomes incredibly complicated and costly both in time and in finances if approached seriously. Up to 50% of all time costs can be spent on testing some programs.

One of the main goals of testing is to create test sets that ensure a sufficient level of quality of the final product by checking most of the various paths of the program code, i.e. that provide maximum coverage. Nevertheless, the task of finding many paths itself consists of several sub-tasks whose solution is necessary to find high-quality test sets. One of the local problems that can be solved to find a test set is to determine one of the most complex code paths.

For the most part, validation and verification of software products is difficult to optimize. It is especially difficult to automate the generation of test data, which is largely done manually.

Nevertheless, there are many studies using non-standard algorithms to solve the automation problem. For example, in [1] it is proposed to use a Constraint-Based Algorithm for the Mort system, which uses error testing to find input test data. Test data is selected in such a way as to determine the presence or absence of certain errors.

Quite often, genetic algorithms are used in one way or another to solve this problem. The article [2] compares different methods for generating test data, including genetic algorithms, a random search method and other heuristic methods.

In [3], Constraint Logic Programming and Symbolic Execution are proposed to solve the problem. In [4], Constraint Handling Rules are used to help in the manual verification of problem areas in the program.

Some researchers use heuristic methods to automate the testing process using a data-flow diagram. Studies of automation methods using this diagram were done in [5, 6, 7, 8]. In [5] it is proposed to additionally use genetic algorithms to generate new input test data sets based on previously generated ones.

In [9, 10] hybrid methods for generating test data are proposed. In [9], an approach is used that combines the strategies of Random Strategy, Dynamic Symbolic Execution and Search-Based Strategies. The article [10] gives a theoretical description of the search method using the genetic algorithm; approaches to searching for local and global extrema on real programs are considered, and a hybrid approach for generating test data is proposed, the Memetic Algorithm.

The approach in [11] uses a hybrid intelligent search algorithm to generate test data. The proposed approach centers on the Branch and Bound and Hill Climbing methods to improve the intelligent search.

There are also studies using machine learning, for example [12], which proposes a method using a neural network and user-customizable clustering of input data for sequential learning.

Novelty Search can also be used to generate test data. In [13] this approach is proposed for evaluating large spaces of input data and is compared with approaches based on the genetic algorithm.

The possibilities of generating test data for testing web services are also being investigated, for example for the WSDL specification [14].

For the convenience of generating test data, UML diagrams are also used [15, 16]. These articles suggest using genetic algorithms to generate triggers for UML diagrams that allow finding a critical path in the program. The article [17] proposes an improved method based on a genetic algorithm for selecting test data for many parallel paths in UML diagrams.

In addition to UML diagrams, the program can be described with the Classification-Tree Method developed by Grochtmann and Grimm [18]. In [19] the problem of constructing such trees is considered and an integrated classification-tree algorithm is proposed, and in [20] the developed ADDICT prototype (short for AutomateD test Data generation using the Integrated Classification-Tree methodology) is investigated for an integrated approach.

This article compares different methods of evaluating code complexity for generating test data. The article is structured as follows. Section II introduces terminology and provides basic information on the genetic algorithm. Section III states the problem to be solved and introduces one of the methods for assessing code complexity. Section IV presents the results of the input data generation method using the introduced code estimation method. Section V compares different code evaluation methods.

II. GENETIC ALGORITHM

Formally, the genetic algorithm is not an optimization method, at least in the understanding of classical optimization methods. Its purpose is not to find the optimal and best solution, but one close enough to it. Therefore, this algorithm is not recommended if fast and well-developed optimization methods already exist. At the same time, the genetic algorithm shows itself perfectly in solving non-standardized tasks, tasks with incomplete data, or tasks for which optimization methods cannot be used because of the complexity of implementation or the duration of execution [21, 22].

A genetic algorithm is considered completed if a certain number of iterations has passed (it is desirable to limit the number of iterations, since the genetic algorithm works by trial and error, which is a rather lengthy process), or if a satisfactory value of the fitness function has been obtained. As usual, a genetic algorithm solves a maximization or minimization problem, and the adequacy of each solution (chromosome) is evaluated using the fitness function.

The genetic algorithm works according to the following principle:

- Initialization. A fitness function is introduced and an initial population is formed. In classical theory, the initial population is formed by randomly filling each gene in the chromosomes. But to increase the rate of convergence of the solution, the initial population can be specified in a certain way, or random values can be analyzed in advance to exclude definitely inappropriate ones.

- Population assessment. Each chromosome is evaluated by the fitness function. Based on the given requirements, chromosomes get an exact value of how well they correspond to the problem being solved.

- Selection. After each chromosome has its fitness value, the best chromosomes are selected. Selection can be done by different methods: for example, the first n chromosomes are taken from the sorted order, or only the most suitable ones but not fewer than n, etc.

- Crossing [23]. The first significant difference from standard optimization methods. After the chromosomes suitable for solving the problem have been selected, they cross: random chromosomes from all the "chosen ones" randomly generate new chromosomes. Crossing is based on choosing a certain position in two chromosomes and exchanging the parts on either side of it. After the required number of chromosomes has been generated to create a population, the algorithm proceeds to the next step.

- Mutation [24]. Also a step specific to the GA. In random order, a random gene can change its value to a random one. The main point of a mutation is the same as in biology: to bring genetic diversity into the population. The main goal of mutations is to obtain solutions that could not be obtained with the existing genes. This allows, firstly, avoiding local extrema, since a mutation can transfer the algorithm to a completely different branch, and secondly, "diluting" the population to avoid a situation where the whole population consists of identical chromosomes that do not move towards a solution at all.

After all the steps have been passed, it is determined whether the population has reached the desired accuracy of the solutions or has hit the limit on the number of populations; if so, the algorithm stops. Otherwise, the cycle is repeated with the new population until the conditions are met.
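The cycle described in this section (initialization, assessment, selection, crossing, mutation) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper; the population size, mutation rate and the toy fitness function are placeholder assumptions.

```python
import random

def evolve(fitness, genes=3, low=0, high=100,
           pop_size=100, generations=100, mutation_rate=0.1):
    """Minimal genetic algorithm returning the best chromosome found."""
    # Initialization: fill each gene of each chromosome randomly.
    population = [[random.randint(low, high) for _ in range(genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Population assessment: score every chromosome with the fitness function.
        population.sort(key=fitness, reverse=True)
        # Selection: keep the fittest half as parents.
        parents = population[:pop_size // 2]
        # Crossing: pick a position in two parents and swap the parts.
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            point = random.randint(1, genes - 1)
            children.append(a[:point] + b[point:])
        # Mutation: occasionally replace a random gene with a random value.
        for child in children:
            if random.random() < mutation_rate:
                child[random.randrange(genes)] = random.randint(low, high)
        population = parents + children
    return max(population, key=fitness)

# Toy fitness imitating the critical-path condition used later in the
# paper: reward sets whose 1st and 3rd values are large and exceed 50.
best = evolve(lambda c: (c[0] > 50) + (c[2] > 50) + c[0] + c[2])
print(best)
```

The parents are carried over unchanged (a simple form of elitism), so the best fitness value in the population never decreases between generations.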
III. PROBLEM DESCRIPTION

The use of genetic algorithms in the testing process makes it possible to find the most complex parts of the program, where the risks due to errors are greatest. Evaluation is carried out with the fitness function, whose parameters are the weights of each passable operation. The definition of the weights, i.e. of the complexity of the program code, is carried out by various metrics used depending on the requirements for the input sets.

The task of generating input test data consists of three subtasks:

1. Search for input data for passing along one of the most complex code paths; the difficulty is determined by the chosen metric for code evaluation.

2. Exclusion or reduction of the weights of the operations on the path for which the data were selected, based on the fitness function, for the other paths.

3. Generation of input test data for many paths of the program code.

The limit on the number of sets of input data is established after the development stage and allows concentrating on certain paths. The whole algorithm is performed cyclically: the procedure for searching for input data for one path is started, after which the operations in this path are excluded from further calculations and the search for data for one path is started again.

As one of the ways to determine the complexity of the code, a method is proposed that works as follows:

- The first operation is assigned a weight of, for example, 100 units.

- Each subsequent operation is also assigned a weight; if there are no conditions or cycles, the weight is taken in accordance with the previous operation.

- Conditions share the weight in accordance with the rule: if the condition contains only one branch (only "if ..."), then the weight of each nested operation is reduced to 80% of the current weight. If the condition is divided into several branches ("if ... else ..."), then the weight is divided into equal parts: 50%/50% for two branches, 33%/33%/33% for three, etc.

- The weights of operations in a cycle remain, but can also be multiplied by certain weights if necessary.

- All nested restrictions are multiplied; for example, for two nested conditions the weight of the operations will be 80% * 80% = 64%.

The assigned weights can then be used to develop test cases using the weight assigned to one or another branch for certain values of the input parameters.

For convenience, we introduce the following notation: X is a data set; F(X) is the value of the fitness function for each data set, depending on the calculated values of the weights. The challenge is to maximize the objective function, i.e.

F(X) → max.
IV. THE RESULTS OF THE METHOD

In accordance with the previously proposed option for assessing the complexity of program code, this method is being refined to better meet real requirements. The weights are considered in accordance with the operability of the program; in other words, the more iterations the program performs, the more weight the initial test version will have.

The first population is formed from random values. Each population contains 100 chromosomes, and the total number of populations is also 100. Due to this, a sufficient number of different options is formed and the best ones are selected. Table 1 presents the test results.

TABLE I. COMPARISON OF RESULTS

Population   | Test 1        | Test 2        | Test 3        | Test 4
0            | 1: 78, 23, 35 | 1: 97, 3, 6   | 1: 92, 97, 28 | 1: 15, 67, 26
             | 2: 62, 36, 95 | 2: 82, 77, 64 | 2: 38, 66, 52 | 2: 32, 27, 83
             | 3: 52, 35, 27 | 3: 24, 47, 57 | 3: 63, 76, 64 | 3: 37, 52, 64
             | 4: 17, 77, 73 | 4: 90, 13, 82 | 4: 7, 24, 56  | 4: 70, 49, 64
             | 5: 75, 9, 96  | 5: 81, 69, 24 | 5: 57, 48, 8  | 5: 67, 29, 94
20           | 1: 95, 64, 54 | 1: 97, 80, 4  | 1: 99, 13, 10 | 1: 99, 71, 45
             | 2: 95, 64, 29 | 2: 97, 80, 53 | 2: 99, 13, 11 | 2: 99, 71, 15
             | 3: 95, 64, 54 | 3: 97, 80, 28 | 3: 99, 13, 11 | 3: 99, 71, 3
50           | 1: 95, 64, 54 | 1: 97, 80, 29 | 1: 99, 13, 10 | 1: 99, 71, 60
             | 2: 95, 64, 29 | 2: 97, 80, 4  | 2: 99, 13, 11 | 2: 99, 71, 3
             | 3: 95, 64, 54 | 3: 97, 80, 53 | 3: 99, 13, 11 | 3: 99, 71, 3
Result (100) | 1: 95, 64, 54 | 1: 97, 80, 4  | 1: 99, 13, 10 | 1: 99, 71, 60
             | 2: 95, 64, 29 | 2: 97, 80, 29 | 2: 99, 13, 11 | 2: 99, 71, 45

In each of the tests, at least two different versions of the data were generated for which the considered program code will work the most times, which means that it will pass the greatest number of times along different paths. In addition, certain patterns can be seen in the results: the first value is always maximal (random values were limited to 100), and the second value is less than the first but greater than the third.

For testing, code with many different paths is used, where the critical path is the one with the largest number of operations. The selection of input data for this path is a solution to the subtask, and from this data it is possible to determine how accurately the data is selected. The critical path will be reached if the 1st and 3rd values of the selected data are greater than 50 and one value is less than 3.

The following genetic algorithm settings are used to generate the input test data:

- number of generations: 100;
- number of chromosomes in one generation: 100;
- range of generated data values: (0, 100).

V. COMPARISON OF METHODS FOR ASSESSING THE COMPLEXITY OF PROGRAM CODE

For this research, several tests of the algorithm with four different metrics were carried out: the modified metric, the logic of which was described in Section III, the SLOC metric that evaluates the number of lines of code, the ABC metric, and the Jilb metric.

The SLOC metric (abbr. Source Lines of Code) is determined by the number of lines of code. This metric takes into account only the total number of lines of code in the program, which makes it the easiest to understand. In this case, the number of lines refers to the number of commands, not the physical number of lines.

The ABC metric, or Fitzpatrick metric, is determined on the basis of three indicators. The first indicator, na (Assignment), counts the lines of code responsible for assigning a certain value to a variable, for example int number = 1. The indicator nb (Branch) counts the uses of functions or procedures, that is, operands that work out of sight of the current program code. The indicator nc (Condition) counts the logical operands, such as conditions and loops. The metric value is calculated as the square root of the sum of the squared values of na, nb and nc:

F = √(na² + nb² + nc²)   (1)

It is noteworthy that one line of code can be counted in several indicators; for example, assigning a variable the value of a certain function (double number = Math.Pow(2, 3)) is counted both in na and in nb. The disadvantages of this metric include the possible return of a zero value in some parts of the code.

The Jilb metric is aimed at determining the complexity of program code depending on its saturation with conditional operands. This metric is useful for determining the complexity of program code both for writing and for understanding it:

F = cl / n,   (2)

where cl is the number of conditional operands and n is the total number of lines of code.
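The three published metrics reduce to simple formulas and can be computed directly from the counts; the example counts below are illustrative and do not come from the tested program.

```python
import math

def sloc(commands):
    """SLOC: the number of commands (logical lines of code)."""
    return commands

def abc(na, nb, nc):
    """ABC (Fitzpatrick) metric, formula (1): sqrt(na^2 + nb^2 + nc^2)."""
    return math.sqrt(na ** 2 + nb ** 2 + nc ** 2)

def jilb(cl, n):
    """Jilb metric, formula (2): share of conditional operands."""
    return cl / n

# Illustrative counts for a small path (not taken from the paper).
print(sloc(130), abc(6, 2, 3), jilb(13, 130))   # 130 7.0 0.1
```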
A. Results using the metric proposed in the article

The algorithm with this metric selects data giving priority to operations of a higher level. As a result (99th generation), two data sets were obtained: (70, 9, 78) and (75, 67, 82). Both sets go along the longest code path, which is the solution to the subtask. Table 2 presents the first 10 variants in each of the generations.

TABLE II. RESULTS OF THE PROPOSED METRIC

Modified metric
Variant\Gen. | 0                      | 1                      | 99
1            | (70, 9, 78) = 164 100  | (75, 67, 82) = 164 100 | (70, 9, 78) = 164 100
2            | (75, 67, 82) = 164 100 | (61, 29, 94) = 164 100 | (70, 9, 78) = 164 100
3            | (61, 29, 94) = 164 100 | (63, 52, 87) = 164 100 | (75, 67, 82) = 164 100
4            | (63, 52, 87) = 164 100 | (63, 49, 83) = 164 100 | (75, 67, 82) = 164 100
5            | (63, 49, 83) = 164 100 | (70, 9, 78) = 164 100  | (75, 67, 82) = 164 100
6            | (5, 68, 90) = 96 382   | (63, 52, 87) = 164 100 | (75, 67, 82) = 164 100
7            | (60, 37, 3) = 32 500   | (70, 9, 78) = 164 100  | (75, 67, 82) = 164 100
8            | (12, 80, 49) = 16 000  | (70, 9, 78) = 164 100  | (70, 9, 78) = 164 100
9            | (47, 12, 17) = 16 000  | (70, 9, 78) = 164 100  | (70, 9, 78) = 164 100
10           | (53, 35, 76) = 16 000  | (61, 29, 94) = 164 100 | (75, 67, 82) = 164 100

It can be seen that the algorithm works quite efficiently: already in the first generation, data for the critical path was selected.

B. SLOC metric

This metric is the simplest from the point of view of implementation; it takes into account only the total number of lines of code. The results are presented in table 3. The algorithm with this metric picked three sets: (63, 72, 91), (68, 50, 94) and (80, 70, 88). All three satisfy the conditions for passing along the critical path. As with the previous metric, the algorithm picked suitable data already in the first generation.

TABLE III. RESULTS OF THE METHOD WITH THE SLOC METRIC

SLOC metric
Variant\Gen. | 0                    | 1                    | 99
1            | (64, 14, 96) = 6 411 | (68, 50, 94) = 6 411 | (63, 72, 91) = 6 411
2            | (68, 50, 94) = 6 411 | (80, 70, 88) = 6 411 | (68, 50, 94) = 6 411
3            | (80, 70, 88) = 6 411 | (65, 81, 89) = 6 411 | (68, 50, 94) = 6 411
4            | (65, 81, 89) = 6 411 | (63, 72, 91) = 6 411 | (63, 72, 91) = 6 411
5            | (63, 72, 91) = 6 411 | (74, 83, 76) = 6 411 | (80, 70, 88) = 6 411
6            | (74, 83, 76) = 6 411 | (64, 69, 91) = 6 411 | (68, 50, 94) = 6 411
7            | (64, 69, 91) = 6 411 | (69, 88, 85) = 6 411 | (68, 50, 94) = 6 411
8            | (69, 88, 85) = 6 411 | (64, 14, 96) = 6 411 | (63, 72, 91) = 6 411
9            | (5, 39, 72) = 3 618  | (63, 72, 91) = 6 411 | (63, 72, 91) = 6 411
10           | (2, 67, 73) = 3 618  | (68, 50, 94) = 6 411 | (80, 70, 88) = 6 411

C. ABC metric

This metric takes into account more varieties of operations, such as assigning values to variables, logical checks and function calls. The algorithm with the ABC metric picked two input data options that pass along the critical path: (69, 46, 78) and (77, 36, 98). The remaining results are presented in table 4.

TABLE IV. RESULTS OF THE METHOD WITH THE ABC METRIC

ABC metric
Variant\Gen. | 0                    | 1                    | 99
1            | (95, 27, 97) = 6 351 | (77, 36, 98) = 6 351 | (69, 46, 78) = 6 351
2            | (77, 36, 98) = 6 351 | (69, 46, 78) = 6 351 | (77, 36, 98) = 6 351
3            | (69, 46, 78) = 6 351 | (61, 65, 95) = 6 351 | (69, 46, 78) = 6 351
4            | (61, 65, 95) = 6 351 | (95, 27, 97) = 6 351 | (69, 46, 78) = 6 351
5            | (5, 67, 92) = 3 538  | (61, 65, 95) = 6 351 | (69, 46, 78) = 6 351
6            | (5, 87, 95) = 3 538  | (95, 27, 97) = 6 351 | (69, 46, 78) = 6 351
7            | (1, 35, 60) = 3 538  | (61, 65, 95) = 6 351 | (69, 46, 78) = 6 351
8            | (1, 70, 53) = 3 538  | (69, 46, 78) = 6 351 | (77, 36, 98) = 6 351
9            | (60, 30, 12) = 768   | (69, 46, 78) = 6 351 | (69, 46, 78) = 6 351
10           | (60, 49, 73) = 768   | (69, 46, 78) = 6 351 | (69, 46, 78) = 6 351

D. Jilb metric

Unlike the previous metrics, this one takes into account the absolute complexity of the program, which is calculated by dividing the number of cycles and conditions by the total number of operations on the path. The complexity of the program is determined in a completely different way, which led to the input data being selected for a different path. The results are presented in table 5.

TABLE V. RESULTS OF THE METHOD WITH THE JILB METRIC

Jilb metric
Variant\Gen. | 0                  | 1                  | 99
1            | (75, 51, 3) = 100  | (62, 25, 41) = 100 | (78, 45, 21) = 100
2            | (92, 33, 11) = 100 | (94, 22, 35) = 100 | (63, 36, 10) = 100
3            | (94, 22, 35) = 100 | (98, 51, 12) = 100 | (75, 51, 3) = 100
4            | (98, 51, 12) = 100 | (80, 42, 20) = 100 | (80, 42, 20) = 100
5            | (80, 42, 20) = 100 | (78, 45, 21) = 100 | (78, 45, 21) = 100
6            | (78, 45, 21) = 100 | (80, 59, 8) = 100  | (63, 36, 10) = 100
7            | (80, 59, 8) = 100  | (5, 40, 27) = 100  | (80, 42, 20) = 100
8            | (5, 40, 27) = 100  | (99, 38, 29) = 100 | (78, 45, 21) = 100
9            | (99, 38, 29) = 100 | (62, 25, 41) = 100 | (75, 51, 3) = 100
10           | (62, 25, 41) = 100 | (63, 36, 10) = 100 | (80, 42, 20) = 100

The data obtained differ greatly both from the other metrics and within the metric itself. This is due to the features of the tested code: it has one common loop, within which there is one common condition. If this condition is not met, then none of the operations, except the loop and the conditions, is taken into account when calculating the metric. A value of 100 indicates that among all operations on the path all are cycles or conditions, i.e. the formally selected input data are options where the first condition was not met and the other operations were not taken into account.
VI. CONCLUSION

Evolutionary methods are designed to find the best solutions to problems that are impossible or too costly to solve with standard optimization methods. They do not always work quickly or efficiently, but on problems requiring non-standard approaches they show superiority.

The introduction of various metrics for calculating the fitness function gave the method for generating input test data greater variability and the ability to introduce new data requirements. Each metric is focused on specific code parameters and can be used when data must be selected in accordance with certain requirements. In addition, when one metric does not select data efficiently, other metrics can be used that cover each other's shortcomings.

All analyzed metrics, with the exception of the Jilb metric, generated several data sets for the originally selected critical path. It is noticeable that, for a small code of 130 lines with several code paths, the metrics successfully select data in the first generation, which indicates a rather high convergence rate of the algorithm. In subsequent generations, various options are sequentially eliminated.

The conducted studies allow us to propose a new method for generating test data based on the genetic algorithm, in which the fitness function will be formed not on the basis of one of the known metrics for assessing code complexity (as in this paper), but on the basis of a hybrid metric that is a weighted sum of the indicators present in the metrics considered here. It also seems promising, in terms of increasing the degree of code coverage, to create an effective mechanism for regulating (increasing and decreasing) the weights of operations in the fitness function as the nesting level of the code section increases.

In the future, it is planned to expand the ways of determining code complexity. In addition to using metrics directly, it is planned to develop a method that takes into account the numbers of operations, functions, conditions and cycles with different weights. It is also possible to establish the degree of reduction or increase of the operation weights at different nesting levels. This will allow setting the priority for input generation when certain requirements arise.

ACKNOWLEDGMENT

The reported study was funded by RFBR, project number 19-37-90156. The research is supported by the Ministry of Science and Higher Education of the Russian Federation (project No. FSUN-2020-0009).

REFERENCES

[1] R.A. DeMillo and A.J. Offutt, "Constraint-Based Automatic Test Data Generation," IEEE Transactions on Software Engineering, vol. 17, no. 9, pp. 900-910, 1991.
[2] P. Maragathavalli, M. Anusha, P. Geethamalini and S. Priyadharsini, "Automatic Test-Data Generation for Modified Condition Decision Coverage Using Genetic Algorithm," International Journal of Engineering Science and Technology, vol. 3, no. 2, pp. 1311-1318, 2011.
[3] C. Meudec, "ATGen: Automatic Test Data Generation using Constraint Logic Programming and Symbolic Execution," Software Testing, Verification and Reliability, 2001.
[4] R. Gerlich, "Automatic Test Data Generation and Model Checking with CHR," 11th Workshop on Constraint Handling Rules, 2014.
[5] M.R. Girgis, "Automatic Test Data Generation for Data Flow Testing Using a Genetic Algorithm," Journal of Universal Computer Science, vol. 11, no. 6, pp. 898-915, 2005.
[6] E.J. Weyuker, "The complexity of data flow criteria for test data selection," Information Processing Letters, vol. 19, no. 2, pp. 103-109, 1984.
[7] A. Khamis, R. Bahgat and R. Abdelazi, "Automatic test data generation using data flow information," Dogus University Journal, vol. 2, pp. 140-153, 2011.
[8] S. Singla, D. Kumar, H.M. Rai and P. Singla, "A hybrid PSO approach to automate test data generation for data flow coverage with dominance concepts," Journal of Advanced Science and Technology, vol. 37, pp. 15-26, 2011.
[9] Z. Liu, Z. Chen, C. Fang and Q. Shi, "Hybrid Test Data Generation," ICSE Companion: Proceedings of the 36th International Conference on Software Engineering, pp. 630-631, 2014.
[10] M. Harman and P. McMinn, "A Theoretical and Empirical Study of Search-Based Testing: Local, Global, and Hybrid Search," IEEE Transactions on Software Engineering, vol. 36, no. 2, pp. 226-247, 2010.
[11] Y. Xing, Y. Gong, Y. Wang and X. Zhang, "Hybrid Intelligent Search Algorithm for Automatic Test Data Generation," Mathematical Problems in Engineering, 2015.
[12] C. Paduraru and M.C. Melemciuc, "An Automatic Test Data Generation Tool using Machine Learning," 13th International Conference on Software Technologies (ICSOFT), pp. 472-481, 2018.
[13] M. Boussaa, O. Barais, G. Sunyé and B. Baudry, "A Novelty Search Approach for Automatic Test Data Generation," 8th International Workshop on Search-Based Software Testing, 2015.
[14] M. Lopez, H. Ferreiro and L.M. Castro, "DSL for Web Services Automatic Test Data Generation," 25th International Symposium on Implementation and Application of Functional Languages, 2013.
[15] C. Doungsa-ard, K. Dahal, A.G. Hossain and T. Suwannasart, "An automatic test data generation from UML state diagram using genetic algorithm," IEEE Computer Society Press, pp. 47-52, 2007.
[16] S. Sabharwal, R. Sibal and C. Sharma, "Applying Genetic Algorithm for Prioritization of Test Case Scenarios Derived from UML Diagrams," IJCSI International Journal of Computer Science Issues, vol. 8, no. 3-2, 2011.
[17] C. Doungsa-ard, K. Dahal, A. Hossain and T. Suwannasart, "GA-based Automatic Test Data Generation for UML State Diagrams with Parallel Paths," in Advanced Design and Manufacture to Gain a Competitive Edge: New Manufacturing Techniques and Their Role in Improving Enterprise Performance, pp. 147-156, 2008.
[18] M. Grochtmann and K. Grimm, "Classification trees for partition testing," Software Testing, Verification and Reliability, vol. 3, no. 2, pp. 63-82, 1993.
[19] T.Y. Chen, P.L. Poon and T.H. Tse, "An integrated classification-tree methodology for test case generation," International Journal of Software Engineering and Knowledge Engineering, vol. 10, no. 6, pp. 647-679, 2000.
[20] A. Cain, T.Y. Chen, D. Grant, P.L. Poon, S.F. Tang and T.H. Tse, "An Automatic Test Data Generation System Based on the Integrated Classification-Tree Methodology," Software Engineering Research and Applications, Lecture Notes in Computer Science, vol. 3026, 2004.
[21] K. Serdyukov and T. Avdeenko, "Investigation of the genetic algorithm possibilities for retrieving relevant cases from big data in the decision support systems," CEUR Workshop Proceedings, vol. 1903, pp. 36-41, 2017.
[22] R.S. Praveen and K. Tai-hoon, "Application of Genetic Algorithm in Software Testing," International Journal of Software Engineering and Its Applications, vol. 3, no. 4, pp. 87-96, 2009.
[23] W.M. Spears, "Crossover or mutation?" Foundations of Genetic Algorithms, vol. 2, pp. 221-237, 1993.
[24] H. Mühlenbein, "How genetic algorithms really work: Mutation and hillclimbing," Parallel Problem Solving from Nature, vol. 2, 1992.

Data Science
VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)