=Paper=
{{Paper
|id=Vol-2667/paper66
|storemode=property
|title=Researching of methods for assessing the complexity of program code when generating input test data
|pdfUrl=https://ceur-ws.org/Vol-2667/paper66.pdf
|volume=Vol-2667
|authors=Konstantin Serdyukov,Tatyana Avdeenko
}}
==Researching of methods for assessing the complexity of program code when generating input test data ==
Konstantin Serdyukov, Novosibirsk State Technical University, Novosibirsk, Russia, zores@live.ru

Tatyana Avdeenko, Novosibirsk State Technical University, Novosibirsk, Russia, tavdeenko@mail.ru
Abstract—This article proposes a comparison of methods for determining code complexity when generating data sets for software testing. The article offers the results of a study for evaluating one path of program code; the work is not finished yet and will be further expanded to select data for testing many paths. To solve the problem of generating test data sets, it is proposed to use a genetic algorithm with various metrics for determining the complexity of program code. A new metric is proposed for determining code complexity based on changing the weights of nested operations. The article presents the results and a comparison of the generated input test data for the passage along the critical path. For each metric considered in the article, conclusions are presented to identify specifics depending on the selected data.

Keywords—test data generation, genetic algorithm, metrics of assessing code complexity

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

I. INTRODUCTION

Software engineering is a comprehensive, systematic approach to the development and maintenance of software. When developing programs, the following stages are most often distinguished: analysis, design, programming and testing. At the analysis stage, software requirements are determined and documented. At the design stage, the appearance of the program is detailed, its internal functionality is determined, the product structure is developed, and requirements for subsequent testing are introduced. Writing the source code of a program in one of the programming languages is done at the programming stage.

One of the most important steps in developing software products is testing. Important goals of testing are the compliance of the developed program with the specified requirements, adherence to logic in the data processing and obtaining correct final results. Therefore, for testing it is very important to generate input test data, on the basis of which the program will be checked for errors and compliance with the specified requirements. To estimate the quality of the input data, a code coverage indicator is used, that is, the percentage of the entire program that the test sets can "cover". It is determined by the ratio of the tested operations to the total number of operations in the code.

Some software code testing processes are improving quite slowly. The development of most types of test scenarios is most often done manually, without the use of any automation systems. Because of this, the testing process becomes extremely complicated and costly both in time and in money if approached seriously. Up to 50% of all time costs can be spent on testing some programs.

One of the main goals of testing is to create test sets that would ensure a sufficient level of quality of the final product by checking most of the various paths of the program code, i.e. would provide maximum coverage. Nevertheless, the task of finding many paths itself consists of several sub-tasks, the solution of which is necessary to find high-quality test sets. One of the local problems that can be solved to find a test set is to determine one of the most complex code paths.

For the most part, validation and verification of software products is difficult to optimize. It is especially difficult to automate the generation of test data, which for the most part is done manually.

Nevertheless, there are many studies using non-standard algorithms to solve the automation problem. For example, in [1] it is proposed to use a Constraint-Based Algorithm for the Mort system, which uses error testing to find input test data. Test data is selected in such a way as to determine the presence or absence of certain errors.

Quite often, genetic algorithms are used in one way or another to solve this problem. The article [2] compares different methods for generating test data, including genetic algorithms, a random search method and other heuristic methods.

In [3], to solve the problem, it is proposed to use Constraint Logic Programming and Symbolic Execution. In [4], Constraint Handling Rules are used to help in the manual verification of problem areas in the program.

Some researchers use heuristic methods to automate the testing process using a data-flow diagram. Studies of automation methods using this diagram were done in [5, 6, 7, 8]. In [5] it is proposed to additionally use genetic algorithms to generate new input test data sets based on previously generated ones.

In [9, 10] it is proposed to use hybrid methods for generating test data. In [9], an approach is used that combines Random Strategy, Dynamic Symbolic Execution and Search-Based Strategies. The article [10] gives a theoretical description of the search method using the genetic algorithm; approaches to the search for local and global extrema on real programs are considered, and a hybrid approach for generating test data, a Memetic Algorithm, is proposed.

The approach in [11] uses a hybrid intelligent search algorithm to generate test data. The proposed approach centers on the Branch and Bound and Hill Climbing methods to improve the intelligent search.

There are also studies using machine learning, for example [12], which proposes a method using a neural network and user-customizable clustering of input data for sequential learning.

Novelty Search can also be used to generate test data. In [13] it is proposed to use this approach to evaluate large spaces of input data, and it is compared with approaches based on the genetic algorithm.

The possibilities of generating test data for testing web services are also being investigated, for example on the basis of the WSDL specification [14].

For the convenience of generating test data, UML diagrams are also used [15, 16]. These articles suggest using genetic algorithms to generate triggers for UML diagrams that allow finding a critical path in the program. The article [17] proposes an improved method based on a genetic algorithm for selecting test data for many parallel paths in UML diagrams.

In addition to UML diagrams, the program can be described with the Classification-Tree Method developed by Grochtmann and Grimm [18]. In [19] the problem of constructing such trees is considered and an integrated classification-tree algorithm is proposed, and [20] investigates the developed ADDICT prototype (short for AutomateD test Data generation using the Integrated Classification-Tree methodology) for an integrated approach.

This article proposes a comparison of different methods for evaluating code complexity for generating test data. The article is structured as follows. Section 2 introduces terminology and provides basic information on the genetic algorithm. The third section sets the problem to be solved and introduces one of the methods for assessing code complexity. Section 4 presents the results of the input data generation method using the introduced code estimation method. Section 5 compares different code evaluation methods.

II. GENETIC ALGORITHM

Formally, the genetic algorithm is not an optimization method, at least in the understanding of classical optimization methods. Its purpose is not to find the optimal and best solution, but to find one close enough to it. Therefore, this algorithm is not recommended if fast and well-developed optimization methods already exist. At the same time, the genetic algorithm shows itself very well in solving non-standardized tasks, tasks with incomplete data, or tasks for which it is impossible to use classical optimization methods because of the complexity of implementation or the duration of execution [21, 22].

A genetic algorithm is considered to be completed if a certain number of iterations has passed (it is desirable to limit the number of iterations, since the genetic algorithm works on the basis of trial and error, which is a rather lengthy process), or if a satisfactory value of the fitness function has been obtained. As usual, a genetic algorithm solves a maximization or minimization problem, and the adequacy of each solution (chromosome) is evaluated using the fitness function.

The genetic algorithm works according to the following principle:

Initialization. A fitness function is introduced and an initial population is formed. In classical theory, the initial population is formed by randomly filling each gene in the chromosomes. But to increase the rate of convergence of the solution, the initial population can be specified in a certain way, or the random values can be analyzed in advance to exclude definitely inappropriate ones.

Population assessment. Each of the chromosomes is evaluated by the fitness function. Based on the given requirements, each chromosome gets an exact value of how well it corresponds to the problem being solved.

Selection. After each of the chromosomes has its own fitness value, the best chromosomes are selected. Selection can be done by different methods, for example, the first n chromosomes in sorted order are selected, or only the most suitable but not fewer than n, etc.

Crossing [23]. The first significant difference from standard optimization methods. After the selection of chromosomes suitable for solving the problem, they are crossed: random chromosomes from all the "chosen ones" randomly generate new chromosomes. Crossing occurs by choosing a certain position in two chromosomes and exchanging the corresponding parts. After the required number of chromosomes has been generated to create a population, the algorithm proceeds to the next step.

Mutation [24]. Also a step specific to the GA. In a random order, a random gene can change its value to a random one. The main point of a mutation is the same as in biology: to bring genetic diversity into the population. The main goal of mutations is to obtain solutions that could not be obtained with the existing genes. This allows, firstly, avoiding falling into local extrema, since a mutation can transfer the algorithm to a completely different branch, and secondly, "diluting" the population in order to avoid a situation where the whole population consists of identical chromosomes that no longer move towards a solution.

After all the steps have been passed, it is checked whether the population has reached the desired accuracy of the solutions or has reached the limit on the number of populations, and if so, the algorithm stops working. Otherwise, the cycle is repeated with the new population until the conditions are met.
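As an illustration of the cycle described above, the following minimal sketch shows one possible structure of such a loop. It is not the authors' implementation; the function names, the choice of the better half as parents and the one-point crossover are assumptions made only for the example.

```python
import random

def evolve(fitness, gene_count, pop_size=100, generations=100,
           value_range=(0, 100), mutation_rate=0.1):
    """Minimal genetic algorithm sketch: initialization, assessment,
    selection, crossing and mutation, repeated for a fixed number of
    generations (an illustrative example, not the authors' code)."""
    lo, hi = value_range
    # Initialization: random chromosomes (lists of integer genes).
    population = [[random.randint(lo, hi) for _ in range(gene_count)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Population assessment: score every chromosome.
        scored = sorted(population, key=fitness, reverse=True)
        # Selection: keep the better half as parents.
        parents = scored[:pop_size // 2]
        # Crossing: one-point crossover of two random parents.
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            point = random.randint(1, gene_count - 1)
            children.append(a[:point] + b[point:])
        # Mutation: occasionally replace one gene with a random value.
        for child in children:
            if random.random() < mutation_rate:
                child[random.randrange(gene_count)] = random.randint(lo, hi)
        population = children
    return max(population, key=fitness)
```

In the experiments described below, such a loop is driven by a fitness function built from one of the code complexity metrics.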
III. PROBLEM DESCRIPTION

The use of genetic algorithms in the testing process makes it possible to find the most complex parts of the program, in which the risks due to errors are greatest. Evaluation is performed through the fitness function, the parameters of which are the weights of each passed operation. The definition of the weights, i.e. of the complexity of the program code, is done with various metrics used depending on the requirements for the input sets.

The task of generating input test data consists of three subtasks:

1. Search for input data for passing along one of the most complex code paths. The complexity is determined by the chosen metric for code evaluation;
2. The exclusion or reduction of the weights of operations on the path for which the data were selected, based on the fitness function for other paths;
3. Generation of input test data for many paths of program code.

The limit on the number of sets of input data is established after the development stage and allows concentrating on certain paths.

The whole algorithm is performed cyclically: the procedure for searching for input data for one path is started, after which the operations on this path are excluded from further calculations and the data search for one path is started again.
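A minimal sketch of this outer cycle is shown below, under the assumption that the exclusion is implemented by simply zeroing the weights of already covered operations; find_path_data stands for the GA search for a single path and is a hypothetical name, not a function from the paper.

```python
def generate_test_suite(operation_weights, find_path_data, max_sets):
    """Illustrative outer loop: repeatedly search input data for the
    currently most complex path, then exclude (here: zero) the weights
    of the operations on that path so the next search targets a
    different path. `find_path_data` is assumed to return the selected
    data set and the identifiers of the operations it covered."""
    weights = dict(operation_weights)  # operation id -> weight
    test_suite = []
    for _ in range(max_sets):
        data, covered_operations = find_path_data(weights)
        test_suite.append(data)
        for op in covered_operations:
            weights[op] = 0  # assumption: full exclusion of covered operations
    return test_suite
```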
As one of the ways to determine the complexity of the code, a method is proposed that works as follows:

- The first operation is assigned a weight of, for example, 100 units.
- Each subsequent operation is also assigned a weight: if there are no conditions or cycles, the weight is taken in accordance with the previous operation.
- Conditions share the weight according to the following rule: if the condition contains only one branch (only if ...), then the weight of each operation inside it is reduced to 80%. If the condition is divided into several branches (if ... else ...), then the weight is divided into equal parts: for two branches 50% / 50%, for three 33% / 33% / 33%, etc.
- The weights of operations in a cycle remain, but can also be multiplied by certain weights, if necessary.
- All nested restrictions are combined; for example, for two nested conditions the weight of the operations will be 80% * 80% = 64%.

The assigned weights can be used to develop test cases, using the weight assigned to one or another branch for certain values of the input parameters.

For convenience, we introduce the following notation:

X - data sets;
F(X) - the value of the fitness function for each data set, depending on the calculated values of the weights.

The challenge is to maximize the objective function, i.e. F(X) → max.
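The rules above can be summarized in a short sketch. It assumes that the fitness F(X) of a data set is the sum of the weights of the operations executed for X, and that each operation is described only by the chain of conditions enclosing it; the data representation and the names are illustrative and are not taken from the paper.

```python
BASE_WEIGHT = 100.0  # weight of the first (top-level) operation

def operation_weight(nesting):
    """Weight of one operation given the chain of conditions enclosing it.
    `nesting` is a list such as ["if", ("else", 2)] describing each
    enclosing construct (illustrative encoding, not from the paper)."""
    weight = BASE_WEIGHT
    for construct in nesting:
        if construct == "if":        # single-branch condition: 80% of the outer weight
            weight *= 0.8
        else:                        # ("else", k): k branches share the weight equally
            _, branches = construct
            weight /= branches
    return weight

def fitness(executed_operations):
    """F(X): sum of the weights of all operations executed for a data set X
    (the summation is an assumption made for this sketch)."""
    return sum(operation_weight(nesting) for nesting in executed_operations)

# An operation inside two nested single-branch conditions gets
# 100 * 0.8 * 0.8 = 64 units, as in the text above.
print(operation_weight(["if", "if"]))  # 64.0
```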
IV. THE RESULTS OF THE METHOD

In accordance with the previously proposed option for assessing the complexity of program code, this method is being refined to better meet real requirements. Weights are considered in accordance with the operability of the program; in other words, the more iterations the program performs, the more weight the initial test version will have.

The first population is formed by random values. Each population contains 100 chromosomes. The total number of populations is also 100. Due to this, a sufficient number of different options is formed and the best ones are selected. Table 1 presents the test results.

In each of the tests, at least two different versions of the data were generated for which the considered program code will work the most times, which means that it will pass along different paths the greatest number of times. In addition, certain patterns can be seen in the results: the first value is always maximum (random values were limited to 100), and the second value is less than the first, but more than the third.

TABLE I. COMPARISON OF RESULTS

Population | Test 1 | Test 2 | Test 3 | Test 4
0 | 1: 78, 23, 35; 2: 62, 36, 95; 3: 52, 35, 27; 4: 17, 77, 73; 5: 75, 9, 96 | 1: 97, 3, 6; 2: 82, 77, 64; 3: 24, 47, 57; 4: 90, 13, 82; 5: 81, 69, 24 | 1: 92, 97, 28; 2: 38, 66, 52; 3: 63, 76, 64; 4: 7, 24, 56; 5: 57, 48, 8 | 1: 15, 67, 26; 2: 32, 27, 83; 3: 37, 52, 64; 4: 70, 49, 64; 5: 67, 29, 94
20 | 1: 95, 64, 54; 2: 95, 64, 29; 3: 95, 64, 54 | 1: 97, 80, 4; 2: 97, 80, 53; 3: 97, 80, 28 | 1: 99, 13, 10; 2: 99, 13, 11; 3: 99, 13, 11 | 1: 99, 71, 45; 2: 99, 71, 15; 3: 99, 71, 3
50 | 1: 95, 64, 54; 2: 95, 64, 29; 3: 95, 64, 54 | 1: 97, 80, 29; 2: 97, 80, 4; 3: 97, 80, 53 | 1: 99, 13, 10; 2: 99, 13, 11; 3: 99, 13, 11 | 1: 99, 71, 60; 2: 99, 71, 3; 3: 99, 71, 3
Result (100) | 1: 95, 64, 54; 2: 95, 64, 29 | 1: 97, 80, 4; 2: 97, 80, 29 | 1: 99, 13, 10; 2: 99, 13, 11 | 1: 99, 71, 60; 2: 99, 71, 45
V. COMPARISON OF METHODS FOR ASSESSING THE COMPLEXITY OF PROGRAM CODE

For the research, several tests of the algorithm with four different metrics were carried out: the modified metric, the logic of which was described in Section 3, the SLOC metric for evaluating the number of lines of code, the ABC metric and the Jilb metric.

The SLOC metric (abbr. Source Lines of Code) is determined by the number of lines of code. This metric takes into account only the total number of lines of code in the program, which makes it the easiest to understand. In this case, the number of lines refers to the number of commands, and not to the physical number of lines.

The ABC metric, or Fitzpatrick metric, is determined on the basis of three different indicators. The first indicator na (Assignment) is allocated for lines of code that are responsible for assigning a certain value to variables, for example, int number = 1. The indicator nb (Branch) is responsible for the use of functions or procedures, that is, operands that work out of sight of the current program code. The indicator nc (Condition) counts the number of logical operands, such as conditions and loops. The metric value is calculated as the square root of the sum of the squared values of na, nb and nc:

F = √(na² + nb² + nc²)   (1)

It is noteworthy that one line of code can be taken into account in several indicators; for example, when assigning a variable the value of a certain function (double number = Math.Pow(2, 3)), the line is counted both in na and in nb. The disadvantages of this metric include the possible return of a zero value in some parts of the code.

The Jilb metric is aimed at determining the complexity of program code depending on its saturation with conditional operands. This metric is useful for determining the complexity of program code, both for writing and for understanding it:

F = cl / n,   (2)

where cl is the number of conditional operands and n is the total number of lines of code.
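To make formulas (1) and (2) concrete, the following sketch computes both values from already counted indicators; counting na, nb, nc and cl in real source code is outside the scope of the example, and the function names are illustrative.

```python
from math import sqrt

def abc_metric(na, nb, nc):
    """Fitzpatrick ABC metric, formula (1): sqrt(na^2 + nb^2 + nc^2),
    where na = assignments, nb = branches (calls), nc = conditions."""
    return sqrt(na ** 2 + nb ** 2 + nc ** 2)

def jilb_metric(cl, n):
    """Jilb metric, formula (2): cl / n, the saturation of the code with
    conditional operands (cl) relative to the total number of lines n."""
    return cl / n

# For example, a fragment with 12 assignments, 4 calls and 3 conditions:
print(abc_metric(12, 4, 3))   # 13.0
print(jilb_metric(3, 19))     # ~0.158
```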
For testing, a code with many different paths is used, where one path is critical, i.e. it has the largest number of operations. The selection of input data for this path is the solution of the subtask, and from these data it is possible to determine how accurately the data are selected. The critical path is reached if the 1st and 3rd values of the selected data are greater than 50 and the 1st value is less than the 3rd.

The following genetic algorithm settings are used to generate the input test data (an illustrative configuration call is sketched after the list):

- Number of generations – 100;
- Number of chromosomes in one population – 100;
- Range of generated data values – (0, 100).
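Combined with the GA sketch from Section II, these settings correspond, for example, to a call of the following form (illustrative only; evolve is the hypothetical function introduced earlier and is assumed to be in scope, while toy_fitness merely stands in for the weight-based F(X)):

```python
# Stand-in fitness for the example; in the actual method it would be
# replaced by the weight-based F(X) described in Section III.
def toy_fitness(x):
    return sum(x)

best = evolve(toy_fitness, gene_count=3,   # three input values per data set
              pop_size=100,                # 100 chromosomes in one population
              generations=100,             # 100 generations
              value_range=(0, 100))        # generated values limited to (0, 100)
print(best)
```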
A. Results using the metric proposed in the article

The algorithm with this metric selects data with a priority for operations of a higher level. As a result (by the 99th generation), two data sets were obtained: (70, 9, 78) and (75, 67, 82). Both sets go along the longest code path, which is the solution to the subtask. Table 2 presents the first 10 variants in each of the generations.

TABLE II. RESULTS OF THE PROPOSED METRIC (Modified metric)

Variant\Gen. | 0 | 1 | 99
1 | (70, 9, 78) = 164 100 | (75, 67, 82) = 164 100 | (70, 9, 78) = 164 100
2 | (75, 67, 82) = 164 100 | (61, 29, 94) = 164 100 | (70, 9, 78) = 164 100
3 | (61, 29, 94) = 164 100 | (63, 52, 87) = 164 100 | (75, 67, 82) = 164 100
4 | (63, 52, 87) = 164 100 | (63, 49, 83) = 164 100 | (75, 67, 82) = 164 100
5 | (63, 49, 83) = 164 100 | (70, 9, 78) = 164 100 | (75, 67, 82) = 164 100
6 | (5, 68, 90) = 96 382 | (63, 52, 87) = 164 100 | (75, 67, 82) = 164 100
7 | (60, 37, 3) = 32 500 | (70, 9, 78) = 164 100 | (75, 67, 82) = 164 100
8 | (12, 80, 49) = 16 000 | (70, 9, 78) = 164 100 | (70, 9, 78) = 164 100
9 | (47, 12, 17) = 16 000 | (70, 9, 78) = 164 100 | (70, 9, 78) = 164 100
10 | (53, 35, 76) = 16 000 | (61, 29, 94) = 164 100 | (75, 67, 82) = 164 100

It can be seen that the algorithm works quite efficiently, and already in the first generation the data for the critical path were selected.

B. SLOC metric

This metric is the simplest from the point of view of implementation; it takes into account only the total number of lines of code. The results are presented in Table 3. The algorithm with this metric picked up 3 sets: (63, 72, 91), (68, 50, 94) and (80, 70, 88). All three satisfy the conditions for passing along the critical path. As with the previous metric, the algorithm picked up suitable data in the first generation.

TABLE III. RESULTS OF METHOD WITH SLOC METRIC

Variant\Gen. | 0 | 1 | 99
1 | (64, 14, 96) = 6 411 | (68, 50, 94) = 6 411 | (63, 72, 91) = 6 411
2 | (68, 50, 94) = 6 411 | (80, 70, 88) = 6 411 | (68, 50, 94) = 6 411
3 | (80, 70, 88) = 6 411 | (65, 81, 89) = 6 411 | (68, 50, 94) = 6 411
4 | (65, 81, 89) = 6 411 | (63, 72, 91) = 6 411 | (63, 72, 91) = 6 411
5 | (63, 72, 91) = 6 411 | (74, 83, 76) = 6 411 | (80, 70, 88) = 6 411
6 | (74, 83, 76) = 6 411 | (64, 69, 91) = 6 411 | (68, 50, 94) = 6 411
7 | (64, 69, 91) = 6 411 | (69, 88, 85) = 6 411 | (68, 50, 94) = 6 411
8 | (69, 88, 85) = 6 411 | (64, 14, 96) = 6 411 | (63, 72, 91) = 6 411
9 | (5, 39, 72) = 3 618 | (63, 72, 91) = 6 411 | (63, 72, 91) = 6 411
10 | (2, 67, 73) = 3 618 | (68, 50, 94) = 6 411 | (80, 70, 88) = 6 411

C. ABC metric

This metric takes into account more variations of the values, such as assigning values to variables, logical checks and function calls. The algorithm with the ABC metric picked up 2 variants of the input data that pass along the critical path: (69, 46, 78) and (77, 36, 98). The remaining results are presented in Table 4.

TABLE IV. RESULTS OF METHOD WITH ABC METRIC

Variant\Gen. | 0 | 1 | 99
1 | (95, 27, 97) = 6 351 | (77, 36, 98) = 6 351 | (69, 46, 78) = 6 351
2 | (77, 36, 98) = 6 351 | (69, 46, 78) = 6 351 | (77, 36, 98) = 6 351
3 | (69, 46, 78) = 6 351 | (61, 65, 95) = 6 351 | (69, 46, 78) = 6 351
4 | (61, 65, 95) = 6 351 | (95, 27, 97) = 6 351 | (69, 46, 78) = 6 351
5 | (5, 67, 92) = 3 538 | (61, 65, 95) = 6 351 | (69, 46, 78) = 6 351
6 | (5, 87, 95) = 3 538 | (95, 27, 97) = 6 351 | (69, 46, 78) = 6 351
7 | (1, 35, 60) = 3 538 | (61, 65, 95) = 6 351 | (69, 46, 78) = 6 351
8 | (1, 70, 53) = 3 538 | (69, 46, 78) = 6 351 | (77, 36, 98) = 6 351
9 | (60, 30, 12) = 768 | (69, 46, 78) = 6 351 | (69, 46, 78) = 6 351
10 | (60, 49, 73) = 768 | (69, 46, 78) = 6 351 | (69, 46, 78) = 6 351

D. Jilb metric

Unlike the previous metrics, this one takes into account the absolute complexity of the program, which is calculated by dividing the number of cycles and conditions by the total number of operations along the path. The complexity of the program is determined in a completely different way, which led to the input data being selected for a different path. The results are presented in Table 5.

TABLE V. RESULTS OF METHOD WITH JILB METRIC

Variant\Gen. | 0 | 1 | 99
1 | (75, 51, 3) = 100 | (62, 25, 41) = 100 | (78, 45, 21) = 100
2 | (92, 33, 11) = 100 | (94, 22, 35) = 100 | (63, 36, 10) = 100
3 | (94, 22, 35) = 100 | (98, 51, 12) = 100 | (75, 51, 3) = 100
4 | (98, 51, 12) = 100 | (80, 42, 20) = 100 | (80, 42, 20) = 100
5 | (80, 42, 20) = 100 | (78, 45, 21) = 100 | (78, 45, 21) = 100
6 | (78, 45, 21) = 100 | (80, 59, 8) = 100 | (63, 36, 10) = 100
7 | (80, 59, 8) = 100 | (5, 40, 27) = 100 | (80, 42, 20) = 100
8 | (5, 40, 27) = 100 | (99, 38, 29) = 100 | (78, 45, 21) = 100
9 | (99, 38, 29) = 100 | (62, 25, 41) = 100 | (75, 51, 3) = 100
10 | (62, 25, 41) = 100 | (63, 36, 10) = 100 | (80, 42, 20) = 100
The data obtained differ greatly both from the other metrics and within the metric itself. This is due to the features of the tested code: it has one common loop, within which there is one common condition. If this condition is not met, then none of the operations, except the cycle and the conditions, is taken into account when calculating the metric. A value of 100 indicates that among all operations on the path all are cycles or conditions, i.e. the formally selected input data are variants for which the first condition was not met and other operations were not taken into account.

VI. CONCLUSION

Evolutionary methods work in such a way as to find the best solutions to problems that are impossible or too costly to solve with standard optimization methods. They do not always work quickly or efficiently, but in problems with non-standard approaches they show superiority.

The introduction of various metrics for calculating the fitness function made it possible to give the method for generating input test data greater variability and the ability to introduce new data requirements. Each metric is focused on specific code parameters and can be used when data must be selected in accordance with certain requirements. In addition, when one metric does not select data efficiently, it is possible to use other metrics that can cover each other's shortcomings.

All analyzed metrics, with the exception of the Jilb metric, generated several data sets for the critical path that was originally selected. It is noticeable that, for a small code of 130 lines with several code paths, the metrics successfully select data already in the first generation, which indicates a rather high convergence rate of the algorithm. In subsequent generations, various options are sequentially eliminated.

The conducted studies allow proposing a new method for generating test data based on the genetic algorithm, in which the fitness function will be formed not on the basis of one of the known metrics for assessing code complexity (as in this paper), but on the basis of a hybrid metric, which is a weighted sum of the indicators present in the metrics considered in this paper. It also seems promising, in terms of increasing the degree of code coverage, to create an effective mechanism for regulating (increasing and decreasing) the weights of operations in the fitness function as the nesting level of a code section increases.

In the future, it is planned to expand the ways to determine the complexity of the code. In addition to using metrics directly, it is planned to develop a method for taking into account the numbers of operations, functions, conditions and cycles with different weights. It is also possible to establish the degree of reduction or increase in the weights of operations at different levels of nesting. This will make it possible to set the priority for input generation when certain requirements arise.

ACKNOWLEDGMENT

The reported study was funded by RFBR, project number 19-37-90156. The research is supported by the Ministry of Science and Higher Education of the Russian Federation (project No. FSUN-2020-0009).
REFERENCES

[1] A.D. Richard and A.O. Jefferson, "Constraint-Based Automatic Test Data Generation," IEEE Transactions on Software Engineering, vol. 17, no. 9, pp. 900-910, 1991.
[2] P. Maragathavalli, M. Anusha, P. Geethamalini and S. Priyadharsini, "Automatic Test-Data Generation for Modified Condition. Decision Coverage Using Genetic Algorithm," International Journal of Engineering Science and Technology, vol. 3, no. 2, pp. 1311-1318, 2011.
[3] C. Meudec, "ATGen: Automatic Test Data Generation using Constraint Logic Programming and Symbolic Execution," Software Testing Verification and Reliability, 2001.
[4] R. Gerlich, "Automatic Test Data Generation and Model Checking with CHR," 11th Workshop on Constraint Handling Rules, 2014.
[5] M.R. Girgis, "Automatic Test Data Generation for Data Flow Testing Using a Genetic Algorithm," Journal of Universal Computer Science, vol. 11, no. 6, pp. 898-915, 2005.
[6] E.J. Weyuker, "The complexity of data flow criteria for test data selection," Inf. Process. Lett., vol. 19, no. 2, pp. 103-109, 1984.
[7] A. Khamis, R. Bahgat and R. Abdelazi, "Automatic test data generation using data flow information," Dogus University Journal, vol. 2, pp. 140-153, 2011.
[8] S. Singla, D. Kumar, H.M. Rai and P. Singla, "A hybrid PSO approach to automate test data generation for data flow coverage with dominance concepts," Journal of Advanced Science and Technology, vol. 37, pp. 15-26, 2011.
[9] Z. Liu, Z. Chen, C. Fang and Q. Shi, "Hybrid Test Data Generation," ICSE Companion Proceedings of the 36th International Conference on Software Engineering, pp. 630-631, 2014.
[10] M. Harman and P. McMinn, "Theoretical and Empirical Study of Search-Based Testing: Local, Global, and Hybrid Search," IEEE Transactions on Software Engineering, vol. 36, no. 2, pp. 226-247, 2010.
[11] Y. Xing, Y. Gong, Y. Wang and X. Zhang, "Hybrid Intelligent Search Algorithm for Automatic Test Data Generation," Mathematical Problems in Engineering, 2015.
[12] C. Paduraru and M.C. Melemciuc, "An Automatic Test Data Generation Tool using Machine Learning," 13th International Conference on Software Technologies (ICSOFT), pp. 472-481, 2018.
[13] M. Boussaa, O. Barais, G. Sunyé and B. Baudry, "Novelty Search Approach for Automatic Test Data Generation," 8th International Workshop on Search-Based Software Testing, 2015.
[14] M. Lopez, H. Ferreiro and L.M. Castro, "DSL for Web Services Automatic Test Data Generation," 25th International Symposium on Implementation and Application of Functional Languages, 2013.
[15] C. Doungsa-ard, K. Dahal, A.G. Hossain and T. Suwannasart, "An automatic test data generation from UML state diagram using genetic algorithm," IEEE Computer Society Press, pp. 47-52, 2007.
[16] S. Sabharwal, R. Sibal and C. Sharma, "Applying Genetic Algorithm for Prioritization of Test Case Scenarios Derived from UML Diagrams," IJCSI International Journal of Computer Science Issues, vol. 8, no. 3-2, 2011.
[17] C. Doungsa-ard, K. Dahal, A. Hossain and T. Suwannasart, "GA-based Automatic Test Data Generation for UML State Diagrams with Parallel Paths," Advanced design and manufacture to gain a competitive edge: New manufacturing techniques and their role in improving enterprise performance, pp. 147-156, 2008.
[18] M. Grochtmann and K. Grimm, "Classification trees for partition testing," Software Testing, Verification and Reliability, vol. 3, no. 2, pp. 63-82, 1993.
[19] T.Y. Chen, P.L. Poon and T.H. Tse, "An integrated classification-tree methodology for test case generation," International Journal of Software Engineering and Knowledge Engineering, vol. 10, no. 6, pp. 647-679, 2000.
[20] A. Cain, T.Y. Chen, D. Grant, P.L. Poon, S.F. Tang and T.H. Tse, "An Automatic Test Data Generation System Based on the Integrated Classification-Tree Methodology," Software Engineering Research and Applications, Lecture Notes in Computer Science, vol. 3026, 2004.
[21] K. Serdyukov and T. Avdeenko, "Investigation of the genetic algorithm possibilities for retrieving relevant cases from big data in the decision support systems," CEUR Workshop Proceedings, vol. 1903, pp. 36-41, 2017.
[22] R.S. Praveen and K. Tai-hoon, "Application of Genetic Algorithm in Software Testing," International Journal of Software Engineering and Its Applications, vol. 3, no. 4, pp. 87-96, 2009.
[23] W.M. Spears, "Crossover or mutation?" Foundations of Genetic Algorithms, vol. 2, pp. 221-237, 1993.
[24] H. Mühlenbein, "How genetic algorithms really work: Mutation and hillclimbing," Parallel Problem Solving from Nature, vol. 2, 1992.