Data Mining and Big Data Software testing based on global search of several variables functions discontinuity Kovartsev A.N., Popova-Kovartseva D.A., Gorshkova Е.Е. Samara State Aerospace University Abstract. Testing of software products (SP) is one of the most important stages of SP lifecycles, when it can be found not only the errors, connected to computational algorithm coding process, but also those ones input at the stages of mathematical model (MM) development. The idea of the proposed software testing algorithm, directed on detection of fatal errors in MM, is based on a global search of several variables functions discontinuity. Moreover, a heuristic characteristic function, that takes into account the peculiarities of the problem, is used, which can significantly increase the efficiency of the error situations search algorithm. Keywords: Global search algorithm, discontinuity point, mathematical model, software module, optimization function. Citation: Kovartsev A.N., Popova-Kovartseva D.A., Gorshkova Е.Е. Software testing based on global search of several variables functions discontinuity. Proceedings of Information Technology and Nanotechnology (ITNT-2015), CEUR Workshop Proceedings, 2015; 1490: 389-396. DOI: 10.18287/1613- 0073-2015-1490-389-396 1. Introduction Modern mathematical models (MM) allow to obtain extensive and highly accurate information about the processes occurring in nature and in technology by calculation. Computational experiments carried out with the mathematical model implemented as a computer program provides research shortening and its cost reduction. MM used in the calculations are finally realized in the form of software products (SP). Errors (including computational by nature errors) committed during the development of a mathematical model are directly inherited into correspondent SP [1, 2]. In this connection, it is appropriate to combine the troubleshooting procedures in mathematical models with the process of testing the respective SP. If we consider a class of programs that are based on numerical analysis and computational mathematics, i.e., programs that implement some of mathematical models, the most difficult identifiable errors will be fatal computational errors such as "division by zero", resulting in exponent overflow and program execution failure [3]. Testing of these programs is a challenging task since the area of erroneous combinations of input data in this case is small, and the probability of accidental 389 Information Technology and Nanotechnology (ITNT-2015) Data Mining and Big Data Kovartsev A.N., Popova-Kovartseva D.A… exposure in this area of one of the test data set points, implemented, for example, by means of stochastic methods of testing, is negligible. At the same time, order loss or overflow errors can be efficiently traced when viewed from the perspective of solving the problem of search of second order discontinuity points and use of algorithms for global optimization (GO) of several variables functions [4]. 2. Problem statement and basic definitions Formally, the computer module can be interpreted as a vector function that acts from set of current module inputs to set of calculated values The module domain will be described as the Cartesian product of the domains of each parameter where – the domain of the n-th module parameter. By analogy, the set of admissible values of the function F (X ) is presented by where - the range of the m-th component of the vector function. We will define the area of error situations as the set wherein when there is a fatal computing error in a software module F (X ) . In this case, if fi ( X )   then the function has a second-order discontinuity. The measure of error situations is zero for second-order discontinuity points, which creates serious difficulties in finding errors of this type. These errors are classified as "rare" in [3]. A comprehensive test focused on detecting "rare" errors may require a huge number of function tests, almost exhaustive search of all values of the module source data. This approach could not be implemented within a reasonable time. The idea of the method of the second order discontinuity points searching [3] is based on the assumption that the second-order discontinuity points for mathematical functions could be found by solving one of optimization problems: max f k ( X ) ( min f k ( X ) ), k  1, m (1) X X Indeed, the function value at the second-order discontinuity point is designated as   . It could be associated for the computer with a large maximum positive or negative number, for example, the global extreme of the function in the broadest sense of the word. Applying the methods of global optimization, we are sure to discover function second-order discontinuity points. 3. An optimization approach for solving the problem of finding the second-order discontinuity points The most effective method among the well-known one-parameter methods of multiextremal optimization is information-statistical method of R.G. Strongin [5]. This method is based on the use of the approximate posterior probability distribution of the global extremum location, which is formed during a function testing that implements a more balanced strategy for function global minimum search. This 390 Information Technology and Nanotechnology (ITNT-2015) Data Mining and Big Data Kovartsev A.N., Popova-Kovartseva D.A… strategy is so effective that it is often transferred from one-dimensional case to the case of several variables functions optimization. To simplify the situation, we will consider the scalar function f ( X )  max fi ( X ) . For optimizable function we assume it to be a Lipschitz function with constant K, that means the condition f ( x)  f ( x)  K x  x is fulfilled. In [5] it is shown that the function extremum search is realized by maximizing of a fairly simple characteristic function: 2 ( yi  yi1 ) R(i )  m( xi  xi1 )   2( yi  yi1 ) , (2) m( xi  xi1 ) where m - the Lipschitz constant estimate, which is calculated during the function extremum search: 1 M  0, yi  yi 1 m M  max , rM M  0, i ( xi  xi 1) where r – parameter. To solve the problem of finding of several variables functions discontinuity irreparable points, we will choose the characteristic function similar to (2). From the point of solving the problem of finding of second-order discontinuity points the Strongin method didn’t manage to be effective because this method successful for Lipschitz functions, that are continuous by definition, loses its properties being applied to a class of discontinuous functions. Let us analyze the characteristic function (2), which was deduced during information-statistical approach. For simplicity, we assume r  1 . Formula (2) is presented in the following form: 2 y i R(i )  mxi   2( y i  y i 1 ), xi  ( xi  xi 1 ), y i  ( y i  y i 1 ). mxi Providing that yi m  max  max f ( xi )  max f i , i x i i i we have 2 f i R(i )  max f i xi  xi  2( yi  yi1 ). max f i Using the first two terms of the Taylor expansion of optimizable function at the point xi 1 ( yi  yi 1  fi xi ), we obtain 2 (max f i  f i ) R(i )  xi  4 yi1. max f i Hense, 391 Information Technology and Nanotechnology (ITNT-2015) Data Mining and Big Data Kovartsev A.N., Popova-Kovartseva D.A… 2 (max f i  fi ) R(i )  xi  4 yi 1. (3) max fi Analysis of the formula (3) shows that the characteristic function "inclined" to post test points in the vicinity of local minima of a function to be optimized, while tending towards the global minimum, or in uncertainty intervals, the dimensions of which are large compared to other intervals of the function. We modify the characteristic function (3), based on the following assumptions. Firstly, the algorithm of second-order discontinuity points search, in principle, should not rely on local extrema of the function, it should react to a rapid increase or decrease in the function growth rate. Secondly, the function itself does not contain any information about its future behavior. The first-order derivative shows the function growth rate, while the second- order derivative - acceleration of this growth. It is obvious that the second-order derivative is more informative in this sense, therefore, we will choose ( fi ) as the 2 first component of the characteristic function. Squaring the second-order derivative is due to the fact that we don’t need information about the sign of the function curvature at the point of discontinuity. As a measure of uncertainty of partial interval, by analogy with the Strongin method, we will choose the length of the interval, improved by certain adjustment parameter r. On this basis we can choose the function R(i)  ( f ( xi ))2 (xi ) r as the characteristic function of the irreparable discontinuity points search method. In a multidimensional case, for example for a function of 2 variables, the second-order differential could be used instead of the second-order derivative, and a characteristic function could be represented in the form: (4) 4. Second-order discontinuity points search algorithm Let us choose a unit square П  [0, 1]  [0, 1] , which is proportionally divided into four smaller squares during the fission process, as the errors searching range in computing module. To calculate the second differential of the function by its values, calculated at the nodal points of the grid, it is easy to build a complete second-order polynomial , then Summarizing the results, it is possible to construct an algorithm of finite- differences method (FDM) of second-order discontinuity points search: Step 1. Regular full interpolation grid {X ij } i, j  0,1, 2 is formed for the current square Values of the function at the vertices of the square and in the center were known while building on the previous step of the algorithm. We should only calculate the function values at the midpoints of the edges of a square. As a result, we have (k ) (k ) Z  zi, j , i, j  0, 1, 2 . A source square bounded by regular interpolation grid 392 Information Technology and Nanotechnology (ITNT-2015) Data Mining and Big Data Kovartsev A.N., Popova-Kovartseva D.A… nodes {X ij } i, j  1, 2,3, is divided into four new squares Qk  Qk 1  Qk 2  Qk 3  Qk 4 . Step 2. Using the matrix an interpolation polynomial is constructed for c In the centers X k l of the squares 1, 2,3,4, the characteristic function is calculated R(k  l )  (d P2, 2 ( X k l )) (hk ) . Here hk - the length of the 2 (k ) c 2 2r squares side Qk l . Step 3. Calculated values of the characteristic function are entered in the list R in descending order of the function values, i.e. ( i R(i )  R(i  1) . Step 4. As a "decisive" item from the list R we choose one with a maximum characteristic, i.e., R (1). Thus, it is deleted from the list R. Step 5. If algorithm stop condition is not fulfilled ( (max zi  Msup )  (min hi  ) ), then go to step 1. i i Stop condition of the algorithm described in step 5 contributes a conclusion of its work, if the test function is outside the domain of the function, which is tantamount to the appearance of the error. 5. Computational experiments Test functions shown in Table 1 were considered to evaluate the effectiveness of the FDM algorithm. The first two test functions were constructed of known functions by their modifications. For example, the known modification of Rastrigin test function [6], which has 625 local minimum and one global in the unit square, has been converted into a function having a large number of local extrema and one "slit- shaped" second-order discontinuity. The second test function was constructed using division of one by Griewank function, which gave rise to a large number of second-order discontinuity points. The third Kovartsev test function was formed by the addition of a single second-order discontinuity point to a linear combination of error functions generating smooth function with more than 20 local extrema. In literature the efficiency of search algorithms is typically evaluated using the operating characteristics machine [7]. Operating characteristic is a dependence of an error detection probability on the number of iterations of the algorithm. In our case it depends on the amount of calls to the tested function Since the second-order discontinuity points can be found in any of the global optimization algorithms the efficiency of the proposed algorithm FDM was compared with the efficiency of direct method GO, for example, the modified bisection method (BM) [8]. Operating characteristics of FDM and BM methods built for test functions 1-3 (table 1) are presented in figure 1. Operating characteristics of the FDM algorithm are denoted by solid line in the figure 1, operating characteristics of the BM algorithm - by dashed line. 393 Information Technology and Nanotechnology (ITNT-2015) Data Mining and Big Data Kovartsev A.N., Popova-Kovartseva D.A… Table 1. A set of test functions Function General view f ( x1, x2 )  1/(10sin2 (z1) Modified Rastrigin function:  ( z1  1)2 (1  10sin2 (z2 ))  ( z2  1)2 ), z1  1  (11  20( x1  a  0.55)) / 4; x1, x2 [0; 1] . z 2  1  (11  20( x2  b  0.55)) / 4; A set of local extrema. "Slit-shaped" second-order discontinuity. Modified Griewank function: f ( x1 , x2 )  1 /((x12  x22 ) / 400 x , x [0; 1] . 1 2  cos(x1 ) cos(x2 / 2 )  1) A set of second-order discontinuity points. Test Kovartsev function: 19 ( x1  a1i ) 2  ( x2  a 2 i ) 2  f ( x1, x2 )   (i  1)e 0.01 x1, x2 [0; 1] . i 0 ( x1  b1 )  ( x 2  b2 ) 2 2   1/(1  e 0.01 ) 20 local extrema. One second-order discontinuity point. GKLS test functions. Continuous twice differentiable function. 10 local extrema. One global extremum. No points of discontinuity is observed. As we can see from the graphs, the efficiency of the proposed FDM algorithm is much greater than the efficiency of the bisection method for functions #1 and #3. It happens as bisection method is focused on the optimization of continuous functions conceptually, which forces it to make more detailed "view" of the function areas with Lipschitz constant evaluation increase. This situation arises whenever the function is calculated at points of its discontinuity. FDM method conversely is looking for areas of rapid growth of the test function. Function 2 (Griewank) has so many points of irreparable discontinuity that their detection is easily implemented with the same efficiency using FDM and BM methods. According to Fig. 1, the BM method has a slight advantage over the FDM method. The significant advantage of FDM over BM was recorded for all other functions. Figure 2 shows the operating characteristics of these methods, built for continuous GKLS test function. 394 Information Technology and Nanotechnology (ITNT-2015) Data Mining and Big Data Kovartsev A.N., Popova-Kovartseva D.A… Fig. 1. – Operating characteristics of FDM and BM algorithm Fig. 2. – Operating characteristics of algorithms for continuous functions It is clear from the figure, the efficiency of FDM algorithm for continuous functions is much lower than efficiency of BM algorithm. The finite difference method is forced to examine more carefully the space of the optimized variables in 395 Information Technology and Nanotechnology (ITNT-2015) Data Mining and Big Data Kovartsev A.N., Popova-Kovartseva D.A… case of continuous functions when there are no second-order discontinuity points (test software module has no errors). In this connection, there is an idea to use the FDM and BM algorithms for computational modules testing in a combined form and stop the calculation when either the FDM finds an error or BM stops. Conclusion The algorithm of global search of second-order discontinuity points, designed to detect fatal errors in mathematical models of computational algorithms, is considered in this paper. The basic idea of the algorithm is the addition of a heuristic original characteristic function to a classical algorithm of global optimization, where the first is based on the analysis of the characteristic Strongin function and considers the peculiarities of the problem. It has significantly increased the efficiency of the error search algorithm. This result allows us to hope on the organization of the error search for the functions of large dimensions. In the future, we plan to develop FDM algorithm using modern supercomputer technologys. Acknowledgements This work was supported by the state Ministry of Education and Science of the Russian Federation as part of the activities of the Program of competitiveness improving of SSAU among the world's leading research and education centers for 2013-2020. References 1. Lipaev VV. Program testing. M.: Radio and connection, 1986; 296 p. [in Russian] 2. Kotliarov VP, Kolikova TV. Basis of software testing. M.: BINOM, 2006; 285 p. [in Russian] 3. Kovartsev AN. Automation of software development and testing. Samara State Aerospace University, 1999; 150 p. [in Russian] 4. Kovartsev AN, Popova-Kovartseva DA. On efficiency of parallel algorithms for global optimization of functions of several variables. Computer Optics, 2011; 35(2): 256-261. [in Russian] 5. Strongin RG. Global optimum search. M.: Znanie, 1990. [in Russian] 6. Abakarov ASh, Sushkov YuA. Adaptation of random search using autocatalytic curve. SPb.: SPbGU, 2005; 67-75. [in Russian] 7. Gergel VP, Strongin RG. Absolute. Software system for global optimization method studying. Textbook. Nizhegorodsky University Press, 1998; 141 p. [in Russian] 8. Kovartsev AN, Popova-Kovartseva DA, Abolmasov PV. Efficiency study of global parallel optimization for multivariable function. Vestnik NNGU, 2013; 3(1): 252-261. [in Russian] 396 Information Technology and Nanotechnology (ITNT-2015)