Data Science


 TERNARY TREES USAGE FOR COMPUTATIONAL
       EXPERIMENT DATA STORAGE

               A.N. Kovartsev1, D.A. Popova-Kovartseva1, E.E. Gorshkova1
                    1
                        Samara National Research University, Samara, Russia


       Abstract. Repository for a great number of the suspected function runs must of-
       ten be organized during the realization of numerical global optimization method
       or program modules testing methods. In this connection the problem of a fast-
       access search of required element arises. A way to solve the problem can be
       found through the organization of an effective data repository, which allows for
       rapid implementation of the search and insertion of new elements. The binary
       and ternary trees storage options are considered in the paper, their comparison
       is made, recommendations for use of analyzed data models are reported.

       Keywords: binary trees, ternary trees, data structures, global optimization.


       Citation: Kovartsev AN, Popova-Kovartseva DA, Gorshkova ЕЕ. Ternary trees
       usage for computational experiment data storage. CEUR Workshop Proceed-
       ings, 2016; 1638: 820-827. DOI: 10.18287/1613-0073-2016-1638-820-827


Introduction
Considerable volume of function "runs" must be stored during application of numeri-
cal global optimization method (GO) [1, 2] or program modules testing methods [3,
4]. The main requirement for GO algorithms and program modules testing methods is
the speed of the algorithm. Let us assume that the computation time of function f (x )
is sufficiently large. This situation is typical for solving optimization problems in
technical systems design process, such as gas turbine engines, aircraft, etc. Some tests
in the known GO algorithms are repeated in increments (for example, in bisection
method or its modified version [5]) and that leads to redundant computation of f (x ) .
A way to solve this problem can be found through the organization of an effective
data repository allowing for rapid implementation of the search and insertion opera-
tions of new elements.


1      Disadvantages of traditional binary trees usage

Let us agree, that by the test we mean function f (x) calculation at the point speci-
fied by optimization algorithm or function testing algorithm. Optimization methods


Information Technology and Nanotechnology (ITNT-2016)                                     820
Data Science                                     Kovartsev AN, Popova-Kovartseva DA. et al…


complexity is the total number of algorithm steps [6]. However, taking into account
the high advanced computers performance, by optimization algorithms complexity it's
more accurate to mean the total number of calls to the optimizable function, since the
algorithm steps can have different complexity, and each function call takes the same
amount of time for any algorithm. On the other hand, the number of function calls is
independent of the computer processing power that allows comparing algorithms to
be implemented on computers with different performance.
At the same time, each function call may require significant computing time depend-
ing on the complexity of the test function.
This article deals with the problem of data repository organizing for GO algorithm
based on the modified bisection method (BM) [5]. In the BM algorithm the search
space partitioning strategy may be defined as non-uniform space division with the use
of regular structures (see [1]). However, it is necessary to compute the function at the
same points of independent variables set.
In order to avoid repeated calculations, it is expedient to store the point coordinates
and function values at them in a separate data repository. Thus the data array may
have considerable size. In this case there is the problem of finding an element in the
array. For example, the average complexity of sequential search in an unordered and
in an ordered array is O(n) [7].
Practically, binary trees (b-trees) are generally used to increase the efficiency of the
search algorithm [8, 9]. Binary search for balanced binary trees [10] has an average
complexity proportional to O(log 2 n) and depends on the depth of a binary tree.
By the depth of a binary tree is generally understood [10] the number of nodes on the
longest path from the root of the tree to its leaves. However, usage of binary trees for
data storing at the test points of the tested or optimized function has its own aspects.
In particular, a certain procedure of data entry or additional sorting of data is required
to build an optimal balanced binary tree even for regular grids. For the irregular grids
of experiment data sorting is essential.
Figure 2 shows the balanced binary tree built for the regular grid (see. Fig. 1) using
one of the rational methods of construction of the balanced binary tree.


               Fig. 1. Strategy of the balanced binary tree building on a regular grid

In addition, computational experiment must comply with the sequence shown in Fig-
ure 1. The calculations start from a point with coordinates (2, 2). A binary tree shown
in Figure 2 corresponds to this plan.


Information Technology and Nanotechnology (ITNT-2016)                                    821
Data Science                                  Kovartsev AN, Popova-Kovartseva DA. et al…


However, the gradual detalization of the search area is peculiar for BM, and it starts
from a large sample spacing with a gradual decrease in specific fragments in the
searching range. The range of search "densifies" nonuniformly here. In this case, it's
impractical to achieve the balance of the binary tree, and this leads to efficiency loss
of search algorithms. Figure 3 shows a binary tree corresponding to the experiment
shown in Figure 4.


                    Fig. 2. Balanced binary tree for 5*5 experiment grid


                                 Fig. 3. Binary tree for BM

As can be seen from the figure binary tree becomes not balanced for irregular grid.
One way to improve the efficiency of the search algorithm is to use more complex
data structures, such as ternary trees (t-trees).


                       Fig. 4. Construction of the balanced binary tree


Information Technology and Nanotechnology (ITNT-2016)                               822
Data Science                                  Kovartsev AN, Popova-Kovartseva DA. et al…


2      Ternary trees
Two ordering relationships ("<" and "> =" or "<=" and ">") are used to build a binary
tree that allow to implement a procedure of binary division of the ordered set. Three
conditions are required for the ternary tree construction which allow to split an or-
dered set into three parts. As such, will be the following conditions "<", "=" and ">".
Points of the experiment will be organized by the following criterion z  x  y , i.e.
by the sum of the experimental points coordinates. In general, points with coordinates
that correspond to the condition of equality, not often encountered; in this case, a
ternary tree is converted into binary. However, there is a fair amount of such points
for regular grids. Points that satisfy the condition of equality are located on the diago-
nals ( x  y  const ) for 5 * 5 grid in figure 5.


                           Fig. 5. Construction of the ternary tree

The result is a t-tree, shown in Figure 6.
The comparison between an average efficiency of search for b- and t-trees for differ-
ent regular grids of partitioning the search space is of some interest.
Let n be a number of nodes of a regular grid for each coordinate, N  n 2 - the total
number of grid nodes, k - the depth of a complete binary tree [10].


                        Fig. 6. Ternary tree for 5*5 experiment grid

A binary tree is "complete" if both subtrees of all nodes are "balanced", i.e. they have
approximately the same number of nodes [10].
The total number of nodes for a "complete" (balanced) b-tree can be calculated from
the formula [7] 2 k  1 .


Information Technology and Nanotechnology (ITNT-2016)                                 823
Data Science                                    Kovartsev AN, Popova-Kovartseva DA. et al…


Suppose that the complete balanced tree is constructed for each regular grid, so that
the first few levels except the last one are completely filled with nodes. Basically, it is
impossible to build completely filled "balanced" trees for n  n grid, because the
number of elements in the grid and in complete binary tree does not match.
For any number of elements (in this case the n  n grid) the "complete" tree depth can
be calculated from the formula:
     log 2 N  , if  log 2 N   log 2 N ,
k 
     log 2 N   1, else.
On the assumption of equally probable binary tree filling with the information in the
grid nodes we calculate the average complexity of the search on the "complete" b-
tree. For a grid with the number of elements N  n 2 , the elements of the "complete"
tree up to the depth k  1 will be completely filled. 2 k elements are located on each k
level of the binary tree. It is easy to show that for equally probable distribution of
nodes of a regular grid in a binary tree the average complexity of search can be calcu-
lated using the following formula:
          1                       k
 B( N )  2 (k n 2  (k  1)  2 ).
          n
Unfortunately, "complete" balanced b-tree is generated only when the inclusion of
grid elements into a binary tree has a certain order. In general, the search process can
form linear structure with greater depth than the "complete" b-tree.
Knuth [11] has shown that, if all possible options of elements inclusion (N!) into the
binary tree are assumed to be equally probable, then the average depth of a binary
search is N and averagely 1.386 log 2 N comparisons are required per search. Figure
7 shows graphs of average search complexity. It can be seen, that the average com-
plexity of the search of the "complete" binary tree is much smaller than in case of
arbitrary filling of b-tree with grid elements.
The ternary tree depth is almost the same as the dimension of the grid, as the tree
depth cannot be less than the number of nodes on the grid diagonal because of the
equality operation. It is much more difficult to calculate the average complexity of the
search T(N) for a ternary tree. Figure 7 shows a graph of behavior T(N) of the number
of grid elements resulted from the computing experiment for quasioptimal t-trees.


         Fig. 7. Comparison of the average complexity of the search on the b and t trees


Information Technology and Nanotechnology (ITNT-2016)                                      824
Data Science                                Kovartsev AN, Popova-Kovartseva DA. et al…


As can be seen from the figure a ternary tree has disadvantages in the description of
the regular grid of the experiment in comparison with optimal binary tree. However, a
margin of free branches in the tree that can be filled with new grid nodes is very im-
portant in the nonlinear search space partitioning method when changing the sample
spacing of the search space. In this regard, the binary tree loses much to t-tree, since
b-tree branches are located at the last level of the tree.
The reservation availability for new experimental points occurs in the t-tree, which do
not increase the depth of search long enough. The latter happens because the opera-
tion of the equality "=" implements the partition of the grid points set into equivalence
classes. All classes are comparable to operations "<" and ">" and form "branches" of
equivalence.
Grid densifying in separate squares leads to the appearance of new branches, that
quickly find an empty place on a tree. The upper levels of binary trees tend to be al-
ready completed by this time and further development is possible only by increasing
the depth of the tree.
It is inappropriate to sort b-tree in order to improve its structure when the operations
of element inserting and searching for an item in the tree are made at the same time.
The fastest sorting algorithms require N log 2 N operations to improve the tree struc-
ture [12]. In this case, the efficiency of the dichotomous search on balanced b-tree
will be lost due to the sort operations.
Thus, ternary tree can be used as the data model of experimental points placement
during software module testing.


3      The efficiency of the search algorithm in ternary tree
       research
The efficiency of the search algorithm in ternary tree depends on the properties of the
test function. We used 6 test functions with different topologies in the computational
experiments and, therefore, different computational complexity in terms of solving
the problem of GO. The functions are different: from relatively simple rational func-
tions, more complex ravine functions, multiextremal functions (including discontinu-
ous functions) to ROOT standard test functions offered on the site [13].
Figure 8 shows the results of computational experiments of the efficiency of the
search algorithm in ternary tree research for the considered test functions. For clarity,
dependency graphs of the average search length of optimal "complete" binary tree and
evaluation of the same values for equally probable way of filling b-tree proposed by
Knuth [11] are shown below.
As can be seen from the figure, in most cases, the efficiency of the search on the t-tree
is comparable with the average complexity of the search at the optimal binary tree.
The latter is achieved due to the large number of ternary tree free connections, located
on the upper levels. In general, the average search length of t-tree has an acceptable
value for large volumes of stored data. In comparison it should be noted that t-tree
was discussed in unsorted form, while b-tree, for the same amount of information was
optimal and balanced. The latter means the high efficiency of the search on t-trees.


Information Technology and Nanotechnology (ITNT-2016)                                825
Data Science                                   Kovartsev AN, Popova-Kovartseva DA. et al…


If we consider the relative average search length (ratio of the average search length to
the total number of elements of the tree), then with increasing of the size of t-tree
search efficiency only increases (see Fig. 9).


                     Fig. 8. The efficiency of the search on the t-trees


                   Fig. 9. The relative efficiency of the search on the t-trees


Conclusion
In this article, we considered the problem of data repository organization and data
search algorithm on ternary trees, designed to serve the global optimization methods
and function tests that require processing of large volumes of information. Using of
ternary trees gives a number of advantages in comparison with traditional binary
trees, as it does not require to perform a sort operation or balance the tree in order to
improve the efficiency of the search on ternary trees, and the average complexity of
the search is comparable to the efficiency of search on the binary optimal balanced
tree.


Information Technology and Nanotechnology (ITNT-2016)                                826
Data Science                                    Kovartsev AN, Popova-Kovartseva DA. et al…


Acknowledgements
This work was supported by the state Ministry of Education and Science of the Rus-
sian Federation as part of the activities of the Program of competitiveness improving
of SSAU among the world's leading research and education centers for 2013-2020.


References
 1. Kovartsev AN, Popova-Kovartseva DA. On efficiency of parallel algorithms for global op-
    timization of functions of several variables. Computer Optics, 2011; 35(2): 256-261. [in
    Russian]
 2. Kovartsev АN. A deterministic evolutionary algorithm for the global optimization of
    morse cluster. Computer Optics, 2014; 39(2): 234-240. [in Russian]
 3. Kovartsev AN, Popova-Kovartseva DA, Abolmasov PV. Efficiency study of global paral-
    lel optimization for multivariable function. Vestnik NNGU, 2013; 3(1): 252-261. [in
    Russian]
 4. Kovartsev АN. An efficient algorithm for testing the truth of assertions for real numbers
    expressed in relational signatures. Computer Optics, 2014; 38(3): 550-554. [in Russian]
 5. Kovartsev AN, Popova-Kovartseva DA, Gorshkova EE. Software testing based on global
    search of several variables functions discontinuity. Proceedings of International Con-
    ference Information Technology and Nanotechnology (ITNT-2015), 2015: 389-396.
 6. Nemirovskij AS, Yudin DB. The complexity of the problems and the effectiveness of op-
    timization methods. Moscow: Nauka, 1979; 384 p. [in Russian]
 7. Makkonel J. Analysis of algorithms. Introductory course. Moscow: Tekhnosfera, 2002;
    304 p. [in Russian]
 8. Adamenko AN, Kuchukov AM. Logic programming and Visual Prolog. St. Petersburg: BHV-
    Peterburg, 2003; 992 p. [in Russian]
 9. Ivanov BN. Discrete mathematics. Algoritms and programs. Moscow: Laboratoriya bazovyh
    znanij, 2001; 288 p. [in Russian]
10. Meyer B, Baldwin K. Programming methods. V. 2 (3). Moscow: Mir, 1982; 386 p.
11. Knuth D. Art of Computer Programming. Sorting and searching. V. 3. Moscow: Mir, 1978; 848 p.
12. Holl P. Computational structure. Introduction to nonnumerical programming. Moscow: Mir, 1978;
    216 p.
13. Global optimization algorithms. Sourse: <http:www.r-tech.narod.ru/gtest/html>


Information Technology and Nanotechnology (ITNT-2016)                                        827