Application of Genetic Algorithm in Problems of
       Approximation of Complex Multidimensional
      Dependencies and Identification of Parameters of
                     Theoretical Models

        Dmitry Glukhov1[0000-0003-4983-2919], Tatsiana Hlukhava2[0000-0002-3154-3863],
                      Aliaksandr Lukyanau3[0000-0001-6812-2184]

    Polotsk State University, Blokhin str., 29, Novopolotsk, 211440, Republic of Belarus
        1
          d.gluhov@psu.by, 2t.gluhova@psu.by, 3a.lukyanov@psu.by


       Abstract. The article proposes a method for constructing an analytical ap-
       proximation of n-dimensional data, based on the use of a genetic algorithm. A
       feature of the method is that the encoding of the search space is performed in
       the form of a parsing tree for an algebraic expression by the parser of the con-
       text-free grammar of the class LR (1). In addition, during the evolutionary
       process, in addition to the use of structure mutations (subject to their positive
       influence), the stage of mutation of the coefficients is performed, which allows
       avoiding the target function falling into local extremum. And also at each step
       of the evolutionary process, there is a stage for searching for an extremum in
       the space of coefficients and a stage for simplifying the analytical model.


       Keywords: genetic algorithm, approximation, theoretical models


1      Introduction

By the complexity of a multidimensional dependency, we mean the uncertainty re-
garding the functional dependencies present in the data between input variables and
the output value of the function. Such tasks include problems of identifying parame-
ters of analytical models of approximation of multidimensional dependencies, identi-
fying parameters of theoretical probability distributions, identifying parameters of
trend models in forecasting.
   Data approximation in the absence of a priori information about the real model re-
quires the use of universal approximators, such as artificial neural networks, fuzzy
logical approximators. However, the analytical representation of the model is the most
compact and most accurate. Approximation methods based on the identification of
parameters of some theoretical model solve the problem using an immutable model
adopted from any assumptions of reasonableness.


  Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
   We are making an attempt to use the genetic algorithm to find the most accurate
analytical model of the approximator. The genetic algorithm performs not only the
identification of parameters of a theoretical model, but also the search for the model
itself in the space of possible modifications of the structure. The purposeful move-
ment of the population of models to models providing a qualitative approximation is
carried out due to the law of evolutionary selection according to the criterion of
minimizing the standard deviation.


2      Implementation of genetic algorithm

A genetic algorithm (GA) is a heuristic search algorithm used to solve optimization
and simulation problems by sequentially selecting, combining, and varying the de-
sired parameters using mechanisms that resemble biological evolution.
   Previously, we made attempts to solve this problem on multi-agent systems, which
for analytical models in the space of 3 variables showed results superior to results of
artificial neural networks, but on a limited variety of models [1, 2, 3]. The main dis-
advantage of such approximators is a limited set of structure transformations formu-
lated in the form of local heuristics. Heuristics are obvious ways for a human expert
to change the structure. Accordingly, modifications that are not obvious to the expert
become impossible. In our opinion, the use of GA removes this restriction.
   In addition, we have built approximators based on fuzzy logic (FLA). Fuzzy logic
systems proposed by Lotfi Zadeh [8] were developed in works [9, 10]. If we consider
the unknown parameter as continuous, then in this case we can draw a parallel be-
tween the conclusion about the value of the unknown parameter and the approxima-
tion of the function and talk about the property of a fuzzy system to act as a linguistic
approximator. The advantage of such approximators is their speed, and the main dis-
advantage – limited opportunities to improve approximation accuracy.
   The context-free grammar Gexp of the LR(1) class of algebraic expressions pro-
posed by us is presented below in the Beckus-Naur form:
formula:    exp     {ANode* o = (ANode*)$1; created-
Nodes.push_back(o);}
|     formula ';' exp {ANode* o = (ANode*)$3; created-
Nodes.push_back(o);}
;
exp : NUM     {ANodeNUM* o = new ANodeNUM($1); $$ =
(void*) o;}
| '(' exp ')' {$$ = $2;}
| exp '+' exp {ANode* o = (ANode*) new AN-
odeOp((ANode*)$1, (ANode*)$3, PLUS); $$ = (void*) o;}
| exp '-' exp {ANode* o = (ANode*) new AN-
odeOp((ANode*)$1, (ANode*)$3, MINUS); $$ = (void*) o;}
| exp '/' exp {ANode* o = (ANode*) new AN-
odeOp((ANode*)$1, (ANode*)$3, DIV); $$ = (void*) o;}
| exp '*' exp {ANode* o = (ANode*) new AN-
odeOp((ANode*)$1, (ANode*)$3, MUL); $$ = (void*) o;}
| 'p' 'o' 'w' '(' exp ',' exp ')' {ANode* o = (ANode*)
new ANodeOp((ANode*)$5, (ANode*)$7, POW);
           $$ = (void*) o;}
| 's' 'i' 'n' '(' exp ')'   {ANode* o = (ANode*) new ANo-
deFunc((ANode*)$5, SIN); $$ = (void*) o;}
| 'c' 'o' 's' '(' exp ')'   {ANode* o = (ANode*) new ANo-
deFunc((ANode*)$5, COS); $$ = (void*) o;}
| 'l' 'o' 'g' '(' exp ')'   {ANode* o = (ANode*) new ANo-
deFunc((ANode*)$5, LOG); $$ = (void*) o;}
| 'e' 'x' 'p' '(' exp ')'    {ANode* o = (ANode*) new
ANodeFunc((ANode*)$5, EXP); $$ = (void*) o;}
| 't' 'a' 'n' '(' exp ')'   {ANode* o = (ANode*) new ANo-
deFunc((ANode*)$5, TAN); $$ = (void*) o;}
| 'c' 't' 'a' 'n' '(' exp ')' {ANode* o = (ANode*) new
ANodeFunc((ANode*)$6, CTAN); $$ = (void*) o;}
| NAME {ANode* o = (ANode*) new ANode-
NAME(string((char*)$1)); $$ = (void*) o; delete $1;}
;


3      Parsing process

In the process of syntactic parsing, as we can see, the right-side parsing tree is built in
the form of an object-oriented structure. In order to be able to compile an algebraic
expression into a tree of objects nested into each other, we have developed a class
system presented in the class diagram (Figure 1).
   As a result of syntactic parsing the initialization file, an initial population of ana-
lytical n-dimensional models is created. Initial variations of the models are formulated
by an expert in the formal language of the Gexp grammar.
   Objects of the base class ANode and derived classes ANodeNUM, ANodeNAME,
ANodeOp, ANodeFunc, ANodeProxy attribute semantic stack nodes and form a tree
of right-side parsing of an algebraic expression.
   The Genetic class implements the main stages of the GA. Namely:

1. initialization() - initialization of the population with expert-defined algebraic ex-
   pressions for the case of n-variables;
2. calcAllErrors() - calculating of the approximation error for each individual in its
   current state;
3. coefficientMutation() – the process of random cloning of individuals with mutation
   of coefficients to ensure the possibility of exit from local extremes;
4. structureMutation() – the process of cloning with a structure mutation, subject to
   the positive effect of the mutation;
5. fitCoefficiets() – the process of searching for the extremum of the target function
   in the coefficient space. In the implementation of this process, we have chosen the
   gradient descent algorithm, but using coordinate relaxation. This algorithm is not
   fast, but it is reliable enough;
6. simlifyStructure() – simplifies the structure of an expression in specific cases.


Fig. 1. Class diagram of GA software for searching the model of an analytical n-dimensional
approximator

Simplification of the structure leads to the removal and replacement of semantic tree
nodes (the ANodeProxy class is provided for the simplification procedure) and is
provided in the following cases:

 x * 0 | 0 * x  0;
 x0  1; x1  x;
 x * 1 | 1 * x  x;
Simplification is triggered when the algorithm detects the convergence of the coeffi-
cient to 0 or 1.
   The block diagram of the evolutionary search algorithm is shown in Figure 2.


Fig. 2. Block diagram of the evolutionary search algorithm of the developed genetic algorithm

As the objective function we have chosen the standard deviation between the theoreti-
cal and calculated values of the function (SD). In the m-dimensional space of the
coefficients of analytical models, the SD surface may have local extremes due to the
limited data area and the complexity of approximated dependencies. However, the
nature of the surface allows us to suggest the possibility of using the gradient descent
algorithm to search for a local or global extremum. The nature of the SD surface in
the space of two coefficients a, b of a function of one variable is illustrated
in Figure 3.
   In a number of works devoted to this topic, it is proposed to search for coefficients
that minimize SD as a result of mutation of coefficients and to use the Nelder-Mead
algorithm to search for a local extremum. [5, 6, 11]
   Given that we are trying to minimize the standard error of the deviation of some
analytical function from a discrete data set, we conducted a comparative analysis of
the applicability of various numerical optimization methods and provide the user of
the system with a choice of a specific method. Our approach is based on two points:

1. The mutation of coefficients is necessary for an abrupt transition to a new extre-
   mum search point, which allows the algorithm to run several extremum search
   strategies from different areas of the n-dimensional search space, and, therefore, to
   avoid getting into local extremes.
2. Application of the most qualitative methods for finding the extremum depending
   on the specifics of the problem (gradient descent method, simplex method, etc.) for
   identification of coefficients with a given accuracy.


             Fig. 3. The SD surface in the space of coefficients a, b of models:

1)                   , 2)                  , 3)                       , 4)
Table 1. Example of the result of identifying model coefficients using the coordinate descent
method

№          Formula                                          Standard deviation
    Before identification of coefficients
    1      5 * x1 * x1 + 3                                  0.00000
    2      2 * x1 + 3 * x2 + 4                              56.92715
    3      2 * pow(x2, 2) + 2                               154.52993
    4      15 * pow(x1, 2) + 13                             166.70333
    5      2 * x1 * x1 + 3 * x2 * x2 + 2 * x1 * x2          359.15568
        + 3 * x1 + 3 * x2 + 3
    6      11 * pow(x1, 3) + 3                              +INF
    7      pow(x1, x2) * 5                                  +INF
    After identification of coefficients
    1      5 * x1 * x1 + 3                                  0.00000
    2      27.9921 * x1 + -2.02022 * x2 + 3.12347           17.37498
    3      1.8 * pow(x2, 1.5) + 1.579                       57.33454
    4      5.00068 * pow(x1, 1.99993) + 2.99844             0.00100
    5      4.99973 * x1 * x1 -0.48889 * x2 * x2 +
        0.33344 * x1 * x2 -3.33275 * x1 + 4.86652           0.00099
        * x2 + 3.2222
    6      5.00068 * pow(x1, 1.99993) + 2.99844             0.00100
    7      pow(x1, x2) * 3.3e-006                           50.52150


4       Algorithm operation example

In our numerical experiments, the best convergence rate was shown by the coordinate
descent method, in which the step is performed along the coordinate giving the
maximum gain. However, the search and implementation of more efficient optimiza-
tion algorithms remains the subject of further research.
   It is important to note that we have proposed a new method for screening out
equivalent formulas. Since the same formula, due to the commutativity of operations
(+, -, /, *) and the distributivity of the corresponding pairs of operations, as well as the
possibility of arbitrary arrangement of brackets, can be written in many different
ways, we proposed to form numerical hashes of formulas using special codes of vari-
ables, coefficients and functions and a mirror - modified algebra (+ instead of *,-
instead of /, * instead of +, / instead of -).
    An example of the operation of this hash function is shown in the Table 2:

                 Table 2. An example of the operation of this hash function

Formula                                      Hash
2 * log(x1) * x2                             16994
log(x1) * 2 * x2                             16994
x2 * 2 * log(x1)                             16994
log(x1) * x2 * 2                             16994
2 * (log(x1) + x2)                           2.89382e+007
2 * log(x1) + x2 * 2                         2.89722e+007

   An example of the gradient descent method for 7 population individuals with an
accuracy of 0.001 is shown in Table 1. The example also shows that we have devel-
oped a mechanism for processing results in the form of non-numbers and infinities, as
large SD values. This approach allows you to continue searching without the occur-
rence of exceptional overflow situations and divisions by zero.


5       Algorithm application example

As an example of the application of the GA we constructed for solving an applied
problem, we consider the problem of accounting for transients in stationary non-
isothermal models of gas transportation.
   Formulation of the problem: when the compressor station is switched on, gas heat-
ing begins and, despite the fact that the heated gas has not yet spread through the
pipeline, the stationary model describes the state of the pipeline that will occur after
the transition to steady state. Thus, significant sections of the pipeline in the model
are found to have an overvalued gas temperature and, given the strong dependence of
gas density on temperature, with an underestimated value of the gas reserve.
   We calculated the correlation coefficient, which for the considered areas with tem-
perature increase showed a high negative correlation, so for the 4 graphs given for
example in Figure 4, the correlation coefficient was Correl(X, Y) = { -0,84369, -
0,88839, -0,88509, -0,85517}.
Fig. 4. Examples of transient graphs for the main gas pipeline model OJSC «Gazprom transgaz
Belarus»

As the authors of the software package for calculating the gas reserve on the main gas
pipeline of OJSC "Gazprom transgaz Belarus», we attempted to eliminate the influ-
ence of such transients by introducing the inertia of the temperature change at the
compressor output during the time of system transition to the steady state.
   We used GA to find an analytical relationship between the value of the transition
time y and the current value of the gas flow Q, the ground temperature Tg, and the
temperature difference at the output of the compressor station dT. The task of the
approximator is to find such an analytical dependence that has a minimum SD for
describing y(Q, Tg, dT).
         Comparative results of GA operation are given in Table 3.

                       Table 1. Comparative results of GA operation

№    Approximation         Model                                             SD
     method
     GA                    1.66597    * pow(1.7573, 1.1225          *        1.078879
                           log(3.21177 * pow(dT * 4.23434, 1.1878))) /
                           pow(log(0.0858575 * Q), 0.0742341) -
                           4.15915 * dT
1    GA                    52 - 14.48 * pow(dT, 0.08961) / pow(Q, -          1.330954
                           0.1755)
2    GA                    102.607    *    pow(dT,      -0.369501)   /       1.200513
                           log(0.265488 * Q)
3    FLA                   The model is described in our work [10]           1.080616
   The solution found by GA exceeded in accuracy the fuzzy logical approximator
with the selected optimal smoothing settings, while giving the most compact repre-
sentation of the approximator.
          The dynamics of improving the approximation quality can be demonstrated
by the graph of positive mutations shown in Figure 5.
   One of the parameters of the proposed GA is the limit size of the formula, which
allows obtaining a wide variety of compact variants of approximators.
   Developed GA has the possibility of flexible adjustment of limit thresholds of
population size, percentage and death period of unpromising individuals, the probabil-
ity of a period and type of mutations of the structure and coefficients, formulations of
model simplification tools, algorithms of search of extremum. In particular, to solve
the problem of accounting for transients in stationary non-isothermal models of gas
transportation, we are interested not only in time, but also in the nature of temperature
inertia, which requires collecting and processing of additional information.


                   Fig. 5. SD change graph as a result of mutation steps
6      Conclusion

  This article proposes a new method for constructing an analytical approximation of
complex n-dimensional dependencies based on the use of the genetic algorithm.
  The developed GA has the following features:

1. Search space encoding is performed as a parsing tree of an algebraic expression by
   the parser of the context-free grammar of the class LR(1);
2. In the course of the evolutionary process, in addition to the use of structure muta-
   tions (provided their positive effect), the stage of coefficients mutation is per-
   formed, which allows avoiding the target function falling into local extremes;
3. At each step of the evolutionary process there is a stage of searching for an extre-
   mum in the coefficient space and a stage of simplifying the analytical model.

   The developed GA helped us solve the problem of accounting for transients in sta-
tionary non-isothermal gas transportation models. The resulting models were imple-
mented as part of the software package for calculating the gas reserve on the main gas
pipeline of OJSC "Gazprom transgaz Belarus», which reduced the impact of tran-
sients on the gas imbalance.


References
1. Glukhov A.O., Trofimov V.V.: Multi-agent structures for solving the traveling salesman
   problem. Problemy menedzhmenta: sb. nauchn, tr. Vypusk №3; pod obshch. red. prof. O.A.
   Strakhovoy, pp. 71–76. Izd-vo SPbGUEF, Saint-Petersburg (2000).
2. Trofimov V.V., Glukhov A.O.: Approximation on multi-agent structures. Ekonomicheskaya
   kibernetika: sistemnyy analiz v ekonomike i upravlenii: sb. nauchn. tr. Vypusk №1; pod red.
   D.V. Sokolova i N.N. Pogostinskoy, pp. 40–53. Izd-vo SPbGUEF, Saint-Petersburg (2000).
3. Glukhov A.O., Glukhov D.O., Trofimov V.V., Trofimova L.A.: Modified hybrid genetic
   algorithm of discreet optimization problems. 2017 XX IEEE International Conference on
   Soft Computing and Measurements (SCM 2017). Saint-Petersburg (2017).
4. Pozharskiy D. A., Zolotov N.B., Semenov I.E.: Genetic algorithm for finding the approxi-
   mation coefficients of a function in contact problems for a cylinder. Molodoy uchenyy №24
   (158), pp. 122–125 (2017).
5. Kildiushov M.S.: A program for reconstructing approximated algebraic functions from
   several variables from a set of discrete values of a function. Online magazine
   «NAUKOVEDENIE» Vol. 7, №5, (2015) http://naukovedenie.ru/PDF/136TVN515.pdf.
   doi: 10.15862/136TVN515
6. Kildiushov M.S.: The use of genetic algorithms for the restoration of approximated alge-
   braic functions with a certain accuracy. Nauka i biznes: puti razvitiya. – Fond razvitiya
   nauki i kultury (Tambov) № 1 (55), pp. 25–31 (2016). ISSN: 2221-5182
7. Driankov D., Hellendoorn H., Reinfrank M.: An introduction to fuzzy control. Springerver-
   lag (1993).
8. Lotfi A. Zadeh: Fuzzy Sets. Information & Control vol. 8, pp. 338–353 (1965).
9. Glukhov D.О.: Dynamic expert system by fuzzy inference rules to automations an examina-
   tion of complex objects. Budownictwo i Inzynieria Srodowiska. – Zielonogorsk: Politech-
   nika Zielonogorska, pp. 105–109 (1998).
10. Glukhov D.О., Glukhova T.M., Kundas S.P.: Soft calculations for organization of computer
    representation of nomograms on the example of calculating the limit creep coefficient.
    Vestnik Polotskogo gosudarstvennogo universiteta. – Series C: Basic Sciences, №3. pp. 2–6
    (2010).
11. Zvonkov V.B., Popov A.M.: Comparative study of classical optimization methods and ge-
    netic algorithms. Vestnik Sibirskogo gosudarstvennogo aehrokosmicheskogo universiteta
    im. Akademika M.F. Reshetneva, pp. 23–27. Izdatelstvo: Federalnoe gosudarstvennoe
    byudzhetnoe obrazovatelnoe uchrezhdenie vysshego obrazovaniya «Sibirskij gosudarstven-
    nyj universitet nauki i texnologij imeni akademika M.F. Reshetneva», Krasnoyarsk (2013).
    ISSN: 1816-9724