Bayesian Networks' Development Based on Noisy-MAX
    Nodes for Modeling Investment Processes in Transport

Volodymyr Lytvynenko1[0000-0002-1536-5542], Nataliіa Savina2[0000-0001-8339-1219], Jan Krejci3
 [0000-0003-4365-5413]
                       , Mariia Voronenko1[0000-0002-5392-5125 ], Maryna Yakobchuk2[ 0000-0003-
                          1587-1370]
                                     , Olena Kryvoruchko2[000-0002-5426-7347]
                 1
                 Kherson National Technical Unіversity, Kherson, Ukraine,
       2
        National University of Water and Environmental Engineering, Rivne, Ukraine,
         3
           Jan Evangelista Purkyne University in Usti nad Labem, Czech Republic,
                  immun56@gmail.com, mary_voronenko@i.ua,
                n.b.savina@nuwm.edu.ua, 2marysa2@ukr.net,
                         o.p.kryvoruchko@nuwm.edu.ua,
                                jan.krejci@ujep.cz


       Abstract. This article focuses on the use of Bayesian networks for analyzing the
       growth relationship of Ukraine's gross domestic product (GDP) from the volume
       of investment in the transport industry and offers a comparative description of
       the use of different structural training algorithms. It is shown that Noisy-max
       nodes as compared to General nodes provide relatively high initial accuracy.
       General nodes require a repeated validation procedure. When using the Hirerical
       sampling method, the accuracy of the network result with General nodes re-
       mains unchanged, and with Noisy-max nodes, it increases (in our case, by
       12.32%). However, Noisy-max nodes entail an increase in time and computa-
       tional costs.

       Keywords: Transport industry; General nodes; Noisy-max nodes; Bayesian
       networks; Structural learning; Parametric learning; Sensitivity analysis; Valida-
       tion.


1      Introduction

Successful implementation of the investment policy will contribute to the implemen-
tation of one of the main country's economy tasks to increasing the number of main
domestic investment resources sources. This will create the necessary prerequisites
for the production growth and expanded reproduction of GDP in order to increase the
population's well-being.
    Therefore, it would be advisable to determine the informative investment indica-
tors that have the greatest impact on the dynamics of Ukraine’s GDP. It is necessary
to develop a model of the relationship between capital investment and GDP.
   In [1], the author defined investment as “the current increase in capital property
values as a result of production activities during a given period,” or as a share of in-
come for a given period that was not used for consumption.
   It is believed that investment is the material basis for the economy's structuring.
Solving the problem of investment will mean the beginning of not only the economy's
restructuring but also its stabilization and subsequent growth [2].
   The term "investment" has several meanings. First, it means the purchase of shares,
bonds with the expectation of certain financial results. Secondly, real assets, for ex-
ample, machinery, equipment necessary for the production and sale of any goods. In
the broadest sense, investments provide the mechanism necessary to finance the
growth and development of the country's economy, region, industry or enterprise.
   Successful implementation of the investment policy will contribute to the imple-
mentation of one of the main tasks of the country's economy to increasing the number
of main sources of domestic investment resources. This will create the necessary pre-
requisites for the production growth and expanded reproduction of GDP to increase
the population well-being.
   An important problem is the identification of causal relationships between causes
or factors affecting GDP. Therefore, it would be advisable to determine informative
investment indicators that have the greatest impact on the dynamics of Ukraine's
GDP. The results presented in the article concern research on the development of
probabilistic deterministic models using Bayesian networks to identify the factors
affecting investment in the transport industry of Ukraine’s GDP.
   Considering that one of the difficulties in the development of Bayesian networks is
an exponential increase in the parameters number in their conditional probability ta-
bles (CPT), this study proposes a technique for using noisy-MAX nodes to simulate
economic processes.
   The purpose of the study is constructing a Bayesian network model using noisy-
MAX nodes for analyzing the dependence of Ukraine's GDP growth on the volume of
investments in the transport industry.


2          Problem Statement

For a set of events X   , i  1,
                                      i
                                                  , N that are related, and a set of learning data
D   d1 ,                   
                  , dn  , di  xi xi
                                  1  2    N
                                             xi      , is given. Here the subscript is the observation
amount, and the upper one is the variable amount, n –is the amount of observations,
each observation consists of N  N  2  variables, and each j -th variable
 j  1,     , N  has A j   0 ,1,        ,    1
                                                    j
                                                             2 conditions. Based on a given
                                                               j


training sample, you need to build an acyclic graph connecting the event sets
 X i , i=1,…,N . In addition, each BN structure g  G is represented by a set N of pre-
              
decessors P  ,
                    1
                                  
                          , P  , that is, for each vertex j  1,
                              N
                                                                            , N , P  it is a variety of
                                                                                   j
                                         
parent vertices, such that P j   X 1 ,              
                                                   , X   \ X   . In this study, the Bayesian
                                                        N       j


network of modeling investment processes in transport and their impact on GDP are
built using noisy-MAX nodes. To do this, we have events X   , i  1, , N that are
                                                                         i


affected by the uncertainties of a different nature. And also we have data describing
these events.


3         Review of the Literature

There is a wide variety of Bayesian networks applications, including design [3], con-
sumer behavior [4], social behavior [4], support for clinical decision making [5, 6],
system biology [6], ecology [7], and so on.
   The use of Bayesian networks in socio-economic research is widely considered in
[8,9], where they are one of the mathematical tools for analyzing social behavior,
since they allow describing, modeling and predicting any empirical data: quantitative,
qualitative, and also data of mixed nature. Bayesian networks make it possible to use
both probabilities obtained by analytical or statistical methods and expert estimates,
as shown in [5, 6, 7].
   The advantage of using Bayesian networks is their resistance to incomplete, inac-
curate and noisy information. In these cases, the result will reflect the most likely
outcome of events [8, 10].
   One of the main problems of Bayesian networks is the rapid increase in parameters
while increasing the number of parents. To solve this problem, the most widely used
are Noisy-MAX [11, 12], since they use multivalued variables. This approach has
proven itself in many real-world applications [13, 14, 15]). A small amount of param-
eters, that will be enough to indicate the entire CPT is a major advantage. This allows
improving the quality of the distributions extracted from the data [16], as well as re-
ducing the spatial and temporal complexity of the algorithms for the BN [8,17].


4       Materials and Methods

A pair <G, В> called a Bayesian network (BN), when the first part of G is a acyclic
directed graph corresponding to random variables. When each variable is autonomous
of its parents in G, so a graph is written as a composition of autonomous conditions.
The second part of the pair, B, is the composition of parameters defining the network.
It composed of parameters Qxi | pa ( X i )  P( xi | pa(Xi )) for each possible xi value from
Xi and pa( X i ) from Pa( X i ) , where Pa( X i ) means the variable Xi parents set in G .
Each variable Xi in graph G is suggested as a vertex. If we consider more than only
one graph, then we use the notation to identify the parents PaG ( X i ) in graph G.
     The BN’s cumulative probability B is determined by the equation
PB ( X 1 ,..., X N )  i 1 PB (Xi | Pa(Xi )) .
                        N
    The BN represents a model for getting probabilistic dependencies, as well as the
absence of these dependencies. At the same time, the A→B relationship can be caus-
al, when event A causes B to occur. So that is, at the time, when there is a mechanism
by which value is accepted by A affects the value adopted by B. When all BN’s con-
nections are causal, so BN is called causal.
    From the existing discretization methods (hierarchical discretization, discretization
on the same width of classes, discretization on the same number of points inside the
classes) for the existing data set a hierarchical discretization method was chosen
[18,19.].
    Structural methods of BN training are algorithms, such as: the Bayesian Search, the
Essential Graph Search, the TAN. In our study, the Greedy Thick Thinning algorithm
is used.
    The essence of the Greedy algorithm for constructing the BN structure is as fol-
lows. The structure learning algorithm, called Greedy Thick Thinning (GTT), is found
on the approach of Bayesian search. GTT was proposed in [6]. The GTT algorithm
begins with the construction of an empty graph and then the stepwise multiple addi-
tions of an arc. This process occurs without creating a cycle.
    Arcs are added to maximize the marginal likelihood P (D | S). This process is re-
peated until the addition of the arc leads to a positive increase. This phase is called
“thickening”.
    Then the arcs are removed step by step until the removal of the arc leads to a posi-
tive increase in P (D | S). This phase is called “thinning”. The algorithm is quite effec-
tive due to the fact that it is exposed to the trap of local maxima. In GTT, you can use
two priorities. The priority of BDeu provides an equal score through the equivalent of
Marcov’s classes. The priority of K2 is constant in all variables and, as a rule, is used
to maximize P(G|D) when directly searching for a space of graphs.
    Validation of the developed network was carried out according to the algorithm of
maximizing the expectation, which was proposed for the first time in 1977 in [20].
The algorithm finds local optimal estimates of the maximum likelihood of parameters.
If the values of all nodes are known, then training (at some step M) would be simple,
since we would have to have all the necessary information.
    Therefore, at stage E, calculations of the expected likelihood value (expectation of
the likelihood) are made, including latent variables, as if we were able to observe
them. In step M, the maximum values of parameters’ likelihood are calculated (max-
imum likelihood estimates) of the parameters using the maximization of the expected
likelihood values obtained in step E. Next, the algorithm again performs step E using
the parameters obtained in step M and so on.
    A whole series of such algorithms was developed, based on the algorithm of max-
imizing the expectation [21-23]. For example, the structural algorithm for maximizing
the mathematical expectation (structural EM algorithm) combines a standard algo-
rithm for maximizing the mathematical expectation to optimize parameters, and an
algorithm for the structural search of a selection model.
    This algorithm builds networks using penalty probabilistic values, which include
values, derived from Bayesian information criteria, the principle of minimum descrip-
tion length, and others.
    Noisy-MAX node is made up of a child node Y, accepting                                                             nY values that may be
tagged from 0 tо nY  1 , and N parents, Pa Y    X1 ,, X N  , representing causes of
Y. If X i =0 , it means the absence of X i . If all the reasons are missing, the result is
also missing, then the Noisy-MAX are determined according to the formula [8]:

                                        
                                   P Y  0 | X i  0i   1                                                                           (1)

If the degrees’ maximum produced by X acted independently, the degree reached by
Y is determined by the formula:


                                        i
                                            
               P Y  y | x    P Y  y | X i  xi , X j  0
                                                                                                     j , j i 
                                                                                                               
                                                                                                                                         (2)

x represents a special combination of the parents of Y, x   x1 ,..., xN  . The probabili-
ties that the result will take a certain value of y, in the case when X i equals a certain
value of xi , provided that all the other reasons for Y are absent , are the parameters
for the link X i  Y :

                              
                   cyxi  P Y  y | X i  xi X j  0
                                                                           j , j i 
                                                                                     
                                                                                                                                         (3)

If Xi takes nX i values, so according to formula 1, the amount of parameters required

                                               
for the link X i  Y will be nX i  1   nY  1 . This model needs a reference to only
one parameter, if all the variables included in Noisy are binary. We can determine the
new parameters using the following formula:

                                                                                     
                                                                                             y
               C yxi  P Y  y | X i  xi , X j  0 , j  i    c yxi ,                                                               (4)
                                                                   j             
                                                                                             y


the formula 2 can be represented as:

                             P Y  y | x1 ,        , xn    C yxi ,                                                                    (5)
                                                                i


CPT can be obtained given the following conditions:

                         P Y  0 | x 
                                                             if y  0
           P  y | x                                                                                                                   (6)
                         P Y  y | x   P Y  y  1 | x  if y  0
                        


5      Experiments and Results

In developing the BN, the Genie 2.3 software environment was used. The initial
nodes type is General, each node has 5 states from s1 to s5.
   The following macroeconomic indicators for the period from 2012 (1st quarter) to
2017 (IV quarter) - 24 points were taken as experimental data for calculating the de-
pendence of Ukraine's GDP growth from the volume of investment in the transport
industry:
  x1 - the volume of investments in land and pipeline transport of actual prices;
  x2 - the volume of investment in water transport;
  x3 - the volume of investment in air transport;
  x4 - the volume of investments in warehousing and auxiliary activities in the field
      of transport;
  x5 - the volume of investments in postal and courier activities.
The variety of available data can be divided into two sets: 16 measurements is train-
ing sample A, 8 measurements is test sample B (Table 1).

         Table 1. Statistical data of capital investments and gross domestic product

           x1            x2           x3           x4            x5            y
         1992,1         35,7        275,3        2820,1         20,4        292894
         4378,5         26,8         176         3642,6         35,4        347842
         2601,8         41,7        133,5        3581,9         168         389213
         3691,8         28,4        195,2        4140,5        172,4        381289
          587,8         6,3          98,6        1779,2           5         303753
         1189,4         37,3        137,6         2042           5,1        354814
         1440,2         31,8        131,6         3039           8,2        398000
         1650,3         21,4        155,3        3929,4        202,1        408631
          590,3         29,3         73,3        1876,8          3,8        316905
         1140,6          41          79,2        2230,4          4,1        382391
          652,3          89          71,4        1820,4          9,1        440476
         1072,6         48,8         96,2        3921,9        105,5        447143
         1805,6         23,1        116,3         904,4          4,9        375991
          874,6         96,2        193,7        1935,6          2,8        456715
         2386,9        111,6        125,6        2083,2          5,3        566997
         2150,9        102,9        223,1        3058,7         72,6        588841
         1952,2         36,6         99,2        1439,6         13,3        455298
         2256,5          38         177,7        2114,6         22,5        535701
         3365,5         53,3        218,4        2646,5         18,7        671456
          7383         106,2        202,3        2527,1         66,5        722912
          3693          50,8        210,2         1454           6,6        591008
           4181,9         55,4        260,4         1979     51,8      664760
           4627,7         56,9        372,2         2852,1   53,7      833130
           9455,6         74,6        340,4         5640,6   285,3     894022

At the first stage of the available in the GeNie2.3 Academic software environment,
the methods of structural learning select the appropriate method. Figure 1 presents the
results of selection.


Fig. 1. Selection of a structured learning method

As a result of the experiment, a Bayesian network consisting of 6 nodes was obtained.
After parametric learning, primary validation was carried out. The model has
achieved 22.9% level of accuracy during the test. After revalidation, the accuracy of
the model was 54.34%.
    In the next step, we changed the type of all nodes to Noisy with five states from s1
to s5, the resulting node Y. The network remains the same, the data file also does not
change. We conduct parametric learning, primary validation and sensitivity analysis.
Comparison of accuracy is presented in the table 2:
         Table 2. Comparison of accuracy after primary validation and revalidation.

                             The initialaccuracy                Accuracy after last
                                                                    validation
                         Overall           Accuracy of      Overall net-     Accuracy
                         network           the             work accuracy     of the
                         accuracy,%        result ,%       ,%                 result,%
 General nodes             22,53              23,00          22,83             54,34
 Noisy-max nodes           25,46              54,34          27,83             54,35

Next, we apply Hirerical discretization method. We use the Greedy algorithm, we
repeat all the steps: structural training, parametric training, validation, sensitivity
analysis and re-validation. the accuracy of the result remained unchanged 54,34%.
Comparison of accuracy after changing the sampling method is given in table 3.
   We apply Hirerical discretization method now to Noisy nodes, then using the
Greedy algorithm, we repeat all the steps: structural learning, parametric learning,
validation, sensitivity analysis and repeated validation. The accuracy of the entire
network decreased slightly - 21.21%, but the accuracy of the result was higher by
12.32% and amounted to 66.66%.

         Table 3. Comparison of accuracy after changing the discretization method.

                                   Discretization method             Discretization method
                                        Weigher                          Hirerical
                           Overall net-       Accuracy of     Overall net-      Accuracy of
                           work accura-       the             work accura-     the
                           cy,%               result ,%       cy ,%             result ,%

    General nodes             22,53              54,34          22,83             54,34
    Noisy-max nodes           25,46              54,35          21,21             66,66


6      Discussion

During the selection of the structural learning algorithm, it was revealed that the
Greedy algorithm turned out to be an adequate method when working with the exist-
ing data set.
   With Noisy-max nodes, the required resulting accuracy is achieved immediately
after the initial validation. This suggests that for a network with this type of nodes
there is no need for sensitivity analysis and re-validation (Table 2).
   When using the Hirerical discretization method, the accuracy of the result with the
General nodes remains unchanged, and with Noisy-max nodes, it increases by 12.32%
(Table 3).
   When using General nodes, the EM execution time during the validation process
was 13 seconds. For Noisy nodes, the EM algorithm spent three times as much com-
puting time (37 seconds). On small data sets with a small BN size, such time costs can
be neglected. However, as the network increases, the time costs (and hence the com-
puting power) will be tangible and this will have to be taken into account.


7      Conclusion

Noisy-max nodes, compared to General nodes, provide relatively high initial accura-
cy. General nodes require a repeated validation procedure. When using the Hirerical
discretization method, the accuracy of the network result with General nodes remains
unchanged, and with Noisy-max nodes, it increases (in our case, by 12.32%). Howev-
er, Noisy-max nodes entail an increase in time and computational costs.
   In future studies, it is planned to use the dynamic Bayesian network approach in
order to trace the levels of key indicators in different time slices.


    References
 1. Keynes, J. M.: The General Theory of Employment. In: The Quarterly Journal of Econom-
    ics (1937)
 2. Zvi, M., Kane, A., Bodie, A.:Essentials of Investments Fourth Edition, Jan 1, (2001)
 3. Andreassen, S., Woldbye, M., Falck B., Andersen, S.K.: MUNIN - A causal probabilistic
    network for interpretation of electromyographic findings. In: Proceedings of the Inter-
    national Joint Conference on Artificial Intelligence, Milan, Italy, 366-372. (1987)
 4. Castillo, E., Gutierrez, J., Hadi, A.: Sensitivity analysis in discrete Bayesian networks.
    IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, Vol.
    27(4), 412–423. (1998)
 5. Cooper, G., Herskovits, E.: A Bayesian method for the induction of probabilistic net-
    works from data, Machine Learning 9, 309-347. (1992)
 6. Cheng, J., Druzdzel, M.: AIS-BN: An Adaptive Importance Sampling Algorithm for Evi-
    dential Reasoning in Large Bayesian networks. In: Journal of Artificial Intelligence Re-
    search (JAIR), Vol. 13, 155-188. (2000)
 7. Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM
    algorithm. J. Roy. Stat. Soc. , Vol. 39(B), 1–38. (1977).
 8. Díez, F., Galán, S.: Efficient computation for the noisy MAX, Int. J. Int. Syst., Vol. 18/2,
    165–177. (2004)
 9. Kayaalp, M., Cooper, F.: A Bayesian network Scoring Metric That Is Based on Globally
    Uniform Parameter Priors, 251-258. (2002)
10. Kjærulff, U., van der Gaag, J.: Making sensitivity analysis computationally efﬁcient. In:
    Proceedings of the Sixteenth Conference Uncertainty in Artiﬁcial Intelligence (UAI). San
    Francisco, CA: Morgan Kaufmann Publishers, 317–325. (2000)
11. Henrion, M., Kanal, L., Levitt, T., Lemmer, J.: Some practical issues in constructing belief
    networks. In: Uncertainty in Artificial Intelligence 3, New York: Elsevier, 161-173. (1989)
12. Díez, F.: Parameter adjustment in Bayes networks. The generalized noisy OR-gate. In: Proc.
    9th Conf. Uncertainty Artif. Intell., Washington, DC, 99-105. (1993)
13. Díez, F., Mira, J., Iturralde, E., Zubillaga, S.: DIAVAL, a Bayesian expert system for
    echocardiography. In: Artif. Intell. Med., Vol. 10/1, 59-73. (1997)
14. Pradhan, M., Provan, G., Middleton, B., Henrion, M.: Knowledge engineering for large be-
    lief networks. In: Proc. 10th Annu. Conf. UAI, San Francisco, CA, 484-490. (1994)
15. Shwe, M., Middleton, B., Heckerman, D., Henrion, M., Horvitz, E., Lehmann, H.,
    Cooper, G.: Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR
    knowledge base: I. The probabilistic model and inference algorithms. In: Methods Inf.
    Med.,Vol. 30/ 4, 241-255. (1991)
16. Onisko, A., Druzdzel, M., Wasyluk, H.: Learning Bayesian network parameters from small
    data sets: Application of noisy-OR gates. In: Int. J. Approx. Reason., Vol. 27/2, 165–182.
    (2001)
17. Zhang, N., Poole, N.:Exploiting causal independence in Bayesian network inference. In: J.
    Artif. Intell. Res., Vol. 5/1, 301-328. (1996)
18. Beinlich, N. , Suermondt, N. , Chavez, N. , Cooper, G: The ALARM monitoring system:
    A case study with two probabilistic inference techniques for belief networks, In: Pro-
    ceedings of the Second European Conference on Artificial Intelligence in Medicine,
    London, England, 247-256. (1989)
19. Darwiche, N.: A differential approach to inference in Bayesian networks. In Uncertainty in
    Artiﬁcial Intelligence: Proceedings of the Sixteenth Conference (UAI 2000), San Francis-
    co, CA: Morgan Kaufmann Publishers, 123–132. (2000)
20. Friedman, N.: The Bayesian structural EM algorithm / Fourteenth conference on Uncer-
    tainty in Artificial Intelligence (UAI’98), Madison, Wisconsin, USA, 24–26 July, 1998. –
    SF.: Morgan Kaufmann, 129-138. (1998)
21. Henrion, M.: Some practical issues in constructing belief networks. In: Kanal L.N.,
    Levitt T.S. and Lemmer J.F. (Eds.), Uncertainty in Artificial Intelligence 3 (North-
    Holland, Amsterdam, 161-173. (1989)
22. van der Gaag, L. C., Coup´ e, V. M.: Sensitivity analysis for threshold decision making
    with Bayesian belief networks. In Lamma, E., and Mello, P., eds., AI*IA 99: Advances in
    Artiﬁcial Intelligence, Lecture Notes in Artiﬁcial Intelligence. Berlin: Springer-Verlag. 37
    – 48. (2000)
23. Zhang, Z., Kwok, J., Yeung D.: Surrogate maximization (minimization) algorithms for
    AdaBoost and the logistic regression model. In: Proceedings of the twenty-first interna-
    tional conference on machine learning (ICML), 117 p. (2004)