Synthesis of the System of Iterative Dynamic Risk Assessment of Information Security

Denis Berestov1, Oleg Kurchenko1, Yuri Shcheblanin1, Volodymyr Mishchenko2, and Nataliia Mazur3

1 Taras Shevchenko National University of Kyiv, 24 Bohdan Hawrylyshyn str., Kyiv, 04116, Ukraine
2 State Service of Special Communications and Information Protection of Ukraine, 13 Solomianska str., Kyiv, 03110, Ukraine
3 Borys Grinchenko Kyiv University, 18/2 Bulvarno-Kudravska str., Kyiv, 04053, Ukraine

Abstract
Information on the implementation of the architecture of a system of iterative dynamic risk assessment based on neural networks is given. The choice of network structure, types of neurons, and algorithms for training the neural networks are substantiated.

Keywords
Risk assessment, information security, network, neural network.

1. Dynamic Iterative Risk Assessment as Part of a Continuous Audit System

To date, there is clearly a significant gap between the development of the technologies used in the creation of systems and the methods for assessing the protection effectiveness of these systems. The complexity of information systems is growing rapidly, which inevitably leads to an increase in the complexity of threat analysis and of the evaluation of the applied protection methods. Insufficient assessment of the created systems from the point of view of information security, in its turn, leads to the emergence of new threats or a rising probability of realization of old ones. Existing means of automation are poorly adapted to the pace of technology development and do not allow for a comprehensive analysis of the relationships between the technologies used in terms of information security. In this regard, the issues of automating the processes and tasks solved in the course of risk analysis and management, as well as of increasing the relevance of the results, acquire greater importance.
When using the classical approach, the time between the initial analysis of the system and the creation of a report often exceeds the time of emergence and implementation of threats. In addition, to carry out the procedure of information security risk analysis in the classical approach, it is necessary to model the automated system, which in itself is quite a difficult task. Therefore, the following main drawbacks of the existing approaches to the assessment of information security risks can be distinguished:
• the complexity of working under conditions of obviously incomplete information about the components of risk and their ambiguous properties;
• the need to create a model of the information system;
• the duration of the process and the rapid loss of relevance of the evaluation results;
• the complexity of aggregating data from various sources, including statistics and expert assessments;
• the need to involve individual specialists in risk analysis;
• the subjectivity and ambiguity of the received estimates;
• difficulties in using the assessments for management tasks and the complexity of process automation.

In this regard, there is a need to obtain a gradually refined risk assessment in the course of the work of a specialist. By automating the accounting of threats associated with the emergence of new vulnerabilities in standard software and by formalizing changes in the business landscape, one can create an environment that allows the specialist to create reports on the security of an information system based on a series of consecutive reports for a short period of time. Processing these data using different methods of statistical forecasting will determine the optimal set of countermeasures taking into account "future risks" and thus raise the effectiveness of preventive countermeasures, significantly reducing the response time of the system to new vulnerabilities [1, 2].

CPITS-II-2021: Cybersecurity Providing in Information and Telecommunication Systems, October 26, 2021, Kyiv, Ukraine
EMAIL: berestov@ukr.net (D. Berestov); kurol@ukr.net (O. Kurchenko); sheblanin@ukr.net (Y. Shcheblanin); mischenko_w@ukr.net (V. Mishchenko); n.mazur@kubg.edu.ua (N. Mazur)
ORCID: 0000-0002-3918-2978 (D. Berestov); 0000-0002-3507-2392 (O. Kurchenko); 0000-0002-3231-6750 (Y. Shcheblanin); 0000-0002-7578-1759 (V. Mishchenko); 0000-0001-7671-8287 (N. Mazur)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Returning to the definition, a continuous audit is an environment that allows the internal or external auditor to make judgments on significant issues based on a series of reports created simultaneously or with a small interval. Accordingly, we define continuous security risk analysis as an environment that allows a specialist to assess information security risks based on reports (created simultaneously or with a small interval) on the operation of the automated system (AS), the means of information protection, and information security incidents related to the realization of threats [3]. In order to implement continuous security risk evaluation, it is necessary to create a system of dynamic iterative information security risk assessment. In this case, the observed parameters of the system serve, in a general form, as its input. In accordance with the approaches used in this work, to obtain a risk assessment it is necessary to estimate the a posteriori probability of the realization of threats on the basis of observational data [4].

2. Architecture of the System of Dynamic Iterative Risk Assessment

The described approach served as a basis for the development of the architecture of the system of iterative dynamic risk assessment of information security. It is based on the principles of adaptability, universality, and compatibility.
By adaptability we mean the ability of the system to effectively perform specified functions in a wide range of changing conditions. In relation to the system of iterative dynamic risk assessment, the implementation of this principle is the use of a mechanism of distributed collection, storage, and processing of information as the main architectural solution. The application of the principle of universality lies in the possibility of using the system for a wide range of information security risk analysis tasks. The approach developed in this way can be used both in systems of intellectual content filtering and for the creation of high-level reports that are used in organizations for administrative decision-making. The application of the principle of compatibility means that the complex should provide for the possibility of integration with the means of information protection available in the automated system and with the mechanisms of dynamic iterative risk analysis in the information security management system. Among other things, this approach also allows new functionality to be implemented in the complex quickly and efficiently, while maintaining a high level of reliability and stability of protective properties. The complex can also be easily integrated with different systems and platforms. The diagram of data flows of the system is presented in Fig. 1.

Figure 1: The diagram of data flows (controller, modules, risk assessment module, external network, array of sensors, neural network, processing database)

The input of the system is sensor data (e.g., from intrusion detection systems, anti-virus programs, firewalls) on potentially dangerous activity, the overall level of network activity and the load on a particular part of the automated system, etc., as well as expert assessments of quantitative indicators of the functioning of information security systems. From the array of sensors the data enter the risk assessment module.
There the received data are transformed into a canonical form; for this purpose their normalization and alignment are carried out. Then the processed data are fed to the input of the neural network. The neural network module solves the task of classifying input vectors and estimating the a posteriori probability that the input data belong to the output classes. The results obtained during the probability assessment are passed to the risk assessment module, where a direct quantitative assessment is performed. The obtained results are stored in a database for further use.

A classic client-server architecture was chosen as the architecture of the software package. In this case, given the specifics of the task, an agent-based approach to data collection was applied. Security agents are installed at all key points of information exchange. The main functionality of the security agents used to solve this problem is to collect and transmit the necessary data to the security management servers. In order to ensure the scalability of the architecture, security servers must provide the ability to create a hierarchy. In this case, the entire hierarchy of security servers implements the functionality of the risk assessment module and the function of the neural network. The distributed architecture allows one neural network to be physically implemented by multiple security servers if necessary.

Figure 2: Scheme of interaction of modules of the complex

3. Research on the Application of Neural Networks

In mathematical statistics, classification tasks are also called tasks of discriminant analysis. In machine learning, the problem of classification is solved, as a rule, with the help of artificial neural network methods, with the experiment set up as supervised training. Methods of unsupervised training are used to solve another problem: clustering, or taxonomy.
In these tasks the division of the objects of the training sample into classes is not given, and it is necessary to classify objects only on the basis of their similarity to each other. In some applied areas, as well as in mathematical statistics itself, due to the proximity of the tasks, clustering tasks are often no longer distinguished from classification tasks.

The mathematical problem statement is as follows. Let X be a set of descriptions of objects and Y a set of numbers (or names) of classes. There is an unknown target dependence, a mapping y*: X → Y, whose values are known only on the objects of the finite training sample X^m = {(x1, y1), ..., (xm, ym)}. It is necessary to construct an algorithm a: X → Y capable of classifying an arbitrary object x ∈ X. The probabilistic statement of the problem is considered more general. It is assumed that the set of pairs "object, class" X × Y is a probability space with an unknown probability measure P. There is a finite training sample of observations X^m = {(x1, y1), ..., (xm, ym)} generated according to the probability measure P. It is necessary to construct an algorithm a: X → Y capable of classifying an arbitrary object x ∈ X.

A feature is a mapping f: X → Df, where Df is the set of admissible values of the feature. If the given features are f1, ..., fn, then the vector x = (f1(x), ..., fn(x)) is called the characteristic description of the object x ∈ X. Characteristic descriptions may be identified with the objects themselves. Thus the set X = Df1 × ... × Dfn is called the feature space. Depending on the set Df, features are divided into the following types:
• Binary feature: Df = {0, 1}.
• Nominal feature: Df is a finite set.
• Ordinal feature: Df is a finite ordered set.
• Quantitative feature: Df is the set of real numbers.
Applied tasks often involve features of different types, and not all methods are suitable for their solution.
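The feature types above determine how raw observations become numeric input vectors for a neural network. The following is a minimal sketch of such an encoding (the field names and values are hypothetical illustrations, not taken from the paper): a binary feature passes through as 0/1, a nominal feature is one-hot encoded over its finite set, an ordinal feature is mapped to its rank in the ordered set, and a quantitative feature is used directly.

```python
def encode(sample, nominal_values, ordinal_order):
    """Encode a mixed-type object description into a flat numeric feature vector."""
    vec = []
    vec.append(1.0 if sample["firewall_enabled"] else 0.0)      # binary: Df = {0, 1}
    # nominal: one-hot over the finite set of admissible values
    vec.extend(1.0 if sample["os_family"] == v else 0.0 for v in nominal_values)
    vec.append(float(ordinal_order.index(sample["severity"])))  # ordinal: rank in the ordered set
    vec.append(float(sample["traffic_mbps"]))                   # quantitative: real number
    return vec

# Hypothetical sensor record
sample = {"firewall_enabled": True, "os_family": "linux",
          "severity": "high", "traffic_mbps": 12.5}
x = encode(sample, nominal_values=["windows", "linux", "bsd"],
           ordinal_order=["low", "medium", "high"])
print(x)  # [1.0, 0.0, 1.0, 0.0, 2.0, 12.5]
```

Vectors built this way can be normalized and fed to the classifier described below.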
The following main types of classification tasks can be distinguished:
• Two-class classification. The technically simplest case, which serves as the basis for solving more complex problems.
• Multi-class classification. When the number of classes reaches many thousands (for example, in the recognition of hieroglyphs or continuous speech), the classification task becomes much more difficult.
• Disjoint classes.
• Overlapping classes. An object can belong to several classes at the same time.
• Fuzzy classes. It is necessary to determine the degree to which the object belongs to each of the classes, usually a real number from 0 to 1 [5, 6].

During an information security audit, data on the parameters of the monitored system and expert assessments of control mechanisms are collected. On the basis of these observations, a conclusion is drawn about the affiliation of the input vectors with different classes of "danger" in terms of information security. In the simplest case, we can consider the following scenario: information about activity in the AS is collected, and on the basis of the observations a conclusion is made about whether specific events are dangerous or safe. Such a task arises, for example, in the implementation of intrusion detection and prevention systems. In the more general case, the number of risk classes can be chosen arbitrarily. To solve the classification problem it is necessary to construct a so-called discriminant function meant to partition the n-dimensional input space into the desired classes. In most practical applications, it is necessary to find not the function itself but its approximation. The emergence of neural networks can be associated with the article by McCulloch and Pitts [7], which describes a mathematical model of the neuron and of the neural network. It has been shown that both Boolean functions and finite state machines can be represented by neural networks.
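As an illustration of the McCulloch and Pitts result just mentioned, a threshold neuron with fixed weights can realize elementary Boolean functions. The sketch below is our own illustration (the weights and thresholds are chosen by hand, not taken from [7]); it implements AND and OR:

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts threshold neuron: fires (1) iff the weighted sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def AND(x1, x2):
    return mp_neuron([x1, x2], weights=[1, 1], threshold=2)

def OR(x1, x2):
    return mp_neuron([x1, x2], weights=[1, 1], threshold=1)

truth = [(a, b, AND(a, b), OR(a, b)) for a in (0, 1) for b in (0, 1)]
print(truth)  # [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]
```

Chaining such neurons yields any Boolean function, which is the constructive core of the representability claim.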
Later, Rosenblatt [8] proposed a model, which he called the perceptron, and a learning algorithm for it. It was shown that perceptrons can solve some problems more efficiently than computers of traditional architecture. However, a serious mathematical analysis of perceptrons, conducted by Minsky and Papert [9], later revealed serious limitations on the applicability of perceptrons. In particular, it was shown that some problems which in principle can be solved by a perceptron may require an unrealistically large amount of time for practical applications or an unrealistically large number of neurons [10]. These limitations were later relaxed by replacing the threshold activation functions of the neurons with sigmoid ones. Cybenko, Funahashi, and Hornik independently proved the following fact. Let y be a fixed sigmoid function and f a continuous function of n variables on a compact set K ⊂ R^n. Then f can be approximated, in the sense of uniform approximation, by a four-layer network (with two hidden layers) in which the activation functions of the first and last layers are linear and those of the intermediate layers are equal to y. Hecht-Nielsen proved the representability of a continuous function of multiple variables by means of a two-layer neural network with n components of the input signal, 2n + 1 components of the first (hidden) layer with sigmoid activation functions, and m components of the second layer with unknown activation functions. Thus, in a nonconstructive form, the solvability of the problem of representing a function of rather arbitrary form on a neural network was proved. These results have allowed the widespread use of ANNs for many applied tasks, including information security risk analysis.

4. Architectures of Neural Networks

New types of neural network architecture are constantly appearing, and they can be confusing. Below we give a summary that covers most of the existing types of ANN.
Although all of them are presented as unique, the diagrams show that many of them are very similar [11]. Neural network architectures can be divided into supervised and unsupervised networks, that is, those trained with or without a teacher. Mixed networks using both learning methods can also be singled out. In the course of the analysis, various architectures of neural networks were considered as the core of the system of dynamic iterative analysis of information security risks; an example is shown in Fig. 3.

Figure 3: The example of the architecture of a neural network

In particular, we consider networks with supervised and unsupervised learning algorithms, including multilayer perceptrons with direct signal propagation, counter-propagation networks based on Kohonen and Grossberg neurons, self-organizing Kohonen networks, Hopfield and Hamming feedback networks, dynamic associative memory networks, networks and algorithms based on adaptive resonance theory (ART), cognitrons, and neocognitrons. A comparison of neural network architectures is given in Table 1. To solve the problem of information security risk analysis, the use of a multilayer perceptron appears most appropriate. In [12] it was shown that a multilayer perceptron trained by the method of backward error propagation can be used to estimate the a posteriori probability. This issue, as well as the question of the optimality of such an assessment, will be discussed in detail below. The use of a multilayer perceptron is the simplest option for using neural networks to solve the problem of information security risk assessment [13]. It can be shown that, in the case of a smooth function with added Gaussian noise, a neural network with a "1-of-c" coding scheme trained to minimize the standard deviation will approximate the a posteriori probability.
However, to solve classification problems for the purpose of risk analysis, it is more relevant to analyze the solution for the "1-of-2" coding problem from the distribution.

Table 1
Comparison of neural network architectures

Method of learning | Rule of learning | Architecture | Algorithm of learning | Task
Supervised | Correction of error | One-layer and multilayer perceptron | Perceptron learning algorithms; backward propagation; Adaline and Madaline | Classification of images; approximation of functions
Supervised | Boltzmann | Recurrent | Boltzmann learning algorithm | Classification of images
Supervised | Hebb | Multilayer direct propagation | Linear discriminant analysis | Data analysis; classification of images
Supervised | Competition | Competition | Vector quantization | Within-class categorization; data compression
Supervised | ART | ART network | ARTMap | Classification of images
Unsupervised | Correction of error | Multilayer direct propagation | Sammon projection | Within-class categorization; data analysis
Unsupervised | Hebb | Direct propagation or comparison | Principal component analysis | Data analysis; data compression
Unsupervised | Hebb | Hopfield network | Learning of associative memory | Associative memory
Unsupervised | Competition | Competition | Vector quantization | Categorization; data compression
Unsupervised | Competition | Kohonen SOM | Kohonen SOM | Categorization; data analysis
Unsupervised | ART | ART1, ART2 networks | ART1, ART2 | Categorization
Mixed | Correction of error | RBF network | RBF learning algorithm | Classification of images; approximation of functions; prediction; control

In this case, for the activation of the network's neurons it is necessary to apply a sigmoid activation function of the form:

g(a) = 1 / (1 + e^(−a)). (1)

Then, given that all probability values by definition lie in the range [0, 1], the learning error function depends on the learning data and has the form:

E = − ∑_n ∑_{k=1..c} { d_k^n ln(y_k^n / d_k^n) + (1 − d_k^n) ln((1 − y_k^n) / (1 − d_k^n)) }, (2)

where y_k^n are the corresponding output values and d_k^n are the target values of training.
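Expressions (1) and (2) can be checked numerically. The sketch below is our own illustration (the 0 · ln 0 = 0 convention is assumed for target values equal to 0 or 1); note that the error (2) vanishes exactly when the outputs equal the targets:

```python
import numpy as np

def g(a):
    """Sigmoid activation function, expression (1)."""
    return 1.0 / (1.0 + np.exp(-a))

def error(y, d):
    """Cross-entropy learning error, expression (2), for outputs y and targets d in [0, 1]."""
    y, d = np.asarray(y, float), np.asarray(d, float)
    e = np.zeros_like(y)
    m = d > 0
    e[m] += d[m] * np.log(y[m] / d[m])          # first term; 0 when d = 0
    m = d < 1
    e[m] += (1 - d[m]) * np.log((1 - y[m]) / (1 - d[m]))  # second term; 0 when d = 1
    return -e.sum()

print(g(0.0))                         # 0.5
print(error([0.9, 0.1], [1, 0]))      # small positive: outputs close to targets
print(error([0.9, 0.1], [0.9, 0.1]))  # 0.0: the error vanishes when y = d
```

This property of (2) is what makes it a suitable objective for estimating probabilities rather than raw scores.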
The well-known method of backward error propagation can be used to train such a multilayer perceptron. This method of training a multilayer perceptron was first described by A. Galushkin and, independently and simultaneously, by P. J. Werbos. It was further substantially developed by D. E. Rumelhart, G. E. Hinton, and R. J. Williams, as well as, independently and simultaneously, by S. I. Bartsev and V. A. Okhonin. This is an iterative gradient algorithm that is used to minimize the error of the multilayer perceptron and obtain the desired output. The idea of the method is to propagate error signals from the outputs of the network to its inputs, in the direction opposite to the forward propagation of signals in the normal mode. Bartsev and Okhonin immediately proposed a general method (the "duality principle") applicable to a broader class of systems, including systems with delay, distributed systems, etc. For this method there is a proof of convergence, but it rests on the assumption of an infinitesimally small step of adjustment of the neuron weights. In real conditions the method does not always converge and has a number of drawbacks. P. D. Wasserman described an adaptive step-selection algorithm that automatically corrects the step size in the learning process. The essence of the method is to combine the Cauchy machine with the idea of the gradient descent of backward propagation, which allows the construction of a system that finds the global minimum while maintaining the high rate of backward propagation. More sophisticated neural network architectures can also be used to address information security risk analysis; their application is a direction for further research.

5. The Accuracy of the a Posteriori Probability Assessment with the Help of a Multilayer Perceptron

Let us consider the classification task once more. Let the space X ⊂ R^d, which consists of objects, be divided into m groups. We denote the fact that the object x belongs to group j as w_j.
Let P(w_j) be the a priori probability that a randomly selected object belongs to group j, and let f(x|w_j) be the conditional probability density of x within group j. Then the a posteriori probability is

P(w_j|x) = f(x, w_j) / f(x), (3)

f(x, w_j) = f(x|w_j) P(w_j), (4)

f(x) = ∑_{j=1..m} f(x, w_j). (5)

Let us consider x ∈ R^d and y ∈ R^m as random variables with joint probability density f_{x,y}(x, y). Under these conditions, we consider the problem of determining the mapping F: R^d → R^m such that

Minimize E[ ∑_{i=1..m} (y_i − F_i(x))^2 ], (6)

where E is the expected value over the joint distribution, and y_i and F_i(x) are the components of y and F(x), respectively. It is known that this problem has the solution

F(x) = E[y|x]. (7)

In the classification task, let x be the object and y the class-affiliation variable, defined, for example, so that y_i = 1 if the object x belongs to class i and y_i = 0 otherwise. Then

F_i(x) = E[y_i|x] = 1 · P(y_i = 1|x) + 0 · P(y_i = 0|x) = P(y_i = 1|x) = P(w_i|x). (8)

So F(x) is nothing but the a posteriori probability. Let

Q = E[ ∑_{i=1..m} (y_i − F_i(x))^2 ] = ∫∫ [ ∑_{i=1..m} (y_i − F_i(x))^2 ] f_{x,y}(x, y) dx dy. (9)

Using the dichotomous character of the variable y, the joint distribution f_{x,y}(x, y) can be expressed through f(x, w), where w is the class-affiliation vector. In particular,

f(x, y) = [ f(x, w_1), f(x, w_2), ..., f(x, w_m) ]. (10)

And the value Q, respectively, can be expressed as

Q = ∫ ∑_{j=1..m} [ ∑_{i=1..m} (y_i − F_i(x))^2 ] f(x, w_j) dx
= ∑_{j=1..m} ∫ [ ∑_{i=1..m} (y_i − F_i(x))^2 ] f(x, w_j) dx
= ∑_{j=1..m} ∫ [ (1 − F_j(x))^2 + ∑_{i≠j} F_i^2(x) ] f(x, w_j) dx, (11)

which after transformations gives us

Q = ∫ ∑_{i=1..m} [ F_i^2(x) − (2F_i(x) − 1) P(w_i|x) ] f(x) dx
= ∫ ∑_{i=1..m} [ (F_i(x) − P(w_i|x))^2 + P(w_i|x)(1 − P(w_i|x)) ] f(x) dx. (12)

Let us define

σ_A^2 = ∫ ∑_{i=1..m} P(w_i|x)(1 − P(w_i|x)) f(x) dx, (13)

σ_ε^2 = ∫ ∑_{i=1..m} (F_i(x) − P(w_i|x))^2 f(x) dx. (14)

Then Q = σ_A^2 + σ_ε^2, where σ_A can be interpreted as the approximation error and σ_ε as the error of the evaluation.

Now we consider a neural network with forward propagation of the signal, receiving the object x at the input. We denote the connection of neuron i to neuron j as (i, j) and the weight of that connection as w_ij. The output of neuron i we denote as a_i = F_i(net_i), where F_i is the activation function of the neuron and net_i its weighted input. This network can be considered as a mapping F: R^d → R^m, where d is the dimension of the input vector and m is the dimension of the output vector. The weights of the neural network are determined by the learning outcomes. Neural network learning can be seen as the error-minimization problem

Minimize (1/L) ∑_{l=1..L} ∑_{i∈N_o} (y_i^l − a_i^l)^2, (15)

where L is the number of learning vectors, N_o is the set of output neurons, and y_i^l is the target value (the output value according to the training sample element); in the classification problem y_i^l = 1 if the object belongs to the class and y_i^l = 0 if not. A comparison of this expression with expression (6) shows that the problem of learning a neural network with forward signal propagation (a multilayer perceptron) is the same least-squares problem, where the expected value E is calculated over the learning set. Accordingly, the question of how accurate the estimate obtained by the neural network is becomes of interest. We denote by I_m the m-dimensional unit hypercube [0, 1]^m and by C(I_m) the space of continuous functions on I_m. The following theorem was proved by Cybenko [14]:

Theorem. Let φ(y) be a bounded, monotonically increasing real function other than a constant. Then for any function f ∈ C(I_m) and for any ε > 0 there exist an integer N and corresponding sets of constants α_i, b_i ∈ R, w_i ∈ R^m, i = 1, ..., N, such that the approximating function

F(x) = ∑_{i=1..N} α_i φ(w_i^T x + b_i) (16)

has the property |F(x) − f(x)| < ε for all x ∈ I_m.
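The theorem is nonconstructive, but a form like (16) is easy to fit numerically. The sketch below is our own illustration (not from [14]): it draws random hidden parameters w_i, b_i and then solves for the coefficients α_i by linear least squares, approximating a smooth one-dimensional function by a sum of sigmoids.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Target function on [0, 1] and sample points
x = np.linspace(0.0, 1.0, 200)
f = np.sin(2 * np.pi * x)

# Random hidden parameters w_i, b_i; coefficients alpha_i found by least squares, as in (16)
N = 100
w = rng.normal(0.0, 10.0, N)
b = rng.normal(0.0, 10.0, N)
Phi = np.column_stack([sigmoid(np.outer(x, w) + b), np.ones_like(x)])  # plus a bias column
alpha, *_ = np.linalg.lstsq(Phi, f, rcond=None)
F = Phi @ alpha

print(np.max(np.abs(F - f)))  # uniform error of the fitted sum of sigmoids
```

With enough hidden units the uniform error shrinks, in line with the theorem; by construction the least-squares fit is never worse than the best constant approximation.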
Thus the neural network, being a superposition of sigmoid functions, can be considered a universal approximator. It also follows from the above theorem that any continuous function can be represented by an artificial neural network with one hidden layer. It can also be proved that any neural network with more than one hidden layer can be represented in the form of a neural network with one hidden layer.

To fully describe the neural network, it is necessary to determine the number of neurons in the hidden layer, which in the general case is not an obvious task. The accuracy of approximation increases with the number of neurons in the hidden layer: with H neurons the approximation error is estimated as O(1/H). However, an increase in the number of neurons in the hidden layer leads to the so-called overtraining of the neural network [14]. Thus, the structure of the neural network significantly affects the quality of information processing: an insufficient or excessively large number of neurons in the hidden layer reduces its efficiency. In solving practical problems, one usually resorts to an empirical choice of network parameters. In the event that the operating conditions of the network are unknown, it is necessary to use an algorithm that ensures the selection of optimal parameters. In [14], algorithms of structural-parametric optimization were obtained for multilayer neural networks with serial connections and for RBF networks. These algorithms are based on cross-validation procedures, which represent one type of resampling and are practically independent of a priori assumptions. Consistent use of a priori information in the training of neural networks is enabled by the Bayesian methodology, which represents a single conceptual basis for the formation of the objective functions of both parametric and structural optimization.
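A minimal sketch of such a cross-validation procedure for choosing the hidden-layer size H is given below. It is our own illustration of the idea, not the algorithm of [14]: a random-sigmoid layer with least-squares output weights stands in for a trained perceptron, and 4-fold cross-validation scores each candidate H.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_predict(x_tr, y_tr, x_va, H):
    """Random hidden layer of size H; output weights fitted by least squares."""
    w = rng.normal(0, 5, H)
    b = rng.normal(0, 5, H)
    Phi = lambda x: np.column_stack([sigmoid(np.outer(x, w) + b), np.ones_like(x)])
    alpha, *_ = np.linalg.lstsq(Phi(x_tr), y_tr, rcond=None)
    return Phi(x_va) @ alpha

# Noisy observations of a smooth target
x = rng.uniform(0, 1, 120)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)

# 4-fold cross-validation over candidate hidden-layer sizes
folds = np.array_split(rng.permutation(x.size), 4)
candidates = [2, 5, 10, 20, 40]
cv_error = {}
for H in candidates:
    errs = []
    for va in folds:
        tr = np.setdiff1d(np.arange(x.size), va)
        pred = fit_predict(x[tr], y[tr], x[va], H)
        errs.append(np.mean((pred - y[va]) ** 2))
    cv_error[H] = np.mean(errs)

best_H = min(cv_error, key=cv_error.get)
print(cv_error, best_H)  # validation error per H; too-small and too-large H score worse
```

The held-out error, not the training error, drives the choice, which is what makes the procedure robust to overtraining.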
The development of constructive irreversible algorithms for finding the optimal structure of neural networks remains an urgent task.

6. Algorithm of Dynamic Iterative Risk Assessment of Information Security

We give a brief description of the logic of the algorithm of dynamic iterative risk assessment of information security using the proposed approach; the algorithm is described below in the form of block diagrams. In Fig. 4 the basic block diagram of the algorithm of the system of iterative dynamic assessment of information security risks is given. When the algorithm is initialized, it is checked whether the neural network has been trained. If the network is not trained, it is trained in accordance with the chosen algorithm. If the network is properly trained, the input and preparation of data from sensors and of data provided by technical experts are carried out. At the next stage of the algorithm, the calculation of the a posteriori values of the probabilities of the realization of threats, that is, of the output values of the network, is performed. Then the current data on the value of the assets exposed to information security risks are obtained. At the final stage of the algorithm, the calculation of the final information security risk assessment is made. The operation of the algorithm then resumes after a certain period of time, unless a command to terminate it is received. Next, we consider the block diagrams of the algorithms for network training and for calculating its output values. Before this, let us look at the three-layer neural network with weights W_ij pictured in Fig. 5.
Figure 4: Block diagram of the algorithm of iterative dynamic risk assessment

Let us denote as O_i the output of the i-th node. The purpose of the training is such an adjustment of the weights that for a given input vector the required output vector is obtained at the output of the system. It is assumed that for each input vector there is a corresponding output vector; such a pair of vectors is called a training pair, and a set of such pairs is required for training.

Figure 5: Scheme of the multilayer perceptron (training)

Before the start of training, small initial values selected at random should be assigned to all weights. This choice ensures that the network will not be saturated by large values of the weights, and prevents a number of other cases that can lead to errors. The algorithm for training the backpropagation network is as follows:

Step 1. Assign small random values to the weights.
Step 2. Choose a random pair of vectors from the learning set.
Step 3. Calculate the output of the neural network.
Step 4. Compare the resulting output vector with the target training vector and determine the value of the error. If this value is not acceptable, adjust the values of the weights and go to Step 3.
Step 5. Check whether there remains a pair of vectors not used in previous iterations; if so, return to Step 2.
Step 6. The end.

The block diagram of the algorithm for training the backpropagation network is presented in Fig. 6. Adjustment of the weights is carried out starting from the last layer. For a node of the last layer:

δ_j = −o_j (1 − o_j)(t_j − o_j). (17)
Figure 6: Block diagram of the algorithm for training a backpropagation network

For an internal network node:

δ_j = −o_j (1 − o_j) ∑_{k∈Outputs(j)} δ_k w_{j,k}. (18)

For each edge {i, j} of the network:

Δw_{i,j} = α Δw_{i,j} + (1 − α) η δ_j o_j, (19)

w_{i,j} = w_{i,j} + Δw_{i,j}, (20)

where 0 < η < 1 is a multiplier that specifies the speed of "movement" and α is the momentum coefficient. Such adjustment is carried out for all layers with connection weights.

7. Computational Complexity of the Algorithm of Dynamic Iterative Risk Assessment

The algorithm described above must be executed for all threats. The final complexity of the whole algorithm depends linearly on the number of threats, since for each threat a separate neural network is formed whose structure does not depend on the type of threat. The computational complexity of the algorithm for one threat depends on the complexity of training the neural network and on the number of neurons in it, and does not depend on the input data. In the event that the network is already trained, the computational cost per evaluation is constant, and overall it depends linearly on the volume of input data. Training of the network depends only on the volume of the learning vectors and on the convergence of the algorithm in finite time. That is, in the case of convergence of the algorithm, the complexity of obtaining an assessment and the complexity of learning depend linearly on the volume of input data. The computational complexity T has the form:

T = O(n), (21)

where n is the number of input vectors. The convergence of the backward error propagation algorithm can be proved only for an infinitesimally small step of weight change, which leads to an unboundedly large running time of the algorithm. The method of adaptive step choice proposed by P. D. Wasserman solves this problem and ensures the convergence of the algorithm to the global minimum in a finite number of steps. In addition, one can artificially limit the number of iterations, which bounds the duration of the learning process.
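The training procedure of Steps 1 through 6 together with the update rules (17)-(20) can be sketched as follows. This is a minimal single-hidden-layer illustration of ours, not the authors' implementation: the signs of δ are chosen so that the update descends the error surface, the updates are applied in batch, and the XOR data set is only an example training set.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Example training pairs (XOR); Step 1: small random initial weights
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)
dW1 = np.zeros_like(W1); dW2 = np.zeros_like(W2)
eta, alpha = 0.5, 0.3  # step size eta and momentum coefficient alpha, as in (19)

def forward(X):
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

h, o = forward(X)
initial_error = np.mean((T - o) ** 2)
for _ in range(5000):
    h, o = forward(X)
    delta_o = o * (1 - o) * (T - o)           # output-layer delta, cf. (17)
    delta_h = h * (1 - h) * (delta_o @ W2.T)  # hidden-layer delta, cf. (18)
    dW2 = alpha * dW2 + (1 - alpha) * eta * h.T @ delta_o  # momentum update, cf. (19)
    dW1 = alpha * dW1 + (1 - alpha) * eta * X.T @ delta_h
    W2 += dW2; b2 += (1 - alpha) * eta * delta_o.sum(0)    # weight correction, cf. (20)
    W1 += dW1; b1 += (1 - alpha) * eta * delta_h.sum(0)

h, o = forward(X)
final_error = np.mean((T - o) ** 2)
print(initial_error, final_error)  # the error decreases over training
```

The per-iteration work here is a fixed number of matrix operations whose size depends only on the network structure, which is what makes the per-evaluation cost constant once the network is trained.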
8. Conclusions

The architecture of a system of iterative dynamic risk assessment using the Bayesian approach based on neural networks has been described in detail. The choice of network structure, types of neurons, and learning algorithms has been substantiated. Neural networks have the following properties:
• They allow an assessment of the a posteriori value of the probability to be obtained directly.
• They can learn in the process of system operation.
• They ensure automation of the process of accounting for threats and their assessment.
• They allow aggregation of quantitative and qualitative data as well as automation of the process of obtaining assessments.
• They provide formalization of changes in the structure of the AS.
• They ensure the creation of assessments that are gradually refined in the course of work with the system.

9. References

[1] L. V. Utkin, http://www.levvu.narod.ru/Papers/Bayes.pdf
[2] Dreyfus, Neural Networks: Methodology and Applications, Birkhauser, 2015.
[3] D. Berestov, et al., Analysis of features and prospects of application of dynamic iterative assessment of information security risks, in: Workshop on Cybersecurity Providing in Information and Telecommunication Systems (CPITS), vol. 2923, 329–335, 2021.
[4] H. Shevchenko, et al., Information security risk analysis SWOT, in: Workshop on Cybersecurity Providing in Information and Telecommunication Systems (CPITS), vol. 2923, 309–317, 2021.
[5] Y. Lifshits, Automatic Text Classification (slides), Algorithms for the Internet.
[6] A. B. Merkov, Basic Methods Used for Handwriting Recognition, 2015.
[7] W. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity, in: Avtomaty, Izd. foreign lit., 362–384, 1956.
[8] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Mir, 1965.
[9] M. Minsky, S. Papert, Perceptrons, Mir, 1971.
[10] V. Buriachok, V. Sokolov, P. Skladannyi, Security rating metrics for distributed wireless systems, in: Workshop of the 8th International Conference on "Mathematics. Information Technologies. Education": Modern Machine Learning Technologies and Data Science (MoMLeT and DS), vol. 2386, 222–233, 2019.
[11] A. B. Barsky, Neural Networks: Recognition, Control, Decision Making, Finance and Statistics, 2014, 176 p.
[12] GOST 19.701-90, Schemes of Algorithms, Programs, Data and Systems: Conventions and Execution Rules.
[13] G. E. Yakhyaeva, Fuzzy Sets, Neural Networks: Textbook, 2nd ed., rev., Internet University of Information Technologies; Binomial Knowledge Lab, 2010.
[14] V. R. Milov, Training of neural RBF-networks based on structural-parametric optimization procedures, Neurocomputers: Development and Application, 5, 29–33, 2003.