Fractal structure of training of a three-layer neural network

Bohdan Melnyk†, Serhiy Sveleba†, Ivan Katerynchuk∗,†, Ivan Kuno† and Volodymyr Franiv†

Ivan Franko National University of Lviv, Universytetska St. 1, 79000 Lviv, Ukraine

Abstract
This work studies the fractal structure of the training of a multilayer neural network. The number of neurons in the input and hidden layers corresponded to the size of the input array. The software was implemented in Python and recognizes printed digits. The training sample for printed digits comprised 5 representations of the digit plus 4 variants with a digit distortion of ≈15% for the 3x5 digit array and ≈10% for the 4x7 digit array. The heterogeneous sample comprised 8 variants, 3 of which did not correspond to any digit. The fractal structure was studied in three training modes of the multilayer neural network: undertraining, satisfactory training, and retraining. It was established that the fractal structure appears because of the retraining of neurons. Retraining of neurons causes local minima to appear on the objective function of the training error, which increases the error in the value of the correction function of the training weights. The transition of the neural network from the retraining mode to the chaotic mode is governed by the doubling of the number of local minima on the objective function of the training error. Since the retraining and chaotic regimes have the same cause, the fractal structure formed in them is similar. Non-homogeneity of the input array negatively affects the formation of the fractal structure of the training process.

Keywords
multilayer neural network, fractal structure, Adam optimization method

MoMLeT-2024: 6th International Workshop on Modern Machine Learning Technologies, May 31 - June 1, 2024, Lviv-Shatsk, Ukraine
∗ Corresponding author.
† These authors contributed equally.
bohdan.melnyk@lnu.edu.ua (B. Melnyk); serhiy.sveleba@lnu.edu.ua (S. Sveleba); ivan.katerynchuk@lnu.edu.ua (I. Katerynchuk); ivan.kuno@lnu.edu.ua (I. Kuno); volodymyr.franiv@lnu.edu.ua (V. Franiv)
0000-0001-6399-6317 (B. Melnyk); 0000-0002-0823-910X (S. Sveleba); 0000-0001-8877-8324 (I. Katerynchuk); 0000-0001-6092-7949 (I. Kuno); 0000-0001-9856-1962 (V. Franiv)

1. Introduction

The objective function of the training error of a neuron is formed by the contributions of the neurons of the previous layer [1]. This way of forming the neural network training error indicates that the objective function of the training error should be considered as a set of periodic functions that determine the existence of the neural network training modes, namely undertraining, satisfactory training, and retraining. It is known [2] that the retraining process is associated with the appearance of local minima on the objective error function and causes an increase in the training error. According to [2], the appearance of local minima is described, in a first approximation, by a logistic function that doubles their number as the training step increases.
Doubling of the number of existing local minima ultimately leads to a chaotic mode of neural network training. In [2], the process of training a neural network is compared with the dynamics of an incommensurate superstructure. In particular, that work noted that the sinusoidal regime of an incommensurate superstructure is characterized by the absence of harmonics. An increase in the magnitude of the anisotropic interaction is accompanied by the appearance of harmonics of the incommensurate superstructure, which leads to its soliton regime. A further increase in the anisotropic interaction causes the appearance of a block structure characterized by different periodicities, and the average value of the wave vector of the incommensurate superstructure for a given ensemble can take an incommensurate value. At the same time, the formation of a chaotic phase can be traced.

A similar picture can be traced in the process of training a neural network. With an increase in the training step, in the mode of retraining of individual neurons, the appearance of local minima can be traced. The increase in the number of local minima is described, in a first approximation, by the doubling of their number as the training step increases. This process is described by a recurrence map of the form

x_{n+1} = α - x_n - x_n^2

This is confirmed by the results of the Fourier analysis of the objective function of the training error and by the appearance of branching diagrams [2]. The fractality of an incommensurate superstructure is determined both by the appearance of harmonics and by the nucleation and annihilation of solitons; a number of works are devoted to the analysis of its fractality. The fractality of the neural network training process may likewise be determined by the appearance of local minima on the objective function of the training error; that is, the process of retraining a neural network may be fractal in nature. Thus, a study of the fractal structure of the training error function in different modes of neural network training will either confirm the known mechanisms of training-error formation or identify new ones. The study of fractal structure in stochastic systems is important for understanding and predicting their dynamics. It can help explain complex structures and interactions in chaotic systems, as well as develop effective methods for modeling and controlling such systems. Therefore, the task of this work is to establish a picture of the neural network training process and the features of the formation of the training error in the retraining mode.

2. Methodology

Since the appearance of local minima is described by the doubling of their number on the training error function, fractality is visualized with the help of a logistic-type map that describes this doubling process, that is, a complex map of the form

z_{n+1} = -z_n - z_n^2

where z_n is a complex quantity whose real part is the training step (alpha) and whose imaginary part is the value of the weight correction function (w). The fractal structure was imaged in the coordinates of the weight correction value w versus the training step alpha; the speed with which points that do not belong to the solution of this system move away from it is represented by different colors in the figures.
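The paper does not include the rendering code; the following Python sketch illustrates one way such an escape-time image could be produced under the assumptions just stated. The starting point z_0 = alpha + i·w, the escape radius of 2, the plotted w range, and the function names escape_time and render_fractal are choices of this sketch rather than details taken from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

def escape_time(alpha, w, max_iter=100, radius=2.0):
    """Iterate z_{n+1} = -z_n - z_n**2 from z_0 = alpha + i*w and return the
    iteration at which |z| exceeds `radius` (0 if it stays bounded)."""
    z = complex(alpha, w)
    for n in range(1, max_iter + 1):
        z = -z - z * z
        if abs(z) > radius:
            return n      # "speed of moving away" -> color index
    return 0              # treated as belonging to the solution set

def render_fractal(alpha_range=(0.1, 0.7), w_range=(-1.0, 1.0),
                   size=(400, 400), max_iter=100):
    """Color every point of the (alpha, w) plane by its escape time."""
    alphas = np.linspace(*alpha_range, size[0])
    ws = np.linspace(*w_range, size[1])
    img = np.array([[escape_time(a, w, max_iter) for a in alphas] for w in ws])
    plt.imshow(img, origin="lower", aspect="auto",
               extent=(alpha_range[0], alpha_range[1], w_range[0], w_range[1]))
    plt.xlabel("training step alpha (Re z)")
    plt.ylabel("weight correction w (Im z)")
    plt.show()
    return img

if __name__ == "__main__":
    render_fractal(max_iter=100)   # cf. the 100-iteration images discussed below
```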
The dynamics of the fractal structure was investigated as a function of the training parameters of the neural network, in particular the number of iterations, the dimension of the input array and its homogeneity, the training step, and the parameters of the optimization of the training process.

The study of the fractal structure was performed for a multilayer neural network. The number of neurons in the input and hidden layers corresponded to the size of the input array. The program was written in Python and performed the recognition of printed digits. The training data for printed digits were 5 representations of the digit plus 4 variants with a digit distortion of ≈15% for the 3x5 digit array and ≈10% for the 4x7 digit array. We used the Adam training optimization method [3], which is characterized by a monotonic process of training the neural network. On the basis of this architecture, the fractal structure was studied with this optimization method applied to the objective function of the training error. When studying the dependence of the fractal structure on the training parameters of the neural network, in particular on the number of iterations, the size of the input array and its homogeneity, and the training step, the optimization parameters of this method were selected according to [4] and were β1 = 0.9 and β2 = 0.999.
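For reference, the Adam update rule [3] applied to the training-error objective can be written compactly as in the sketch below; β1 = 0.9 and β2 = 0.999 are the values used in the experiments, while the variable names, the dummy quadratic loss, and the 15-component weight vector (one 3x5 input) are illustrative only.

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update [3]: m and v are running estimates of the first and second
    moments of the gradient, t is the 1-based iteration index, alpha is the training step."""
    m = beta1 * m + (1 - beta1) * grad           # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Illustrative usage on a dummy quadratic loss L(w) = 0.5 * ||w||^2 (not the paper's network).
w = np.ones(15)                                  # e.g. the weights of one neuron fed by a 3x5 input
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 101):                          # 100 iterations
    grad = w                                     # dL/dw for the dummy loss
    w, m, v = adam_step(w, grad, m, v, t)
print(np.linalg.norm(w))                         # weight norm after 100 Adam iterations
```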
3. Homogeneous training data

According to the Fourier spectra of the training error function, the retraining process begins to be traced in the vicinity of the training step alpha = 0.45. Therefore, let us consider the formation of the fractal structure when the training step alpha changes in the range 0.1 - 0.7. Fig. 1 shows the fractal structure as a function of the number of iterations, with the Adam training optimization method applied with the optimization parameters β1 = 0.9 and β2 = 0.999. At 10, 100, and 500 iterations, the image of the fractal structure for the digit "0" (Fig. 1) in the retraining mode of the neural network demonstrates a complex boundary that gradually reveals smaller and smaller recursive details when zoomed in. The boundary of the set is made up of smaller versions of the basic form, so the fractal property of self-similarity refers to the entire set, not just a part of it. With an increase in the number of iterations, the fractal picture changes: smaller recursive details appear. It is known [5] that with an increase in the number of iterations the retraining process can be traced, which may explain the change in the picture of fractality when the number of iterations changes. Similar fractal structures were obtained for the other printed digits. Thus, the presence of a fractal structure indicates that in the retraining mode of the neural network, taking into account the Fourier spectra of the error function, the number of local minima increases, and in a first approximation this process is described by the doubling of their number. The mechanism of transition to the chaotic mode of neural network training is described by the doubling of the number of local minima.

Figure 1: Fractal structure of training of a three-layer neural network when recognizing the printed digit "0" given by a 3x5 array, depending on the training step, with the number of iterations: a) 10, b) 100, c) 500.

It is known that for this multilayer neural network, according to the Fourier spectra, the retraining process begins to be traced in the vicinity of the training step 0.45. With a further increase in the training step, the chaotic training mode of the neural network begins to manifest itself. In order to identify the manifestation of the neural network retraining process in the fractal structure, the dependence of the fractal structure on the size of the training step was studied. Fig. 2 shows the fractal structure when the training step alpha changes. Starting from alpha = 0.3, the recurrence system, which describes the magnitude of the training error as a function of the training step, demonstrates the absence of a solution. At alpha = 0.4 the system has a single solution and is characterized by an almost complete absence of the retraining process; that is, this mode demonstrates a satisfactory training process. A further change in the training step leads to the appearance of two, four, and so on stable solutions, followed by a transition to the mode described by the doubling process and then to a chaotic training mode. Under these conditions, the training mode is described by a fractal structure (alpha = 0.4÷0.7). According to the fractal structure obtained in Fig. 2, the chaotic training mode of the neural network is described by the appearance of additional small details of different levels. Thus, the transition from the retraining mode to the chaotic training mode of the neural network is accompanied by an increase in the number of local minima and, therefore, by the appearance of additional small details of the fractal structure. Similar results were obtained for the other digits.

Figure 2: Fractal structure of training of a three-layer neural network when recognizing the printed digit "0" given by a 3x5 array, depending on the training step: a) 0.2, b) 0.3, c) 0.4, d) 0.5, e) 0.6, g) 0.7; 100 iterations.

The key quantity that describes a fractal quantitatively is the "fractal dimension". However, different sources understand this term as different quantities: the Minkowski dimension, the Hausdorff-Besicovitch dimension, the self-similarity dimension. The Hausdorff-Besicovitch dimension DH is obtained by dividing an object into parts of size r and counting the number N(r) of parts covering the object under study [6]. Fig. 3 shows the fractal dimension calculated by the Hausdorff-Besicovitch method as the training step changes. The dependence of the fractal dimension on the training step also indicates the emergence of a satisfactory training mode of the neural network in the vicinity of alpha = 0.3 and the appearance of the retraining process at alpha > 0.3. A further increase in the training step leads to an increase in the fractal dimension, which indicates the transition to a chaotic mode [7] of neural network training. Similar dependences were obtained for the other digits.
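The paper does not list the dimension-estimation code; the sketch below shows a standard box-counting estimate consistent with the description above: cover the object with boxes of side r, count N(r), and take the slope of log N(r) against log(1/r). The binarization of the escape-time image and the set of box sizes are assumptions of this sketch.

```python
import numpy as np

def box_counting_dimension(mask, box_sizes=(2, 4, 8, 16, 32, 64)):
    """Estimate the Hausdorff-Besicovitch (box-counting) dimension of a 2-D boolean mask:
    count boxes of side r containing at least one marked pixel, then fit log N(r) vs log(1/r)."""
    counts = []
    for r in box_sizes:
        n = 0
        for i in range(0, mask.shape[0], r):
            for j in range(0, mask.shape[1], r):
                if mask[i:i + r, j:j + r].any():
                    n += 1
        counts.append(n)
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(box_sizes)), np.log(counts), 1)
    return slope

# Illustrative usage: binarize an escape-time image `img` (e.g. from the rendering sketch above)
# and estimate the dimension of the marked region; the threshold img > 0 is an assumed choice.
img = np.random.randint(0, 100, size=(512, 512))     # placeholder for a rendered image
print(box_counting_dimension(img > 0))
```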
It is known that an increase in the number of iterations can also lead to retraining of the neural network. To confirm this statement, let us consider the effect of the number of iterations on the shape of the fractal structure. For this purpose, the impact of the number of iterations on the neural network training process was studied at the training step alpha = 0.3. The training step was chosen close to the value at which the retraining of the neurons of the neural network begins to be traced. That is, the fractal structure was investigated at the training step alpha = 0.3 as the number of iterations increased. At a given value of the training step of a multilayer neural network, a certain number of iterations is required to reach the retraining mode. Fig. 4 shows the fractal structure for the digit "0" given by a 3x5 array at alpha = 0.3, when the number of iterations changes and the Adam optimization method is applied to the training error function.

Figure 3: Dependence of the fractal dimension on the training step alpha under the condition of 100 iterations, for the digit "0" given by a 3x5 array.

According to Fig. 4, at 10 iterations the undertraining mode can be traced. At 100 iterations a satisfactory training mode can be traced, with the emergence of a retraining mode. At 500 iterations and above, the retraining mode of the neural network, with the formation of a fractal structure, is clearly manifested. Thus, the emergence of the fractal structure is due to the process of retraining of the neural network. Similar dependences of the fractal structure on the number of iterations were obtained for the other digits. That is, with an increase in the number of iterations the formation of a fractal structure can be traced, which indicates that, starting from a certain number of iterations, the retraining process can be traced, and this process is associated with the emergence of local minima and the doubling of their number.

Fig. 5 shows the dependence of the fractal dimension calculated by the Hausdorff-Besicovitch method on the number of iterations at the training step alpha = 0.3, for the digit "0" given by a 3x5 array. According to Fig. 5, the dependence of the fractal dimension on the number of iterations has a minimum at 100 iterations and then grows with a further increase in the number of iterations. Taking into account the results given in Fig. 4, in the vicinity of 100 iterations the system is characterized by a satisfactory training mode without retraining of the neural network. With a further increase in the number of iterations, the neurons are retrained, with the formation of a fractal structure (Fig. 4) and an increase in the fractal dimension (Fig. 5).

Figure 4: Fractal structure of training of a three-layer neural network when recognizing the printed digit "0" given by a 3x5 array, depending on the number of iterations: a) 10, b) 100, c) 500, d) 1000, e) 5000, g) 10000; the maximum training step alpha = 0.3.

Figure 5: Dependence of the fractal dimension on the number of iterations, alpha = 0.3, for the digit "0" given by a 3x5 array.

4. Heterogeneous training data

Let us consider the influence of the parameter β2 on the formation of the fractal structure. This parameter characterizes the degree of attenuation of the previous values of the squared gradient of the objective function of the training error; according to [10], it is decisive in the process of training a neural network, and its optimal value is 0.999.
Fig. 6 shows pictures of the fractal structure when the optimization parameter β2 changes in the range 0.1÷0.9999. The fractal structures obtained in Fig. 6 do not undergo a significant change when the β2 parameter changes, although it should be noted that with an increase in the value of β2 the picture of the fractal structure becomes richer in smaller fragments. The values of the fractal dimension given in Table 1 also demonstrate this pattern; that is, there are no qualitative changes in the vicinity of β2 = 0.999. It is possible that the tendencies of changes in the training error with the value of β2 that were reported in [10] may manifest themselves for larger input arrays or for more heterogeneous arrays.

Table 1. Dependence of the fractal dimension on the value of the optimization parameter β2 of the Adam optimization method with the optimization parameter β1 = 0.9, provided that the maximum training step is alpha = 0.5, for the digit "0" given by a 3x5 array.

β2        Hausdorff dimension
0.1       1.99584
0.9       1.99548
0.99      1.99628
0.999     1.99584
0.9999    1.99592

The dynamics of the fractal structure for the heterogeneous input array (heterogeneity ≈40%) as a function of the training step (Fig. 7) is similar to that for the array with inhomogeneity ≈15% (Fig. 2). In the vicinity of alpha = 0.3, a uniform training process can be traced, with almost no retraining of the neural network. A further increase in the training step causes the emergence of a retraining mode, which subsequently passes into a chaotic training mode. According to the Fourier studies of the spectra of the training error function [2], already at alpha > 0.5 the emergence of a chaotic training mode of the neural network can be traced. According to Fig. 8, the fractal structure does not show changes when the training mode changes from retraining to chaotic.

Figure 6: Fractal structure of training of a three-layer neural network in recognition of printed digits depending on the value of β2: a) 0.1, b) 0.9, c) 0.99, d) 0.999, e) 0.9999, provided that the maximum training step is alpha = 0.5, 100 iterations, for the digit "0" given by a 3x5 array, with heterogeneity of the input array > 15%.

However, the magnitude of the fractal dimension increases in this range of training steps, which indicates an increase in the randomness of the training mode at alpha > 0.4. It is possible that the transition to a chaotic training mode is associated with a change in the smaller-scale fractal structure. To confirm or refute this assumption, let us consider the influence on the fractal structure of such training parameters as the number of iterations and the optimization parameter β2. An increase in the number of iterations at first leads to a transition to the retraining mode of the neural network (100 iterations), and subsequently to a chaotic training mode. The transition from the retraining mode to the chaotic mode is accompanied by an increase in the fine structure of the lower (first and second) levels.

Figure 7: Fractal structure of training of a three-layer neural network when recognizing printed digits given by a 3x5 array, depending on the training step: a) 0.1, b) 0.2, c) 0.33, d) 0.5, e) 0.6, g) 0.7; 100 iterations, for the digit "0" and heterogeneity of the input array ≈40%.
Comparing the fractal structure obtained for an input-array non-homogeneity of ≈40% with that obtained for a non-homogeneity of ≈15% (Fig. 4), it can be argued that an increase in the heterogeneity of the input array leads to a decrease in the number of small details of the fractal structure.

Figure 8: Dependence of the fractal dimension on the training step alpha under the condition of 100 iterations, for the digit "0" given by a 3x5 array, with non-homogeneity of the input array ≈40%.

Figure 9: Fractal structure of training of a three-layer neural network when recognizing printed digits given by a 3x5 array, depending on the number of iterations: a) 10, b) 100, c) 500; the maximum training step alpha = 0.3, for the digit "0" and heterogeneity of the input array ≈40%.

It is known [8] that the parameter β2 determines the degree of attenuation of the previous values of the squared gradient of the objective function of the training error, and that the training error is minimal at β2 = 0.999. Fig. 10 shows the fractal structures at different values of the optimization parameter β2, provided that the maximum training step is alpha = 0.5, 100 iterations, for the digit "0" and an input-array non-homogeneity of ≈40%. When the parameter β2 changes in the range 0.1÷0.99, no changes in the fractal structure are observed. At β2 = 0.999, a change in the small details of the fractal structure begins to be traced: the spatial areas of their existence begin to increase. A further change in the β2 parameter leads to a more pronounced picture of such changes. The fractal dimension calculated by the Hausdorff-Besicovitch method (Table 2) shows similar dynamics as a function of the optimization parameter β2: in the interval β2 = 0.1÷0.999 the fractal dimension decreases, reaching its smallest value at β2 = 0.999, and a further change in β2 leads to a sharp increase in the fractal dimension.

Table 2. Dependence of the fractal dimension on the value of the optimization parameter β2 of the Adam optimization method with the optimization parameter β1 = 0.9, provided that the maximum training step is alpha = 0.5, for the digit "0" given by a 3x5 array with inhomogeneity of the input array ≈40%.

β2        Hausdorff dimension
0.1       1.99689
0.9       1.99670
0.99      1.99608
0.999     1.99539
0.9999    1.99608

Figure 10: Fractal structure of training of a three-layer neural network when recognizing printed digits given by a 3x5 array, depending on the value of β2: a) 0.1, b) 0.9, c) 0.99, d) 0.999, e) 0.9999, provided that the maximum training step is alpha = 0.5, 100 iterations, for the digit "0" and heterogeneity of the input array ≈40%.

Let us consider the effect of the size of the input array on the fractal structure of the neural network. Fig. 11 shows the fractal structure for different numbers of iterations for the 4x7 digit array, with sample heterogeneity ≈10% (Fig. 11a) and ≈40% (Fig. 11b), for the training step range alpha = 0.01÷0.7 and the optimization parameter β2 = 0.999. An increase in the number of iterations is accompanied by a slight change in the fractal structure due to its shift along the real axis. The real axis in our case corresponds to the change in the training step.
Therefore, a shift to the region of higher values of the real part may indicate a redistribution of the contributions of the particular modes of neural network training. As noted, with an increase in the number of iterations the role of the retraining mode, and subsequently of the chaotic training mode, increases. An increase in sample heterogeneity (≈40%) does not lead to a change in the overall picture of the fractal structure (Fig. 11b). As noted above, an increase in the heterogeneity of the digit sample leads to a decrease in the changes of the fractal pattern of the structure. Comparing the samples with heterogeneity of ≈10% and ≈40%, a similar pattern can be noted.

Figure 11: Fractal structure of training of a three-layer neural network when recognizing printed digits depending on the training step, with the digit represented by a 4x7 array, with heterogeneity of the input array ≈10% (a) and ≈40% (b), for the digit "0"; panels a) correspond to 10, 100, 500, and 1000 iterations, panels b) to 10, 100, 1000, and 5000 iterations.

Taking into account the above dependences of the fractal structure on the training parameters (the number of iterations, the training step, and the optimization parameter β2), a common pattern can be noted. An increase in the values of the training parameters causes a shift of the fractal-structure picture toward the interval of higher values of the real part of its representation. In our opinion, this indicates that the retraining mode of the neural network and the chaotic mode are equally involved in the formation of the fractal structure. This is not surprising, since the causes of the retraining mode and the chaotic mode are the same. If there are differences between the fractal structures that describe the retraining mode and the chaotic mode, they are related to fine details.

5. Conclusions

Summarizing the above, it can be noted that the fractal structure of the neural network training process is due to the retraining of neurons. Retraining of neurons causes the appearance of local minima on the objective function of the training error. This leads to an increase in the error in the formation of the value of the correction function of the training weights. The transition of the neural network from the retraining mode to the chaotic mode is due to the doubling of the number of local minima on the objective function of the training error. Since the causes of the retraining mode and the chaotic mode are the same, the retraining mode of the neural network and the chaotic mode are equally involved in the formation of the fractal structure in the process of training the neural network. Heterogeneity of the input array has a negative impact on the formation of the fractal structure of the training process.

References

[1] S. O. Subbotin, Neural Networks: Theory and Practice: Teaching Manual (Ed. O. O. Evenok), Zhytomyr, 2020, 184 p.
[2] B. Melnyk, S. Sveleba, I. Katerynchuk, I. Kuno, V. Franiv, Multilayer Neural Network Training Error when AMSGrad, Adam, AdamMax Methods Used. COLINS-2024: 8th International Conference on Computational Linguistics and Intelligent Systems, April 12-13, 2024, Lviv, Ukraine, pp. 232-254. URL: https://ceur-ws.org/Vol-3664/paper17.pdf
[3] D. P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations, San Diego, 2015. URL: https://doi.org/10.48550/arXiv.1412.6980
[4] J. Ma, D. Yarats, Quasi-hyperbolic momentum and Adam for deep learning. 7th International Conference on Learning Representations, New Orleans, LA, USA, 2019, pp. 19-21. URL: https://arxiv.org/abs/1810.06801
[5] K. Kawaguchi, Effect of Depth and Width on Local Minima in Deep Learning. Neural Computation, MIT Press, vol. 31, no. 7, 2019, pp. 1462-1498. URL: https://doi.org/10.1162/neco_a_01195
[6] O. V. Kapustyan, V. V. Pichkur, V. V. Sobchuk, Theory of Dynamic Systems, Vezha-Druk, Lutsk, Ukraine, 2020, 348 p.
[7] I. Y. Adashevska, O. O. Kraievska, Self-similarity as a characteristic property of fractal. Fractal (fractional) dimension of Hausdorff. Scientific achievements of modern society: abstracts of the 4th International Scientific and Practical Conference, Liverpool, United Kingdom, 4-6 December 2019, pp. 603-612. URL: http://sci-conf.com.ua/wp-content/uploads/2019/12/scientific-achievements-of-modern-society_4-6.12.19.pdf
[8] D. Yi, J. Ahn, S. Ji, An Effective Optimization Method for Machine Learning Based on ADAM. Appl. Sci., vol. 10, no. 3, 2020, 1073. URL: https://www.mdpi.com/2076-3417/10/3/1073
[9] X. Zeng, Z. Zhang, D. Wang, AdaMax Online Training for Speech Recognition. CSLT Technical Report 20150032, 2016. URL: http://www.cslt.org/mediawiki/images/d/df/Adamax_Online_Training_for_Speech_Recognition.pdf
[10] S. J. Reddi, S. Kale, S. Kumar, On the Convergence of Adam and Beyond. 6th International Conference on Learning Representations, Vancouver, BC, Canada, 2018, pp. 23-35. URL: https://arxiv.org/abs/1904.09237