<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fractal Distribution of Medical Data in Neural Network</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
<institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Lviv 79013</addr-line>
          , Ukraine;
          <institution>Julius-Maximilians-University Würzburg</institution>
          ,
          <addr-line>Am Hubland, D-97074 Würzburg</addr-line>
          , Germany
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
<p>Nowadays the topic of deep learning is becoming more and more popular, and almost every organization wants to have at least one specialist in this area, because artificial intelligence can help medicine grow and increase its productivity. This paper researches one type of neural network, the fractal neural network, by training and testing it and comparing it with other neural networks: we take one dataset, test it on our neural networks, and then compare the results, with graphs and comparisons of their output. In the current paper we implemented a custom neural network and a fractal neural network, then trained and tested both on the CIFAR-10 dataset. The custom neural network showed worse results, but each of its iterations took up to 10 seconds, while one iteration of the fractal neural network took up to 3 minutes. Moreover, our custom network is quite simple, so we can say that it suits datasets with a lower number of classes better. The fractal neural network showed good results, and we are sure that with more powerful computing resources and more time it could perform much better.</p>
      </abstract>
      <kwd-group>
        <kwd>neural networks</kwd>
        <kwd>model</kwd>
        <kwd>medical data</kwd>
        <kwd>keras</kwd>
        <kwd>train</kwd>
        <kwd>dataset</kwd>
        <kwd>accuracy</kwd>
        <kwd>loss</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
In the current paper we research one branch of deep learning: fractal neural
networks. A neural network is a network or circuit of neurons or, in the modern
sense, an artificial neural network composed of artificial neurons or nodes.
Thus, a neural network is either a biological neural network, made up of real
biological neurons, or an artificial neural network used for solving artificial intelligence (AI)
problems. The connections of the biological neuron are modeled as weights. A
positive weight reflects an excitatory connection, while negative values mean inhibitory
connections. All inputs are modified by a weight and summed; this operation is
referred to as a linear combination. Finally, an activation function controls the amplitude
of the output: an acceptable range of output is usually between 0 and 1,
or between −1 and 1. There are many types of neural networks, and the residual neural
network is one of them [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ].
      </p>
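<p>The weighted sum and activation described above can be sketched in a few lines of Python (a minimal illustration with a sigmoid activation; the function name and example values are our own, not taken from any particular network):</p>

```python
import math

def neuron_output(inputs, weights, bias):
    # Linear combination: every input is modified by a weight and summed
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The sigmoid activation function keeps the output amplitude in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))
```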
<p>A residual neural network (ResNet) is an artificial neural network (ANN) of a
kind that builds on constructs known from pyramidal cells in the cerebral cortex.
Residual neural networks do this by utilizing skip connections, or shortcuts, to jump
over some layers. Typical ResNet models are implemented with double- or
triple-layer skips that contain nonlinearities (ReLU) and batch normalization in between. An
additional weight matrix may be used to learn the skip weights; these models are
known as HighwayNets. Models with several parallel skips are referred to as
DenseNets. In the context of residual neural networks, a non-residual network may be
described as a plain network.</p>
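<p>The difference between a plain block and a residual block can be illustrated with a small sketch, where scalar signals and a hypothetical transform function stand in for real layers:</p>

```python
def plain_block(x, transform):
    # A "plain" layer: the output is only the transformed signal
    return transform(x)

def residual_block(x, transform):
    # A residual block with a skip connection: the input is added back,
    # so the block only has to learn the residual transform(x)
    return transform(x) + x
```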
      <p>
        One motivation for skipping over layers is to avoid the problem of vanishing
gradients, by reusing activations from a previous layer until the adjacent layer learns its
weights. During training, the weights adapt to mute the upstream layer, and amplify
the previously-skipped layer. In the simplest case, only the weights for the adjacent
layer's connection are adapted, with no explicit weights for the upstream layer. This
works best when a single non-linear layer is stepped over, or when the intermediate
layers are all linear. If not, then an explicit weight matrix should be learned for the
skipped connection (a HighwayNet should be used) [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref6">1-4, 6</xref>
        ].
      </p>
      <p>
Fractal neural networks use a non-residual approach. The macro-architecture of
fractal neural networks is based on self-similarity. Repeated application of a simple
expansion rule generates deep networks whose structural layouts are precisely
truncated fractals. These networks contain interacting subpaths of different lengths, but do
not include any pass-through or residual connections; every internal signal is
transformed by a filter and nonlinearity before being seen by subsequent layers. The key
may be the ability to transition, during training, from effectively shallow to deep.
Additionally, fractal networks exhibit an anytime property: shallow subnetworks
provide a quick answer, while deeper subnetworks, with higher latency, provide a more
accurate answer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
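<p>The self-similar expansion rule can be sketched as a toy scalar version, under our own simplifying assumptions: a single function stands in for a convolutional layer, and the join operation is a plain average (as in FractalNet's elementwise mean):</p>

```python
def fractal(depth, layer):
    # Base case: the fractal of depth 1 is a single layer (one "conv")
    if depth == 1:
        return layer
    sub = fractal(depth - 1, layer)
    # Expansion rule: join (average) a short path through one layer with
    # a long path through two stacked copies of the previous fractal
    return lambda x: (layer(x) + sub(sub(x))) / 2.0
```

<p>Each application of the rule roughly doubles the depth of the longest path while keeping a one-layer shortcut, which is what gives shallow subnetworks a quick answer and deeper subnetworks a slower but more accurate one.</p>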
    </sec>
    <sec id="sec-2">
      <title>Review of the Literature</title>
      <p>
Fractal neural networks are relatively new, which is why there are only a few articles on
this topic. Frankly speaking, there is only one brief yet comprehensive paper about fractal
neural networks. It was published at ICLR 2017 as a conference paper by Gustav
Larsson, Michael Maire and Gregory Shakhnarovich [
<xref ref-type="bibr" rid="ref14">14</xref>
]. Their paper is called
“FractalNet: Ultra-Deep Neural Networks without Residuals”. They briefly describe
fractal neural networks and how they work. They also compare the results of this
network with more than 20 other networks on about 10 different datasets. They
published code for the FractalNet implementation, which we are going to update and use in
the current paper. Their paper is thus very useful and full of important information. They
had very powerful computing resources, which helped them train and test the networks on
different data for a long time.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Materials and Methods</title>
<p>In order to implement and run our networks we will use Python 3 and Google
Colaboratory as our working environment.</p>
      <p>
Colaboratory is a free Jupyter notebook environment that requires no setup and
runs entirely in the cloud. With Colaboratory you can write and execute code, save
and share your analyses, and access powerful computing resources, all for free from
your browser. It also provides a good GPU for running our networks [
        <xref ref-type="bibr" rid="ref12 ref4">4, 12</xref>
        ].
      </p>
<p>For training and testing we pick the CIFAR-10 dataset from Keras.</p>
      <p>
        Keras is a high-level neural networks API, written in Python and capable of
running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on
enabling fast experimentation. Being able to go from idea to result with the least possible
delay is key to doing good research [
        <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
        ].
      </p>
      <p>
        The CIFAR-10 dataset is a collection of images that are commonly used to train
machine learning and computer vision algorithms. It is one of the most widely used
datasets for machine learning research. The CIFAR-10 dataset contains 60,000 32x32
color images in 10 different classes. The 10 different classes represent airplanes, cars,
birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each
class [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
<p>Computer algorithms for recognizing objects in photos often learn by example.
CIFAR-10 is a set of images that can be used to teach a computer how to recognize
objects. Since the images in CIFAR-10 are low-resolution (32x32), this dataset
allows researchers to quickly try different algorithms and see what works. Various kinds
of convolutional neural networks tend to be the best at recognizing the images in
CIFAR-10.</p>
      <p>In order to implement our Sequential model we will use the following layers and
functions:</p>
      <p>
        1) ReLU stands for rectified linear unit, and is a type of activation function.
Mathematically, it is defined as y = max(0, x). ReLU is linear (identity) for all positive
values, and zero for all negative values. This means that [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]:
      </p>
      <p>It’s cheap to compute as there is no complicated math. The model can therefore
take less time to train or run.</p>
      <p>It converges faster. Linearity means that the slope doesn’t plateau, or “saturate,”
when x gets large. It doesn’t have the vanishing gradient problem suffered by other
activation functions like sigmoid or tanh.</p>
<p>It’s sparsely activated. Since ReLU is zero for all negative inputs, it is likely that
any given unit will not activate at all.</p>
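<p>ReLU is a direct one-line transcription of y = max(0, x):</p>

```python
def relu(x):
    # Identity for all positive values, zero for all negative values
    return max(0.0, x)
```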
      <p>
2) Softmax is a function that takes as input a vector of K real numbers and
normalizes it into a probability distribution consisting of K probabilities. That is, prior to
applying softmax, some vector components could be negative or greater than one,
and they might not sum to 1; after applying softmax, each component will be in the
interval (0, 1) and the components will add up to 1, so they can be interpreted as
probabilities. Softmax is often used in neural networks to map the non-normalized output
of a network to a probability distribution over predicted output classes [
        <xref ref-type="bibr" rid="ref10 ref8">8, 10</xref>
        ].
      </p>
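<p>This definition transcribes directly into Python; the max-subtraction trick for numerical stability is a standard implementation detail we add ourselves:</p>

```python
import math

def softmax(vector):
    # Subtracting the maximum does not change the result,
    # but it avoids overflow in math.exp for large inputs
    m = max(vector)
    exps = [math.exp(v - m) for v in vector]
    total = sum(exps)
    # Each component ends up in (0, 1) and the components sum to 1
    return [e / total for e in exps]
```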
<p>3) Dropout is a regularization technique for neural network models. Dropout is a
technique where randomly selected neurons are ignored during training: they are
“dropped out” randomly. This means that their contribution to the activation of
downstream neurons is temporarily removed on the forward pass, and any weight updates are
not applied to those neurons on the backward pass.</p>
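<p>A minimal sketch of dropout on a list of activations; the rescaling by the keep probability (so the expected sum of activations stays unchanged, known as inverted dropout) is a standard detail we assume here:</p>

```python
import random

def dropout(activations, rate, training=True):
    # At inference time all neurons participate unchanged
    if not training:
        return list(activations)
    keep = 1.0 - rate
    # Drop each unit with probability `rate`; scale survivors by 1/keep
    # so the expected activation stays the same as without dropout
    return [a / keep if random.random() > rate else 0.0
            for a in activations]
```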
      <p>
As a neural network learns, neuron weights settle into their context within the
network. Weights of neurons are tuned for specific features, providing some
specialization. Neighboring neurons come to rely on this specialization, which, if taken too
far, can result in a fragile model too specialized to the training data. This reliance on
context for a neuron during training is referred to as complex co-adaptation [
<xref ref-type="bibr" rid="ref9">9</xref>
].
4) Max pooling is a sample-based discretization process. The objective is to
down-sample an input representation (an image, a hidden-layer output matrix, etc.), reducing its
dimensionality and allowing assumptions to be made about the features contained in
the binned sub-regions.
      </p>
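<p>A 2x2 max pooling with stride 2 can be sketched on a plain nested list (our own minimal version, assuming even input dimensions):</p>

```python
def max_pool_2x2(matrix):
    # Slide a 2x2 window with stride 2 and keep the maximum of each
    # window, halving both dimensions of the input representation
    rows, cols = len(matrix), len(matrix[0])
    return [[max(matrix[i][j], matrix[i][j + 1],
                 matrix[i + 1][j], matrix[i + 1][j + 1])
             for j in range(0, cols, 2)]
            for i in range(0, rows, 2)]
```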
      <p>
This is done in part to help prevent over-fitting by providing an abstracted form of the
representation. It also reduces the computational cost by reducing the number of
parameters to learn, and provides basic translation invariance to the internal
representation [
        <xref ref-type="bibr" rid="ref10 ref15">10, 15</xref>
        ].
      </p>
      <p>
Also, we will use the optimization algorithms described below:
1) The RMSprop optimizer is similar to the gradient descent algorithm with
momentum. The RMSprop optimizer restricts the oscillations in the vertical direction.
Therefore, we can increase our learning rate, and our algorithm can take larger steps
in the horizontal direction, converging faster [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
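<p>The update behind this description can be sketched for a single parameter (our own toy version; the RMSprop optimizer in Keras handles whole tensors, but the rule is the same):</p>

```python
def rmsprop_step(param, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    # Decaying average of squared gradients damps oscillating directions
    cache = decay * cache + (1.0 - decay) * grad ** 2
    # Dividing by the root of that average normalizes the step size
    param = param - lr * grad / (cache ** 0.5 + eps)
    return param, cache
```

<p>For example, repeatedly applying this step to minimize f(x) = x² (gradient 2x) steadily drives x toward 0.</p>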
      <p>
        2) Adaptive Moment Estimation (Adam) is a method that computes adaptive
learning rates for each parameter. It stores both the decaying average of the past
gradients mt, similar to momentum and also the decaying average of the past squared
gradients vt, similar to RMSprop and Adadelta. Thus, it combines the advantages of
both methods. Adam is the default choice of optimizer for most applications in
general [
        <xref ref-type="bibr" rid="ref11 ref13">11, 13</xref>
        ].
      </p>
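<p>A single-parameter sketch of the Adam update, combining both decaying averages with the usual bias correction (the hyperparameter defaults follow the common convention, not a setting from this paper):</p>

```python
def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # First moment: decaying average of past gradients (momentum-like)
    m = b1 * m + (1.0 - b1) * grad
    # Second moment: decaying average of past squared gradients
    # (RMSprop-like)
    v = b2 * v + (1.0 - b2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v
    m_hat = m / (1.0 - b1 ** t)
    v_hat = v / (1.0 - b2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```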
    </sec>
    <sec id="sec-4">
      <title>Experiment</title>
<p>So, for training our networks we chose the CIFAR-10 dataset. We will train our network
to classify 10 different objects: doctor, patient, disease, mode, ward, hospital, surgery,
tablet, syringe, prescription. The classes are completely mutually exclusive; there is
no overlap between classes, which means that you will not find an image with 2
different classes at the same time.</p>
<p>This means that we could apply our network to solve different medical problems.
For example, it can be helpful for predicting a diagnosis based on a cardiogram.</p>
<p>We will make a custom sequential model to compare with the fractal one. A sequential
model is simply a linear stack of layers, so you can create an empty model and
then add as many layers as you want. In this model we add a few activation layers,
connection layers, regularization layers, convolutional layers and pooling layers. Here is
our final version.</p>
<p>In the table above you can see the full training process with its accuracy and loss at
each step of the training. The best results are highlighted.</p>
<p>Also, on the following graphs (Fig. 3, 4) you can see the dependency of accuracy
and loss on epochs. Accuracy is calculated as the number of right
predictions divided by all predictions.</p>
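<p>The accuracy metric described above can be computed as follows (a trivial sketch of our own, with hypothetical prediction and label lists):</p>

```python
def accuracy(predictions, labels):
    # Number of right predictions divided by the number of all predictions
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)
```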
<p>So, from the graph we can see a logarithmic increase of accuracy. We can also
notice the optimal amount of training, after which the accuracy increases only slightly.</p>
      <p>
        Model for fractal neural network is much more complicated than our custom
model. It has much more layers and much more configurations. The full implementation
of the fractal neural network model can be found via the link in the references [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. It
was published with a paper at ICLR 2017 by Gustav Larsson, Michael Maire and
Gregory Shakhnarovich, as mentioned in the literature review section [
<xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Now let us train this network the same way as we did with our custom network.
This time we will make 70 epochs, because training fractal network takes more time
and computing resources. Below you can see a piece of our training process (Fig. 5).</p>
<p>In Table 2 you can see the full training process with its accuracy and loss at each
step of the training. The best results are marked in green.</p>
<p>Also, on the following graphs (Fig. 6, 7) you can see the dependency of accuracy and
loss on epochs. As with our custom network, accuracy is calculated as the
number of right predictions divided by all predictions. So, from the graph we can see
the logarithmic increase of accuracy, and we can notice the optimal amount of training,
after which the accuracy increases only slightly. The graph looks similar to our custom
neural network's, but as we can see, the accuracy here is better.</p>
      <p>Now it is time to test our trained models on the test dataset: the set of images which
have not been used during the training process. The process is similar, but we iterate
through our dataset only once and output the results immediately. The results for our
custom network are the following (Fig. 8).</p>
<p>This test showed an accuracy of 0.7929, which means that out of 10,000 labeled
images with 10 different object classes our network predicted 7,929 images right and 2,071
images wrong. Our test accuracy is lower than the training one (0.8111), which
means that we slightly overfit our model on the training dataset: the weights
fit the training dataset a bit better. Lowering the training time may improve our test
accuracy a little.</p>
<p>Now let us head back to our fractal network. Our best accuracy was achieved at
the very end of the epochs, which means that further training may lead to better
results, but it would take more time and more computing resources. Our accuracy is pretty
good, but first let us test it on the test dataset and check that we did not overfit our network
(Fig. 9).</p>
<p>This test showed an accuracy of 0.8864, which means that out of 10,000 labeled images
with 10 different object classes our network predicted 8,864 images right and 1,136
images wrong. Our test accuracy is the same as the training one (0.8864), which means that we
did not overfit our model on the training dataset.</p>
<p>In the table below you can see the final comparison of our models. All training
and testing were done inside Google Colaboratory with its own GPU.</p>
      <p>In the current paper we ran a custom neural network and a fractal neural network inside
Google Colaboratory using the provided GPU. We trained and tested them on the
CIFAR-10 dataset. The custom neural network showed worse results than the fractal one, but each
of its iterations took up to 10 seconds, while one iteration of the fractal neural network took up to 3
minutes. Moreover, our custom network is quite simple, so we can say that it suits
datasets with a lower number of classes better. The fractal neural network showed
good results, and we are sure that with more powerful computing resources and more
time it could perform much better.</p>
<p>As we mentioned before, we can apply this technology to different medical data
to solve various kinds of medical problems. This can help decrease the number of
human mistakes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Estivill-Castro</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Amoeba: Hierarchical clustering based on spatial proximity using Delaunay diagram</article-title>
          ,
          <source>9th Intern. Symp. on spatial data handling</source>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>41</lpage>
          Beijing, China (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Kang</surname>
          </string-name>
          , H.-Y.,
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>B.-J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          -J.:
          <source>P2P Spatial query processing by Delaunay triangulation, Lecture notes in computer science</source>
          , vol.
          <volume>3428</volume>
          , pp.
          <fpage>136</fpage>
          -
          <lpage>150</lpage>
          , Springer/Heidelberg (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Boehm</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kailing</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kroeger</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Density connected clus-tering with local subspace preferences</article-title>
          ,
          <source>IEEE Computer Society, Proc. of the 4th IEEE Intern. conf. on data mining</source>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          , Los Alamitos (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Boyko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shakhovska</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basystiuk</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Performance evaluation and comparison of software for face recognition, based on dlib and opencv library</article-title>
          ,
          <source>Second International Conference on Data Stream Mining and Processing</source>
          , pp.
          <fpage>478</fpage>
          -
          <lpage>482</lpage>
          , DSMP (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Boehm</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kailing</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kroeger</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Density connected clus-tering with local subspace preferences” IEEE Computer Society</article-title>
          ,
          <source>Proc. of the 4th IEEE Intern. conf. on data mining</source>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          , Los Alamitos (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Harel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Clustering spatial data using random walks</article-title>
          ,
          <source>Proc. of the 7th ACM SIGKDD Intern. conf. on knowledge discovery and data mining</source>
          , pp.
          <fpage>281</fpage>
          -
          <lpage>286</lpage>
          , San Francisco, California (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Tung</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hou</surname>
            ,
            <given-names>J</given-names>
            ., Han, J
          </string-name>
          . :
          <article-title>Spatial clustering in the presence of obstacles</article-title>
          ,
          <source>The 17th Intern. conf. on data engineering (ICDE'01)</source>
          , pp.
          <fpage>359</fpage>
          -
          <lpage>367</lpage>
          , Heidelberg (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Veres</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shakhovska</surname>
          </string-name>
          , N.:
          <article-title>Elements of the formal model big date</article-title>
          ,
          <source>The 11th Intern. conf. Perspective Technologies and Methods in MEMS Design (MEMSTEH)</source>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>83</lpage>
          ,
          <string-name>
            <surname>Polyana</surname>
          </string-name>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gehrke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gunopulos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Automatic sub-space clustering of high dimensional data</article-title>
          , vol.
          <volume>11</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>5</fpage>
          -
          <lpage>33</lpage>
          ,
          <article-title>Data mining knowledge discovery (</article-title>
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Ankerst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
          </string-name>
          , H.-P.:
          <article-title>Towards an effective cooperation of the user and the computer for classification</article-title>
          ,
          <source>Proc. of the 6th ACM SIGKDD Intern. conf. on knowledge discovery and data mining</source>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>188</lpage>
          , Boston, Massachusetts, USA (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peuquet</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gahegan</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>ICEAGE: Interactive clustering and exploration of large and high-dimensional geodata</article-title>
          ,
          <source>vol. 3, N. 7</source>
          , pp.
          <fpage>229</fpage>
          -
          <lpage>253</lpage>
          , Geoinfor-matica (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Boyko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shakhovska</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sviridova</surname>
          </string-name>
          , N.:
          <article-title>Use of machine learning in the forecast of clinical consequences of cancer diseases</article-title>
          ,
          <source>In 7th Mediterranean Conference on Embedded Computing</source>
          , pp.
          <fpage>531</fpage>
          -
          <lpage>536</lpage>
          , IEEE MECO'
          <year>2018</year>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Boyko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Advanced technologies of big data research in distributed information systems</article-title>
          , Radio Electronics, Computer Science,
          <source>Control. № 4</source>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>77</lpage>
          , Zaporizhzhya: Zaporizhzhya National Technical University (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Larsson</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shakhnarovich</surname>
          </string-name>
          , G.:
          <article-title>FractalNet: Ultra-Deep Neural Networks without Residuals</article-title>
          , http://people.cs.uchicago.edu/~larsson/fractalnet/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Mochurad</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solomiia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Optimizing the Computational Modeling of Modern Electronic Optical Systems</article-title>
          . In: Lytvynenko V.,
          <string-name>
            <surname>Babichev</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wójcik</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vynokurova</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vyshemyrskaya</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radetskaya</surname>
            <given-names>S</given-names>
          </string-name>
          .
          <source>(eds) Lecture Notes in Computational Intelligence and Decision Making</source>
          , pp
          <fpage>597</fpage>
          -
          <lpage>608</lpage>
          ,
          <string-name>
            <surname>ISDMCI</surname>
          </string-name>
          <year>2019</year>
          .
          <source>Advances in Intelligent Systems and Computing</source>
          , vol
          <volume>1020</volume>
          . Springer, Cham. (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>