<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Capsule Network for Video Segmentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleksandr Y. Buyko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey N. Vinogradov</string-name>
          <email>vinogradov_an@rudn.university</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor P. Tishchenko</string-name>
          <email>igor.p.tishchenko@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ailamazyan Program Systems Institute of RAS (PSI RAS) 4a Petra-I st.</institution>
          ,
          <addr-line>s. Veskovo, Pereslavl district, Yaroslavl region, 152021, Russian Federation</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Information Technologies Peoples' Friendship University of Russia (RUDN University)</institution>
          <addr-line>6 Miklukho-Maklaya st., Moscow, 117198, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>17</fpage>
      <lpage>23</lpage>
      <abstract>
<p>In this article, examples of object recognition on video and the selection of unique scenes are considered. We used the new capsule network algorithm as a tool for video analysis. The algorithm is a continuation of the development of convolutional neural networks. Convolutional networks use a scalar as the base element to be processed. In turn, capsule networks process vectors and use a special routing algorithm. These fundamental differences allow capsule networks to be more invariant to rotations and changes in illumination of the recognized object. This fact became the key to choosing this type of network for the analysis of dynamic video. In this article, we propose a method for video segmentation. The essence of this method is as follows. First, you need to determine the main acting objects on adjacent frames. Then, it is necessary to determine whether these objects coincide; if not, the second frame is considered the moment of transition to another scene. The proposed method was tested on a custom-collected dataset based on videos from YouTube. There were two classes of objects in the dataset. The results presented in the article show that we were not able to achieve a high level of accuracy of video segmentation. It is also worth noting that the learning process took quite a long time.</p>
      </abstract>
      <kwd-group>
        <kwd>machine learning</kwd>
        <kwd>video recognition</kwd>
        <kwd>video analysis</kwd>
        <kwd>video segmentation</kwd>
        <kwd>capsule network</kwd>
        <kwd>ConvNet</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Copyright © 2018 for the individual papers by the papers’ authors. Copying permitted for private
and academic purposes. This volume is published and copyrighted by its editors.
In: K. E. Samouylov, L. A. Sevastianov, D. S. Kulyabov (eds.): Selected Papers of the 1st Workshop
(Summer Session) in the framework of the Conference “Information and Telecommunication
Technologies and Mathematical Modeling of High-Tech Systems”, Tampere, Finland, 20–23 August,
2018, published at http://ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Neural networks have already become a powerful tool for solving various
nontrivial tasks in the field of computer science. Unfortunately, today there is no universal
solution in the field of neural networks, and each case requires special
research. Convolutional neural networks (CNNs) have proven themselves to be a relatively
effective method for classifying data [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], but such networks have a number of problems
and are able to work effectively only under certain conditions. The article [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] stirred the
world of neural networks: Geoffrey Hinton presented a capsule network that could
become a new generation of convolutional neural networks. According to the author,
the proposed network is meant to solve the problems of poor translation invariance and lack of
information about orientation. It is important to note that Hinton laid the preconditions
for the creation of capsule networks in his previous articles [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. It is a well-known
fact that CNNs have problems with rotated objects or changing lighting conditions [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
Moreover, this problem greatly affects the effectiveness of action recognition
in videos, i.e. in a series of images.
      </p>
      <p>
        Thus, in this article, we will study the use of a capsule network for pattern recognition
on video and the selection of individual scenes. The network will be aimed at recognizing
objects on dynamic frames. A transition of the scene will be registered in case
of a significant change of the frames’ characteristics, such as a change of the existing
object. Here are a number of articles that inspired us to write this paper: [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8–11</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Capsule network algorithm</title>
      <p>A capsule network is similar in structure to a convolutional network (Fig. 1a) and
includes several layers consisting of capsules (Fig. 1b). Under certain conditions, the
capsules can be activated. Each active capsule will be used to select another capsule
from the next layer using a special routing algorithm. Routing is a key feature of capsule
networks. Capsules are activated depending on various properties of the image, for
example, shape, position, color, texture, etc. Thus, each capsule after training will be
responsible for some property of objects in the image. It is assumed that the capsule
will be active if there is a property on the image for which the capsule responds (Fig. 1c).
A vector is obtained at the output of the capsule. In this work, the length of the vector
will represent the probability of belonging to a particular class.</p>
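      <p>In code terms, that probability can be read off as the length of the capsule’s output vector. A minimal sketch, assuming the convention from [3]:</p>
      <preformat>
import numpy as np

def capsule_probability(v):
    # The length (L2 norm) of a capsule's output vector encodes the
    # probability that the entity the capsule represents is present.
    return float(np.linalg.norm(v))
      </preformat>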
      <p>Procedure 1 Routing algorithm.
1: procedure ROUTING(û_{j|i}, r, l)
2:   for all capsule i in layer l and capsule j in layer (l + 1): b_{ij} ← 0.
3:   for r iterations do
4:     for all capsule i in layer l: c_i ← softmax(b_i)   ⊳ softmax computes Eq. 3
5:     for all capsule j in layer (l + 1): s_j ← Σ_i c_{ij} û_{j|i}
6:     for all capsule j in layer (l + 1): v_j ← squash(s_j)   ⊳ squash computes Eq. 1
7:     for all capsule i in layer l and capsule j in layer (l + 1): b_{ij} ← b_{ij} + û_{j|i} · v_j
8:   return v_j</p>
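      <p>For illustration, below is a minimal NumPy sketch of Procedure 1. The shapes and variable names are our assumptions; the algorithm itself follows [3]:</p>
      <preformat>
import numpy as np

def squash(s, eps=1e-9):
    # Eq. 1 from [3]: v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def routing(u_hat, r=3):
    # u_hat: prediction vectors u_{j|i}, shape (n_in, n_out, dim_out)
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                   # routing logits b_ij
    for _ in range(r):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax (Eq. 3)
        s = (c[..., None] * u_hat).sum(axis=0)    # weighted sum s_j
        v = squash(s)                             # output capsule vectors v_j
        b = b + (u_hat * v[None, :, :]).sum(axis=-1)  # agreement u_{j|i} . v_j
    return v
      </preformat>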
      <p>
        The full algorithm of the capsule network is presented in the original paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.1. Proposed method (analytics)</title>
      <p>To begin with, in this study, we decided to analyze only visual information, which in
the vast majority of cases is enough to identify objects on the video. Thus, the image
series will be analyzed.</p>
      <p>The goal is to design a deep learning architecture to identify the transition points
between two different main active objects in one video. In this paper, we aim to use
capsule networks instead of plain convolutional ones to construct the deep learning architecture. This
kind of network was chosen as an alternative to CNNs, which have obvious drawbacks
when analyzing video.</p>
      <p>First, because of the scalar and additive nature of neurons in CNNs, neurons on
any given layer of the network are ambivalent to the spatial relationships of the neurons
within their kernel on the previous layer and, therefore, within their effective receptive
field of the given input. In turn, in capsule networks, information on each layer is stored
not as a scalar, but as a vector. Such vectors are capable of storing information about
such attributes as spatial orientation, magnitude and other newly-derived attributes
depending on the type of capsule layer. The capsules of the upper level are “routed” to
the capsules on the next layer using a special dynamic routing algorithm. This algorithm
takes into account the consistency between the capsule vectors, eventually forming
meaningful part-whole relationships that are not found in standard CNNs. The internal
data representation of a convolutional neural network does not take into account the
spatial hierarchies between simple and complex objects. Therefore, if an image depicts
eyes, a nose and lips, for a convolutional neural network this is a clear sign of a face,
regardless of their arrangement. A rotation of the object worsens the quality of recognition, whereas
the human brain easily solves this problem.</p>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], capsule networks reduce the recognition error of an object
seen from another angle by 45% in comparison with CNNs.
      </p>
      <p>The proposed method is described below.</p>
      <p>
        On the input we get an array of images I[1..N]. Next, in a cycle, the network receives
one image per step. The top preprocessing (compressing) layer is aimed at resizing the
image to the size of the first convolutional capsule layer, regardless of the size of the
original image. For that reason, a convolution with a Lanczos filter [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is
used.
      </p>
      <p>After exploring several potential architectures, we realized that, in fact, the task
does not require accurately identifying the object in order to identify the transition points,
so we decided to use 3 layers of the network, reducing the data abstraction to reduce the
complexity of the calculations.</p>
      <p>
        We decided to use a model similar to the one in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Below is a summary of the three hidden layers and the output layer of the
proposed model (a code sketch follows the list):
– The first layer processes images with the size of 64 × 64 pixels.
– The second one is a convolutional layer with 64 filters of size 9 × 9 and a stride of
1, which leads to 64 feature maps of size 56 × 56.
– The third layer is a Primary Capsule layer resulting from 256 × 9 × 9 convolutions
with a stride of 2. This layer consists of 32 “Component Capsules” of dimension 8,
each of which has feature maps of size 24 × 24.
– The final output layer includes 2 capsules, referred to as “Class Capsules”, which
are aimed to return the probability of belonging to a class.
      </p>
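      <p>The sketch below reconstructs this layer stack in tf.keras. It is our reading of the summary above, not the authors’ code; the dynamic-routing “Class Capsule” layer is only indicated by a comment:</p>
      <preformat>
import tensorflow as tf
from tensorflow.keras import layers

def squash(s, eps=1e-9):
    # Eq. 1 from [3]: shrink the vector length into [0, 1) keeping direction.
    sq = tf.reduce_sum(tf.square(s), axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / tf.sqrt(sq + eps)

inputs = tf.keras.Input(shape=(64, 64, 3))                      # resized frame
x = layers.Conv2D(64, 9, strides=1, activation="relu")(inputs)  # 56 x 56 x 64
x = layers.Conv2D(256, 9, strides=2)(x)                         # 24 x 24 x 256
x = layers.Reshape((24 * 24 * 32, 8))(x)    # 32 component capsules of dim 8
primary_caps = layers.Lambda(squash)(x)
# A "Class Capsule" layer with 2 capsules and dynamic routing (Procedure 1)
# would follow; its output vector lengths give the two class probabilities.
model = tf.keras.Model(inputs, primary_caps)
      </preformat>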
      <p>This is how the entire series of images of one video is processed. On the output we
get an array with the probabilities of the objects on the frames belonging to one of
the classes, in the format [(p1, p2)_1, ..., (p1, p2)_N]. Finally, it is necessary to detect the
moment when the main active object changes. For that case, we simply compare the
values of the probabilities obtained. The moment of transition at frame i + 1 is the case when:
– p1(i) &lt; p2(i + 1),
– p2(i + 1) − p1(i + 1) &lt; TP,
where TP is a transition point probability constant which describes the sensitivity of
transition detection. The transition constant value is obtained experimentally and depends on the
type of video. Obviously, the index of the image at which the transition occurred determines
the transition time on the timeline.</p>
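      <p>A sketch of this comparison over the whole probability array (the inequalities follow the conditions above; the function name and default are ours, with TP = 0.4 taken from the Results section):</p>
      <preformat>
def find_transitions(probs, tp=0.4):
    # probs: per-frame class probabilities [(p1, p2), ...] from the network.
    transitions = []
    for i in range(len(probs) - 1):
        p1_i, _ = probs[i]
        p1_next, p2_next = probs[i + 1]
        if p1_i &lt; p2_next and (p2_next - p1_next) &lt; tp:
            transitions.append(i + 1)  # frame that starts the new scene
    return transitions
      </preformat>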
      <p>
        Summarizing, this preliminary research aims at testing CapsNet as an effective method
for analyzing a series of images by detecting the change of the main acting object.
Also, we have taken into account [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] while working on this paper.
      </p>
      <p>The general scheme of the proposed method is demonstrated in Fig. 2.</p>
    </sec>
    <sec id="sec-5">
      <title>2.2. Experimental environment</title>
      <p>We have prepared a special set of videos, which includes random videos of cars and
running people from YouTube with a size of 320 × 240. The training set included 500
videos randomly cut into 5 s sub-videos and randomly glued together.</p>
      <p>Thus, we aimed at the recognition of “suddenly-changed” scenes. We have used a set of
50 videos and a tenfold cross-validation method to assess the accuracy of the network.</p>
      <p>
        The method was coded with Python 3.6 and TensorFlow 1.10.0. We used the code
template [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ] as a starting point for ours.
      </p>
      <p>
        The training process was conducted on a cluster of 4 computers which were built
according to the results in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Computers had the following characteristics:
– OS: Ubuntu 18.04
– GPU: Nvidia Geforce 540m
– CPU: Intel Core i5/i7
– RAM: 4-8 GB
      </p>
    </sec>
    <sec id="sec-6">
      <title>2.3. Results</title>
      <p>As a result, we achieved the best accuracy of 13.31% on a 3
hidden layer network. After a series of experiments, it was found that the best results
were obtained with TP = 0.4.</p>
      <p>The results of the assessment test are shown in Table 1.</p>
      <p>According to the results, even in the absence of transitions, the accuracy level of
correctly recognized objects was 87.25%. In the case where the video consisted of two
scenes with people, the network successfully detected the transition in only 7.27 percent
of cases. The transitions on the videos with cars were detected 2.25% better than on the
videos with people. Perhaps this can be explained by video quality and the higher dynamics
of background change. The results of transition detection on “People+Cars” videos are
illustrated in Fig. 3.</p>
      <p>Definitely, we can say that most of the correctly detected transitions were between
different classes of objects. In addition, we assessed the influence of background saturation
by its mean value (0 – the darkest, 1 – the brightest). We found that the background
is extremely important for recognition. The best results were achieved between
the values 0.3 and 0.4.</p>
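      <p>One way to compute this measure (our assumption of the exact formula: the mean grey value scaled to [0, 1]):</p>
      <preformat>
import numpy as np
from PIL import Image

def mean_brightness(path):
    # Mean pixel value scaled to [0, 1]: 0 - the darkest, 1 - the brightest.
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    return float(img.mean() / 255.0)
      </preformat>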
      <p>Given the overall low accuracy of transition detection on two-scene videos, we decided
not to test the method on videos with more scenes.</p>
      <p>We have to point out a problem with performance. The training lasted about 6
hours, which is a bit longer than a CNN with a similar architecture and parameters.</p>
      <p>The drawbacks of the method:
– a small and poorly organized collection of data
– the usage of a predetermined number of classes
– a poorly optimized architecture and routing of the network
– a weak algorithm for determining object shifts between frames</p>
      <p>
        We are convinced that it is necessary to split the method into two modules: object
recognition and transition detection. The main reason for this breakdown is the need
for highly specialized modules capable of solving tasks at different levels of abstraction.
However, the integration of these modules will not be obvious. In addition, we suppose,
it would be interesting to apply approaches from adjacent areas for data analysis, for
example, approaches of dynamic scaling [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or queuing theory methods [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to improve
the efficiency of calculations.
      </p>
    </sec>
    <sec id="sec-7">
      <title>3. Conclusions</title>
      <p>To sum up, in this paper we have studied the capsule network in the task of
recognizing objects on video and highlighting unique scenes. Despite the stated efficiency
of image recognition, capsule networks showed a low level of accuracy in recognizing
transitions between scenes on videos.</p>
      <p>
        It is important to note that, even in the case of a high level of recognition accuracy,
the weak point of the CapsNet approach for video recognition is the limitation on
classes. People should encode as little knowledge as possible into AI
software, and instead force it to learn on its own from scratch. Therefore, the currently
considered algorithms are not able to provide acceptable image recognition efficiency
for the exact segmentation of video. Considering the success of recurrent networks
for recognizing actions on video in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], we can assume that it is worth exploring the
Recurrent Capsule Network proposed in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The publication has been prepared with the support of the “RUDN University
Program 5-100”. The work is partially supported by state program 0077-2016-0002
«Research and development of machine learning methods for the anomalies detection».</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          .
          <source>In NIPS</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Voigtlaender</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Leibe</surname>
          </string-name>
          .
          <article-title>Online adaptation of convolutional neural networks for video object segmentation</article-title>
          .
          <source>In BMVC</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Sara</given-names>
            <surname>Sabour</surname>
          </string-name>
          , Nicholas Frosst, and
          <string-name>
            <given-names>Geoffrey E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          . “
          <article-title>Dynamic routing between capsules</article-title>
          ”.
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pp.
          <fpage>3856</fpage>
          -
          <lpage>3866</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Alex</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Learning multiple layers of features from tiny images</article-title>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Geoffrey E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alex</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Sida D.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Transforming auto-encoders</article-title>
          .
          <source>In International Conference on Artificial Neural Networks</source>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          , Sara Sabour, Nicholas Frosst.
          <article-title>Matrix capsules with EM routing</article-title>
          .
          <source>In ICLR</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Parnian</given-names>
            <surname>Afshar</surname>
          </string-name>
          , Arash Mohammadi, Konstantinos N. Plataniotis, “Brain Tumor Type Classification via Capsule Networks” / CoRR, abs/
          <year>1802</year>
          .10200,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>K.-K.</given-names>
            <surname>Maninis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Caelles</surname>
          </string-name>
          , Y. Chen, “
          <article-title>Video Object Segmentation Without Temporal Information</article-title>
          ”, In CoRR, abs/1709.06031,
          <year>2017</year>
          , http://arxiv.org/abs/1709.06031.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>S.</given-names>
            <surname>Caelles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-K.</given-names>
            <surname>Maninis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pont-Tuset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Leal-Taixé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cremers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Van Gool</surname>
          </string-name>
          , “
          <article-title>One-Shot Video Object Segmentation</article-title>
          ”, CVPR,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. G. Bertasius,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Torresani</surname>
          </string-name>
          .
          <article-title>Semantic segmentation with boundary neural fields</article-title>
          .
          <source>In CVPR</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Kundu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vineet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Koltun</surname>
          </string-name>
          .
          <article-title>Feature space optimization for semantic video segmentation</article-title>
          .
          <source>In CVPR</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Thibault</given-names>
            <surname>Neveu</surname>
          </string-name>
          .
          <article-title>Understand and apply CapsNet on traffic sign classification</article-title>
          .
          <source>Becoming Human</source>
          ,
          <year>November 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Wilhelm</given-names>
            <surname>Burger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mark J.</given-names>
            <surname>Burge</surname>
          </string-name>
          .
          <article-title>Principles of digital image processing: core algorithms</article-title>
          . Springer,
          <year>2009</year>
          . P.
          <fpage>231</fpage>
          -
          <lpage>232</lpage>
          . ISBN 978-1-84800-194-7.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Max</given-names>
            <surname>Pechyonkin</surname>
          </string-name>
          , “
          <article-title>Understanding Hinton's Capsule Networks. Part I: Intuition</article-title>
          .”,
          <source>Deep Learning, Nov 3</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Zafar</surname>
          </string-name>
          , “
          <article-title>Beginner's Guide to Capsule Networks</article-title>
          ” [https://www.kaggle.com/fizzbuzz/beginner-s-guide-to-capsule-networks], 03.08.
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kondratyev</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tishchenko</surname>
            <given-names>I</given-names>
          </string-name>
          .
          <source>Concept of Distributed Processing System of Image Flow // Robot Intelligence Technology and Applications 4. Results from the 4th International Conference on Robot Intelligence Technology and Applications</source>
          (RiTA2015) / ed. by
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Karray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sincak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Myung</surname>
          </string-name>
          .
          <source>Serie “Advances in Intelligent Systems and Computing”</source>
          ,
          <volume>447</volume>
          (
          <year>2016</year>
          )
          <fpage>479</fpage>
          -
          <lpage>487</lpage>
          . URL: https://link.springer.com/chapter/10.1007%2F978-3-319-31293-4_38. DOI: 10.1007/978-3-319-31293-4_38.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Sopin</surname>
            <given-names>E.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorbunova</surname>
            <given-names>A.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaidamaka</surname>
            <given-names>Y.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaripova</surname>
            <given-names>E.R.</given-names>
          </string-name>
          <article-title>Analysis of Cumulative Distribution Function of the Response Time in Cloud Computing Systems with Dynamic Scaling, Automatic Control</article-title>
          and
          <source>Computer Sciences</source>
          ,
          <volume>52</volume>
          (
          <issue>1</issue>
          )
          <year>2018</year>
          , Pages
          <fpage>60</fpage>
          -
          <lpage>66</lpage>
          . DOI: 10.3103/S0146411618010066.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Gaidamaka</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaripova</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <article-title>Comparison of polling disciplines when analyzing waiting time for signaling message processing at SIP-server</article-title>
          .
          <source>Communications in Computer and Information Science</source>
          ,
          <volume>564</volume>
          (
          <year>2015</year>
          )
          <fpage>358</fpage>
          -
          <lpage>372</lpage>
          . DOI: 10.1007/978-3-319-25861-4_30.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Buyko</surname>
            <given-names>AY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinogradov</surname>
            <given-names>AN</given-names>
          </string-name>
          “
          <article-title>Action Recognition in Videos with Long Short-Term Memory Recurrent Neural Networks”</article-title>
          , Applied Informatics,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Srikumar</given-names>
            <surname>Sastry</surname>
          </string-name>
          . “
          <article-title>Recurrent Capsule Network for Image Generation”</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>G.</given-names>
            <surname>Citti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarti</surname>
          </string-name>
          ,
          <article-title>A cortical based model of perceptual completion in the roto-translation space</article-title>
          , J. Math. Imaging Vis.,
          <volume>24</volume>
          (
          <year>2006</year>
          )
          <fpage>307</fpage>
          -
          <lpage>326</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>