The system of convolution neural networks automated training

Vladislav A. Sobolevskii a

a St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), 14th line V.O., 39, St. Petersburg, 199178, Russia

Abstract
This paper presents research related to the creation of a program complex that realizes the automated generation of service programs for artificial intelligence systems based on convolution neural networks. The presented program complex is intended to accelerate and simplify the generation and training of convolutional neural networks.

Keywords
Machine learning, convolutional neural networks, service-oriented architecture, internet of things

Models and Methods for Researching Information Systems in Transport, Dec. 11-12, St. Petersburg, Russia
EMAIL: arguzd@yandex.ru (V. A. Sobolevskii)
ORCID: 0000-0001-7685-4991 (V. A. Sobolevskii)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

In the modern world, recognition technologies for photo and video images are being adopted ever more intensively. The development of this sphere became possible due to the appearance of new convolution neural network (CNN) architectures and the modification of existing ones. This type of architecture has turned out to be successful enough for solving tasks of image analysis, segmentation and semantic recognition. The higher the accuracy and capabilities of CNNs become, the more complex the networks become: some of the most successful and widespread CNN architectures at the moment contain a large number of heterogeneous layers [1-3]. This leads not only to an increase in recognition quality, but also to greater complexity in creating and training such networks.

At the same time, the number of tasks that can be solved using CNNs keeps rising. These tasks do not always demand the most complex and advanced CNN architectures, but they are still quite difficult, and regular users without knowledge of deep learning methods and the skills to implement them are not able to create and adapt such networks correctly. It can be said that the quantity of such tasks is growing faster than the number of professionals capable of solving them. As a result, the task of creating systems that automate CNN generation for one sphere or another is becoming very relevant [4-6]. At the same time, the demand for a system suitable for solving typical tasks from different spheres is becoming more acute. There are many tasks of one class (for example, the recognition of certain tree species in space images, landscape peculiarities, specific nature objects, etc.) whose solving principle has already been discovered, or which are handled by producing individual CNNs [7-9], or which are not being solved at all due to the lack of specialists. Additionally, many CNNs are produced in the form of program prototypes (for instance, using MatLab), and such prototypes require further work before they can be implemented into existing monitoring systems built on specific stacks of applied programming languages (C++, Java, Python, etc.). This, in its turn, makes the further development and subsequent implementation of prototypes more complicated.

For solving these tasks, the system of convolution neural networks automated training presented in this article was designed on the basis of the service-oriented approach. The approach of automated generation of artificial neural networks is not new, and there are several works on this topic [10-13]. All of them point to the fact that automating the production of machine learning models speeds up the development of program products for solving a multitude of tasks. The system described in this article elaborates the idea of automation and has a modular, extensible structure, which allows trainable architectures, training algorithms, data normalization, validation, etc. to be added and combined. Moreover, due to genetic algorithms, the system is capable of automated CNN generation and training, which allows non-professionals who are not aware of the details of neural network configuration to use it for solving typical tasks. The result of the system's work is not only a built architecture, but a generated executable file with additional REST and SOAP wrappings, which without any preliminary preparations allows the produced CNN to be started as a service and called from other systems and program complexes. This makes the system a tool for quick and effortless solving of simple typical tasks by regular users.

By the present time, the designed system has already been used for generating simple deep neural networks that were introduced into third-party program products for solving specific applied tasks [14-15]. This article describes the capabilities of the program complex that were extended by means of automated CNN training.
2. The service-oriented approach in neural networks automated generation

The service-oriented architecture (SOA) of applications implies a modular approach to application development [16]. In the situation considered here, this paradigm is implemented at several levels.

At the level of the program complex itself, SOA maintains the modularity and interchangeability of the CNN generation and training algorithms. Thus, the whole process of automated generation and training is divided into several consecutively invoked program modules:
• the input data normalization module;
• the generation module of the chosen CNN, or the module of pre-trained CNN architecture initialization;
• the CNN training module (including verification and validation submodules).

Each of these modules is presented in several realization variants (for various CNN architectures), and specific realizations are chosen depending on the requirements. In addition, these modules are invoked from an external automated training module (currently implemented on the basis of a genetic algorithm), which was developed with changeability in mind. Other solution-search algorithms can be used instead of it, and no significant modifications to the other modules are needed in order to do so.
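To make the decomposition above concrete, the following minimal Python sketch shows how consecutively invoked modules could be chained behind a common interface. The step names, the callable-based interface and the dictionary "model" object are illustrative assumptions made for this example, not the actual interfaces of the described program complex.

# Minimal sketch (illustrative only) of the consecutively invoked modules:
# normalization -> generation (or pre-trained initialization) -> training.
from typing import List, Sequence

def normalize(samples: Sequence[float]) -> List[float]:
    """Input data normalization module: scale the samples to [0, 1]."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0
    return [(s - lo) / span for s in samples]

def generate_model(samples: List[float]) -> dict:
    """Generation module: describe the chosen CNN (or load a pre-trained one)."""
    return {"architecture": "typical-cnn", "inputs": len(samples), "data": samples}

def train_model(model: dict) -> dict:
    """Training module: run training and attach a validation score (stubbed)."""
    model["validation_score"] = 0.9  # placeholder for the real training/validation loop
    return model

def run_pipeline(raw_data: Sequence[float]) -> dict:
    """The external automated training module invokes the steps consecutively;
    any step can be swapped for another realization with the same signature."""
    return train_model(generate_model(normalize(raw_data)))

if __name__ == "__main__":
    print(run_pipeline([3.0, 7.0, 11.0]))

Because each step only agrees on its input and output, replacing, for instance, the search algorithm that drives the pipeline does not require touching the individual modules, which is the interchangeability property described above.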
This approach is based on the principles of transparency and scalability, which makes it possible to expand the functionality of the program product by adding new modules rather than by modifying existing ones.

It is obvious that this approach does not allow automated training to be implemented for all possible CNN architectures. However, the generation and training processes of typical architectures follow a precise, consecutive algorithm. Having implemented this algorithm in the program complex, it becomes possible to treat the streaming (conveyor) production of typical neural network solutions as the main task.

The service-oriented approach in the developed program complex also manifests itself in the fact that the modules do not necessarily have to be installed on one and the same personal computer (PC). Modules can be distributed between different PCs or placed in cloud storages. Thus, the program complex can be implemented in the form of a distributed system that fits into the SOA paradigm completely.

At the level of the resulting program product, SOA is maintained by the implementation of an autonomous service containing a CNN trained to solve a specific task. This service is cross-platform and can be launched without any prior installation or additional software setup on a number of operating systems (which is possible due to the cross-platform nature of the modules' implementation language, Python [17]). Respectively, such a module can be used both in systems maintaining the SOA paradigm and in the Internet of Things (IoT) via REST and SOAP interfaces [18-20].
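As an illustration of the REST wrapping of a trained CNN mentioned above, the following sketch exposes a saved Keras model as an autonomous HTTP prediction service. It assumes the Flask framework and a model file named model.h5; the endpoint name and file layout are placeholders rather than the actual conventions of the described system.

# Minimal sketch of a REST wrapping for a trained CNN service (assumptions:
# Flask is installed; "model.h5" is a Keras model saved by the training pipeline).
import numpy as np
from flask import Flask, jsonify, request
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("model.h5")  # hypothetical artifact name

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"inputs": [[...], ...]} matching the model's input shape.
    batch = np.asarray(request.get_json()["inputs"], dtype="float32")
    outputs = model.predict(batch)
    return jsonify({"outputs": outputs.tolist()})

if __name__ == "__main__":
    # Other systems (SOA or IoT clients) call the service over plain HTTP.
    app.run(host="0.0.0.0", port=8080)

A SOAP wrapping would expose the same predict call through a second interface on top of the identical model object.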
3. The algorithm of convolution neural networks automated training

The difficulty in CNN production and training lies in the fact that such networks can only be trained on a marked (labelled) training dataset that describes the class of recognizable objects. The recognition of different object classes requires different CNN architectures and parameter settings. Due to the complexity of CNNs, this task becomes very resource-intensive; this is one of the key restrictions of CNNs trained with a teacher (supervised learning). A frequently used approach today is the creation of multitask CNNs for different scientific fields that can solve a whole class of tasks [21-23]. This approach has some advantages, particularly higher accuracy for the selected objects. However, the development of each such CNN is more resource-intensive and demands the participation of specialists able to design the architectures of such networks. The alternative solution described in this article is the automated training of models. This kind of solution implies the simultaneous training of several CNNs on a prepared dataset, followed by the situational choice of the most precise model, which in turn requires solving the task of assessing the quality of the models' parametrical adaptation. At the same time, forming the training dataset does not, in the common case, require special knowledge [24].

The automated system (AS) described in this article is relevant in cases when developing a full-fledged CNN able to solve the task in the most accurate way is unprofitable. Using this system, it is possible to create a CNN able to solve the assigned task cheaper and faster, with an accuracy specified by the user.

The algorithm of CNN selection was implemented in the following way:
1. In the first parent population, a fixed number of CNNs (M) is generated with randomly set parameters.
2. Nd new CNNs are generated, whose parameters are selected randomly from two randomly chosen parent CNNs, together with Nr CNNs whose parameters are set completely randomly within the given value ranges for these parameters.
3. Further, the CNN selection is performed using the roulette method (formula 1) [25]:

p_i = f_i / ∑_{j=1}^{N} f_j,   (1)

where p_i is the choice probability of the i-th CNN, f_i is the value of the fitness function for the i-th CNN, and N is the quantity of CNNs in the population. The roulette method was chosen as the most universal one, because the algorithm is supposed to be used for different classes of tasks: although more specific selection algorithms would have sped up the search for some task classes, they would inevitably have slowed it down for others. The fitness function is based on the inaccuracy estimate of the CNN's target parameter values relative to the real values of a test dataset (formula 2):

f_i = 1 / √( (1/X) ∑_{j=1}^{X} (ε_ij − ω_j)² ),   (2)

where ε_ij is the output value of the target parameter forecast by the i-th network in response to the input test j-vector, ω_j is the real value of the test dataset for the input test j-vector, and X is the quantity of test vectors. The result of the calculation according to this formula is a "fitness level" value, which is inversely proportional to the root mean squared error of the i-th CNN on the test dataset. As a result of selection, the M CNNs with the maximum p_i values (choice probabilities) are selected into the current generation out of the (M + Nd + Nr) candidates.
4. For all CNNs, the mean squared error of the target parameter values they calculate, relative to the real test dataset values, is computed. If at least one CNN shows a mean squared error lower than the set value, the cycle stops, and the CNN with the lowest mean squared error is treated as the "winner". Otherwise, the algorithm returns to step 2. In addition, the population of each iteration is stored separately. If the population of the current iteration coincides completely with the previous population, it means that a more accurate CNN configuration has not been found during the whole iteration, and an unconditional transition to step 5 is carried out.
5. If a CNN with a mean squared error lower than the set value is not found, the cycle launches again from step 1 with a new parent population, for which new random parameter values are set. If the solution is not found after I iterations, the task is declared unsolvable with the specified settings and the algorithm exits.
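The sketch below illustrates one generation of the described selection scheme in Python: the fitness of a candidate is the inverse of its root mean squared error on the test vectors (formula 2), and survivors are drawn by the roulette method (formula 1). The hyperparameter names, their ranges and the way a candidate's test outputs are obtained are simplified placeholders and do not reproduce the actual implementation of the program complex.

import math
import random

# Each candidate CNN is represented only by its hyperparameters; building and
# training the network is left to the caller-supplied predict_on_test function.
PARAM_RANGES = {"learning_rate": (1e-4, 1e-1), "dropout": (0.0, 0.5), "filters_scale": (0.5, 2.0)}

def random_candidate():
    return {k: random.uniform(*bounds) for k, bounds in PARAM_RANGES.items()}

def crossover(a, b):
    # Child takes each parameter from one of two randomly chosen parents (step 2).
    return {k: random.choice((a[k], b[k])) for k in PARAM_RANGES}

def fitness(predictions, real):
    # Formula (2): inverse of the root mean squared error on the X test vectors.
    mse = sum((e - w) ** 2 for e, w in zip(predictions, real)) / len(real)
    return 1.0 / math.sqrt(max(mse, 1e-12))

def roulette_select(candidates, fitnesses, m):
    # Formula (1): choice probability of each candidate is proportional to its fitness.
    total = sum(fitnesses)
    weights = [f / total for f in fitnesses]
    return random.choices(candidates, weights=weights, k=m)

def next_generation(parents, predict_on_test, real_values, m, n_d, n_r):
    """One iteration of steps 2-3: breed, evaluate, and select M survivors.
    predict_on_test(candidate) must train the candidate CNN and return its
    outputs on the test vectors (the values ε_ij)."""
    children = [crossover(*random.sample(parents, 2)) for _ in range(n_d)]
    randoms = [random_candidate() for _ in range(n_r)]
    candidates = parents + children + randoms            # M + Nd + Nr candidates
    fitnesses = [fitness(predict_on_test(c), real_values) for c in candidates]
    return roulette_select(candidates, fitnesses, m)     # M survivors

Steps 4 and 5 of the algorithm (the error-based stopping criterion and the restart with a new parent population) are not shown; they would wrap next_generation in an outer loop.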
4. Technologies used in the developed program complex

The program complex is developed in the Python programming language, whose main advantages are its cross-platform nature, extensibility and the large number of third-party libraries available for solving the specified tasks. This programming language was chosen because at the moment it is the main choice for developing deep learning systems, and also because it allows the SOA paradigm to be realized easily [26, 27]. The Keras and TensorFlow libraries are used for implementing the training algorithms.

Such a technology stack is explained by the fact that the program complex does not face the task of implementing untypical solutions; on the contrary, the quick realization of already known architectures is required, and the use of already developed, tested and optimized libraries satisfies this task completely. At the same time, the key requirements are extensibility and scalability. Respectively, building the program complex on a constantly evolving platform makes it possible to add new CNN architectures and the tools for working with them behind a single program interface. The cross-platform nature of the described stack and the support of the SOA paradigm allow the program complex to be scaled to different hardware.

It is important to mention separately that the CUDA SDK is also included among the used program libraries, which allows hardware acceleration to be exploited during artificial neural network training on NVidia video cards [28, 29]. The use of this technology makes the process of CNN training significantly faster [30].
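For orientation, the following sketch shows what a single realization of the generation module for a typical architecture might look like on the Keras/TensorFlow stack named above: a small CNN is assembled and compiled from a handful of parameters of the kind the automated training module could vary. The parameter set and the architecture itself are illustrative assumptions, not one of the actual realization variants of the complex.

from tensorflow import keras
from tensorflow.keras import layers

def build_typical_cnn(input_shape=(128, 128, 3), num_classes=2,
                      conv_blocks=3, base_filters=16, learning_rate=1e-3):
    """Assemble and compile a small image-classification CNN from a few
    parameters of the kind varied by the automated training module."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for block in range(conv_blocks):
        x = layers.Conv2D(base_filters * 2 ** block, 3, padding="same",
                          activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# When a GPU with the CUDA toolkit is available, TensorFlow uses it automatically.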
5. The approbation of automated convolution neural network training program complex

For the approbation of the program complex, a prototype was developed that performs additional training of the Mask R-CNN (MRCNN) architecture pre-trained on the COCO dataset. This configuration was chosen because of its balance between universality and accuracy [31]. By default, MRCNN is already capable of recognizing fundamentally different object classes, from automobiles to animals. That is why, with proper additional training, it should be able to recognize a wide range of objects that are not included in the COCO dataset.

The program complex was tested on the task of counting the deer in a herd from aerial photography. Besides the fact that deer do not belong to the COCO dataset and MRCNN is not able to distinguish them by default from a range of other creatures (sheep, gazelles, cows, horses), the specificity of this task lies in the fact that the photos are taken from various angles and distances, over different landscapes and during all seasons. As a result, deer can be shot from different angles, at various scales, and can have diverse colouring. What is more, due to the size of the herds, deer often cover one another in the photos. This makes the described task non-trivial, and the application of a CNN trained on a common amount of data is impossible. Figure 1 shows the recognition results for one of the two test images using MRCNN without additional training.

Figure 1: The deer recognition and calculation using basic MRCNN trained at COCO dataset

It can be noted that there are plenty of false negative errors evoked by the specificity of the COCO dataset, in which there is an insufficient number of images with similar scaling of objects. To get rid of the false operations, it is required to train the network on images marked for the specified task. That is why MRCNN was additionally trained using the CNN automated training system prototype. The training was conducted in the automated mode based on the training dataset specified by the user. The following parameters of the training process were varied in the prototype:
• the quantity of training epochs;
• the quantity of training steps in each epoch;
• the speed of training;
• the threshold of detection skipping.

The CNN declared the winner by the system was trained for 3 epochs with 53 training steps in each, a training speed of 0.0058 and a detection-skipping threshold of 0.86. For the same image, the resulting network correctly recognized 58 out of 93 deer and did not perform any false negative error (figure 2).

Figure 2: The deer recognition and calculation using additionally trained MRCNN
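For illustration, the sketch below shows how the four varied parameters and the winning values could be mapped onto a fine-tuning run of the widely used open-source Keras/TensorFlow implementation of Mask R-CNN (the matterport Mask_RCNN package); the described prototype is not necessarily built on this package. The class name, dataset objects and weight path are assumptions made for the example.

# Sketch of fine-tuning Mask R-CNN on a user-marked deer dataset, assuming the
# open-source matterport Mask_RCNN package (mrcnn); all names are illustrative.
from mrcnn.config import Config
from mrcnn import model as modellib

class DeerConfig(Config):
    NAME = "deer"
    NUM_CLASSES = 1 + 1                 # background + deer
    STEPS_PER_EPOCH = 53                # training steps per epoch (winner value)
    DETECTION_MIN_CONFIDENCE = 0.86     # threshold of detection skipping
    LEARNING_RATE = 0.0058              # training speed

def finetune_deer_model(train_set, val_set, coco_weights="mask_rcnn_coco.h5"):
    """train_set / val_set are mrcnn Dataset objects built from the user's marked images."""
    config = DeerConfig()
    model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs/")
    # Start from COCO weights, re-initializing the class-specific output layers.
    model.load_weights(coco_weights, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
    model.train(train_set, val_set,
                learning_rate=config.LEARNING_RATE,
                epochs=3,               # number of training epochs (winner value)
                layers="heads")
    return model

In that package, the detection-confidence setting takes effect when the trained model is later run in inference mode rather than during training itself.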
Of course, the trained CNN did not reach the maximum possible accuracy, but it can be improved in the future. What is more, the recognition accuracy may be increased by using other CNN architectures. Nevertheless, the prototype testing can be considered successful, because program and service wrappings were generated for the additionally trained MRCNN, which allows the received CNN to be used for solving the set task right away. Due to the unified interface, it will be possible to move to more accurate CNNs in the future: even if a different CNN architecture is used in the following versions, the program and service wrapping interface will not change, and it will not be required to introduce changes into the programs on the client side.

6. Conclusion

At present, the program complex is at its prototype stage, and it is used for the development of several off-site applications. First of all, to start full operation, the user application interface needs to be improved. As at the prototyping step the product is used by specialists in machine learning, the current interface is not adapted for use by regular users. Because of this, accessibility for a wide user audience, which is one of the key tasks facing the program complex, is not being addressed at the moment.

In addition, because of the high performance requirements of the program product in operation, a transition of the program complex to high-performance servers is needed for commercial use. The computational specificity of CNN training places a range of requirements on the hardware, and commercial use implies the parallel training of several models, which can load the system significantly. Despite the computational parallelism built into the program complex architecture through SOA, additional research and stress tests are required to outline the specific hardware requirements.

Acknowledgements

This work was supported by the RFBR grant №19-37-90112 and the budgetary theme 0073-2019-0004.

References

[1] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Communications of the ACM (2017), volume 60, issue 6, pp. 84-90.
[2] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 3rd International Conference on Learning Representations (2015).
[3] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, 13th European Conference on Computer Vision (2014), volume 8689, issue 1, pp. 818-833.
[4] Z. Geng, Y. Wang, Automated design of a convolutional neural network with multi-scale filters for cost-efficient seismic data classification, Nature Communications, volume 11, issue 1, 2020.
[5] M. Wistuba, A. Rawat, T. Pedapati, Automation of deep learning, Proceedings of the 2020 International Conference on Multimedia Retrieval (2020), pp. 5-6.
[6] B. Baker, O. Gupta, N. Naik, R. Raskar, Designing neural network architectures using reinforcement learning, 5th International Conference on Learning Representations (2017).
[7] Ateeq-ur-Rauf, A. R. Ghumman, S. Ahmad, H. N. Hashmi, Performance assessment of artificial neural networks and support vector regression models for stream flow predictions, Environmental Monitoring and Assessment, volume 190, issue 12, article 704, 2018.
[8] Z. Alizadeh, J. Yazdi, J. H. Kim, A. K. Al-Shamiri, Assessment of machine learning techniques for monthly flow prediction, Water (Switzerland), volume 10, issue 11, article 1676, 2018.
[9] J. Lantrip, M. Griffin, A. Aly, Results of near-term forecasting of surface water supplies, Proceedings of the 2005 World Water and Environmental Resources Congress, Anchorage, Alaska, US, 2005. doi: 10.1061/40792(173)447.
[10] I. Bello, B. Zoph, V. Vasudevan, Q. V. Le, Neural optimizer search with reinforcement learning, 34th International Conference on Machine Learning (2017), volume 1, pp. 712-721.
[11] H. Cai, T. Chen, W. Zhang, Y. Yu, J. Wang, Efficient architecture search by network transformation, 32nd AAAI Conference on Artificial Intelligence (2018), pp. 2787-2794.
[12] J.-D. Dong, A.-C. Cheng, D.-C. Juan, W. Wei, M. Sun, DPP-Net: Device-aware progressive search for Pareto-optimal neural architectures, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 11215, pp. 540-555, 2018.
[13] M. Wistuba, Deep learning architecture search by neuro-cell-based evolution with function-preserving mutations, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 11052, pp. 243-258, 2019.
[14] V. Mikhailov, A. Spesivtsev, V. Sobolevsky, N. Kartashev, Multi-model estimation of the dynamics of plant community phytomass, The 13th IEEE International Conference on Application of Information and Communication Technologies, Baku, Azerbaijan, pp. 324-328, 2019.
[15] V. A. Zelentsov, A. M. Alabyan, I. N. Krylenko, I. Yu. Pimanov, M. R. Ponomarenko, S. A. Potryasaev, A. E. Semenov, V. A. Sobolevskii, B. V. Sokolov, R. M. Yusupov, A model-oriented system for operational forecasting of river floods, Herald of the Russian Academy of Sciences, volume 89, issue 4, pp. 405-417, 2019. doi: 10.1134/S1019331619040130.
[16] M. Bell, Introduction to service-oriented modeling, in Service-Oriented Modeling: Service Analysis, Design and Architecture, Wiley & Sons, New York, NY, 2008.
[17] J. V. Guttag, Introduction to Computation and Programming Using Python: With Application to Understanding Data, 2nd Edition, MIT Press, Cambridge, Massachusetts, 2016.
[18] Y. El Khamlichi, M. Lamnaour, Y. Mesmoudi, A. Tahiri, A. Touhafi, A. Braeken, Design and implementation of a smart gateway for IoT applications using heterogeneous smart objects, 4th International Conference on Cloud Computing Technologies and Applications (Cloudtech), 2018.
[19] D. Hanes, IoT Fundamentals: Networking Technologies, Protocols, and Use Cases for the Internet of Things, Cisco Press, Indianapolis, Indiana, 2017.
[20] T. Erl, Service-Oriented Architecture: Analysis and Design for Services and Microservices, 2nd Edition, Prentice Hall, Upper Saddle River, New Jersey, 2016.
[21] D. Xu, Z. Tian, R. Lai, X. Kong, Z. Tan, W. Shi, Deep learning based emotion analysis of microblog texts, Information Fusion, volume 64, pp. 1-11, 2020.
[22] U. Ozkaya, F. Melgani, M. Belete Bejiga, L. Seyfi, M. Donelli, GPR B scan image analysis with deep learning methods, Measurement: Journal of the International Measurement Confederation, volume 165, 2020.
[23] A. Dutta, T. Batabyal, M. Basu, S. T. Acton, An efficient convolutional neural network for coronary heart disease prediction, Expert Systems with Applications, volume 159, 2020.
[24] M. Sewak, M. R. Karim, P. Pujari, Practical Convolutional Neural Networks: Implement Advanced Deep Learning Models Using Python, Packt Publishing, Birmingham, UK, 2018.
[25] L. A. Gladkov, V. V. Kureichik, V. M. Kureichik, Genetic Algorithms: A Textbook, 2nd Edition, Fizmatlit, Moscow, Russia, 2006.
[26] T. Ziade, Python Microservices Development, Packt Publishing, Birmingham, UK, 2017.
[27] G. C. Hillar, Internet of Things with Python, Packt Publishing, Birmingham, UK, 2016.
[28] B. Tuomanen, Hands-On GPU Programming with Python and CUDA: Explore high-performance parallel computing with CUDA, Packt Publishing, Birmingham, UK, 2018.
[29] J. Han, B. Sharma, Learn CUDA Programming: A beginner's guide to GPU programming and parallel computing with CUDA 10.x and C/C++, Packt Publishing, Birmingham, UK, 2019.
[30] B. Vaidya, Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA: Effective techniques for processing complex image data in real time using GPUs, Packt Publishing, Birmingham, UK, 2019.
[31] K. He, G. Gkioxari, P. Dollar, R. Girshick, Mask R-CNN, Proceedings of the IEEE International Conference on Computer Vision, volume 2017, pp. 2980-2988, 2017.