The system of convolution neural networks automated training

Vladislav A. Sobolevskii a

a St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), 14th line V.O., 39, St. Petersburg, 199178, Russia

Abstract
This paper presents research related to the creation of a program complex that realizes the automated generation of service programs for artificial intelligence systems based on convolution neural networks. The presented program complex is intended to accelerate and simplify the generation and training of convolutional neural networks.

Keywords
Machine learning, convolutional neural networks, service-oriented architecture, internet of things

Models and Methods for Researching Information Systems in Transport, Dec. 11-12, St. Petersburg, Russia
EMAIL: arguzd@yandex.ru (V. A. Sobolevskii)
ORCID: 0000-0001-7685-4991 (V. A. Sobolevskii)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

In the modern world, recognition technologies for photo and video images are being adopted ever more intensively. The development of this sphere became possible due to the appearance of new convolution neural network (CNN) architectures and the modification of existing ones. This type of architecture has turned out to be successful enough for solving tasks of image analysis, segmentation and semantic recognition. The higher the accuracy and capabilities of CNNs become, the more complex the networks become: some of the most successful and widespread CNN architectures at the moment contain a large number of heterogeneous layers [1-3]. This leads not only to an increase in recognition quality, but also to greater complexity in creating and training such networks.

At the same time, the number of tasks that can be solved using CNNs keeps rising. These tasks do not always demand the most complex and advanced CNN architectures, but they are still quite difficult, and regular users without knowledge of deep learning methods and the skills to implement them are not able to create and adapt such networks correctly. It can be said that the quantity of such tasks is growing faster than the number of professionals capable of solving them. As a result, the task of creating systems that automate CNN generation for one sphere or another is becoming very relevant [4-6]. At the same time, the demand for a system suitable for solving typical tasks from different spheres is becoming more acute. There are many tasks of one class (for example, the recognition of certain tree species in space images, landscape peculiarities, specific nature objects, etc.) whose solving principle has already been discovered, or which are handled by producing individual CNNs [7-9], or which are not being solved at all due to the lack of specialists. Additionally, many CNNs are produced in the form of program prototypes (for instance, using MatLab), and such prototypes require further work before they can be implemented into existing monitoring systems built on specific stacks of applied programming languages (C++, Java, Python, etc.). This, in its turn, makes the further development and subsequent implementation of prototypes more complicated.

For solving these tasks, the system of convolution neural networks automated training presented in this article was designed on the basis of the service-oriented approach. The approach of automated generation of artificial neural networks is not new, and there are several works on this topic [10-13]. All of them point to the fact that automating the production of machine learning models speeds up the development of program products for solving a multitude of tasks. The system described in this article elaborates the idea of automation and has a modular, extensible structure, which allows trainable architectures, training algorithms, data normalization, validation, etc. to be added and combined. Moreover, due to genetic algorithms, the system is capable of automated CNN generation and training, which allows non-professionals who are not aware of the details of neural network configuration to use it for solving typical tasks. The result of the system's work is not only a built architecture, but a generated executable file with additional REST and SOAP wrappings, which without any preliminary preparations allows the produced CNN to be started as a service and called from other systems and program complexes. This makes the system a tool for quick and effortless solving of simple typical tasks by regular users.

By the present time, the designed system has already been used for generating simple deep neural networks that were introduced into third-party program products for solving specific applied tasks [14-15]. This article describes the capabilities of the program complex that were extended by means of automated CNN training.
2. The service-oriented approach in neural networks automated generation

The service-oriented architecture (SOA) of applications implies a modular approach to application development [16]. In the situation considered here, this paradigm is implemented at several levels.

At the level of the program complex itself, SOA maintains the modularity and interchangeability of the CNN generation and training algorithms. Thus, the whole process of automated generation and training is divided into several consecutively invoked program modules:
• the input data normalization module;
• the generation module of the chosen CNN, or the module of pre-trained CNN architecture initialization;
• the CNN training module (including verification and validation submodules).

Each of these modules is presented in several realization variants (for various CNN architectures), and specific realizations are chosen depending on the requirements. In addition, these modules are invoked from an external automated training module (currently implemented on the basis of a genetic algorithm), which was developed with changeability in mind. Other solution-search algorithms can be used instead of it, and no significant modifications to the other modules are needed in order to do so.
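To make the decomposition above concrete, the following minimal Python sketch shows how consecutively invoked modules could be chained behind a common interface. The step names, the callable-based interface and the dictionary "model" object are illustrative assumptions made for this example, not the actual interfaces of the described program complex.

# Minimal sketch (illustrative only) of the consecutively invoked modules:
# normalization -> generation (or pre-trained initialization) -> training.
from typing import List, Sequence

def normalize(samples: Sequence[float]) -> List[float]:
    """Input data normalization module: scale the samples to [0, 1]."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0
    return [(s - lo) / span for s in samples]

def generate_model(samples: List[float]) -> dict:
    """Generation module: describe the chosen CNN (or load a pre-trained one)."""
    return {"architecture": "typical-cnn", "inputs": len(samples), "data": samples}

def train_model(model: dict) -> dict:
    """Training module: run training and attach a validation score (stubbed)."""
    model["validation_score"] = 0.9  # placeholder for the real training/validation loop
    return model

def run_pipeline(raw_data: Sequence[float]) -> dict:
    """The external automated training module invokes the steps consecutively;
    any step can be swapped for another realization with the same signature."""
    return train_model(generate_model(normalize(raw_data)))

if __name__ == "__main__":
    print(run_pipeline([3.0, 7.0, 11.0]))

Because each step only agrees on its input and output, replacing, for instance, the search algorithm that drives the pipeline does not require touching the individual modules, which is the interchangeability property described above.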
This approach is based on the principles of transparency and scalability, which makes it possible to expand the functionality of the program product by adding new modules rather than by modifying existing ones.

It is obvious that this approach does not allow automated training to be implemented for all possible CNN architectures. However, the generation and training processes of typical architectures follow a precise, consecutive algorithm. Having implemented this algorithm in the program complex, it becomes possible to treat the streaming (conveyor) production of typical neural network solutions as the main task.

The service-oriented approach in the developed program complex also manifests itself in the fact that the modules do not necessarily have to be installed on one and the same personal computer (PC). Modules can be distributed between different PCs or placed in cloud storages. Thus, the program complex can be implemented in the form of a distributed system that fits into the SOA paradigm completely.

At the level of the resulting program product, SOA is maintained by the implementation of an autonomous service containing a CNN trained to solve a specific task. This service is cross-platform and can be launched without any prior installation or additional software setup on a number of operating systems (which is possible due to the cross-platform nature of the modules' implementation language, Python [17]). Respectively, such a module can be used both in systems maintaining the SOA paradigm and in the Internet of Things (IoT) via REST and SOAP interfaces [18-20].
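As an illustration of the REST wrapping of a trained CNN mentioned above, the following sketch exposes a saved Keras model as an autonomous HTTP prediction service. It assumes the Flask framework and a model file named model.h5; the endpoint name and file layout are placeholders rather than the actual conventions of the described system.

# Minimal sketch of a REST wrapping for a trained CNN service (assumptions:
# Flask is installed; "model.h5" is a Keras model saved by the training pipeline).
import numpy as np
from flask import Flask, jsonify, request
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("model.h5")  # hypothetical artifact name

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"inputs": [[...], ...]} matching the model's input shape.
    batch = np.asarray(request.get_json()["inputs"], dtype="float32")
    outputs = model.predict(batch)
    return jsonify({"outputs": outputs.tolist()})

if __name__ == "__main__":
    # Other systems (SOA or IoT clients) call the service over plain HTTP.
    app.run(host="0.0.0.0", port=8080)

A SOAP wrapping would expose the same predict call through a second interface on top of the identical model object.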
3. The algorithm of convolution neural networks automated training

The difficulty in CNN production and training lies in the fact that such networks can only be trained on a marked (labelled) training dataset that describes the class of recognizable objects. The recognition of different object classes requires different CNN architectures and parameter settings. Due to the complexity of CNNs, this task becomes very resource-intensive; this is one of the key restrictions of CNNs trained with a teacher (supervised learning). A frequently used approach today is the creation of multitask CNNs for different scientific fields that can solve a whole class of tasks [21-23]. This approach has some advantages, particularly higher accuracy for the selected objects. However, the development of each such CNN is more resource-intensive and demands the participation of specialists able to design the architectures of such networks. The alternative solution described in this article is the automated training of models. This kind of solution implies the simultaneous training of several CNNs on a prepared dataset, followed by the situational choice of the most precise model, which in turn requires solving the task of assessing the quality of the models' parametrical adaptation. At the same time, forming the training dataset does not, in the common case, require special knowledge [24].

The automated system (AS) described in this article is relevant in cases when developing a full-fledged CNN able to solve the task in the most accurate way is unprofitable. Using this system, it is possible to create a CNN able to solve the assigned task cheaper and faster, with an accuracy specified by the user.

The algorithm of CNN selection was implemented in the following way:
1. In the first parent population, a fixed number of CNNs (M) is generated with randomly set parameters.
2. Nd new CNNs are generated, whose parameters are selected randomly from two randomly chosen parent CNNs, together with Nr CNNs whose parameters are set completely randomly within the given value ranges for these parameters.
3. Further, the CNN selection is performed using the roulette method (formula 1) [25]:

p_i = f_i / ∑_{j=1}^{N} f_j,   (1)

where p_i is the choice probability of the i-th CNN, f_i is the value of the fitness function for the i-th CNN, and N is the quantity of CNNs in the population. The roulette method was chosen as the most universal one, because the algorithm is supposed to be used for different classes of tasks: although more specific selection algorithms would have sped up the search for some task classes, they would inevitably have slowed it down for others. The fitness function is based on the inaccuracy estimate of the CNN's target parameter values relative to the real values of a test dataset (formula 2):

f_i = 1 / √( (1/X) ∑_{j=1}^{X} (ε_ij − ω_j)² ),   (2)

where ε_ij is the output value of the target parameter forecast by the i-th network in response to the input test j-vector, ω_j is the real value of the test dataset for the input test j-vector, and X is the quantity of test vectors. The result of the calculation according to this formula is a "fitness level" value, which is inversely proportional to the root mean squared error of the i-th CNN on the test dataset. As a result of selection, the M CNNs with the maximum p_i values (choice probabilities) are selected into the current generation out of the (M + Nd + Nr) candidates.
4. For all CNNs, the mean squared error of the target parameter values they calculate, relative to the real test dataset values, is computed. If at least one CNN shows a mean squared error lower than the set value, the cycle stops, and the CNN with the lowest mean squared error is treated as the "winner". Otherwise, the algorithm returns to step 2. In addition, the population of each iteration is stored separately. If the population of the current iteration coincides completely with the previous population, it means that a more accurate CNN configuration has not been found during the whole iteration, and an unconditional transition to step 5 is carried out.
5. If a CNN with a mean squared error lower than the set value is not found, the cycle launches again from step 1 with a new parent population, for which new random parameter values are set. If the solution is not found after I iterations, the task is declared unsolvable with the specified settings and the algorithm exits.
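The sketch below illustrates one generation of the described selection scheme in Python: the fitness of a candidate is the inverse of its root mean squared error on the test vectors (formula 2), and survivors are drawn by the roulette method (formula 1). The hyperparameter names, their ranges and the way a candidate's test outputs are obtained are simplified placeholders and do not reproduce the actual implementation of the program complex.

import math
import random

# Each candidate CNN is represented only by its hyperparameters; building and
# training the network is left to the caller-supplied predict_on_test function.
PARAM_RANGES = {"learning_rate": (1e-4, 1e-1), "dropout": (0.0, 0.5), "filters_scale": (0.5, 2.0)}

def random_candidate():
    return {k: random.uniform(*bounds) for k, bounds in PARAM_RANGES.items()}

def crossover(a, b):
    # Child takes each parameter from one of two randomly chosen parents (step 2).
    return {k: random.choice((a[k], b[k])) for k in PARAM_RANGES}

def fitness(predictions, real):
    # Formula (2): inverse of the root mean squared error on the X test vectors.
    mse = sum((e - w) ** 2 for e, w in zip(predictions, real)) / len(real)
    return 1.0 / math.sqrt(max(mse, 1e-12))

def roulette_select(candidates, fitnesses, m):
    # Formula (1): choice probability of each candidate is proportional to its fitness.
    total = sum(fitnesses)
    weights = [f / total for f in fitnesses]
    return random.choices(candidates, weights=weights, k=m)

def next_generation(parents, predict_on_test, real_values, m, n_d, n_r):
    """One iteration of steps 2-3: breed, evaluate, and select M survivors.
    predict_on_test(candidate) must train the candidate CNN and return its
    outputs on the test vectors (the values ε_ij)."""
    children = [crossover(*random.sample(parents, 2)) for _ in range(n_d)]
    randoms = [random_candidate() for _ in range(n_r)]
    candidates = parents + children + randoms            # M + Nd + Nr candidates
    fitnesses = [fitness(predict_on_test(c), real_values) for c in candidates]
    return roulette_select(candidates, fitnesses, m)     # M survivors

Steps 4 and 5 of the algorithm (the error-based stopping criterion and the restart with a new parent population) are not shown; they would wrap next_generation in an outer loop.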
4. Technologies used in the developed program complex

The program complex is developed in the Python programming language, whose main advantages are its cross-platform nature, extensibility and the large number of third-party libraries available for solving the specified tasks. This programming language was chosen because at the moment it is the main choice for developing deep learning systems, and also because it allows the SOA paradigm to be realized easily [26, 27]. The Keras and TensorFlow libraries are used for implementing the training algorithms.

Such a technology stack is explained by the fact that the program complex does not face the task of implementing untypical solutions; on the contrary, the quick realization of already known architectures is required, and the use of already developed, tested and optimized libraries satisfies this task completely. At the same time, the key requirements are extensibility and scalability. Respectively, building the program complex on a constantly evolving platform makes it possible to add new CNN architectures and the tools for working with them behind a single program interface. The cross-platform nature of the described stack and the support of the SOA paradigm allow the program complex to be scaled to different hardware.

It is important to mention separately that the CUDA SDK is also included among the used program libraries, which allows hardware acceleration to be exploited during artificial neural network training on NVidia video cards [28, 29]. The use of this technology makes the process of CNN training significantly faster [30].
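For orientation, the following sketch shows what a single realization of the generation module for a typical architecture might look like on the Keras/TensorFlow stack named above: a small CNN is assembled and compiled from a handful of parameters of the kind the automated training module could vary. The parameter set and the architecture itself are illustrative assumptions, not one of the actual realization variants of the complex.

from tensorflow import keras
from tensorflow.keras import layers

def build_typical_cnn(input_shape=(128, 128, 3), num_classes=2,
                      conv_blocks=3, base_filters=16, learning_rate=1e-3):
    """Assemble and compile a small image-classification CNN from a few
    parameters of the kind varied by the automated training module."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for block in range(conv_blocks):
        x = layers.Conv2D(base_filters * 2 ** block, 3, padding="same",
                          activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# When a GPU with the CUDA toolkit is available, TensorFlow uses it automatically.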
5. The approbation of automated convolution neural network training program complex

For the approbation of the program complex, a prototype was developed that performs additional training of the Mask R-CNN (MRCNN) architecture pre-trained on the COCO dataset. This configuration was chosen because of its balance between universality and accuracy [31]. By default, MRCNN is already capable of recognizing fundamentally different object classes, from automobiles to animals. That is why, with proper additional training, it should be able to recognize a wide range of objects that are not included in the COCO dataset.

The program complex was tested on the task of counting the deer in a herd from aerial photography. Besides the fact that deer do not belong to the COCO dataset and MRCNN is not able to distinguish them by default from a range of other creatures (sheep, gazelles, cows, horses), the specificity of this task lies in the fact that the photos are taken from various angles and distances, over different landscapes and during all seasons. As a result, deer can be shot from different angles, at various scales, and can have diverse colouring. What is more, due to the size of the herds, deer often cover one another in the photos. This makes the described task non-trivial, and the application of a CNN trained on a common amount of data is impossible. Figure 1 shows the recognition results for one of the two test images using MRCNN without additional training.

Figure 1: The deer recognition and calculation using basic MRCNN trained at COCO dataset

It can be noted that there are plenty of false negative errors evoked by the specificity of the COCO dataset, in which there is an insufficient number of images with similar scaling of objects. To get rid of the false operations, it is required to train the network on images marked for the specified task. That is why MRCNN was additionally trained using the CNN automated training system prototype. The training was conducted in the automated mode based on the training dataset specified by the user. The following parameters of the training process were varied in the prototype:
• the quantity of training epochs;
• the quantity of training steps in each epoch;
• the speed of training;
• the threshold of detection skipping.

The CNN declared the winner by the system was trained for 3 epochs with 53 training steps in each, a training speed of 0.0058 and a detection-skipping threshold of 0.86. For the same image, the resulting network correctly recognized 58 out of 93 deer and did not perform any false negative error (figure 2).

Figure 2: The deer recognition and calculation using additionally trained MRCNN
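For illustration, the sketch below shows how the four varied parameters and the winning values could be mapped onto a fine-tuning run of the widely used open-source Keras/TensorFlow implementation of Mask R-CNN (the matterport Mask_RCNN package); the described prototype is not necessarily built on this package. The class name, dataset objects and weight path are assumptions made for the example.

# Sketch of fine-tuning Mask R-CNN on a user-marked deer dataset, assuming the
# open-source matterport Mask_RCNN package (mrcnn); all names are illustrative.
from mrcnn.config import Config
from mrcnn import model as modellib

class DeerConfig(Config):
    NAME = "deer"
    NUM_CLASSES = 1 + 1                 # background + deer
    STEPS_PER_EPOCH = 53                # training steps per epoch (winner value)
    DETECTION_MIN_CONFIDENCE = 0.86     # threshold of detection skipping
    LEARNING_RATE = 0.0058              # training speed

def finetune_deer_model(train_set, val_set, coco_weights="mask_rcnn_coco.h5"):
    """train_set / val_set are mrcnn Dataset objects built from the user's marked images."""
    config = DeerConfig()
    model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs/")
    # Start from COCO weights, re-initializing the class-specific output layers.
    model.load_weights(coco_weights, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
    model.train(train_set, val_set,
                learning_rate=config.LEARNING_RATE,
                epochs=3,               # number of training epochs (winner value)
                layers="heads")
    return model

In that package, the detection-confidence setting takes effect when the trained model is later run in inference mode rather than during training itself.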
Of course, the trained CNN did not reach the maximum possible accuracy, but it can be improved in the future. What is more, the recognition accuracy may be increased by using other CNN architectures. Nevertheless, the prototype testing can be considered successful, because program and service wrappings were generated for the additionally trained MRCNN, which allows the received CNN to be used for solving the set task right away. Due to the unified interface, it will be possible to move to more accurate CNNs in the future: even if a different CNN architecture is used in the following versions, the program and service wrapping interface will not change, and it will not be required to introduce changes into the programs on the client side.

6. Conclusion

At present, the program complex is at its prototype stage, and it is used for the development of several off-site applications. First of all, to start full operation, the user application interface needs to be improved. As at the prototyping step the product is used by specialists in machine learning, the current interface is not adapted for use by regular users. Because of this, accessibility for a wide user audience, which is one of the key tasks facing the program complex, is not being addressed at the moment.

In addition, because of the high performance requirements of the program product in operation, a transition of the program complex to high-performance servers is needed for commercial use. The computational specificity of CNN training places a range of requirements on the hardware, and commercial use implies the parallel training of several models, which can load the system significantly. Despite the computational parallelism built into the program complex architecture through SOA, additional research and stress tests are required to outline the specific hardware requirements.

Acknowledgements

This work was supported by the RFBR grant №19-37-90112 and the budgetary theme 0073-2019-0004.

References

[1] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Communications of the ACM (2017), volume 60, issue 6, pp. 84-90.
[2] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 3rd International Conference on Learning Representations (2015).
[3] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, 13th European Conference on Computer Vision (2014), volume 8689, issue 1, pp. 818-833.
[4] Z. Geng, Y. Wang, Automated design of a convolutional neural network with multi-scale filters for cost-efficient seismic data classification, Nature Communications, volume 11, issue 1, 2020.
[5] M. Wistuba, A. Rawat, T. Pedapati, Automation of deep learning, Proceedings of the 2020 International Conference on Multimedia Retrieval (2020), pp. 5-6.
[6] B. Baker, O. Gupta, N. Naik, R. Raskar, Designing neural network architectures using reinforcement learning, 5th International Conference on Learning Representations (2017).
[7] Ateeq-ur-Rauf, A. R. Ghumman, S. Ahmad, H. N. Hashmi, Performance assessment of artificial neural networks and support vector regression models for stream flow predictions, Environmental Monitoring and Assessment, volume 190, issue 12, article 704, 2018.
[8] Z. Alizadeh, J. Yazdi, J. H. Kim, A. K. Al-Shamiri, Assessment of machine learning techniques for monthly flow prediction, Water (Switzerland), volume 10, issue 11, article 1676, 2018.
[9] J. Lantrip, M. Griffin, A. Aly, Results of near-term forecasting of surface water supplies, Proceedings of the 2005 World Water and Environmental Resources Congress, Anchorage, Alaska, US, 2005. doi: 10.1061/40792(173)447.
[10] I. Bello, B. Zoph, V. Vasudevan, Q. V. Le, Neural optimizer search with reinforcement learning, 34th International Conference on Machine Learning (2017), volume 1, pp. 712-721.
[11] H. Cai, T. Chen, W. Zhang, Y. Yu, J. Wang, Efficient architecture search by network transformation, 32nd AAAI Conference on Artificial Intelligence (2018), pp. 2787-2794.
[12] J.-D. Dong, A.-C. Cheng, D.-C. Juan, W. Wei, M. Sun, DPP-Net: Device-aware progressive search for Pareto-optimal neural architectures, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 11215, pp. 540-555, 2018.
[13] M. Wistuba, Deep learning architecture search by neuro-cell-based evolution with function-preserving mutations, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 11052, pp. 243-258, 2019.
[14] V. Mikhailov, A. Spesivtsev, V. Sobolevsky, N. Kartashev, Multi-model estimation of the dynamics of plant community phytomass, The 13th IEEE International Conference on Application of Information and Communication Technologies, Baku, Azerbaijan, pp. 324-328, 2019.
[15] V. A. Zelentsov, A. M. Alabyan, I. N. Krylenko, I. Yu. Pimanov, M. R. Ponomarenko, S. A. Potryasaev, A. E. Semenov, V. A. Sobolevskii, B. V. Sokolov, R. M. Yusupov, A model-oriented system for operational forecasting of river floods, Herald of the Russian Academy of Sciences, volume 89, issue 4, pp. 405-417, 2019. doi: 10.1134/S1019331619040130.
[16] M. Bell, Introduction to service-oriented modeling, in Service-Oriented Modeling: Service Analysis, Design and Architecture, Wiley & Sons, New York, NY, 2008.
[17] J. V. Guttag, Introduction to Computation and Programming Using Python: With Application to Understanding Data, 2nd Edition, MIT Press, Cambridge, Massachusetts, 2016.
[18] Y. El Khamlichi, M. Lamnaour, Y. Mesmoudi, A. Tahiri, A. Touhafi, A. Braeken, Design and implementation of a smart gateway for IoT applications using heterogeneous smart objects, 4th International Conference on Cloud Computing Technologies and Applications (Cloudtech), 2018.
[19] D. Hanes, IoT Fundamentals: Networking Technologies, Protocols, and Use Cases for the Internet of Things, Cisco Press, Indianapolis, Indiana, 2017.
[20] T. Erl, Service-Oriented Architecture: Analysis and Design for Services and Microservices, 2nd Edition, Prentice Hall, Upper Saddle River, New Jersey, 2016.
[21] D. Xu, Z. Tian, R. Lai, X. Kong, Z. Tan, W. Shi, Deep learning based emotion analysis of microblog texts, Information Fusion, volume 64, pp. 1-11, 2020.
[22] U. Ozkaya, F. Melgani, M. Belete Bejiga, L. Seyfi, M. Donelli, GPR B scan image analysis with deep learning methods, Measurement: Journal of the International Measurement Confederation, volume 165, 2020.
[23] A. Dutta, T. Batabyal, M. Basu, S. T. Acton, An efficient convolutional neural network for coronary heart disease prediction, Expert Systems with Applications, volume 159, 2020.
[24] M. Sewak, M. R. Karim, P. Pujari, Practical Convolutional Neural Networks: Implement Advanced Deep Learning Models Using Python, Packt Publishing, Birmingham, UK, 2018.
[25] L. A. Gladkov, V. V. Kureichik, V. M. Kureichik, Genetic Algorithms: A Textbook, 2nd Edition, Fizmatlit, Moscow, Russia, 2006.
[26] T. Ziade, Python Microservices Development, Packt Publishing, Birmingham, UK, 2017.
[27] G. C. Hillar, Internet of Things with Python, Packt Publishing, Birmingham, UK, 2016.
[28] B. Tuomanen, Hands-On GPU Programming with Python and CUDA: Explore high-performance parallel computing with CUDA, Packt Publishing, Birmingham, UK, 2018.
[29] J. Han, B. Sharma, Learn CUDA Programming: A beginner's guide to GPU programming and parallel computing with CUDA 10.x and C/C++, Packt Publishing, Birmingham, UK, 2019.
[30] B. Vaidya, Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA: Effective techniques for processing complex image data in real time using GPUs, Packt Publishing, Birmingham, UK, 2019.
[31] K. He, G. Gkioxari, P. Dollar, R. Girshick, Mask R-CNN, Proceedings of the IEEE International Conference on Computer Vision, volume 2017, pp. 2980-2988, 2017.