Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



THE DEEP DISTRIBUTED LEARNING IMPLEMENTATIONS
   WITH A LARGE DATASET ON THE DESKTOP GRID
                                          I.I. Kurochkin1,a
          1
              Institute for Information Transmission Problems of Russian Academy of Sciences,
                          Bolshoy Karetny per. 19, build.1, Moscow, 127051, Russia

                                      E-mail: a qurochkin@gmail.com


Machine learning methods, and deep neural networks in particular, are often used to solve the problem
of image classification. There is a trend towards larger training datasets and larger neural network
architectures. The process of training a deep neural network with millions of parameters can take
thousands of hours on modern computing devices. Distributed computing systems can be used to reduce
training time. The wide scalability of grid systems and the ease of connecting new computational nodes
can significantly reduce the training time of deep neural networks. At the same time, the peculiarities
of data exchange between the nodes of a desktop grid system must be taken into account. Methods of
organizing distributed deep learning are proposed using the example of the image classification
problem. A technique for random formation of a local dataset is proposed. The results of using
synchronous and asynchronous approaches to distributed learning are shown.

Keywords: deep neural networks, distributed computing systems, desktop grid, large dataset,
image classification, distributed deep learning



                                                                                             Ilya Kurochkin



                                                               Copyright © 2021 for this paper by its authors.
                      Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).








1. Introduction
        Machine learning methods and deep neural networks are relevant and effective for solving
various applied image classification problems [1]. Training deep neural networks is computationally
hard. Deep learning with large datasets can take weeks or even months on modern personal computers
or servers. With the help of modern graphics cards (GPUs), training time can be significantly reduced
compared to using only the CPU. To minimize the training time of deep neural networks,
high-performance computing systems are used. Unfortunately, not all researchers have access to
supercomputers. As a result, they use either reduced datasets or simplified deep neural network
architectures.
        Therefore, the use of distributed computing systems is a logical step to reduce training time.
Distributed computing systems have a number of features [2]: heterogeneity of computing nodes;
possible failure of nodes and communication channels; occurrence of errors and delays in calculations
and data transmission; autonomy of calculations at the nodes and the difficulty of coordination.
         Some types of distributed computing systems have a control node (server). Such centralized
distributed systems include desktop grid systems and mobile grid systems. There are various software
packages for organizing such grid systems, for example BOINC [3], HTCondor and Grid Engine.
         BOINC is one of the popular platforms for organizing distributed computing and volunteer
distributed computing. More than 100 projects solving various scientific problems have already been
deployed on the basis of this platform. BOINC consists of a client and a server. A computing
node of the desktop grid (BOINC client) periodically requests new tasks from the server and sends back
the results [3]. The server part is deployed under the Linux operating system and consists of several
separate applications (daemons). The BOINC backend uses a shared MySQL database or a distributed
MariaDB database. These daemons can be run asynchronously. The main BOINC daemons are the work
generator, scheduler, feeder, transitioner, validator, assimilator, file deleter and database purger. If
necessary, each of these daemons can be modified or replaced. For example, the validation algorithm
in the validator daemon can be changed to check the correctness of the obtained results.
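         The client-server interaction cycle described above can be illustrated by the following schematic
sketch in Python. This is not the real BOINC API: the server object with its request_tasks and
report_result methods, and the run_task function, are hypothetical placeholders used only to show the
request-compute-report loop of a computational node.

    import time

    def client_loop(server, run_task, poll_interval=60):
        # Periodically request new tasks from the server, process them locally
        # and send the results back, as a desktop grid client does.
        while True:
            tasks = server.request_tasks()           # ask the scheduler for work
            for task in tasks:
                result = run_task(task)              # local computation on the node
                server.report_result(task, result)   # upload the result to the server
            time.sleep(poll_interval)                # periodic polling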
         Direct data exchange between the nodes of the desktop grid, bypassing the server, is not
provided for. This imposes additional restrictions on the formation of input data for subtasks and on
scheduling the distribution of tasks among computational nodes. It is necessary to adapt deep learning
for computing on a desktop grid system. The separation of one computational learning task into many
autonomous subtasks should occur both across computational nodes (parallel execution) and over
time (sequential execution) [4]. With separation over time, coordination between nodes becomes
possible via the server.
         Deep neural networks can be used not only to determine the presence of objects in images, but
also to classify texture images. When an image contains no objects with clear contours, or the number
of objects is so large that it is difficult and pointless to select each object separately, it is best to use
all the information in the image and treat it as a texture. This approach is often used for images
obtained by optical microscopy techniques.


2. Big datasets
        One of the main tasks in preparing to train deep neural networks is dataset creation. For the
image classification problem, a dataset is a set of images of the same size. In this case, only one class
label can be assigned to each image.
        There are reference datasets with texture images. They can be used to check the adequacy of
the chosen deep neural network architecture. A reference classification problem can be used to test
scheduling systems and the quality of deep learning in a desktop grid system. For example, the
following datasets can be cited: Kylberg Texture Dataset (28 classes; image size: 576x576 pixels) [5],
Brodatz (112 classes; image size: 512x512 pixels), KTH-TIPS2 (11 classes; image size: 200x200
pixels). The image size in these datasets is small and does not exceed 600x600 pixels. This is due to
the size of the input layer of popular deep neural network architectures. Such architectures for solving
texture image classification problems include VGG16, AlexNet, T-CNN [6], InceptionV3 and
ResNet50.
         As a rule, when images are obtained by various microscopy methods, their size is significantly
larger, for example 2272x1704 or 1920x1080 pixels [7]. This means that the original image should be
divided into fragments. At the same time, it should be taken into account that important information
can fall on the border of a fragment. This can be corrected by a splitting method in which fragments
of the original image are taken with an overlap of 30-50% (Figure 1).




                             Figure 1. Dividing the image into fragments

        As a result, the number of images in the dataset increases significantly. In addition, it should
be taken into account that the orientation of images obtained by microscopy methods is not known.
This means that the dataset should include not only fragments of the original images, but also
fragments of images rotated at different angles. As a result, one original image of size 2272x1704
yields 192 fragments with a 50% overlap in each coordinate, and 6972 fragments when rotation angles
from 0 to 355 degrees with a step of 5 degrees are taken into account.
        As a result of such augmentation, a dataset of 4,559,688 fragments is obtained from 654 source
images. The number of rotation angles during augmentation may be smaller, but the size of the dataset
will still be large. On disk, such a dataset occupies about 65 GB.
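        As an illustration, the fragmentation and rotation augmentation described above can be sketched
as follows in Python. The 256x256 fragment size is an assumption made for illustration (with this value
and a 50% overlap stride, a 2272x1704 image yields 16x12 = 192 fragments); the paper does not state
the exact fragment size.

    from PIL import Image

    FRAGMENT = 256              # assumed fragment size in pixels (not stated in the paper)
    STRIDE = FRAGMENT // 2      # 50% overlap in each coordinate
    ANGLES = range(0, 360, 5)   # rotation angles from 0 to 355 degrees with a 5-degree step

    def fragments(img):
        # Yield overlapping square fragments of a PIL image.
        w, h = img.size
        for y in range(0, h - FRAGMENT + 1, STRIDE):
            for x in range(0, w - FRAGMENT + 1, STRIDE):
                yield img.crop((x, y, x + FRAGMENT, y + FRAGMENT))

    def augmented_fragments(path):
        # Yield fragments of the original image and of its rotated copies.
        original = Image.open(path)
        for angle in ANGLES:
            # rotate() keeps the canvas size; border effects of the rotation
            # are ignored in this sketch.
            rotated = original.rotate(angle) if angle else original
            yield from fragments(rotated)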


3. Distributed deep learning
        To train most convolutional neural network architectures, the SGD (Stochastic Gradient
Descent) method and its variations are used, for example ISGD, Adagrad and Adam [4]. Several
approaches are used to distribute the training of a deep neural network. First of all, a distinction is
made between partitioning by model and partitioning by data. This paper assumes that the training
model is the same on all nodes of the desktop grid. A large dataset cannot be transferred to all nodes
of the desktop grid, which makes the data partitioning approach the only possible one. In accordance
with this approach, the results of training local models are aggregated on the server. The process of
training a local model on a computational node consists of sequential iterations. The input data for
each iteration are the current gradients of the aggregated global model on the server. Depending on
how local gradients are aggregated, synchronous and asynchronous approaches to learning are
distinguished.
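        A minimal sketch of one local training iteration on a computational node under this data
partitioning approach might look as follows. The compute_gradients function, which stands for one
pass of SGD over the local dataset, is a hypothetical placeholder.

    import numpy as np

    def local_iteration(local_params, global_grads, local_dataset,
                        compute_gradients, lr=0.01):
        # Bring the local copy of the model up to date with the aggregated
        # global gradients received from the server.
        local_params = local_params - lr * np.asarray(global_grads)
        # Train on the local dataset and accumulate local gradients.
        local_grads = compute_gradients(local_params, local_dataset)
        # The local gradients are sent back to the server for aggregation.
        return local_params, local_grads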
        In the synchronous approach (all-reduce), each iteration ends with collecting all the results of
the local models. In the case of a heterogeneous distributed computing system, fast nodes wait for
slow nodes to finish. After the completion of the iteration, the local results are aggregated and global
gradients are formed. This set of gradients becomes the input data for the next iteration of local model
training. The synchronous approach provides relatively fast learning. The disadvantages of the
synchronous approach include the irrational use of computing resources: low utilization of computing
nodes. This disadvantage can be partially compensated for when results are collected only from a
certain percentage of nodes (for example, 80% [8]).
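        Server-side synchronous aggregation can be sketched as follows. The collect_local_gradients
function, which blocks until the requested number of nodes have reported their results, is a
hypothetical placeholder, and the 80% quorum is the relaxation mentioned above.

    import numpy as np

    def synchronous_step(global_params, collect_local_gradients, n_nodes,
                         quorum=0.8, lr=0.01):
        # Wait until a quorum of nodes (e.g. 80%) have reported the results of
        # their local models for this iteration; fast nodes wait for slow ones.
        needed = int(np.ceil(quorum * n_nodes))
        local_grads = collect_local_gradients(needed)   # list of gradient vectors

        # Aggregate the local results into global gradients (simple averaging).
        global_grads = np.mean(local_grads, axis=0)

        # Update the global model; the aggregated gradients become the input
        # for the next iteration of local training.
        return global_params - lr * global_grads, global_grads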
        In contrast to the synchronous approach, in the asynchronous approach there is no
synchronization when aggregating the results of local models. The gradients of the global model on
the server are updated asynchronously. Because of this, an effect of "aging of gradients" arises, which
significantly slows down the learning process. There are modifications of asynchronous methods that
address this, including gradient sparsification [9].
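        The asynchronous update on the server can be sketched as follows. The staleness discount used
here to damp "aged" gradients is an illustrative choice, not the exact scheme used in the experiments.

    import numpy as np

    def asynchronous_update(global_params, global_version,
                            local_grads, local_version, lr=0.01):
        # How many global updates have happened since the node pulled its copy
        # of the model: the "age" of the received gradients.
        staleness = global_version - local_version
        # Damp stale gradients to reduce the effect of gradient aging.
        effective_lr = lr / (1 + staleness)
        new_params = global_params - effective_lr * np.asarray(local_grads)
        return new_params, global_version + 1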
        Despite significant differences between the synchronous and asynchronous approaches, both of
them can be implemented on a desktop grid. With either approach, the problem of a large dataset
remains. Placing it on the BOINC server would make the server a bottleneck when transferring data
and results. To unload the BOINC server, the dataset should be placed on several FTP servers
(Figure 2). Then each node can download data from one of the FTP servers without consuming the
resources of the main server of the grid system.
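        The download scheme shown in Figure 2 can be sketched as follows: a node picks one of
several FTP mirrors holding the dataset and fetches the fragment files it needs, so the BOINC server
itself never serves the data. The host names and the remote directory are hypothetical.

    import ftplib
    import os
    import random

    FTP_MIRRORS = ["ftp1.example.org", "ftp2.example.org", "ftp3.example.org"]

    def download_fragments(file_names, local_dir="local_dataset"):
        os.makedirs(local_dir, exist_ok=True)
        host = random.choice(FTP_MIRRORS)         # spread the load across mirrors
        with ftplib.FTP(host) as ftp:
            ftp.login()                           # anonymous access assumed
            ftp.cwd("dataset")                    # hypothetical remote directory
            for name in file_names:
                with open(os.path.join(local_dir, name), "wb") as f:
                    ftp.retrbinary("RETR " + name, f.write)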


              [Diagram: FTP servers with the dataset --(images, via the Internet)--> computational node;
               BOINC server <--(tasks and results)--> computational node]

                                       Figure 2. Data exchange schema

         Taking into account the features of the desktop grid system, a logical step was the formation
of a random local dataset for training the neural network at each iteration. The size of the local dataset
can vary, and the presence of a large number of nodes in the desktop grid makes it possible to cover the
entire dataset. However, the dataset consists of fragments of the original images; therefore, a restriction
should be introduced when dividing it into training and test samples: the local test sample must not
contain fragments of images that are currently present in the training sample. The requirement of class
balance for the local dataset was also taken into account.
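        Random formation of a balanced local dataset with the restriction described above can be
sketched as follows. The index argument, a list of (fragment_id, class_label, source_image_id) records
describing the full dataset, is a hypothetical structure used for illustration.

    import random
    from collections import defaultdict

    def make_local_dataset(index, train_per_class, test_per_class, seed=None):
        rng = random.Random(seed)
        by_class = defaultdict(list)
        for record in index:                      # record = (fragment_id, label, source_image)
            by_class[record[1]].append(record)

        train, test = [], []
        for label, records in by_class.items():   # equal counts per class keep the dataset balanced
            rng.shuffle(records)
            train_part = records[:train_per_class]
            train_images = {src for _, _, src in train_part}
            # Fragments of images used in the training sample are excluded
            # from the candidate pool for the local test sample.
            test_pool = [r for r in records[train_per_class:]
                         if r[2] not in train_images]
            train.extend(train_part)
            test.extend(test_pool[:test_per_class])
        return train, test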

4. Results
        Experiments on distributed deep learning for solving the image classification problem were
carried out. Synchronous (Table 1) and asynchronous (Table 2) approaches have been implemented.
The SGD algorithm and its modification DGS (Deep Gradient Compression) were implemented. The
distributed learning implementations differed significantly from one another. Different datasets were
used: for the synchronous approach, a combined dataset was used (12 classes from the Kylberg Texture
Dataset and 3 classes from a self-made large dataset [8]), and for the asynchronous approach, one of
the CIFAR reference datasets was used.







                                                     Table 1. Results with synchronous distributed learning
  Algorithm     Number of       Number of      Size of local      Size of local      Accuracy
                iterations      nodes          train set          test set
  synSGD        18              5              6 000              2 700              0.792

                               Table 2. Results with asynchronous distributed learning (reference dataset)
  Algorithm     Number of       Number of      Size of local      Size of local      Accuracy
                iterations      nodes          train set          test set
  SGD           50              6              50 000             10 000             0.939
  DGS           50              6              50 000             10 000             0.926

        Despite the small number of nodes in the test grid systems, the training results allow us to
conclude that distributed deep learning on desktop grid systems is possible. The overhead of creating
local datasets has been moved to the computational nodes.

5. Conclusion
        The implementation of synchronous and asynchronous approaches to distributed deep learning
showed the possibility of using desktop grid systems to solve the image classification problem. Placing
the large dataset on independent FTP servers relieved the load on the desktop grid server. The proposed
method of forming local datasets will allow a desktop grid with a large number of nodes to cover a
large dataset.

6. Acknowledgement
       The reported study was funded by RFBR according to the research projects No. 18-29-03264
and No. 19-07-00802.
References
[1] Y. LeCun, Y. Bengio, G. Hinton. “Deep learning” // Nature 521.– 2015.– pp. 436–444.
[2] Ian Foster, Carl Kesselman. "The Grid: Blueprint for a New Computing Infrastructure", SF.:
    Morgan Kaufmann Publishers, 1998.
[3] Anderson, D.P. "BOINC: A Platform for Volunteer Computing." // Journal of Grid Computing,
    2020. DOI: 10.1007/s10723-019-09497-9.
[4] Ben-Nun, Tal, and Torsten Hoefler. "Demystifying parallel and distributed deep learning: An in-
    depth concurrency analysis." ACM Computing Surveys (CSUR) 52(4), 2019. pp.1-43.
[5] Kylberg, Gustaf. Kylberg Texture Dataset v. 1.0. Centre for Image Analysis, Swedish University
    of Agricultural Sciences and Uppsala University, 2011.
[6] Andrearczyk, Vincent, and Paul F. Whelan. "Using filter banks in convolutional neural networks
    for texture classification." Pattern Recognition Letters 84 (2016): pp. 63-69.
[7] Kolosova, Olga Yu, et al. "Cryostructuring of polymeric systems. 58. Influence of the H2N-
    (CH2) n-COOH–type amino acid additives on formation, properties, microstructure and drug
    release behaviour of poly (vinyl alcohol) cryogels." Reactive and Functional Polymers, Vol. 167,
    2021. 105010. DOI: 10.1016/j.reactfunctpolym.2021.105010
[8] Kurochkin I.I., Kostylev I.S. Solving the Problem of Texture Images Classification Using
    Synchronous Distributed Deep Learning on Desktop Grid Systems. In: Voevodin V., Sobolev S.
    (eds) Supercomputing. RuSCDays 2020. Communications in Computer and Information Science,
    Vol. 1331. Springer, Cham, 2020. DOI: 10.1007/978-3-030-64616-5_55
[9] Y. Lin, S. Han, H. Mao et al. “Deep Gradient Compression: Reducing the Communication
    Bandwidth for Distributed Training” In ICLR, arXiv:1712.01887, 2018.
