<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IDDM</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Classification of X-Ray Images of the Chest Using Convolutional Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lesia Mochurad</string-name>
          <email>lesia.i.mochurad@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Dereviannyi</string-name>
          <email>andriidereviannyi@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Uliana Antoniv</string-name>
          <email>Uliana.s.antoniv@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Specialized Computer Systems, Lviv Polytechnic National University</institution>
          ,
          <addr-line>Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>4</volume>
      <fpage>19</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>A proven way to detect various injuries, from fractures to heart failure, is the X-ray. However, because this examination method depends on the doctor's visual analysis, it can lead to misdiagnosis: an early stage of pneumonia may go unrecognized, making treatment ineffective. This study proposes using a convolutional neural network to classify chest X-rays to solve this problem. To do this, we analyzed the literature on classification with neural networks across different areas of computer vision; in particular, convolutional neural networks for medical use are considered. An image classification model was trained on a database that includes 112 thousand images of 30 thousand unique patients. High values of accuracy (0.93) and recall (0.99) were obtained. The literature on convolutional neural networks was analyzed, their shortcomings were taken into account, and a new optimization approach is proposed. The classification results of a parallel approach on a GPU were compared with a sequential one on a CPU: based on the proposed algorithm, the model trains 6.13 times faster on the GPU than on the CPU. Keywords: computer vision, image classification model, parallelization, acceleration, GPU, CPU.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        When it comes to identifying images, it is straightforward for us humans to recognize and
distinguish the features of depicted objects. Because our brains constantly and subconsciously
train on similar sets of data, we can easily tell various entities apart. A computer, in
contrast, looks at the world around it differently: to it, an image or video is an array of numerical
values that form the critical aspects of what it tries to recognize. The principle by which the
system interprets a picture is radically different from how people do it. To see, a computer needs
image recognition algorithms that analyze and understand what is happening in an image or a
sequence of images. An excellent example is the identification of pedestrians and vehicles, which is possible due
to the preliminary categorization and sorting of millions of prints – data provided by users [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        Medicine is a clear favorite among areas that require a reliable image identification system while
generating large amounts of data on which recognition can be trained [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, the
biggest challenge in collecting medical data is its practical analysis and processing for further
use [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. There are many methods of organizing the data obtained. We will consider one of them,
namely classification, because it is widely used in the medical field, for example, to detect
diseases on diagnostic images.
      </p>
      <p>
        The classification task is for a computer to analyze an image and assign it to an appropriate class,
usually by attaching a label to the image. For us, classifying images is elementary, but it is
also a great example of the Moravec paradox – what is simple for humans is difficult for artificial
intelligence [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Early image classification depended on raw pixels – the computer split the provided image into
individual pixels. The problem with this approach is that two photos of the same object may look
different: different backgrounds, angles, poses, and many other factors made it difficult for computers
to recognize and categorize images. Deep learning was designed to correct this problem [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. It
involves the use of computer systems known as neural networks.
      </p>
      <p>Unfortunately, the classification task is quite resource-intensive, and with weak
computing power or a sequential algorithm, obtaining the result can take a long time. In such cases,
optimization or the use of better resources is usually considered.</p>
      <p>The aim of this study is to propose a parallel algorithm and analyze the advantages obtained in
solving the problem of classifying X-ray images to detect pneumonia using the CPU and GPU.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <p>
        Image classification often involves the use of convolutional neural networks
(CNNs). They proved their power when AlexNet won the ImageNet competition by a wide margin
over other, more traditional approaches. Since then, the CNN has become one of the most promising
machine learning algorithms and is widely used on large-scale datasets.
However, training deep convolutional neural networks on large datasets is a highly intensive
computational task and requires much time. That is why the algorithm is subjected to
parallelization, which reduces the load on one core by dividing the work between several [
        <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">7-10</xref>
        ]. Due to
this, using the algorithm does not require days or even weeks of training.
      </p>
      <p>
        There are two approaches: model parallelism, where the model is divided between several computing
nodes and trained on the same data, and data parallelism, where the data is distributed across several nodes
and the same model is used for learning. Hybrid approaches using both kinds of parallelism have also been
proposed; examples of hybrid systems are papers [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In hybrid approaches, a small number
of nodes are grouped to train the model using model parallelism, and the dataset is divided into groups
to be processed simultaneously using data parallelism. They use a master-managed model, where the
main task of the server is to update parameters centrally. The disadvantage of this approach is that
because all groups access the same server, which coordinates their interaction, a delay is created that
reduces performance.
      </p>
      <p>
        One of the most popular CNN training algorithms is stochastic gradient descent (SGD), as
demonstrated in article [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The algorithm works iteratively: the model parameters are updated
until they become optimal. Because the data depend on the model parameters between any
two sequential iterations, parallel SGD can suffer from expensive interprocess
communication costs [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Recently, researchers have made many efforts to improve the scaling of
parallel SGD [
        <xref ref-type="bibr" rid="ref15 ref16">15-16</xref>
        ]. Traditional synchronous SGD guarantees optimal parameter updates
at the cost of low parallel efficiency, caused by frequent synchronization. Asynchronous
SGD has been designed to address this performance vulnerability. The approach is quite popular,
as can be seen from [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. However, asynchronous SGD also has disadvantages: it requires
more iterations to reach the same accuracy, so with many processes this increases the training time.
      </p>
      <p>
        In paper [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the classification accuracy of different models on chest images is
investigated, and the authors managed to achieve high accuracy – 96.81%. However, no
studies were conducted on the execution time of such classifications when processing large-scale
input data, and, accordingly, no parallelization was performed.
      </p>
      <p>Another example of an article related to medicine and CNN parallelism is [20], but the
method used in that work is a hybrid combining a CNN and an RNN.</p>
      <p>Therefore, in the analysis of publications, no work was found that demonstrates the
parallelization of a CNN on the CPU and GPU to classify pneumonia.</p>
      <p>Some authors argue that parallelism on large-scale data is harmful and degrades
results; others say it is beneficial [21-23]. The competitiveness of parallelization can be shown
today, as large amounts of data are freely available, allowing freer experiments. Hence,
it is not surprising that the literature remains ambiguous, because all of the above methods have pros
and cons.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and Methods</title>
      <p>Deep convolutional neural networks are used in many tasks, such as image classification and
audio processing [24-27]. Although the CNN rose to prominence in 2012, it is gaining real popularity right
now. The main factors of CNN's success are the availability of large datasets and the high
performance of modern computer systems.</p>
      <p>For high results, this kind of neural network requires large amounts of data. For example, one of the
earliest CNNs, LeNet-5 [28], was taught to recognize handwritten digits using the MNIST dataset,
containing about 60 000 images in the training set and 10 000 images in the test set. The CIFAR-10
dataset consists of 60 000 32x32 color images, including 50 000 training images and 10 000 test
images. For our research, we took the NIH Chest X-rays database, which includes 112 thousand
high-resolution pictures of 30 thousand different patients [29].</p>
      <sec id="sec-3-1">
        <title>3.1. Database description</title>
        <p>The database considered in this paper contains images of chest X-ray examinations, one
of the most common medical examinations. One of the main obstacles to creating large datasets of
X-ray images is the lack of resources to label so many photos. Before this dataset was released, Openi [30] was
the largest publicly available source of chest X-rays, with 4 143 images available.</p>
        <p>The NIH chest X-ray data set consists of 112 120 X-rays with disease labels of 30 805 unique
patients [29].</p>
        <p>Thus, the neural network's task is to divide all images into two classes, pneumonia and healthy,
based on the chest X-ray images (see Figure 1).</p>
        <p>Since this paper will compare the proposed parallel approach with the standard use of CNN, we
first consider the operation of the convolutional neural network algorithm.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Convolutional Neural Network (CNN)</title>
        <p>In densely connected neural networks, neurons are divided into groups that form successive layers, and
each unit is connected to every neuron in the neighboring layers. An example of such a network is
shown below in Figure 2.</p>
        <p>This approach works well for solving a classification problem based on a limited set of defined
characteristics, but the situation becomes complicated when the data to be classified are images. We could
feed the brightness of each pixel as a separate unit to the input of a dense network, but then, for it
to work, it would have to contain tens or even hundreds of millions of neurons. One way to shrink
the network is to reduce the scale of the photo itself, but then we lose information.
Therefore, convolutional neural networks are used to solve this problem.</p>
        <p>To begin the analysis of convolutional networks, we need to describe the data
structure they work with. An image is stored as a 2D matrix whose entries characterize
individual pixels. For the RGB model, the image consists of three such matrices, each
corresponding to a specific channel – red, green, and blue. If the image is black
and white, a single matrix is used. Each value lies in the range from 0 to 255.</p>
        <p>The main element in the operation of a CNN is a small matrix called the filter or kernel. To process
an image, we pass the kernel over the image matrix and transform the image based on the filter values.
The following formula calculates a value of the feature map. The input image is denoted by f and the
kernel by h. The row and column indices of the result matrix are denoted by m and n, respectively:</p>
        <p>G[m, n] = (f ∗ h)[m, n] = ∑_j ∑_k h[j, k] ⋅ f[m − j, n − k].</p>
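        <p>As a sketch, the slide-multiply-sum procedure behind the formula above can be written in a few lines of plain Python. This is an illustrative implementation, not the paper's code; like most CNN frameworks, it computes the cross-correlation variant, i.e., it does not flip the kernel:</p>
        <p>
```python
def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: slide the kernel over the image,
    multiply overlapping entries element-wise, and sum each window.
    (As in most CNN frameworks, the kernel is not flipped.)"""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for m in range(ih - kh + 1):
        for n in range(iw - kw + 1):
            out[m][n] = sum(
                kernel[j][k] * image[m + j][n + k]
                for j in range(kh)
                for k in range(kw)
            )
    return out

# A 3x3 image convolved with a 2x2 kernel yields a 2x2 feature map.
feature_map = conv2d_valid(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]],
    [[1, 0],
     [0, 1]],
)
print(feature_map)  # [[6, 8], [12, 14]]
```
        </p>
        <p>Note how the output is smaller than the input: a valid convolution shrinks the image, which motivates the padding discussed below.</p>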
        <p>After placing the filter over the selected pixels, we take each value, multiply it by the
corresponding value from the kernel, sum all the results, and write the sum in the appropriate place on
the feature map. For example, if the filter lies over pixels containing ones and zeros, each pixel is
multiplied by the corresponding filter value and all the results are added; this result is written into
the output matrix. We then move the kernel and repeat the previous steps, eventually obtaining a matrix
with new data. Because the image shrinks with each subsequent iteration, the number of image convolutions is
limited. Also, following the movement of the filter, pixels located on the outskirts of the image
influence the result much less than those at the center. In this way, part of the information
present in the picture is lost.</p>
        <p>To solve this problem, a frame, mainly filled with zeros, is added around the image matrix. Depending
on whether we use padding or not, we deal with two types of convolution – valid and same. Valid – the original
image is used as is. Same – a frame is added around the original image so that the convolution
outputs a matrix of the same size as the original image. For the second case, the width of the frame must
be equal to
p = (f − 1) / 2,
where p is the padding and f is the size of the filter (usually this value is odd).</p>
        <p>One of the essential hyperparameters of the convolutional layer is the stride – the length of the step by
which the filter moves. For the CNN architecture, this parameter is vital: to overlap
the receptive fields less, or to shrink the spatial dimensions of the feature map, the stride must be
increased. The size of the output matrix is calculated as
n_out = ⌊(n + 2p − f) / s⌋ + 1,
where n is the size of the input, p the padding, f the filter size, and s the stride.</p>
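        <p>As a quick numerical check of the two formulas above (the 'same' padding width and the output size), a few lines of Python suffice. The function names here are ours, chosen for illustration, not taken from any library:</p>
        <p>
```python
def same_padding(f):
    """Frame width p = (f - 1) / 2 for 'same' convolution (odd filter size f)."""
    return (f - 1) // 2

def output_size(n, f, p=0, s=1):
    """Output side length floor((n + 2p - f) / s) + 1 for input size n,
    filter size f, padding p, and stride s."""
    return (n + 2 * p - f) // s + 1

print(same_padding(3))           # 1
print(output_size(5, 3, p=1))    # 5   ('same': output equals input)
print(output_size(500, 3))       # 498 (valid convolution of a 500x500 image)
print(output_size(500, 3, s=2))  # 249
```
        </p>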
        <p>Convolution over volume is an essential concept because it allows working with color images
and applying multiple filters within one layer. Keep in mind that the number of channels in
the filter and in the image must match. To apply several filters to the same image, each filter is
convolved separately, and the results are stacked at the end. The size of the resulting tensor is given by
[n, n, n_c] ∗ [f, f, n_c] = [⌊(n + 2p − f) / s⌋ + 1, ⌊(n + 2p − f) / s⌋ + 1, n_f],
where n is the size of the image, f the size of the filter, n_c the number of channels in the image,
p the padding used, s the stride used, and n_f the number of filters.</p>
        <p>Forward propagation consists of two steps. The first is the calculation of the intermediate values Z[l],
obtained by convolving the input data from the previous layer with the weight tensor W[l] and then
adding the bias b[l]. The second is the application of a nonlinear activation function, denoted by the
letter g, to the intermediate value. The following equations demonstrate these steps:
Z[l] = W[l] ⋅ A[l−1] + b[l],
A[l] = g[l](Z[l]).</p>
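        <p>The two forward-propagation steps can be sketched for a small fully connected layer: a matrix product plus bias, followed by ReLU standing in for the nonlinearity g. All values here are illustrative, not from the paper's model:</p>
        <p>
```python
def relu(x):
    """ReLU activation applied element-wise."""
    return [max(0.0, v) for v in x]

def forward(W, A_prev, b):
    """One layer of forward propagation:
    Z[l] = W[l] . A[l-1] + b[l], then A[l] = g(Z[l]) with g = ReLU."""
    Z = [
        sum(w * a for w, a in zip(row, A_prev)) + bi
        for row, bi in zip(W, b)
    ]
    return relu(Z), Z

W = [[1.0, -1.0],
     [0.5, 0.5]]
b = [0.0, -2.0]
A, Z = forward(W, [2.0, 3.0], b)
print(Z)  # [-1.0, 0.5]
print(A)  # [0.0, 0.5]
```
        </p>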
        <p>Now let us note the vital attributes of convolutional layers. First, the neurons of a
convolutional layer are connected only locally, not to every unit of the previous layer. Second, some
neurons share the same weights. This shows how convolution reduces the number of parameters to be
learned. It is also worth noting that each value of the filter affects every element of the feature
map, which is crucial in backpropagation.</p>
        <p>Consider the operation of the network in backward propagation. This algorithm aims to calculate the
derivatives and then use them to update the parameter values in a process called gradient
descent. We want to assess the impact of parameter changes on the resulting feature map and the final
result.</p>
        <p>The task of backward propagation is to calculate the partial derivatives of the cost function
that will be transferred to the previous layer. At the input we have dA[l] – the derivatives related
to the output of the current layer. The first step is to obtain the derivative with respect to the
intermediate value by applying the derivative of the activation function to the input tensor, g′(Z[l]);
according to the chain rule,
dZ[l] = dA[l] ⊙ g′(Z[l]).
This result is used in the following operations. Then a full convolution is applied, using a kernel
rotated by 180 degrees. As a result, we have
dA[l−1] = dZ[l] ∗ rot180(W[l]),
where W[l] is the filter. The gradient of the filter itself is accumulated as
dW[l][j, k] = ∑_m ∑_n dZ[l][m, n] ⋅ A[l−1][m + j, n + k],
where dZ[l][m, n] is a scalar – the partial derivative obtained for position (m, n) of the output –
and A[l−1] are the activations of the previous layer.</p>
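        <p>The chain-rule step dZ[l] = dA[l] ⊙ g′(Z[l]) can be illustrated for ReLU, whose derivative is 1 where Z &gt; 0 and 0 elsewhere. This is a toy element-wise sketch of that single step, not the full backward pass:</p>
        <p>
```python
def relu_backward(dA, Z):
    """Element-wise chain rule dZ = dA * g'(Z) for g = ReLU:
    the gradient passes through only where the pre-activation was positive."""
    return [da if z > 0 else 0.0 for da, z in zip(dA, Z)]

dZ = relu_backward([0.1, -0.2, 0.3], [2.0, 1.0, -1.0])
print(dZ)  # [0.1, -0.2, 0.0]
```
        </p>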
        <p>In addition to convolutional layers, CNNs very often use so-called pooling layers. They are
mainly used to reduce the size of the tensor and speed up calculations. These layers are simple:
the image is divided into regions, and a specific operation is performed on each of
them. For example, in a max pooling layer, we select the maximum value from each
region and place it in the corresponding position of the output. As in the case of the convolutional layer, we
have two hyperparameters available – filter size and stride. Last but not least, when pooling a
multi-channel image, the pooling for each channel must be done separately.</p>
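        <p>Max pooling as described above can be sketched for a single channel, with a 2x2 window and stride 2 (an illustrative minimal implementation):</p>
        <p>
```python
def max_pool(image, f=2, s=2):
    """Slide an f x f window with stride s and keep the maximum of each region."""
    rows = (len(image) - f) // s + 1
    cols = (len(image[0]) - f) // s + 1
    return [
        [
            max(
                image[r * s + j][c * s + k]
                for j in range(f)
                for k in range(f)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]

pooled = max_pool(
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
)
print(pooled)  # [[6, 8], [14, 16]]
```
        </p>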
        <p>As we can see, CNN learning can be pretty slow due to the number of calculations required for
each iteration. Therefore, to speed up the work, it would be advisable to carry out parallel computing.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Parallelization on the GPU</title>
        <p>In parallel computing, a problem is divided into independent smaller subtasks that run
simultaneously, and the results obtained are recombined or synchronized to form the final
result. The number of tasks into which the computation can be divided depends on the number of
cores contained in the hardware. Unlike CPUs, which process operations
sequentially, computationally heavy tasks on a GPU are distributed among thousands of processors,
allowing calculations to be performed much faster [31].</p>
        <p>Keras and TensorFlow will be used to parallelize the algorithm. These technologies
exploit the parallel processing capabilities of the graphics processor: through CUDA and its C++
software interface, computational tasks can be executed in parallel on the GPU instead of
sequentially on the CPU.</p>
        <p>As mentioned above, a filter matrix of dimension n×n is used in a classical convolutional neural
network; this process can be seen in Figure 3. This matrix is multiplied with patches of the image
pixel matrix. These calculations are performed independently of each other, meaning that no
computation depends on the result of any other. This shows that the convolution operation can
be accelerated using a parallel programming approach and GPUs. This operation takes the most time
in the algorithm and is the most voluminous, because the image size is 500x500.</p>
        <p>We can conclude that it is better to perform these calculations in parallel on graphics processors. To test
this hypothesis, we will conduct experiments and compare the results and training time of the
convolutional neural network run sequentially on the CPU and in parallel on the GPU.</p>
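        <p>For such a comparison, both configurations should be timed with the same helper. The sketch below uses only Python's standard library; the workload is a stand-in – in the actual experiments it would be a Keras/TensorFlow training step:</p>
        <p>
```python
import time

def seconds_per_step(step_fn, steps):
    """Average wall-clock time of one training step over `steps` repetitions."""
    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    return (time.perf_counter() - start) / steps

# Stand-in workload instead of a real CNN training step.
t = seconds_per_step(lambda: sum(i * i for i in range(10_000)), steps=20)
print(f"{t:.6f} sec/step")
```
        </p>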
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>An Intel(R) Core(TM) i7-10750H CPU is used for the numerical experiments.</p>
      <p>After training, the results shown in Table 3 were obtained.</p>
      <p>As shown in Table 3 and Figure 3, the losses in the training and validation samples decrease at
the same rate in both the GPU and CPU versions. This means that according to the first indicator, the
loss function, there is no difference between the models.</p>
      <p>The second indicator that can be used to monitor the change in the model is its accuracy. The task
is to maximize it, ideally bringing it to 1, i.e., 100% accuracy.</p>
      <p>As shown in Table 4 and Figure 6, the accuracy on the training and validation samples increases
at the same rate in both the GPU and CPU versions. This means that according to the second indicator,
accuracy, there is no difference between the models.</p>
      <p>Given the same rate of convergence of the model on both the CPU and GPU, the important indicator
becomes the time spent training one epoch, which equals 263 steps (see Table 5).</p>
      <p>From Table 5 and Figures 7-8, the average time spent training one step is 0.347 sec/step on the GPU
and 2 sec/step on the CPU; the average time spent training one epoch is 91.3 sec/epoch on the GPU and
559.6 sec/epoch on the CPU. This means that the model trains 6.13 times faster on the GPU than on the CPU.</p>
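      <p>The reported speedup follows directly from the per-epoch times quoted in the text; a two-line check:</p>
      <p>
```python
gpu_epoch = 91.3   # sec/epoch on the GPU (from Table 5)
cpu_epoch = 559.6  # sec/epoch on the CPU
speedup = cpu_epoch / gpu_epoch
print(round(speedup, 2))  # 6.13
```
      </p>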
      <p>One of the tasks is to train the model to classify chest X-rays to detect pneumonia. The model was
trained for 10 epochs using the GPU because, as shown above, it gives the same result 6.13 times faster.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion of results</title>
      <p>The results presented in the previous section showed the advantage of parallel computing on the
GPU over using a sequential algorithm on the CPU. Training on the GPU took an average of 91.3
sec/epoch versus 559.6 sec/epoch on the CPU, i.e., each epoch ran 6.13 times faster than the same
calculations on the CPU. At the same time, the losses in the training and
validation samples decrease at the same rate, and the accuracy in both cases increases at the same
rate regardless of which processor was used; that is, the model was trained equally well on both the CPU
and the GPU.</p>
      <p>It can also be concluded that the time required to move data from the CPU to the GPU and to
recombine or synchronize the results of the parallel computations is not a significant threat to
the speed of the program when processing large amounts of data: it is fully compensated by how much
faster the computations themselves run on the GPU than on the CPU.</p>
      <p>Figure 9 shows the classification results for some cases taken from the dataset. Regarding the
model itself, it is worth noting once again its high metrics of accuracy, precision,
recall, and the weighted average of precision and recall (F1 score). As already
mentioned, the use of classification in medicine requires the highest possible indicators. We
can assume that our model meets this need, showing an 18% error in the absence of pneumonia and
only 0.77% when lung damage was actually present on the X-ray.</p>
      <p>However, there is room for improvement. In particular, the next step may be to reduce the error in
the absence of pneumonia, which can be done by enlarging the dataset or building a more complex
model.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>By training a convolutional neural network to classify chest X-ray images on the CPU and on the
GPU, it was shown that training on the GPU is faster. In this particular case, training on the
NVIDIA GeForce GTX 1650 Ti GPU was 6.13 times faster than on the Intel(R) Core(TM) i7-10750H CPU
alone. While setting up a GPU requires extra time, the reduction in the training time of a
convolutional neural network becomes very significant in real learning scenarios, where training a
single neural network can take days: training that would last a week on the CPU is reduced to about
one day on the GPU.</p>
      <p>A convolutional neural network was constructed to classify chest X-rays to detect pneumonia, and
an accuracy of 0.93 was obtained, with a recall of 0.99. Only three patients with pneumonia were
misclassified, which matters in the medical field because it is better to misclassify a healthy
patient and pay extra attention to him than to miss a patient who may develop complications.</p>
      <p>[20]. Yao, Hongdou et al. Parallel Structure Deep Neural Network Using CNN and RNN with an Attention Mechanism for Breast Cancer Histology Image Classification. Cancers, vol. 11, no. 12, 1901, (2019), doi.org/10.3390/cancers11121901.</p>
      <p>[21]. Elnashar, Alaa. To Parallelize or Not to Parallelize, Speed Up Issue. International Journal of Distributed and Parallel Systems (IJDPS), vol. 2, no. 2, pp. 14-28, (2011).</p>
      <p>[22]. Parallel computing and its advantage and disadvantage, (2018). URL: https://www.geekboots.com/story/parallel-computing-and-its-advantage-and-disadvantage.</p>
      <p>[23]. Martinovic, Goran, Krpic, Zdravko, Rimac-Drlje, Snjezana. Parallelization Programming Techniques: Benefits and Drawbacks. (2010).</p>
      <p>[24]. Lee, S., Agrawal, A., Balaprakash, P., Choudhary, A., Liao, W. K. Communication-Efficient Parallelization Strategy for Deep Convolutional Neural Network Training. In Proceedings of MLHPC 2018: Machine Learning in HPC Environments, held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 47-56, (2019), doi.org/10.1109/MLHPC.2018.8638635.</p>
      <p>[25]. Bird, Jordan, Faria, Diego, Manso, Luis, Ayrosa, Pedro, Ekárt, A. A study on CNN image classification of EEG Signals represented in 2D and 3D. Journal of Neural Engineering, 18(2), (2021), doi.org/10.1088/1741-2552/abda0c.</p>
      <p>[26]. Sharma, Atul, Phonsa, Gurbakash. Image Classification Using CNN. SSRN Electronic Journal. Proceedings of the International Conference on Innovative Computing &amp; Communication (ICICC) 2021, 5 p., (2021).</p>
      <p>[27]. Palanisamy, Kamalesh, Singhania, Dipika, Yao, Angela. Rethinking CNN Models for Audio Classification. 8 p., (2020), arXiv:2007.11154v2 [cs.CV].</p>
      <p>[28]. LeCun, Yann, Bottou, Leon, Bengio, Yoshua, Haffner, Patrick. Gradient-Based Learning Applied to Document Recognition. Proc. of the IEEE, pp. 1-46, (1998).</p>
      <p>[29]. NIH Chest X-ray Dataset. URL: https://www.kaggle.com/nih-chest-xrays/data.</p>
      <p>[30]. Openi, chest X-ray collection. URL: https://openi.nlm.nih.gov/.</p>
      <p>[31]. Mochurad, L.I. Optimization of numerical solution of model problems on the basis of parallel calculations. Chapter 1: monograph. Lviv: PE “BONA Publishing House”, 208 p., (2021).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]. Klette, Reinhard.
          <source>Concise Computer Vision. An Introduction into Theory and Algorithms</source>
          . Springer, London, 429 p., (
          <year>2014</year>
          ), doi.org/10.1007/978-1-
          <fpage>4471</fpage>
          -6320-6.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]. Krishna, Srinivasan, Karthik, Raman, Jiecao, Chen, Michael, Bendersky, Marc, Najork. WIT:
          <article-title>Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning</article-title>
          .
          <source>SIGIR Resource Track, Virtual. arXiv:2103</source>
          .
          <year>01913</year>
          ,
          <volume>16</volume>
          p., (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]. Borad, Anand. Healthcare and
          <article-title>Machine Learning: The Future with Possibilities</article-title>
          . E Infochips : URL: https://www.einfochips.com/blog/healthcare-and
          <article-title>-machine-learning-the-future-withpossibilities/?utm_source=EIBlog&amp;utm_medium=BlogPostShubham&amp;utm_campaign=relatedblog.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]. Mochurad, Lesia, Yatskiv, Mariia.
          <article-title>Simulation of a Human Operator's Response to Stressors under Production Conditions</article-title>
          .
          <source>Proceedings of the 3rd International Conference on Informatics &amp; Data-Driven Medicine. Växjö, Sweden, November 19 - 21</source>
          , pp.
          <fpage>156</fpage>
          -
          <lpage>169</lpage>
          , (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>[5]. Moravec's paradox // Wikipedia : website</article-title>
          . URL: https://en.wikipedia.org/wiki/Moravec%27s_paradox.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>[6]. What is image classification in deep learning? // ThinkAutomation : website</article-title>
          . URL: https://www.thinkautomation.com/eli5/eli5
          <article-title>-what-is-image-classification-in-deep-learning/.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]. Mochurad, L., Shakhovska, K., Montenegro, S. Parallel Solving of Fredholm Integral Equations of the First Kind by Tikhonov Regularization Method Using OpenMP Technology. In: Shakhovska N., Medykovskyy M. (eds) Advances in Intelligent Systems and Computing IV. CCSIT 2019. Advances in Intelligent Systems and Computing, vol 1080. Springer, Cham, pp. 25-35, (2020), doi.org/10.1007/978-3-030-33695-0_3
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8].
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          .
          <article-title>Parallelizing convolutional neural network for the handwriting recognition problems with different architectures</article-title>
          .
          <source>2017 International Conference on Progress in Informatics and Computing (PIC)</source>
          , pp.
          <fpage>71</fpage>
          -
          <lpage>76</lpage>
          , (
          <year>2017</year>
          ), doi.org/10.1109/PIC.2017.8359517.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9].
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jha</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choudhary</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          .
          <article-title>Parallel Deep Convolutional Neural Network Training by Exploiting the Overlapping of Computation and Communication</article-title>
          .
          <source>2017 IEEE 24th International Conference on High Performance Computing (HiPC)</source>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>192</lpage>
          , (
          <year>2017</year>
          ), doi.org/10.1109/HiPC.2017.00030.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]. Shallue, Christopher J., Lee, Jaehoon, Antognini, Joseph, Sohl-Dickstein, Jascha, Frostig, Roy, Dahl, George E
          .
          <article-title>Measuring the Effects of Data Parallelism on Neural Network Training</article-title>
          .
          <volume>20</volume>
          (
          <issue>112</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>49</lpage>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11].
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.
          <article-title>Large scale distributed deep networks</article-title>
          . In
          <source>Advances in neural information processing systems</source>
          , pp.
          <fpage>1223</fpage>
          -
          <lpage>1231</lpage>
          , (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12].
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Avancha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mudigere</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaidynathan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sridharan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalamkar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaul</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Dubey</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Distributed deep learning using synchronous stochastic gradient descent</article-title>
          .
          <source>arXiv preprint arXiv:1602.06709</source>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]. Robbins, Herbert,
          <string-name>
            <surname>Monro</surname>
            ,
            <given-names>Sutton</given-names>
          </string-name>
          .
          <article-title>A stochastic approximation method</article-title>
          .
          <source>The Annals of Mathematical Statistics</source>
          , Vol.
          <volume>22</volume>
          , No.
          <issue>3</issue>
          . (Sep.,
          <year>1951</year>
          ), pp.
          <fpage>400</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]. Lee, Sunwoo, Agrawal, Ankit, Balaprakash, Prasanna, Choudhary, Alok, Liao, Wei-keng.
          <article-title>Communication-Efficient Parallelization Strategy for Deep Convolutional Neural Network Training</article-title>
          .
          <source>2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC)</source>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]. Li, Xiangrui, Pan, Deng, Li, Xin, Zhu, Dongxiao.
          <article-title>Improve SGD Training via Aligning Mini-batches</article-title>
          .
          <source>arXiv preprint arXiv:2002.09917</source>
          , 10 p., (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]. Chen, Yiming, Yuan, Kun, Zhang, Yingya, Pan, Pan.
          <article-title>Accelerating Gossip SGD with Periodic Global Averaging</article-title>
          .
          <source>Proceedings of the 38th International Conference on Machine Learning</source>
          , PMLR
          <volume>139</volume>
          :
          <fpage>1791</fpage>
          -
          <lpage>1802</lpage>
          , (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17].
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Shen-Yi</given-names>
          </string-name>
          and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Wu-Jun</given-names>
          </string-name>
          .
          <article-title>Fast Asynchronous Parallel Stochastic Gradient Descent</article-title>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          , (
          <year>2015</year>
          ),
          <source>arXiv:1508.05711v1 [stat.ML] 24 Aug 2015</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18].
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>S.-Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.-J.</given-names>
          </string-name>
          .
          <article-title>Fast asynchronous parallel stochastic gradient descent: A lock-free approach with convergence guarantee</article-title>
          .
          <source>In AAAI</source>
          , pp.
          <fpage>2379</fpage>
          -
          <lpage>2385</lpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]. Moujahid, Hicham, Cherradi, Bouchaib, el Gannour, Oussama, Bahatti, Lhoussain, Terrada, Oumaima, Hamida, Soufiane.
          <article-title>Convolutional Neural Network Based Classification of Patients with Pneumonia using X-ray Lung Images</article-title>
          .
          <source>Adv. Sci. Technol. Eng. Syst. J.</source>
          <volume>5</volume>
          (
          <issue>5</issue>
          ),
          <fpage>167</fpage>
          -
          <lpage>175</lpage>
          (
          <year>2020</year>
          ), doi.org/10.25046/aj050522.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>