<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>X (O. Smirnov);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Intellectual Classification Method of Gymnastic Elements Based on a Combination of Descriptive and Generative Approaches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksii Smirnov</string-name>
          <email>Dr.smirnovoa@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eugene Fedorov</string-name>
          <email>fedorovee75@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiia Neskorodieva</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tetiana Neskorodieva</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Central Ukrainian National Technical University</institution>
          ,
          <addr-line>avenue University, 8, Kropivnitskiy, 25006</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Cherkasy State Technological University</institution>
          ,
          <addr-line>Cherkasy, Shevchenko blvd., 460, 18006</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Uman National University of Horticulture</institution>
          ,
          <addr-line>1 Instituska st., Uman, Cherkassy region, 20305</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Vasyl' Stus Donetsk National University</institution>
          ,
          <addr-line>600-richcha str., 21, Vinnytsia, 21021</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>The paper proposes a method for the intellectual classification of gymnastic elements using a combination of descriptive and generative approaches. The created method has the following advantages: the input image need not be square, which expands the scope of application; the number of “convolutional layer - downsampling layer” pairs is determined empirically, which increases the classification accuracy of the model; the layer quantity is determined automatically, which speeds up the determination of the model structure; the use of a neural network makes it possible to label frames of gymnastic elements, and the use of a generative approach makes it possible to analyze the resulting sequence of labeled frames of gymnastic elements effectively. The proposed method for the intellectual classification of gymnastic elements can be used in various intelligent visual image recognition systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Intelligent classification</kwd>
        <kwd>gymnastic elements</kwd>
        <kwd>descriptive approach</kwd>
        <kwd>generative approach</kwd>
        <kwd>MLP neural network</kwd>
        <kwd>2D neural network LeNet</kwd>
        <kwd>Adam algorithm</kwd>
        <kwd>Viterbi algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Among the disadvantages of the generative (hidden Markov model) approach is the complexity of identifying the structure of the hidden Markov model (the number of states and the size of the mixture for each state).</p>
      <p>
        The second approach to intelligent image classification was the descriptive approach [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ],
and deep neural networks began to be used to increase recognition accuracy [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
      <p>
        The LeNet-5 neural network [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] has the simplest architecture and uses two pairs of convolutional and downsampling layers, as well as two fully connected layers. The convolutional layer reduces the shift sensitivity of image elements. A downsampling layer reduces the dimensionality of an image. Currently, a combination of LeNet-5 (for feature extraction) and Long Short-Term Memory (LSTM) (for classification) is popular [11, 12].
      </p>
      <sec id="sec-2-1">
        <p>
          Neural networks of the DarkNet family [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], the AlexNet family [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and the VGG (Visual Geometry Group) family [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
          ] are modifications of LeNet. These neural networks can have several consecutive convolutional layers.
        </p>
      </sec>
      <sec id="sec-2-2">
        <p>
          The ResNet family [
          <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
          ] uses a Residual block, which contains two consecutive convolutional layers. The output signals of the planes of the layer preceding this block are added to the output signals of the planes of the second convolutional layer of this block. A combination of ResNet (for feature extraction) and support vector machines (SVM) (for classification) is currently popular [18].
        </p>
        <p>
          The DenseNet (Dense Convolutional Network) neural network [
          <xref ref-type="bibr" rid="ref16 ref19">16, 19</xref>
          ] uses a dense block, which contains a set of Residual blocks. The output signals of the planes of the second convolutional layer of the current Residual block of a dense block are concatenated with the output signals of the planes of the second convolutional layers of all previous Residual blocks of this dense block and with the output signals of the planes of the layer preceding this dense block. In addition, a reduction of the planes of the convolutional layers located between dense blocks (usually by a factor of two) is used.
        </p>
        <p>The GoogLeNet (Inception V1) neural network [20] uses an Inception block that contains parallel convolutional layers with connection regions of different sizes and one downsampling layer. The output signals of the planes of these parallel layers are concatenated. To reduce the number of operations, convolutional layers with a unit connection region are connected in sequence with these parallel layers (in the case of convolutional layers, such a convolutional layer is placed before them, and in the case of a downsampling layer, such a convolutional layer is placed after it). A combination of ResNet (for feature extraction) and support vector machines (SVM) (for classification) [18] has been used for diagnosis from CXR images and provided a diagnostic probability close to 100%.</p>
        <p>
          The Inception-v3 neural network [
          <xref ref-type="bibr" rid="ref16 ref17 ref21">16, 17, 21</xref>
          ] is a modification of GoogLeNet, and its Inception and Reduction blocks are modifications of the Inception block of the GoogLeNet neural network.
        </p>
        <p>
          The Inception-ResNet-v2 neural network [
          <xref ref-type="bibr" rid="ref16 ref17 ref22">16, 17, 22</xref>
          ] is a modification of GoogLeNet and ResNet: its Inception block is a modification of the Residual and Inception blocks, and its Reduction block is a modification of the Inception block.
        </p>
        <p>
          The Xception neural network [
          <xref ref-type="bibr" rid="ref16 ref23">16, 23</xref>
          ] uses a Depthwise separable convolution block, which performs first a pointwise convolution and then a depthwise convolution. For both convolutions, a ReLU activation function is typically used.
        </p>
        <p>The MobileNet neural network [24, 25] uses a Depthwise separable convolution block, which performs first a depthwise convolution and then a pointwise convolution. For both convolutions, a linear activation function is typically used.</p>
        <p>
          The MobileNetV2 neural network [
          <xref ref-type="bibr" rid="ref16 ref26">16, 26</xref>
          ] uses an Inverted Residual block, which first performs a pointwise convolution, then a depthwise convolution, and then a pointwise convolution again. For these convolutions, the SiLU activation function is typically used.
        </p>
        <p>The MobileNetV3 neural network [27, 28, 29] uses a Squeeze-and-Excitation block in some Inverted Residual blocks.</p>
      </sec>
      <sec id="sec-2-3">
        <p>Deep neural networks have one or more of the following disadvantages:
• insufficiently high classification accuracy;
• insufficiently high speed of parameter identification;
• complexity of identifying the structure of a neural network (the number and size of the layers of each type).</p>
        <p>To increase the speed of identification of the parameters of deep neural network models, parallel algorithms are used [27, 30]. In connection with this, the problem of creating an effective intellectual classification of gymnastic elements is urgent. The goal of the work is to increase the efficiency of the intellectual classification of gymnastic elements using a combination of descriptive and generative approaches.</p>
      </sec>
      <sec id="sec-2-4">
        <title>To achieve this goal, it is necessary to solve the following tasks:</title>
        <p>1. Create the structure of a method for the intellectual classification of gymnastic elements, which combines the descriptive and generative approaches.
2. Develop a one-dimensional neural network model for classifying frames of gymnastic elements.
3. Create a model of a two-dimensional neural network for classifying frames of gymnastic elements.
4. Develop a method for identifying the parameters of a neural network model.
5. Create a method for classifying the sequence of frames of gymnastic elements.
6. Select quality criteria for the method of intellectual classification of gymnastic elements.
7. Conduct a numerical study of the proposed method for the intelligent classification of gymnastic elements.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods and Materials</title>
      <p>3.1. Structure of the intellectual classification method of gymnastic elements based on a combination of descriptive and generative approaches</p>
      <p>In the proposed method, the outputs of the neural network are treated as the probabilities of the appearance of the observation symbol (the t-th frame of the gymnastic element) in the j-th state (gymnastic pose), i.e. at the j-th output of the neural network. The Viterbi dynamic programming method is applied to a labeled sequence of gymnastic element frames. On the other hand, the parameters of a neural network can be identified based on a sequence of frames labeled by the Viterbi method. This combination provides classification probabilities comparable to those of DTW and of discrete and semi-continuous Hidden Markov Models (HMMs), and does not require a separate neural network for each gymnastic element, as those methods do.</p>
      <sec id="sec-3-1">
        <title>Main stages of the proposed method:</title>
        <p>1. To initially identify the parameters of the neural network, manually labeled frames of gymnastic elements from the database [31] are used. Based on the labeled frames of the standard database, the following are calculated for future use in the Viterbi method:
the a priori probability P(s_j) in the form P(s_j) = n_j / n, where n_j is the number of frames marked with state s_j in the entire set of training data of the standard database and n is the number of all frames in the entire set of training data of the standard database;
the probability of the initial state, which for the Bakis HMM (an HMM with limited transitions) is determined by the formula π̃_j = 1 for j = 1 and π̃_j = 0 for j &gt; 1;
the probability of transitions between states in the form a_ij = n_ij / n_i, where n_ij is the number of transitions from state s_i to state s_j across the entire set of training data of the standard database and n_i is the number of all transitions from state s_i across the entire set of training data of the standard database.</p>
        <p>2. Frames of gymnastic elements are recognized using a neural network model, i.e. segmentation is performed.</p>
        <p>3. A modified Viterbi algorithm is used, which optimizes the segmentation (sequence of states).</p>
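        <p>As an illustration of stage 1, the counts above can be turned into the Viterbi-method parameters as follows (a minimal sketch, not the authors’ code; the function name and the toy label sequences are assumptions):</p>
        <preformat>```python
# Illustrative estimation of the stage-1 quantities from manually labeled
# frame sequences (state labels 0..K-1). `sequences` is a toy assumption.
import numpy as np

def estimate_hmm_params(sequences, K):
    n_j = np.zeros(K)                 # frames marked with state j
    n_ij = np.zeros((K, K))           # transitions i -> j
    for seq in sequences:
        for t, s in enumerate(seq):
            n_j[s] += 1.0
            if t + 1 != len(seq):
                n_ij[s][seq[t + 1]] += 1.0
    prior = n_j / n_j.sum()           # P(s_j) = n_j / n
    row = np.maximum(n_ij.sum(axis=1, keepdims=True), 1.0)
    trans = n_ij / row                # a_ij = n_ij / n_i
    pi = np.zeros(K)                  # Bakis HMM starts in state 1
    pi[0] = 1.0
    return pi, trans, prior

pi, A, P = estimate_hmm_params([[0, 0, 1, 2], [0, 1, 1, 2]], K=3)
```</preformat>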
        <p>For this algorithm, the probability distribution of the occurrence of the observation symbol o_t (the t-th frame) in the j-th state is pre-calculated according to Bayes' rule as an emission probability p(o_t | s_j) = p(s_j | o_t) P(o_t) / P(s_j), where the posterior probability p(s_j | o_t) is the output of the j-th neuron of the neural network model, and the prior probability P(o_t) is fixed and can be omitted.</p>
        <p>4. The parameters of the neural network model are identified using frame markers of gymnastic elements (the segmentation result) obtained using the modified Viterbi algorithm.</p>
        <p>5. For a given subject area, frames of gymnastic elements are recognized using a neural network model.</p>
        <p>6. If the recognition error of the neural network exceeds the threshold, then go to step 3.</p>
        <p>Next, we consider the models of neural networks that mark frames of gymnastic elements.</p>
        <p>3.2. One-dimensional neural network for classifying frames of gymnastic elements based on a multilayer perceptron</p>
        <p>Figure 1 shows a one-dimensional neural network for classification based on a multilayer perceptron (MLP), which is a non-recurrent static multilayer neural network containing two hidden layers and an output layer. The classes are separated by hyperplanes.</p>
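        <p>The Bayes-rule conversion used in step 3 can be sketched as follows (an illustrative fragment; the array shapes and names are assumptions):</p>
        <preformat>```python
# Illustrative conversion of network posteriors to emission scores via
# Bayes' rule: p(o_t | s_j) is proportional to p(s_j | o_t) / P(s_j),
# with the fixed P(o_t) omitted. Array shapes are assumptions.
import numpy as np

def log_emissions(posteriors, prior, eps=1e-10):
    # posteriors: (T, K) softmax outputs of the network per frame
    # prior: (K,) a priori probabilities P(s_j) from the labeled data
    return np.log(posteriors + eps) - np.log(prior + eps)

post = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
logb = log_emissions(post, np.array([0.5, 0.3, 0.2]))
```</preformat>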
        <p>For the MLP, learning based on error correction (supervised learning) is used in batch mode; in this work, the Adam algorithm was used.</p>
        <p>The MLP model is defined as
u_j^(l) = f^(l)(h_j^(l)), h_j^(l) = b_j^(l) + Σ_{i=1..N^(l−1)} w_ij^(l) u_i^(l−1), j ∈ 1..N^(l), l ∈ 1..L, u_i^(0) = x_i,
where N^(l) is the number of neurons in the l-th layer,
l is the layer number,
L is the number of layers,
b_j^(l) is the threshold of the j-th neuron in the l-th layer,
w_ij^(l) is the connection weight from the i-th neuron of layer l−1 to the j-th neuron of the l-th layer,
u_j^(l) is the output of the j-th neuron of the l-th layer,
f^(l) is the activation function of the neurons of the l-th layer.</p>
        <p>ReLU was used as f^(l) for the hidden layers, and softmax was used as f^(L) for the output layer.</p>
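        <p>The forward pass defined above can be sketched in a few lines (an illustrative sketch; the layer sizes are assumptions):</p>
        <preformat>```python
# Illustrative sketch of the MLP forward pass: ReLU in the hidden layers,
# softmax at the output. Layer sizes here are assumptions.
import numpy as np

def relu(h):
    return np.maximum(h, 0.0)

def softmax(h):
    e = np.exp(h - h.max())
    return e / e.sum()

def mlp_forward(x, layers):
    # layers: list of (W, b) pairs; W has shape (n_in, n_out)
    u = x                                    # u^(0) = x
    for i, (W, b) in enumerate(layers):
        h = u @ W + b                        # h_j = b_j + sum_i w_ij u_i
        u = softmax(h) if i == len(layers) - 1 else relu(h)
    return u

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 8)), np.zeros(8)),   # hidden layer 1
          (rng.normal(size=(8, 8)), np.zeros(8)),   # hidden layer 2
          (rng.normal(size=(8, 3)), np.zeros(3))]   # output layer
y = mlp_forward(rng.normal(size=4), layers)
```</preformat>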
        <p>3.3. Two-dimensional neural network for classifying frames of gymnastic elements based on 2D LeNet</p>
        <p>Figure 2 shows a two-dimensional neural network for classification based on 2D LeNet, which is a non-recurrent dynamic neural network and has a hierarchical structure.</p>
        <p>2D LeNet is a special class of multilayer perceptron. It is formed by an input layer, which consists of a single receptor plane; alternating convolutional layers (corresponding to neocognitron S-layers) and downsampling (pooling) layers (corresponding to neocognitron C-layers); a sequence of fully connected layers (hidden MLP layers); and an output layer. The convolutional layer consists of convolutional planes, and the downsampling layer consists of downsampling planes. Each convolutional plane consists of convolutional cells, and each downsampling plane consists of downsampling cells. The convolutional layer reduces the shift sensitivity of image elements. A downsampling layer reduces the dimensionality of an image. A connection region of a cell plane of the previous layer is associated with a cell of a cell plane of the current layer. Geometrically, the connection region is usually a square, and for all planes of one layer it has the same size. All cells of the same cell plane of the current layer associated with the connection regions of a cell plane of the previous layer have the same weights. The connection regions of the cell planes of the downsampling layer overlap; because of this, one cell in a cell plane of the downsampling layer entering different overlapping connection regions can activate multiple cells in a cell plane of the convolutional layer. The connection region for 2D LeNet does not go beyond the boundaries of the plane, so the size of the convolutional layers gradually decreases.</p>
        <p>For this neural network model, learning based on error correction (supervised learning) is used in batch mode; in this work, the Adam algorithm was used.</p>
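        <p>A 2D LeNet-style model of the kind described above can be sketched in the TensorFlow/Keras API mentioned in the experiment section (the input size, plane counts and number of classes K below are illustrative assumptions, not the authors’ exact configuration; note that the input need not be square):</p>
        <preformat>```python
# Illustrative 2D LeNet-style model; sizes are assumptions, not the
# authors' exact configuration. The input is deliberately non-square.
import tensorflow as tf

K = 4  # assumed number of classes (output neurons)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(120, 160, 3)),        # non-square RGB input
    tf.keras.layers.Conv2D(6, 5, activation="relu"),   # convolutional C1
    tf.keras.layers.AveragePooling2D(2),               # downsampling S1
    tf.keras.layers.Conv2D(16, 5, activation="relu"),  # convolutional C2
    tf.keras.layers.AveragePooling2D(2),               # downsampling S2
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="relu"),     # fully connected D1
    tf.keras.layers.Dense(84, activation="relu"),      # fully connected D2
    tf.keras.layers.Dense(K, activation="softmax"),    # output layer O
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
```</preformat>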
        <p>3.2.1. Neural network model
Let v be a position in the connection region, v = (v_1, v_2); K_I the number of cell planes in the input layer I (for RGB images, 3); K_S_l the number of cell planes in the downsampling layer S_l; K_C_l the number of cell planes in the convolutional layer C_l; A_l the connection region of a plane of layer C_l; L̑ the number of convolutional (or downsampling) layers; and L̆ the number of fully connected layers.</p>
        <p>1. l = 1.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2. Calculate the output signal for the convolutional layer</title>
        <p>u_C_l(m, i) = f_C_l(h_C_l(m, i)), m ∈ {1, ..., N_C_l}², i ∈ 1..K_C_l,
h_C_1(m, i) = b_C_1(i) + Σ_{k=1..K_I} Σ_{v∈A_1} w_C_1(v, k, i) x(m + v, k) for l = 1,
h_C_l(m, i) = b_C_l(i) + Σ_{k=1..K_S_(l−1)} Σ_{v∈A_(l−1)} w_C_l(v, k, i) u_S_(l−1)(m + v, k) for l &gt; 1,
where w_C_1(v, k, i) is the weight of the connection from the v-th position in the connection area of the k-th plane of cells of the input layer I to the i-th plane of cells of the convolutional layer C_1,
w_C_l(v, k, i) is the weight of the connection from the v-th position in the connection area of the k-th plane of cells of the downsampling layer S_(l−1) to the i-th plane of cells of the convolutional layer C_l,
u_C_l(m, i) is the output of the cell in the m-th position in the i-th plane of cells of the convolutional layer C_l,
f_C_l is the activation function of the neurons of the convolutional layer C_l.</p>
        <p>3. Calculate the output signal for the downsampling layer (halving the scale):
u_S_l(m, i) = w_S_l(i) (1/4) Σ_{v∈{1,2}²} u_C_l(2(m − 1) + v, i), m ∈ {1, ..., N_S_l}², i ∈ 1..K_S_l,
where w_S_l(i) is the connection weight from the i-th plane of cells of the convolutional layer C_l to the i-th plane of cells of the downsampling layer S_l,
u_S_l(m, i) is the output of the cell in the m-th position in the i-th plane of cells of the downsampling layer S_l.</p>
        <p>4. If l ≤ L̑, then l = l + 1; go to step 2.</p>
        <p>5. Output calculation for a fully connected layer:
u_D_l(j) = f_D_l(h_D_l(j)), j ∈ 1..N_D_l, l ∈ 1..L̆,
h_D_1(j) = b_D_1(j) + Σ_{k=1..K_S_L̑} Σ_{v∈{1,...,N_S_L̑}²} w_D_1(v, k, j) u_S_L̑(v, k) for l = 1,
h_D_l(j) = b_D_l(j) + Σ_{z=1..N_D_(l−1)} w_D_l(z, j) u_D_(l−1)(z) for l &gt; 1,
where w_D_1(v, k, j) is the weight of the connection from the v-th position in the connection area of the k-th plane of cells of the downsampling layer S_L̑ to the j-th neuron of the first fully connected layer D_1,
w_D_l(z, j) is the connection weight from the z-th neuron of the fully connected layer D_(l−1) to the j-th neuron of the l-th fully connected layer D_l,
u_D_l(j) is the output of the j-th neuron of the fully connected layer D_l,
f_D_l is the activation function of the neurons of the fully connected layer D_l.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Output calculation for output layer</title>
        <p>u_O(j) = f_O(h_O(j)), j ∈ 1..N_O, h_O(j) = b_O(j) + Σ_{z=1..N_D_L̆} w_O(z, j) u_D_L̆(z),
where w_O(z, j) is the weight of the connection from the z-th neuron of the fully connected layer D_L̆ to the j-th neuron of the output layer O,
u_O(j) is the output of the j-th neuron of the output layer O,
f_O is the activation function of the neurons of the output layer O.</p>
        <p>ReLU was used as f_C_l and f_D_l, and softmax was used as f_O.</p>
        <p>3.2.2. Method for identifying the parameters of a neural network model based on the Adam algorithm</p>
        <p>Step 1. Initialization.</p>
        <p>Step 1.1. The initial vector of weights w(0) is specified.
Step 1.2. The initial vector of the first moments is specified: m(−1) = 0.
Step 1.3. The initial vector of the second moments is specified: v(−1) = 0.
Step 1.4. The parameter α determining the learning rate is set (usually α = 0.001), the decay rates of the first and second moments β_1 and β_2 are set, β_1, β_2 ∈ [0, 1) (usually β_1 = 0.9 and β_2 = 0.999), and the stability parameter ε preventing division by zero is set (usually ε = 10^−8).
Step 1.5. The initial gradient g(0) is calculated.
Step 1.6. n = 0.</p>
      </sec>
      <sec id="sec-3-4">
        <title>The vector of first moments is calculated based on the exponential moving average</title>
        <p>m(n) = β_1 m(n − 1) + (1 − β_1) g(n),
v(n) = β_2 v(n − 1) + (1 − β_2) g²(n).</p>
      </sec>
      <sec id="sec-3-5">
        <title>The vector of second moments is calculated based on the exponential moving average step 4. The vector of weights is calculated (the moments are corrected due to their initialization to zero and the learning step is scaled)</title>
          <p>m̑(n) = m(n)/(1 − β_1^(n+1)), v̑(n) = v(n)/(1 − β_2^(n+1)), w(n + 1) = w(n) − α m̑(n)/(√v̑(n) + ε).</p>
          <p>3.2.3. Method for classifying a sequence of frames of gymnastic elements based on the Viterbi algorithm</p>
          <p>To avoid numerous multiplications during the operation of the Viterbi algorithm, one can logarithmize all the parameters of the model and move from multiplications to additions, since addition is much simpler to implement and faster to compute. The modified Viterbi algorithm is described as follows:</p>
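          <p>Steps 1–4 of the Adam identification method can be sketched as follows (a toy illustration on minimizing w², not the authors’ code):</p>
          <preformat>```python
# Illustrative Adam update (steps 2-4) applied to the toy loss f(w) = w²,
# whose gradient is 2w; not the authors' code.
import numpy as np

def adam_step(w, g, m, v, n, alpha=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1.0 - b1) * g                # first moments (step 2)
    v = b2 * v + (1.0 - b2) * g * g            # second moments (step 3)
    m_hat = m / (1.0 - b1 ** (n + 1))          # bias correction (step 4)
    v_hat = v / (1.0 - b2 ** (n + 1))
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0])
m = np.zeros(1)
v = np.zeros(1)
for n in range(2000):
    g = 2.0 * w                                # gradient of w²
    w, m, v = adam_step(w, g, m, v, n)
```</preformat>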
      </sec>
      <sec id="sec-3-6">
        <title>Preprocessing:</title>
        <p>π̑_i = ln π_i, 1 ≤ i ≤ K; b̑_j(o_t) = ln b_j(o_t), 1 ≤ j ≤ K, 1 ≤ t ≤ T; ȃ_ij = ln a_ij, 1 ≤ i, j ≤ K.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Initialization:</title>
        <p>δ̑_1(j) = π̑_j + b̑_j(o_1), 1 ≤ j ≤ K; ψ_1(j) = 0, 1 ≤ j ≤ K.</p>
      </sec>
      <sec id="sec-3-8">
        <title>Recursion:</title>
        <p>δ̑_(t+1)(j) = max_(1≤i≤K)[δ̑_t(i) + ȃ_ij] + b̑_j(o_(t+1)), ψ_(t+1)(j) = argmax_(1≤i≤K)[δ̑_t(i) + ȃ_ij], 1 ≤ t ≤ T − 1, 1 ≤ j ≤ K.</p>
        <p>End: P̑* = max_(1≤j≤K)[δ̑_T(j)], q*_T = argmax_(1≤j≤K)[δ̑_T(j)].</p>
      </sec>
      <sec id="sec-3-9">
        <title>Restoring the path (sequence of states):</title>
        <p>q*_t = ψ_(t+1)(q*_(t+1)), t = T − 1, T − 2, ..., 1.</p>
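        <p>The modified (log-domain) Viterbi algorithm above can be sketched as follows (an illustrative implementation; the toy model parameters are assumptions):</p>
        <preformat>```python
# Illustrative log-domain Viterbi: logpi (K,), logA (K, K) and logb (T, K)
# are the logarithmized initial, transition and emission parameters.
import numpy as np

def viterbi_log(logpi, logA, logb):
    T, K = logb.shape
    delta = np.zeros((T, K))
    psi = np.zeros((T, K), dtype=int)
    delta[0] = logpi + logb[0]                     # initialization
    for t in range(T - 1):                         # recursion
        scores = delta[t][:, None] + logA          # delta_t(i) + a_ij
        psi[t + 1] = scores.argmax(axis=0)
        delta[t + 1] = scores.max(axis=0) + logb[t + 1]
    q = np.zeros(T, dtype=int)                     # path restoration
    q[T - 1] = delta[T - 1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1][q[t + 1]]
    return q, float(delta[T - 1].max())

eps = 1e-10                                        # toy Bakis-like model
pi = np.array([1.0, 0.0]) + eps
A = np.array([[0.6, 0.4], [0.0, 1.0]]) + eps
B = np.array([[0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]) + eps
q, score = viterbi_log(np.log(pi), np.log(A), np.log(B))   # q = [0, 1, 1]
```</preformat>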
        <p>3.2.4. Quality criteria selection for the method of intellectual classification of gymnastic elements</p>
        <p>In the work, to assess the identification of the neural networks’ parameters, the following were selected:
• the accuracy criterion
ACC = (1/P) Σ_{i=1..P} [d_i = d̑_i] → max, where [d_i = d̑_i] equals 1 if d_i = d̑_i and 0 if d_i ≠ d̑_i;
• the categorical cross-entropy criterion
E = −(1/P) Σ_{i=1..P} Σ_{j=1..K} d_ij ln y_ij → min,
where y_i is the i-th vector according to the model, y_ij ∈ [0, 1],
d_i is the i-th test vector, d_ij ∈ {0, 1},
P is the power of the training set,
K is the number of classes (neurons in the output layer),
w is the vector of weights;
• the performance criterion t → min.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>A numerical study was carried out based on the dataset [31]. The RG Rotate dataset consists of 49 examples of performing a turn in the back split position without using the hands, with the torso horizontal (split back without help, trunk horizontal). The data were collected from the video broadcast of the final stage of the 2021 Olympic Games in Tokyo. The examples consist of elements performed by 8 different gymnasts with 4 types of apparatus. Each example consists of an ordered set of images; the number of images in an example depends on the duration of the athlete’s performance of the element. This structure makes it possible to store changes in body position when performing a rotation element. One second of execution is described by 30 frames. The dataset is divided into a training set of 39 examples and a test set of 10 examples of element execution. The total dataset size for the 49 examples was 7,355 images. No preprocessing of the dataset was performed. From the dataset, 80% of the images were randomly selected for the training set and 20% of the images for the validation and test sets. Because the deep neural networks do not contain recurrent connections, training was carried out using a GPU. To implement the proposed neural networks, the TensorFlow package was used, and Google Colaboratory was chosen as the software environment.</p>
      <p>The frames of one example of execution show the body positions when performing a
rotation element (Fig. 3).</p>
      <p>Table 1 presents the structure of a neural network model based on MLP, where K is the
number of classes.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>[Figure residue removed: plots of categorical cross-entropy and accuracy versus iteration for the MLP and 2D LeNet models, and of categorical cross-entropy versus the number of “convolutional layer – downsampling layer” pairs.]</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussions</title>
      <sec id="sec-6-1">
        <title>As a result of the numerical study, the following was established:</title>
        <p>• the minimum number of iterations for the neural network model based on a three-layer MLP in terms of loss (categorical cross-entropy) (according to Fig. 3) and accuracy (according to Fig. 4) is 18;
• the minimum number of iterations for the 2D LeNet neural network model in terms of loss (categorical cross-entropy) (according to Fig. 5) and accuracy (according to Fig. 6) is 11;
• the best number of “convolutional layer – downsampling layer” pairs for the 2D LeNet neural network model in terms of loss (categorical cross-entropy) is 2 (according to Fig. 7).</p>
        <p>To prevent overfitting, k-fold cross-validation with 5 folds was used.</p>
        <p>7. Conclusions</p>
        <p>1. To solve the problem of increasing the efficiency of classification of gymnastic elements, the corresponding artificial intelligence methods were investigated. These studies have shown that today the most effective approaches are hidden Markov models (the generative approach) and neural networks (the descriptive approach).</p>
        <p>2. The created method has the following advantages: the input image need not be square, which expands the scope of application; the number of “convolutional layer – downsampling layer” pairs is determined empirically, which increases the identification accuracy of the model; the number of planes is defined as the quotient of the number of cells in the input layer divided by two to the power of two (the power is equal to twice the number of the “convolutional layer – downsampling layer” pair), which preserves the total number of cells in the layer after downsampling (downsampling halves the size of the layer planes in height and width) and automates the determination of the structure of the model layers; the use of a neural network makes it possible to label frames of gymnastic elements, and the use of a generative approach makes it possible to analyze the resulting sequence of labeled frames effectively.</p>
        <p>3. A further prospect for research is the use of the proposed method of intelligent classification in various intelligent visual image recognition systems.</p>
        <p>[17] F. E. L. da Cruz, G. Corso, G. Z. dos Santos Lima, S. R. Lopes, T. de Lima Prado. Statistical inference for microstate distribution in recurrence plots. Physica D: Nonlinear Phenomena, vol. 459 (2024): 134048. doi:10.1016/j.physd.2023.134048.
[18] Detection of COVID-19 chest X-ray using support vector machine and convolutional neural network. Communications in Mathematical Biology and Neuroscience (2020). doi:10.28919/cmbn/4765.
[19] J. Jia, P. Lv, X. Wei, W. Qiu. SNO-DCA: A model for predicting S-nitrosylation sites based on densely connected convolutional networks and attention mechanism. Heliyon, vol. 10 (2024): 1-11. doi:10.1016/j.heliyon.2023.e23187.
[20] F. B. N. Barber, A. E. Oueslati. Human exons and introns classification using pre-trained Resnet-50 and GoogleNet models and 13-layers CNN model. Journal of Genetic Engineering and Biotechnology, vol. 22 (2024): 1-8. doi:10.1016/j.jgeb.2024.100359.
[21] H. Wang, Sh. Xu, K.-b. Fang, Zh.-Sh. Dai, G.-Zh. Wei, L.-F. Chen. Contrast-enhanced magnetic resonance image segmentation based on improved U-Net and Inception-ResNet in the diagnosis of spinal metastases. Journal of Bone Oncology, vol. 42 (2023): 1-9. doi:10.1016/j.jbo.2023.100498.
[22] M. N. Khan, S. Das, J. Liu. Predicting pedestrian-involved crash severity using inception-v3 deep learning model. Accident Analysis and Prevention, vol. 197 (2024): 1-17. doi:10.1016/j.aap.2024.107457.
[23] X. Tang, F. R. Sheykhahmad. Boosted dipper throated optimization algorithm-based Xception neural network for skin cancer diagnosis: An optimal approach. Heliyon, vol. 10 (2024). doi:10.1016/j.heliyon.2024.e26415.
[24] D. Garg, G. K. Verma, A. K. Singh. EEG-based emotion recognition using MobileNet Recurrent Neural Network with time-frequency features. Applied Soft Computing, vol. 154 (2024). doi:10.1016/j.asoc.2024.111338.
[25] L. Geng, Y. Hu, Z. Xiao, J. Xi. Fertility Detection of Hatching Eggs Based on a Convolutional Neural Network. Applied Sciences, vol. 9, no. 7 (2019): 1408. doi:10.3390/app9071408.
[26] A. M. Rifai, S. Raharjo, E. Utami, D. Ariatmanto. Analysis for diagnosis of pneumonia symptoms using chest X-ray based on MobileNetV2 models with image enhancement using white balance and contrast limited adaptive histogram equalization (CLAHE). Biomedical Signal Processing and Control, vol. 90 (2024): 1-8. doi:10.1016/j.bspc.2023.105857.
[27] T. Neskorodieva, E. Fedorov, M. Chychuzhko, V. Chychuzhko. Metaheuristic method for searching quasi-optimal route based on the ant algorithm and annealing simulation. Radioelectronic and Computer Systems, no. 1 (2022): 92-102. doi:10.32620/reks.2022.1.07.
[28] Yi. Liu, Zh. Wang, R. Wang, J. Chen, H. Gao. Flooding-based MobileNet to identify cucumber diseases from leaf images in natural scenes. Computers and Electronics in Agriculture, vol. 213 (2023): 1-12. doi:10.1016/j.compag.2023.108166.
[29] P. A. Arjun, S. Suryanarayan, R. S. Viswamanav, S. Abhishek, T. Anjali. Unveiling Underwater Structures: MobileNet vs. EfficientNet in Sonar Image Detection. Procedia Computer Science, vol. 233 (2024): 518-527. doi:10.1016/j.procs.2024.03.241.
[30] T. Neskorodieva, E. Fedorov. Method for Automatic Analysis of Compliance of Settlements with Suppliers and Settlements with Customers by Neural Network Model of Forecast. Mathematical Modeling and Simulation of Systems (MODS’2020) (2020): 156-165. doi:10.1007/978-3-030-58124-4_15.
[31] Dataset RG Rotate, 2024. URL: https://drive.google.com/file/d/1HpLAu5esBvsi0VZ0YFywdzlc71B_KQcR/view?usp=sharing</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>A.</given-names> <surname>Neskorodieva</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Strutovskyi</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Baiev</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>Vietrov</surname></string-name>.
          <article-title>Real-time Classification, Localization and Tracking System (Based on Rhythmic Gymnastics)</article-title>,
          <source>in: Proceedings of the IEEE 13th International Conference on Electronics and Information Technologies</source>,
          <volume>14</volume>.11 (<year>2023</year>):
          <fpage>11</fpage>-<lpage>16</lpage>. doi:10.1109/elit61488.2023.10310664.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>S.</given-names> <surname>Goumiri</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Benboudjema</surname></string-name>, and
          <string-name><given-names>W.</given-names> <surname>Pieczynski</surname></string-name>.
          <article-title>A new hybrid model of convolutional neural networks and hidden Markov chains for image classification</article-title>.
          <source>Neural Computing and Applications</source>, Vol.
          <volume>35</volume>, May (<year>2023</year>):
          <fpage>17987</fpage>-<lpage>18002</lpage>. doi:10.1007/s00521-023-08644-4.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>B.</given-names> <surname>Mor</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Garhwal</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Kumar</surname></string-name>.
          <article-title>A Systematic Review of Hidden Markov Models and Their Applications</article-title>.
          <source>Arch Computat Methods Eng</source>, Vol.
          <volume>28</volume> (<year>2021</year>):
          <fpage>1429</fpage>-<lpage>1448</lpage>. doi:10.1007/s11831-020-09422-4.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>.
          <article-title>An object tracking framework with recapture based on correlation filters and siamese networks</article-title>.
          <source>Comput. Electr. Eng</source>., Vol.
          <volume>98</volume>, <issue>107730</issue> (<year>2022</year>). doi:10.1016/j.compeleceng.2022.107730.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>T.</given-names> <surname>Ding</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Feng</surname></string-name>,
          <string-name><given-names>Ya.</given-names> <surname>Wei</surname></string-name>,
          <string-name><given-names>Yu.</given-names> <surname>Han</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Li</surname></string-name>.
          <article-title>DeoT: an end-to-end encoder-only Transformer object detector</article-title>.
          <source>Journal of Real-Time Image Processing</source>, Vol.
          <volume>20</volume>, <issue>Issue 1</issue> (<year>2023</year>). doi:10.1007/s11554-023-01280-0.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>L.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Yang</surname></string-name>.
          <article-title>Moving scene object tracking method based on deep convolutional neural network</article-title>.
          <source>Alexandria Engineering Journal</source>, Vol.
          <volume>86</volume> (<year>2024</year>):
          <fpage>592</fpage>-<lpage>602</lpage>. doi:10.1016/j.aej.2023.11.077.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>R.</given-names> <surname>Solovyev</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Gabruseva</surname></string-name>.
          <article-title>Weighted boxes fusion: Ensembling boxes from different object detection models</article-title>.
          <source>Image and Vision Computing</source>, Vol.
          <volume>107</volume> (<year>2021</year>):
          <fpage>1</fpage>-<lpage>6</lpage>. doi:10.1016/j.imavis.2021.104117.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>T.</given-names> <surname>Neskorodieva</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Fedorov</surname></string-name>.
          <article-title>Method for automatic analysis of compliance of expenses data and the enterprise income by neural network model of forecast</article-title>,
          <source>in: Proceedings of the 2nd International Workshop on Modern Machine Learning Technologies and Data Science. CEUR Workshop</source>,
          <volume>2631</volume>, Lviv-Shatsk, <year>2020</year>, pp.
          <fpage>145</fpage>-<lpage>158</lpage>.
          URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85088880635&amp;partnerID=40&amp;md5=c0564b0cbe18017126f328fd3a4779c4
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>L.</given-names> <surname>Wan</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Li</surname></string-name>, and
          <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>.
          <article-title>Rolling-Element Bearing Fault Diagnosis Using Improved LeNet-5 Network</article-title>.
          <source>Sensors</source>, Vol.
          <volume>20</volume>, no. <issue>6</issue> (<year>2020</year>):
          <fpage>1693</fpage>. doi:10.3390/s20061693.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>X.</given-names> <surname>Ouyang</surname></string-name> et al.
          <article-title>A 3D-CNN and LSTM Based Multi-Task Learning Architecture for Action Recognition</article-title>.
          <source>IEEE Access</source>, Vol.
          <volume>7</volume> (<year>2019</year>):
          <fpage>40757</fpage>-<lpage>40770</lpage>. doi:10.1109/access.2019.2906654.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>T.-Y.</given-names> <surname>Kim</surname></string-name> and
          <string-name><given-names>S.-B.</given-names> <surname>Cho</surname></string-name>.
          <article-title>Predicting residential energy consumption using CNN-LSTM neural networks</article-title>.
          <source>Energy</source>, Vol.
          <volume>182</volume>, Sep. (<year>2019</year>):
          <fpage>72</fpage>-<lpage>81</lpage>. doi:10.1016/j.energy.2019.05.230.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>R.</given-names> <surname>Yang</surname></string-name> et al.
          <article-title>CNN-LSTM deep learning architecture for computer vision-based modal frequency detection</article-title>.
          <source>Mechanical Systems and Signal Processing</source>, Vol.
          <volume>144</volume> (<year>2020</year>):
          <fpage>106885</fpage>. doi:10.1016/j.ymssp.2020.106885.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>A.</given-names> <surname>Kumar</surname></string-name>,
          <string-name><given-names>Z. J.</given-names> <surname>Zhang</surname></string-name>, and
          <string-name><given-names>H.</given-names> <surname>Lyu</surname></string-name>.
          <article-title>Object detection in real time based on improved single shot multi-box detector algorithm</article-title>.
          <source>EURASIP Journal on Wireless Communications and Networking</source>, Vol.
          <year>2020</year>, no. <issue>1</issue>, <issue>10</issue> (<year>2020</year>). doi:10.1186/s13638-020-01826-x.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>W.</given-names> <surname>Tang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name>,
          <string-name><given-names>Sh.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>Yu.</given-names> <surname>Zhang</surname></string-name>.
          <article-title>Review of AlexNet for Medical Image Classification</article-title>.
          <source>arXiv preprint arXiv:2311.08655</source> (<year>2023</year>):
          <fpage>1</fpage>-<lpage>13</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>G.S.Ch.</given-names> <surname>Kumar</surname></string-name>,
          <string-name><given-names>R.K.</given-names> <surname>Kumar</surname></string-name>,
          <string-name><given-names>K.P.V.</given-names> <surname>Kumar</surname></string-name>,
          <string-name><given-names>N.R.</given-names> <surname>Sai</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Brahmaiah</surname></string-name>.
          <article-title>Deep residual convolutional neural network: An efficient technique for intrusion detection system</article-title>.
          <source>Expert Systems With Applications</source>, Vol.
          <volume>238</volume> (<year>2024</year>):
          <fpage>1</fpage>-<lpage>16</lpage>. doi:10.1016/j.eswa.2023.121912.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name><given-names>S.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Tian</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Liang</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Xu</surname></string-name>,
          <string-name><given-names>Zh.</given-names> <surname>Yu</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Zhang</surname></string-name>.
          <article-title>Single and simultaneous fault diagnosis of gearbox via wavelet transform and improved deep residual network under imbalanced data</article-title>.
          <source>Engineering Applications of Artificial Intelligence</source>, Vol.
          <volume>133</volume> (<year>2024</year>):
          <fpage>1</fpage>-<lpage>17</lpage>. doi:10.1016/j.engappai.2024.108146.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>