Intellectual Classification Method of Gymnastic Elements Based on a Combination of Descriptive and Generative Approaches

Oleksii Smirnov1, Eugene Fedorov2, Anastasiia Neskorodieva3, Tetiana Neskorodieva4
1 Central Ukrainian National Technical University, avenue University, 8, Kropivnitskiy, 25006, Ukraine
2 Cherkasy State Technological University, Cherkasy, Shevchenko blvd., 460, 18006, Ukraine
3 Vasyl' Stus Donetsk National University, 600-richcha str., 21, Vinnytsia, 21021, Ukraine
4 Uman National University of Horticulture, 1 Instituska st., Uman, Cherkassy region, 20305, Ukraine

COLINS-2024: 8th International Conference on Computational Linguistics and Intelligent Systems, April 12–13, 2024, Lviv, Ukraine
Dr.smirnovoa@gmail.com (O. Smirnov); fedorovee75@ukr.net (E. Fedorov); neskorodieva.a@gmail.com (A. Neskorodieva); tvnesk1@gmail.com (T. Neskorodieva)
ORCID: 0000-0001-9543-874X (O. Smirnov); 0000-0003-3841-7373 (E. Fedorov); 0000-0002-8591-085X (A. Neskorodieva); 0000-0003-2474-7697 (T. Neskorodieva)

Abstract
The paper proposes a method for the intellectual classification of gymnastic elements using a combination of descriptive and generative approaches. The created method has the following advantages: the input image need not be square, which expands the scope of application; the number of "convolutional layer – downsampling layer" pairs is determined empirically, which increases the classification accuracy of the model; the number of layers is determined automatically, which speeds up the determination of the model structure; the use of a neural network makes it possible to label frames of gymnastic elements, and the use of a generative approach makes it possible to analyze the resulting sequence of labeled frames effectively. The proposed method for the intellectual classification of gymnastic elements can be used in various intelligent visual image recognition systems.

Keywords
Intelligent classification, gymnastic elements, descriptive approach, generative approach, MLP neural network, 2D neural network LeNet, Adam algorithm, Viterbi algorithm

1. Introduction
Assessing the performance of elements in rhythmic gymnastics is a complex task. Every element, from turns and throwing movements to flexibility and balance, is subjected to rigorous analysis. The difficulty lies in the fact that the assessment of such elements is subject to subjective interpretation and requires a high level of professionalism from experts. The previous work [1] studied the classification of gymnastic balance elements performed by rhythmic gymnastics athletes on the basis of individual frames. This article discusses the classification of the gymnastic element "turn" in dynamics, based on the analysis of frame sequences. In this context, the development of intelligent methods for classifying gymnastic elements from video can significantly improve the objectivity and efficiency of the evaluation process in rhythmic gymnastics.

2. Related Works
The first approach to intelligent image classification was the generative approach, which was based on hidden Markov models [2, 3]. Hidden Markov models have one or more of the following disadvantages:
• insufficiently high classification accuracy;
• insufficiently high speed of parameter identification;
• complexity of identifying the structure of the hidden Markov model (the number of states and the size of the mixture for each state).

The second approach to intelligent image classification was the descriptive approach [4, 5, 6], and deep neural networks began to be used to increase recognition accuracy [7, 8].

The LeNet-5 neural network [9, 10] has the simplest architecture and uses two pairs of convolutional and downsampling layers, as well as two fully connected layers. The convolutional layer reduces sensitivity to shifts of image elements. A downsampling layer reduces the dimensionality of an image. Currently, the combination of LeNet-5 (for feature extraction) and Long Short-Term Memory (LSTM) (for classification) is popular [11, 12].

Neural networks of the DarkNet family [13], the AlexNet family [14] and the VGG (Visual Geometry Group) family [15, 16] are modifications of LeNet. These neural networks can have several consecutive convolutional layers.

Neural networks of the ResNet family [15, 16, 17] use a Residual block, which contains two consecutive convolutional layers. The output signals of the planes of the layer preceding this block are added to the output signals of the planes of the second convolutional layer of this block. The combination of ResNet (for feature extraction) and support vector machines (SVM) (for classification) is currently popular [18]; it has been used for diagnosis from CXR images and provided a diagnostic probability close to 100%.

The DenseNet (Dense Convolutional Network) neural network [16, 19] uses a fully connected (dense) block, which contains a set of Residual blocks. The output signals of the planes of the second convolutional layer of the current Residual block of a dense block are concatenated with the output signals of the planes of the second convolutional layers of all previous Residual blocks of this dense block and with the output signals of the planes of the layer preceding this dense block. In addition, a reduction of the planes of the convolutional layers (usually by a factor of two) located between dense blocks is used.

The GoogLeNet (Inception V1) neural network [20] uses an Inception block that contains parallel convolutional layers with connection regions of different sizes and one downsampling layer. The output signals of the planes of these parallel layers are concatenated. To reduce the number of operations, convolutional layers with a unit connection region are connected in sequence with these parallel layers (in the case of convolutional layers, such a convolutional layer is placed before them, and in the case of a downsampling layer, such a convolutional layer is placed after it).

The Inception-v3 neural network [16, 17, 21] is a modification of GoogLeNet, and its Inception and Reduction blocks are modifications of the Inception block of the GoogLeNet neural network.

The Inception-ResNet-v2 neural network [16, 17, 22] is a modification of GoogLeNet and ResNet: its Inception block is a modification of the Residual and Inception blocks, and its Reduction block is a modification of the Inception block.

The Xception neural network [16, 23] uses a Depthwise separable convolution block, which performs first a pointwise convolution and then a depthwise convolution. For both convolutions, a ReLU activation function is typically used.
The MobileNet neural network [24, 25] uses a Depthwise separable convolution block, which performs first a depthwise convolution and then a pointwise convolution. For both convolutions, a linear activation function is typically used.

The MobileNetV2 neural network [16, 26] uses an Inverted Residual block, which first performs a pointwise convolution, then a depthwise convolution, and then a pointwise convolution again. For both convolutions, the SiLU activation function is typically used.

The MobileNetV3 neural network [27, 28, 29] uses a Squeeze-and-Excitation block in some Inverted Residual blocks.

Deep neural networks have one or more of the following disadvantages:
• insufficiently high classification accuracy;
• insufficiently high speed of parameter identification;
• complexity of identifying the structure of a neural network (the number and size of the layers of each type).

To increase the speed of identification of the parameters of deep neural network models, parallel algorithms are used [27, 30].

In connection with this, the problem of creating an effective intellectual classification of gymnastic elements is urgent. The goal of the work is to increase the efficiency of the intellectual classification of gymnastic elements using a combination of descriptive and generative approaches. To achieve this goal, it is necessary to solve the following tasks:
1. Create the structure of a method for the intellectual classification of gymnastic elements that combines the descriptive and generative approaches.
2. Develop a one-dimensional neural network model for classifying frames of gymnastic elements.
3. Create a model of a two-dimensional neural network for classifying frames of gymnastic elements.
4. Develop a method for identifying the parameters of a neural network model.
5. Create a method for classifying a sequence of frames of gymnastic elements.
6. Select quality criteria for the method of intellectual classification of gymnastic elements.
7. Conduct a numerical study of the proposed method for the intellectual classification of gymnastic elements.

3. Methods and Materials
3.1. Structure of the method for intellectual classification of gymnastic elements based on a combination of descriptive and generative approaches
In the proposed method, the outputs of the neural network are considered as the probabilities of the appearance of the observation symbol (the $t$-th frame of the gymnastic element) in the $j$-th state (gymnastic pose) (at the $j$-th output of the neural network). The Viterbi dynamic programming method is applied to the labeled sequence of gymnastic element frames. On the other hand, the parameters of the neural network can be identified based on a sequence of frames labeled by the Viterbi method. This combination provides classification probabilities comparable to those of DTW and of discrete and semi-continuous hidden Markov models (HMMs), and it does not require a separate neural network for each gymnastic element, as those methods do.

The main stages of the proposed method are as follows:
1. For the initial identification of the parameters of the neural network, manually labeled frames of gymnastic elements from the database [31] are used. Based on the labeled frames of the database, the following quantities are calculated for later use in the Viterbi method (a sketch of this estimation step is given after the list):
• the a priori probability $P(s_j)$ in the form
$P(s_j) = \frac{m_j}{m}$,
where $m_j$ is the number of frames marked with state $s_j$ in the entire training set of the standard database, and $m$ is the number of all frames in the entire training set of the standard database;
• the probability of the initial state $s_j$, which for the Bakis HMM (an HMM with limited transitions) is determined by the formula
$\tilde{\pi}_j = \begin{cases} 1, & j = 1 \\ 0, & j > 1 \end{cases};$
• the probability of transitions between states, $a_{ij}$, in the form
$a_{ij} = \frac{n_{ij}}{n_i}$,
where $n_{ij}$ is the number of transitions from state $s_i$ to state $s_j$ across the entire training set of the standard database, and $n_i$ is the number of transitions from state $s_i$ across the entire training set of the standard database.
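For illustration, the stage-1 estimates could be computed from labeled frame sequences roughly as follows (a minimal NumPy sketch; the function name `estimate_hmm_parameters` and the representation of a labeled example as a list of state indices are assumptions of this sketch, not part of the original method):

```python
import numpy as np

def estimate_hmm_parameters(label_seqs, n_states):
    """Estimate P(s_j), the initial-state vector and the transition
    matrix from manually labeled frame sequences (stage 1 of the method).
    label_seqs: list of sequences of state indices, one per example."""
    state_counts = np.zeros(n_states)               # m_j
    trans_counts = np.zeros((n_states, n_states))   # n_ij
    for seq in label_seqs:
        for t, s in enumerate(seq):
            state_counts[s] += 1
            if t + 1 < len(seq):
                trans_counts[s, seq[t + 1]] += 1
    prior = state_counts / state_counts.sum()       # P(s_j) = m_j / m
    pi = np.zeros(n_states)                         # Bakis model: every
    pi[0] = 1.0                                     # sequence starts in state 1
    n_i = trans_counts.sum(axis=1, keepdims=True)   # n_i
    a = np.divide(trans_counts, n_i,                # a_ij = n_ij / n_i
                  out=np.zeros_like(trans_counts), where=n_i > 0)
    return prior, pi, a
```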
2. Frames of gymnastic elements are recognized using the neural network model, i.e., segmentation is performed.
3. A modified Viterbi algorithm is applied, which optimizes the segmentation (the sequence of states). For this algorithm, the probability distribution of the occurrence of the observation symbol $\mathbf{o}_t$ (the $t$-th frame) in the $j$-th state, $b_j(\mathbf{o}_t)$, is pre-calculated according to Bayes' rule as the emission probability
$p(\mathbf{o}_t \mid s_j) = \frac{p(s_j \mid \mathbf{o}_t)\, P(\mathbf{o}_t)}{P(s_j)}$,
where the posterior probability $p(s_j \mid \mathbf{o}_t)$ is the output of the $j$-th neuron of the neural network, and the prior probability $P(\mathbf{o}_t)$ is fixed and can be omitted.
4. The parameters of the neural network model are identified using the frame labels of the gymnastic elements (the segmentation result) obtained with the modified Viterbi algorithm.
5. For a given subject area, frames of gymnastic elements are recognized using the neural network model.
6. If the recognition error of the neural network exceeds the threshold, go to stage 3.

Next, we consider the models of the neural networks that label frames of gymnastic elements.

3.2. One-dimensional neural network for classifying frames of gymnastic elements based on a multilayer perceptron
Figure 1 shows a one-dimensional classification neural network based on a multilayer perceptron (MLP), which is a non-recurrent static multilayer neural network containing two hidden layers and an output layer. The classes are separated by hyperplanes. For the MLP, learning based on error correction (supervised learning) is used in batch mode; in this work, the Adam algorithm was used.

Figure 1: Model of a 1D neural network based on MLP

The one-dimensional neural network model is presented as follows:
$y_i^{(0)} = x_i$,
$y_j^{(k)} = f^{(k)}\!\left(s_j^{(k)}\right)$, $s_j^{(k)} = b_j^{(k)} + \sum_{i=1}^{N^{(k-1)}} w_{ij}^{(k)} y_i^{(k-1)}$, $j \in \overline{1, N^{(k)}}$, $k \in \overline{1, L}$,
where $N^{(k)}$ is the number of neurons in the $k$-th layer, $k$ is the layer number, $L$ is the number of layers, $b_j^{(k)}$ is the threshold of the $j$-th neuron in the $k$-th layer, $w_{ij}^{(k)}$ is the connection weight from the $i$-th neuron to the $j$-th neuron of the $k$-th layer, $y_j^{(k)}$ is the output of the $j$-th neuron of the $k$-th layer, and $f^{(k)}$ is the activation function of the neurons of the $k$-th layer. ReLU was used as $f^{(k)}$ for $k < L$, and softmax was used as $f^{(L)}$.
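Since the experiment section states that TensorFlow and Google Colaboratory were used, the MLP of Table 1 might be expressed in Keras roughly as follows (a sketch under that assumption; the helper name `build_mlp` and the explicit Flatten step are illustrative):

```python
import tensorflow as tf

def build_mlp(num_classes: int) -> tf.keras.Model:
    """1D classification network based on an MLP: two hidden ReLU
    layers of 1024 neurons and a softmax output (cf. Table 1)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(720, 1280, 3)),  # non-square input frame
        tf.keras.layers.Resizing(32, 32),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```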
3.3. Two-dimensional neural network for classifying frames of gymnastic elements based on 2D LeNet
Figure 2 shows a two-dimensional classification neural network based on 2D LeNet, which is a non-recurrent dynamic neural network with a hierarchical structure.

Figure 2: Model of a 2D neural network based on LeNet

2D LeNet is a special class of multilayer perceptron. It is formed by an input layer, which consists of a single receptor plane; alternating convolutional layers (corresponding to the neocognitron S-layers) and downsampling (pooling) layers (corresponding to the neocognitron C-layers); a sequence of fully connected layers (hidden MLP layers); and an output layer. A convolutional layer consists of convolutional planes, and a downsampling layer consists of downsampling planes. Each convolutional plane consists of convolutional cells, and each downsampling plane consists of downsampling cells. The convolutional layer reduces sensitivity to shifts of image elements. A downsampling layer reduces the dimensionality of an image.

A connection region of a cell plane of the previous layer is associated with a cell of a cell plane of the current layer. Geometrically, the connection region is usually a square, and it has the same size for all planes of one layer. All cells of one cell plane of the current layer that are associated with connection regions of a cell plane of the previous layer share the same weights. The connection regions of a cell plane feeding the downsampling layer overlap; because of this, one cell of a cell plane entering different overlapping connection regions can activate multiple cells of a cell plane of the convolutional layer. A connection region in 2D LeNet does not go beyond the boundaries of the plane, so the size of the convolutional layers gradually decreases. For this neural network model, learning based on error correction (supervised learning) is used in batch mode; in this work, the Adam algorithm was used.

3.3.1. Neural network model
Let $\nu$ be the position in the connection region, $\nu = (\nu_x, \nu_y)$; $K_I$ the number of cell planes in the input layer $I$ (3 for RGB images); $K_{s_l}$ the number of cell planes in the downsampling layer $S_l$; $K_{c_l}$ the number of cell planes in the convolutional layer $C_l$; $A_l$ the connection region of the planes of layer $S_l$; $\hat{L}$ the number of convolutional (or downsampling) layers; and $\breve{L}$ the number of fully connected layers.
1. $l = 1$.
2. Calculate the output signal of the convolutional layer:
$u_{c_l}(m, i) = f_{c_l}(h_{c_l}(m, i))$, $m \in \{1, \ldots, N_{c_l}\}^2$, $i \in \overline{1, K_{c_l}}$,
$h_{c_l}(m, i) = \begin{cases} b_{c_1}(i) + \sum_{k=1}^{K_I} \sum_{\nu \in A_1} w_{c_1}(\nu, k, i)\, x(m + \nu, k), & l = 1 \\ b_{c_l}(i) + \sum_{k=1}^{K_{s_{l-1}}} \sum_{\nu \in A_{l-1}} w_{c_l}(\nu, k, i)\, u_{s_{l-1}}(m + \nu, k), & l > 1 \end{cases}$
where $w_{c_1}(\nu, k, i)$ is the weight of the connection from the $\nu$-th position in the connection region of the $k$-th cell plane of the input layer $I$ to the $i$-th cell plane of the convolutional layer $C_1$; $w_{c_l}(\nu, k, i)$ is the weight of the connection from the $\nu$-th position in the connection region of the $k$-th cell plane of the downsampling layer $S_{l-1}$ to the $i$-th cell plane of the convolutional layer $C_l$; $u_{c_l}(m, i)$ is the output of the cell at the $m$-th position of the $i$-th cell plane of the convolutional layer $C_l$; and $f_{c_l}$ is the activation function of the neurons of the convolutional layer $C_l$.
3. Calculate the output signal of the downsampling layer (halving the scale):
$u_{s_l}(m, k) = \frac{1}{4} \sum_{\nu \in \{0,1\}^2} u_{c_l}(2m + \nu, k)$, $m \in \{1, \ldots, N_{s_l}\}^2$, $k \in \overline{1, K_{s_l}}$,
where $u_{s_l}(m, k)$ is the output of the cell at the $m$-th position of the $k$-th cell plane of the downsampling layer $S_l$.
4. If $l < \hat{L}$, then $l = l + 1$ and go to step 2.
5. Calculate the outputs of the fully connected layers:
$u_{d_l}(j) = f_{d_l}(h_{d_l}(j))$, $j \in \overline{1, N_{d_l}}$, $l \in \overline{1, \breve{L}}$,
$h_{d_l}(j) = \begin{cases} b_{d_1}(j) + \sum_{k=1}^{K_{s_{\hat{L}}}} \sum_{\nu \in \{1, \ldots, N_{s_{\hat{L}}}\}^2} w_{d_1}(\nu, k, j)\, u_{s_{\hat{L}}}(\nu, k), & l = 1 \\ b_{d_l}(j) + \sum_{z=1}^{N_{d_{l-1}}} w_{d_l}(z, j)\, u_{d_{l-1}}(z), & l > 1 \end{cases}$
where $w_{d_1}(\nu, k, j)$ is the weight of the connection from the $\nu$-th position of the $k$-th cell plane of the downsampling layer $S_{\hat{L}}$ to the $j$-th neuron of the first fully connected layer $D_1$; $w_{d_l}(z, j)$ is the weight of the connection from the $z$-th neuron of the fully connected layer $D_{l-1}$ to the $j$-th neuron of the $l$-th fully connected layer $D_l$; $u_{d_l}(j)$ is the output of the $j$-th neuron of the fully connected layer $D_l$; and $f_{d_l}$ is the activation function of the neurons of the fully connected layer $D_l$.
6. Calculate the output of the output layer:
$u_o(j) = f_o(h_o(j))$, $j \in \overline{1, N_o}$, $h_o(j) = b_o(j) + \sum_{z=1}^{N_{d_{\breve{L}}}} w_o(z, j)\, u_{d_{\breve{L}}}(z)$,
where $w_o(z, j)$ is the weight of the connection from the $z$-th neuron of the fully connected layer $D_{\breve{L}}$ to the $j$-th neuron of the output layer $O$; $u_o(j)$ is the output of the $j$-th neuron of the output layer $O$; and $f_o$ is the activation function of the neurons of the output layer $O$. ReLU was used as $f_{c_l}$ and $f_{d_l}$, and softmax was used as $f_o$.
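Under the same TensorFlow assumption, the 2D LeNet of Table 2 might be sketched as follows; the kernel size of 5 and the `same` padding are assumptions chosen so that the plane sizes match Table 2 (32×32×4 → 16×16×4 → 16×16×16 → 8×8×16 → 1024), and `build_lenet2d` is an illustrative name:

```python
import tensorflow as tf

def build_lenet2d(num_classes: int) -> tf.keras.Model:
    """2D LeNet-style network: two "convolutional layer - downsampling
    layer" pairs followed by two fully connected layers (cf. Table 2)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(720, 1280, 3)),
        tf.keras.layers.Resizing(32, 32),
        tf.keras.layers.Conv2D(4, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),   # 32x32x4 -> 16x16x4
        tf.keras.layers.Conv2D(16, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),   # 16x16x16 -> 8x8x16
        tf.keras.layers.Flatten(),         # 8*8*16 = 1024
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```

Note the design choice: the downsampling equation above uses averaging, while Table 2 lists MaxPooling2D; the sketch follows Table 2.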
3.3.2. Method for identifying the parameters of a neural network model based on the Adam algorithm
Step 1. Initialization.
Step 1.1. The initial vector of weights $\mathbf{w}(0)$ is specified.
Step 1.2. The initial vector of first moments is set: $\mathbf{m}(-1) = \mathbf{0}$.
Step 1.3. The initial vector of second moments is set: $\mathbf{v}(-1) = \mathbf{0}$.
Step 1.4. The parameter $\eta$ determining the learning rate is set (usually $\eta = 0.001$), as well as the decay rates of the first and second moments $\beta_1$ and $\beta_2$, $\beta_1, \beta_2 \in [0, 1)$ (usually $\beta_1 = 0.9$ and $\beta_2 = 0.999$), and the stability parameter $\varepsilon$ preventing division by zero (usually $\varepsilon = 10^{-8}$).
Step 1.5. The initial gradient $\mathbf{g}(0)$ is calculated.
Step 1.6. $n = 0$.
Step 2. The vector of first moments is calculated as an exponential moving average:
$\mathbf{m}(n) = \beta_1 \mathbf{m}(n-1) + (1 - \beta_1)\, \mathbf{g}(n)$.
Step 3. The vector of second moments is calculated as an exponential moving average (the square is element-wise):
$\mathbf{v}(n) = \beta_2 \mathbf{v}(n-1) + (1 - \beta_2)\, \mathbf{g}^2(n)$.
Step 4. The vector of weights is calculated (the moments are corrected because of their initialization to zero, and the learning step is scaled):
$\hat{\mathbf{m}}(n) = \mathbf{m}(n)/(1 - \beta_1^{n+1})$, $\hat{\mathbf{v}}(n) = \mathbf{v}(n)/(1 - \beta_2^{n+1})$, $\mathbf{w}(n+1) = \mathbf{w}(n) - \frac{\eta\, \hat{\mathbf{m}}(n)}{\sqrt{\hat{\mathbf{v}}(n)} + \varepsilon}$.
Step 5. If the stopping condition is not satisfied, the gradient $\mathbf{g}(n+1)$ is calculated, $n = n + 1$, and the process returns to step 2.

3.3.3. Method for classifying a sequence of frames of gymnastic elements based on the Viterbi algorithm
To avoid numerous multiplications during the operation of the Viterbi algorithm, all the parameters of the model can be logarithmized, turning multiplications into additions, since addition is simpler to implement and faster to compute. The modified Viterbi algorithm is described as follows:
1. Preprocessing:
$\hat{\pi}_j = \ln \pi_j$, $1 \le j \le N$,
$\hat{b}_j(\mathbf{o}_t) = \ln b_j(\mathbf{o}_t)$, $1 \le j \le N$, $1 \le t \le T$,
$\hat{a}_{ij} = \ln a_{ij}$, $1 \le i, j \le N$.
2. Initialization:
$\hat{\delta}_1(j) = \hat{\pi}_j + \hat{b}_j(\mathbf{o}_1)$, $\psi_1(j) = 0$, $1 \le j \le N$.
3. Recursion:
$\hat{\delta}_{t+1}(j) = \max_{1 \le i \le N} [\hat{\delta}_t(i) + \hat{a}_{ij}] + \hat{b}_j(\mathbf{o}_{t+1})$,
$\psi_{t+1}(j) = \arg\max_{1 \le i \le N} [\hat{\delta}_t(i) + \hat{a}_{ij}]$, $1 \le t \le T - 1$, $1 \le j \le N$.
4. Termination:
$\ln P = \max_{1 \le i \le N} \hat{\delta}_T(i)$, $q_T^* = \arg\max_{1 \le i \le N} \hat{\delta}_T(i)$.
5. Restoring the path (the sequence of states):
$q_t^* = \psi_{t+1}(q_{t+1}^*)$, $t = T-1, T-2, \ldots, 1$.
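The update rule of Section 3.3.2 can be condensed into a few lines of NumPy (a minimal sketch; the caller-supplied `grad_fn` and the fixed iteration count stand in for the unspecified gradient computation and stopping condition):

```python
import numpy as np

def adam(w, grad_fn, n_iters, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Identify parameters w by the Adam algorithm (Section 3.3.2)."""
    m = np.zeros_like(w)   # first moments,  m(-1) = 0
    v = np.zeros_like(w)   # second moments, v(-1) = 0
    for n in range(n_iters):
        g = grad_fn(w)                        # gradient g(n)
        m = beta1 * m + (1 - beta1) * g       # exponential moving averages
        v = beta2 * v + (1 - beta2) * g**2    # (the square is element-wise)
        m_hat = m / (1 - beta1 ** (n + 1))    # bias correction of the moments
        v_hat = v / (1 - beta2 ** (n + 1))
        w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w
```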
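Likewise, the modified Viterbi algorithm of Section 3.3.3 admits a direct log-domain implementation (a sketch; `log_b` is assumed to be precomputed from the neural network outputs via the Bayes rule of Section 3.1):

```python
import numpy as np

def viterbi_log(pi, a, log_b):
    """Log-domain Viterbi decoding (Section 3.3.3).
    pi: (N,) initial-state probabilities; a: (N, N) transition matrix;
    log_b: (T, N) array of ln b_j(o_t). Returns the optimal state
    sequence q* and ln P."""
    T, N = log_b.shape
    with np.errstate(divide="ignore"):           # ln 0 -> -inf is intended
        log_pi, log_a = np.log(pi), np.log(a)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = log_pi + log_b[0]                 # initialization
    for t in range(1, T):                        # recursion
        scores = delta[t - 1][:, None] + log_a   # scores[i, j] = delta_t(i) + ln a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_b[t]
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                   # termination
    for t in range(T - 2, -1, -1):               # path restoration
        q[t] = psi[t + 1][q[t + 1]]
    return q, delta[-1].max()
```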
3.3.4. Quality criteria selection for the method of intellectual classification of gymnastic elements
In this work, the following criteria were selected to assess the identification of the neural networks' parameters:
• the accuracy criterion
$Accuracy = \frac{1}{I} \sum_{i=1}^{I} [\mathbf{d}_i = \hat{\mathbf{y}}_i] \to \max_W$,
$\hat{y}_{ij} = \begin{cases} 1, & j = \arg\max_z y_{iz} \\ 0, & j \ne \arg\max_z y_{iz} \end{cases};$
• the categorical cross-entropy criterion
$CCE = -\frac{1}{I} \sum_{i=1}^{I} \sum_{j=1}^{K} d_{ij} \ln y_{ij} \to \min_W$,
where $\mathbf{y}_i$ is the $i$-th vector produced by the model, $y_{ij} \in [0, 1]$; $\mathbf{d}_i$ is the $i$-th test vector, $d_{ij} \in \{0, 1\}$; $I$ is the cardinality of the training set; $K$ is the number of classes (neurons in the output layer); and $W$ is the vector of weights;
• the performance criterion $T \to \min$.

4. Experiment
A numerical study was carried out on the dataset [31]. The RG Rotate Dataset consists of 49 examples of performing a turn in the back split position without using the hands, with the torso horizontal ("Split back without help, trunk horizontal"). The data were collected from the video broadcast of the final stage of the 2021 Olympic Games in Tokyo. The examples consist of elements performed by 8 different gymnasts with 4 types of apparatus. Each example consists of an ordered set of images; the number of images in an example depends on the duration of the athlete's performance of the element. This structure makes it possible to store the changes in body position during the performance of a rotation element. One second of execution is described by 30 frames. The dataset is divided into a training set of 39 examples and a test set of 10 examples of element execution. The total dataset size for the 49 examples was 7,355 images. No preprocessing of the dataset was performed. From the dataset, 80% of the images were randomly selected for the training set and 20% of the images for the validation and test sets.

Since the deep neural networks used do not contain recurrent connections, training was carried out on a GPU. To implement the proposed neural networks, the TensorFlow package was used, and Google Colaboratory was chosen as the software environment. The frames of one example of execution show the body positions during the performance of a rotation element (Fig. 3).

Figure 3: Body positions across the frames of one example of performing the rotation element

Table 1 presents the structure of the neural network model based on MLP, where K is the number of classes.

Table 1
MLP-based neural network model

Layer type                         Output size
Input                              1280×720
Resizing                           32×32
Fully connected (Dense), layer 1   1024
Fully connected (Dense), layer 2   1024
Output (fully connected, Dense)    K

Table 2 presents the structure of the neural network model based on 2D LeNet, where K is the number of classes.

Table 2
2D LeNet neural network model

Layer type                         Output size
Input                              1280×720
Resizing                           32×32
Conv2D                             32×32×4
MaxPooling2D                       16×16×4
Conv2D                             16×16×16
MaxPooling2D                       8×8×16
Flatten                            1024
Fully connected (Dense), layer 1   1024
Fully connected (Dense), layer 2   1024
Output (fully connected, Dense)    K
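A training setup consistent with Sections 3.3.2 and 3.3.4 (Adam optimizer, CCE loss, accuracy metric) might look as follows; the epoch count, batch size and validation split are illustrative assumptions:

```python
import tensorflow as tf

def compile_and_train(model: tf.keras.Model, frames, labels,
                      epochs: int = 20, batch_size: int = 32):
    """Identify model parameters in batch mode with Adam, monitoring the
    criteria of Section 3.3.4: CCE (loss) and accuracy (metric)."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model.fit(frames, labels, epochs=epochs,
                     batch_size=batch_size, validation_split=0.2)
```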
5. Results
Fig. 4 shows the dependence of the loss (based on categorical cross-entropy) on the number of iterations for the three-layer MLP model.

Figure 4: Dependence of the loss (categorical cross-entropy) on the number of iterations for the model based on a three-layer MLP

Fig. 5 shows the dependence of accuracy on the number of iterations for the model based on a three-layer MLP.

Figure 5: Dependence of accuracy on the number of iterations for the model based on a three-layer MLP

Fig. 6 shows the dependence of the loss (based on categorical cross-entropy) on the number of iterations for the 2D LeNet model.

Figure 6: Dependence of the loss (categorical cross-entropy) on the number of iterations for the 2D LeNet model

Fig. 7 shows the dependence of accuracy on the number of iterations for the 2D LeNet model.

Figure 7: Dependence of accuracy on the number of iterations for the 2D LeNet model

Fig. 8 shows the dependence of the loss (based on categorical cross-entropy) on the number of "convolutional layer – downsampling layer" pairs for the 2D LeNet model.

Figure 8: Dependence of the loss (categorical cross-entropy) on the number of "convolutional layer – downsampling layer" pairs for the 2D LeNet model

6. Discussions
As a result of the numerical study, the following was established:
• the minimum sufficient number of iterations for the neural network model based on a three-layer MLP in terms of the loss (Fig. 4) and accuracy (Fig. 5) is 18;
• the minimum sufficient number of iterations for the 2D LeNet neural network model in terms of the loss (Fig. 6) and accuracy (Fig. 7) is 11;
• the best number of "convolutional layer – downsampling layer" pairs for the 2D LeNet neural network model in terms of the loss is 2 (Fig. 8).
To prevent overfitting, k-fold cross-validation with 5 folds was used.
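The 5-fold protocol mentioned above might be sketched as follows (an illustrative helper, assuming one-hot labels and a `build_model` factory such as `build_mlp` or `build_lenet2d`):

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(build_model, frames, labels, n_splits=5):
    """k-fold cross-validation of a frame classifier (5 folds)."""
    accuracies = []
    for train_idx, val_idx in KFold(n_splits=n_splits,
                                    shuffle=True).split(frames):
        model = build_model()  # fresh model per fold
        model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(frames[train_idx], labels[train_idx],
                  epochs=20, verbose=0)
        _, acc = model.evaluate(frames[val_idx], labels[val_idx], verbose=0)
        accuracies.append(float(acc))
    return float(np.mean(accuracies))
```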
7. Conclusions
1. To solve the problem of increasing the efficiency of the classification of gymnastic elements, the corresponding artificial intelligence methods were investigated. These studies have shown that today the most effective are hidden Markov models (the generative approach) and neural networks (the descriptive approach).
2. The created method has the following advantages: the input image need not be square, which expands the scope of application; the number of "convolutional layer – downsampling layer" pairs is determined empirically, which increases the classification accuracy of the model; the number of planes is defined as the number of cells of the input layer divided by two raised to a power equal to twice the index of the "convolutional layer – downsampling layer" pair, which preserves the total number of cells in a layer after downsampling (which halves the height and width of the layer planes) and thereby automates the determination of the structure of the model's layers; the use of a neural network makes it possible to label frames of gymnastic elements, and the use of a generative approach makes it possible to analyze the resulting sequence of labeled frames effectively.
3. A further prospect for research is the use of the proposed method of intelligent classification in various intelligent visual image recognition systems.

References
[1] A. Neskorodieva, M. Strutovskyi, A. Baiev, O. Vietrov. Real-time Classification, Localization and Tracking System (Based on Rhythmic Gymnastics), in: Proceedings of the IEEE 13th International Conference on Electronics and Information Technologies (2023): 11–16. doi:10.1109/elit61488.2023.10310664.
[2] S. Goumiri, D. Benboudjema, W. Pieczynski. A new hybrid model of convolutional neural networks and hidden Markov chains for image classification. Neural Computing and Applications, vol. 35 (2023): 17987–18002. doi:10.1007/s00521-023-08644-4.
[3] B. Mor, S. Garhwal, A. Kumar. A Systematic Review of Hidden Markov Models and Their Applications. Archives of Computational Methods in Engineering, vol. 28 (2021): 1429–1448. doi:10.1007/s11831-020-09422-4.
[4] J. Zhang, J. Sun, J. Wang, Z. Li, X. Chen. An object tracking framework with recapture based on correlation filters and Siamese networks. Computers and Electrical Engineering, vol. 98 (2022): 107730. doi:10.1016/j.compeleceng.2022.107730.
[5] T. Ding, K. Feng, Ya. Wei, Yu. Han, T. Li. DeoT: an end-to-end encoder-only Transformer object detector. Journal of Real-Time Image Processing, vol. 20, issue 1 (2023). doi:10.1007/s11554-023-01280-0.
[6] L. Liu, B. Lin, Y. Yang. Moving scene object tracking method based on deep convolutional neural network. Alexandria Engineering Journal, vol. 86 (2024): 592–602. doi:10.1016/j.aej.2023.11.077.
[7] R. Solovyev, W. Wang, T. Gabruseva. Weighted boxes fusion: Ensembling boxes from different object detection models. Image and Vision Computing, vol. 107 (2021): 1–6. doi:10.1016/j.imavis.2021.104117.
[8] T. Neskorodieva, E. Fedorov. Method for automatic analysis of compliance of expenses data and the enterprise income by neural network model of forecast, in: Proceedings of the 2nd International Workshop on Modern Machine Learning Technologies and Data Science, CEUR Workshop Proceedings, vol. 2631, Lviv-Shatsk, 2020, pp. 145–158. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85088880635&partnerID=40&md5=c0564b0cbe18017126f328fd3a4779c4.
[9] L. Wan, Y. Chen, H. Li, C. Li. Rolling-Element Bearing Fault Diagnosis Using Improved LeNet-5 Network. Sensors, vol. 20, no. 6 (2020): 1693. doi:10.3390/s20061693.
[10] X. Ouyang et al. A 3D-CNN and LSTM Based Multi-Task Learning Architecture for Action Recognition. IEEE Access, vol. 7 (2019): 40757–40770. doi:10.1109/access.2019.2906654.
[11] T.-Y. Kim, S.-B.
Cho. Predicting residential energy consumption using CNN-LSTM neural networks. Energy, vol. 182 (2019): 72–81. doi:10.1016/j.energy.2019.05.230.
[12] R. Yang et al. CNN-LSTM deep learning architecture for computer vision-based modal frequency detection. Mechanical Systems and Signal Processing, vol. 144 (2020): 106885. doi:10.1016/j.ymssp.2020.106885.
[13] A. Kumar, Z. J. Zhang, H. Lyu. Object detection in real time based on improved single shot multi-box detector algorithm. EURASIP Journal on Wireless Communications and Networking, vol. 2020, no. 1 (2020). doi:10.1186/s13638-020-01826-x.
[14] W. Tang, J. Sun, Sh. Wang, Yu. Zhang. Review of AlexNet for Medical Image Classification. arXiv preprint arXiv:2311.08655 (2023): 1–13.
[15] G.S.Ch. Kumar, R.K. Kumar, K.P.V. Kumar, N.R. Sai, M. Brahmaiah. Deep residual convolutional neural network: An efficient technique for intrusion detection system. Expert Systems With Applications, vol. 238 (2024): 1–16. doi:10.1016/j.eswa.2023.121912.
[16] S. Wang, J. Tian, P. Liang, X. Xu, Zh. Yu, S. Liu, D. Zhang. Single and simultaneous fault diagnosis of gearbox via wavelet transform and improved deep residual network under imbalanced data. Engineering Applications of Artificial Intelligence, vol. 133 (2024): 1–17. doi:10.1016/j.engappai.2024.108146.
[17] F. E. L. da Cruz, G. Corso, G. Z. dos Santos Lima, S. R. Lopes, T. de Lima Prado. Statistical inference for microstate distribution in recurrence plots. Physica D: Nonlinear Phenomena, vol. 459 (2024): 134048. doi:10.1016/j.physd.2023.134048.
[18] Detection of COVID-19 chest X-ray using support vector machine and convolutional neural network. Communications in Mathematical Biology and Neuroscience (2020). doi:10.28919/cmbn/4765.
[19] Jia Jia, P. Lv, X. Wei, W. Qiu. SNO-DCA: A model for predicting S-nitrosylation sites based on densely connected convolutional networks and attention mechanism. Heliyon, vol. 10 (2024): 1–11. doi:10.1016/j.heliyon.2023.e23187.
[20] F.B.N. Barber, A.E. Oueslati. Human exons and introns classification using pre-trained Resnet-50 and GoogleNet models and 13-layers CNN model. Journal of Genetic Engineering and Biotechnology, vol. 22 (2024): 1–8. doi:10.1016/j.jgeb.2024.100359.
[21] H. Wang, Sh. Xu, K.-b. Fang, Zh.-Sh. Dai, G.-Zh. Wei, L.-F. Chen. Contrast-enhanced magnetic resonance image segmentation based on improved U-Net and Inception-ResNet in the diagnosis of spinal metastases. Journal of Bone Oncology, vol. 42 (2023): 1–9. doi:10.1016/j.jbo.2023.100498.
[22] M.N. Khan, S. Das, J. Liu. Predicting pedestrian-involved crash severity using inception-v3 deep learning model. Accident Analysis and Prevention, vol. 197 (2024): 1–17. doi:10.1016/j.aap.2024.107457.
[23] X. Tang, F.R. Sheykhahmad. Boosted dipper throated optimization algorithm-based Xception neural network for skin cancer diagnosis: An optimal approach. Heliyon, vol. 10 (2024): 1–21. doi:10.1016/j.heliyon.2024.e26415.
[24] D. Garg, G.K. Verma, A.K. Singh. EEG-based emotion recognition using MobileNet Recurrent Neural Network with time-frequency features. Applied Soft Computing, vol. 154 (2024): 1–14. doi:10.1016/j.asoc.2024.111338.
[25] L. Geng, Y. Hu, Z. Xiao, J. Xi. Fertility Detection of Hatching Eggs Based on a Convolutional Neural Network. Applied Sciences, vol. 9, no. 7 (2019): 1408. doi:10.3390/app9071408.
[26] A.M. Rifai, S. Raharjo, E. Utami, D. Ariatmanto.
Analysis for diagnosis of pneumonia symptoms using chest X-ray based on MobileNetV2 models with image enhancement using white balance and contrast limited adaptive histogram equalization (CLAHE). Biomedical Signal Processing and Control, vol. 90 (2024): 1–8. doi:10.1016/j.bspc.2023.105857.
[27] T. Neskorodieva, E. Fedorov, M. Chychuzhko, V. Chychuzhko. Metaheuristic method for searching quasi-optimal route based on the ant algorithm and annealing simulation. Radioelectronic and Computer Systems, no. 1 (2022): 92–102. doi:10.32620/reks.2022.1.07.
[28] Yi. Liu, Zh. Wang, R. Wang, J. Chen, H. Gao. Flooding-based MobileNet to identify cucumber diseases from leaf images in natural scenes. Computers and Electronics in Agriculture, vol. 213 (2023): 1–12. doi:10.1016/j.compag.2023.108166.
[29] P.A. Arjun, S. Suryanarayan, R.S. Viswamanav, S. Abhishek, T. Anjali. Unveiling Underwater Structures: MobileNet vs. EfficientNet in Sonar Image Detection. Procedia Computer Science, vol. 233 (2024): 518–527. doi:10.1016/j.procs.2024.03.241.
[30] T. Neskorodieva, E. Fedorov. Method for Automatic Analysis of Compliance of Settlements with Suppliers and Settlements with Customers by Neural Network Model of Forecast. Mathematical Modeling and Simulation of Systems (MODS'2020) (2020): 156–165. doi:10.1007/978-3-030-58124-4_15.
[31] Dataset RG Rotate, 2024. URL: https://drive.google.com/file/d/1HpLAu5esBvsi0VZ0YFywdzlc71B_KQcR/view?usp=sharing.