<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis of the limits of model improvement in deep learning and performance saturation assessment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samat Mukhanov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Orken Mamyrbayev</string-name>
          <email>morkenj@mail.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dair Katayev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alikhan Kalmurzayev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhasulan Oteuli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shaim Yakupov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daryn Amrin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Information and Computational Technologies</institution>
          ,
          <addr-line>Shevchenko str. 28 050010 Almaty</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>International Information Technology University</institution>
          ,
          <addr-line>Manas St 34/1 050040 Almaty</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Machine learning models, particularly those used in automatic speech recognition (ASR), generally exhibit diminishing returns when dataset size and resource utilization escalate. This study analyzes the performance saturation observed during ASR model training, using the Whisper-Tiny model to illustrate this trend. The research identifies key factors contributing to performance limitations, including dataset size, model architecture, and resource utilization. As dataset size exceeds 15,000 samples, improvements in Word Error Rate (WER) and Character Error Rate (CER) decline significantly, confirming diminishing returns. The study also examines resource utilization, revealing that training time increases non-linearly with dataset size. While GPU memory usage remains relatively constant, CPU and RAM usage fluctuate, indicating potential inefficiencies. To address computational constraints, techniques such as streaming data processing and fixed-length audio segments are implemented to enhance training efficiency. Additionally, evaluation bottlenecks are mitigated by using fixed test dataset sizes, ensuring quicker and more consistent assessments. Efficient processing strategies, including gradient accumulation and mixed-precision training, are explored to reduce resource consumption without compromising performance. Visualization techniques, such as correlation heatmaps and performance plots, highlight the trade-offs between dataset size, computational cost, and model accuracy. The findings emphasize the importance of balancing resource allocation and data volume to optimize ASR training workflows. By acknowledging and addressing performance saturation, researchers can develop more scalable and efficient ASR models, making advanced speech recognition technology more accessible in resource-constrained environments.</p>
      </abstract>
      <kwd-group>
        <kwd>machine learning</kwd>
        <kwd>automatic speech recognition (ASR)</kwd>
        <kwd>performance saturation</kwd>
        <kwd>diminishing returns</kwd>
        <kwd>Word Error Rate (WER)</kwd>
        <kwd>Character Error Rate (CER)</kwd>
        <kwd>training efficiency</kwd>
        <kwd>resource optimization</kwd>
        <kwd>computational cost</kwd>
        <kwd>deep learning</kwd>
        <kwd>model scalability</kwd>
        <kwd>dataset size</kwd>
        <kwd>GPU memory</kwd>
        <kwd>WhisperTiny</kwd>
        <kwd>streaming data processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Machine learning models typically improve as they are trained with more data and computational
power. However, at some point, this improvement diminishes or stops altogether. This
phenomenon is known as diminishing returns—where further increases in training time, data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
or model complexity result in minimal or no gains in performance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Understanding when and why this happens is crucial for optimizing machine learning
workflows. Continuing training beyond a model’s performance limits leads to wasted time and
resources without tangible benefits. This article explores the primary reasons why a model ceases
to improve, how to recognize these signs, and potential solutions.</p>
      <p>
        This issue is particularly relevant for speech recognition models, as they typically require vast
amounts of data and computational resources [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Advances in Automatic Speech Recognition
(ASR) have been closely tied to progress in deep learning, particularly through more advanced
recurrent and convolutional neural networks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These improvements have enabled models to
better understand human speech, accounting for variations in accents, background noise, and
speaking styles, as well as low-resource languages [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        However, even state-of-the-art models face performance saturation. Identifying the key factors
contributing to this limitation—such as dataset size, model architecture, and training strategies—
can help researchers build more efficient training pipelines [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This article explores these
factors, drawing insights from recent studies and experiments, and offers practical
recommendations for optimizing ASR workflows.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review or Problem Statement</title>
      <p>Training a machine learning model involves teaching it to recognize patterns in data to make
accurate predictions or decisions. The process typically includes the following key steps:</p>
      <sec id="sec-2-1">
        <title>Process</title>
      </sec>
      <sec id="sec-2-2">
        <title>Advantages</title>
      </sec>
      <sec id="sec-2-3">
        <title>Disadvantages</title>
        <sec id="sec-2-3-1">
          <title>Visualizing the Tools like</title>
        </sec>
        <sec id="sec-2-3-2">
          <title>Enables visualization</title>
        </sec>
        <sec id="sec-2-3-3">
          <title>Requires additional</title>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>Process</title>
        <sec id="sec-2-4-1">
          <title>Collecting and Preparing Data</title>
        </sec>
        <sec id="sec-2-4-2">
          <title>Data Preprocessing</title>
        </sec>
        <sec id="sec-2-4-3">
          <title>Model Selection and Architecture</title>
        </sec>
        <sec id="sec-2-4-4">
          <title>Model Training</title>
        </sec>
        <sec id="sec-2-4-5">
          <title>Evaluation and</title>
          <p>
            Testing
Datasets such as
LibriSpeech provide
spoken language
Recordings [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ].
          </p>
          <p>
            Cleaning noise,
normalizing values,
and converting data
for Processing [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ].
Use of RNNs and
CNNs for sequential
data processing
in speech recognition
[
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
          </p>
          <p>
            Optimization
algorithms (e.g.,
SGD) iteratively
minimize prediction
errors [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ].
          </p>
          <p>Metrics like WER
and CER measure
model performance
on unseen Data [15].</p>
        </sec>
        <sec id="sec-2-4-6">
          <title>Captures variations in accents, speaking styles, and noise conditions.</title>
        </sec>
        <sec id="sec-2-4-7">
          <title>Improves model input quality and extraction of relevant features.</title>
        </sec>
        <sec id="sec-2-4-8">
          <title>Effective in capturing sequential relationships in speech data.</title>
        </sec>
        <sec id="sec-2-4-9">
          <title>Allows continuous performance improvement with additional iterations.</title>
        </sec>
        <sec id="sec-2-4-10">
          <title>Provides insight into model accuracy and generalization capabilities.</title>
          <p>Large
require
storage
processing.
datasets
significant
and</p>
        </sec>
        <sec id="sec-2-4-11">
          <title>Noise removal may inadvertently lose useful information.</title>
        </sec>
        <sec id="sec-2-4-12">
          <title>High computational cost and risk of overfittingcomplex models.</title>
        </sec>
        <sec id="sec-2-4-13">
          <title>Overtraining may lead to diminishing returns and overfitting.</title>
        </sec>
        <sec id="sec-2-4-14">
          <title>Evaluation may not capture all realworld speech conditions.</title>
          <p>TensorBoard track
model performance
over time [16], [17].</p>
          <p>of metrics and
training dynamics
for better insights.</p>
          <p>resources and time
for visualization
setup.</p>
          <p>
            This Figure 2.1 – Correlation heatmap provides key insights into the relationships between
different training metrics, helping to understand diminishing returns in model improvement.
Strong positive correlations (e.g., between Train WER and Train CER, and between Test WER and
Test CER) confirm that these error metrics behave similarly. The inverse correlation between
Samples and Test WER (-0.92) suggests that increasing training data leads to better generalization,
but diminishing gains are evident as correlation weakens for Valid WER. Notably, Training Time
and RAM Usage exhibit high positive correlation (0.79), showing the growing computational cost
with more samples. Meanwhile, Batch Processing Time does not always scale linearly with
Samples, hinting at potential inefficiencies [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. These findingssupport the argument that beyond a
certain point, additional training data increases computational cost disproportionately to
performance gains.
          </p>
          <p>The heatmap in Figure 2.1 presents the correlation between various performance and
computational metrics used in evaluating the Whisper-Tiny model. The correlation coefficients
range from -1 to 1, where positive values indicate a direct relationship, and negative values
indicate an inverse relationship.</p>
        </sec>
      </sec>
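      <p>A heatmap of this kind can be reproduced from the logged training metrics with a few lines of Python. The sketch below is a minimal example assuming the per-run metrics are stored as columns of a CSV file; the file name is an illustrative placeholder, and the column labels follow the metric names discussed above.</p>
      <preformat>
# Minimal sketch of how a correlation heatmap like Figure 2.1 can be produced.
# Assumes a CSV of per-run training logs with the metric columns named below;
# the file name "training_metrics.csv" is illustrative, not from the paper.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

metrics = [
    "Samples", "Train WER", "Train CER", "Valid WER",
    "Test WER", "Test CER", "Training Time",
    "RAM Usage", "Batch Processing Time",
]

df = pd.read_csv("training_metrics.csv")   # one row per training run
corr = df[metrics].corr()                  # pairwise Pearson correlations

plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation between training metrics")
plt.tight_layout()
plt.savefig("correlation_heatmap.png", dpi=150)
      </preformat>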
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>3.1. Performance Saturation Analysis
As the dataset size increases from 5,000 to 50,000 samples, a pattern of diminishing returns
becomes evident. Initially, with 5,000 samples, the test Character Error Rate (CER) is relatively
high, reflecting the model's struggle with limited training data. For instance, at this stage, the test
Word Error Rate (WER) is 0.5300, indicating that the model misinterprets over 50% of words:</p>
      <p>Test WER (5,000 samples) = 0.5300 (1)</p>
      <p>Test CER (5,000 samples) = 0.4559 (2)</p>
      <p>Both metrics are computed from the edit operations needed to turn the model output into the
reference transcript:</p>
      <p>WER = (S + D + I) / N</p>
      <p>CER = ((S + D + I) / N) × 100 (3)</p>
      <p>where S is the number of substitutions, D is the number of deletions, I is the number of
insertions, C is the number of correct words, and N is the number of words (for WER) or
characters (for CER) in the reference (N = S + D + C).</p>
      <p>With an increase in sample size, the CER gradually decreases, demonstrating that the model
benefits from additional training data. However, beyond a certain threshold, roughly between
30,000 and 40,000 samples, improvements in performance become marginal; beyond 15,000
samples, improvements already slow significantly. This suggests that while adding more data
helps in the earlier stages, the model reaches a saturation point where additional samples
contribute little to further reducing the error rate.</p>
      <p>The persistent gap between the test and training CER values suggests that the model may still
be overfitting, learning patterns specific to the training set while struggling with generalization.
Potential strategies to address this issue include refining the model architecture, optimizing
hyperparameters, or incorporating regularization techniques. Furthermore, improvements in data
quality, rather than sheer quantity, might yield better gains in accuracy at this stage.</p>
      <p>3.2. Training Time Scaling
Training time increases non-linearly with dataset size:</p>
      <p>T (5,000 samples) = 818.23 seconds (4)</p>
      <p>T (50,000 samples) = 2644.48 seconds (5)</p>
      <p>Training throughput is estimated as:</p>
      <p>samples per second = (effective batch size × steps per second) / gradient accumulation steps</p>
      <p>where effective_batch_size = per_device_train_batch_size × num_devices (8 × 1 = 8, assuming a
single GPU), steps_per_second is the number of training steps completed per second (which depends
on hardware, model size, and optimizations), and gradient_accumulation_steps is the number of
steps taken before updating weights (1 in this setup, so it does not change the formula).</p>
      <p>3.3. Resource Usage
GPU memory usage remains roughly constant, while CPU and RAM usage fluctuate without a
clear trend.</p>
      <p>3.4. Model Performance Stability
Training loss decreases initially but levels off after 15,000 samples, confirming performance
saturation. The plots illustrate the trade-offs in training a speech recognition model with increasing
dataset size. While test WER and CER show improvement as more samples are added, training and
validation metrics fluctuate, suggesting inconsistencies in generalization. Resource usage metrics
(GPU, CPU, RAM) indicate increasing computational costs, with training time and batch processing
time rising significantly beyond 30,000 samples. This demonstrates that scaling dataset size alone is
not always optimal and reinforces the importance of balancing data volume with model efficiency:
more data is not always the best solution, and computational constraints must be considered.</p>
      <p>The number of epochs actually traversed under the fixed step budget is:</p>
      <p>num epochs = min(6, (max steps × batch size × gradient accumulation steps) / num samples) (6)</p>
      <p>where num_samples is the total number of training samples, batch_size =
per_device_train_batch_size (8 in this setup), gradient_accumulation_steps = 1, and max_steps = 3000.</p>
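      <p>To illustrate how these formulas interact, the short Python sketch below evaluates the throughput expression and Equation (6) with the hyperparameters stated above; the steps-per-second value is a placeholder, since it depends on the hardware used.</p>
      <preformat>
# Worked example of the throughput and epoch-budget formulas from Section 3,
# using the hyperparameters stated in the text (batch size 8, 1 GPU,
# gradient accumulation 1, max_steps 3000). steps_per_second is measured
# during training; the value below is only a placeholder for illustration.
per_device_train_batch_size = 8
num_devices = 1
gradient_accumulation_steps = 1
max_steps = 3000
steps_per_second = 1.5            # placeholder; depends on hardware and model

effective_batch_size = per_device_train_batch_size * num_devices
samples_per_second = effective_batch_size * steps_per_second / gradient_accumulation_steps

def num_epochs(num_samples: int) -> float:
    """Epochs actually traversed under the fixed step budget (Equation 6)."""
    return min(6, max_steps * per_device_train_batch_size
               * gradient_accumulation_steps / num_samples)

print(f"samples/s: {samples_per_second:.1f}")
for n in (5_000, 15_000, 50_000):
    print(f"{n:>6} samples -> {num_epochs(n):.2f} epochs")
      </preformat>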
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>4.1. Efficient Processing Strategies
The Whisper-Tiny model was selected primarily to minimize computational costs while achieving
faster convergence. As a lightweight version of the Whisper architecture, it enables efficient
training and inference without requiring extensive hardware resources. By using a smaller model,
the risk of overfitting is reduced, allowing for a more generalized learning process even with a
moderate dataset size. Additionally, the smaller model size facilitates quicker saturation, meaning
that improvements in performance diminish at an earlier stage compared to larger models.</p>
      <p>Three strategies keep memory and compute usage under control. Streaming dataset approach:
an IterableDataset is used to dynamically load and process small data batches. Librosa-based audio
processing: only the essential audio fragments are kept in memory. Padding and truncation: audio
samples are standardized to 10-second segments.</p>
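      <p>To make the streaming approach concrete, the following Python sketch illustrates how such a pipeline can be assembled; it assumes a simple manifest of (audio path, transcript) pairs and is an illustration of the described strategy rather than the exact implementation used in this work.</p>
      <preformat>
# Minimal sketch of the streaming approach described above: audio is loaded
# lazily with librosa and padded/truncated to fixed 10-second segments so
# that batches are uniform. The manifest format is an illustrative assumption.
import numpy as np
import librosa
from torch.utils.data import IterableDataset

SAMPLE_RATE = 16_000          # Whisper expects 16 kHz input
SEGMENT_SECONDS = 10
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS

class StreamingSpeechDataset(IterableDataset):
    """Yields (audio_array, transcript) pairs one at a time, keeping only
    the current fragment in memory instead of the whole dataset."""

    def __init__(self, manifest):
        # manifest: iterable of (audio_path, transcript) tuples
        self.manifest = manifest

    def __iter__(self):
        for audio_path, transcript in self.manifest:
            audio, _ = librosa.load(audio_path, sr=SAMPLE_RATE, mono=True)
            # Truncate long clips, zero-pad short ones to exactly 10 seconds.
            if len(audio) > SEGMENT_SAMPLES:
                audio = audio[:SEGMENT_SAMPLES]
            else:
                audio = np.pad(audio, (0, SEGMENT_SAMPLES - len(audio)))
            yield audio.astype(np.float32), transcript
      </preformat>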
      <p>Robust Model Evaluation:
WER: Measures incorrect transcriptions at the word level.</p>
      <p>CER: Provides a finer, character-level evaluation.</p>
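      <p>Both metrics follow the edit-distance definition given in Section 3. The following minimal sketch computes them directly; the example strings are illustrative only, and in practice a library such as jiwer or evaluate can be used instead.</p>
      <preformat>
# Minimal sketch of word- and character-level error rates as defined in
# Section 3 (error rate = (S + D + I) / N). A plain Levenshtein distance is
# used, which counts substitutions, deletions and insertions together.
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution / match
    return d[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
print(cer("the cat sat on the mat", "the cat sit on mat"))
      </preformat>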
      <p>The bar plot in Figure 4.1 shows how the Word Error Rate (WER) decreases as training data
increases, confirming that more data initially improves model performance. However, beyond
15,000–20,000 samples, the improvement slows significantly, demonstrating performance
saturation. The training set (blue bars) shows a steady WER reduction, but the validation (gray) and
test (red) sets maintain a gap, highlighting limited generalization.</p>
      <p>This supports the claim that simply adding data is not always effective and suggests the need for
alternative optimizations like better architecture or regularization. Visualization reinforces that
increasing dataset size beyond a threshold has diminishing returns, making computational efficiency a
key consideration.</p>
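      <p>A grouped bar chart such as Figure 4.1 can be generated from the per-split WER values with standard plotting tools. The sketch below is illustrative only: the split names and colors follow the description above, while the numeric values are placeholders rather than the measured results.</p>
      <preformat>
# Minimal sketch of a grouped bar chart like Figure 4.1: WER for the training,
# validation and test splits at several dataset sizes. The numbers below are
# placeholders for illustration, not the measured results from the paper.
import numpy as np
import matplotlib.pyplot as plt

sample_sizes = [5_000, 15_000, 30_000, 50_000]
wer = {
    "train": [0.40, 0.28, 0.24, 0.23],   # placeholder values
    "valid": [0.55, 0.47, 0.44, 0.43],   # placeholder values
    "test":  [0.53, 0.46, 0.43, 0.42],   # placeholder values
}

x = np.arange(len(sample_sizes))
width = 0.25
colors = {"train": "tab:blue", "valid": "gray", "test": "tab:red"}

for k, (split, values) in enumerate(wer.items()):
    plt.bar(x + (k - 1) * width, values, width, label=split, color=colors[split])

plt.xticks(x, [f"{n:,}" for n in sample_sizes])
plt.xlabel("Training samples")
plt.ylabel("WER")
plt.legend()
plt.tight_layout()
plt.savefig("wer_vs_samples.png", dpi=150)
      </preformat>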
      <p>The chart in Figure 4.2 represents the Character Error Rate (CER) distribution over different sample
sizes for training, validation, and test datasets.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>One of the largest challenges was Out-of-Memory (OOM) errors. Initially, attempting to load the
entire dataset into RAM caused system crashes. To resolve this, an IterableDataset was
implemented, enabling data to be streamed in small chunks instead of being loaded all at once.
Another major issue was inefficiency in audio processing. The dataset contained large,
variable-length audio files, which led to batching problems during training. This was addressed by
segmenting the audio into uniform 10-second chunks, ensuring consistency across batches.</p>
      <p>Training instability also emerged as a challenge, primarily due to prolonged training on limited
hardware, which led to overfitting. To mitigate this, several strategies were applied: the maximum
number of training steps was capped at 3,000, and a small batch size of 8 was used in combination
with gradient accumulation to optimize resource utilization.</p>
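      <p>These constraints map directly onto standard Hugging Face training arguments. The following sketch assumes the transformers Seq2SeqTrainingArguments API; only the batch size, gradient accumulation, step cap, and mixed-precision flag come from the setup described here, while the remaining values are placeholders.</p>
      <preformat>
# Minimal sketch of a training configuration matching the constraints described
# above: batch size 8, gradient accumulation of 1, a cap of 3,000 steps, and
# mixed-precision (fp16) training. Other arguments are illustrative placeholders.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-tiny-asr",      # placeholder path
    per_device_train_batch_size=8,      # small batch to fit limited hardware
    gradient_accumulation_steps=1,      # no accumulation needed at batch size 8
    max_steps=3000,                     # hard cap to avoid overtraining
    fp16=True,                          # mixed precision to cut memory usage
    predict_with_generate=True,         # decode full transcripts for WER/CER
    logging_steps=50,                   # placeholder logging interval
)
      </preformat>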
      <p>Finally, evaluation bottlenecks slowed down development. Processing the entire dataset was
time-consuming, making evaluations inefficient. To streamline this process, the test dataset size
was fixed at 5,000 samples, allowing for faster evaluations without significantly compromising
accuracy.</p>
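      <p>Fixing the evaluation subset is straightforward with the Hugging Face datasets API; the sketch below shows one way a reproducible 5,000-sample subset can be drawn and is an assumption about the tooling rather than the exact code used in this work.</p>
      <preformat>
# Minimal sketch: draw a fixed, reproducible 5,000-sample evaluation subset so
# that every checkpoint is scored on the same data. full_test_set is assumed
# to be a datasets.Dataset already prepared elsewhere.
from datasets import Dataset

def fixed_eval_subset(full_test_set: Dataset, size: int = 5000, seed: int = 42) -> Dataset:
    # Shuffle deterministically, then keep the first `size` examples.
    size = min(size, len(full_test_set))
    return full_test_set.shuffle(seed=seed).select(range(size))
      </preformat>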
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This study optimizes Whisper-Tiny for low-resource training by leveraging streaming data
processing, mixed-precision training, and efficient memory batching. The pipeline is designed to
accommodate real-world constraints, making ASR model training both scalable and efficient in
resource-limited environments.</p>
      <p>The research underscores the concept of performance saturation in deep learning models.
Initially, as the number of training samples increases, the Word Error Rate (WER) decreases,
reflecting improved model performance. However, beyond a certain threshold (around 15,000
samples), the rate of improvement slows significantly, demonstrating the phenomenon of
diminishing returns. This aligns with the study’s findings, highlighting that while expanding
dataset size and computational resources can enhance model performance, there is a limit beyond
which further investments yield minimal gains.</p>
      <p>Recognizing this saturation point is essential for optimizing machine learning workflows,
especially in resource-constrained settings. This reinforces the importance of strategies like the
Whisper-Tiny model and streaming dataset approaches, which promote faster convergence and
efficient resource utilization. The findings in this section support the broader discussion on
effective processing techniques and model performance stability in machine learning.</p>
      <p>Special thanks to Diana Baranovskaya for generously providing a computing machine for model
training.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>N. C.</given-names>
          </string-name>
          <article-title>Deep Learning's Computational Cost</article-title>
          . In IEEE Spectrum,
          <year>2024</year>
          . Available at: https://spectrum.ieee.org/amp/deep-learning-computational-cost-2655082754.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Automatic Speech Recognition: A Deep Learning Approach</article-title>
          . In Springer,
          <year>2014</year>
          . Available at: Automatic speech recognition book.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>J. H.</given-names>
          </string-name>
          <article-title>Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd Edition)</article-title>
          . Stanford University. In Github,
          <year>2021</year>
          . Available at: Speech and Language Processing
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Acero</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hon</surname>
            ,
            <given-names>H.-W.</given-names>
          </string-name>
          <article-title>Spoken Language Processing: A Guide to Theory, Algorithm, and System Development</article-title>
          . In Github,
          <year>2001</year>
          . Available at: Spoken Language Processing.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Asma</given-names>
            <surname>Trabelsi</surname>
          </string-name>
          , Sébastien Warichet, Yassine Aajaoun, Séverine Soussilane. “
          <article-title>Evaluation of the efficiency of state-of-the-art Speech Recognition engines</article-title>
          .” In ScienceDirect,
          <year>2022</year>
          . Available at: https://www.sciencedirect.com/science/article/pii/S1877050922014338.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Korbinian</given-names>
            <surname>Kuhn</surname>
          </string-name>
          , Verena Kersken, Benedikt Reuter, Niklas Egger, Gottfried Zimmermann.
          <article-title>“Measuring the Accuracy of Automatic Speech Recognition Solutions</article-title>
          .”
          <source>In ACM Digital Library</source>
          vol
          <volume>16</volume>
          , no 4.
          <year>2022</year>
          . Available at: https://dl.acm.org/doi/full/10.1145/3636513.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Stolypin</given-names>
            <surname>Vestnik</surname>
          </string-name>
          .
          <article-title>Advances in Speech Recognition Technologies for Low-Resource Languages</article-title>
          .
          <year>2024</year>
          . Available at: https://stolypin-vestnik.ru/wp- content/uploads/2024/05/15.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Rabiner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Juang</surname>
            ,
            <given-names>B.-H.</given-names>
          </string-name>
          <article-title>Fundamentals of Speech Recognition</article-title>
          . In Github,
          <year>1993</year>
          . Available at: Speech Recognition Fundamentals.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Foundations of Statistical Natural Language Processing</article-title>
          . In Github,
          <year>1999</year>
          . Available at: Foundations of NLP.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Belcic</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <article-title>Hyperparameter Tuning: Approaches and Best Practices</article-title>
          .
          <source>In IBM</source>
          ,
          <year>2024</year>
          . Available at: https://www.ibm.com/think/topics/hyperparameter-tuning.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Kenshimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mukhanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Merembayev</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Yedilkhan</surname>
          </string-name>
          , “
          <article-title>A comparison of convolutional neural networks for Kazakh sign language recognition</article-title>
          ,
          <source>” EEJET</source>
          , vol.
          <volume>5</volume>
          , no.
          <volume>2</volume>
          (
          <issue>113</issue>
          ), pp.
          <fpage>44</fpage>
          -
          <lpage>54</lpage>
          , Oct.
          <year>2021</year>
          , doi: 10.15587/1729-4061.2021.241535.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Mukhanov</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Uskenbayeva</surname>
          </string-name>
          , “
          <article-title>Pattern Recognition with Using Effective Algorithms and</article-title>
          Methods of Computer Vision Library,” in
          <source>Optimization of Complex Systems: Theory, Models, Algorithms and Applications</source>
          , vol.
          <volume>991</volume>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Le Thi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Le</surname>
          </string-name>
          , and T. Pham Dinh, Eds.,
          <source>in Advances in Intelligent Systems and Computing</source>
          , vol.
          <volume>991</volume>
          . Cham: Springer International Publishing,
          <year>2020</year>
          , pp.
          <fpage>810</fpage>
          -
          <lpage>819</lpage>
          . doi: 10.1007/978-3-030-21803-4_81.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Mukhanov</surname>
            ,
            <given-names>Samat</given-names>
          </string-name>
          &amp; Uskenbayeva, Raissa &amp; Rakhim, Abd &amp; Akim, Akbota &amp; Mamanova,
          <string-name>
            <surname>Symbat.</surname>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Gesture recognition of the Kazakh alphabet based on machine and deep learning models</article-title>
          .
          <source>Procedia Computer Science</source>
          .
          <volume>241</volume>
          .
          <fpage>458</fpage>
          -
          <lpage>463</lpage>
          . doi: 10.1016/j.procs.2024.08.064.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Bazarbekov</surname>
            <given-names>I.М.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ipalakova</surname>
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daineko</surname>
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukhanov</surname>
            <given-names>S.B.</given-names>
          </string-name>
          <article-title>Development and data analysis of a robo-pen for Alzheimer's disease diagnosis: preliminary results</article-title>
          .
          <source>Herald of the Kazakh-British Technical University</source>
          .
          <year>2024</year>
          ;
          <volume>21</volume>
          (
          <issue>3</issue>
          ):
          <fpage>78</fpage>
          -
          <lpage>89</lpage>
          . (In Kazakh) https://doi.org/10.55452/1998-6688-2024-21-3-78-89.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Alpar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faizulin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tokmukhamedova</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Daineko</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Applications of Symmetry-Enhanced Physics-Informed Neural Networks in High-Pressure Gas Flow Simulations in Pipelines</article-title>
          .
          <source>Symmetry</source>
          ,
          <volume>16</volume>
          (
          <issue>5</issue>
          ),
          <fpage>538</fpage>
          ,
          <year>2024</year>
          . https://doi.org/10.3390/sym16050538.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Nuralin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daineko</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aljawarneh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsoy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ipalakova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>The real-time hand and object recognition for virtual interaction</article-title>
          .
          <source>PeerJ Computer Science</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Borodkin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nurtas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altaibek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daineko</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Otepov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>Data Pre-processing and Visualization for Machine Learning Models and its Applications in Education</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>