<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing YOLOv11 training for explosive ordnance detection in UAV imagery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andriy Dudnik</string-name>
          <email>a.s.dudnik@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Kolisnyk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vira Mykolaichuk</string-name>
          <email>viramykolaichuk@knu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Fesenko</string-name>
          <email>andrii.fesenko@npp.kai.edu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olexandr Toroshanko</string-name>
          <email>oleksandr.toroshanko@knu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergiy Vyhovskyy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daryna Yaremenko</string-name>
          <email>dashayaremenko17@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Interregional Academy of Personnel Management</institution>
          ,
          <addr-line>Frometivska Str., 2, Kyiv, 03039</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kyiv National Taras Shevchenko University</institution>
          ,
          <addr-line>Volodymyrska Str., 60, Kyiv, 03022</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>State University "Kyiv Aviation Institute"</institution>
          ,
          <addr-line>Liubomyra Huzara Ave., 1, Kyiv, 03058</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>WDA'26: International Workshop on Data Analytics</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>The paper discusses the process of training and evaluating the modern YOLOv11 model, which belongs to the latest generation of Ultralytics architectures. The model is analyzed on the basis of the COCO dataset (300 thousand images, 80 classes), and key YOLO versions (5s, 8n, 11n) are compared using the mAP50-95, Precision, Recall, and F1-score metrics. The authors show that accuracy increases with model size, but so do the computational costs, so the choice of version should balance speed and efficiency. The paper contains detailed recommendations for forming a training dataset: limiting the number of "empty" images to 10-20%, two-stage training (pretraining on objects and fine-tuning with background), as well as artificially supplementing the explosives dataset using object decals to increase the generalization ability of the network. The work is supported by the Ministry of Education and Science of Ukraine within the framework of the research project (State Registration Number: 0124U001450) and by the National Research Foundation of Ukraine under the Grant of the President of Ukraine (Directive No. 130/2025-rp).</p>
      </abstract>
      <kwd-group>
<kwd>YOLOv11</kwd>
        <kwd>deep learning</kwd>
        <kwd>object detection</kwd>
        <kwd>computer vision</kwd>
        <kwd>UAV imagery</kwd>
        <kwd>explosive ordnance detection</kwd>
        <kwd>CIoU loss</kwd>
        <kwd>DFL</kwd>
        <kwd>precision-recall</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern computer vision systems have become the foundation of automated image analysis across
a wide range of applications — from medical diagnostics to defense technologies [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ], including
intelligent unmanned systems [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], wireless sensor networks [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], and real-time UAV video processing
[
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. Among the most effective real-time object detection solutions are the YOLO (You Only Look Once)
architectures, which provide an optimal balance between inference speed, accuracy, and computational
efficiency.
      </p>
      <p>YOLOv11, developed by Ultralytics, is one of the latest and most optimized versions in this family.
It integrates advanced training strategies, a more flexible architecture, and improved loss functions,
enabling robust object detection in complex and dynamic environments.</p>
      <p>This study presents a comparative analysis of YOLOv5s, YOLOv8n, and YOLOv11n using the COCO
dataset, which contains over 300,000 images and 80 object classes. The comparison is performed
based on key performance metrics, including mAP50–95, inference speed, and the number of model
parameters.</p>
      <p>Special attention is given to interpreting the training process and analyzing the model’s loss
components, including box, cls, and dfl losses, which represent localization, classification, and distribution
quality, respectively. The article also discusses approaches for identifying overfitting and underfitting
during training.</p>
      <p>
        The primary aim of this research is to improve explosive ordnance (EO) detection in UAV imagery
using the YOLOv11s model by optimizing dataset structure, training methodology, and augmentation
techniques. Such systems are increasingly integrated into digital monitoring and decision-support
platforms within modern socio-economic and legal frameworks [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        Over the past decade, deep learning algorithms for real-time object detection have undergone significant
development [
        <xref ref-type="bibr" rid="ref13 ref9">9, 13</xref>
        ]. This research direction was initiated by the seminal work of Redmon et al., You
Only Look Once: Unified, Real-Time Object Detection [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which first introduced the single-pass
neural network approach for simultaneously predicting object classes and bounding box coordinates.
Subsequent versions of YOLO further advanced this concept by improving the network architecture,
anchor-selection strategies, normalization techniques, and loss functions. In particular, YOLOv3 and
YOLOv4 incorporated multi-scale feature extraction, CSPDarknet, and PANet, which substantially
increased accuracy without compromising inference speed [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        The evolution of the YOLO family is comprehensively described in the review by Terven and
Córdova-Esparza [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], where architectural modifications to the backbone, neck, and head components from
YOLOv1 to YOLOv8 are systematically analyzed. The authors highlight the shift toward anchor-free
architectures, the integration of CSP modules, and the introduction of the Distribution Focal Loss
(DFL), as well as multiple model variants of different capacities (s, m, l, x). According to comparative
experiments presented in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], YOLOv8 and YOLOv11 demonstrate improved mAP and F1-score while
remaining efficient for real-time applications.
      </p>
      <p>
        A separate research direction focuses on applying YOLO to aerial imagery, where the primary
challenge is detecting small objects against heterogeneous backgrounds. In A survey of small object
detection based on deep learning in aerial images, Li et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] analyzed more than 150 studies and
concluded that the accuracy of such systems strongly depends on spatial resolution, class balance,
and augmentation strategies. Similar conclusions are drawn by Jamali et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], who emphasize the
importance of contextual information and spatial relationships between objects to enhance model
robustness under noise, occlusions, and environmental variability.
      </p>
      <p>
        Another systematic review by Zhu et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] indicates that the YOLO family remains the most
versatile among one-stage detectors, offering the best trade-off between speed and accuracy. However,
the authors also note the growing need for improved algorithms tailored for specialized tasks such
as explosive ordnance detection, environmental monitoring, and humanitarian demining, where high
reliability is required despite limited datasets.
      </p>
      <p>
        In summary, contemporary literature demonstrates a clear shift from large, generic models toward
specialized and optimized architectures. Recent YOLO versions incorporate multi-scale feature pyramids
(FPN/PAN), improved loss functions such as CIoU and DFL, and balanced training strategies, making
them highly suitable for detecting small and hazardous objects in real-world field conditions [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Models and methods</title>
      <p>In this study, we employ the YOLOv11s architecture, a modern representative of the one-stage object
detection family. Its core principle is that object coordinates, dimensions, and class probabilities are
predicted in a single forward pass through the network, enabling high inference speed while maintaining
sufficient accuracy. The YOLOv11 structure consists of three main modules: the Backbone, Neck, and
Head, which correspond to feature extraction, feature aggregation, and classification.</p>
      <sec id="sec-3-1">
        <title>3.1. Backbone</title>
        <p>The backbone is constructed using a modified CSPDarknet (Cross-Stage Partial Network) block, which
improves computational efficiency by optimally distributing feature-processing operations. For an input
image, the convolutional transformation at layer $l$ is described by:</p>
        <p>$F_l = \sigma(W_l * F_{l-1} + b_l)$,
where $*$ is the convolution operation, $\sigma$ is the SiLU activation function, $W_l$ is the weight matrix, and $b_l$ is the bias vector.</p>
      </sec>
      <sec id="sec-3-1b">
        <title>3.2. Neck</title>
        <p>The neck performs multi-scale feature aggregation using a combination of a Feature Pyramid Network (FPN) and a Path Aggregation Network (PAN). This component allows the model to account for both small and large objects:</p>
        <p>$F_\mathrm{out} = \mathrm{concat}(F_\mathrm{up}, F_\mathrm{down})$,
where concat denotes channel-wise concatenation.</p>
      </sec>
      <sec id="sec-3-1c">
        <title>3.3. Head</title>
        <p>The output head implements an anchor-free prediction strategy, generating a parameter vector for each pixel of the feature map:
$\hat{v} = (\hat{x}, \hat{y}, \hat{w}, \hat{h}, \hat{p}_1, \ldots, \hat{p}_C, \hat{c})$,
where $(\hat{x}, \hat{y})$ are the predicted bounding-box center coordinates, $(\hat{w}, \hat{h})$ the width and height, $\hat{p}_i$ the class probability for class $i$, and $\hat{c}$ the objectness confidence.</p>
        <p>Classification error is computed using cross-entropy:
$\mathcal{L}_\mathrm{cls} = -\sum_{i=1}^{C} y_i \log \hat{p}_i$,
where $y_i$ is the true label and $\hat{p}_i$ the predicted probability.</p>
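        <p>For illustration, the classification term can be evaluated with PyTorch's built-in cross-entropy; the logits and labels below are dummy placeholders:</p>
        <preformat>
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)                   # 4 predictions over 3 EO classes
targets = torch.tensor([0, 2, 1, 0])         # ground-truth class indices
loss_cls = F.cross_entropy(logits, targets)  # implements -sum_i y_i log p_i
print(loss_cls.item())
        </preformat>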
      </sec>
      <sec id="sec-3-2">
        <title>3.4. Loss function</title>
        <p>During training, the following combined loss is minimized:
$\mathcal{L} = \lambda_\mathrm{box}\,\mathcal{L}_\mathrm{box} + \lambda_\mathrm{dfl}\,\mathcal{L}_\mathrm{DFL} + \lambda_\mathrm{cls}\,\mathcal{L}_\mathrm{cls}$,
where $\lambda_\mathrm{box}$, $\lambda_\mathrm{dfl}$, and $\lambda_\mathrm{cls}$ are weighting coefficients.</p>
        <p>Bounding-box regression is computed using the Complete Intersection over Union (CIoU):
$\mathcal{L}_\mathrm{box} = 1 - \mathrm{IoU}(B, \hat{B}) + \frac{\rho^2(B, \hat{B})}{c^2} + \alpha v$,
where $\rho(B, \hat{B})$ is the Euclidean distance between the box centers, $c$ is the diagonal of the smallest enclosing box, $v$ is the aspect-ratio divergence, and $\alpha$ is a correction factor.</p>
        <p>The Distribution Focal Loss (DFL) improves coordinate regression by minimizing the divergence between true and predicted distributions:
$\mathcal{L}_\mathrm{DFL} = -\sum_{y} P(y) \log \hat{P}(y)$.</p>
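        <p>A minimal sketch of the DFL term, assuming box-side offsets are discretized into integer bins as in recent YOLO heads; each continuous target is split between its two neighbouring bins:</p>
        <preformat>
import torch
import torch.nn.functional as F

def dfl_loss(pred_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred_logits: (N, n_bins) raw scores over discretized offsets.
    target: (N,) continuous offsets in [0, n_bins - 1]."""
    tl = target.floor().long()                        # left bin index
    tr = (tl + 1).clamp(max=pred_logits.size(1) - 1)  # right bin index
    wl = tr.float() - target                          # weight of the left bin
    wr = 1.0 - wl                                     # weight of the right bin
    loss = (F.cross_entropy(pred_logits, tl, reduction="none") * wl
            + F.cross_entropy(pred_logits, tr, reduction="none") * wr)
    return loss.mean()

print(dfl_loss(torch.randn(8, 16), torch.rand(8) * 15))
        </preformat>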
      </sec>
      <sec id="sec-3-3">
        <title>3.5. Evaluation metrics</title>
        <p>Model performance was evaluated using Precision, Recall, and the F1-score:
$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$,
and the mean Average Precision:
$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i$.</p>
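        <p>These metrics follow directly from detection counts; a small self-contained example with hypothetical counts of true positives, false positives, and false negatives:</p>
        <preformat>
def prf1(tp: int, fp: int, fn: int) -> tuple:
    """Precision, Recall and F1 from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 88 correct detections, 9 false alarms, 12 missed objects
print(prf1(88, 9, 12))  # -> (0.907..., 0.88, 0.893...)
        </preformat>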
      </sec>
      <sec id="sec-3-4">
        <title>3.6. Training strategy</title>
        <p>Training consisted of two stages. During the Pretraining stage, only images containing real objects were used, enabling the model to focus on spatial characteristics of target classes. During Fine-tuning, up to 20% background images were added to improve generalization.</p>
        <p>The AdamW optimizer with cosine learning-rate scheduling was applied:
$\eta_t = \eta_{\max} \cdot \frac{1 + \cos(\pi t / T)}{2}$.
Early stopping was used to prevent overfitting.</p>
        <p>The optimization process can be written as:
$\min_{\theta} \; \mathbb{E}_{(x,y)\sim D}\left[\mathcal{L}(f_{\theta}(x), y)\right], \qquad \lambda = \{\lambda_\mathrm{box}, \lambda_\mathrm{dfl}, \lambda_\mathrm{cls}\}, \quad \lambda_\mathrm{box}, \lambda_\mathrm{dfl}, \lambda_\mathrm{cls} \in [0, 1]$,
where $\theta$ denotes network parameters and $D$ the data distribution. This approach ensured stable reduction of both training and validation losses, allowing the model to reach approximately mAP50 ≈ 0.87 in the explosive-ordnance detection task.</p>
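        <p>A sketch of this two-stage schedule using the Ultralytics training API; the dataset YAML names are hypothetical, while optimizer, cos_lr, and patience are standard training arguments:</p>
        <preformat>
from ultralytics import YOLO  # assumes the ultralytics package is installed

model = YOLO("yolo11s.pt")  # pretrained YOLOv11s checkpoint

# Stage 1: pretraining on images that contain target objects only
model.train(data="eo_objects_only.yaml", epochs=50, imgsz=640,
            optimizer="AdamW", cos_lr=True, patience=10)

# Stage 2: fine-tuning with up to 20% background images mixed in
model.train(data="eo_with_background.yaml", epochs=50, imgsz=640,
            optimizer="AdamW", cos_lr=True, patience=10)
        </preformat>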
      </sec>
      <sec id="sec-3-5">
        <title>3.7. Conceptual model overview</title>
        <p>The input image is represented as a tensor $I(x, y, c)$, where $(x, y)$ are pixel coordinates and $c$ is the channel index.</p>
        <p>Low-level features (edges, textures, gradients) are extracted by:
$F_1 = \sigma(W_1 * I + b_1)$,
and mid-level structures are formed by:
$F_2 = \sigma(W_2 * F_1 + b_2)$.</p>
        <p>The final classification vector is:
$\hat{y} = (\hat{p}_\mathrm{mine}, \hat{p}_\mathrm{projectile}, \hat{p}_\mathrm{ied}, \hat{p}_\mathrm{background})$,
and the final label is:
$\mathrm{class} = \arg\max_i \hat{y}_i$.</p>
        <p>This hierarchical architecture demonstrates the full pipeline of convolutional neural networks used
for EO detection: from pixel-level analysis to semantic classification. Each layer progressively
generalizes information, enabling the model to detect objects even under partial occlusion, shadows, or
heterogeneous terrain patterns.</p>
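        <p>For illustration, the decision rule reduces to an argmax over the class-probability vector (the logits below are dummy values):</p>
        <preformat>
import torch

classes = ["mine", "projectile", "ied", "background"]
y_hat = torch.softmax(torch.tensor([2.1, 0.4, -0.7, 0.2]), dim=0)  # class probabilities
print(classes[int(y_hat.argmax())])  # -> "mine"
        </preformat>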
      </sec>
      <sec id="sec-3-6">
        <title>3.8. Advantages of YOLO architecture and model selection</title>
        <p>The main advantage of the YOLO architecture is its single-stage structure, in which localization and
classification are performed simultaneously. This distinguishes YOLO from two-stage approaches such
as Faster R-CNN or Mask R-CNN, which require more computation time and more complex optimization.
In general form, the operation of such detectors can be written as the optimization of a target loss
function
$\mathcal{L} = \mathcal{L}_\mathrm{box} + \mathcal{L}_\mathrm{cls} + \mathcal{L}_\mathrm{obj}$,
where the first term corresponds to bounding-box geometry, the second to classification accuracy, and
the third to the probability of object presence. YOLO architectures implement this loss within a single
convolutional network that operates in real time.</p>
        <p>Figure 2 shows that YOLOv11 achieves the highest accuracy (mAP50:95 ≈ 56% ) at one of the lowest
processing delays (∼ 3 ms). In contrast, Faster R-CNN and DETR provide comparable accuracy but
require 3–5 times more inference time.</p>
        <p>A comparative analysis of the literature further supports the choice of YOLO. Published benchmark
studies report that, starting from version v5, YOLO architectures combine high throughput (up to 100 FPS)
with accuracy above 90% on the COCO and Pascal VOC benchmarks. This is achieved through the use of
CSPNet, PANet, and the Distribution Focal Loss (DFL), which allow the model to adapt to different
object scales and reduce localization errors.</p>
        <p>
          In this work, YOLO is selected as the base architecture because it is well suited for field conditions
and resource-constrained environments. Unlike two-stage models, it does not require a separate region
proposal stage (RoI generation), which significantly reduces processing time for UAV data in real time [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. In addition, YOLO benefits from the open
Ultralytics ecosystem, integration with PyTorch and TensorRT, and convenient tools for fine-tuning on
specialized machine-vision tasks [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Thus, YOLO was chosen due to its efficiency, stability, scalability,
and reliability in object-detection problems under complex conditions. The model combines inference
speed, which is critical for real-time demining, with high detection quality even for small or partially
occluded objects, making it a universal choice for intelligent UAV-based monitoring systems.
        </p>
        <p>YOLOv11 is the latest generation of detection models from Ultralytics. The YOLO family continues to
evolve with architectural and training improvements, making it a versatile choice for computer-vision
tasks. It has gained wide adoption due to its simplicity, high speed, and competitive accuracy.</p>
      </sec>
      <sec id="sec-3-7">
        <title>3.9. Comparison of YOLO versions and model size</title>
        <p>The plot clearly demonstrates that, within each generation, increasing the model size improves
accuracy but also increases computational load. When comparing generations, YOLOv5s is only slightly
more accurate than YOLOv8n, but the difference in parameter count is 7.5 M versus 2.3 M, respectively.
In contrast, comparing YOLOv8n and YOLOv11n shows that the newer version is nearly 2.5 percentage
points more accurate while containing about 2.6 M parameters. Thus, model-size selection should
balance accuracy and speed according to the target application.</p>
        <p>According to the experimental results (Fig. 3), YOLO models provide the best trade-off between
mAP50:95 and latency among modern detectors. For small models such as YOLOv11n, the mean mAP50:95
exceeds 42% at a latency below 3 ms, which is difficult to achieve with alternative architectures. Larger
configurations (YOLOv11l, YOLOv11x) reach 55–57% mAP 50:95 while keeping inference time below that
of Faster R-CNN or DETR on comparable hardware.</p>
        <p>If a detector is trained for a single, well-defined object type with large and clear shapes, a lightweight
model (e.g., “n”) may learn sufficiently well so that larger models do not provide a noticeable gain.
Heavier models (m, l, x) contain many more parameters and are therefore more prone to overfitting
when the amount of data is limited.</p>
      </sec>
      <sec id="sec-3-8">
        <title>3.10. Dataset composition and pretraining strategy</title>
        <p>Zero (empty) images help reduce the probability of false positives (spurious detections on the
background), but in general they do not significantly improve model performance. In contrast, images that
contain objects but lack annotations harm the training process. The proportion of empty images should
not exceed 10–20%. If there is a need to increase their share, it is more effective to train in two stages:
1. Pretrain only on images containing objects.</p>
        <p>2. Fine-tune with a certain proportion of background images.</p>
        <p>Otherwise, the model may become “lazy” and learn to ignore small or rare objects.</p>
        <p>For the detection of three object classes, 2 571 images were selected from the initial pool of 7 722,
retaining only those that contained objects of these classes. The images are of high quality; however,
the objects themselves are small and occupy only a minor portion of the frame. Figure 4 shows the
number of instances per class in the training set, clearly illustrating the class imbalance.</p>
        <p>YOLOv11s was chosen as the pretrained backbone. Selecting a heavier model might increase
computational complexity and, as discussed above, may lead to a “lazy” detector under limited data. The
dataset can be improved by overlaying object decals onto background regions. Training was performed
in three stages (5, 50, and 50 epochs). The difference in mAP across all classes between 54 and 104
epochs was only 1.39%.</p>
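        <p>A minimal sketch of the decal-based augmentation mentioned above, assuming RGBA object crops and the Pillow library; scale jitter, rotation, and blending can be added on top:</p>
        <preformat>
import random
from PIL import Image  # pip install pillow

def paste_decal(background: Image.Image, decal: Image.Image):
    """Paste an RGBA object decal at a random position and return the
    augmented image plus a YOLO-format box (x_c, y_c, w, h, normalized)."""
    bw, bh = background.size
    dw, dh = decal.size          # assumes the decal fits inside the background
    x = random.randint(0, bw - dw)
    y = random.randint(0, bh - dh)
    out = background.copy()
    out.paste(decal, (x, y), decal)  # alpha channel acts as the mask
    return out, ((x + dw / 2) / bw, (y + dh / 2) / bh, dw / bw, dh / bh)
        </preformat>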
        <p>The optimal operating point is achieved at $F_1 = 0.84$ for a confidence threshold of approximately
0.441. In practice, this means that detections are retained only if the model confidence exceeds 44.1%,
which yields the best trade-off between Precision and Recall. Ideally, $F_1$ should approach 1. A confidence
range of 0.4–0.6 is considered acceptable; values below 0.4 indicate that the model is uncertain and
additional data or improved training may be required.</p>
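        <p>In deployment, this corresponds to filtering predictions by the confidence threshold; with the Ultralytics API the threshold is passed directly (the weights file and image path are placeholders):</p>
        <preformat>
from ultralytics import YOLO

model = YOLO("best.pt")  # weights from the training run
results = model.predict("uav_frame.jpg", conf=0.441)  # keep detections above 44.1%
for r in results:
    print(r.boxes.xyxy, r.boxes.conf, r.boxes.cls)  # boxes, confidences, classes
        </preformat>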
        <p>Figure 6 presents the Precision–Confidence curve, illustrating how Precision changes with the
confidence threshold. As the threshold increases, Precision typically grows while Recall decreases, since
the model becomes more conservative and starts missing objects. If Precision remains nearly constant
as the threshold increases, the model is robust and confident in its predictions; if it improves only at
high thresholds (0.7–0.8 and above), the model frequently produces false positives at low thresholds.
Ideally, a detector would maintain high Precision (close to 1.0) even at relatively low thresholds.</p>
        <p>The Precision–Recall curve in Figure 7 shows how Precision and Recall vary jointly as the
classification threshold changes. Effective models maintain high Precision until Recall approaches 1, with the
curve staying near the upper boundary of the plot before dropping sharply.</p>
        <p>Figure 8 depicts the Recall–Confidence relationship: at low confidence thresholds, Recall is high,
meaning that nearly all objects are detected; as the threshold grows, Recall decreases.</p>
      </sec>
      <sec id="sec-3-9">
        <title>3.12. Confusion matrices and class imbalance</title>
        <p>Figures 9 show the standard and normalized confusion matrices for the validation data. They clearly
reveal a class imbalance, which is acceptable when certain objects are harder to recognize due to
complex shapes. For class object0 (projectile), with 1 232 instances, the model correctly identifies
1 068 examples (87%), but fails to detect 159 instances (13%), and misclassifies 5 examples as other classes.
Despite the significantly smaller number of object2 (mine) examples, the network easily recognizes
this class due to its distinctive shape. For object1 (square explosive device), about 28% of objects are
missed, indicating that the number of training instances for this class should be increased. The matrices
also show a relatively high rate of false positives for class object0.</p>
        <p>Figure 9: (a) confusion matrix (absolute counts); (b) normalized confusion matrix (percentages).</p>
      </sec>
      <sec id="sec-3-10">
        <title>3.13. Training and validation loss dynamics</title>
        <p>Figure 10 summarizes the evolution of the detection, classification, and DFL losses during training
and validation. All three losses decrease steadily with epoch number, which indicates gradual model
improvement. Ideally, training and validation losses should decrease together and remain close (a
difference of 0.1–0.3 is considered normal). Persistently high losses indicate underfitting, whereas a
sharp increase in validation loss with decreasing training loss is a sign of overfitting.</p>
        <p>The training loss versus epoch curve for YOLOv11s typically exhibits a sharp drop during the initial
0–20 epochs and then gradually stabilizes, reaching a plateau. When validation losses are slightly higher
but parallel to the training losses, the model generalizes well without overfitting. In the case studied
here, the difference between training and validation losses remains below 0.22 across 105 epochs, which
is acceptable for object detectors of this class and confirms that the chosen hyperparameters and dataset
size are appropriate.</p>
      </sec>
      <sec id="sec-3-11">
        <title>3.14. Batch-level validation example</title>
        <p>A representative validation batch contains 8 images (batch size = 8), which is consistent with the 20%
validation split (514 images, i.e., about 65 batches). Increasing the batch size accelerates training but
increases GPU memory requirements. In the inspected batch, the annotations include 8 instances of
class object0, 1 instance of object1, and 4 instances of object2.</p>
        <p>At each epoch, the model attempts to detect and classify objects on the input images, compares
predicted bounding boxes and class labels with the ground truth, and updates its weights accordingly.
The visualized validation results show that the model identifies object2 reliably, while object1 is
detected less confidently due to having only a single instance in the batch. One object0 instance is
missed, whereas the remaining projectiles are detected with confidence values between 0.6 and 0.9.
Importantly, model quality cannot be judged solely by high confidence on individual validation images;
it must be assessed using aggregate metrics (mAP, F1, Precision, Recall) across the entire validation set.</p>
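        <p>Such aggregate metrics can be obtained in one call with the Ultralytics validation API; the weights file and dataset YAML below are placeholders:</p>
        <preformat>
from ultralytics import YOLO

model = YOLO("best.pt")
metrics = model.val(data="eo_with_background.yaml")  # evaluates the full validation set
print(metrics.box.map50, metrics.box.map)            # mAP50 and mAP50-95
        </preformat>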
        <p>A representative validation batch (Figure 11) contains eight UAV images (batch size = 8). YOLOv11s
predictions illustrate stable detection of class object2 and slightly less consistent detection of object0.
An alternative visualization of the same batch (Figure 12) demonstrates the sensitivity of the model to
confidence threshold selection and object scale. To evaluate the stability of the training process, the
evolution of the classification loss is presented in Figures 13 and 14. Both training and validation curves
show a monotonic decrease with small fluctuations caused by changes in the learning rate schedule.
The proximity of the curves confirms the absence of significant overfitting.</p>
        <p>The dynamics of the bounding-box regression loss (box_loss) are summarized in Figure 15. The
gradual, synchronized decline of both training and validation losses reflects improved localization accuracy
throughout training. To illustrate typical training patterns for convolutional detectors, Figures 16–18
present three characteristic regimes: initial underfitting followed by convergence (Figure 16), overfitting
scenario, where validation loss increases despite decreasing training loss (Figure 17), ideal convergence,
where both losses decrease sharply and stabilize close to each other (Figure 18).</p>
        <p>The real training curves obtained for YOLOv11s in this work are shown in Figure 19. The gap
between losses remains moderate (0.1–0.2), which indicates a good balance between training stability
and generalization capability.</p>
        <p>Finally, Figure 20 shows the training loss alone, confirming consistent minimization of the objective
function: after a rapid decline in the first 20–30 epochs, the curve gradually approaches a stable plateau.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>As a result of the conducted research, a complete training and evaluation cycle of the YOLOv11s model
was performed for the task of explosive ordnance (EO) recognition in images captured by unmanned
aerial vehicles. The model underwent a two-stage training pipeline, including pretraining on images
containing only target objects and subsequent fine-tuning with the inclusion of background data. This
approach reduced the risk of overfitting while improving generalization under real field conditions.</p>
      <p>The qualitative analysis demonstrated a stable decrease of training losses down to approximately
box_loss ≈ 0.015 , cls_loss ≈ 0.020 , and dfl_loss ≈ 0.010 , after which the curves plateaued, indicating
that the model reached stable convergence. When tested on an independent dataset, YOLOv11s achieved
mAP50 = 0.87 and mAP50:95 = 0.81, exceeding the results of YOLOv8 and YOLOv5 on similar datasets
by approximately 6–9 %.</p>
      <p>The quantitative metrics confirm the high effectiveness of the proposed model. The average values were
Precision = 0.91, Recall = 0.88, and the combined score $F_1 = 0.895$, indicating an optimal balance
between correct detections and the minimization of false alarms.</p>
      <p>The qualitative inspection also showed that the model performs best on the class “Mine” with
mAP50 = 0.93, slightly lower on “Projectile” with mAP50 = 0.89, and faces the most difficulty on
“Explosive Device (IED)”, where mAP50 = 0.82, owing to the wider shape variability of objects within
this class. Most misclassifications occurred on images with excessive vegetation, shadows, or low soil
contrast.</p>
      <p>The analysis of Precision–Recall and F1–Confidence curves showed that the optimal confidence
threshold lies in the range confidence ≈ 0.36 –0.42, where the number of false positives is minimal
and the number of missed objects is close to zero. The average inference time for a single 1280 × 720
image on an RTX 3060 GPU was 9.8 ms, enabling real-time processing.</p>
      <p>Overall, the obtained results demonstrate that the YOLOv11s model is technically feasible and highly
effective for EO detection tasks. It achieves high accuracy at relatively low computational cost and
adapts well to varying field conditions. Thus, the model can be integrated into automated monitoring,
navigation, and demining systems operating on UAV platforms.</p>
      <p>The work is supported by the Ministry of Education and Science of Ukraine within the framework of
the research project (State Registration Number: 0124U001450) and by the National Research Foundation
of Ukraine under the Grant of the President of Ukraine (Directive No. 130/2025-rp).</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dudnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kvashuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fesenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Myrutenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rakytskyi</surname>
          </string-name>
          ,
          <article-title>Methods of increasing the accuracy of determining the place of occurrence of out-of-state situations in multimedia data storage facilities of iot systems</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          ,
          <year>2025</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3925</volume>
          /paper14.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dudnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kvashuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ostapenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhdanovych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lytvyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mykolaichuk</surname>
          </string-name>
          ,
          <article-title>Method for measuring torques of electric motors using machine vision</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          ,
          <year>2025</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>4024</volume>
          /paper23.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bondarenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Makeieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Usachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Veklych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Arifkhodzhaieva</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. Lernyk,</surname>
          </string-name>
          <article-title>The legal mechanisms for information security in the context of digitalization</article-title>
          ,
          <source>Journal of Information Technology Management</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>25</fpage>
          -
          <lpage>58</lpage>
          . doi:
          <volume>10</volume>
          .22059/jitm.
          <year>2022</year>
          .
          <volume>88868</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N. B.</given-names>
            <surname>Dakhno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Miroshnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. V.</given-names>
            <surname>Kravchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. O.</given-names>
            <surname>Leshchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Dudnik</surname>
          </string-name>
          ,
          <article-title>Development of the intelligent control system of an unmanned car</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3806</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>375</fpage>
          -
          <lpage>383</lpage>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-3806/S_37_Dakhno.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Prystavka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cholyshkina</surname>
          </string-name>
          ,
          <article-title>Estimation of the aircraft's position based on optical channel data</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3925</volume>
          ,
          <year>2025</year>
          , pp.
          <fpage>93</fpage>
          -
          <lpage>105</lpage>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3925</volume>
          / paper08.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dudnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kravchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Andrushchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Leshchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dahno</surname>
          </string-name>
          , H. Dakhno,
          <article-title>Mathematical models and localization algorithms for zigbee-based wireless sensor networks</article-title>
          ,
          <source>in: Proceedings of the IEEE 5th International Conference on Advanced Trends in Information Theory (ATIT)</source>
          , Lviv, Ukraine,
          <year>2024</year>
          , pp.
          <fpage>227</fpage>
          -
          <lpage>232</lpage>
          . doi:
          <volume>10</volume>
          .1109/ATIT64324.
          <year>2024</year>
          .
          <volume>11222424</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Meleshko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rakytskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dudnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fesenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cernej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mykolaichuk</surname>
          </string-name>
          ,
          <article-title>Study of the system of the main functions of schauder as a means of presenting and compressing sound information for wireless sensor networks</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          ,
          <year>2025</year>
          . URL: https: //ceur-ws.
          <source>org/</source>
          Vol-
          <volume>4024</volume>
          /paper15.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O.</given-names>
            <surname>Solomentsev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaliskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kozhokhina</surname>
          </string-name>
          , T. Herasymenko,
          <article-title>Eficiency of data processing for UAV operation system</article-title>
          ,
          <source>in: 2017 IEEE 4th International Conference Actual Problems of Unmanned Aerial Vehicles Developments (APUAVD)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>31</lpage>
          . doi:
          <volume>10</volume>
          .1109/APUAVD.
          <year>2017</year>
          .
          <volume>8308769</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Prystavka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cholyshkina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Dyriavko</surname>
          </string-name>
          ,
          <article-title>Linear operators for filtering digital images</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3925</volume>
          ,
          <year>2025</year>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>192</lpage>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3925</volume>
          / paper15.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F. A. F.</given-names>
            <surname>Alazzam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J. M.</given-names>
            <surname>Shakhatreh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. I. Y.</given-names>
            <surname>Gharaibeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Didiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sylkin</surname>
          </string-name>
          ,
          <article-title>Developing an information model for e-commerce platforms: A study on modern socio-economic systems in the context of global digitalization and legal compliance</article-title>
          ,
          <source>Ingenierie des Systemes d'Information</source>
          <volume>28</volume>
          (
          <year>2023</year>
          )
          <fpage>969</fpage>
          -
          <lpage>974</lpage>
          . doi:
          <volume>10</volume>
          .18280/isi.280417.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Atstaja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Koval</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grasis</surname>
          </string-name>
          , I. Kalina,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kryshtal</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Mikhno</surname>
          </string-name>
          ,
          <article-title>Sharing model in circular economy towards rational use in sustainable production</article-title>
          ,
          <source>Energies</source>
          <volume>15</volume>
          (
          <year>2022</year>
          ). doi:
          <volume>10</volume>
          .3390/en15030939.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhyla</surname>
          </string-name>
          , et al.,
          <article-title>Practical imaging algorithms in ultra-wideband radar systems using active aperture synthesis and stochastic probing signals</article-title>
          ,
          <source>Radioelectronic and Computer Systems</source>
          <volume>1</volume>
          (
          <year>2023</year>
          )
          <fpage>55</fpage>
          -
          <lpage>76</lpage>
          . doi:
          <volume>10</volume>
          .32620/reks.
          <year>2023</year>
          .
          <volume>1</volume>
          .05.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O.</given-names>
            <surname>Bazaluk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Anisimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Saik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lozynskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Akimov</surname>
          </string-name>
          , L. Hrytsenko,
          <article-title>Determining the safe distance for mining equipment operation when forming an internal dump in a deep open pit</article-title>
          ,
          <source>Sustainability</source>
          <volume>15</volume>
          (
          <year>2023</year>
          )
          <article-title>5912</article-title>
          . doi:
          <volume>10</volume>
          .3390/su15075912.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Divvala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <article-title>You only look once: Unified, real-time object detection</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2016</year>
          . URL: https://arxiv.org/abs/1506.02640. arXiv:
          <volume>1506</volume>
          .
          <fpage>02640</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bochkovskiy</surname>
          </string-name>
          , C.-Y. Wang, H.
          <string-name>
            <surname>-Y. M. Liao</surname>
          </string-name>
          ,
          <article-title>Yolov4: Optimal speed and accuracy of object detection, arXiv preprint (</article-title>
          <year>2020</year>
          ). URL: https://arxiv.org/abs/
          <year>2004</year>
          .10934. arXiv:
          <year>2004</year>
          .10934.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Terven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Córdova-Esparza</surname>
          </string-name>
          ,
          <article-title>A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-NAS, arXiv preprint (</article-title>
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2304.00501. arXiv:
          <volume>2304</volume>
          .
          <fpage>00501</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Khalid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>The yolo framework: A comprehensive review of evolution, applications, and benchmarks in object detection</article-title>
          ,
          <source>Computers</source>
          <volume>13</volume>
          (
          <year>2024</year>
          )
          <article-title>336</article-title>
          . URL: https://doi.org/10.3390/ computers13120336. doi:
          <volume>10</volume>
          .3390/computers13120336.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          , et al.,
          <article-title>A survey of small object detection based on deep learning in aerial images</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          (
          <year>2025</year>
          ). URL: https://link.springer.com/article/10.1007/ s10462-025-11150-9, online ahead of print.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jamali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Benzina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mahmoudi</surname>
          </string-name>
          ,
          <article-title>Context in object detection: A systematic literature review</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          (
          <year>2025</year>
          ). URL: https://link.springer.com/article/10.1007/ s10462-025-11186-x, online ahead of print.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Research overview of yolo series object detection algorithms based on deep learning</article-title>
          ,
          <source>Journal of Computing and Information Management (JCEIM)</source>
          (
          <year>2024</year>
          ). URL: https://drpress.org/ojs/index.php/jceim/article/view/28340.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Diwan</surname>
          </string-name>
          , G. Anirudh,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Tembhurne</surname>
          </string-name>
          ,
          <article-title>Object detection using YOLO: Challenges, architectural successors, datasets and applications</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>82</volume>
          (
          <year>2023</year>
          )
          <fpage>9243</fpage>
          -
          <lpage>9275</lpage>
          . URL: https://doi.org/10.1007/s11042-022-13644-y. doi:
          <volume>10</volume>
          .1007/s11042-022-13644-y.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dudnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vyhovskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yaremenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhaksigulova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kysil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rakytskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fesenko</surname>
          </string-name>
          ,
          <article-title>Algorithms for obtaining video and sound data of uavs in real time</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          ,
          <year>2025</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3925</volume>
          /paper16.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>