<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Neural Network Methods for Software of Military Object Detection in UAV Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksii Bychkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kateryna Merkulova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yelyzaveta Zhabska</string-name>
          <email>y.zhabska@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Ivanenko</string-name>
          <email>super-ivan-ivanenko@knu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Volodymyrska str. 64/13, Kyiv, 01601</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <fpage>159</fpage>
      <lpage>170</lpage>
      <abstract>
        <p>This paper is dedicated to the study and comparison of methods for detecting and classifying military objects in video streams obtained from unmanned aerial vehicles. The main objective is to identify the most effective approach based on predefined quality assessment criteria, particularly for identifying military objects, for further use in software development. Three detection and classification methods, namely Faster R-CNN, SSD, and YOLO, were used in the study. Three quality criteria for detection and classification methods were developed and described. An algorithm was proposed to determine the required number of images for calculating the metrics with a predefined error of 10^(-5); in the context of the task at hand, 250 images are sufficient. For each detection and classification method, the corresponding metrics were calculated with the given error, followed by a comparative analysis of the methods based on the three metrics. During the comparative analysis, none of the methods demonstrated the highest results across all three quality criteria. Therefore, priorities were assigned to each metric based on the specific nature of the task. After analyzing the quality metrics of each detection and classification method and considering the chosen priorities, it was concluded that the most effective approach for identifying military objects in UAV video streams is the method based on the YOLO model.</p>
      </abstract>
      <kwd-group>
        <kwd>UAV</kwd>
        <kwd>image processing</kwd>
        <kwd>object detection</kwd>
        <kwd>pattern matching</kwd>
        <kwd>image classification</kwd>
        <kwd>neural networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent years, a significant number of software solutions have been developed for processing video
streams from unmanned aerial vehicles (UAVs), primarily for object detection and analysis purposes.
These systems are widely employed across various domains, from agriculture to the defense sector.</p>
      <p>However, due to considerable differences in task specificity, object types, and imaging conditions,
there is an increasing demand for specialized software focused on the detection, identification, and
classification of military-related objects.</p>
      <p>This need has become particularly relevant in the context of ongoing armed conflict, where UAVs
play a critical role in modern warfare, supporting aerial reconnaissance, situational monitoring,
and target acquisition. Accordingly, the automated, accurate, and real-time detection of potentially
dangerous or suspicious objects in UAV video streams is a key capability for improving the efficiency
of military operations, reducing the cognitive load on operators, and enhancing overall situational
awareness.</p>
      <p>
        The automation of detecting and classifying military objects in UAV video streams during
wartime is one of the key factors for ensuring national security and effective control. Building a
system for processing video streams from UAVs and creating an automated target recognition system
is therefore an extremely important task. Currently, the most complex information processing system
is the UAV operator's brain. However, the constant need for heightened attention, significant eye
strain, and night work impose a heavy burden on the human operator [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>This topic is highly relevant in wartime conditions, as drones play a key role in modern combat
operations. Therefore, fast and accurate detection and classification of military objects in the video
stream from UAVs is a crucial task that addresses the urgent security and defense needs of our
nation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Analysis of Related Solutions and Problem Definition</title>
      <p>
        Currently, there are a number of software products for processing video streams from UAVs,
designed for object detection. Programs like Pix4Dmapper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], DroneDeploy [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and AgroScout [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
which were thoroughly reviewed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], are actively used in various sectors, ranging from agriculture
to military industries. Analyzing these software solutions reveals that they are comprehensive tools
focused on specific domains, namely:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>Pix4Dmapper focuses on surface analysis.</p>
        </list-item>
        <list-item>
          <p>DroneDeploy specializes in creating maps and 3D models.</p>
        </list-item>
        <list-item>
          <p>AgroScout is used for plant disease diagnostics and crop monitoring.</p>
        </list-item>
      </list>
      <p>Clearly, the development of software with similar functionalities would not be a novel and unique
solution. Therefore, the software being developed will target a different domain: the detection and
classification of various types of objects related to military applications. This issue is especially
relevant under wartime conditions, as drones have become an essential tool in modern warfare. Thus,
the rapid identification of suspicious objects in the video stream from UAVs is a critical task in the
current context.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Research Methods</title>
      <p>The task of object detection and classification in a video stream is not a trivial one. Therefore, it is
evident that there is currently no exact analytical solution. However, there are already a variety of
methods and algorithms that show good results in this area under certain conditions. Identifying the
required military objects in the UAV video stream can be a challenging task, and choosing the best
detection and classification algorithm is crucial for achieving high-accuracy recognition results.</p>
      <sec id="sec-3-1">
        <title>3.1. Methods based on neural networks</title>
        <p>
          Particularly noteworthy today are the methods based on neural networks, as in recent years they
have shown incredible results in various fields of human activity, including tasks related to object
detection and classification in video streams. For this reason, it was decided to use one of these
methods for the future application. Based on personal experience and open sources from the internet,
three of the most popular and widely used methods for object detection and classification based on
neural networks were chosen: Faster R-CNN [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], SSD [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and YOLO [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1. Faster R-CNN</title>
        <p>In the first version of R-CNN, there were three distinct stages. First, 2000 region proposals were
generated using selective search. Then, these regions were resized to a fixed size. In the final stage,
a support vector machine (SVM) with trained weights was applied for classification.</p>
        <p>R-CNN turned out to be underpowered due to its use of selective search, which significantly
slowed down the process. Additionally, a large amount of cached data had to be stored for the trained
network.</p>
        <p>Many of these problems were addressed with the release of Fast R-CNN. The entire architecture
now consists of a single module, greatly simplifying the training process. One of the key innovations
in the developed model was the addition of the ROI (Region of Interest) Pooling layer, designed to
produce feature vectors of fixed length. This layer transforms each proposed region into a grid, after
which a max-pooling operation is applied to each cell. It is worth noting that Fast R-CNN still uses
the selective search algorithm.</p>
        <p>In Faster R-CNN, the Region Proposal Network (RPN) was introduced to generate candidate
regions, while Fast R-CNN was used for object detection within these regions. These two stages were
combined into a single network by sharing features. The RPN takes an image as input and returns a
set of coordinates of rectangular regions (candidates for classification), along with probability
scores indicating the likelihood of an object being present in those regions. The RPN is a fully
convolutional network, meaning it does not contain any fully connected layers. This conceptual
solution replaced the selective search algorithm used in Fast R-CNN.</p>
      </sec>
      <sec id="sec-3-2-2">
        <title>3.1.2. YOLO</title>
        <p>The R-CNN family of algorithms uses region proposals, which provide good accuracy but can be
very slow in certain domains. Another family of algorithms, which has been developing in parallel
for object detection, does not use region proposals.</p>
        <p>YOLO (You Only Look Once) is a single-stage object detector that achieves both speed and
accuracy. This neural network is designed for object detection and is distinguished by its ability to
quickly and accurately identify objects in images and videos. The model can process data in real
time. This is achieved because it does not localize objects at multiple levels of the image, as is
typically done in other object detection architectures.</p>
        <p>The main difference between this architecture and others is that while some systems apply a CNN
multiple times to different fragments of the image, YOLO applies the CNN once to the entire image.</p>
      </sec>
      <sec id="sec-3-2-3">
        <title>3.1.3. SSD</title>
        <p>The SSD (Single Shot MultiBox Detector) model utilizes the idea of a pyramidal hierarchy of network
outputs for identifying objects at different scales. The image passes sequentially through
convolutional layers, which reduce its dimensions. The output signal from the last layer of each size
is used to make decisions regarding object detection, forming what is known as the "pyramidal
feature" of the image. This allows for object identification at different scales, as the dimensionality
of the outputs from the early layers strongly correlates with bounding boxes for small objects, while
the outputs from the later layers correlate with bounding boxes for larger objects.</p>
        <p>Unlike YOLO, this model does not divide the image into a grid of a fixed size. Instead, it predicts
the shifts of key bounding boxes. The boxes at different levels are scaled in such a way that each
output layer dimension is responsible for objects of its scale. This means that large objects can only
be detected at higher levels, while small objects are detected at lower levels.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.2. Research Methodology</title>
        <p>Thus, the detection and classification methods that will be investigated within this article have been
briefly reviewed. Now, let's move on to describing the overall research methodology, based on the
results of which it will be determined which method performs best for our specific task, namely
the detection and classification of the specified types of military objects in video streams from UAVs.</p>
        <p>First and foremost, it is essential to define what is meant by the term "quality criterion" within
the context of this article. To put it simply, a quality criterion (for a method, technology, solution, or
algorithm) is a characteristic or property of the method that can be unambiguously interpreted in a
numerical form.</p>
        <p>At this stage, it is necessary to determine the parameters to focus on when selecting quality
criteria. During the research process, these criteria will be applied to methods for detecting and
classifying military objects in video streams from UAVs. Based on personal experience and
information from open sources, the most relevant quality indicators for such methods are as follows:</p>
        <list list-type="order">
          <list-item>
            <p>Ratio of correctly identified objects to the total number of objects: this criterion allows assessing how effectively the model identifies objects in the video stream.</p>
          </list-item>
          <list-item>
            <p>Intersection over Union (IoU): this metric measures the accuracy with which the model determines the location of objects in the video stream.</p>
          </list-item>
          <list-item>
            <p>Average object localization time: this criterion reflects the processing and localization speed of objects by the investigated method.</p>
          </list-item>
        </list>
        <p>Thus, all three criteria cover the main characteristics of an object detection and classification
method, namely: the ability to identify objects, localization accuracy, and recognition speed. It is also
important to note that each of these metrics will be calculated not for a single image but for a sample
of size N, meaning the average value will be used. Below is a description of each of the proposed
quality criteria.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.2.1. The ratio of correctly identified objects to the total number of objects</title>
        <p>The full name of the first quality criterion essentially describes its content. For convenience, to avoid
repeating the full name of this metric, it is denoted by the symbol R. To calculate the quality
criterion R for a detection and classification method, the following formula is used:</p>
        <p>R = (1/N) · Σ_{i=1..N} m_i / k_i,   (1)</p>
        <p>where N is the total number of images, m_i is the number of objects that the method correctly
identified in the i-th image, and k_i is the actual number of objects in the i-th image.</p>
        <p>
          Before applying the formula above, it is necessary to determine in which cases the method is
considered to have correctly detected and classified an object in an image [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. To do this, it is
advisable to use the Intersection over Union (IoU) quality evaluation metric, which will be discussed
in more detail later. The value of the IoU metric can range from 0 to 1, with values greater than 0.5
generally considered to indicate a good prediction of the object detector, while values below this
threshold indicate ineffective prediction [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ]. Therefore, when counting the correctly identified
objects, the IoU value is calculated for each object: if IoU &gt; 0.5, the object is included in the count,
otherwise it is ignored. The number of correctly identified objects m_i in the i-th image is calculated
using the formula:
        </p>
        <p>m_i = Σ_{j=1..k_i} [IoU_j &gt; 0.5],   (2)</p>
        <p>where k_i is the actual number of objects in the i-th image, IoU_j is the IoU metric value calculated for
the j-th object in the i-th image, and the bracketed term equals 1 when the condition holds and 0
otherwise. Therefore, when calculating the quality criterion R, formula (2) is substituted into
formula (1).</p>
      </sec>
      <sec id="sec-3-4-2">
        <title>3.2.2. Intersection over Union</title>
        <p>
          Intersection over Union is an evaluation metric used to measure the accuracy of an object detector
(in our case, for various military objects) on a specific dataset. Any algorithm that provides predicted
bounding boxes for objects in an image can be evaluated using the Intersection over Union metric
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The IoU metric is calculated using the formula:
        </p>
        <p>IoU = Area of Overlap / Area of Union,   (3)</p>
        <p>where Area of Overlap is the area of the intersection between the predicted and actual bounding
boxes, and Area of Union is the area of the union between the predicted and actual bounding boxes.</p>
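        <p>For illustration, formula (3) can be implemented directly for axis-aligned bounding boxes. The sketch below is not the code used in the study; the (x1, y1, x2, y2) box format is an assumption made for the example.</p>
        <preformat>
```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap area is zero when the boxes do not intersect.
    overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes: 1.0
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes: 0.0
```
        </preformat>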
        <p>
          Considering formula (3), it can be noted that the possible values of the IoU metric range from 0 to
1, including these extreme values. It is generally considered that IoU &gt; 0.5 indicates a good prediction
of the object detector [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Averaged over the whole sample, the IoU quality criterion is calculated as:
        </p>
        <p>IoU = (1/N) · Σ_{i=1..N} IoU_i,   (4)</p>
        <p>where IoU_i is the value of the IoU quality metric calculated for the i-th image, and N is the total
number of images in the sample.</p>
        <p>
          Since one image may contain more than one object, the averaged value should also be calculated
for each individual image [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The calculation of IoU_i for the i-th image is performed using the
formula:
        </p>
        <p>IoU_i = (1/m_i) · Σ_{j=1..m_i} IoU_j,   (5)</p>
        <p>where IoU_j is the value of the IoU quality metric calculated for the j-th object on the i-th image, and
m_i is the number of correctly localized objects on the i-th image.</p>
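        <p>To make the interplay of formulas (1), (2), (4), and (5) concrete, the sketch below computes R and the sample-averaged IoU. It assumes, purely for illustration, that detection results are already reduced to per-image lists of IoU values (one value per ground-truth object, 0.0 for missed objects); this data layout is an assumption of the example, not part of the original method description.</p>
        <preformat>
```python
def r_and_mean_iou(per_image_ious, threshold=0.5):
    """per_image_ious: one list per image holding the IoU value of each of the
    k_i ground-truth objects (0.0 for missed objects).
    Returns (R, sample-averaged IoU)."""
    n = len(per_image_ious)
    r_sum = 0.0
    iou_sum = 0.0
    for ious in per_image_ious:
        k_i = len(ious)
        # Formula (2): m_i counts objects whose IoU exceeds the 0.5 threshold.
        matched = [v for v in ious if v > threshold]
        m_i = len(matched)
        r_sum += m_i / k_i                  # one term of formula (1)
        if m_i:
            iou_sum += sum(matched) / m_i   # formula (5), summed for formula (4)
    return r_sum / n, iou_sum / n

# Two images: both objects found in the first, one of two in the second.
r, mean_iou = r_and_mean_iou([[0.9, 0.7], [0.8, 0.2]])
print(round(r, 4), round(mean_iou, 4))  # 0.75 0.8
```
        </preformat>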
      </sec>
      <sec id="sec-3-5">
        <title>3.2.3. Average object localization time</title>
        <p>
          This quality evaluation criterion is designed to demonstrate the speed of the method under
investigation. In short, this criterion refers to the time required to identify a single object in an image
using a specific identification method [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. For ease of notation, the symbol T will be used to
represent this quality criterion. The following formula is used to calculate the average value of
criterion T:
        </p>
        <p>T = (1/N) · Σ_{i=1..N} T_i,   (6)</p>
        <p>
          where T_i is the average time to identify an object on the i-th image, and N is the total number of
images in the sample. To briefly describe formula (6): the metric T_i is calculated separately for each
image, all the obtained values are summed, and the result is divided by the total number of images [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>The calculation of the quality evaluation metric T_i for the i-th image is done using the formula:</p>
        <p>T_i = (b_i - a_i) / m_i,   (7)</p>
        <p>
          where b_i is the time at which the object identification process ends on the image, a_i is the time at
which the object identification process starts on the image, and m_i is the number of correctly
identified objects on the i-th image [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
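        <p>Formulas (6) and (7) can be sketched as a timing harness. The function below is illustrative only; the detect argument is a hypothetical placeholder for whichever detection and classification method is being evaluated.</p>
        <preformat>
```python
import time

def average_localization_time(images, detect):
    """Mean per-object identification time over a sample (formulas (6) and (7)).
    `detect` is a placeholder: it must return the objects found in one image."""
    t_sum = 0.0
    counted = 0
    for image in images:
        a_i = time.perf_counter()   # identification starts
        objects = detect(image)
        b_i = time.perf_counter()   # identification ends
        m_i = len(objects)
        if m_i:
            t_sum += (b_i - a_i) / m_i   # formula (7): T_i
            counted += 1
    return t_sum / counted if counted else 0.0   # formula (6): average T_i

# Toy usage with a fake detector that always "finds" two objects.
t = average_localization_time([object()] * 5, lambda img: ["tank", "truck"])
print(t >= 0)  # True
```
        </preformat>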
      </sec>
      <sec id="sec-3-6">
        <title>3.3. Error Estimation</title>
        <p>
          To perform a comparative analysis of the described methods based on the proposed quality criteria,
it is necessary to compute them for a sample of images of size N [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. This is done to ensure that the
obtained results are as objective as possible.
        </p>
        <p>The required number of images N for calculating the quality criterion R with a given error is
determined using the formula:</p>
        <p>ε = |f(n + step) - f(n)|,   (8)</p>
        <p>where ε is the given error for computing f, f(n) is the metric value for a specific object identifier
using a sample of n images, n is the current number of images, and step is a fixed increment that
increases n at each iteration.</p>
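        <p>The procedure implied by formula (8) can be sketched as a loop that grows the sample by a fixed step until the metric stabilizes. Here, metric stands in for any of the criteria R, IoU, or T computed on a sample of n images; the toy metric in the usage example is an illustration, not data from the study.</p>
        <preformat>
```python
def required_sample_size(metric, step, eps, n_max=1000000):
    """Grow the sample size n by `step` until |f(n + step) - f(n)| falls
    below the allowed error eps (formula (8))."""
    n = step
    prev = metric(n)
    while n_max >= n + step:
        n += step
        cur = metric(n)
        if eps > abs(cur - prev):   # formula (8) satisfied
            return n
        prev = cur
    return n_max

# Toy metric that stabilizes as the sample grows.
print(required_sample_size(lambda n: 1.0 / n, step=250, eps=1e-5))  # 5250
```
        </preformat>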
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Research Results</title>
      <p>For further research, three models were used: Faster R-CNN, SSD, and YOLO. The first two models
were implemented using the TensorFlow framework, a popular open-source machine learning
framework developed by Google. It is used for creating, training, and deploying various machine
learning and deep learning models. Meanwhile, the YOLO model was implemented using the
PyTorch framework, developed by Facebook and widely used by researchers and engineers
worldwide. During training, monitoring was conducted to assess the effectiveness of the model
training process. This control involved analyzing loss functions such as Classification loss and
Localization loss.</p>
      <p>
        Classification loss is a loss function applied during the training of a neural network for
classification tasks. It measures the difference between predicted and actual object classes, helping
to adjust the model during training [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        Localization loss is a loss function used to train a neural network to determine the location of
objects in an image. It evaluates the error between predicted and actual object coordinates, improving
localization accuracy during the model's training process [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>Below, Figures 1 and 2 show the graphs of the above-mentioned loss functions for each of the
trained models.</p>
      <p>In Figure 3, the corresponding graphs reflecting the training process of the model are shown. The
graphs have a slightly different style because a different framework was used for the implementation
of this model.</p>
      <p>The provided graphs for the three models reflect the overall trend of their training. To evaluate
and compare the obtained quantitative results, Table 1 is presented, which shows the loss function
values for each model after the completion of the training process.</p>
      <p>The smaller the loss function value, the more effectively the model performs the given task. In
other words, a lower loss function value indicates that the model's predictions are closer to the
expected results. However, this statement should not be taken too literally. In general, if the loss
function value is between 0 and 1, it is considered a good result. On the other hand, if the loss function
values are almost zero, it could indicate overfitting. In this case, the model will perform perfectly on
the training dataset but will perform poorly on new data, as it has adapted too much to the training
set. Therefore, the table above does not reflect the real situation, as it shows data related to the
training process.</p>
      <p>Thus, it is now necessary to check the performance of the implemented methods in practice. This
can be done using the proposed quality criteria, which were discussed earlier. At this point, all the
necessary formulas to calculate the quality criteria are in place, meaning the theoretical base is ready.</p>
      <p>Next, the number of images required to calculate the quality criteria needs to be determined.
There is a lower limit on the sample size that ensures sufficient objectivity of the results obtained
with that number of images. In other words, a sample that is too small would lead to results that lack
objectivity. To quantify the necessary objectivity of the calculations, the allowable error value ε is
used. The question now is how many images N are required to calculate a quality criterion f with a
specified error ε. This issue was discussed above, where the maximum value of ε corresponding to
sufficient objectivity was taken to be 10^(-5). In Figure 4, graphs are presented that illustrate the
dependence of the average IoU value for each method on the number of images N required to
calculate this indicator.</p>
      <p>The qualitative comparison of the methods on a specific metric involves determining which
method has the best value for this metric (whether it is the highest or the lowest). After the
qualitative comparison, a quantitative comparison is conducted to determine how much one detection
and classification method outperforms another based on the chosen metric. The quantitative
comparison was carried out using the P indicator:</p>
      <p>P = ((A - B) / B) · 100%,   (9)</p>
      <p>where P is the number that shows by what percentage A exceeds B, A is the metric value for the first
detection and classification method, and B is the metric value for the second detection and
classification method. After substituting the corresponding values into formula (9), results (10)
and (11) are obtained; they are discussed below.</p>
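      <p>Formula (9) reduces to a one-line computation. A minimal sketch follows; the values in the usage example are illustrative and are not the metric values obtained in the study.</p>
      <preformat>
```python
def p_indicator(a, b):
    """Formula (9): by what percentage metric value `a` exceeds `b`."""
    return (a - b) / b * 100.0

# A method scoring 0.63 on some metric outperforms one scoring 0.60 by 5%.
print(round(p_indicator(0.63, 0.60), 1))  # 5.0
```
      </preformat>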
      <p>During the quantitative comparison of detection and classification methods using the IoU metric,
it was found that the Faster R-CNN model predicts the location of the specified military targets 4.5%
more accurately compared to the YOLO model, and 5.3% more accurately than the model based on
the SSD architecture.</p>
      <p>The methods are also compared based on the R metric, which reflects the ratio of the number of
correctly identified objects to the total number of objects and numerically characterizes the method's
ability to correctly identify the given objects, in particular vehicles. Figure 5 shows three graphs
that illustrate the relationship between the R metric value for each detection and classification
method and the number of images N required for the calculation of the metric.</p>
      <p>After substituting the corresponding values into the formula, the following results are obtained:
P ≈ 1.6% when comparing YOLO with Faster R-CNN (12), and P ≈ 5.8% when comparing YOLO
with SSD (13).</p>
      <p>After the quantitative comparison of detection and classification methods using the R metric, it
can be concluded that the method based on the YOLO model has a 1.6% better ability to detect
military targets compared to the Faster R-CNN method and a 5.8% better performance compared to
the method based on the SSD architecture.</p>
      <p>The last quality criterion for comparing methods is the average detection and classification time
for a single military target. Figure 6 shows the graphs demonstrating the dependence of the T metric
value for each method on the number of images N required for its calculation.</p>
      <p>In this case, formula (9) is not optimal, as at first glance it is obvious that one number is significantly
larger than the other. For a more visual comparison, it is better to determine how many times one
value exceeds the other: 0.09326 / 0.03345 ≈ 2.79 and 0.0478 / 0.03345 ≈ 1.43.</p>
      <p>Thus, as a result of the quantitative comparison of the three detection and classification methods
using the T metric, it can be concluded that the YOLO-based method detects and classifies a military
target on average 2.79 times faster than the Faster R-CNN-based method and 1.43 times faster than
the SSD-based method.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>After conducting a comparative analysis of the studied methods, it is difficult to definitively
determine which one is the best for detecting and classifying military targets captured by UAVs, as
none of the methods demonstrated the best results across all three quality criteria. In other words:</p>
      <list list-type="bullet">
        <list-item>
          <p>The Faster R-CNN-based method predicts the location of military targets most accurately
(best for the IoU metric).</p>
        </list-item>
        <list-item>
          <p>The YOLO-based method identifies military targets the fastest (best for the T metric).</p>
        </list-item>
        <list-item>
          <p>There is no clear difference in detection and classification capability between Faster
R-CNN and YOLO, as both perform equally well. However, when looking at the numbers,
YOLO shows slightly better results than Faster R-CNN in the R quality criterion. This
advantage, though, is minimal (only 1.6%) and could be considered within the margin of
calculation error.</p>
        </list-item>
      </list>
      <p>As observed from the results, the SSD-based method did not take the lead in any of the three quality
criteria, so it is not considered a viable candidate for the future implementation of a military target
detection and classification system for UAV video streams.</p>
      <p>Thus, a more detailed analysis of the results for the Faster R-CNN and YOLO methods is necessary
to determine the most relevant method based on the computed quality criteria. Since neither method
showed the best results across all three criteria, the next step is to prioritize each quality criterion.
This will allow for a selection based on the context of the task to be solved using these methods.</p>
      <p>The R quality criterion, which evaluates the ability of the method to detect and classify military
targets, is the most important and should be given the highest priority. This is because methods with
a higher R value are capable of detecting more camouflaged military targets, such as people in
military uniforms blending with the environment. This is particularly important in the context of
territory monitoring and control using drones. The T quality criterion, which characterizes the speed
of the method, should be given second priority because the methods will operate not with static
images, but with real-time video streams from drones. To process and analyze all frames from the
video stream in time, high-speed performance is essential. If the speed is insufficient, frames that
may contain military targets will be skipped, resulting in missed detections. Therefore, the IoU
quality criterion, which assesses the accuracy of object localization, takes third priority, as it is
important for evaluating placement accuracy, but it does not carry as much weight as the ability to
identify and classify objects.</p>
      <p>Based on the established priorities, the most relevant method for the given context will be
determined. According to the R quality criterion, which holds the highest priority, the Faster R-CNN
and YOLO methods showed nearly identical results, making it impossible to determine a clear winner
at this point. Moving to the T metric, which has the second priority, the clear winner is the
YOLO-based method, as it is almost three times faster than the Faster R-CNN-based method according to the
obtained results. Considering the final priority criterion, IoU, it was noted that Faster R-CNN
performed 4.5% better than YOLO, but this advantage is minimal and not significant, as the IoU
criterion holds the lowest priority. Therefore, given that the YOLO-based method was nearly three
times faster than the Faster R-CNN-based method, the YOLO-based method was selected for the
further development of the military target detection and classification system in UAV video streams.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4 to check grammar and spelling.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] M. Zeng, S. He, Q. Zeng, Y. Niu, R. Zhang, PA-YOLO: Small Target Detection Algorithm with Enhanced Information Representation for UAV Aerial Photography, in IEEE Sensors Letters, 2025. doi:10.1109/LSENS.2025.3550406.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Y. Wang, W. Wei, Z. Xin, D. Li, Research on Small Face Detection in UAV Command and Control System, 2022 International Conference on Cyber-Physical Social Intelligence (ICCSI), Nanjing, China, 2022, pp. 69-72. doi:10.1109/ICCSI55536.2022.9970688.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] V. Petrivskyi, V. Shevchenko, O. Bychkov, O. Pokotylo, Models and Information Technologies of Coverage of the Territory by Sensors with Energy Consumption Optimization, in: Mathematical Modeling and Simulation of Systems, MODS 2021, Lecture Notes in Networks and Systems, vol. 344, Springer, Cham, 2021. doi:10.1007/978-3-030-89902-8_2.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] K. Merkulova, Y. Zhabska, I. Ivanenko, Software for UAV Images Processing for Object Identification, 20th International Scientific Conference "Dynamical System Modeling and Stability Investigation", DSMSI 2023 - Volume 1: Mathematical Foundations of Information Technologies, CEUR Workshop Proceedings, vol. 3687, 2023, pp. 25-34. URL: https://ceur-ws.org/Vol-3687/Paper_3.pdf.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Pix4Dmapper. URL: https://www.pix4d.com/product/pix4dmapper-photogrammetry-software/.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] DroneDeploy. URL: https://www.dronedeploy.com/.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] AgroScout. URL: https://agro-scout.com/.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 1 June 2017. doi:10.1109/TPAMI.2016.2577031.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Y. Zhou, S. Cong, Improved Transformer-Based SSD Detector for Airborne Object Detection, 2022 4th International Conference on Frontiers Technology of Information and Computer (ICFTIC), Qingdao, China, 2022, pp. 18-22. doi:10.1109/ICFTIC57696.2022.10075226.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Y. Song, Z. Xie, X. Wang, Y. Zou, MS-YOLO: Object Detection Based on YOLOv5 Optimized Fusion Millimeter-Wave Radar and Machine Vision, in IEEE Sensors Journal, vol. 22, no. 15, pp. 15435-15447, 1 Aug. 2022. doi:10.1109/JSEN.2022.3167251.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] O. Bychkov, K. Merkulova, Photo Portrait, 2020 IEEE 15th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), Lviv-Slavske, Ukraine, 2020, pp. 786-790. doi:10.1109/TCSET49122.2020.235542.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Z. Li, F. Liu, W. Yang, S. Peng, J. Zhou, A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects, in IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 12, pp. 6999-7019, Dec. 2022. doi:10.1109/TNNLS.2021.3084827.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] C. A. R. Goyzueta, J. E. C. De la Cruz, W. A. M. Machaca, Integration of U-Net, ResU-Net and DeepLab Architectures with Intersection Over Union Metric for Cells Nuclei Image Segmentation, 2021 IEEE Engineering International Research Conference (EIRCON), Lima, Peru, 2021, pp. 1-4. doi:10.1109/EIRCON52903.2021.9613150.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] O. Bychkov et al., Using Neural Networks Application for the Font Recognition Task Solution, 2020 55th International Scientific Conference on Information, Communication and Energy Systems and Technologies (ICEST), 2020, pp. -170. doi:10.1109/ICEST49890.2020.9232788.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] S. R. Sitaraman, M. V. S. Narayana, J. Lande, L. M., A. H. Shnain, Center Intersection of Union Loss with You Only Look Once for Object Detection and Recognition, 2024 International Conference on Intelligent Algorithms for Computational Intelligence Systems (IACIS), Hassan, India, 2024, pp. 1-4. doi:10.1109/IACIS61494.2024.10721907.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] I. Yurchuk, A. Pylypenko, Quantile-Based Statistical Techniques for Anomaly Detection, Proceedings of the XX International Scientific Conference Dynamical System Modeling and Stability Investigation (DSMSI-2023), CEUR Workshop Proceedings, vol. 3746, 2023, pp. 64-73. URL: https://ceur-ws.org/Vol-3746/Paper_7.pdf.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] O. Bychkov, O. Ivanchenko, K. Merkulova, Y. Zhabska, Mathematical Methods for Information Technology of Biometric Identification in Conditions of Incomplete Data, Proceedings of the 7th International Conference "Information Technology and Interactions" (IT&amp;I-2020), CEUR Workshop Proceedings, vol. 2845, 2020, pp. 336-349. URL: https://ceur-ws.org/Vol-2845/Paper_31.pdf.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Y. R. T. Bethi, S. Narayanan, V. Rangan, A. Chakraborty, C. S. Thakur, Real-Time Object Detection and Localization in Compressive Sensed Video, 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 2021, pp. 1489-1493. doi:10.1109/ICIP42928.2021.9506769.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] Y. E. Kang, W. Kang, T. Lee, H. S. Chwa, Paste-and-Cut: Collective Image Localization and Classification for Real-Time Multi-Camera Object Detection, 2023 14th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 2023, pp. 740-742. doi:10.1109/ICTC58733.2023.10393851.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] S. Toliupa, A. Pylypenko, O. Tymchuk, O. Kohut, Generator for Testing Data Analytics Methods, Proceedings of the XX International Scientific Conference Dynamical System Modelling and Stability Investigation (DSMSI-2023), CEUR Workshop Proceedings, vol. 3687, 2023, pp. 11-24. URL: https://ceur-ws.org/Vol-3687/Paper_2.pdf.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] A. S. Pal, A. Panda, U. Garain, Label Dependency Aware Loss for Reliable Multi-Label Medical Image Classification, ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025, pp. 1-5. doi:10.1109/ICASSP49660.2025.10888215.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] M. W. P. Maduranga, U. Oruthota, H. K. I. S. Lakmal, S. Kulatunga, RSSI-Based Indoor Localization Using Deep Learning with A Custom Loss Function, 2024 8th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI), Ratmalana, Sri Lanka, 2024, pp. 1-5. doi:10.1109/SLAAI-ICAI63667.2024.10844973.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>