<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Design and Development of a Lightweight YOLOv11-Based Model for UAV Image Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yan YAN</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qi Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lin Meng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Science and Engineering, Ritsumeikan University</institution>
          ,
          <addr-line>1-1-1 Noji-higashi, Kusatsu, Shiga, 525-8577</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Graduate School of Science and Engineering, Ritsumeikan University</institution>
          ,
          <addr-line>1-1-1 Noji-higashi, Kusatsu, Shiga, 525-8577</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>108</fpage>
      <lpage>120</lpage>
      <abstract>
<p>As drone numbers rise and illegal flights become more common, security and privacy issues grow more serious. To monitor and manage drone flights effectively, this paper proposes YOLOv11-mini, a lightweight model improved from YOLOv11. By using GhostConv, C3Ghost, and pruning, YOLOv11-mini keeps high detection accuracy while cutting model size by 87%, making it fit for edge devices. This paper tests the model on a small custom nano-sized drone dataset and applies data augmentation to boost performance. Results show that with augmentation, YOLOv11-mini increases mAP50 by about 4% over the unaugmented model, with accuracy only 2% lower than the original YOLOv11. This shows the model's strong advantages and potential in resource-limited settings.</p>
      </abstract>
      <kwd-group>
        <kwd>Nano-sized UAV</kwd>
        <kwd>YOLOv11</kwd>
        <kwd>Small object detection</kwd>
        <kwd>Lightweight detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the rapid increase in the number of drones, illegal drone flights continue despite repeated
bans, frequently leading to incidents such as flight disruptions, public disturbances, and personal
injuries. In addition to issuing relevant laws and regulations, it is also urgent to strengthen
comprehensive supervision of drones [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, achieving around-the-clock real-time
monitoring of drones through manpower alone is difficult. Furthermore, due to their small
size and high speed, drones can easily be misidentified or completely overlooked by human
observers, especially under low-light conditions. Therefore, it is imperative to leverage more
advanced and efficient technological solutions to achieve real-time monitoring and precise
control of low-altitude drone activities, such as deploying edge devices for real-time target
detection and identification.
      </p>
      <p>
        In recent years, the rapid development of deep learning has driven continuous breakthroughs
in object detection algorithms [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">2, 3, 4, 5, 6</xref>
        ]. From the initial Region-CNN (R-CNN) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], through
Single Shot MultiBox Detector (SSD) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], to the rapidly evolving YOLO series [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], each
technological iteration brings new vitality and expands the possibilities of object detection.
      </p>
      <p>
        Although R-CNN achieves high detection accuracy, its processing speed is slow, so it is
not suitable for real-time applications [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. SSD usually provides higher accuracy than YOLO.
However, its large model size and high computational cost make it hard to meet the demands of
scenarios that require strong real-time performance. In contrast, YOLO achieves an effective
balance between accuracy and speed, making it especially suitable for real-time detection. In
summary, lightweight improvements based on YOLO significantly enhance inference speed
while maintaining detection accuracy, which makes it an ideal choice for latency-sensitive
low-altitude applications such as real-time drone monitoring. This article adopts the YOLOv11 model
proposed by Rahima Khanam and Muhammad Hussain in 2024 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] as the baseline architecture.
Although the original YOLOv11 achieves excellent detection accuracy and inference speed,
its model size remains relatively large. For edge devices with limited computing resources,
such as drones, this makes smooth deployment challenging and may lead to inference delays,
falling short of real-time monitoring requirements. However, existing research mostly focuses
on introducing attention mechanisms, improving feature extraction modules, or optimizing
loss functions to enhance detection accuracy, while paying little attention to the deployment
and adaptation of the YOLO model on edge devices with limited computing resources. This is
particularly critical in anti-drone scenarios, where edge devices such as surveillance cameras
and drones have constrained computing power, creating an urgent need to significantly reduce
model size and computational overhead while maintaining detection performance.
      </p>
      <p>Therefore, to achieve efficient monitoring and precise control of low-altitude drone
activities, this paper proposes a lightweight object detection model based on YOLOv11, named
YOLOv11-mini. By redesigning the backbone network and introducing lightweight convolution
modules, the model size is reduced by approximately 87% compared to the original YOLOv11,
enabling efficient deployment on resource-constrained edge devices. With the help of edge
devices equipped with lightweight detection models, small drones in the monitoring area can
be automatically detected and identified around the clock. This not only improves the efficiency
and accuracy of monitoring but also significantly reduces the workload of manual patrols.</p>
      <p>The main contributions of this paper are as follows:
• In response to the detection requirements for nano-sized drones in low-altitude flight
scenarios, this paper independently collects high-resolution images and complete detailed
annotations to construct a dedicated dataset. This fills the gap of insufficient publicly
available data and lays the foundation for subsequent model training and evaluation.
• In the YOLOv11 framework, we remove the redundant detection layer and replace several
standard convolutions with GhostConv and C3Ghost, thereby significantly reducing
the number of parameters and computational cost. As a result, the model size is reduced
by approximately 87% compared to the original version, making it more suitable for
deployment on edge devices with limited computing power.
• We systematically compare various data augmentation strategies (such as Mosaic and
CutMix), select the optimal combination, and effectively improve the model’s robustness
and generalization ability. Experiments show that the enhanced YOLOv11-mini achieves
a 4% improvement in mAP50 compared to the unenhanced version, with merely a 2%
decrease in accuracy relative to the original YOLOv11, thereby significantly reducing the
computational burden while maintaining high detection performance.</p>
      <p>In summary, through comprehensive improvements in dataset construction, lightweight
network design, and data augmentation optimization, this paper enables the model to maintain
high detection accuracy while reducing its model size by 87%, thereby meeting the requirements
of real-time drone monitoring in resource-constrained devices.</p>
      <p>The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3
introduces the dataset and the YOLO models relevant to this study. Section 4 provides a detailed
description of the data augmentation methods and evaluation metrics used. Section 5 presents
the experimental results along with an analysis. Finally, Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>The rapid development of drone technology has driven in-depth research into low-altitude
monitoring and counter-drone systems. The YOLO series of models has become a hot topic in
research and applications in this field due to its effective balance between detection accuracy
and real-time performance. To meet the needs of real-time monitoring in complex environments,
many scholars have made various improvements to the YOLO architecture, proposing more
adaptive detection methods.</p>
      <sec id="sec-2-1">
        <title>2.1. Improvements for the YOLO series</title>
        <p>
          Ghazlane Yasmine et al. propose a series of improvements based on the YOLOv7 model,
incorporating the CSPResNeXt module into the backbone, a transformer block with the C3TR
attention mechanism, and a decoupled head structure to enhance the model’s performance.
While ensuring an accuracy of 0.97, their model achieves an inference speed of 0.02 milliseconds
per image, successfully achieving an optimal balance between inference speed and detection
performance [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Xueqi Cheng et al. propose an IRWT-YOLO model based on YOLOv8 that
integrates object detection and image segmentation, incorporates BiFormer into the backbone
network, and introduces the RCSCAA and DCPPA modules to improve the detection of weak
objects. The proposed model improves the robustness and effectiveness of the original model in
detecting weak objects under complex infrared conditions, thereby addressing the problems of
low object visibility and background interference in infrared UAV image detection [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Ruixi
Liu et al. propose a distributed anti-drone system based on YOLOv5, which achieves
automatic target locking through a mechanical structure, effectively improves detection accuracy,
and adopts distributed cluster deployment to overcome the shortcomings of detection blind
spots and target loss. This provides a deployment concept for airport countermeasures against
lightweight UAVs and offers theoretical guidance for future anti-UAV strategies [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Juanqin
Liu et al. propose a detection method called GL-YOMO, which combines the traditional YOLOv5
framework with multi-frame motion detection technology. It enhances the recognition of small
drone targets by fusing features of different scales and introducing an attention module. In
addition, they integrate the Ghost module into the network to further reduce computational
cost and improve inference efficiency, thus achieving a better balance between accuracy and
real-time performance and underscoring its potential in UAV detection applications [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Main work</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset creation</title>
        <p>The image acquisition device is a Nikon Z50II mirrorless camera (APS-C DX format) equipped
with a NIKKOR Z DX 18-140mm f/3.5-6.3 VR lens kit. The target is a nano-sized low-altitude
UAV, namely the Bitcraze Crazyflie 2.1. This standard nano-sized drone is shown in Figure 1.
Images are captured from different heights and angles under bright indoor, dark indoor, and
bright outdoor conditions. Samples from the self-constructed nano-sized drone dataset are shown in Figure 2.</p>
        <p>The initial dataset consists of a total of 413 images of nano-sized drones. Python scripts
are used to randomly sample 10% of the images as a validation set, 10% as a test set, and the
remaining 334 images as the training set for model development and evaluation. In the training
set, 173 images are taken indoors and 139 are taken outdoors. All images are
annotated using LabelImg to generate YOLO-format label files for nano-sized drones.</p>
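<p>The 80%/10%/10% split described above can be sketched with a short Python script. The directory layout, file extension, and random seed below are illustrative assumptions; the original splitting scripts are not published with the paper.</p>
<preformat>
```python
import random
import shutil
from pathlib import Path

def split_dataset(image_dir, out_dir, val_frac=0.1, test_frac=0.1, seed=0):
    """Randomly split images into train/val/test folders (YOLO-style layout)."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    n_val, n_test = int(n * val_frac), int(n * test_frac)
    splits = {
        "val": images[:n_val],
        "test": images[n_val:n_val + n_test],
        "train": images[n_val + n_test:],
    }
    for name, files in splits.items():
        dst = Path(out_dir) / name / "images"
        dst.mkdir(parents=True, exist_ok=True)
        for f in files:
            # the matching YOLO .txt label file would be copied alongside
            shutil.copy(f, dst / f.name)
    return {k: len(v) for k, v in splits.items()}
```
</preformat>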
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Yolov11 network structure</title>
        <p>
          YOLOv1, proposed by Joseph Redmon in 2015, treats object detection as a regression problem,
allowing the detection performance to be directly optimized end-to-end [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. YOLOv11 is
officially released on October 1, 2024, and is developed based on the YOLO series framework,
leading to significant improvements in detection accuracy and eficiency.
        </p>
        <p>
          Figure 3 shows that YOLOv11 introduces the C3k2 block and the convolutional block with
parallel spatial attention (C2PSA) mechanism [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          Among these components, C3k2 plays a key role in enhancing the feature extraction capability.
It is an optimized version of the traditional CSP bottleneck structure in YOLOv11 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], with its
core feature being the use of two parallel convolutional layers. This design enables the extraction
of features across different channels, thereby improving the model’s adaptability to complex
scenes. This makes data processing more efficient while maintaining high accuracy. C2PSA
enhances multi-scale feature extraction by combining the Cross Stage Partial (CSP) structure
and the Pyramid Squeeze Attention (PSA) mechanism. Additionally, it dynamically weights
channel features through the Squeeze-and-Excitation (SE) mechanism, thereby strengthening
the responses of important channels.
        </p>
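<p>The SE-style channel weighting described above can be illustrated with a minimal NumPy sketch: global average pooling squeezes each channel to a scalar, a two-layer bottleneck produces per-channel gates via a sigmoid, and the gates rescale the feature map. The shapes, reduction ratio, and random weights are illustrative assumptions, not the exact YOLOv11 implementation.</p>
<preformat>
```python
import numpy as np

def se_weight(feature_map, w1, w2):
    """Squeeze-and-Excitation reweighting of a (C, H, W) feature map.

    w1 has shape (C//r, C) and w2 has shape (C, C//r): the two FC layers
    of the bottleneck, with reduction ratio r.
    """
    squeeze = feature_map.mean(axis=(1, 2))        # global average pool: (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # FC + ReLU: (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # FC + sigmoid: (C,), each in (0, 1)
    return feature_map * gate[:, None, None]       # strengthen/suppress whole channels

rng = np.random.default_rng(0)
c, r = 8, 4
fmap = rng.standard_normal((c, 16, 16))
out = se_weight(fmap,
                0.1 * rng.standard_normal((c // r, c)),
                0.1 * rng.standard_normal((c, c // r)))
assert out.shape == fmap.shape
```
</preformat>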
        <p>In summary, YOLOv11 achieves significant improvements in detection accuracy while
maintaining real-time performance by integrating advanced convolution techniques and innovative
attention mechanisms.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Lightweight improvement of the YOLOv11 model: YOLOv11-mini</title>
        <p>We propose a lightweight model YOLOv11-mini based on YOLOv11 with its architecture shown
in Figure 4.</p>
        <p>
          YOLOv11 performs well in object detection and other visual tasks due to its outstanding
inference speed and high accuracy. However, when deployed on edge devices with limited
computation resources, YOLOv11 still suffers from certain limitations. This paper draws on the
Ghost module and feature layer pruning concepts to systematically prune and modify YOLOv11
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], designing YOLOv11-mini, which retains only two output layers.
        </p>
        <p>Based on YOLOv11, we set both the depth and width multipliers to 0.25 and reduce the
number of layers in the neck and head, significantly decreasing the network’s depth and width.
To further reduce the computational burden, we replace the standard convolution with the
lightweight GhostConv module and substitute the original C3k2 structure with C3Ghost, which
effectively eliminates redundant computations, reducing the number of parameters and FLOPs.</p>
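<p>The saving from this replacement can be seen from a simple parameter count: a standard k×k convolution from c_in to c_out channels uses c_in·c_out·k² weights, whereas GhostConv produces half of the output channels with a standard convolution and generates the remaining "ghost" channels with a cheap depthwise convolution. The sketch below counts weights only (biases and batch normalization omitted); the 5×5 depthwise kernel is an illustrative choice, not necessarily the one used in YOLOv11-mini.</p>
<preformat>
```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def ghost_conv_params(c_in, c_out, k, dw_k=5):
    """GhostConv: a primary k x k conv producing c_out//2 channels, plus a
    depthwise dw_k x dw_k conv generating the remaining ghost channels."""
    primary = c_in * (c_out // 2) * k * k
    cheap = (c_out // 2) * dw_k * dw_k   # depthwise: one filter per channel
    return primary + cheap

# Example: a 3x3 convolution mapping 128 to 256 channels.
std = conv_params(128, 256, 3)          # 294912 weights
ghost = ghost_conv_params(128, 256, 3)  # 150656 weights, roughly half
print(std, ghost)
```
</preformat>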
        <p>In the standard YOLOv11 architecture, there are typically three detection heads, namely p3,
p4, and p5, each responsible for detecting objects at different scales. In particular, the p5 output
feature map, with its larger receptive field, is mainly used for detecting larger objects in images.
However, in our target application scenarios, the drones are generally small and rely less on
large-scale feature layers. Therefore, in designing YOLOv11-mini, we remove the p5/32 output,
which is more sensitive to large object detection, and retain only the p3/8 and p4/16 output
branches, which are better suited for detecting small and medium-sized objects.</p>
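<p>The effect of dropping the p5/32 branch can be read directly from the output grid sizes. For a 640×640 input (an illustrative resolution), p3/8 and p4/16 produce 80×80 and 40×40 detection grids, fine enough for small and medium objects, while p5/32 would only add a coarse 20×20 grid aimed at large objects:</p>
<preformat>
```python
def head_grid(input_size, stride):
    """Spatial size of a detection-head feature map for a square input."""
    return input_size // stride

strides = {"p3": 8, "p4": 16, "p5": 32}
grids = {name: head_grid(640, s) for name, s in strides.items()}
print(grids)  # {'p3': 80, 'p4': 40, 'p5': 20}
```
</preformat>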
        <p>This structural simplification not only significantly reduces the number of network parameters
and computational overhead, thereby improving inference speed, but also avoids the excessive
aggregation of fine-grained small target information under a large receptive field. This helps
maintain the detection sensitivity and discriminative capability for small targets.</p>
        <p>In addition, we retain the lightweight SPPF module to enhance the receptive field and feature
aggregation capability while further reducing the model size. After these improvements, the
lightweight YOLOv11-mini has a simpler network structure and a smaller model size, making it
highly suitable for deployment on edge devices with limited computing resources.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental data and evaluation metrics</title>
      <p>In this section, we use a self-constructed drone dataset and evaluate multiple data augmentation
methods to determine the optimal solution. In addition, we introduce the performance evaluation
metrics employed in this study.</p>
      <sec id="sec-4-1">
        <title>4.1. Experimental data and data augmentation methods</title>
        <p>The computer operating system used in this experiment is Windows 11 Professional Edition.
VSCode is employed to remotely connect to the server for training and testing. Python 3.10
is used as the primary programming language, and PyTorch 2.7.0 (CUDA 11.8) serves as the
deep learning framework. The dataset is a nano-sized drone dataset that is self-collected and
annotated, comprising a total of 413 images, with 334 used for training, 40 for testing, and 39
for validation.</p>
        <p>This experiment uses a total of seven data augmentation methods, which can be divided
into two categories: optical content transformation and geometric texture transformation. The
visual effect is shown in Figure 5 and is described in detail as follows.</p>
        <p>1. Optical content transformation
• Color-light: Slight adjustments are made to the hue, saturation, and brightness of the
image. The amplitude is small, slightly altering the visual characteristics while maintaining
the fundamental features of the original image, thereby improving the model’s robustness
to minor lighting changes.
• Color-medium: Building on Color-light, it applies a wider range of brightness and contrast
adjustments, yielding more noticeable visual effects and further enhancing the model’s
generalization ability.
• Color-medium-noise: Building on Color-medium, Gaussian noise is superimposed to
simulate sensor interference in real-world scenarios, enhancing adaptability to noisy
environments.
• Color-medium-clahe: Building on Color-medium, CLAHE is applied to improve local
contrast, making it suitable for scenes with uneven lighting or unclear details.
2. Geometric texture transformation
• Mosaic: Four images are randomly spliced into one to increase the number of targets in
each batch and to enrich combination diversity, which is especially beneficial for small
object detection.
• Mixup: Two images and their labels are linearly mixed in proportion to improve the
model’s robustness to partial occlusion and enhance its adaptability to complex scenes.
• Cutmix: Rectangular regions are randomly cropped and pasted between two images,
and labels are mixed according to the area ratio to further improve the robustness and
generalization ability of the model in complex scenes.</p>
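<p>Two of the transformations above can be sketched in NumPy. This is a simplified illustration: label handling for Mosaic is omitted, the resize is nearest-neighbour, and the noise level is an assumed value; the experiments presumably relied on library implementations.</p>
<preformat>
```python
import numpy as np

def add_gaussian_noise(img, sigma=10.0, seed=0):
    """Color-medium-noise style: superimpose Gaussian noise on a uint8 image."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def mosaic4(imgs, size=640):
    """Mosaic: splice four images into one size x size canvas (2 x 2 grid).
    Bounding-box labels would be shifted and scaled accordingly (omitted)."""
    assert len(imgs) == 4
    h = w = size // 2
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    for i, img in enumerate(imgs):
        r, c = divmod(i, 2)
        ys = np.linspace(0, img.shape[0] - 1, h).astype(int)  # nearest-neighbour
        xs = np.linspace(0, img.shape[1] - 1, w).astype(int)  # resize indices
        canvas[r*h:(r+1)*h, c*w:(c+1)*w] = img[ys][:, xs]
    return canvas
```
</preformat>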
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation metrics</title>
        <p>To evaluate the accuracy of the YOLOv11-mini model in recognizing nano-sized drones, this
paper adopts performance metrics commonly used in object detection, including Precision,
Recall, mAP50, mAP50-95, as well as the number of parameters and model size. The definitions
of these metrics are provided below.</p>
        <p>Precision = TP / (TP + FP) (1)</p>
        <p>Recall = TP / (TP + FN) (2)</p>
        <p>AP = ∑<sub>i</sub> (R<sub>i+1</sub> − R<sub>i</sub>) · P<sub>interp</sub>(R<sub>i+1</sub>) (3)</p>
        <p>mAP = (1/N) ∑<sub>i=1</sub><sup>N</sup> AP<sub>i</sub> (4)</p>
        <p>The Average Precision (AP) of a class is the area of the region below the precision-recall
curve. R<sub>i</sub> represents the recall at the i-th threshold, and P<sub>interp</sub>(R<sub>i+1</sub>) represents the
highest precision value in the range R<sub>i</sub> to R<sub>i+1</sub>. The mAP is calculated by averaging the AP of
each class in the dataset. Specifically, mAP50 refers to the mean AP when the IoU threshold is
fixed at 0.5, whereas mAP50-95 is calculated by averaging the mAP values over IoU thresholds
ranging from 0.5 to 0.95.</p>
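<p>Equations (3) and (4) can be checked with a short NumPy sketch that computes AP by interpolated summation over the precision-recall curve. This is a simplified single-class illustration of the metric, not the exact evaluation code used in the experiments.</p>
<preformat>
```python
import numpy as np

def average_precision(recall, precision):
    """AP = sum_i (R_{i+1} - R_i) * P_interp(R_{i+1}), where P_interp(R)
    is the highest precision achieved at any recall of at least R."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):       # precision envelope:
        p[i] = max(p[i], p[i + 1])            # monotonically non-increasing
    idx = np.where(r[1:] != r[:-1])[0]        # points where recall increases
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_ap(aps):
    """mAP = (1/N) * sum of the per-class AP values."""
    return float(np.mean(aps))

# A perfect detector keeps precision 1.0 at every recall level, so AP = 1.0.
assert average_precision(np.array([0.5, 1.0]), np.array([1.0, 1.0])) == 1.0
```
</preformat>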
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental results and analysis</title>
      <p>In this section, we compare and analyze the performance of the YOLOv11 model before and
after lightweight optimization, and evaluate the performance of the YOLOv11-mini model using
various data augmentation strategies.</p>
      <sec id="sec-5-1">
        <title>5.1. Model performance comparison</title>
        <p>We use Parameters, Model Size, Precision, Recall, mAP50, and mAP50-95 as performance
evaluation metrics to compare the YOLOv11 model before and after lightweight optimization.
The results are shown in Table 1.</p>
        <p>Compared to YOLOv11, the YOLOv11-mini model has significantly fewer parameters and a
smaller model size. However, its performance metrics such as Recall and mAP50 are slightly
lower than those of YOLOv11. To improve these metrics, we further apply data augmentation
techniques to enhance the model.</p>
        <p>We use Precision, Recall, mAP50, and mAP50-95 as performance evaluation metrics to
assess the YOLOv11-mini model under eight different data augmentation strategies: base, light,
medium, noise, clahe, mosaic, mixup, and cutmix. The results are shown in Table 2. As shown
in Table 2, after applying Clahe data augmentation, the model achieves a precision of 1.0, a
recall of 0.837, and an mAP50 of 0.905, demonstrating the best overall performance among the
seven data augmentation methods and significantly improving detection accuracy. Notably,
the Mosaic method yields the highest mAP50-95 value (0.49), while the Medium and Cutmix
strategies also exhibit strong overall performance.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Ablation experiments and results analysis</title>
        <p>After comparing the training effects of seven data augmentation methods on the YOLOv11-mini
model, four methods that demonstrated better performance—medium color, clahe, mosaic, and
cutmix—are initially selected. Subsequently, ablation experiments are conducted to compare the
training performance of YOLOv11-mini with different combinations of these data augmentation
methods, aiming to determine the optimal augmentation strategy. The experimental results are
shown in Table 3.</p>
        <p>The ablation experiments on various data augmentation methods and their combinations
indicate that the combination of medium color augmentation, mosaic, and cutmix (1.0) achieves
outstanding performance across all metrics. Specifically, it achieves a precision of 0.963, recall
of 0.821, mAP50 of 0.907, and mAP50-95 of 0.493, significantly surpassing the baseline model
(mAP50 = 0.876). Moreover, the combination of medium and mosaic yields the highest
mAP50-95 (0.507), demonstrating more stable detection performance across different IoU thresholds.
Overall, integrating medium, mosaic, and cutmix augmentation strategies effectively improves
the detection accuracy and generalization ability of the YOLOv11-mini model for nano-sized
UAV target detection. Future research can further optimize this augmentation strategy to fully
exploit the model’s potential.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Visualization of detection results</title>
        <p>After adopting the combined data augmentation strategy of medium, mosaic, and cutmix (1.0),
the YOLOv11-mini model proposed in this paper demonstrates excellent detection performance
on the self-constructed nano-sized low-altitude drone dataset.</p>
        <p>As illustrated in Figure 6, the model accurately locates drone targets across various
low-altitude scenarios. Figure 7 presents the Precision–Recall curve on the test set, with an mAP50 of
0.906, indicating that the lightweight YOLOv11-mini still achieves high detection accuracy and
recall while significantly reducing the model’s complexity, making it suitable for deployment
on edge devices with limited computation resources.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Aiming at the real-time monitoring needs of nano-sized civilian drones in low-altitude scenarios,
this paper proposes a lightweight improved YOLOv11-mini model based on YOLOv11, and
constructs a dedicated dataset to support subsequent training and testing. Based on the original
YOLOv11 architecture, GhostConv, C3Ghost, and a pruning strategy are introduced to reduce the
model size by approximately 87%, significantly lowering the computational burden and making it
more suitable for deployment on edge devices. Furthermore, a systematic comparison of multiple
data augmentation methods and their combinations shows that the augmentation strategy
combining Medium, Mosaic, and CutMix (1.0) effectively improves detection performance, with
mAP50 increasing to 0.907 and Precision reaching 0.963. It also performs well on mAP50-95,
thus verifying the effectiveness of the proposed approach. In summary, this study achieves
significant improvements in dataset construction, network lightweight optimization, and data
augmentation, enabling YOLOv11-mini to greatly reduce model parameters and computational
overhead while maintaining excellent detection accuracy and recall. This meets the requirements
of real-time, resource-constrained low-altitude drone monitoring. Future work will focus on
exploring more efficient attention mechanisms and small object detection strategies to further
enhance the model’s robustness and generalization in complex scenarios.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Joo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Survey on anti-drone systems: Components, designs, and challenges</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>42635</fpage>
          -
          <lpage>42659</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Dataset purification-driven lightweight deep learning model construction for empty-dish recycling robot</article-title>
          ,
          <source>IEEE Transactions on Emerging Topics in Computational Intelligence</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>A survey of deep learning for industrial visual anomaly detection</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>58</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>82</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>A multi-scale information fusion framework with interaction-aware global attention for industrial vision anomaly detection and localization</article-title>
          ,
          <source>Information Fusion</source>
          (
          <year>2025</year>
          )
          <fpage>103356</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>3d industrial anomaly detection via dual reconstruction network</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>54</volume>
          (
          <year>2024</year>
          )
          <fpage>9956</fpage>
          -
          <lpage>9970</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Mcad: Multi-classification anomaly detection with relational knowledge distillation</article-title>
          ,
          <source>Neural Computing and Applications</source>
          <volume>36</volume>
          (
          <year>2024</year>
          )
          <fpage>14543</fpage>
          -
          <lpage>14557</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gkioxari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <article-title>Mask r-cnn</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2961</fpage>
          -
          <lpage>2969</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Anguelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Berg</surname>
          </string-name>
          ,
          <article-title>Ssd: Single shot multibox detector</article-title>
          ,
          <source>in: Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I 14</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jegham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. Y.</given-names>
            <surname>Koh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdelatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hendawi</surname>
          </string-name>
          ,
          <article-title>Evaluating the evolution of yolo (you only look once) models: A comprehensive benchmark study of yolo11 and its predecessors</article-title>
          ,
          <source>arXiv preprint arXiv:2411.00201</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Khanam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <article-title>Yolov11: An overview of the key architectural enhancements</article-title>
          ,
          <source>arXiv preprint arXiv:2410.17725</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Yasmine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Maha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hicham</surname>
          </string-name>
          ,
          <article-title>Anti-drone systems: An attention based improved yolov7 model for a real-time detection and identification of multi-airborne target</article-title>
          ,
          <source>Intelligent Systems with Applications</source>
          <volume>20</volume>
          (
          <year>2023</year>
          )
          <fpage>200296</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nuo</surname>
          </string-name>
          ,
          <article-title>Irwt-yolo: A background subtraction-based method for anti-drone detection</article-title>
          ,
          <source>Drones</source>
          <volume>9</volume>
          (
          <year>2025</year>
          )
          <fpage>297</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <article-title>Research on the anti-uav distributed system for airports: Yolov5-based auto-targeting device</article-title>
          ,
          <source>in: 2022 3rd International Conference on Computer Vision, Image and Deep Learning &amp; International Conference on Computer Engineering and Applications (CVIDL &amp; ICCEA)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>864</fpage>
          -
          <lpage>867</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plotegher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Roura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>de Souza Junior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Real-time detection for small uavs: Combining yolo and multi-frame motion analysis</article-title>
          ,
          <source>IEEE Transactions on Aerospace and Electronic Systems</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Divvala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <article-title>You only look once: Unified, real-time object detection</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>779</fpage>
          -
          <lpage>788</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Polinar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Al Jastin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J. A.</given-names>
            <surname>Daño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Aparicio</surname>
          </string-name>
          ,
          <article-title>Deep learning approach for weed detection to determine soil condition</article-title>
          ,
          <source>in: 2025 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream)</source>
          , IEEE,
          <year>2025</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>C3ghost and c3k2: performance study of feature extraction module for small target detection in yolov11 remote sensing images</article-title>
          ,
          <source>in: Second International Conference on Big Data, Computational Intelligence, and Applications (BDCIA 2024)</source>
          , volume
          <volume>13550</volume>
          , SPIE,
          <year>2025</year>
          , pp.
          <fpage>464</fpage>
          -
          <lpage>470</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>