<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>August</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A comprehensive survey on object detection YOLO</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiangheng Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hengyi Li</string-name>
          <email>lihengyi@fc.ritsumei.ac.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuebin Yue</string-name>
          <email>yue-xb@fc.ritsumei.ac.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lin Meng</string-name>
          <email>menglin@fc.ritsumei.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Science and Engineering, Ritsumeikan University</institution>
          ,
          <addr-line>1-1-1 Noji-higashi, Kusatsu, Shiga, 525-8577</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Graduate School of Science and Engineering, Ritsumeikan University</institution>
          ,
          <addr-line>1-1-1 Noji-higashi, Kusatsu, Shiga, 525-8577</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Research Organization of Science and Technology, Ritsumeikan University</institution>
          ,
          <addr-line>1-1-1 Noji-higashi, Kusatsu, Shiga, 525-8577</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>2</volume>
      <fpage>8</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>As a single-stage object detection framework, YOLO (You Only Look Once) has emerged as a prominent technique for various object detection tasks owing to its impressive balance between speed and precision. This article presents a comprehensive review of the YOLO family of algorithms. The review covers the evolution of YOLO from its initial release to the latest versions, with an in-depth analysis of the performance and critical characteristics of each iteration. Particular emphasis is given to the applications of YOLO in diverse domains, focusing on its role in real-time object detection on embedded systems. Furthermore, the paper covers the latest advances in compression algorithms for optimizing cumbersome YOLO models, together with practical implementation examples. Addressing the challenge of model size reduction further unlocks the potential of deploying YOLO on resource-constrained devices. Finally, this study outlines potential research trends and improvements for the YOLO family of algorithms, including novel architectural designs and innovative training strategies. Overall, this review is a valuable reference for researchers seeking to explore the YOLO framework and its evolving landscape in object detection.</p>
      </abstract>
      <kwd-group>
        <kwd>YOLO</kwd>
        <kwd>object detection</kwd>
        <kwd>deep learning</kwd>
        <kwd>application</kwd>
        <kwd>compressing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Object detection is a core task of computer vision: it locates the objects of interest
in an input image and then accurately determines the class of each of them. In
recent years, object detection has become a popular task in computer vision due to its wide range of
applications and recent technological breakthroughs.</p>
      <p>
        Traditional target detection algorithms use sliding-window or image segmentation techniques
to generate a large number of candidate regions and then extract image features for each
candidate region, such as Histograms of Oriented Gradients (HOG)[
        <xref ref-type="bibr" rid="ref1 ref54">1</xref>
        ]. These features are passed
to a classifier such as a Support Vector Machine (SVM)[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to judge the category of the candidate region.
However, traditional target detection algorithms often need to produce many candidate boxes,
resulting in low efficiency. Their speed and accuracy cannot meet the requirements
of practical applications, so the development of traditional target detection algorithms has reached a
bottleneck. Deep convolutional neural networks have been applied in various domains since
AlexNet[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] was released in 2012, and deep learning provides a new approach to object detection.
Since then, a series of models based on deep learning have been proposed, and researchers have
focused more on deep learning.
      </p>
      <p>The 5th International Symposium on Advanced Technologies and Applications in the Internet of Things (ATAIT 2023), CEUR Workshop Proceedings</p>
      <p>
        Object detection based on deep learning can be divided into two categories according to
detection method: region-based and regression-based. The region-based system is known
as a two-stage detector, which first determines the candidate boxes of the samples and then
classifies the samples through a convolutional neural network (CNN), such as Regional CNN
(R-CNN)[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The regression-based system is called a one-stage detector. It does not generate candidate
boxes during processing and directly realizes target detection through regression
analysis. Comparative analysis shows that the two methods have different
characteristics: two-stage detectors are more accurate than one-stage models, but their
real-time performance is somewhat slower. Owing to one-stage detectors’ balanced overall performance
and excellent operational efficiency, researchers have focused more on the one-stage detector.
The most typical representative of one-stage detectors is YOLO[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Due to the superior performance of YOLO, researchers have made many improvements to it,
and YOLO has been updated to YOLOv8[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In addition, there are YOLOR[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and YOLOX[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Shao et al.[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] provide a detailed description of the YOLO algorithms, but the article is limited
to YOLOv5[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and does not include the latest version of the YOLO family. Terven et al.[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
summarize YOLO, but their presentation could be more intuitive. In this article, we introduce the
features of YOLOv1-v8, YOLOR, and YOLOX to give researchers an understanding of the YOLO
family and to help them choose models according to their own needs.
      </p>
      <p>The rest of this paper is organized as follows. The architectures of the YOLO family are
discussed in Section 2 and summarized according to structure, image size, AP, AP<sub>50</sub>, FPS, and
parameters. Section 3 elaborates on the improvements and applications of YOLO in various
domains. Section 4 introduces model compression and summarizes the applications of compressed
YOLO models. Finally, Section 5 summarizes the development trend of the YOLO framework
and gives an outlook on the future of YOLO.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Systematic overview on the key features of YOLO families</title>
      <p>
        The initial YOLO was presented by Joseph Redmon et al. at CVPR 2016. YOLO stands for “You
Only Look Once”: like a human, it glances at an image and immediately knows what objects
are in the image, what they are doing, and where they are. Unlike traditional object detection
models based on deep learning, YOLO achieves excellent accuracy and speed as a regression-based
object detection algorithm. YOLO is an advanced one-stage object detection framework that has
evolved over the years and spawned several versions. This section introduces the differences
between YOLOv1-YOLOv8, YOLOX, and YOLOR. The YOLO algorithm forgoes the traditional sliding
window technique. It divides the input image into an S×S grid; each grid cell predicts B
bounding boxes of the same class and the confidences for C different classes. Each
bounding box predicts five values, (x, y, w, h, c), representing the bounding box’s position, size,
and confidence, respectively. Each grid cell therefore predicts (B×5+C) values, after which Non-Maximum
Suppression (NMS) is used to remove duplicate detections.
      </p>
      <p>2.1. YOLOv1</p>
      <p>
        Accuracy, including classification accuracy and localization accuracy, and detection speed are
the essential criteria for judging the quality of an image object detection model[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The YOLO model
does not need to generate candidate boxes and classify them through a CNN; instead, it
directly regresses the bounding boxes and categories of the targets at multiple positions of the
image. YOLOv1 resizes the input image to 448×448, extracts features with a
CNN, and then processes the prediction results to achieve end-to-end object detection.
      </p>
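      <p>The grid output size and the NMS step described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of any YOLO release: the function names, the corner-format boxes (x1, y1, x2, y2), and the IoU threshold of 0.5 are assumptions made for the example.</p>
      <preformat>
```python
import numpy as np


def output_size(s, b, c):
    """Number of predicted values for an s x s grid: s * s * (b * 5 + c)."""
    return s * s * (b * 5 + c)


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[iou_threshold >= overlaps]  # retain weakly overlapping boxes
    return keep


# Hypothetical detections: the first two boxes cover the same object.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(output_size(7, 2, 20))  # values predicted for S=7, B=2, C=20
print(nms(boxes, scores))     # indices of the boxes that survive NMS
```
      </preformat>
      <p>In practice NMS is usually applied per class and after discarding low-confidence boxes; both steps are omitted here to keep the sketch short.</p>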
      <p>
        YOLOv1[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] uses a backbone network similar to GoogLeNet[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], with 24 convolutional layers
and two fully connected layers, and 1×1 convolutional layers are used to reduce the number
of feature maps. YOLOv1 is pre-trained on ImageNet[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and then transferred to the PASCAL
Visual Object Classes (VOC) dataset for validation. YOLOv1 divides the input image into a 7×7 grid, and each
grid cell predicts two bounding boxes, so there are 7×7×2 = 98 bounding boxes. Because the boxes of one grid cell share the same class, a maximum of 49 targets
can be identified. YOLOv1 is therefore not good at identifying dense targets and small targets.
      </p>
      <p>
        Evaluated on PASCAL VOC2007, YOLOv1 scores 63.4% average precision (AP).
      </p>
      <p>2.2. YOLOv2</p>
      <p>
        YOLOv2[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is inspired by the architecture of VGG[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and constructs a Darknet-19 network that
contains 19 convolutional layers and five max pooling layers. YOLOv1 uses fully connected
layers to predict the bounding boxes directly, which loses spatial information and causes inaccurate
positioning. YOLOv2 introduces anchor boxes from Faster R-CNN[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] instead of fully connected
layers to predict bounding boxes. Furthermore, YOLOv2 uses batch normalization to improve
convergence, and the authors vary the input size to achieve robustness.
      </p>
      <p>In addition, to extend object detection to object categories that lack detection
samples, YOLOv2 trains jointly on several datasets. Using the WordTree method, the model is
trained simultaneously on the ImageNet classification dataset and the MS COCO dataset to achieve
real-time detection of more than 9000 object categories. The improved YOLOv2 is also known
as YOLO9000.</p>
      <p>
        Evaluated on PASCAL VOC2007, YOLOv2 achieves 78.6% AP, which is 15.2 percentage points higher than
YOLOv1.
      </p>
      <p>2.3. YOLOv3</p>
      <p>
        YOLOv3[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] uses Darknet-53 as the backbone. The architecture of YOLOv3 draws on the
residual structure of ResNet[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] to deepen the network, which alleviates the gradient
problems (such as gradient explosion) that make deep networks difficult to converge. YOLOv3 utilizes a
method similar to the Feature Pyramid Network (FPN)[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and
predicts three boxes at each of three different scales. Moreover, YOLOv3 uses k-means clustering to determine
the anchor box priors. Unlike YOLOv2, YOLOv3 uses
three prior boxes at each of the scales for large, medium, and small objects. Furthermore, feature
maps of three different scales are fused for object detection, and logistic (sigmoid) classifiers
replace softmax for category prediction to achieve multi-label object detection. The
proposed network not only improves the detection of small objects but also runs 3 to 4
times faster than previous models of similar detection accuracy when the bounding box criterion is not
strict.
      </p>
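      <p>The difference between softmax and independent logistic outputs for multi-label prediction can be illustrated with a small NumPy sketch. The logits, the class names, and the 0.5 threshold are hypothetical values chosen for the example, not numbers from the YOLOv3 paper.</p>
      <preformat>
```python
import numpy as np


def softmax(logits):
    """Softmax: class scores compete and sum to 1, so one class dominates."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def sigmoid(logits):
    """Independent logistic scores: each class is judged on its own."""
    return 1.0 / (1.0 + np.exp(-logits))


# Hypothetical logits for one box over the classes (person, woman, car).
logits = np.array([3.0, 2.5, -4.0])
threshold = 0.5  # assumed decision threshold

multi_label = sigmoid(logits) > threshold
print(softmax(logits).round(3))  # mutually exclusive class distribution
print(multi_label)               # overlapping labels can be active together
```
      </preformat>
      <p>With softmax, the two overlapping labels must share one unit of probability mass, whereas the logistic outputs let both exceed the threshold.</p>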
      <p>Due to the rapid development of computer vision, evaluation datasets that can better reflect the
comprehensive performance of detection algorithms are needed. When YOLOv3 was released,
the evaluation benchmark for object detection changed from PASCAL VOC to MS COCO.
Therefore, YOLOv3 and subsequent YOLO models are evaluated on the MS COCO dataset. The
YOLOv3 algorithm meets the accuracy and speed requirements of real-time detection and has
become one of the preferred target detection algorithms in the engineering field.</p>
      <p>
        Evaluated on the MS COCO dataset test-dev 2017, YOLOv3 achieves 31.0% AP and 55.3%
AP<sub>50</sub> at 20 FPS.
      </p>
      <p>2.4. YOLOv4</p>
      <p>
        After YOLOv3, YOLOv4[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] adopts various improvement methods, which are divided into
bag-of-freebies and bag-of-specials: the bag-of-freebies refers to methods that improve training
without affecting inference speed, and the bag-of-specials refers to modules that slightly increase
inference time but bring higher performance returns. The key features include the Cross Stage Partial
(CSP) structure[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and the Mish activation function[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], both adopted in the backbone network. CSPNet
addresses the repetition of gradient information found in other large neural network
frameworks. It integrates the gradient changes into the feature map, so
it can effectively reduce the number of parameters and FLOPs of the model.
      </p>
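      <p>For reference, the Mish activation mentioned above is defined as x · tanh(softplus(x)). A minimal NumPy sketch follows; the function name and the use of np.logaddexp for a numerically stable softplus are choices made for this example.</p>
      <preformat>
```python
import numpy as np


def mish(x):
    """Mish activation: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    softplus = np.logaddexp(0.0, x)  # stable ln(1 + exp(x)) for large |x|
    return x * np.tanh(softplus)


x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(mish(x))  # smooth, slightly negative for negative inputs
```
      </preformat>
      <p>Unlike ReLU, the function is smooth and lets small negative values pass, while behaving almost like the identity for large positive inputs.</p>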
      <p>
        Therefore, compared with other YOLO models, YOLOv4 has higher accuracy while
maintaining a higher inference speed. Furthermore, YOLOv4 is more suitable for training on a
single GPU. YOLOv4 uses CSPDarknet-53 as the backbone. For the
neck, the authors also use tricks from YOLOv3-SPP, including a modified version of spatial pyramid
pooling (SPP)[
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], multi-scale prediction, a modified path aggregation network (PANet)[
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]
instead of FPN, and an improved Spatial Attention Module (SAM)[
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Moreover, anchor boxes are
used in the head.
      </p>
      <p>
        Evaluated on the MS COCO dataset test-dev 2017, YOLOv4 achieves 41.2% AP and 62.8%
AP<sub>50</sub> at over 96 FPS on an NVIDIA Tesla V100.
      </p>
      <p>2.5. YOLOv5</p>
      <p>
        The basic structure of YOLOv5[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is similar to YOLOv4. The significant difference is the scaling
based on different channel numbers. YOLOv5 provides models at five scales, from small to large: YOLOv5-N (nano),
S (small), M (medium), L (large), and X (extra large). YOLOv5 is
developed in PyTorch, which makes it easier to deploy on hardware than YOLOv4. As of this writing,
an official paper has yet to be published for YOLOv5. According to the homepage, YOLOv5 has
been updated to its seventh release. The latest version can handle classification
and instance segmentation tasks and speeds up training.
      </p>
      <p>
        Evaluated on the MS COCO dataset test-dev 2017 with an image size of 1536×1536, YOLOv5x6
obtains 55.8% AP. With an image size of 640×640, YOLOv5x achieves 50.7% AP and exceeds
200 FPS on an NVIDIA Tesla V100.
      </p>
      <p>2.6. YOLOR</p>
      <p>
        The YOLOv4 team released You Only Learn One Representation (YOLOR)[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in 2021. They
use implicit knowledge to address the fact that features extracted from previously trained
convolutional neural networks are often poorly adaptable to other problems.
      </p>
      <p>Human beings can acquire explicit knowledge through regular learning and implicit
knowledge through subconscious learning. Even things that people have not seen can be judged by
experience. YOLOR proposes a unified network combining implicit and explicit knowledge to
give the network a learning ability similar to that of the human brain. YOLOR can be applied in many
fields, such as segmentation and detection.</p>
      <p>
        Evaluated on the MS COCO dataset test-dev 2017, YOLOR-D6 achieves 55.4% AP and 73.3%
AP<sub>50</sub> at 30 FPS on an NVIDIA Tesla V100. The results demonstrate that the performance of all
tasks improves after introducing implicit knowledge into the neural network.
      </p>
      <p>2.7. YOLOX</p>
      <p>
        YOLOX[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] was released by Megvii Technology in 2021. The model is based on YOLOv3 and
implemented in PyTorch. Compared with other YOLO models, YOLOX has three main changes: a decoupled head,
an anchor-free design, and an advanced label assignment strategy (SimOTA).
      </p>
      <p>
        Decoupled head: The conflict between classification and regression tasks is an unavoidable
problem[
        <xref ref-type="bibr" rid="ref27">27</xref>
        ][
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. According to the authors’ experimental analysis, the coupled detection head
may harm performance, so it is replaced with a decoupled head. The decoupled
head is divided into two branches, one for regression and the other for classification.
This structure speeds up the convergence of the network.
      </p>
      <p>
        Anchor-free: Although the anchor mechanism works well for specific domains, it increases
the complexity of the detection head and may cause delays when deployed on edge hardware.
Inspired by the target detection models of FCOS[
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], YOLOX uses an anchor-free mechanism
that reduces the number of parameters and GFLOPs of the detector and makes the model faster.
      </p>
      <p>
        SimOTA: Based on the research of OTA[
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], YOLOX reconsiders label assignment from a
global perspective and formulates the assignment process as an Optimal Transport
(OT) problem. The SimOTA technique proposed by YOLOX achieves better performance with
reduced training time.
      </p>
      <p>
        Evaluated on the MS COCO dataset test-dev 2017, YOLOX-L achieves
50.1% AP at a speed of 68.9 FPS on an NVIDIA Tesla V100.
      </p>
      <p>2.8. YOLOv6</p>
      <p>
        The Meituan Vision AI Department released YOLOv6[
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] in 2022. The overall structure follows
YOLOv4 but uses more advanced mechanisms. Firstly, it uses a new efficient backbone
developed based on RepVGG[
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. Secondly, the neck of YOLOv6 adopts PANet[
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], which is
enhanced to obtain RepPAN. YOLOv6’s architecture is similar to that of YOLOX:
it also uses an Efficient Decoupled Head and anchor-free technology. In addition, YOLOv6
uses a self-distillation strategy to reduce the cost of inference. Similar to YOLOv5, YOLOv6
provides multiple versions to facilitate model quantization and hardware deployment. It
is also suitable for application in the industrial field.
      </p>
      <p>
        Evaluated on the MS COCO dataset test-dev 2017, YOLOv6-L achieves an AP of 52.5% and an
AP<sub>50</sub> of 70.0% on an NVIDIA Tesla T4 with TensorRT.
      </p>
      <p>2.9. YOLOv7</p>
      <p>
        The research team of YOLOv4 and YOLOR proposed YOLOv7[
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] in 2022. YOLOv7 outperforms all
known object detectors in speed and accuracy in the range from 5 FPS to 160 FPS. Like YOLOv4,
YOLOv7 is trained from scratch on the MS COCO dataset.
      </p>
      <p>
        The main changes in the architecture of YOLOv7 are the extended efficient layer aggregation
network (E-ELAN), an improvement of ELAN, and model scaling for concatenation-based models.
Stacking more computational blocks indefinitely may destroy the stable state of the
network. ELAN[
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] enables a deeper network to learn and converge efficiently by controlling
the shortest and longest gradient paths. E-ELAN, proposed by YOLOv7, further uses expand,
shuffle, and merge cardinality to continuously enhance the learning
ability of the network without destroying the original gradient path.
      </p>
      <p>Unlike architectures such as ResNet, applying model scaling to a concatenation-based
architecture changes the ratio of input and output channels, which makes the model
inefficient on hardware. YOLOv7 proposes a compound scaling approach that maintains the
properties the model had at its initial design and thus preserves the optimal structure.
It does so by scaling the width factor of the transition
layers with the same amount of variation.</p>
      <p>
        Evaluated on the MS COCO dataset test-dev 2017 with an input image size of 640×640, YOLOv7
achieves an AP of 51.4% and an AP<sub>50</sub> of 69.7% at 161 FPS on an NVIDIA Tesla V100.
      </p>
      <p>2.10. YOLOv8</p>
      <p>
        The YOLOv5 team released YOLOv8[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] in January 2023, and an official paper has yet to be
published. The main improvements are as follows:
      </p>
      <p>Backbone: The backbone of YOLOv8 is CSPDarknet-53. Compared with YOLOv5, the C3 module
is replaced by the C2f module, which has a richer gradient flow, and different channel numbers are
set for models of different scales to make them more lightweight. Moreover, YOLOv8 still uses
the SPPF module used in YOLOv5.</p>
      <p>Head: The head has two significant improvements compared with YOLOv5. Firstly, it
is replaced with the current mainstream decoupled head, which separates the classification
and detection branches. Secondly, it is changed from anchor-based to anchor-free.</p>
      <p>
        Loss: YOLOv8 abandons the previous IoU matching and unilateral ratio assignment methods
and instead adopts the Task-Aligned Assigner for positive and negative sample matching.
Furthermore, YOLOv8 introduces Distribution Focal Loss (DFL)[
        <xref ref-type="bibr" rid="ref35">35</xref>
        ].
      </p>
      <p>Data augmentation can improve model performance, but using mosaic augmentation
throughout training may have adverse effects. YOLOv8 turns off mosaic augmentation in the last
ten epochs, which improves accuracy. To meet the needs of different scenarios, YOLOv8 provides
models of different sizes at the N / S / M / L / X scales.</p>
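      <p>The mosaic schedule described above amounts to a simple epoch check. The following sketch assumes zero-based epoch indices; the function and parameter names are hypothetical and are not taken from the YOLOv8 code base.</p>
      <preformat>
```python
def use_mosaic(epoch, total_epochs, close_mosaic=10):
    """Return True while mosaic augmentation should still be applied.

    Mosaic is disabled for the final `close_mosaic` epochs of training.
    """
    return total_epochs - close_mosaic > epoch


total_epochs = 100  # assumed training length
schedule = [use_mosaic(e, total_epochs) for e in range(total_epochs)]
print(sum(schedule))  # number of epochs trained with mosaic enabled
```
      </preformat>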
      <p>Evaluated on the MS COCO dataset test-dev 2017, YOLOv8X achieves an AP of 53.9% at 283
FPS on an NVIDIA Tesla A100 with TensorRT.</p>
      <p>2.11. Summary</p>
      <p>The architecture of the YOLO families is shown in Table??, including the backbone, neck, and
anchor. YOLOv2 first proposes Darknet-19 as the backbone, and YOLOv3 deepens the 19
convolutional layers to 53 layers. Almost all subsequent YOLO models use Darknet-53 or an
improved version of Darknet-53 as the backbone architecture. The initial YOLO does not use
anchors; they were introduced in YOLOv2 and improved the prediction accuracy, until YOLOX adopted
an anchor-free approach and performed well. Since then, subsequent versions of YOLO have dropped
the use of anchors.</p>
      <p>Table 2 shows the performance, image size, FPS, parameters, and GPU used for mainstream
YOLO versions. Among them, YOLOv1 and YOLOv2 were tested on PASCAL VOC2007. When
YOLOv3 was released, the benchmark for object detection had changed from PASCAL VOC to
Microsoft COCO. Therefore, the performance of subsequent versions is tested on Microsoft COCO.
In addition, the YOLOv6 and YOLOv8x models are quantized by TensorRT. The parameters
in Table 2 show that YOLO increasingly favors lighter models. In YOLOv8, the AP<sub>50</sub> of YOLOv8m is only
3.7% lower than that of YOLOv8x, while its parameter count, 25.9M, is only about 38% of that of
YOLOv8x. Across the many versions of YOLO, researchers can choose models according to their
needs to find a balance between accuracy and speed.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Improvements and Applications</title>
      <p>
        YOLO models have been used in industry, AIoT, health care, and the protection of cultural
heritage. In industry, many YOLO applications exist for robots, traffic, and personal protective
equipment. For empty-dish recycling robots, Yue et al.[
        <xref ref-type="bibr" rid="ref36">36</xref>
        ][
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] address the large
parameter storage required by traditional object detection models and propose a lightweight
dish detection model based on YOLOX. Ge et al.[
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]
use the lightweight YOLO-GG to recycle empty plates efficiently. For traffic,
He et al.[
        <xref ref-type="bibr" rid="ref39">39</xref>
        ] design FE-YOLO, a flexible and efficient one-stage object detection network for
the rail transit scene. Li et al.[
        <xref ref-type="bibr" rid="ref40">40</xref>
        ] optimize the YOLOv5 model and propose a helmet detection
system to ensure the safety of workers.
      </p>
      <p>
        The field of AIoT has also attracted much attention. AIoT integrates artificial intelligence
(AI) technology and the Internet of Things (IoT) in practical applications. Morioka et al.[
        <xref ref-type="bibr" rid="ref41">41</xref>
        ]
propose a YOLO-based Android system for ancient text recognition, implemented by
communicating with a server that hosts the AI recognition models. Building smart city traffic
management systems based on artificial intelligence and big data has become a trend. Liu et
al.[
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] use YOLOv3 to detect vehicles and estimate their length by image processing.
      </p>
      <p>
        YOLO is also widely used in health care. The authors of [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ] propose a system based on YOLOv4-tiny
that was quantized and deployed on a Jetson Nano at the beginning of the COVID-19 pandemic. The system
identifies the wearing status of masks and measures social distance, which effectively helps protect
people’s health. Zhuang et al.[
        <xref ref-type="bibr" rid="ref44">44</xref>
        ] combine YOLO and the improved two-dimensional continuity
equation for cardiac Vector Flow Mapping (VFM) analysis and evaluation.
      </p>
      <p>
        In addition, ancient books record much information, and decoding these classic books benefits
the study of history, politics, and culture. Liu[
        <xref ref-type="bibr" rid="ref45">45</xref>
        ] and Fujikawa[
        <xref ref-type="bibr" rid="ref46">46</xref>
        ] use YOLO to detect and
identify Oracle Bone Inscription (OBI).
      </p>
      <p>To sum up, YOLO models have been shown to play an essential role in various fields. It has
become a trend to optimize the YOLO model for different applications.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Compression methods for YOLO models</title>
      <p>
        The YOLO model has become more complex in pursuit of better performance. Model
compression has become the focus of research to facilitate deployment on edge devices. This
section introduces several techniques for model compression, such as model pruning, parameter
quantization, knowledge distillation, and lightweight model design[
        <xref ref-type="bibr" rid="ref47">47</xref>
        ].
      </p>
      <p>
        Model Pruning: Model pruning searches for redundant layers or channels in
the model and deletes them with little or no impact on performance. By pruning the
YOLOv3-tiny network, Shi et al.[
        <xref ref-type="bibr" rid="ref48">48</xref>
        ] reduce the network computation by 68.7%. Wu et al.[
        <xref ref-type="bibr" rid="ref49">49</xref>
        ] propose
a real-time apple flower detection method using a channel-pruned YOLOv4 model, in which the number of
parameters is reduced by 96.74%.
      </p>
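      <p>One common way to find redundant channels is to rank convolution filters by their L1 norm and keep the largest ones. The sketch below illustrates that criterion only; it is an assumption chosen for the example and not necessarily the criterion used in the works cited above. The function name and the keep ratio are hypothetical.</p>
      <preformat>
```python
import numpy as np


def prune_channels(weight, keep_ratio):
    """Keep the output channels with the largest L1 norm.

    weight has shape (out_channels, in_channels, kh, kw).
    Returns the pruned weight and the sorted indices of the kept channels.
    """
    l1 = np.abs(weight).sum(axis=(1, 2, 3))  # one importance score per channel
    n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
    kept = np.sort(np.argsort(l1)[::-1][:n_keep])
    return weight[kept], kept


rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 4, 3, 3))  # hypothetical conv layer weights
weight[[1, 5]] *= 0.01                  # make two channels nearly redundant
pruned, kept = prune_channels(weight, keep_ratio=0.75)
print(pruned.shape, kept)
```
      </preformat>
      <p>A complete pruning pipeline would also remove the matching input channels of the following layer and fine-tune the network afterwards; both steps are left out of this sketch.</p>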
      <p>
        Parameter Quantization: Parameter quantization converts floating-point calculations to
low-bit integer calculations, such as converting float32 to int8 or int4. We quantize the
YOLOv4 model to 8 bits through TensorRT and deploy it on the Jetson Nano, where it achieves
real-time detection of dirty eggs[
        <xref ref-type="bibr" rid="ref50">50</xref>
        ]. Wang et al.[
        <xref ref-type="bibr" rid="ref51">51</xref>
        ] deploy the pruned YOLOv3
detection model on the FPGA, and the model size is reduced by 80% with little change in
accuracy.
      </p>
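      <p>The float32-to-int8 conversion can be illustrated with a symmetric per-tensor scheme. This is a minimal sketch under that assumption; it does not reproduce TensorRT's calibration procedure, and the function names are hypothetical.</p>
      <preformat>
```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor quantization of float32 values to int8."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # float step per int8 level
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale


weights = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale)
print(np.max(np.abs(weights - restored)))  # rounding error, bounded by scale / 2
```
      </preformat>
      <p>Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step.</p>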
      <p>
        Knowledge Distillation: The teacher network is a complex pre-trained network, and the
student network is a simple, small network. By transferring knowledge from the teacher network,
a student network that is more suitable for inference can be obtained. Chen et al.[
        <xref ref-type="bibr" rid="ref52">52</xref>
        ]
propose a lightweight ship detector based on knowledge distillation. Xing et al.[
        <xref ref-type="bibr" rid="ref53">53</xref>
        ] use ResNet101 as
the teacher network and DD-YOLO as the student network, reducing the model complexity to
61.4%, which is more suitable for mobile deployment.
      </p>
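      <p>A widely used form of knowledge transfer matches the softened output distributions of the two networks. The sketch below shows this classic soft-target loss as an assumption for illustration; the cited detectors may distill features or box predictions instead. The logits and the temperature of 4 are hypothetical.</p>
      <preformat>
```python
import numpy as np


def softmax(logits, temperature):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T squared."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)


teacher = np.array([[5.0, 1.0, -2.0]])
good_student = np.array([[4.5, 1.2, -1.8]])  # close to the teacher
poor_student = np.array([[-2.0, 1.0, 5.0]])  # ranks the classes in reverse
print(distillation_loss(good_student, teacher))
print(distillation_loss(poor_student, teacher))
```
      </preformat>
      <p>During training this term is typically added to the ordinary task loss, so the student learns from both the ground truth and the teacher.</p>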
      <p>
        Lightweight Model Design: Lightweight DNN model design refers to redesigning
an existing deep neural network structure to reduce the number of parameters and the
computational complexity. Liu et al.[
        <xref ref-type="bibr" rid="ref55">54</xref>
        ] replace the backbone of YOLOv3 with ShuffleNet to realize
real-time vehicle detection. Liu et al.[
        <xref ref-type="bibr" rid="ref56">55</xref>
        ] replace the backbone of YOLOv4 with MobileNetV3,
which improves the accuracy of a pedestrian detection network.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper summarizes the models of the YOLO series and provides a detailed analysis of the
critical features of each model. Afterward, the application of YOLO in different scenarios is
introduced. Finally, the methods of YOLO model compression are briefly described, together with
examples of compressed YOLO models in application.</p>
      <p>Although the YOLO series leads in the speed-accuracy balance in the field of object
detection, most of the work so far targets general-purpose computers. In future work, how to
make YOLO lighter and faster is worth exploring, especially for embedded devices such as the
Nvidia Jetson Nano and Raspberry Pi. Moreover, deploying DNN models on FPGAs to make them run
more efficiently is likely to become a trend. In addition, technology that combines AI and IoT
will bring more convenience to human life. Finally, since the release of YOLOv4, integrating
various advanced algorithms has become an essential way to develop the YOLO algorithm. As the
YOLO framework develops, YOLO is becoming more versatile and powerful and will be applied
in a broader range of fields.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dalal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Triggs</surname>
          </string-name>
          ,
          <article-title>Histograms of oriented gradients for human detection</article-title>
          ,
          <source>in: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05)</source>
          , volume
          <volume>1</volume>
          , San Diego, CA, USA,
          <year>2005</year>
          , pp.
          <fpage>886</fpage>
          -
          <lpage>893</lpage>
          .
          doi:10.1109/CVPR.2005.177.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Cristianini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shawe-Taylor</surname>
          </string-name>
          ,
          <article-title>An introduction to support vector machines and other kernel-based learning methods</article-title>
          , Cambridge University Press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>60</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Donahue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <article-title>Rich feature hierarchies for accurate object detection and semantic segmentation</article-title>
          ,
          <source>in: 2014 IEEE Conference on Computer Vision and Pattern Recognition</source>
          , Columbus, OH, USA,
          <year>2014</year>
          , pp.
          <fpage>580</fpage>
          -
          <lpage>587</lpage>
          .
          doi:10.1109/CVPR.2014.81.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Divvala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <article-title>You only look once: Unified, real-time object detection</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , Las Vegas, NV, USA,
          <year>2016</year>
          , pp.
          <fpage>779</fpage>
          -
          <lpage>788</lpage>
          .
          doi:10.1109/CVPR.2016.91.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Glenn</given-names>
            <surname>Jocher</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ayush</given-names>
            <surname>Chaurasia</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jing</given-names>
            <surname>Qiu</surname>
          </string-name>
          , Yolo by ultralytics,
          <year>2023</year>
          . URL: https://github.com/ultralytics/ultralytics.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.-H.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y. M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>You only learn one representation: Unified network for multiple tasks</article-title>
          ,
          <source>arXiv preprint arXiv:2105.04206</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Yolox: Exceeding yolo series in 2021</article-title>
          ,
          <source>arXiv preprint arXiv:2107.08430</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <article-title>A survey of deep learning-based object detection</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>79</volume>
          (
          <year>2020</year>
          ).
          doi:10.1007/s11042-020-08976-6.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Glenn</given-names>
            <surname>Jocher</surname>
          </string-name>
          , Yolov5 by ultralytics,
          <year>2020</year>
          . URL: https://github.com/ultralytics/yolov5.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Terven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cordova-Esparza</surname>
          </string-name>
          ,
          <article-title>A comprehensive review of yolo: From yolov1 to yolov8 and beyond</article-title>
          ,
          <source>arXiv preprint arXiv:2304.00501</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <article-title>Object detection in 20 years: A survey</article-title>
          ,
          <source>Proceedings of the IEEE</source>
          <volume>111</volume>
          (
          <year>2023</year>
          ).
          doi:10.1109/JPROC.2023.3238524.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sermanet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Anguelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vanhoucke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          ,
          <article-title>Going deeper with convolutions</article-title>
          ,
          <source>in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , Boston, MA, USA,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
          doi:10.1109/CVPR.2015.7298594.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>O.</given-names>
            <surname>Russakovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satheesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karpathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , et al.,
          <article-title>Imagenet large scale visual recognition challenge</article-title>
          ,
          <source>International Journal of Computer Vision</source>
          <volume>115</volume>
          (
          <year>2015</year>
          ).
          doi:10.1007/s11263-015-0816-y.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <article-title>Yolo9000: Better, faster, stronger</article-title>
          ,
          <source>in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , Honolulu, HI, USA,
          <year>2017</year>
          , pp.
          <fpage>6517</fpage>
          -
          <lpage>6525</lpage>
          .
          doi:10.1109/CVPR.2017.690.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          ,
          <source>arXiv preprint arXiv:1409.1556</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <article-title>Fast r-cnn</article-title>
          ,
          <source>in: 2015 IEEE International Conference on Computer Vision (ICCV)</source>
          , Santiago, Chile,
          <year>2015</year>
          , pp.
          <fpage>1440</fpage>
          -
          <lpage>1448</lpage>
          .
          doi:10.1109/ICCV.2015.169.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <article-title>Yolov3: An incremental improvement</article-title>
          ,
          <source>CoRR abs/1804.02767</source>
          (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1804.02767. arXiv:1804.02767.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , Las Vegas, NV, USA,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
          doi:10.1109/CVPR.2016.90.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hariharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <article-title>Feature pyramid networks for object detection</article-title>
          ,
          <source>in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , Honolulu, HI, USA,
          <year>2017</year>
          , pp.
          <fpage>936</fpage>
          -
          <lpage>944</lpage>
          .
          doi:10.1109/CVPR.2017.106.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bochkovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y. M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>Yolov4: Optimal speed and accuracy of object detection</article-title>
          ,
          <source>arXiv preprint arXiv:2004.10934</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y. Mark</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.-H.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <article-title>Cspnet: A new backbone that can enhance learning capability of cnn</article-title>
          ,
          <source>in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</source>
          , Seattle, WA, USA,
          <year>2020</year>
          , pp.
          <fpage>1571</fpage>
          -
          <lpage>1580</lpage>
          .
          doi:10.1109/CVPRW50498.2020.00203.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <article-title>Mish: A self regularized non-monotonic activation function</article-title>
          ,
          <source>arXiv preprint arXiv:1908.08681</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Spatial pyramid pooling in deep convolutional networks for visual recognition</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>37</volume>
          (
          <year>2015</year>
          ).
          doi:10.1109/TPAMI.2015.2389824.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <article-title>Path aggregation network for instance segmentation</article-title>
          ,
          <source>in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          , Salt Lake City, UT, USA,
          <year>2018</year>
          , pp.
          <fpage>8759</fpage>
          -
          <lpage>8768</lpage>
          .
          doi:10.1109/CVPR.2018.00913.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S.</given-names>
            <surname>Woo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Kweon</surname>
          </string-name>
          ,
          <article-title>Cbam: Convolutional block attention module</article-title>
          ,
          <source>in: Computer Vision - ECCV 2018</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>19</lpage>
          .
          doi:10.1007/978-3-030-01234-2_1.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>G.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Revisiting the sibling head in object detector</article-title>
          ,
          <source>in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , Seattle, WA, USA,
          <year>2020</year>
          , pp.
          <fpage>11560</fpage>
          -
          <lpage>11569</lpage>
          .
          doi:10.1109/CVPR42600.2020.01158.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <article-title>Rethinking classification and localization for object detection</article-title>
          ,
          <source>in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , Seattle, WA, USA,
          <year>2020</year>
          , pp.
          <fpage>10183</fpage>
          -
          <lpage>10192</lpage>
          .
          doi:10.1109/CVPR42600.2020.01020.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Fcos: Fully convolutional one-stage object detection</article-title>
          ,
          <source>in: 2019 IEEE/CVF International Conference on Computer Vision (ICCV)</source>
          , Seoul, Korea (South),
          <year>2019</year>
          , pp.
          <fpage>9626</fpage>
          -
          <lpage>9635</lpage>
          .
          doi:10.1109/ICCV.2019.00972.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yoshie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Ota: Optimal transport assignment for object detection</article-title>
          ,
          <source>in: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , Nashville, TN, USA,
          <year>2021</year>
          , pp.
          <fpage>303</fpage>
          -
          <lpage>312</lpage>
          .
          doi:10.1109/CVPR46437.2021.00037.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
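          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nie</surname>
          </string-name>
          , et al.,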
          <article-title>Yolov6: A single-stage object detection framework for industrial applications</article-title>
          ,
          <source>arXiv preprint arXiv:2209.02976</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
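          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,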
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Repvgg: Making vgg-style convnets great again</article-title>
          , in:
          <source>2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , Nashville, TN, USA,
          <year>2021</year>
          , pp.
          <fpage>13728</fpage>
          -
          <lpage>13737</lpage>
          .
          doi:
          <pub-id pub-id-type="doi">10.1109/CVPR46437.2021.01352</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bochkovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y. M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors</article-title>
          ,
          <source>arXiv preprint arXiv:2207.02696</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y. M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.-H.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <article-title>Designing network design strategies through gradient path analysis</article-title>
          ,
          <source>arXiv preprint arXiv:2211.04800</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shimizu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kawamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Yolo-gd: A deep learning-based object detection algorithm for empty-dish recycling robots</article-title>
          ,
          <source>Machines</source>
          <volume>10</volume>
          (
          <year>2022</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.3390/machines10050294</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>An ultralightweight object detection network for empty-dish recycling robots</article-title>
          ,
          <source>IEEE Transactions on Instrumentation and Measurement</source>
          <volume>72</volume>
          (
          <year>2023</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1109/TIM.2023.3241078</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>A high-efficiency dirty-egg detection system based on yolov4 and tensorrt</article-title>
          , in:
          <source>2022 International Conference on Advanced Mechatronic Systems (ICAMechS)</source>
          , Toyama, Japan,
          <year>2022</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>D.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <article-title>Rail transit obstacle detection based on improved cnn</article-title>
          ,
          <source>IEEE Transactions on Instrumentation and Measurement</source>
          <volume>70</volume>
          (
          <year>2021</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1109/TIM.2021.3116315</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <article-title>Toward efficient safety helmet detection based on yolov5 with hierarchical positive sample selection and box density filtering</article-title>
          ,
          <source>IEEE Transactions on Instrumentation and Measurement</source>
          <volume>71</volume>
          (
          <year>2022</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1109/TIM.2022.3169564</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>T.</given-names>
            <surname>Morioka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aravinda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>An ai-based android application for ancient documents text recognition</article-title>
          , in:
          <source>Proceedings of the 2021 International Symposium on Advanced Technologies and Applications in the Internet of Things</source>
          , Virtual, Kusatsu, Japan,
          <year>2021</year>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <article-title>Study of accurate and fast estimation method of vehicle length based on yolos</article-title>
          , in:
          <source>2020 IEEE International Conference on Artificial Intelligence and Information Systems (ICAIIS)</source>
          , Dalian, China,
          <year>2020</year>
          , pp.
          <fpage>118</fpage>
          -
          <lpage>121</lpage>
          .
          doi:
          <pub-id pub-id-type="doi">10.1109/ICAIIS49377.2020.9194930</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Ai-based prevention embedded system against covid-19 in daily life</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>202</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N. J.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>Cardiac vfm visualization and analysis based on yolo deep learning model and modified 2d continuity equation</article-title>
          ,
          <source>Computerized Medical Imaging and Graphics</source>
          <volume>82</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>G.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Spatial pyramid block for oracle bone inscription detection</article-title>
          , in:
          <source>Proceedings of the 2020 9th International Conference on Software and Computer Applications</source>
          , New York, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>140</lpage>
          .
          doi:
          <pub-id pub-id-type="doi">10.1145/3384544.3384561</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fujikawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aravinda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Prabhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Recognition of oracle bone inscriptions by using two deep learning models</article-title>
          ,
          <source>International Journal of Digital Humanities</source>
          (
          <year>2022</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1007/s42803-022-00044-9</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Model compression for deep neural networks: A survey</article-title>
          ,
          <source>Computers</source>
          <volume>12</volume>
          (
          <year>2023</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.3390/computers12030060</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          ,
          <article-title>An attribution-based pruning method for real-time mango detection with yolo network</article-title>
          ,
          <source>Computers and Electronics in Agriculture</source>
          <volume>169</volume>
          (
          <year>2020</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1007/s11263-014-0733-5</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Using channel pruning-based yolo v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments</article-title>
          ,
          <source>Computers and Electronics in Agriculture</source>
          <volume>178</volume>
          (
          <year>2022</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1016/j.compag.2020.105742</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>A high-efficiency dirty-egg detection system based on yolov4 and tensorrt</article-title>
          , in:
          <source>2021 International Conference on Advanced Mechatronic Systems (ICAMechS)</source>
          , Tokyo, Japan,
          <year>2021</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>80</lpage>
          .
          doi:
          <pub-id pub-id-type="doi">10.1109/ICAMechS54019.2021.9661509</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Design and acceleration of field programmable gate array-based deep learning for empty-dish recycling robots</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>12</volume>
          (
          <year>2022</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.3390/app12147337</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Learning slimming sar ship object detector through network pruning and knowledge distillation</article-title>
          ,
          <source>IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing</source>
          <volume>14</volume>
          (
          <year>2021</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1109/JSTARS.2020.3041783</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <article-title>Dd-yolo: An object detection method combining knowledge distillation and differentiable architecture search</article-title>
          ,
          <source>IET Computer Vision</source>
          <volume>16</volume>
          (
          <year>2022</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1049/cvi2.12097</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Vehicle detection and ranging using two different focal length cameras</article-title>
          ,
          <source>Journal of Sensors</source>
          <volume>14</volume>
          (
          <year>2020</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1155/2020/4372847</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          , et al.,
          <article-title>Research on pedestrian detection algorithm based on mobilenet-yolo</article-title>
          ,
          <source>Computational Intelligence and Neuroscience</source>
          <volume>2022</volume>
          (
          <year>2022</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1155/2022/8924027</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>