<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Optimal size reduction methodology for YOLO-based underwater object detectors based on knowledge distillation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victor Sineglazov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykhailo Savchenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Z. Zgurovsky</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Aviation University</institution>
          ,
          <addr-line>1, Prospect Liubomyra Huzara, Kyiv, 03058</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Technical University of Ukraine</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>0</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>The paper introduces a novel methodology for reducing the size of YOLO-based underwater object detectors so that they can be deployed on edge hardware. A feature extraction layer light-weighting technique is used to compress the model with minimal impact on performance. Two new object detection network topologies are created, suitable for use as student networks in knowledge distillation tasks. A knowledge distillation algorithm with a temperature decay strategy is developed to mitigate the performance loss caused by model compression without inflating the parameter count. Object detection models based on the proposed methodology are tested on the Underwater Target Detection Algorithm Competition 2020 dataset, providing higher accuracy and faster runtime than existing solutions.</p>
      </abstract>
      <kwd-group>
        <kwd>underwater object detection</kwd>
        <kwd>autonomous underwater vehicles</kwd>
        <kwd>neural networks</kwd>
        <kwd>deep learning</kwd>
        <kwd>knowledge distillation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Due to the rapid development of deep learning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], optimization algorithms [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and complex neural
network topologies [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ], developing intellectual systems for autonomous underwater vehicles
(AUVs) is gaining more and more attention. These applications include biodiversity exploration,
pollution monitoring, demining and surveillance operations, rescue missions, and other critical tasks
in underwater environments.
      </p>
      <p>To operate in real-time mode, the artificial intelligence model is usually deployed on a separate
device, typically a single-board computer, which is then installed on the AUV. This approach allows
for greater modularity in system design and easier upgrades. However, this method has its own set
of challenges and limitations. The edge devices chosen for this purpose are selected primarily for
their economic and power efficiency, which is crucial for underwater operations. In such cases,
performance is frequently traded off for extended battery runtime and lower overall system costs,
potentially limiting the capabilities of installed software.</p>
      <p>Another major problem is that detecting targets in underwater environments is significantly more
challenging than generic object detection on land, due to the overall lower image quality of
underwater datasets. The degradation in image quality is caused by the presence of suspended particles
in the water, which introduces significant color distortion and color cast. Another issue, which further
complicates the task, is motion blur, caused by the continuous movement of the underwater vehicle.
Additionally, there are substantial differences in light emission and propagation between underwater
and land scenes.</p>
      <p>To overcome these obstacles, researchers have developed a variety of advanced techniques.
Common approaches typically involve implementing a series of pre-processing steps to enhance
image quality before feeding it into the detection system, designing and deploying deeper neural
networks capable of extracting meaningful features from degraded inputs, and integrating
specialized network blocks that enhance feature representation abilities. Often, combinations of
these methods are used, which results in complex, multi-stage processing pipelines. While these
solutions have demonstrated impressive results in improving underwater object detection, they come
with a significant drawback: increased overall network complexity, which translates directly into
higher hardware requirements, larger model sizes, and increased power consumption.</p>
      <p>The complexity of these advanced models makes it infeasible to deploy them on the edge devices
typically used in AUVs. Thus, there is a need for research focused on size reduction strategies
specifically tailored for underwater object detectors. The ultimate goal of such research should be to
develop a comprehensive methodology that enables significant reductions in the size of object
detection neural networks while simultaneously preserving their accuracy and operational speed.</p>
      <p>The successful development of such methodologies could potentially lead to the design of smaller,
more efficient AUVs capable of accessing environments that are currently out of reach. Considering
this, the present paper aims to explore novel approaches to network size reduction for underwater
object detection models, addressing this crucial challenge and contributing to the ongoing
advancement of AUV technologies and their applications in marine science, environmental
monitoring, and underwater operations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <sec id="sec-2-1">
        <title>2.1. Object detection algorithms</title>
        <p>
          Development of generic object detection neural network topologies started in 2014, when the R-CNN model
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] was first introduced. In that timeframe, two-stage object detection methodologies were widely
used, with one network responsible for region proposals, and another network handling object
localization and classification. This approach is known to provide high accuracy, but the detection
speed is slow due to a large number of computations caused by redundant bounding boxes.
Two-stage object detectors have undergone a series of improvements, with Fast R-CNN [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and Faster
R-CNN [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] being significantly more effective than the original model, but the performance was still
unsuitable for real-time applications.
        </p>
        <p>
          Later in 2015, one-stage object detection algorithms such as Single-Shot Detector (SSD) and You
Only Look Once (YOLO) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] were introduced. In these methods, a single convolutional neural
network is responsible for predicting bounding boxes across all classes simultaneously by splitting
the image into an S x S grid, determining the tile containing the center of an object, and handling
confidence score calculations within it. This approach streamlines the detection process and leads to
significant speed improvements at the cost of accuracy.
        </p>
        <p>
          To improve both speed and detection accuracy, the YOLO series of object detectors has undergone
many updates. YOLOv2 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] introduced batch normalization and removed dropout layers. YOLOv3
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] used residual connections in feature extraction layers and feature pyramid network (FPN) for
multi-scale feature aggregation. YOLOv4 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] introduced cross-partial connections. YOLOv5 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]
offered automated hyperparameter search. YOLOX [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] introduced anchor-less design, decoupled
classification and regression head, advanced augmentations and label assignment strategy. YOLOv7
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] offered new layer aggregation and model scaling strategies. YOLOv8 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] implemented new
augmentation strategies and offered improvements for model light-weighting. YOLOv9 [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] uses
programmable gradient information to deal with information loss when data is transmitted through
network layers. YOLOv10 [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] offered dual label assignment strategy to omit non-maximum
suppression strategy and introduced tweaks for lower latency.
        </p>
      <p>Generally, modern versions of the YOLO network share the same topology, where a convolutional
neural network (backbone) is responsible for feature extraction at multiple scales, a feature pyramid
network (neck) is used to aggregate multi-scale features, mixing contextual and detailed information,
and a detection head is responsible for regression and classification tasks.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Underwater object detection frameworks</title>
        <p>To overcome the constraints presented by the lower quality of underwater images, the unique features of
underwater targets, such as small size and dense placement, and computational constraints, specific
underwater detection frameworks have been developed. Typically, these frameworks can be divided into
three categories by the way they achieve efficiency improvements for detecting targets in
underwater environments.</p>
        <p>
          The first category comprises object detectors with higher feature representation abilities, achieved through
larger network capacity or the introduction of specific blocks for better feature extraction. Attention
mechanisms are often used to enhance the feature extraction capabilities of the model by ensuring that
backbone layers of the network focus on more relevant features [
          <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
          ]. Liu et al. suggested
introducing transformer blocks into the object detector backbone, based on the assumption that using
heterogeneous architectures enhances the variability of extracted features [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Another line of work
focuses on data augmentation and series of pre-processing steps to reach higher object detection
accuracy by raising the quality and quantity of input data [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. However, these methodologies
share a common problem: overall network complexity. Although using a larger network,
extra blocks and pre-processing is useful for accuracy, the high parameter count and latency
make it infeasible to run on edge hardware, so real-world usage of this type of framework is
restricted to pre-collected data.
        </p>
        <p>
          The second category of underwater object detection frameworks includes models that achieve
accuracy gains by focusing on mitigating a specific issue with underwater images, such as small
target size, target overlap, or motion blur [
          <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
          ]. Common approaches include enhancing the feature
map upsampling process, using extra classification heads for smaller objects, or adding extra blocks
such as attention and visual transformers [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. The main problem with frameworks of this type is their
generalization capability. While accuracy is enhanced on datasets exhibiting the problems
targeted by a specific model, the same enhancements may not apply to other datasets, limiting
real-world usage.
        </p>
        <p>
          The third category includes underwater object detection methodologies, which focus on
decreasing the parameter count, size and latency of the model [
          <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
          ]. Light-weighting is typically
done by replacing the feature extraction portion of the model, which in case with YOLO-based
detectors, is responsible for over 50% of the overall computational complexity. The backbone of the object
detector is typically fully or partially replaced with a mobile or light-weight convolutional neural
network, such as FasterNet [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] or GhostNet [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. These changes drastically reduce model size and
improve processing time, making the model more feasible to be used on edge devices, such as AUV
integrated hardware. However, in this case, the speed and size improvements are reached by
decreasing the network capacity, which leads to worse accuracy than by using generic solutions. To
mitigate this effect, researchers typically introduce extra layers to enhance the representation ability
of a network, which introduces extra parameters and lessens efficiency improvements, achieved by
model light-weighting.
        </p>
      <p>This fact has sparked our interest in comparing the performance of light-weight underwater
object detectors and finding an optimal way to mitigate the performance loss, which is an inevitable
side effect of model light-weighting.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Knowledge distillation</title>
        <p>
          Knowledge distillation (KD) is a highly effective technique for boosting the efficiency of a light-weight
student model by forcing it to mimic the outputs of a larger pre-trained teacher model. The earliest
overviews on these methodologies were formulated by Bucilua [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] in 2006, and the term "knowledge
distillation" and modern concept of this process was introduced in 2015 by Hinton [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. Later,
Romero et al. improved the training process and student model performance by using intermediate
representations as hints [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. Zagoruyko and Komodakis suggested using attention transfer to boost
student performance [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ], while Zhang et al. proposed using multiple co-learning student models [34].
        </p>
      <p>While being an efficient technique, KD ensures best results when the teacher and student
models share a similar architecture (e.g. CNN-to-CNN, ViT-to-ViT). As feature heterogeneity
increases in the later layers of a neural network, it becomes harder for a student model to improve
its performance. Works by Touvron et al. [35] and Hao et al. [36] offer more efficient KD
algorithms for heterogeneous architectures; however, the problem is still not fully resolved, and KD
performance is better when the teacher and student models are similar.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Problem statement</title>
      <p>The problem of creating an optimal size reduction methodology for YOLO-based underwater object
detectors involves building two neural networks. Consider a set of training
samples (x_1, y_1), (x_2, y_2), …, (x_N, y_N), where x_i denotes the i-th RGB image matrix of dimensions
H × W × 3, and y_i represents the vector of ground truth bounding box coordinates, which represent
the position of the objects within the image, and class labels for each object within the image. The
teacher network T, parametrized by the weights θ_T, and the student network S, parametrized by
the weights θ_S, are trained to handle the following transformation:
f: x_i ↦ {(B̂_ij, ĉ_ij)},
where B̂_ij is the predicted bounding box for object j in image x_i, and ĉ_ij is the predicted class
probability for the object. The goal is to minimize the discrepancy between the predictions B̂_ij, ĉ_ij
and the ground truth B_ij, c_ij for bounding boxes and class labels, respectively.</p>
      <p>The loss function of the object detector, denoted as ℒ_YOLO, is composed of regression and
classification losses. CIoU [37] is used as the main bounding box regression loss, defined as:
ℒ_CIoU(B, B̂) = 1 − IoU(B, B̂) + ρ²(b, b̂)/c² + α·v,
where B, B̂ are the ground truth and predicted bounding boxes, respectively, and IoU(B, B̂) is the
Intersection over Union, which measures the overlap between the predicted and ground truth
bounding boxes:
IoU(B, B̂) = |B ∩ B̂| / |B ∪ B̂|,
where |B ∩ B̂| is the area of overlap and |B ∪ B̂| is the total area covered by both boxes. ρ(b, b̂) is
the Euclidean distance between the centroids b, b̂, c is the diagonal length of the smallest box
enclosing both bounding boxes, and α is the weight factor balancing the importance of aspect ratio
consistency:
α = v / ((1 − IoU(B, B̂)) + v).
The term v measures the consistency of the aspect ratio between the predicted and ground truth boxes:
v = (4/π²)·(arctan(w/h) − arctan(ŵ/ĥ))²,
where w and h are the width and height of the ground truth bounding box, and ŵ, ĥ are the width
and height of the predicted bounding box.</p>
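      <p>As an illustration, the CIoU components defined above can be sketched in plain Python. This is a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format with function names of our own choosing, not the authors' implementation:</p>

```python
import math

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def ciou_loss(gt, pred):
    """CIoU loss: 1 - IoU + center-distance penalty + aspect-ratio penalty."""
    u = iou(gt, pred)
    # squared Euclidean distance between the box centroids
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    rho2 = (cx_g - cx_p) ** 2 + (cy_g - cy_p) ** 2
    # squared diagonal of the smallest box enclosing both boxes
    ex1, ey1 = min(gt[0], pred[0]), min(gt[1], pred[1])
    ex2, ey2 = max(gt[2], pred[2]), max(gt[3], pred[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # aspect-ratio consistency term v and its weight alpha
    w_g, h_g = gt[2] - gt[0], gt[3] - gt[1]
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    v = (4 / math.pi ** 2) * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2
    denom = (1 - u) + v
    alpha = v / denom if denom > 0 else 0.0
    return 1 - u + rho2 / c2 + alpha * v
```

      <p>Identical boxes yield a loss of zero; any offset or aspect-ratio mismatch adds a positive penalty.</p>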
      <p>Classification is governed by the varifocal loss function (VFL), defined as:
ℒ_VFL(p, q) = −q·(q·log(p) + (1 − q)·log(1 − p)) if q &gt; 0,
ℒ_VFL(p, q) = −α·p^γ·log(1 − p) if q = 0,
where p is the predicted classification score, q is the target score, α is the balancing coefficient,
and γ is the penalty coefficient [38].
The total loss function can be defined as the sum of these two loss functions and the distribution focal
loss:
ℒ_YOLO = λ_CIoU·ℒ_CIoU + λ_DFL·ℒ_DFL + λ_VFL·ℒ_VFL,
with λ_CIoU, λ_DFL and λ_VFL being hyperparameters balancing the importance of each component.</p>
      <p>To reduce the size of the student model while maintaining performance, knowledge distillation is
used to transfer knowledge from the larger teacher model T. The objective of knowledge distillation
is to minimize a combination of the standard YOLO loss ℒ_YOLO and the distillation loss ℒ_KD; the
total loss for the student model is defined as follows:
ℒ_student = λ·ℒ_YOLO + (1 − λ)·ℒ_KD,
where λ controls the trade-off between the YOLO loss and the distillation loss in the total loss
function.</p>
      <p>Both networks are trained using gradient descent with the following weight update rule:
θ_(t+1) = θ_t − η·∇_θℒ,
where η is the learning rate controlling the step size, and ∇_θℒ is the gradient of the loss function with
respect to the model parameters.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed methodology</title>
      <sec id="sec-4-1">
        <title>4.1. Developing knowledge distillation algorithm for YOLO</title>
        <p>To enable knowledge transfer from a larger teacher model into a light-weight student model, the regression
and classification components of the total loss function have been modified to include a distillation loss
with an additional weighting coefficient, added to avoid learning collapse due to the student model fully
mimicking the teacher model outputs. Additionally, a temperature coefficient T(t) with a decay strategy has
been used to control the softening of logits for the classification component, allowing to regulate the
amount of knowledge being distilled from the larger model by the student.</p>
        <p>Bounding box regression loss is handled by the CIoU function with an additional L2 loss component,
improving the consistency between teacher and student bounding box predictions:
ℒ_CIoU = (1 − λ_distill)·(1 − CIoU(B, B̂)) + λ_distill·‖B̂_teacher − B̂_student‖²,
where λ_distill is the distillation coefficient balancing the standard CIoU loss and the distillation
term, and B̂_teacher and B̂_student are the bounding boxes predicted by the teacher and student
networks, respectively.</p>
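        <p>The combined regression loss can be sketched as follows (a minimal illustration; the λ_distill default is ours, and the CIoU value is assumed to be precomputed):</p>

```python
def distilled_box_loss(ciou, box_teacher, box_student, lam_distill=0.3):
    """Regression loss with distillation:
    (1 - lam) * (1 - CIoU) + lam * squared L2 distance between teacher and student boxes."""
    l2 = sum((t - s) ** 2 for t, s in zip(box_teacher, box_student))
    return (1 - lam_distill) * (1 - ciou) + lam_distill * l2
```

      <p>When the student both overlaps the ground truth perfectly (CIoU = 1) and agrees with the teacher, the loss vanishes.</p>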
        <p>Classification loss is handled with the varifocal loss function (VFL), with the Kullback-Leibler divergence
loss (KL-loss) added as a distillation component, scaled by temperature. To encourage exploration in the
earlier stages of student model training, we propose a temperature decay strategy, starting from higher
values and linearly shifting the temperature coefficient toward 1 to focus on more confident
predictions in the later stages of training:
ℒ_VFL = (1 − λ_distill)·VFL(q, ĉ_student) + λ_distill·T(t)²·KL(σ(ĉ_teacher/T(t)) ∥ σ(ĉ_student/T(t))),
with T(t) representing the decaying temperature, σ denoting the softmax function, and ĉ_teacher and
ĉ_student being the class probability logits for the teacher and student models, respectively.</p>
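        <p>The distillation term and the linear temperature schedule can be sketched in plain Python. This is a minimal sketch: the softmax softening, the KL direction (teacher to student) and the T² scaling follow the formula above, while the VFL term is omitted; the default schedule values mirror those reported in the experiments (from 5 toward 1):</p>

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax: a higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kd_class_loss(teacher_logits, student_logits, t, lam_distill=0.5):
    """Distillation part of the classification loss: lam * T^2 * KL(teacher || student)."""
    p_t = softmax(teacher_logits, t)
    p_s = softmax(student_logits, t)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return lam_distill * (t ** 2) * kl

def linear_temperature(epoch, max_epochs, t0=5.0, t_final=1.0):
    """Linear decay from t0 toward t_final over the training run."""
    frac = epoch / max(1, max_epochs - 1)
    return t0 + (t_final - t0) * min(1.0, frac)
```

        <p>When the student reproduces the teacher logits exactly, the KL term is zero; the T² factor compensates for the gradient shrinkage introduced by softening.</p>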
        <p>Algorithm 1. Training the YOLO student model via knowledge distillation.</p>
        <p>Input: dataset D = {(x_i, y_i)}, i = 1, …, N; teacher model T; student model S; learning rate η;
batch size b; initial temperature T_0; decay strategy T(t); hyperparameters λ_CIoU, λ_DFL, λ_VFL,
λ_distill; maximum number of epochs E.</p>
        <p>Initialize the student model parameters θ_S, load the teacher model with parameters θ_T, and set the
initial temperature T = T_0.</p>
        <p>For each epoch t = 1, …, E:
shuffle the training set D;
for each mini-batch ℬ ⊂ D:
do the forward pass, computing ŷ;
compute the loss ℒ_total;
do the backward pass, computing the gradients ∇ℒ;
update θ_student ← θ_student − η·∇ℒ;
update the temperature T(t).</p>
        <p>End of training: return the trained student model S.</p>
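        <p>The structure of Algorithm 1 can be outlined as a runnable skeleton. This is a structural sketch only: `student_step` is a placeholder callable standing in for the forward pass, loss computation, backward pass, and SGD update described above:</p>

```python
import random

def train_student(dataset, student_step, temperature_schedule, epochs, batch_size):
    """Skeleton of Algorithm 1: shuffle, iterate mini-batches, update, decay temperature."""
    history = []
    for epoch in range(epochs):
        random.shuffle(dataset)                 # shuffle the training set each epoch
        t = temperature_schedule(epoch)         # decayed temperature for this epoch
        for i in range(0, len(dataset), batch_size):
            batch = dataset[i:i + batch_size]
            loss = student_step(batch, t)       # forward + loss + backward + update
            history.append(loss)
    return history
```

        <p>A real implementation would pass the teacher and student models into `student_step` and return the trained student; here the loop only demonstrates the control flow.</p>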
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Building light-weight student model topology</title>
        <p>To ensure optimal performance of the resulting distilled model, the student model should meet the size
and computational efficiency requirements. The approach used in this paper involves light-weighting
the feature extraction (backbone) layers of the YOLO object detector to reduce the number of expensive
convolutional operations, which contribute heavily to the total parameter count. The feature aggregation
(neck) and final output layers (head) from the original YOLOv8 architecture were reused, as adding
additional blocks to these parts of the network would increase the parameter count, and extra
light-weighting would introduce more differences between the student and teacher models, which could
harm the distillation performance. In this case, the backbone network consists of convolutional blocks,
composed of a 2D convolutional layer and batch normalization, followed by the SiLU activation
function. Each convolutional block is followed by a C2f or C3 bottleneck block, which performs a
convolutional operation on the input, splits the channels, processes the resulting feature map
through multiple bottlenecks (the number in the name of the block represents the bottleneck layer
count), and ends with concatenation.</p>
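        <p>The split/process/concatenate pattern of the C2f and C3 blocks can be illustrated with a NumPy sketch. The initial and final convolutions are omitted and the bottleneck is an arbitrary callable, so this shows only the data flow, not the actual YOLOv8 blocks:</p>

```python
import numpy as np

def c2f_sketch(x, n_bottlenecks, bottleneck):
    """Simplified C2f flow: split channels, chain bottlenecks, concatenate all branches."""
    a, b = np.split(x, 2, axis=0)      # channel split after the (omitted) initial convolution
    outs = [a, b]
    for _ in range(n_bottlenecks):
        b = bottleneck(b)              # each bottleneck feeds the next; every output is kept
        outs.append(b)
    # the real block would fuse channels with a final convolution after concatenation
    return np.concatenate(outs, axis=0)
```

        <p>The channel count grows with the number of bottlenecks, which is why the final fusing convolution exists in the real block.</p>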
        <p>Overall architecture of object detection network is shown in Figure 1.</p>
        <p>To find the model that provides the best balance between efficiency and performance, we have
built two variants of the backbone network, based on GhostConvolution layers derived from GhostNet
and on FasterNet blocks. The core idea behind this design is to reduce the number of computationally
expensive 3 x 3 convolutions in favor of their light-weight counterparts.</p>
        <p>The default convolution operation outputs a feature map Y by processing an input feature map X with a
convolution kernel W as Y = X ∗ W, where ∗ denotes the convolution operation.
GhostConvolution is aimed at generating a similar number of feature maps using fewer
computations: convolution is executed with fewer filters, obtaining intrinsic feature maps Y_int as
Y_int = X ∗ W_int, where W_int is a smaller convolutional filter. Then, a series of computationally
inexpensive operations is used to generate the ghost feature maps Y_ghost as Y_ghost = Φ(Y_int),
where Φ denotes a cheap operation.</p>
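        <p>A GhostConvolution can be sketched with NumPy. This is a toy stand-in: the primary convolution is reduced to a 1x1 kernel and the cheap operation Φ to per-map scaling, rather than the depthwise convolution GhostNet actually uses:</p>

```python
import numpy as np

def ghost_module(x, primary_filters, cheap_scales):
    """GhostConvolution sketch: few intrinsic maps from a 1x1 convolution,
    then inexpensive per-map operations generate the remaining ghost maps."""
    # x: (C, H, W); primary_filters: (M, C) -> M intrinsic maps via 1x1 convolution
    intrinsic = np.einsum('mc,chw->mhw', primary_filters, x)
    # cheap operation (phi): per-map scaling stands in for a depthwise 3x3 convolution
    ghosts = np.stack([s * fm for s, fm in zip(cheap_scales, intrinsic)])
    # the output concatenates intrinsic and ghost maps, doubling channels cheaply
    return np.concatenate([intrinsic, ghosts], axis=0)
```

        <p>Half of the output channels are produced without any full convolution, which is where the savings come from.</p>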
        <p>FasterNet applies a different approach to reducing the computational complexity and latency
of convolutional operations, based on the PConv procedure, which applies the convolutional
operation only to a part of the input channels for spatial feature extraction and leaves the remaining
channels as is. PConv is then followed by a series of pointwise convolutions to reuse the information
from all channels in an efficient way.</p>
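        <p>A PConv-style block can be sketched with NumPy. This is a toy stand-in: a 3x3 box filter plays the role of the learned spatial convolution and `pw` is the pointwise mixing matrix; it illustrates the partial-channel idea, not the FasterNet implementation:</p>

```python
import numpy as np

def box3x3(fm):
    """Same-padding 3x3 box filter, a stand-in for a learned spatial convolution."""
    padded = np.pad(fm, 1)
    out = np.zeros_like(fm, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy:1 + dy + fm.shape[0], 1 + dx:1 + dx + fm.shape[1]]
    return out / 9.0

def pconv_block(x, n_conv, pw):
    """PConv sketch: spatially filter only the first n_conv channels, pass the rest
    through untouched, then mix all channels with a pointwise (1x1) convolution."""
    head = np.stack([box3x3(fm) for fm in x[:n_conv]])
    mixed_in = np.concatenate([head, x[n_conv:]], axis=0)
    return np.einsum('oc,chw->ohw', pw, mixed_in)
```

        <p>Only n_conv of the channels pay for spatial filtering; the pointwise step then lets the untouched channels still contribute to every output.</p>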
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental evaluation</title>
      <p>To evaluate the performance of the proposed algorithm, we used the UTDAC2020 (Underwater
Target Detection Algorithm Competition 2020) dataset, a challenging underwater detection
benchmark, consisting of 5168 training and 1293 validation images in various resolutions (3840 x
2160, 1920 x 1080, 720 x 405, and 586 x 480). The chosen dataset contains four classes: echinus,
holothurian, scallop, and starfish. UTDAC2020 presents several challenges, including significant
class imbalance, with the echinus class appearing four times more frequently than the other classes.
The dataset also features targets at different scales, often densely packed, as well as cases of low
contrast, challenging lighting conditions, and motion blur caused by camera movement.</p>
      <p>The machine used for the experiments is equipped with an Intel Core i5-13600K processor and an NVIDIA A4000
GPU with 16GB VRAM. Software-wise, the test setup runs Ubuntu 20.04.6 LTS with Python
3.10.13, CUDA 12.1, and PyTorch 2.2.1.</p>
      <p>Each model was trained for 250 epochs with a batch size of 32 and an image size of 640
x 640. Stochastic Gradient Descent (SGD) served as the optimization algorithm, with a momentum
of 0.937, an initial learning rate of 0.01, and a weight decay coefficient of 0.005. For distilled models,
initial temperature coefficient is set to 5, linearly decaying toward 1 until the model converges. The
hyperparameters were found empirically using the Ray Tune library. Albumentations package was
employed for augmenting the dataset, which includes combinations of random crop, random rotate
and mosaic augmentations.</p>
      <p>A total of five metrics have been used to evaluate the models, with mAP and mAP50 representing the
object detection accuracy of the neural network. Size, parameter count, and FLOPs are also measured
to evaluate the computational efficiency of the proposed approach. A detailed
explanation of the metrics is provided in Table 1.</p>
      <sec id="sec-5-1">
        <title>5.1. Results</title>
        <p>Table 1. Evaluation metrics: mAP50 is the mean average precision (mAP) at an intersection over
union (IoU) threshold of 0.50, defined as the average precision for each class, averaged over the number
of classes; mAP is the mean average precision over IoU thresholds 0.50:0.05:0.95; the parameter count
is the total number of model parameters; FLOPs denotes the number of floating-point operations
required for inference; size is the model size in megabytes.</p>
        <p>A comparison of existing frameworks and models built using the proposed methodology is
provided in Table 2. For distilled models, YOLOv8l with a DarkNet-53 backbone is used as the teacher
model. Student models use the YOLOv8s architecture with custom backbones based on GhostNet and
FasterNet, with both convolutional blocks and bottlenecks modified.</p>
        <p>Experiment results prove the superior performance of models using the knowledge distillation algorithm.
In comparison to YOLOv8s, the computational complexity in FLOPs is reduced by 45%, while
maintaining object detection performance. The choice of light-weight backbone does not
provide any significant difference, with the FasterNet and GhostNet backbones providing similar results
in terms of both size and accuracy.</p>
        <p>A visualization of ground truth bounding boxes and class labels, compared with detections
performed by the proposed YOLOv8-dist model, is shown in Figure 2. Samples with complex
background information and targets of various sizes were selected to demonstrate the performance
of our model.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This paper proposed a novel methodology to reduce the size of YOLO-based underwater object
detectors. A knowledge distillation algorithm with a temperature decay strategy has been developed for
object detection neural networks, allowing a light-weight student model to be trained effectively by
transferring knowledge from a teacher model of larger capacity. Additionally, we developed two
light-weight YOLO architectures, derived from the GhostNet and FasterNet approaches to the
convolution operation, which are suitable to be used as student models in knowledge distillation tasks.</p>
      <p>The proposed light-weight models are 45% more efficient in terms of computational complexity
compared to the existing YOLOv8s model. After applying our knowledge distillation algorithm, the
student model surpasses the original YOLOv8s in accuracy (84.72% versus
84.58%, respectively), while its size is comparable to YOLOv8n, the smallest model among the
YOLO-based detectors.</p>
      <p>The applications of our methodology include training efficient object detection neural networks
for integrated autonomous underwater vehicle hardware. To achieve even better knowledge
distillation performance with YOLO-based detectors, we suggest that further research be
conducted on more sophisticated knowledge distillation techniques during training and on
knowledge distillation algorithms with heterogeneous backbones, such as visual transformers, which
could potentially enrich intermediate feature maps with more context and semantic information,
leading to higher accuracy in applied tasks.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <sec id="sec-7-1">
        <title>References</title>
        <p>[34] Y. Zhang, T. Xiang, T. M. Hospedales, H. Lu, Deep Mutual Learning, in: 2018 IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2018.
doi:10.1109/cvpr.2018.00454.
[35] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, H. Jegou, Training data-efficient
image transformers &amp; distillation through attention, in: Proceedings of the 38th International
Conference on Machine Learning, Journal of Machine Learning Research, 2021, pp. 10347-10357.
URL: https://proceedings.mlr.press/v139/touvron21a.html.
[36] Z. Hao, J. Guo, K. Han, Y. Tang, H. Hu, Y. Wang, C. Xu, One-for-All: Bridge the Gap Between
Heterogeneous Architectures in Knowledge Distillation, 2023. URL:
https://arxiv.org/abs/2310.19444.
[37] Z. Zheng, P. Wang, D. Ren, W. Liu, R. Ye, Q. Hu, W. Zuo, Enhancing Geometric Factors in Model
Learning and Inference for Object Detection and Instance Segmentation, 2020. URL:
https://arxiv.org/abs/2005.03572.
[38] H. Zhang, Y. Wang, F. Dayoub, N. Sunderhauf, VarifocalNet: An IoU-aware Dense Object
Detector, in: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
IEEE, 2021. doi:10.1109/cvpr46437.2021.00841.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] M. Zgurovsky, V. Sineglazov, E. Chumachenko, Classification and Analysis Topologies Known Artificial Neurons and Neural Networks, in: Studies in Computational Intelligence, Springer International Publishing, Cham, 2020, pp. 1–58. doi:10.1007/978-3-030-48453-8_1.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] V. M. Sineglazov, K. D. Riazanovskiy, O. I. Chumachenko, Multicriteria conditional optimization based on genetic algorithms, Syst. Res. Inf. Technol. No. 3 (2020) 89–104. doi:10.20535/srit.2308-8893.2020.3.07.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] V. Sineglazov, A. Kot, Design of Hybrid Neural Networks of the Ensemble Structure, SSRN Electron. J. (2021). doi:10.2139/ssrn.3807474.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Zgurovsky, V. Sineglazov, E. Chumachenko, Formation of Hybrid Artificial Neural Networks Topologies, in: Studies in Computational Intelligence, Springer International Publishing, Cham, 2020, pp. 175–232. doi:10.1007/978-3-030-48453-8_3.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] M. Zgurovsky, V. Sineglazov, E. Chumachenko, Development of Hybrid Neural Networks, in: Studies in Computational Intelligence, Springer International Publishing, Cham, 2020, pp. 233–312. doi:10.1007/978-3-030-48453-8_4.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2014. doi:10.1109/cvpr.2014.81.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] R. Girshick, Fast R-CNN, 2015. URL: https://arxiv.org/abs/1504.08083.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, IEEE Trans. Pattern Anal. Mach. Intell. 39.6 (2017) 1137–1149. doi:10.1109/tpami.2016.2577031.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2016. doi:10.1109/cvpr.2016.91.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] J. Redmon, A. Farhadi, YOLO9000: Better, Faster, Stronger, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017. doi:10.1109/cvpr.2017.690.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] J. Redmon, A. Farhadi, YOLOv3: An Incremental Improvement, 2018. URL: http://arxiv.org/abs/1804.02767v1.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Bochkovskiy, C.-Y. Wang, H.-Y. M. Liao, YOLOv4: Optimal Speed and Accuracy of Object Detection, 2020. URL: https://arxiv.org/abs/2004.10934.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Ultralytics, GitHub - ultralytics/yolov5: YOLOv5 in PyTorch &gt; ONNX &gt; CoreML &gt; TFLite, 2020. URL: https://github.com/ultralytics/yolov5.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, et al., YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications, 2022. URL: https://arxiv.org/abs/2209.02976.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] C.-Y. Wang, A. Bochkovskiy, H.-Y. M. Liao, YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2023. doi:10.1109/cvpr52729.2023.00721.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Ultralytics, Ultralytics YOLO Docs, 2023. URL: https://docs.ultralytics.com.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] C.-Y. Wang, I.-H. Yeh, H.-Y. M. Liao, YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, 2024. URL: https://arxiv.org/abs/2402.13616.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, G. Ding, YOLOv10: Real-Time End-to-End Object Detection, 2024. URL: https://arxiv.org/abs/2405.14458.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] W. Zhou, F. Zheng, G. Yin, Y. Pang, J. Yi, YOLOTrashCan: A Deep Learning Marine Debris Detection Network, IEEE Trans. Instrum. Meas. (2022). doi:10.1109/tim.2022.3225044.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] L. Ren, Z. Li, X. He, L. Kong, Y. Zhang, An Underwater Target Detection Algorithm Based on Attention Mechanism and Improved YOLOv7, Comput. Mater. &amp; Contin. 78.2 (2024) 2829–2845. doi:10.32604/cmc.2024.047028.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] K. Liu, L. Peng, S. Tang, Underwater Object Detection Using TC-YOLO with Attention Mechanisms, Sensors 23.5 (2023) 2567. doi:10.3390/s23052567.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] J.-M. Noh, G.-R. Jang, K.-N. Ha, J.-H. Park, Data Augmentation Method for Object Detection in Underwater Environments, in: 2019 19th International Conference on Control, Automation and Systems (ICCAS), IEEE, 2019. doi:10.23919/iccas47443.2019.8971728.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] S. Qu, C. Cui, J. Duan, Y. Lu, Z. Pang, Underwater small target detection under YOLOv8-LA model, Sci. Rep. 14.1 (2024). doi:10.1038/s41598-024-66950-w.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] Y. Sun, W. Zheng, X. Du, Z. Yan, Underwater Small Target Detection Based on YOLOX Combined with MobileViT and Double Coordinate Attention, J. Mar. Sci. Eng. 11.6 (2023) 1178. doi:10.3390/jmse11061178.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] M. Zhang, Z. Wang, W. Song, D. Zhao, H. Zhao, Efficient Small-Object Detection in Underwater Images Using the Enhanced YOLOv8 Network, Appl. Sci. 14.3 (2024) 1095. doi:10.3390/app14031095.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] A. F. Ayob, K. Khairuddin, Y. M. Mustafah, A. R. Salisa, K. Kadir, Analysis of Pruned Neural Networks (MobileNetV2-YOLO v2) for Underwater Object Detection, in: Lecture Notes in Electrical Engineering, Springer Singapore, Singapore, 2020, pp. 87–98. doi:10.1007/978-981-15-5281-6_7.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] M. Zhang, S. Xu, W. Song, Q. He, Q. Wei, Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion, Remote Sens. 13.22 (2021) 4706. doi:10.3390/rs13224706.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] J. Chen, S.-h. Kao, H. He, W. Zhuo, S. Wen, C.-H. Lee, S. H. G. Chan, Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2023. doi:10.1109/cvpr52729.2023.01157.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, C. Xu, GhostNet: More Features From Cheap Operations, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2020. doi:10.1109/cvpr42600.2020.00165.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] C. Buciluǎ, R. Caruana, A. Niculescu-Mizil, Model compression, in: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, New York, New York, USA, 2006. doi:10.1145/1150402.1150464.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] G. Hinton, O. Vinyals, J. Dean, Distilling the Knowledge in a Neural Network, 2015. URL: https://arxiv.org/abs/1503.02531.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, Y. Bengio, FitNets: Hints for Thin Deep Nets, 2014. URL: https://arxiv.org/abs/1412.6550.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] S. Zagoruyko, N. Komodakis, Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, 2016. URL: https://arxiv.org/abs/1612.03928.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] Y. Zhang, T. Xiang, T. M. Hospedales, H. Lu, Deep Mutual Learning, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2018. doi:10.1109/cvpr.2018.00454.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>[35] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, H. Jegou, Training data-efficient image transformers &amp; distillation through attention, in: Proceedings of the 38th International Conference on Machine Learning, Journal of Machine Learning Research, 2021, pp. 10347–10357. URL: https://proceedings.mlr.press/v139/touvron21a.html.</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>[36] Z. Hao, J. Guo, K. Han, Y. Tang, H. Hu, Y. Wang, C. Xu, One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation, 2023. URL: https://arxiv.org/abs/2310.19444.</mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>[37] Z. Zheng, P. Wang, D. Ren, W. Liu, R. Ye, Q. Hu, W. Zuo, Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation, 2020. URL: https://arxiv.org/abs/2005.03572.</mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>[38] H. Zhang, Y. Wang, F. Dayoub, N. Sunderhauf, VarifocalNet: An IoU-aware Dense Object Detector, in: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2021. doi:10.1109/cvpr46437.2021.00841.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>