<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>PG-PRNet: A Lightweight Parallel Gated Feature Extractor Based on an Adaptive Progressive Regularization Algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhe Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ming Ye</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yongsheng Xie</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yan Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chongqing Market Supervision Administration Archives Information Center</institution>
          ,
          <addr-line>Chongqing, 400700</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>College of Artificial Intelligence, Southwest University</institution>
          ,
          <addr-line>Chongqing, 400700</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>263</fpage>
      <lpage>271</lpage>
      <abstract>
<p>The residual block in deeper DNNs has a positive effect on feature extraction, but it is limited by practical computational resources. Deeper structures yield limited performance gains in later stages, while residuals in lightweight DNNs reduce the abstract feature representation capability. We propose a lightweight parallel gating framework (PG-PRNet) based on the adaptive progressive regularization algorithm (APR), which changes the constant mapping of the residual, increases the representation of structural information, and compresses the structure via Hard Sigmoid, layer pruning, etc. The APR algorithm avoids the irrationality of applying the same regularization rules in different cases. This better preserves the shallow spatial location information and deep abstract semantic information, improving the performance of lightweight models of different specifications. PG-PRNet is embedded in two vision tasks. It outperforms the listed models on the GTSRB and BDD100K datasets while maintaining low storage and computational overhead.</p>
      </abstract>
      <kwd-group>
<kwd>parallel gating</kwd>
        <kwd>progressive regularization</kwd>
        <kwd>feature extraction</kwd>
        <kwd>residual block</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>1. Introduction</title>
      <p>… improving the performance of lightweight models of different specifications. The shallow spatial location information and deep abstract semantic
information are better preserved.</p>
      <p>We embed the proposed feature extraction framework into image recognition and object detection, and validate the performance of PG-PRNet on the GTSRB and BDD100K datasets.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
<p>
          Early work focused on improving accuracy by building more complex neural networks.
AlexNet designed a DNN with 60 million parameters and 650,000 neurons, which earned first place in
the ImageNet LSVRC-2012 competition [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Parallel computing on dozens of devices by Google
confirms that distributing the model across multiple devices is another solution [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. However, in
recent years, scholars have found that simply increasing the depth of the model can lead to
performance degradation. ResNet shows that as the depth of the network increases, the accuracy gain
obtained later decreases due to overfitting, vanishing gradients, etc. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The residual structure
adopted by ResNet preserves the shallow spatial location information as much as possible. This
avoids the above problems to a large extent.
        </p>
      </sec>
      <sec id="sec-2-2">
<p>
          SOTA models usually use neural architecture search (NAS) to find the best structural
parameters for building the network [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This places higher demands on the hardware. Some
researchers are working on network compression. DepthwiseConv, which assigns only one set of
convolutional kernels to each channel, can achieve great speedups with little loss of accuracy [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
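<p>As a concrete illustration (our sketch, not code from the paper), a depthwise convolution is obtained in PyTorch by setting the groups argument equal to the number of input channels, which assigns one kernel set per channel:</p>
        <preformat>
import torch
import torch.nn as nn

channels = 32
x = torch.randn(1, channels, 56, 56)

# Standard conv: every output channel mixes all input channels.
standard = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

# Depthwise conv: one kernel set per channel (groups == channels),
# which cuts parameters and FLOPs roughly by a factor of `channels`.
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                      groups=channels)

print(sum(p.numel() for p in standard.parameters()))   # 9248
print(sum(p.numel() for p in depthwise.parameters()))  # 320
        </preformat>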
      </sec>
      <sec id="sec-2-3">
<p>GhostModule presents a plug-and-play module that reduces intermediate feature maps and allows models to be easily deployed on mobile devices. In this paper, GhostModule and DepthwiseConv are used to build the lightweight PG-PRNet, combined with layer pruning and Hard Sigmoid.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>The overall network structure is shown in figure 1. In the feature extraction part, in order to avoid</title>
        <p>the computational overload caused by the large feature embedding in the later stage, the input image
is first passed through a CBR block, which increases the channel dimension and reduces the width and
height scales. Then there are multiple Fused-PG and PG units proposed in this paper. The detailed
description of the improvement points is as follows:</p>
      </sec>
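<p>Assuming CBR denotes a Conv-BatchNorm-ReLU stem (the acronym is not expanded in the text), a minimal sketch of such a block could be:</p>
      <preformat>
import torch.nn as nn

class CBR(nn.Module):
    """Conv-BatchNorm-ReLU stem: raises the channel count and halves
    the spatial resolution.  Channel sizes here are our assumption."""
    def __init__(self, in_ch=3, out_ch=16):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2,
                      padding=1, bias=False),  # stride 2: shrink H and W
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
      </preformat>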
    </sec>
    <sec id="sec-4">
      <title>3.1. Model compression.</title>
      <p>In this paper, layer pruning is used to reduce the overall scale. In order to minimize the sacrifice of
accuracy, parallel gating is used, and the representation of residual branches is added. Multiple
parallel gating units form a cascade feature representation. According to GhostNet, each trained DNN
contains many similar intermediate feature maps. We start by generating only half of the intermediate
feature maps, generating the same number of features by linear mapping, called Ghost. Finally,
connect the</p>
      <p>two parts in series. Extensive use of GhostModule and DepthwiseConv in the PG unit reduces the
amount of computation.</p>
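<p>A minimal sketch of this Ghost idea, following the description above (the ratio s = 2 comes from the text; layer sizes and normalization placement are our assumptions):</p>
      <preformat>
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch of a Ghost module [2]: generate half of the output channels
    with an ordinary conv, derive the other half by a cheap depthwise
    (linear) mapping, then concatenate the two parts in series."""
    def __init__(self, in_ch, out_ch, kernel_size=1):
        super().__init__()
        init_ch = out_ch // 2  # ratio s = 2: only half are "real" features
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, init_ch, 3, padding=1,
                      groups=init_ch, bias=False),  # cheap linear mapping
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)       # half of the intermediate feature maps
        ghost = self.cheap(y)     # "Ghost" features from the linear mapping
        return torch.cat([y, ghost], dim=1)  # connect the two parts
      </preformat>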
      <sec id="sec-4-1">
<p>
          The squeeze-and-excitation (SE) module is inserted into the PG unit to impose an attention
mechanism with low computational cost [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The Squeeze part of the main branch compresses the
features in the channel dimension, and the Excitation part learns the feature weights of the channel.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>The core idea is that the model learns the attention weight of the channel by loss, so that the weight of the effective feature map is relatively large, and the weight of the invalid feature map is relatively small. We use two 1x1 convolutions instead of fully connected layers, and use Hard Sigmoid activation instead of ReLU, which reduces the amount of computation.</title>
        <p>
          activates and weights the output of the last convolutional layer and visualizes the result in different
colors, which shows which parts of the image the model focuses more on [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Due to the
effectiveness of the proposed method, most of the categories can be highly focused.
        </p>
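<p>A sketch of this SE variant with 1x1 convolutions and Hard Sigmoid (the reduction ratio is our assumption):</p>
        <preformat>
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation [8] variant described above: 1x1 convolutions
    in place of fully connected layers, and Hard Sigmoid in place of the
    usual sigmoid, to lower the computational cost."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        squeezed = max(1, channels // reduction)
        self.pool = nn.AdaptiveAvgPool2d(1)       # Squeeze: global context
        self.fc = nn.Sequential(
            nn.Conv2d(channels, squeezed, 1),     # Excitation, stage 1
            nn.ReLU(inplace=True),
            nn.Conv2d(squeezed, channels, 1),     # Excitation, stage 2
            nn.Hardsigmoid(inplace=True),         # cheap gating activation
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))          # channel-wise reweighting
        </preformat>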
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3.2. PG and Fused-PG Units</title>
<p>
        MBConv differs from the traditional dimensionality-reduction process of residual blocks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
features input to the inverted residual block are first expanded to a higher dimension and then depthwise-
mapped to a lower-dimensional space. The PG unit inherits this process. As shown in figure 1, the
parameters of the network structure are reduced in the PG unit by using a modified GhostModule
instead of a plain CNN. Constant mapping is redundant in a lightweight model (meaning that the input
features go to the next layer without modification) because it deprives the branch of the ability
to obtain abstract features. We add Pooling and a GhostModule to the branch to selectively output
the generated branch-specific folded embeddings by thresholding, so the branch can freely choose its
relationship to the backbone; we call this parallel gating.
      </p>
      <p><bold>3.2.1. PG unit.</bold></p>
<p>[Table 1. Accuracy (%) of the B0, B1 and B2 variants under different Depthwise configurations. B0: 97.3, 96.8, 98.3; B1: 98.0, 96.9, 98.4; B2: 97.9, 95.4, 99.0.]</p>
<p>
        In the backbone part, a 3x3 DepthwiseConv expands the previous feature. The attention
score is computed in the SE module, which makes the model focus on the features that are more
important to the channel. The dimension is then reduced with a 1x1 GhostModule. The average pooling
layer in the branch selectively compresses features, acting as gating and local-area feature
aggregation. Downsampling and fusion are performed using the Ghost module. Finally, the
trunk and branch parts are connected. Stochastic depth is used to prevent network model degradation. The
simplified mathematical expression of the whole process is equation (1) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
<disp-formula id="eq1">
        <label>(1)</label>
        <tex-math>y_i = h\big( b_i \cdot F_i(x_i) + G_i(x_i) \big), \qquad b_i \sim \mathrm{Bernoulli}(P_i)</tex-math>
      </disp-formula>
      <p>where F<sub>i</sub> and G<sub>i</sub> represent the features generated by the backbone and branch of the i-th module, h is the activation, and P<sub>i</sub> represents the survival probability of the gate b<sub>i</sub>, which follows a Bernoulli distribution, P<sub>i</sub> ∈ [0, 1]. Equations (2) and (3) expand the backbone and branch terms through the GhostModule with ratio s = 2, i.e. half of the intermediate feature maps are generated by cheap linear mappings.</p>
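<p>A schematic sketch of the PG unit under our reading of figure 1 and equation (1), reusing the SEBlock and GhostModule sketches above; the survival probability, normalization placement and activation are assumptions:</p>
      <preformat>
import torch
import torch.nn as nn

class PGUnit(nn.Module):
    """Schematic PG unit: DepthwiseConv + SE + 1x1 GhostModule in the
    backbone; average pooling + GhostModule as the gated branch.  During
    training the backbone is kept with probability p (stochastic
    depth [11]), matching equation (1).  In a Fused-PG unit the 1x1 conv
    and DepthwiseConv would be replaced by a single 3x3 convolution."""
    def __init__(self, channels, survival_prob=0.8):
        super().__init__()
        self.p = survival_prob
        self.backbone = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1,
                      groups=channels, bias=False),  # 3x3 DepthwiseConv
            nn.BatchNorm2d(channels),
            SEBlock(channels),                 # channel attention
            GhostModule(channels, channels),   # 1x1 Ghost reduction
        )
        self.branch = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1),  # gating / local aggregation
            GhostModule(channels, channels),       # branch fusion
        )

    def forward(self, x):
        f, g = self.backbone(x), self.branch(x)
        if self.training:
            b = float(torch.rand(1).item() &lt; self.p)  # Bernoulli gate b_i
            return torch.relu(b * f + g)
        return torch.relu(self.p * f + g)  # expectation at inference
      </preformat>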
<p><bold>3.2.2. Fused-PG.</bold></p>
<p>PG uses DepthwiseConv to reduce computation, but this is of limited benefit in the early stages. As can be
seen from table 1, if all modules use Depthwise, the performance drops. Therefore, we use the fused
variant only in the first few stages of the model. In the Fused-PG module, the 1x1 convolution and the
DepthwiseConv are replaced by a single 3x3 convolution. The
simplified mathematical expression is equation (4).</p>
<disp-formula id="eq4">
        <label>(4)</label>
        <tex-math>y_i = h\big( b_i \cdot \tilde{F}_i(x_i) + G_i(x_i) \big)</tex-math>
      </disp-formula>
      <p>where the backbone term is computed by the fused 3x3 convolution.</p>
      <p><bold>Algorithm 1.</bold> Adaptive progressive regularization (APR)</p>
      <p>Input: network blocks length n, initial image size S<sub>0</sub>, final image size S<sub>e</sub>, initial regularization dropout rate d<sub>0</sub>, adjustment factors λ, β, μ and threshold ϱ. Output: trained model.</p>
      <preformat>
 1: for i = 1 to n do
 2:   if i ≥ ϱ then
 3:     last block survival probability: p_n ← …
 4:   else
 5:     last block survival probability: p_n ← …
 6:   end if
 7:   image size or feature map size: S_i ← S_e − (…)(S_e − S_0)
 8:   dropout rate: d_i ← μ(…)
 9:   survival probability: p_i ← 1 − (1 − p_n)(…)
10:   train model with S_i, d_i and p_i
11: end for
      </preformat>
    </sec>
    <sec id="sec-6">
      <title>3.3. Adaptive Progressive Regularization</title>
<p>Similar to EfficientNetV2, we consider the regularization problem for the training of a
multi-granularity variable model. First, regularization is adapted to the network depth. Second, the survival
probability in stochastic depth is considered. Third, the adaptive probability calculation
expression of dropout is improved. In PG-PRNet, the head has more redundant information, and a
larger regularization factor is required to improve the generalization ability. In the tail, the features are
mapped to a high-dimensional abstract space with smaller feature maps, so a smaller regularization factor
is used. For lightweight models, residuals are very important. When the model is very shallow, the residuals
should be kept as far as possible; when the model is complex, the residuals can be discarded appropriately. It
is therefore not reasonable to use the same regularization rules all the time, and the survival probability
and dropout rate need to be flexibly adjusted to fit the feature size and network depth. The
identifiers are defined as follows:</p>
      <sec id="sec-6-1">
        <title>The length of the network module is  , and if  is larger, a higher regularization rate is required,</title>
      </sec>
      <sec id="sec-6-2">
        <title>The whole model has</title>
        <p>and the ratio of the two is controlled by λ.
map size.</p>
        <p>stages. And the features of the middle hidden layer gradually decrease
from the first stage to the last stage, and the dropout rate is positively related to the feature</p>
      </sec>
      <sec id="sec-6-3">
        <title>The scale coefficients of feature map size and survival probability are β, μ, respectively, and the</title>
        <p>overall steps can be described as algorithm 1. The ablation experiments in Section 4.3 further
elaborate and demonstrate the effectiveness of APR.</p>
      </sec>
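<p>A hypothetical rendering of such a schedule (linear interpolation across stages is our assumption; the exact update rules of algorithm 1 are only partially recoverable):</p>
      <preformat>
def apr_schedule(num_stages, s0, s_end, d0, p_end=0.8):
    """APR-style schedule sketch: image size and dropout grow with the
    stage index (progressive learning, cf. EfficientNetV2), while the
    survival probability of deeper blocks decays linearly (stochastic
    depth).  Linear interpolation is an assumption, not the paper's rule."""
    for i in range(1, num_stages + 1):
        t = i / num_stages
        size = round(s0 + t * (s_end - s0))    # image / feature map size
        drop = d0 * t                          # dropout rate, tied to size
        survival = 1.0 - t * (1.0 - p_end)     # survival probability p_i
        yield size, drop, survival

# Example: four stages from 48x48 up to 224x224 resolution.
for size, drop, p in apr_schedule(4, 48, 224, 0.3):
    print(size, round(drop, 3), round(p, 3))
      </preformat>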
    </sec>
    <sec id="sec-7">
      <title>4. Experiments</title>
      <sec id="sec-7-1">
        <title>All our experiments were done on a Nvidia RTX 2080Ti server using Pytorch. In the parameters of</title>
        <p>adaptive regularization, we set the threshold ϱ = 11,  = 0.25, β = 1, λ = 7 . We validate the
PG</p>
      </sec>
      <sec id="sec-7-2">
        <title>PRNet feature extraction performance on two tasks on two datasets.</title>
      </sec>
    </sec>
    <sec id="sec-8">
<title>4.1. PG-PRNet for Image Recognition</title>
<p>
        The recognition of traffic signs is a challenging real-world problem related to intelligent transportation
systems. The German Traffic Sign Recognition Benchmark (GTSRB) contains more than 50,000
images of daytime and nighttime scenes from 43 categories [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Images that are too similar are
removed using the Structural Similarity Index (SSIM) algorithm. The mean and variance of the local
and global luminance of each image were calculated for adaptive luminance and contrast
enhancement, and the distribution of each category was approximated after processing. Using the
cross-entropy loss function, Adam optimizer and Cosine Annealing scheduler, we set the weight
decay factor = 0.0005, initial learning rate = 0.001, batch size = 64, and epochs = 100. The design
resolution varies from 48 to 224. The training set was preprocessed using random cropping and Gaussian
noise. To validate the performance of the feature extractor, a PG-PRNet feature extractor with a
classification head was added to evaluate its image recognition performance. It mainly includes a
global average pooling layer, aggregated features and features compressed by a fully connected layer,
and softmax output of category probabilities.
      </p>
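<p>This training setup maps directly onto standard PyTorch components; a minimal sketch (the model stand-in is ours, not the actual PG-PRNet):</p>
      <preformat>
import torch
import torch.nn as nn

# Stand-in for the PG-PRNet classifier; any nn.Module works here.
# Global average pooling, flatten, then a fully connected layer over
# the 43 GTSRB classes (softmax is folded into the loss below).
model = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 43))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):       # epochs = 100
    # ... one pass over the GTSRB loader (batch size = 64) goes here ...
    scheduler.step()
      </preformat>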
    </sec>
    <sec id="sec-9">
<title>4.2. Result Analysis</title>
      <p>We use the inference time of a single 224-resolution image and the number of parameters as
indicators of network complexity, perform five runs, and take the average. The
results are shown in table 2. The PG-PRNet model uses the SE module and the GhostModule, so the
number of parameters increases; however, because both are computationally cheap, and thanks to the
use of Hard Sigmoid, layer pruning and DepthwiseConv, the single-image inference speed is
the best (56 ms &lt; 70 ms &lt; 74 ms). The number of parameters of B0 is second only to
EfficientNetV1, but the accuracy of the latter is much lower than that of our method.</p>
      <sec id="sec-9-3">
        <title>Thanks to the parallel gating unit, our model can obtain good shallow spatial position information while keeping light weight. Because of the parallel gating, it also has the function of selecting input features in the branch part, and mapping the features to high dimensions.</title>
        <p>224
98.3
98.4
99.0
87.0
96.5
98.3
97.7
Params(M)
3.2
5.4
7.2
10.2
0.7
22.4
4.0</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>4.3. Ablation experiments</title>
      <sec id="sec-10-1">
        <title>Two adaptive regularization methods are considered: Dropout and Stochasitc depth. Larger  is</title>
        <p>used for larger features and smaller  is used for smaller features. The lower dimension contains more
spatial location information, but the higher dimension contains more abstract semantic information.</p>
      </sec>
      <sec id="sec-10-2">
        <title>Both kinds of information are very important for inference. It is not reasonable to use the same</title>
        <p>and  for the whole structure. In algorithm 1, the survival probability  and dropout rate  are
adaptively adjusted according to the size of the feature map. This problem is mitigated to some extent.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>4.4. PG-PRNet for Object Detection</title>
      <p>
BDD100K is a driving video dataset that can be used for a variety of autonomous driving
tasks, containing up to 100,000 images across 10 task scenarios. In this paper, suited to
lightweight models, we use a subset of 10,000 of these autonomous driving images to test
performance on object detection. As shown in figure 3, the output features of the last three
layers of PG-PRNet are extracted using the Neck of YOLOv4 [
        <xref ref-type="bibr" rid="ref13">13</xref>
          ]. In addition to Mosaic data augmentation,
we introduce Copy-Paste data augmentation to improve the detection accuracy of small targets
[14]. Finally, the three scales of features are output and the corresponding target detection results are
obtained after post-processing (NMS). We list three images as a reference for the results. The
parameters are set to batch size = 16, loop scheduler and Adam optimizer.
      </p>
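<p>The NMS post-processing step can be reproduced with torchvision; a toy sketch (the threshold value is illustrative):</p>
      <preformat>
import torch
from torchvision.ops import nms

# Toy post-processing: keep detections whose IoU with any
# higher-scoring box does not exceed the threshold.
boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],
                      [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])

keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)  # tensor([0, 2]): the overlapping lower-score box is suppressed
      </preformat>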
    </sec>
    <sec id="sec-12">
<title>4.5. Results Analysis</title>
      <p>It can be seen that, trained for only 300 epochs, our model is competitive. The best result is
achieved with 11.5 million parameters and 311 ms inference time, and the deepened
model has a significant performance improvement. This demonstrates that the parallel gating unit
effectively improves the feature representation of the branch, making the feature extraction capability
of PG-PRNet still highly applicable even after many model compression methods. In figure 4, six
typical difficulties of target detection in real-time traffic scenarios are listed. Our approach maintains
high detection accuracy and robustness. A parallel gating unit is used in combination with an adaptive
progressive regularization algorithm. The Copy-Paste and Mosaic based approach reduces overfitting,
improves model generalization, and enhances performance in scenes with occlusion, too many small
targets, rain, multiple categories, and video motion blur.</p>
    </sec>
    <sec id="sec-13">
      <title>5. Conclusion</title>
<p>In this work, we propose a lightweight parallel gated feature extraction framework that represents the
residual branch information of a given feature in a new cascade, changing the constant
mapping of the residual structure in lightweight networks. In addition, an adaptive progressive
regularization algorithm adapts the regularization rules to features of different sizes and
networks of different scales; the resulting model is called PG-PRNet. The framework is embedded into image recognition and
object detection to verify its feature extraction capability, and our model achieves the best results in model
size and accuracy among the compared methods. Its efficiency at variable resolution is also demonstrated.</p>
    </sec>
    <sec id="sec-14">
      <title>6. References</title>
      <p>[14] Ghiasi, G., Cui, Y., Srinivas, A., Qian, R., Lin, T. Y., Cubuk, E. D. and Zoph, B., 2021. Simple
copy-paste is a strong data augmentation method for instance segmentation. In: Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online Meeting. (pp.
2918-2928).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Efficientnet: Rethinking model scaling for convolutional neural networks</article-title>
          .
          <source>In: International conference on machine learning</source>
          .
          <source>Long Beach</source>
          , California.
          <fpage>6105</fpage>
          -
          <lpage>6114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Ghostnet: More features from cheap operations</article-title>
          .
          <source>In: Proceedings of the IEEE/CVF conference on computer vision</source>
          and pattern recognition. Seattle, WA, USA. (pp.
          <fpage>1580</fpage>
          -
          <lpage>1589</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          .
          <source>Advances in neural information processing systems</source>
          ,
          <volume>25</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , Cheng, Y.,
          <string-name>
            <surname>Bapna</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Firat</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , (
          <year>2019</year>
          ).
          <article-title>Gpipe: Efficient training of giant neural networks using pipeline parallelism</article-title>
          .
          <source>Advances in neural information processing systems</source>
          ,
          <volume>32</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In: Proceedings of the IEEE conference on computer vision</source>
          and pattern recognition.
Las Vegas, NV, USA. (pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Zoph</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>Neural architecture search with reinforcement learning</article-title>
          .
<source>arXiv preprint arXiv:1611.01578</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Chollet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <year>2017</year>
          . Xception:
          <article-title>Deep learning with depthwise separable convolutions</article-title>
          .
          <source>In: Proceedings of the IEEE conference on computer vision and pattern recognition. Honolulu</source>
, HI, USA. (pp.
          <fpage>1251</fpage>
          -
          <lpage>1258</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Squeeze-and-excitation networks</article-title>
          .
          <source>In: Proceedings of the IEEE conference on computer vision</source>
          and pattern recognition.
          <source>Salt Lake City</source>
, UT, USA. (pp.
          <fpage>7132</fpage>
          -
          <lpage>7141</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Selvaraju</surname>
            ,
            <given-names>R. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cogswell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vedantam</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Batra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2017</year>
          .
          <article-title>Grad-cam: Visual explanations from deep networks via gradient-based localization</article-title>
          .
          <source>In: Proceedings of the IEEE international conference on computer vision</source>
          . Venice, Italy. (pp.
          <fpage>618</fpage>
          -
          <lpage>626</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalenichenko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weyand</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andreetto</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
<string-name>
            <surname>Adam</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <year>2017</year>
          . MobileNets:
          <article-title>Efficient Convolutional Neural Networks for Mobile Vision Applications</article-title>
          . ArXiv, abs/1704.04861.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sedra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K. Q.</given-names>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>Deep networks with stochastic depth</article-title>
          .
          <source>In: European conference on computer vision</source>
          . Amsterdam, Netherlands. (pp.
          <fpage>646</fpage>
          -
          <lpage>661</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Houben</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stallkamp</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salmen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlipsing</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Igel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark</article-title>
          .
          <source>In: The 2013 international joint conference on neural networks (IJCNN)</source>
          . Dallas, TX, USA. (pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Bochkovskiy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C. Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>H. Y. M.</given-names>
          </string-name>
          ,
          <year>2020</year>
          . Yolov4:
          <article-title>Optimal speed and accuracy of object detection</article-title>
. arXiv preprint arXiv:2004.10934.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>