<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Ital-IA 2023: 3rd National Conference on Artificial Intelligence</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A round-trip journey in pruned artificial neural networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Bragagnolo</string-name>
          <email>andrea.bragagnolo@synesthesia.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enzo Tartaglione</string-name>
          <email>enzo.tartaglione@telecom-paris.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianluca Dalmasso</string-name>
          <email>gianluca.dalmasso@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Grangetto</string-name>
          <email>marco.grangetto@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Deep Learning, Pruning, Eficiency</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Dept., University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LTCI, Télécom Paris, Institut Polytechnique de Paris</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Synesthesia s.r.l</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>In the last decade, deep learning models have competed for performance at the price of tremendous computational costs. This critical aspect has recently attracted attention for both the training and the inference phases. The complexity of inference is obviously orders of magnitude lower than that of training; on the other hand, inference is performed many times, so its cost dominates efficiency on edge or embedded devices. Inference can be made efficient through neural network pruning, which consists of removing parameters and neurons from the model's topology while maintaining the model's accuracy. This results in reduced resource and energy requirements for the models. This paper describes two pruning procedures for lowering the operations required during the inference phase and a method to exploit the resulting sparsity. Since the same procedures cannot be applied directly at training time, we also show that it is possible to borrow similar ideas to reduce the cost of gradient backpropagation by disabling the computation for selected neurons.</p>
      </abstract>
      <kwd-group kwd-group-type="author">
        <kwd>Deep Learning</kwd>
        <kwd>Pruning</kwd>
        <kwd>Efficiency</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Deep neural networks are widely used in various tasks, such as speech recognition and computer vision. However, modern architectures require many parameters to generalize well, resulting in large model sizes, high computational and memory resources, and significant energy consumption during training and inference.</p>
      <p>In this paper, we present our research on neural
network pruning, which involves removing the less essential
elements of the network to reduce the model resource
requirements. Specifically, we explore the design of
pruning procedures (Sec. 2 and Sec. 3), the effect of pruning on
network features (Sec. 4), and the practical application of
pruned networks to reduce energy consumption (Sec. 5).</p>
      <p>
        We present two pruning techniques capable of
squeezing the model size incrementally during training:
LOBSTER [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], an unstructured approach that uses parameter
sensitivity as a regularizer, and SeReNe [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a structured
procedure that evaluates the contribution of neurons to
the network’s output. Pruned networks obtained with
these techniques were used to assess the benefits of
pruning at inference time.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. LOBSTER</title>
      <p>
        In this section, we present LOBSTER [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (LOss-Based
SensiTivity rEgulaRization), an unstructured and gradual
pruning procedure.
      </p>
      <p>LOBSTER uses a sensitivity-based regularization to
promote sparsity in the network topology. Specifically,
we define the sensitivity of a network parameter as the
derivative of the loss function with respect to that
parameter. Parameters with low sensitivity have little
impact on the loss function when perturbed and can be
pruned without compromising performance. LOBSTER
achieves sparsity by gradually shrinking parameters with
low sensitivity using a regularize-and-prune approach.</p>
      <p>
        The sensitivity is defined as
        <disp-formula id="eq1"><tex-math>S(\mathcal{L}, w_{n,i}) = \left| \frac{\partial \mathcal{L}}{\partial w_{n,i}} \right|, \quad (1)</tex-math></disp-formula>
        with ℒ representing the loss function and w_{n,i} a parameter of the network.
      </p>
      <p>[Figure 1: results of the pruning procedure for (a) LeNet-5 trained on MNIST and (b) ResNet-18 trained on ImageNet.]</p>
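      <p>
        To make the regularize-and-prune loop concrete, the following is a minimal PyTorch sketch of a sensitivity-driven update in the spirit of Eq. (1); the function name, the hyper-parameters, and the exact shrinking rule are illustrative assumptions of ours, not the released LOBSTER implementation.
      </p>
      <preformat>
import torch

def lobster_style_step(model, loss, lam=1e-4, threshold=1e-3):
    """One regularize-and-prune step driven by Eq. (1).

    Hypothetical sketch: parameters whose loss-based sensitivity
    |dL/dw| is small are shrunk proportionally to their insensitivity,
    then weights that fell below `threshold` are zeroed (pruned).
    """
    loss.backward()  # populates p.grad with dL/dw for every parameter
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            sensitivity = p.grad.abs()                    # Eq. (1)
            insensitivity = (1.0 - sensitivity).clamp(min=0.0)
            p.add_(-lam * p * insensitivity)              # gradual shrinking
            p.masked_fill_(p.abs() &lt; threshold, 0.0)      # prune tiny weights
      </preformat>
      <p>
        In a real training loop this step would run alongside the usual optimizer update, with a validation check guarding against excessive performance loss before the zeroed parameters are permanently removed.
      </p>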
      <p>
        LOBSTER allows training a network from scratch, thanks to its loss-based sensitivity formulation. Moreover, it avoids additional derivative computations or second-order derivatives, unlike other sensitivity-based approaches. Experiments on multiple architectures and datasets demonstrate that LOBSTER outperforms several competitors in multiple tasks. It achieves competitive compression ratios with minimal computational overhead and without compromising performance. The results of the pruning procedure for LeNet-5 trained on the MNIST dataset and ResNet-18 trained on ImageNet are shown in Figure 1. LOBSTER achieves state-of-the-art sparsification and classification errors for both architectures. Sparse VD [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] slightly outperforms all other methods in the LeNet5-MNIST experiment at higher compression rates.
      </p>
    </sec>
    <sec id="sec-serene">
      <title>3. SeReNe</title>
      <p>
        Although LOBSTER can achieve high sparsity rates, the sparsity is unstructured, meaning that entire neurons may not be removed from the architecture, and the resulting model can only be accelerated using specialized hardware and software. SeReNe [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] solves this issue by producing sparse network topologies with fewer neurons and, therefore, fewer operations during inference. Our approach involves driving all the parameters of a neuron toward zero, allowing us to prune entire neurons from the network. To achieve this, we leverage the concept of neuron's sensitivity, defined as the variation of the network output with respect to the neuron's activity:
        <disp-formula id="eq2"><tex-math>S(\mathbf{y}, \theta_{n,i}) = \frac{1}{C} \sum_{c=1}^{C} \left| \frac{\partial y_c}{\partial \theta_{n,i}} \right|, \quad (2)</tex-math></disp-formula>
        where y represents the network's output and θ_{n,i} the activity of the i-th neuron of the n-th layer.
      </p>
      <p>During training, all the parameters of low-sensitivity neurons are shrunk, making it possible to remove them from the network. When the ℓ2 norm of a neuron's parameters approaches zero, the neuron no longer emits signals (except for the bias) and can be pruned. We propose an iterative two-step procedure to prune parameters belonging to low-sensitivity neurons. We ensure controlled performance loss for the original architecture using a cross-validation strategy.</p>
      <p>Our approach allows us to learn network topologies that are not only sparse, i.e., with few non-zero parameters, but also with fewer neurons. This can speed up the network execution by better using cache locality and memory access patterns. We demonstrate the effectiveness of SeReNe on multiple learning tasks and network architectures, outperforming state-of-the-art references.</p>
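      <p>
        The sketch below illustrates how the neuron sensitivity of Eq. (2) could be computed with standard autograd, and how whole neurons with low sensitivity can be driven to zero; it is a simplified stand-in for the actual SeReNe procedure, and the helper names are ours.
      </p>
      <preformat>
import torch

def neuron_sensitivity(outputs, activations):
    """Eq. (2): average |d y_c / d theta_{n,i}| over the C network outputs.

    `activations` is one layer's post-activation tensor of shape
    (batch, neurons), kept in the autograd graph; `outputs` has shape
    (batch, C). Returns one sensitivity score per neuron.
    """
    grads = []
    for c in range(outputs.shape[1]):
        g, = torch.autograd.grad(outputs[:, c].sum(), activations,
                                 retain_graph=True)
        grads.append(g.abs())
    # average over outputs (Eq. 2), then over the batch
    return torch.stack(grads).mean(dim=0).mean(dim=0)

def zero_low_sensitivity_neurons(layer, sensitivity, tau=1e-3):
    """Blank every ingoing weight of the low-sensitivity neurons; the bias
    is left untouched (see the text). Once a neuron's parameters have
    (near) zero l2 norm, it can be removed from the topology."""
    with torch.no_grad():
        layer.weight[sensitivity &lt; tau, :] = 0.0
      </preformat>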
      <p>Finally, we show that structured sparsity provides benefits when storing the neural network topology and parameters. Table 1 shows the results obtained applying SeReNe to the LeNet-5 architecture trained on MNIST. SeReNe achieves a high compression ratio and pruned neurons, outperforming the considered references. The structured sparsity results in a significant decrease in the uncompressed network storage footprint, with only a slight 0.12% performance drop after compression. We also tested our method on more challenging architectures and datasets: Table 2 shows the results for ResNet-101 trained on ImageNet. The pruning procedure results in around 86% of the parameters being pruned, and the resulting network size is reduced from 156.67 MB to only 27.84 MB.</p>
    </sec>
    <sec id="sec-low-power">
      <title>4. Structured pruning for low-power devices</title>
      <p>
        In this section, we present some empirical results that demonstrate how pruning (especially structured pruning) can produce a network model that requires fewer resources to perform inference. To achieve this, we built the simplify library [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a PyTorch-compatible tool that automates the process of optimizing the inference code for pruned neural networks by removing the zeroed neurons from the architecture. The resulting models do not require any particular software or hardware to speed up their inference.
      </p>
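      <p>
        As a usage illustration, the snippet below first zeroes whole convolutional channels with PyTorch's structured pruning utilities and then invokes simplify to rebuild a smaller dense model. The call pattern follows the library's documentation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] as we recall it; the exact API should be checked against the released package.
      </p>
      <preformat>
import torch
import torch.nn.utils.prune as prune
from torchvision import models
from simplify import simplify  # library from [9]

model = models.resnet18()
model.eval()

# Zero 50% of the output channels (i.e., neurons) of every convolution,
# selected by their l2 norm, and make the zeros permanent.
for module in model.modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.ln_structured(module, name="weight", amount=0.5, n=2, dim=0)
        prune.remove(module, "weight")

# Propagate the zeroed neurons out of the topology: the simplified model
# computes the same function with fewer channels per layer.
dummy_input = torch.zeros(1, 3, 224, 224)
simplify(model, dummy_input)
      </preformat>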
      <p>
        We were able to perform benchmarks for both mobile devices [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and FPGA platforms [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which demonstrates the effectiveness of our approach. Specifically, we evaluated the performance of the pruned neural networks on a range of devices, varying in terms of processing power and memory capacity. Our results show that the combination of pruning and Simplify optimization outperforms the other techniques in terms of both inference speed and memory footprint. Table 3 and Table 4 show the results for the pruned and simplified networks on mobile devices and FPGAs, respectively.
      </p>
      <p>
        [Tables 3 and 4: experimental results for different network architectures and pruning strategies. Left: percentage of pruned parameters, size of the simplified network topology, and size of the compressed bitstream. Right: inference time on different embedded devices: Raspberry Pi 3B (RPi 3B), Huawei P20 (P20), Xiaomi MI 9 (MI9), and Samsung Galaxy S6 Lite (S6L). Source: [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].]
      </p>
      <p>Overall, these results demonstrate the feasibility of deploying pruned neural networks on various resource-constrained devices, opening up new opportunities for bringing deep learning to the edge.</p>
    </sec>
    <sec id="sec-neq">
      <title>5. Neurons at Equilibrium (NEq)</title>
      <p>
        All the works presented up to this point focused on reducing the neural network's inference time. In this section, instead, we present NEq [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], an approach that enables us to shrink the cost of training by reducing the number of neurons updated at each training step. For each neuron, NEq tracks
        <disp-formula id="eq3"><tex-math>\phi_i^t = \sum_{x \in \Xi} \sum_{c=1}^{C} \hat{y}_{i,x,c}^{\,t} \cdot \hat{y}_{i,x,c}^{\,t-1},</tex-math></disp-formula>
        which is the cosine similarity between all the outputs of the i-th neuron at time t and at time t−1 for the whole validation set Ξ. We can say that the i-th neuron is at equilibrium when it satisfies |Δφ_i^t| &lt; ε for some ε ≥ 0; the gradient computation for neurons at equilibrium is disabled.
      </p>
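      <p>
        A minimal sketch of the equilibrium test follows, assuming the per-neuron outputs over the validation set have been flattened into matrices; the function names are ours, and the mechanics of masking the gradients of frozen neurons are omitted.
      </p>
      <preformat>
import torch
import torch.nn.functional as F

def phi(y_t, y_prev):
    """Cosine similarity between each neuron's validation outputs at
    epoch t and epoch t-1. y_t, y_prev: (num_val_samples, num_neurons);
    returns one similarity value per neuron."""
    return (F.normalize(y_t, dim=0) * F.normalize(y_prev, dim=0)).sum(dim=0)

def at_equilibrium(phi_t, phi_prev, eps=1e-3):
    """A neuron is at equilibrium when its similarity stops changing
    (|delta phi| below eps); its gradients can then be skipped, e.g. by
    zeroing the matching rows of the weight gradient in a backward hook."""
    return (phi_t - phi_prev).abs() &lt; eps
      </preformat>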
      <p>We evaluated the saving in FLOPs and the network's generalization capability at the end of training. Our results show that NEq consistently reduces the number of FLOPs with minimal or no performance drop. While the amount of saved computation is similar for the stochastic approach with fixed probabilities in all the considered scenarios, the loss in performance varies depending on the architecture and dataset. In contrast, NEq adapts to the particular setup and saves the largest FLOPs for a given performance, with a lower performance loss even when the stochastic approach is tested with the same FLOPs saving.</p>
    </sec>
    <sec id="sec-3">
      <title>6. Conclusion</title>
      <p>In this paper, we shared the research experiences we developed in the context of compressing large neural models. Our story began with classical unstructured pruning of model parameters, e.g. connections between neurons, where the target is the highest sparsification with the lowest performance impairment. This approach, while very sound from a theoretical point of view, does not guarantee significant efficiency gains in the inference phase, when the model is deployed on actual devices. Therefore, we described structured pruning alternatives that aim at removing whole neurons, thus uncovering the real pruning potential in saving memory and reducing latency. Finally, we showed that pruning can also be exploited at training time to cut the cost of backward propagation. In particular, we introduced NEq, a technique to disable the computation of gradients of neurons that have reached equilibrium: this amounts to pruning the backpropagation graph and decreasing the number of operations during training. This technique can reduce the cost of training modern neural networks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tartaglione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bragagnolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiandrotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grangetto</surname>
          </string-name>
          ,
          <article-title>Loss-based sensitivity regularization: Towards deep sparse neural networks</article-title>
          ,
          <source>Neural Networks</source>
          <volume>146</volume>
          (
          <year>2022</year>
          )
          <fpage>230</fpage>
          -
          <lpage>237</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0893608021004706. doi:10.1016/j.neunet.2021.11.029.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tartaglione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bragagnolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Odierna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiandrotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grangetto</surname>
          </string-name>
          , Serene:
          <article-title>Sensitivity-based regularization of neurons for structured sparsity in neural networks</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>33</volume>
          (
          <year>2022</year>
          )
          <fpage>7237</fpage>
          -
          <lpage>7250</lpage>
          .
          . doi:10.1109/TNNLS.2021.3084527.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bragagnolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tartaglione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grangetto</surname>
          </string-name>
          ,
          <article-title>To update or not to update? neurons at equilibrium in deep models</article-title>
          , in: A. H. Oh, A. Agarwal, D. Belgrave, K. Cho (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <year>2022</year>
          . URL: https://openreview.net/forum?id=LGDfv0U7MJR.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Molchanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ashukha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vetrov</surname>
          </string-name>
          ,
          <article-title>Variational dropout sparsifies deep neural networks</article-title>
          ,
          <source>in: Proceedings of the 34th International Conference on Machine Learning-Volume 70, JMLR. org</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2498</fpage>
          -
          <lpage>2507</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pool</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dally</surname>
          </string-name>
          ,
          <article-title>Learning both weights and connections for efficient neural network</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ullrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Meeds</surname>
          </string-name>
          ,
          <article-title>Soft weight-sharing for neural network compression</article-title>
          ,
          <source>5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tartaglione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lepsøy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiandrotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Francini</surname>
          </string-name>
          ,
          <article-title>Learning sparse neural networks via sensitivity-driven regularization</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>3878</fpage>
          -
          <lpage>3888</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Dynamic network surgery for efficient DNNs</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          (
          <year>2016</year>
          )
          <fpage>1387</fpage>
          -
          <lpage>1395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bragagnolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Barbano</surname>
          </string-name>
          ,
          <article-title>Simplify: A python library for optimizing pruned neural networks</article-title>
          ,
          <source>SoftwareX</source>
          <volume>17</volume>
          (
          <year>2022</year>
          )
          <fpage>100907</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S2352711021001576. doi:10.1016/j.softx.2021.100907.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bragagnolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tartaglione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiandrotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grangetto</surname>
          </string-name>
          ,
          <article-title>On the role of structured pruning for neural network compression</article-title>
          ,
          <source>2021 IEEE International Conference on Image Processing (ICIP), IEEE</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>3527</fpage>
          -
          <lpage>3531</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Flich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Medina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Catalán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bragagnolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Auzanneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Briand</surname>
          </string-name>
          ,
          <article-title>Efficient inference of image-based neural network models in reconfigurable systems with pruning and quantization</article-title>
          ,
          <source>2022 IEEE International Conference on Image Processing (ICIP)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2491</fpage>
          -
          <lpage>2495</lpage>
          . doi:10.1109/ICIP46576.2022.9897752.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>