<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TensorFlow Enabled Deep Learning Model Optimization for enhanced Realtime Person Detection using Raspberry Pi operating at the Edge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Reenu Mohandas</string-name>
          <email>reenu.mohandas@ul.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mangolika Bhattacharya</string-name>
          <email>mango.bhattacharya@ul.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihai Penica</string-name>
          <email>mihai.penica@ul.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karl Van Camp</string-name>
          <email>karl.vancamp@ul.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin J. Hayes</string-name>
          <email>martin.j.hayes@ul.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Limerick</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, quantization effects are assessed for a real-time Edge-based person detection use case built on a Raspberry Pi. TensorFlow architectures are presented that enable real-time person detection on the Raspberry Pi. Model quantization is performed, the performance of the quantized models is analyzed, and worst-case performance is established for a number of deep learning object detection models that are capable of being deployed on the Pi for real-time applications. The study shows that the inference time for a suitably optimized TensorFlow-enabled solution architecture is significantly lower than for an unquantized model, with only a slight cost in accuracy when benchmarked against a desktop implementation. An industrial standard floor limit value of greater than 70% is achieved on the quantized models considered, with a reduced detection time of less than 3 ms. The Deep Neural Network model is trained using the INRIA Person Detection benchmark dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Person Detection</kwd>
        <kwd>Edge Intelligence</kwd>
        <kwd>Edge Computing</kwd>
        <kwd>Model Optimization</kwd>
        <kwd>Model Quantization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Person recognition subsystems have now reached a certain level of maturity in
many autonomous detection systems. These detection subsystems range from
computationally inexpensive use cases, such as simple people counting that can still
use Infra-Red (IR) systems or heat-map processing, to identification problems
in more complex surveillance applications. Such cognitive applications
invariably depend on a deep learning framework for robust performance. These deep
learning frameworks encompass aspects of representation learning and
high-level abstraction of non-linear raw signal data, and contain an automatic
feature extraction capability [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
      <p>
        Fast and robust person detection is needed in indoor as well as
outdoor use cases in today's rapidly urbanizing environment. Central
to the use case is the requirement for a TensorFlow-enabled approach
within a CNN (Convolutional Neural Network) framework. Neural networks that
consist of chains of tensor operations (geometric transformations, affine
transformations, rotation, scaling and so on) have attracted a lot of
attention in the recent literature. Pre-processing of the input data and a suitable
framework that can process the tensor data on a device like the Pi are
necessary for the geometric interpretation of such operations in low-cost commercial
applications [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Hence TensorFlow is considered a necessary part of the inference step
for person detection in the deep learning use case considered here. In
this work, TensorFlow Lite is used for the inference processes on the Pi.
TensorFlow is an open-source machine learning framework that was developed
for internal use by Google and later released under the
Apache 2.0 open-source license in 2015 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        This work also applies some recent advances in the integration of
deep learning frameworks to exploit the strong inductive biases that have been
observed when applying neural networks to optimization or machine learning
problems [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. CNNs are an essential component in a deep learning framework
used for detection and classification from image or video input. CNNs have been
the preferred neural network-based approach to pixel-wise image segmentation
over traditional image processing and computer vision techniques, especially in
real-time person detection and identification problems [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Unlike
conventional image processing techniques, which involve HOG, SVM or other
gradient-based methods, deep learning makes it easier to implement person detection
due to its automatic feature extraction capabilities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Transfer learning has
made deep learning more versatile, as a base model trained on a sufficiently
large dataset can be reused to solve new problems with a few
steps of fine-tuning for the specific use case.
      </p>
      <p>The efficacy of such deep learning frameworks is always a focus of academic
research. In this work, use case performance is assessed across a GPU-powered
device and an Edge computing device, with latency compared against
the standard processing time requirements of an industry-specific
application, and the use of a quantization approach for performance enhancement
is examined. This paper considers the efficiency versus latency of person detection
on Edge computing devices against a GPU-accelerated device. This analysis is
significant because of the advancement of Edge Intelligence, where technology is
swiftly moving onto resource-constrained devices and there is an increased need
to maintain robust performance.</p>
      <p>The literature review focuses briefly on the types of optimization that have
previously been used for model optimization. A comparison is then made of how various deep
learning object detection models respond to quantization.</p>
      <p>This paper is organized as follows. In Section II, the concept of Edge
Intelligence is discussed. This work should be viewed very much as an example of an AI
on the Edge use case based on a Pi-type infrastructure. Model optimization and
compression efforts are considered in Section III. In Section IV, the experiment
is explained and the results are analyzed. Finally, we present some conclusions
based on the results that have been obtained and some recommendations for
future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Edge Intelligence</title>
      <p>
        Conventional deep learning frameworks are bulky, consuming the hundreds of
megabytes needed to store the trained weights and to include the
inference process. For deep learning models that rely on dense layers, the
number of parameters can run into the billions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This explosion in
parameters makes it challenging for reduced instruction set embedded or mobile systems
to perform such cumbersome calculations in real time. This has motivated the
use of so-called performance-optimized neural networks for
edge applications. The process of integrating edge computing and AI, termed
edge intelligence, has attracted significant attention in the recent literature [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the peripheral control devices of industrial electronic systems, often set
up on the local Ethernet, are referred to as the edge computing infrastructure.
According to Kristiani et al., edge computing is a decentralized intelligent system
with independent entities [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Edge computing has often been combined with Internet
of Things (IoT)-based decentralized and distributed data capture infrastructure for
advanced applications.
      </p>
      <p>In this work on person detection, the input is captured by a pin-point
camera device mounted on an edge processor. Real-time
streaming and processing of images is necessary at the edge device to initiate the
detection process there. The Edge computing device used in this work is
a Raspberry Pi 4 with 4 GB of RAM. A picture of the experiment setup is shown
in Figure 2.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Deng et al. describe the principle of edge computing as the process
of transferring computation and communication resources from the cloud to the edge of
networks in order to reduce latency, enabling faster responses for
end users. They also state that Edge Intelligence is a blooming field today.
Studies show that by 2024, 40 ZB of global internet data will be generated
by IoT devices. In contrast to the growth rate of data generated by IoT
devices, global datacenter traffic is estimated to reach only 20.6 ZB by 2021
[
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. At this stage, the conventional approach of transferring the data into cloud
datacenters will lead to network congestion. This is the reason for the
recent advances in handling user demands at edge-cloud servers.
Analyzing user data directly at the edge device can address
concerns about latency, the monetary cost of data transfer, and the privacy issues
associated with data transfer and storage.
      </p>
      <p>
        The large model size and complex matrix calculations during the inference
process in this use case pose a significant challenge for the deployment of deep
learning models on edge devices [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Redesign of deep learning architectures
becomes inevitable given the increased need for efficient performance of deep learning
frameworks on resource-constrained devices [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Model Optimization Efforts</title>
      <p>
        Model optimization efforts can be classified into different types based on
the change made to the architecture. A significant property of Deep Neural Networks
is that their inference is not affected by minor changes in weights or
activations. Hence optimization techniques started off as two-fold. The first category
is modification of the network structure to increase efficiency, which led to
MobileNets and their use of depth-wise separable convolutions. The second category is
quantization from floating-point precision to discrete levels, which gives quantized
inference with constrained weight and activation values [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
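      <p>As a rough illustration of why depth-wise separable convolutions make a network lighter, the following Python sketch counts the weights of a standard convolution and of its depth-wise separable counterpart. The layer sizes (3 x 3 kernel, 32 input channels, 64 output channels) are assumptions chosen only for illustration and are not taken from the models evaluated in this work; bias terms are ignored.</p>
      <preformat>
```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depth-wise separable convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
reduction = standard / separable         # how many times fewer weights are needed

print(standard, separable, round(reduction, 2))
```
      </preformat>
      <p>For these assumed sizes the separable layer needs several times fewer weights than the standard layer, which is the kind of structural saving the first category of optimization aims for.</p>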
      <p>
        Training large deep learning networks required computing clusters of
thousands of machines, and various methods such as ensemble models were studied in
the process of constructing smaller, compressed models. The model compression
techniques were later classified into four categories in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] based on the principles used: 1)
reduction in size by pruning, quantization and model compression; 2) altering
the matrix multiplication by matrix factorization and filtering; 3) methods based on
domain knowledge and the data learned, which include processes like knowledge
distillation and transfer learning; and 4) hybrid methods.
      </p>
      <sec id="sec-3-1">
        <title>Mimic Nets and Mimic Loss</title>
        <p>
          Neural networks have undergone different stages of evolution and of size
optimization. Mimic architectures were one of the first experiments, later
progressing into various model compression techniques. Mimic nets are
architectures that do not necessarily follow the neurological analogy, but are
used to mimic consistent training data [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Glenn et al. introduced mimic nets in 1993
as a technique to train feed-forward nets to automate classification and ranking
tasks.
        </p>
        <p>Mimic nets have two stages of operation. The first stage generates an
augmented feature space from the input features of the training data. In the second
stage, linear boundaries are found that separate the classification categories, that is, the
selected and rejected options. Feature selection is a substantial part of constructing
a mimic net, as it also defines the order of the mimic net.</p>
        <p>
          Mimic nets can classify inputs as well as rank sets of input features, and have
since been used to optimize classification and ranking tasks. More recently, in 2018
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], Plantinga et al. used a mimic architecture that was developed to mimic the
output of a spectral classifier, and called the approach mimic loss. The mimic loss described by
Plantinga is used to train a student model on a different task than the teacher
model. This solution applies only to classification and ranking problems, which
puts it out of scope for the person detection problem.
        </p>
        <p>
          With the increased use of analytics, data protection has also become
seriously important in every domain, be it industrial, academic or personal.
Stealing data from Deep Neural Networks is possible with mimic
nets, which were initially developed to learn the generalization function learned
by large deep learning networks. In 2019, Mosa et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] proposed the need for
protecting the data learned by networks that are deployed at the core of AI-based
products and services. Though mimic nets have given promising
results in classification problems, they are susceptible to the threat of data
stealing, in which the output of a network is mimicked using a large random dataset.
This makes the mimic net an unfavorable candidate for Edge deployment.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Model Compression</title>
        <p>
          Model compression has been an area of research ever since large deep neural
networks began to be used for solving problems. Ensemble models were one option
for using smaller models to solve problems faster, but the main disadvantage of
ensemble models is that they are not suitable for applications in which real-time
predictions are needed, or for portable devices and sensor networks [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
In 2006, Bucila et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] proposed the use of model compression to obtain fast,
compact and accurate models.
        </p>
        <p>
          Shallow feed-forward nets can learn deep functions using the same number
of parameters as the original deep models. The concept explored by
Ba and Caruana [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is that shallow nets can be trained to perform similarly
to complex, well-engineered deeper convolutional models. In the work on
convolutional nets by Urban et al. [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], the main emphasis was on analysis of the CIFAR-10
dataset, where shallow nets trained to learn the
function of the deep learning networks gave poor results and the presence of convolutional layers
proved essential. Convolutional layers have proved to be the best neural network
layers for extracting feature information from image data and hence are important
to our problem of person detection from images and live video streams. Depth in
neural networks improves the generalization capability of the model. Hence the most
effective method of model compression adopted lately is model quantization [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          Hinton and Vinyals, a team from Google, found that the generalization
capability of a large model or an ensemble of models can be effectively
transferred to a single model [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. This process of transferring knowledge from the large, cumbersome
model to a small model using a different kind of training is called Knowledge
Distillation. Hinton et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] trained models with no convolutional layers on the
CIFAR-10 dataset to an accuracy of 70.2% using distillation. In knowledge
distillation, knowledge can be transferred from one model to another model
with a different architecture by training the new model on a transfer set.
        </p>
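        <p>The following Python sketch illustrates one common way the student in knowledge distillation is trained on the outputs of the teacher, assuming the temperature-scaled softmax formulation of distillation. The logits, the temperature value and the function names are hypothetical and serve only as an illustration; they are not taken from the experiments in this work.</p>
        <preformat>
```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures give softer outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature):
    """Cross-entropy between the softened teacher and student distributions."""
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student_probs))


# Hypothetical logits for a three-class problem.
teacher = [6.0, 2.0, 1.0]
hard = softmax(teacher, temperature=1.0)     # sharply peaked distribution
soft = softmax(teacher, temperature=4.0)     # softened targets for the student

loss_same = distillation_loss(teacher, teacher, 4.0)
loss_other = distillation_loss(teacher, [1.0, 2.0, 6.0], 4.0)
```
        </preformat>
        <p>In this sketch the loss is smallest when the student reproduces the softened teacher distribution, so minimizing it over a transfer set pushes the student towards the behaviour of the teacher.</p>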
        <p>
          In knowledge transfer learning, a base (teacher) network is first trained and
then re-purposed. In the re-purposing step, the knowledge is
transferred to a second target (student) network, which is trained on a random target
dataset. This kind of transfer learning has been found to give promising results
for learning on edge devices [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Quantization Effects</title>
      <p>
        Network quantization achieves a large reduction in memory and processing power
usage by reducing the precision of the values and operations within a model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Quantization enables on-device inference for deep learning models residing on Edge
devices, so person detection can be achieved on Edge devices using quantized deep
learning models. This work compares the detection accuracy and inference
time of the non-quantized model against those of the quantized models. Unlike
shallow nets, the network architecture is not altered in the quantization process and the
network remains deep enough to retain its generalization capability.
In the flow diagram, the 'weight quant' and 'activatn quant' nodes are quantization
nodes integrated into the computation graph to simulate the effects of
quantization on the respective weight and activation values. Training that is aware
of quantization accounts for the quantization error during training, so that the trained
model reflects the quantization applied at the point of inference. In this type of quantization, every
quantization is followed by a dequantization, which simulates in floating-point
arithmetic the precision loss that occurs at inference. Such
quantization is termed quantization-aware training. In post-training
quantization, the inference is quantized by an offline conversion from floating point to
fixed point.
      </p>
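      <p>The quantize-then-dequantize step used to simulate quantization during training can be sketched in plain Python as follows. This is a simplified illustration, assuming a fixed value range of -1.0 to 1.0 and 8-bit precision; the function name, the range and the weight values are hypothetical, and the sketch does not reproduce the TensorFlow implementation.</p>
      <preformat>
```python
def fake_quantize(value, min_val, max_val, num_bits=8):
    """Quantize a float to an integer level, then dequantize it again."""
    levels = 2 ** num_bits - 1                   # 255 steps for 8 bits
    scale = (max_val - min_val) / levels         # width of one quantization step
    clamped = min(max(value, min_val), max_val)  # keep the value inside the range
    level = round((clamped - min_val) / scale)   # integer level in [0, levels]
    return min_val + level * scale               # back to floating point


# Hypothetical weights, assumed to lie in the range [-1.0, 1.0].
weights = [-0.82, -0.11, 0.0, 0.37, 0.95]
simulated = [fake_quantize(w, -1.0, 1.0) for w in weights]
errors = [abs(w - s) for w, s in zip(weights, simulated)]
step = 2.0 / 255
```
      </preformat>
      <p>Each simulated value differs from the original by at most half a quantization step, and it is this rounding error that the network is exposed to during training.</p>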
      <p>In post-training quantization, model size can be reduced by quantizing an
already trained floating-point TensorFlow model. Three types of quantization are
available. 1) Dynamic range quantization statically quantizes
only the weights from floating point to integer (with 8-bit precision); the weights are then
converted back to floating point during inference. In this process, the model can
become 4 times smaller, with a 2-3 times increase in speed. 2) Full integer
quantization, as the name suggests, converts all mathematical operations to integer
arithmetic, giving a 3 times increase in speed and a reduction in peak memory usage. This is mostly
used on Edge devices. 3) Float16 quantization quantizes the weights from 32-bit
floating-point numbers to float16. This ensures minimal loss in accuracy and is mostly
used with CPU and GPU devices. The quantization method adopted in
this experiment is dynamic range quantization, which is a type of post-training
quantization.</p>
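      <p>The weight handling in dynamic range quantization can be illustrated with a short Python sketch. It assumes a symmetric mapping of the weights onto signed 8-bit integers with a single shared scale; the function names and weight values are hypothetical, and the code is an illustration of the idea and not the TensorFlow Lite converter itself.</p>
      <preformat>
```python
def quantize_weights(weights):
    """Map float weights to signed 8-bit integers with one shared scale."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_weights(quantized, scale):
    """Recover approximate float weights from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical 32-bit floating-point weights.
weights = [0.5, -1.27, 0.003, 0.9]
q_weights, scale = quantize_weights(weights)
restored = dequantize_weights(q_weights, scale)

float_bytes = 4 * len(weights)   # 32 bits per floating-point weight
int8_bytes = 1 * len(weights)    # 8 bits per quantized weight
size_ratio = float_bytes / int8_bytes
```
      </preformat>
      <p>Storing one byte per weight in place of four gives the fourfold reduction in weight storage, while the restored values differ from the originals by at most half a quantization step.</p>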
      <p>
        Quantization-aware training generally gives better model accuracy than post-training
quantization, even though the latter is often easier to apply [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The quantized
models use 8-bit values instead of 32-bit floats, which emulates
inference-time quantization more closely.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Experiment and Analysis</title>
      <p>The experiments were carried out by training the models using the
TensorFlow Object Detection API; with the use of transfer learning, the models were
then trained and fine-tuned for the person detection task. TensorFlow Lite is the
framework that allows inference modules to run on resource-constrained devices with
low latency. The weights are converted into 8-bit precision values in the
TensorFlow FlatBuffer format. The activations are always stored in floating point. For
inference, the activations are dynamically quantized to 8-bit precision prior to
processing and dequantized back into floating point afterwards.</p>
      <p>
        The reference experiment for the model optimization criteria is an SSD
Inception network trained on the INRIA dataset [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This is included as the first model
in both comparison tables. In 2014, Szegedy et al. from Google proposed
GoogLeNet (22 layers), which consists of inception modules and hence came to
be widely known as Inception Net [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The architecture of these networks was further modified
in 2015, which led to the versions Inception-v2 and Inception-v3
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. MobileNets are lighter networks, also designed by Google engineers, for
mobile vision applications [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The models were trained from the TensorFlow Model
repository using TensorFlow version 1.14. The inference was programmed using
Python 3.6 with OpenCV 3.4 as the image analysis package. Training
was completed on a GeForce RTX 2080 Ti GPU and the frozen graph was exported
as a TensorFlow Lite model. This model was then converted into the flat-buffer
format used for the detection experiments on a Raspberry Pi 4 running the Raspbian
Buster (version 10) OS. The Raspberry Pi Camera Module V2 is used for the real-time
detection experiment. The simulation in this experiment involves detection on images
for a controlled comparison of results. The Raspberry Pi camera is single channel, has
8 megapixels and has a maximum capture frame rate of 30 fps. To connect the
camera module to a Pi, a 15 cm ribbon cable attached to the module slots into
the Pi Camera Serial Interface (CSI) port.
      </p>
      <p>The evaluation metric used in this experiment is IOU (Intersection Over
Union). The IOU is the ratio of the area of intersection to the area of union of the
predicted bounding box and the ground-truth bounding box. It is used to
determine the Precision and Recall of object detection. Precision is the ratio of
True Positives to all positive detections, whereas Recall is the ratio of True
Positives to the sum of True Positives and False Negatives (all ground-truth
instances).</p>
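      <p>These definitions can be written out as a short Python sketch. Boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates; the box coordinates and the detection counts used here are hypothetical and are not results from this experiment.</p>
      <preformat>
```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # zero if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    detections = true_pos + false_pos
    ground_truth = true_pos + false_neg
    precision = true_pos / detections if detections else 0.0
    recall = true_pos / ground_truth if ground_truth else 0.0
    return precision, recall


predicted = (50, 50, 150, 150)        # hypothetical predicted box
actual = (100, 100, 200, 200)         # hypothetical ground-truth box
overlap = iou(predicted, actual)
precision, recall = precision_recall(true_pos=8, false_pos=2, false_neg=4)
```
      </preformat>
      <p>A detection is typically counted as a True Positive when its IOU with a ground-truth box exceeds a chosen threshold; the threshold itself is a design choice that is not specified here.</p>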
      <p>The above graph shows that the inference time of the TensorFlow Lite model
is comparable to that of the larger models, with only a slight change in IOU values. This is a
promising result. In industrial applications, accuracy is a concern, and
a 10% reduction in accuracy might lead to a much higher number of erroneous
products, which could be very costly in any mass production system. If
the acceptable floor rate of detection in an industrial scenario is taken to be
above 70% with a detection time within 3 ms, then the quantized versions of
SSD-Inception-v2 and SSD-MobileNet are the winning candidates.
SSD-MobileNetv1 is a framework designed for mobile and edge devices, and it was found
to be the most accurate and robust in the person detection experiment, with a
detection rate of 78% and a detection time of 1 ms. The SSDLite-MobileNet model
does not perform well enough to be considered for any industrial application, as
its IOU value falls to 59% with a very low precision of 0.66. A
floor rate of detection above 70% is sufficient for reliable identification of a
person, which is a standard engineering performance requirement for a cell-based
manufacturing environment.</p>
      <p>Research on model optimization is still ongoing, with different types
of quantization under study. Further research in the area of model
quantization considers adaptive quantization and layer-wise quantization.
Quantization is the most effective method of model compression used in the
current research for autonomous person detection on Edge devices.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Edge Intelligence is developing rapidly through the optimization of Deep Neural
Networks. Among all the types of optimization and compression techniques, model
quantization is the most widely used optimization method, due to its large
reduction in size as well as in computational cost. Further developments in
model quantization, such as adaptive quantization or multiple quantization within
the same neural network model, are also under research. The knowledge transfer
methods discussed above have been found beneficial in classification problems,
but in the case of person detection, model quantization gives the best results with
reduced inference time. The quantized models work efficiently on the Raspberry
Pi module, the edge device, with high accuracy values of more than 70% and a
reduced detection time of less than 3 ms, which is comparable to the detection
time of non-quantized models on GPU-accelerated devices. Thus, person
detection in industrial applications can rely on quantized models for stand-alone
Edge devices. These highly accurate detections can be further integrated into
intelligent automated responses.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Get started with TensorFlow Lite</article-title>
          , https://www.tensorflow.org/lite/guide/get_started, [Online; accessed 11-October-2020]
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>Post training quantization in TensorFlow Lite (</article-title>
          <year>2020</year>
          ), https://www.tensorflow.org/lite/performance/post_training_quantization, [Online; accessed 11-October-2020]
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caruana</surname>
          </string-name>
          , R.:
          <article-title>Do deep nets really need to be deep?</article-title>
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <volume>2654</volume>
          –
          <issue>2662</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bagchi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plantinga</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stiff</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fosler-Lussier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Spectral feature mapping with mimic loss for robust speech recognition</article-title>
          .
          <source>In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          . pp.
          <volume>5609</volume>
          –
          <fpage>5613</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Braun</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krebs</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flohr</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gavrila</surname>
            ,
            <given-names>D.M.:</given-names>
          </string-name>
          <article-title>Eurocity persons: A novel benchmark for person detection in traffic scenes</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>41</volume>
          (
          <issue>8</issue>
          ),
          <year>1844</year>
          –
          <year>1861</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bucilua</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caruana</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niculescu-Mizil</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Model compression</article-title>
          .
          <source>In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <volume>535</volume>
          –
          <issue>541</issue>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin-Kuo</surname>
            ,
            <given-names>M.c.</given-names>
          </string-name>
          :
          <article-title>Acceleration of neural network model execution on embedded systems</article-title>
          .
          <source>In: 2018 International Symposium on VLSI Design, Automation and Test (VLSI-DAT)</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>3</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chollet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <source>Deep Learning mit Python und Keras: Das Praxis-Handbuch vom Entwickler der Keras-Bibliothek</source>
          .
          <publisher-name>MITP-Verlags GmbH &amp; Co. KG</publisher-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dalal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Triggs</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Histograms of oriented gradients for human detection</article-title>
          .
          <source>In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05)</source>
          . vol.
          <volume>1</volume>
          , pp.
          <fpage>886</fpage>
          –
          <lpage>893</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dustdar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zomaya</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Edge intelligence: the confluence of edge computing and artificial intelligence</article-title>
          .
          <source>IEEE Internet of Things Journal</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Engel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hantrakul</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>DDSP: Differentiable digital signal processing</article-title>
          .
          <source>arXiv preprint arXiv:2001.04643</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>George</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huerta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Deep learning for real-time gravitational wave detection and parameter estimation: Results with Advanced LIGO data</article-title>
          .
          <source>Physics Letters B</source>
          <volume>778</volume>
          ,
          <fpage>64</fpage>
          –
          <lpage>70</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distilling the knowledge in a neural network</article-title>
          .
          <source>arXiv preprint arXiv:1503.02531</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalenichenko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weyand</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andreetto</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adam</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Mobilenets: Efficient convolutional neural networks for mobile vision applications</article-title>
          .
          <source>arXiv preprint arXiv:1704.04861</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jacob</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kligys</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adam</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalenichenko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Quantization and training of neural networks for efficient integer-arithmetic-only inference</article-title>
          .
          <source>In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          . pp.
          <fpage>2704</fpage>
          –
          <lpage>2713</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          :
          <article-title>Mimic nets</article-title>
          .
          <source>IEEE transactions on neural networks</source>
          <volume>4</volume>
          (
          <issue>5</issue>
          ),
          <fpage>803</fpage>
          –
          <lpage>815</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Knezovic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pervan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Relja</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knezovic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Project houseleek - a case study of applied object recognition models in internet of things</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kristiani</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>C.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>C.Y.</given-names>
          </string-name>
          :
          <article-title>iSEC: An optimized deep learning model for image classification on edge computing</article-title>
          .
          <source>IEEE Access</source>
          <volume>8</volume>
          ,
          <fpage>27267</fpage>
          –
          <lpage>27276</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Kwasniewska</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szankin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozga</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolfe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zajac</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruminski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rad</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Deep learning optimization for edge devices: Analysis of training quantization parameters</article-title>
          .
          <source>In: IECON 2019-45th Annual Conference of the IEEE Industrial Electronics Society</source>
          . vol.
          <volume>1</volume>
          , pp.
          <fpage>96</fpage>
          –
          <lpage>101</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Mosa</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>David</surname>
            ,
            <given-names>E.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Netanyahu</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          :
          <article-title>Stealing knowledge from protected deep neural networks using composite unlabeled data</article-title>
          .
          <source>In: 2019 International Joint Conference on Neural Networks (IJCNN)</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sermanet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anguelov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erhan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rabinovich</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Going deeper with convolutions</article-title>
          .
          <source>In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <month>June</month>
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ioffe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wojna</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Rethinking the inception architecture for computer vision</article-title>
          .
          <source>In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <month>June</month>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Urban</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geras</surname>
            ,
            <given-names>K.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kahou</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>O.A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caruana</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohamed</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Philipose</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Do deep convolutional nets really need to be deep (or even convolutional)?</article-title>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Verstraete</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrada</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Droguett</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meruane</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Modarres</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Deep learning enabled fault diagnosis using time-frequency image analysis of rolling element bearings</article-title>
          .
          <source>Shock and Vibration</source>
          <volume>2017</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Edge intelligence: Paving the last mile of artificial intelligence with edge computing</article-title>
          .
          <source>Proceedings of the IEEE</source>
          <volume>107</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1738</fpage>
          –
          <lpage>1762</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Adaptive layerwise quantization for deep neural network compression</article-title>
          .
          <source>In: 2018 IEEE International Conference on Multimedia and Expo (ICME)</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>