<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Artificial Intelligence and Cyber Security, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Memory Efficient Federated Deep Learning for Intrusion Detection in IoT Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Idris Zakariyya</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harsha Kalutarage</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Omar Al-Kadri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing and Digital Technology, Birmingham City University</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computing, Robert Gordon University</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>14</volume>
      <issue>2021</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Deep Neural Network (DNN) methods are widely proposed for cyber security monitoring. However, training DNNs requires substantial computational resources. This restricts direct deployment of DNNs to resource-constrained environments like the Internet of Things (IoT), especially in federated learning settings that train an algorithm across multiple decentralized edge devices. Therefore, this paper proposes a memory-efficient method of training a Fully Connected Neural Network (FCNN) for IoT security monitoring in federated learning settings. The model's performance was evaluated against eleven realistic IoT benchmark datasets. Experimental results show that the proposed method can reduce the memory requirement by up to 99.46% when compared to its benchmark counterpart, while maintaining state-of-the-art accuracy and F1 score.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Neural Networks (DNNs)</kwd>
        <kwd>Internet of Things (IoT)</kwd>
        <kwd>Fully Connected Neural Network (FCNN)</kwd>
        <kwd>Memory</kwd>
        <kwd>Federated Learning</kwd>
        <kwd>Intrusion Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        resource-constrained and distributed in nature, DNN-based cyber security techniques cannot be
directly deployed for intrusion detection in IoT networks. In that context, the Federated Learning
(FL) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] approach, which supports data privacy, may not scale across IoT devices because of their
limited computational resources. To respond to this challenge, we propose an efficient training
method for a Fully Connected Neural Network (FCNN) for IoT security monitoring, designed
to reduce the memory footprint during training while maintaining the same or a higher level
of accuracy than its benchmark counterpart.
      </p>
      <p>For our experiments, we utilize an FCNN along with eleven IoT benchmark datasets to build
a memory-efficient DNN (MEDNN) model. The experimental results are encouraging, as the
resulting MEDNN shows lower memory consumption with better classification performance
in both centralized and federated settings on each dataset used in our experiments. The
federated integration of the model also helps to preserve the privacy of IoT device data during
on-device model training.</p>
      <p>The rest of the paper is organized as follows. Section 2 presents the related work. Section 3
describes the proposed method and the utilized FL technique, while Section 4 describes the
evaluation process. Results and discussion can be found in Section 5. Finally, Section 6 concludes
the paper with future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>This section presents related studies concerning deep learning for IoT intrusion detection
followed by recent FL techniques applied to IoT security monitoring.</p>
      <p>
        Significant research has been conducted on IoT security monitoring using AI techniques.
Most of these methods utilized DNNs. Mohammad et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] described the potential of DNNs
for IoT data analysis and classification tasks. Kodali et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] employed DNNs, especially FCNNs,
for classification tasks on resource-limited devices. Shen et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed compact
structure-based learning with a Convolutional Neural Network (CNN) for a resource-constrained IoT
environment. Most of these optimization approaches considered the quantization of weight and
bias parameters. In contrast, the approach proposed in this paper aims to reduce memory
requirements. The method exploits pruning, simulated micro-batching and parameter regularization to
optimize the resulting model in terms of memory requirements and accuracy. This
is especially useful for distributed learning in a resource-constrained environment.
      </p>
      <p>
        Recently, researchers from several disciplines have explored FL methods from different perspectives.
In the field of IoT security monitoring, FL is gaining popularity. Preuveneers et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] explored
FL applications for intrusion detection in IoT networks. Lim et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Imteaj et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
described open research problems on FL for resource-constrained IoT devices. Thein et al.
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] utilized FL to detect attacks on industrial IoT devices. Liu et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] conducted a similar
investigation by considering sensor reading data. Jiang et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] utilized model pruning for
efficient FL training on edge devices. Bonawitz et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] proposed a scalable FL framework for
mobile devices to reduce communication overhead. However, none of these proposals considers
optimizing FL training to reduce memory consumption on IoT networks using pruning and
micro-batching. We address this challenge by optimizing the federated training procedure using
raw network traffic datasets from various IoT devices. We then propose an MEDNN FL method
with minimal resource consumption. This method maintains state-of-the-art accuracy while
reducing memory consumption.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Research Methodology</title>
      <p>We propose a framework that manipulates and optimizes an FCNN version of a DNN to yield a
compact classification model (see Figure 1). We then validate this framework by training the
FCNN on IoT benchmark datasets in federated and centralized settings to build MEDNN. This
requires evaluating the FCNN regularization to produce a loss function that identifies the
parameters relevant to model shrinking. We demonstrate that knowledge of the architecture and
of the optimization parameters is sufficient to produce the MEDNN model. The optimized model can
classify malicious activities on IoT networks.</p>
      <sec id="sec-3-1">
        <title>3.1. Baseline FCNN Training</title>
        <p>
          A DNN is a neural network containing deep layers of neurons representing the input data. These
neurons correspond to computing units. Each neuron transmits the result of applying its
activation function to its input. An FCNN is a sequential DNN
that connects neurons through their corresponding weight and bias parameters.
The weights and biases serve as information storage components. The baseline FCNN model
(ℳ) in Algorithm 1 consists of a network topology, activation functions and the corresponding
values for weights and biases. The weight and bias values are set to minimize the error function
ℰℳ evaluated over the labelled training data. The function BASE in line 1 of Algorithm 1
describes the training of ℳ using a gradient descent algorithm with backpropagation [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. This
training minimizes the cost function in Equation 1 and Equation 2 in order to properly
map unseen samples using a function learned from the training data. The resulting FCNN
is a supervised classifier: ℳ accepts an input and outputs a
vector of class probabilities ŷ. The output ŷ is rounded to the closest integer using
a specified threshold value τ, as in Equation 3. This output represents either a benign (1) or
an attack (0) traffic instance.
        </p>
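        <p>Algorithm 1 Baseline FCNN Training</p>
        <p>Input: Labelled data, number of iterations, batch size</p>
        <p>Output: Baseline model ℳ</p>
        <p>
          <disp-formula id="eq-2">
            <label>(2)</label>
            <tex-math><![CDATA[L(\hat{y}, y) = -\left( y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right)]]></tex-math>
          </disp-formula>
          <disp-formula id="eq-3">
            <label>(3)</label>
            <tex-math><![CDATA[y = \begin{cases} 0 & \text{if } \hat{y} \leq \tau \\ 1 & \text{if } \hat{y} > \tau \end{cases}]]></tex-math>
          </disp-formula>
        </p>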
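        <p>The loss of Equation 2 and the thresholding of Equation 3 can be illustrated with a minimal NumPy sketch. The function names, the clipping constant and the threshold of 0.5 are assumptions made for illustration and are not taken from the original implementation.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def binary_cross_entropy(y_hat, y, eps=1e-12):
    """Per-sample loss in the form of Equation 2.

    eps is an added numerical guard against log(0); it is an assumption.
    """
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))


def to_label(y_hat, threshold=0.5):
    """Thresholding in the form of Equation 3: 0 if y_hat <= threshold, else 1."""
    return (y_hat > threshold).astype(int)


y_true = np.array([1.0, 0.0, 1.0, 0.0])   # 1 = benign, 0 = attack, as in the text
y_prob = np.array([0.9, 0.2, 0.4, 0.5])   # hypothetical model outputs
losses = binary_cross_entropy(y_prob, y_true)
labels = to_label(y_prob)
```
]]></preformat>
        <p>A prediction exactly equal to the threshold is mapped to class 0, which matches the non-strict inequality in the first case of Equation 3.</p>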
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Memory Eficient MEDNN Training</title>
        <p>
          Training a resource-efficient DNN model can be a challenging task [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], especially when
considering the model parameter requirements for designing and building the desired architecture.
The complexity of such an approach increases with multidimensional datasets.
        </p>
        <p>
          To this end, we utilize the baseline model ℳ (a trained FCNN model) to produce a
memory-efficient version of it (MEDNN). The training procedure described in Algorithm 2
optimizes a function that takes the penalty term and returns the efficient model corresponding to
MEDNN. As described in line 4 of Algorithm 2, the optimization procedure utilizes micro-batching
[
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ] for efficient training. To reduce network complexity, we used a penalty [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] (weight
elimination) technique with a threshold parameter w₀, as shown in the regularized Equation 4. This
is required to separate the relevant weights from the irrelevant ones, in particular
to distinguish the significant from the insignificant large weights of the baseline FCNN model.
Weights greater than w₀, which yield a complexity cost closer to 1, are regularized using
the penalty parameter λ. The regularization considers a scenario where the baseline produces a
higher error value ℰ, as in line 9. For better performance, we utilized the set of parameters that
produces a lower error value ℰ. This process can reduce the complexity of the FCNN model
while building the MEDNN.
        </p>
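        <p>Micro-batching can be sketched as gradient accumulation: a mini-batch is split into smaller micro-batches, gradients are computed one micro-batch at a time and then combined, so that only one micro-batch of intermediate values is held in memory at once. The sketch below assumes a one-layer logistic model and a hypothetical mini-batch of 128 samples with 115 features; it illustrates the general technique and is not the authors' implementation.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_logistic(w, x, y):
    """Mean binary cross-entropy gradient for a one-layer logistic model."""
    return x.T @ (sigmoid(x @ w) - y) / len(y)


def micro_batched_grad(w, x, y, n_micro=4):
    """Accumulate the mini-batch gradient over n_micro micro-batches."""
    total = np.zeros_like(w)
    for xb, yb in zip(np.array_split(x, n_micro), np.array_split(y, n_micro)):
        # Weight each micro-batch gradient by its size so the sum is exact.
        total += grad_logistic(w, xb, yb) * len(yb)
    return total / len(y)


rng = np.random.default_rng(0)
x = rng.random((128, 115))                      # hypothetical mini-batch
y = rng.integers(0, 2, 128).astype(float)       # hypothetical binary labels
w = rng.random(115) * 0.01
full = grad_logistic(w, x, y)
accumulated = micro_batched_grad(w, x, y, n_micro=4)
```
]]></preformat>
        <p>Under these assumptions the accumulated gradient equals the full mini-batch gradient up to floating-point rounding, so the parameter update is unchanged while the peak size of intermediate arrays shrinks with the micro-batch size.</p>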
        <p>Algorithm 2 Procedure to build MEDNN</p>
        <p>Input: Penalty term λ</p>
        <p>Output: Efficient model</p>
        <preformat>function Efficient
    for each iteration do
        Sample a mini-batch from the labelled data
        Sample a micro-batch from the mini-batch
        Forward pass
        ℰ ← L + λ Σᵢ₌₁ⁿ (wᵢ² / w₀²) / (1 + wᵢ² / w₀²)    ◁ L, n, w₀ = loss, total weights, threshold
        Compute gradients for the parameter update    ◁ backward pass based on model parameters
        Record the execution memory at this epoch
    end for
    ℳ = trained model that estimates ℰ, with an efficient memory footprint
end function</preformat>
        <p>
          <disp-formula id="eq-4">
            <label>(4)</label>
            <tex-math><![CDATA[R = \lambda \sum_{i=1}^{n} \frac{w_i^2 / w_0^2}{1 + w_i^2 / w_0^2}]]></tex-math>
          </disp-formula>
        </p>
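        <p>The weight-elimination penalty of Equation 4 and its gradient can be sketched in NumPy as follows. The symbols lam and w0 stand for the penalty parameter and the threshold; their default value of 0.01 follows the values reported in Section 4.4, under the assumption that those values refer to these two parameters. The function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def weight_elimination_penalty(weights, lam=0.01, w0=0.01):
    """lam * sum_i (w_i^2 / w0^2) / (1 + w_i^2 / w0^2)."""
    ratio = np.square(weights) / (w0 ** 2)
    return lam * np.sum(ratio / (1.0 + ratio))


def penalty_gradient(weights, lam=0.01, w0=0.01):
    """Derivative of the penalty with respect to each weight."""
    ratio = np.square(weights) / (w0 ** 2)
    return lam * (2.0 * weights / w0 ** 2) / np.square(1.0 + ratio)


w = np.array([0.0, 0.01, 1.0])                  # hypothetical weights
ratio = np.square(w) / 0.01 ** 2
per_weight_cost = ratio / (1.0 + ratio)         # complexity cost of each weight
```
]]></preformat>
        <p>A weight equal to w0 has a complexity cost of exactly one half, a weight much larger than w0 approaches a cost of 1, and a weight near zero contributes almost nothing, which is how the penalty separates large weights from negligible ones.</p>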
      </sec>
      <sec id="sec-3-3">
        <title>3.3. MEDNN in Federated Learning</title>
        <p>
          FL is a machine learning approach that supports distributed model training using multiple
clients without exposing their training data. This technique updates a shared global model by
aggregating each client's training output [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Building a federated model can be a challenge
for resource-constrained IoT devices. With this in mind, we tested the proposed MEDNN in
FL settings to see how much memory it can save in model training. Our federated learning
approach is less complex, more efficient and more effective for the task of IoT intrusion detection than
its benchmark counterpart (see experimental results in Section 5.3).
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>This section describes benchmark datasets and the evaluation procedure used to build the
MEDNN and FCNN techniques in centralized and federated learning settings.</p>
      <sec id="sec-4-1">
        <title>4.1. Utilized Datasets</title>
        <p>
          The N-BaIoT dataset consists of raw data subsets from several commercial IoT
devices (see Table 1). Each device subset contains samples of attack and benign network traffic
flows [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. These devices are infected by either BASHLITE or Mirai attacks, with some benign
instances. The overall dataset serves as a benchmark for proposed IoT intrusion detection
methods. We use the device subsets of N-BaIoT to train and test our models. The
distribution of benign and attack samples in each subset shows its unbalanced
nature. Each sample in a device subset is a 115-feature vector.
        </p>
        <p>
          The Kitsune dataset contains traffic captured on an IoT network [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. The subset of
this data used to evaluate our models has 764,137 instances of Mirai and regular traffic.
This dataset has 115 features, with 121,621 instances of normal traffic.
        </p>
        <p>
          IoT-DDoS consists of captured traffic representing DDoS botnet attacks and
a portion of regular traffic [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. We consider 79,035 benign and 398,391 attack data
samples for empirical model evaluation.
        </p>
        <p>
          WUSTL consists of multiple traffic flows from an emulated SCADA system [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. The
dataset can be used to investigate the feasibility of ML algorithms in detecting various attacks.
The raw data consists of 7,037,983 data samples. For experimental purposes, a distribution of
471,545 attack and 6,566,438 normal instances was considered.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Data Preprocessing</title>
        <p>The chosen datasets allow efficient model training for investigation purposes. The
classes in these datasets are unbalanced, making them suitable for IoT security monitoring.
Each dataset is split into 80% training and 20% testing samples. Input
vectors are normalized using unity-based (min-max) feature scaling. With n data features
x₁, x₂, ..., xₙ within a dataset, the normalization is performed using the formula in Equation 5,
where x′ᵢ represents the normalized value of the i-th feature, xᵢ the original value,
and min(xᵢ) and max(xᵢ) the minimum and maximum values of the i-th feature over
the entire dataset.</p>
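        <p>
          <disp-formula id="eq-5">
            <label>(5)</label>
            <tex-math><![CDATA[x'_i = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}]]></tex-math>
          </disp-formula>
        </p>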
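        <p>The preprocessing described above can be sketched in NumPy as follows. The random data, the seed, the shuffling step and the guard for constant features are assumptions made for illustration; only the min-max formula of Equation 5 and the 80/20 split come from the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def min_max_scale(x, x_min, x_max):
    """Unity-based normalization of Equation 5, applied per feature."""
    # Guard against constant features, where max equals min (an added assumption).
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (x - x_min) / span


rng = np.random.default_rng(42)
data = rng.normal(size=(1000, 115)) * 50.0      # hypothetical 115-feature dataset

# Minimum and maximum are taken over the entire dataset, as stated in the text.
x_min, x_max = data.min(axis=0), data.max(axis=0)
scaled = min_max_scale(data, x_min, x_max)

order = rng.permutation(len(scaled))            # shuffle before splitting
cut = int(0.8 * len(scaled))                    # 80% training, 20% testing
train_set, test_set = scaled[order[:cut]], scaled[order[cut:]]
```
]]></preformat>
        <p>After scaling, every feature lies in the interval [0, 1], which keeps features with large raw ranges from dominating the gradient updates.</p>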
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Experimental Setup</title>
        <p>
          We profile the memory usage of each model training procedure using an integrated memory
usage profiler [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. We used Python 3.7.6 on a desktop computer with Intel Xeon E5-2695 (4 core) CPUs
running at 2.10 GHz with 16.0 GB of installed memory. For model analytics, the Spyder scientific
Integrated Development Environment (IDE) [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] was used to store the model for each dataset.
During training, parameters remain constant to enable a fair comparison. This applies to both the baseline
FCNN model and the optimized MEDNN. The code used for this study is available at [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
        </p>
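        <p>Peak memory of a training routine can be measured in several ways. The sketch below uses tracemalloc from the Python standard library as a stand-in, since the profiler cited as [25] is not described in this section; the helper peak_memory_mb and the toy workload are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import tracemalloc


def peak_memory_mb(fn, *args, **kwargs):
    """Run fn and return (result, peak traced Python memory in MB)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 ** 2)


def toy_workload(n_values):
    """Hypothetical stand-in for one training run."""
    values = [float(i) for i in range(n_values)]
    return sum(values)


_, small_mb = peak_memory_mb(toy_workload, 1_000)
_, large_mb = peak_memory_mb(toy_workload, 200_000)
```
]]></preformat>
        <p>Wrapping both training procedures in the same helper, with identical hyperparameters, gives a like-for-like comparison of their peak memory.</p>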
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Implementation Details</title>
        <p>
          FCNN and MEDNN Models. To build the sequential FCNN and MEDNN for each dataset,
we used the scientific NumPy Python module [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. Each sequential model consists of an input
layer, three hidden layers, and an output layer. For the eight N-BaIoT device subsets,
the topology consists of 83 neurons in the first and last hidden layers, with 128 neurons in
the second hidden layer (83-128-83). The network architecture used with the Kitsune dataset
consists of 83 neurons in the first and third hidden layers, with 141 neurons in the second hidden
layer (83-141-83). In both of these topologies, the input layer
has 115 neurons representing the number of data features, while the output layer has one neuron.
        </p>
        <p>The network architecture used with the WUSTL dataset has three hidden layers with 26 neurons
each (26-26-26), while the input and output layers have 6 and 1 neurons, respectively. The
model topology used with the IoT-DDoS dataset consists of 20 neurons in each of the three
hidden layers (20-20-20), while the input and output layers have 12 and 1 neurons, respectively.</p>
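        <p>The four topologies described above can be written down as layer-size lists, from which the number of trainable parameters follows directly. The dictionary and function names below are hypothetical; the layer sizes are those stated in the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Layer sizes as input, three hidden layers, output.
TOPOLOGIES = {
    "N-BaIoT": [115, 83, 128, 83, 1],
    "Kitsune": [115, 83, 141, 83, 1],
    "WUSTL": [6, 26, 26, 26, 1],
    "IoT-DDoS": [12, 20, 20, 20, 1],
}


def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


param_counts = {name: count_parameters(sizes)
                for name, sizes in TOPOLOGIES.items()}
```
]]></preformat>
        <p>Because the baseline FCNN and the MEDNN share these topologies, their parameter counts are identical, so any difference in training memory comes from the training procedure rather than from model size.</p>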
        <p>These topologies meet the requirements of the binary classification task. The
settings are meant to minimize training computations while increasing the performance
metrics. The architecture settings are identical for the baseline FCNN and the
proposed MEDNN model. The only difference during training is that FCNN used
Algorithm 1, while MEDNN used Algorithm 2. This indicates that the significant memory reduction
is due to the optimization procedure in Algorithm 2.</p>
        <p>
          For training each model, mini-batch gradient descent was used. The weight and bias
parameters are initialized randomly within [0, 1]. The baseline and optimized training procedure
utilized  = 0.001. We used 0.01 values for  , △ and threshold 0 [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] with 4 micro-batches
to build the MEDNN model. The activation function used in the fully connected layers is
ReLU, with a sigmoid in the output layer. Models are trained in 128 batches over 100 epochs,
which allows accuracy to converge. Parameters and hyperparameters were chosen by grid search.
Binary cross-entropy was used as the loss function. See Figure 2a and Figure 2b for
the learning process over the chosen epochs for the optimized and baseline training procedures.
The optimized training algorithm provides better training accuracy, even with fewer iterations,
than its baseline counterpart.
        </p>
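        <p>The forward pass of such a network can be sketched in NumPy as follows, using the N-BaIoT topology, uniform initialization in the unit interval, ReLU hidden activations and a sigmoid output as described above. The function names, the seed and the random input batch are assumptions for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def init_params(layer_sizes, rng):
    """Uniform initialization in [0, 1), following the range stated in the text."""
    return [(rng.random((n_in, n_out)), rng.random(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def forward(params, x):
    """ReLU in the hidden layers, sigmoid in the single output neuron."""
    a = x
    for w, b in params[:-1]:
        a = np.maximum(a @ w + b, 0.0)
    w_out, b_out = params[-1]
    return 1.0 / (1.0 + np.exp(-(a @ w_out + b_out)))


rng = np.random.default_rng(0)
params = init_params([115, 83, 128, 83, 1], rng)
batch = rng.random((128, 115))          # hypothetical normalized inputs
probs = forward(params, batch)          # one probability per sample
```
]]></preformat>
        <p>With all-positive initial weights the untrained outputs sit close to 1; training then moves the weights until the sigmoid output separates the two classes.</p>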
        <p>Low Precision 16-bit Implementation. In NumPy, training with 16-bit floating-point precision
(FP16) requires casting all model parameters and input data to the float16 data type, for example with .astype(np.float16). We use
FP16 both when training the baseline FCNN and when obtaining the efficient MEDNN model.</p>
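        <p>The effect of casting to FP16 can be sketched in NumPy with astype. The array shapes and the float32 starting precision are assumptions for illustration; the point is only that the same values occupy half the bytes once cast.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(0)
weights32 = rng.random((115, 83)).astype(np.float32)   # hypothetical layer weights
inputs32 = rng.random((128, 115)).astype(np.float32)   # hypothetical mini-batch

# Cast parameters and inputs to 16-bit floating point.
weights16 = weights32.astype(np.float16)
inputs16 = inputs32.astype(np.float16)

out16 = inputs16 @ weights16            # the product stays in float16
max_abs_error = np.max(np.abs(out16.astype(np.float32) - inputs32 @ weights32))
```
]]></preformat>
        <p>The rounding error in max_abs_error is the price paid for the smaller footprint, which is why accuracy under FP16 is reported separately from the memory savings.</p>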
        <p>
          FL Setup. For the FL experimental settings, we used PyTorch version 1.4.0 [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] and PySyft
version 0.2.9 [31]. The PySyft framework simplifies the creation of virtual workers. These workers
emulate real virtual machines and can run as separate processes within the same Python program.
Our federated training procedure utilized three virtual workers representing clients and a
coordinating worker. As we utilized Federated Averaging (FedAvg), Stochastic Gradient
Descent (SGD) was used to optimize each model. Federated models are trained in 128 batches
for four epochs over 30 worker iterations. After the clients' model training is complete, the average
weight values are sent to the coordinating worker. This worker aggregates those weights to
update the global model.
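        <p>The aggregation step performed by the coordinating worker can be sketched in plain NumPy. This is a simplified illustration of federated averaging and not the PySyft code used in the experiments; the function name, the optional weighting by client sample counts and the toy parameters are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def federated_average(client_params, client_sizes=None):
    """Average per-layer parameters across clients (FedAvg aggregation step).

    client_params: one list of arrays per client, in the same layer order.
    client_sizes: optional sample counts used to weight each client.
    """
    n_clients = len(client_params)
    if client_sizes is None:
        weights = np.full(n_clients, 1.0 / n_clients)
    else:
        weights = np.asarray(client_sizes, dtype=float) / np.sum(client_sizes)
    n_layers = len(client_params[0])
    return [sum(w * params[layer] for w, params in zip(weights, client_params))
            for layer in range(n_layers)]


# Three hypothetical clients, each holding one weight matrix and one bias vector.
clients = [[np.full((2, 2), v), np.full(2, v)] for v in (1.0, 2.0, 3.0)]
global_params = federated_average(clients)
weighted_params = federated_average(clients, client_sizes=[1, 1, 2])
```
]]></preformat>
        <p>Only parameter arrays cross the client boundary in this scheme, so the raw traffic data of each device never leaves it.</p>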
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>This section discusses the experimental results. It details the evaluation comparison of the
optimized MEDNN and baseline FCNN models in centralized and federated settings across
datasets.</p>
      <sec id="sec-5-1">
        <title>5.1. MEDNN Model Training (Centralized Manner)</title>
        <p>With 11 IoT datasets, we first examined the memory requirements for training FCNN and
MEDNN models in a centralized manner. Table 2 presents the memory profile in MB for each
dataset. Training the optimized MEDNN model requires less memory. It reduces the memory
requirements of training with Philips B120N10 by 97.60% and achieves a higher
classification accuracy of 84.10 percentage points than its baseline counterpart. These results
show the regularization advantage [32, 33] on accuracy with certain datasets. They indicate the
lower complexity, faster learning and better performance of the optimized
model. This reduction in resource use makes it a better choice for IoT security monitoring.</p>
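        <p>Assuming the reported memory savings are relative reductions with respect to the baseline, they can be computed as in the short sketch below. The function name and the example figures are hypothetical and are not taken from Table 2.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def percent_reduction(baseline_mb, optimized_mb):
    """Relative memory saving of the optimized model, in percent."""
    return 100.0 * (baseline_mb - optimized_mb) / baseline_mb


# Hypothetical figures, not taken from Table 2.
saving = percent_reduction(baseline_mb=200.0, optimized_mb=5.0)
```
]]></preformat>
        <p>A saving close to 100% therefore means that the optimized training run needs only a small fraction of the baseline memory.</p>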
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Low Precision 16-bit Training of MEDNN</title>
        <p>Training with reduced precision has become the de facto technique for increasing the energy
efficiency of deep learning hardware [34]. We therefore investigated the memory efficiency of
the proposed MEDNN with a low-precision implementation. Table 3 presents training memory
usage with FP16 precision. For each dataset, memory consumption was
reduced over the complete training iterations. For the Philips data, the reduction is 43.63%
and 80.61% with the baseline and optimized training processes, respectively. With
the same data, the accuracy increased by 68.18 percentage points using the optimized method.
The results suggest that FP16 operations contribute to memory reduction with the optimized
training method. They also show that FP16 integration does not reduce MEDNN accuracy
in most cases, whereas it can reduce the FCNN classification accuracy on some datasets.
As a result, the regularized MEDNN can maintain better accuracy with FP16 computations.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. MEDNN Model Training (Decentralized Manner)</title>
        <p>Table 4 presents the results of the implemented FL method with the baseline (FCNN) and its optimized
model (MEDNN). These results compare the training memory requirements and accuracy
for each dataset. In federated training, the MEDNN model requires less memory across
all datasets. It saves 99.46% of memory while training on the SimpleHome
XCS7-1003-WHT dataset. Across all tested datasets, the classification accuracy is not degraded by the
proposed method. This result demonstrates the advantage of the optimized model in building
an efficient federated training method, and the usefulness of the proposed method for effective
attack detection on resource-constrained devices.</p>
        <p>We investigated the effect of the proposed method in federated learning using all the datasets.
However, due to space constraints, we only present the result for the SimpleHome
XCS7-1003-WHT device data to show the significant memory reduction (see Table 5). In addition
to the significant memory reduction achieved by the MEDNN model, it outperforms the FCNN model
with the low-precision 16-bit implementation. As shown in the table, FP16 integration reduces the
accuracy of the FCNN by 0.05 percentage points while reducing that of the MEDNN by only
0.02 percentage points. In centralized and federated training procedures, both
models demonstrate equal accuracy. These results suggest the significance of
our optimized model compared with its benchmark counterpart. They indicate that the proposed
method is efficient and effective for on-device training in a distributed manner.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Model Performances</title>
        <p>Table 6 describes the federated model performance evaluated by test set accuracy, precision,
recall and harmonic mean on randomly chosen datasets. As the chosen IoT datasets are often
unbalanced, test accuracy alone would not be a sufficient metric to measure performance in
security applications. Instead, the F1 score, which corresponds to the harmonic mean of precision
and recall, is more appropriate. It considers accuracy for each class. The metrics
are computed from the True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) counts.
Accuracy, precision, recall and F1 score are defined in Equations 6, 7, 8 and 9. In each scenario,
the optimized MEDNN model maintains similar detection performance across all metrics. The
performance metrics presented in Table 6 remained identical for models trained in
centralized settings on each dataset. In each case, accuracy, precision, recall and F1 score
remained similar. The results indicate that the number of virtual worker nodes used in the
federated settings had a minor influence on model performance. This behaviour indicates the
lightweight advantage and effectiveness of MEDNN in detecting IoT attacks with good F1 score
performance.</p>
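        <p>The four metrics can be computed from the confusion counts using their standard definitions, as in the sketch below. The function name and the example counts are hypothetical; the zero-division guards are an added assumption.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def classification_metrics(tp, fp, tn, fn):
    """Standard accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical confusion counts for an unbalanced test set.
scores = classification_metrics(tp=90, fp=10, tn=880, fn=20)
```
]]></preformat>
        <p>In this hypothetical case the accuracy is noticeably higher than the F1 score, because the large number of true negatives dominates the accuracy while F1 depends only on how the positive class is handled.</p>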
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper investigated the possibility of reducing memory consumption during DNN training,
with the aim of using DNN-based security solutions in resource-constrained environments. Using
FCNN, we proposed a memory-efficient MEDNN for the effective detection of cyber attacks
on IoT devices. The effectiveness of MEDNN was tested using eleven IoT benchmark datasets
in both centralized and federated learning settings. Experimental results showed that the
proposed MEDNN can outperform its benchmark counterparts in memory efficiency and
accuracy, especially with federated learning. This could be because many clients
are involved in training in a federation, and thus the cumulative savings are higher than with
centralized training on a single node. In addition, the aggregation of models in federated
training can lead to faster learning compared with centralized training. These
initial experimental results are encouraging and warrant further investigation, particularly
with more computational nodes in a virtual and realistic federated environment.
Therefore, in future, we plan to deploy the model in a real IoT network and examine its
capability to detect IoT attacks in near real-time in a federated learning setting. In addition,
we plan to investigate the impact of adversarial attacks on the proposed MEDNN.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by the Petroleum Technology Development Fund (PTDF), Nigeria.</p>
      <p>N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep
learning library, Advances in Neural Information Processing Systems 32 (2019) 8026–8037.</p>
      <p>[31] T. Ryffel, A. Trask, M. Dahl, B. Wagner, J. Mancuso, D. Rueckert, J. Passerat-Palmbach, A
generic framework for privacy preserving deep learning, arXiv preprint arXiv:1811.04017
(2018).</p>
      <p>[32] D. Krueger, R. Memisevic, Regularizing RNNs by stabilizing activations, arXiv preprint
arXiv:1511.08400 (2015).</p>
      <p>[33] J. Lever, M. Krzywinski, N. Altman, Points of significance: Regularization, Nature Methods
13 (2016) 803–805.</p>
      <p>[34] X. Sun, N. Wang, C.-Y. Chen, J. Ni, A. Agrawal, X. Cui, S. Venkataramani, K. El Maghraoui,
V. V. Srinivasan, K. Gopalakrishnan, Ultra-low precision 4-bit training of deep neural
networks, Advances in Neural Information Processing Systems 33 (2020).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Technology evolution from self-powered sensors to AIoT enabled smart homes</article-title>
          ,
          <source>Nano Energy</source>
          (
          <year>2020</year>
          )
          <fpage>105414</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Antonakakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>April</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernhard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bursztein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cochran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Durumeric</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Halderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Invernizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kallitsis</surname>
          </string-name>
          , et al.,
          <article-title>Understanding the mirai botnet</article-title>
          ,
          <source>in: 26th USENIX Security Symposium (USENIX Security 17)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1093</fpage>
          -
          <lpage>1110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I. V.</given-names>
            <surname>Kotenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Saenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Branitskiy</surname>
          </string-name>
          ,
          <article-title>Applying big data processing and machine learning methods for mobile internet of things security monitoring</article-title>
          ,
          <source>J. Internet Serv. Inf. Secur</source>
          .
          <volume>8</volume>
          (
          <year>2018</year>
          )
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Konečný</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. B.</given-names>
            <surname>McMahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. X.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Richtárik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Suresh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bacon</surname>
          </string-name>
          ,
          <article-title>Federated learning: Strategies for improving communication efficiency</article-title>
          ,
          <source>arXiv preprint arXiv:1610.05492</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohammadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Fuqaha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sorour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guizani</surname>
          </string-name>
          ,
          <article-title>Deep learning for iot big data and streaming analytics: A survey</article-title>
          ,
          <source>IEEE Communications Surveys &amp; Tutorials</source>
          <volume>20</volume>
          (
          <year>2018</year>
          )
          <fpage>2923</fpage>
          -
          <lpage>2960</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kodali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mulholland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Whatmough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brooks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>Applications of deep neural networks for ultra low power iot</article-title>
          ,
          <source>in: 2017 IEEE International Conference on Computer Design (ICCD)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>589</fpage>
          -
          <lpage>592</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Efficient deep structure learning for resource-limited iot devices</article-title>
          ,
          <source>in: GLOBECOM 2020 - 2020 IEEE Global Communications Conference</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Preuveneers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rimmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tsingenopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Spooren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Joosen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ilie-Zudor</surname>
          </string-name>
          ,
          <article-title>Chained anomaly detection models for federated learning: An intrusion detection case study</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>8</volume>
          (
          <year>2018</year>
          )
          <fpage>2663</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W. Y. B.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. T.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Niyato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <article-title>Federated learning in mobile edge networks: A comprehensive survey</article-title>
          ,
          <source>IEEE Communications Surveys &amp; Tutorials</source>
          <volume>22</volume>
          (
          <year>2020</year>
          )
          <fpage>2031</fpage>
          -
          <lpage>2063</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Imteaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Thakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Amini</surname>
          </string-name>
          ,
          <article-title>A survey on federated learning for resource-constrained iot devices</article-title>
          ,
          <source>IEEE Internet of Things Journal</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Miettinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fereidooni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Asokan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-R.</given-names>
            <surname>Sadeghi</surname>
          </string-name>
          ,
          <article-title>Dïot: A federated self-learning anomaly detection system for iot</article-title>
          ,
          <source>in: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>756</fpage>
          -
          <lpage>767</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y. B.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Niyato</surname>
          </string-name>
          ,
          <article-title>Communication-efficient federated learning for anomaly detection in industrial internet of things</article-title>
          ,
          <source>in: GLOBECOM 2020 - 2020 IEEE Global Communications Conference</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Valls</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Leung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tassiulas</surname>
          </string-name>
          ,
          <article-title>Model pruning enables efficient federated learning on edge devices</article-title>
          ,
          <source>arXiv preprint arXiv:1909.12326</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bonawitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Eichner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Grieskamp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ingerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiddon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Konečný</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mazzocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. B.</given-names>
            <surname>McMahan</surname>
          </string-name>
          , et al.,
          <article-title>Towards federated learning at scale: System design</article-title>
          ,
          <source>arXiv preprint arXiv:1902.01046</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chauvin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <article-title>Backpropagation: theory, architectures, and applications</article-title>
          , Psychology press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>O. I.</given-names>
            <surname>Abiodun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Omolara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. V.</given-names>
            <surname>Dada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Arshad</surname>
          </string-name>
          ,
          <article-title>State-of-the-art in artificial neural network applications: A survey</article-title>
          ,
          <source>Heliyon</source>
          <volume>4</volume>
          (
          <year>2018</year>
          )
          <fpage>e00938</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Oyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ben-Nun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hoefler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Matsuoka</surname>
          </string-name>
          ,
          <article-title>Accelerating deep learning frameworks with micro-batches</article-title>
          ,
          <source>in: 2018 IEEE International Conference on Cluster Computing (CLUSTER)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>402</fpage>
          -
          <lpage>412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bapna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Firat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ngiam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          , et al.,
          <article-title>Gpipe: Efficient training of giant neural networks using pipeline parallelism</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          )
          <fpage>103</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pool</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Dally</surname>
          </string-name>
          ,
          <article-title>Learning both weights and connections for efficient neural networks</article-title>
          ,
          <source>arXiv preprint arXiv:1506.02626</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B.</given-names>
            <surname>McMahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ramage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hampson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>y Arcas</surname>
          </string-name>
          ,
          <article-title>Communication-efficient learning of deep networks from decentralized data</article-title>
          ,
          <source>in: Artificial intelligence and statistics</source>
          , PMLR,
          <year>2017</year>
          , pp.
          <fpage>1273</fpage>
          -
          <lpage>1282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Meidan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bohadana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mathov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mirsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shabtai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Breitenbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Elovici</surname>
          </string-name>
          ,
          <article-title>N-baiot-network-based detection of iot botnet attacks using deep autoencoders</article-title>
          ,
          <source>IEEE Pervasive Computing</source>
          <volume>17</volume>
          (
          <year>2018</year>
          )
          <fpage>12</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mirsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Doitshman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Elovici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shabtai</surname>
          </string-name>
          ,
          <article-title>Kitsune: an ensemble of autoencoders for online network intrusion detection</article-title>
          ,
          <source>arXiv preprint arXiv:1802.09089</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Siddharth</surname>
          </string-name>
          ,
          <article-title>IoT-DDoS dataset</article-title>
          ,
          <year>2020</year>
          . URL: https://www.kaggle.com/siddharthm1698/ddos-botnet-attack-on-iot-devices, accessed: 2021-02-10.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Teixeira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zolanvari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Meskin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Samaka</surname>
          </string-name>
          ,
          <article-title>Scada system testbed for cybersecurity research using machine learning approach</article-title>
          ,
          <source>Future Internet</source>
          <volume>10</volume>
          (
          <year>2018</year>
          )
          <fpage>76</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gervais</surname>
          </string-name>
          ,
          <article-title>Memory profiler (python)</article-title>
          ,
          <source>Python Software Foundation</source>
          , https://pypi.org/project/memory-profiler/. Accessed March 25 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>P.</given-names>
            <surname>Raybaut</surname>
          </string-name>
          , Spyder-documentation, Available online at: pythonhosted.org (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>I.</given-names>
            <surname>Zakariyya</surname>
          </string-name>
          ,
          <article-title>Memory efficient federated algorithm</article-title>
          ,
          <year>2021</year>
          . URL: https://github.com/izakariyya/Robust_DNN_IoT.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>R.</given-names>
            <surname>Johansson</surname>
          </string-name>
          ,
          <source>Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib</source>
          , Apress,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Engelbrecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Helbig</surname>
          </string-name>
          ,
          <article-title>Fitness landscape analysis of weight-elimination neural networks</article-title>
          ,
          <source>Neural Processing Letters</source>
          <volume>48</volume>
          (
          <year>2018</year>
          )
          <fpage>353</fpage>
          -
          <lpage>373</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>