
Workshop on Artificial Intelligence and Cyber Security, December

Memory Efficient Federated Deep Learning for Intrusion Detection in IoT Networks

Idris Zakariyya

Harsha Kalutarage

M. Omar Al-Kadri

School of Computing and Digital Technology, Birmingham City University, UK

School of Computing, Robert Gordon University, UK

2021


Deep Neural Network (DNN) methods are widely proposed for cyber security monitoring. However, training DNNs requires substantial computational resources. This restricts the direct deployment of DNNs in resource-constrained environments such as the Internet of Things (IoT), especially in federated learning settings that train an algorithm across multiple decentralized edge devices. Therefore, this paper proposes a memory-efficient method of training a Fully Connected Neural Network (FCNN) for IoT security monitoring in federated learning settings. The model's performance was evaluated on eleven realistic IoT benchmark datasets. Experimental results show that the proposed method can reduce the memory requirement by up to 99.46% compared to its benchmark counterpart, while maintaining state-of-the-art accuracy and F1 score.

Keywords: Deep Neural Networks (DNNs), Internet of Things (IoT), Fully Connected Neural Network (FCNN), Memory, Federated Learning, Intrusion Detection

1. Introduction

resource-constrained and distributed in nature, DNN-based cyber security techniques cannot be directly deployed for intrusion detection in IoT networks. In this context, the Federated Learning (FL) [4] approach, which supports data privacy, may not scale across IoT devices due to their lack of computational resources. To respond to this challenge, we propose an efficient training method for a Fully Connected Neural Network (FCNN) for IoT security monitoring. In particular, the method reduces the memory footprint during training while maintaining the same or a higher level of accuracy than its benchmark counterpart.

For our experiments, we utilize an FCNN along with eleven IoT benchmark datasets to build a memory-efficient DNN (MEDNN) model. The experimental results are encouraging: the resulting MEDNN shows lower memory consumption with better classification performance in both centralized and federated settings on each dataset used in our experiments. The federated integration of the model also helps to preserve the privacy of IoT device data during on-device model training.

The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the proposed method and the utilized FL technique, while Section 4 describes the evaluation process. Results and discussion can be found in Section 5. Finally, Section 6 concludes the paper with future research directions.

2. Related Work

This section presents related studies concerning deep learning for IoT intrusion detection followed by recent FL techniques applied to IoT security monitoring.

Significant research has been conducted on IoT security monitoring using AI techniques, and most of these methods use DNNs. Mohammadi et al. [5] described the potential of DNNs for IoT data analysis and classification tasks. Kodali et al. [6] employed DNNs, especially FCNNs, for classification tasks on resource-limited devices. Shen et al. [7] proposed compact structure-based learning with a Convolutional Neural Network (CNN) for a resource-constrained IoT environment. Most of these optimization approaches consider the quantization of weight and bias parameters. In contrast, the approach proposed in this paper aims to reduce memory requirements. The method exploits pruning, simulated micro-batching and parameter regularization to optimise the resulting model in terms of memory requirements and accuracy. This is useful especially for distributed learning in a resource-constrained environment.

Recently, researchers from several disciplines have explored FL methods from different perspectives. In the field of IoT security monitoring, FL is gaining popularity. Preuveneers et al. [8] explored FL applications for intrusion detection in IoT networks. Lim et al. [9] and Imteaj et al. [10] describe open research problems on FL for resource-constrained IoT devices. Nguyen et al. [11] used FL to detect attacks on industrial IoT devices. Liu et al. [12] conducted a similar investigation considering sensor reading data. Jiang et al. [13] used model pruning for efficient FL training on edge devices. Bonawitz et al. [14] proposed a scalable FL framework for mobile devices to reduce communication overhead. However, none of these proposals considers optimizing FL training to reduce memory consumption on IoT networks using pruning and micro-batching. We address this challenge by optimizing the federated training procedure using raw network traffic datasets from various IoT devices. We then propose an MEDNN FL method with minimal resource consumption. This method maintains state-of-the-art accuracy while reducing memory consumption.

3. Research Methodology

We propose a framework that manipulates and optimizes an FCNN, a type of DNN, to yield a compact classification model (see Figure 1). We then validate this framework by training the FCNN on IoT benchmark datasets in federated and centralized settings to build the MEDNN. This requires evaluating the FCNN regularization to produce a loss function that identifies the parameters relevant to model shrinking. We demonstrate that knowledge of the architecture and of the optimization parameters is sufficient to produce the MEDNN model. The optimized model can classify malicious activities on IoT networks.

3.1. Baseline FCNN Training

A DNN is a neural network with multiple layers of neurons that represent the input data. These neurons act as computing units: each one transmits the result of applying its activation function to its input. An FCNN is a sequential DNN in which neurons are linked through their corresponding weight and bias parameters, which serve as information storage components. The baseline FCNN model (ℳ) in Algorithm 1 consists of the network topology, the activation functions and the corresponding weight and bias values. These values are set to minimize the error function ℰℳ evaluated over the labelled training data. The function BASE in line 1 of Algorithm 1 describes the training of ℳ using a gradient descent algorithm with backpropagation [15]. Training minimizes the cost function in Equation 1 and Equation 2 in order to correctly map unseen samples using a function learned from the training data. The resulting FCNN is a supervised classifier: ℳ accepts an input and outputs a class probability $\hat{y}$. The output $\hat{y}$ is rounded to the closest integer using a specified threshold value $\tau$, as in Equation 3. This output represents either a benign (1) or an attack (0) traffic instance.

Algorithm 1 Baseline FCNN Training

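Input: Labelled data, number of iterations, batch size

Output: Baseline model ℳ

$$\mathcal{L}(\hat{y}, y) = -\big(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\big) \qquad (2)$$

$$\hat{y} = \begin{cases} 0 & \text{if } \hat{y} \le \tau \\ 1 & \text{if } \hat{y} > \tau \end{cases} \qquad (3)$$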
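As a rough illustration of the loss and thresholding step described above, the following sketch evaluates the binary cross-entropy for a few predictions and maps each probability to a class label. The threshold of 0.5, the clipping constant `eps`, and the sample values are assumptions made for the example; they are not taken from the paper.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy for one sample; eps (assumed) guards against log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def to_label(y_prob, threshold=0.5):
    """Map a probability to a label: 1 (benign) above the threshold, else 0 (attack)."""
    return 1 if y_prob > threshold else 0


# Hypothetical model outputs and ground-truth labels.
probs = [0.92, 0.08, 0.51]
labels = [1, 0, 0]

losses = [binary_cross_entropy(y, p) for y, p in zip(labels, probs)]
mean_loss = sum(losses) / len(losses)
predicted = [to_label(p) for p in probs]
print(mean_loss, predicted)
```

The third sample shows why the threshold matters: a probability just above 0.5 is labelled benign even though the true class is attack, and it also contributes the largest loss.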

3.2. Memory Efficient MEDNN Training

Training a resource-efficient DNN model can be a challenging task [16], especially when the model parameter requirements are taken into account in designing and building the desired architecture. The complexity of such an approach increases with multidimensional datasets.

To this end, we use the baseline model ℳ (a trained FCNN) to produce a memory-efficient version of it (MEDNN). The training procedure described in Algorithm 2 optimizes a function that returns the efficient model corresponding to the MEDNN. As described in line 4 of Algorithm 2, the optimization procedure uses micro-batching [17, 18] for efficient training. To reduce network complexity, we use a penalty (weight elimination) technique [19] with a threshold parameter $w_0$, as shown in the regularized Equation 4. This is required to separate the relevant weights from the irrelevant ones, in particular to distinguish the significant from the insignificant large weights of the baseline FCNN model. Weights greater than $w_0$, which yield a complexity cost close to 1, are regularized using the penalty parameter $\lambda$. The regularization considers a scenario where the baseline produces a higher error value ℰ, as in line 9. For better performance, we use the set of parameters that produces a lower error value ℰ. This process can reduce the complexity of the FCNN model while building the MEDNN.
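The idea behind micro-batching can be sketched as follows: a mini-batch is processed in smaller slices, the gradients of the slices are accumulated, and a single update is applied afterwards. The example below uses a single sigmoid unit in plain Python rather than the FCNN from the paper, and the helper names and data are hypothetical. It only checks that the accumulated gradient matches the gradient of the full mini-batch, which is what allows peak memory to be bounded by one micro-batch at a time without changing the update.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_sum(weights, batch):
    """Sum of per-sample cross-entropy gradients for a single sigmoid unit."""
    g = [0.0] * len(weights)
    for x, y in batch:
        err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
        for j, xi in enumerate(x):
            g[j] += err * xi
    return g


def micro_batched_gradient(weights, mini_batch, n_micro=4):
    """Accumulate gradients over n_micro slices, then average over the mini-batch."""
    size = math.ceil(len(mini_batch) / n_micro)
    total = [0.0] * len(weights)
    for start in range(0, len(mini_batch), size):
        part = grad_sum(weights, mini_batch[start:start + size])
        total = [t + p for t, p in zip(total, part)]
    return [t / len(mini_batch) for t in total]


# Hypothetical two-feature samples with binary labels.
weights = [0.1, -0.2]
mini_batch = [([0.5, 1.0], 1), ([1.5, 0.2], 0), ([0.3, 0.9], 1), ([0.8, 0.8], 0),
              ([0.1, 0.4], 1), ([0.9, 0.1], 0), ([0.6, 0.7], 1), ([0.2, 0.3], 0)]

full = [g / len(mini_batch) for g in grad_sum(weights, mini_batch)]
accumulated = micro_batched_gradient(weights, mini_batch, n_micro=4)
print(full, accumulated)
```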

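Algorithm 2 Procedure to build MEDNN

Input: Penalty term $\lambda$

Output: Efficient model

1: function Efficient
2:   for each iteration, up to the number of iterations, do   ◁ number of iterations as in Alg. 1
3:     Sample a mini-batch from the labelled data   ◁ data and batch size as in Alg. 1
4:     Sample a micro-batch from the mini-batch
       Compute the model output for the micro-batch   ◁ forward pass
       $\mathcal{E} \leftarrow \mathcal{L} + \lambda \sum_{i=1}^{n} \frac{w_i^2 / w_0^2}{1 + w_i^2 / w_0^2}$   ◁ $\mathcal{L}$, $n$, $w_0$ = loss, total weights, threshold
       Compute gradients for the parameter update   ◁ backward pass based on model parameters
       Record the execution memory at the current epoch
     return the efficient memory footprint and ℳ   ◁ ℳ = trained model that estimates ℰ
19: end function

$$\mathcal{E} = \mathcal{L} + \lambda \sum_{i=1}^{n} \frac{w_i^2 / w_0^2}{1 + w_i^2 / w_0^2} \qquad (4)$$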
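To make the behaviour of the weight-elimination term concrete, the sketch below evaluates the penalty for a few weights. The function name is hypothetical, and the defaults of 0.01 for the threshold `w0` and the penalty parameter `lam` are assumptions based on the values reported in Section 4.4. Each weight contributes a cost between 0 and 1: close to 0 when the weight is much smaller than `w0`, exactly 0.5 when it equals `w0`, and close to 1 when it is much larger.

```python
def weight_elimination_penalty(weights, w0=0.01, lam=0.01):
    """Weight-elimination penalty: lam * sum((w/w0)^2 / (1 + (w/w0)^2))."""
    total = 0.0
    for w in weights:
        ratio = (w / w0) ** 2
        total += ratio / (1.0 + ratio)
    return lam * total


# Per-weight complexity cost (lam=1 so the raw cost is visible).
small = weight_elimination_penalty([0.0001], lam=1.0)  # far below w0
equal = weight_elimination_penalty([0.01], lam=1.0)    # exactly w0
large = weight_elimination_penalty([10.0], lam=1.0)    # far above w0
print(small, equal, large)
```

Adding this value to the data loss yields the regularized error that the optimized training procedure minimizes.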

3.3. MEDNN in Federated Learning

FL is a machine learning approach that supports distributed model training across multiple clients without exposing their training data. This technique updates a shared global model by aggregating the training output of each client [20]. Building a federated model can be a challenge for resource-constrained IoT devices. With this in mind, we tested the proposed MEDNN in FL settings to see how much memory it can save during model training. Our federated learning approach is less complex, more efficient and more effective for the task of IoT intrusion detection than its benchmark counterpart (see experimental results in Section 5.3).

4. Evaluation

This section describes benchmark datasets and the evaluation procedure used to build the MEDNN and FCNN techniques in centralized and federated learning settings.

4.1. Utilized Datasets

The N-BaIoT dataset consists of raw data subsets collected from several commercial IoT devices (see Table 1). Each device subset contains samples of attack and benign network traffic flows [21]. The devices were infected by either BASHLITE or Mirai attacks, and some benign instances are included. The overall dataset serves as a benchmark for proposed IoT intrusion detection methods. We use the device subsets of N-BaIoT to train and test our models. The distribution of benign and attack samples in each subset shows its unbalanced nature. Each sample in a device subset is a 115-dimensional feature vector.

The Kitsune dataset contains multiple traffic captures from an IoT network setting [22]. The subset of this data used to evaluate our models has 764,137 instances of Mirai and regular traffic. This dataset has 115 features, and 121,621 of its raw traffic instances are normal.

IoT-DDoS consists of captured traffic representing DDoS botnet attacks together with a portion of regular traffic [23]. We consider 79,035 benign and 398,391 attack data samples for empirical model evaluation.

WUSTL consists of multiple traffic flows from an emulated SCADA system [24]. The dataset can be used to investigate the feasibility of ML algorithms in detecting various attacks. The raw data consists of 7,037,983 data samples. For experimental purposes, a distribution of 471,545 attack and 6,566,438 normal instances was considered.
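The degree of class imbalance can be read directly from the sample counts quoted above. The short sketch below computes the attack share for the three datasets whose counts are stated in this section. For Kitsune, the attack count is derived by subtracting the normal instances from the total, which assumes that every remaining instance is Mirai traffic.

```python
# Sample counts as quoted in this section; the Kitsune attack count is derived
# (total minus normal), which is an assumption about the remaining instances.
datasets = {
    "Kitsune": {"normal": 121_621, "attack": 764_137 - 121_621},
    "IoT-DDoS": {"normal": 79_035, "attack": 398_391},
    "WUSTL": {"normal": 6_566_438, "attack": 471_545},
}

attack_share = {}
for name, counts in datasets.items():
    total = counts["normal"] + counts["attack"]
    attack_share[name] = counts["attack"] / total
    print(f"{name}: {total} samples, attack share {attack_share[name]:.2%}")
```

The imbalance runs in opposite directions: attacks dominate Kitsune and IoT-DDoS, while they are a small minority in WUSTL, which is one reason accuracy alone can be misleading on these datasets.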

4.2. Data Preprocessing

The chosen datasets allow efficient model training for investigation purposes. The classes in these datasets are unbalanced, making them suitable for IoT security monitoring. The datasets are split into 80% training and 20% testing samples. Input vectors are normalized using unity-based (min-max) feature scaling. Given the features $x_1, x_2, \ldots, x_n$ of a dataset, the normalization is performed using Equation 5, where $x_i'$ is the normalized value of the $i$th feature, $x_i$ is its original value, and $\min(x_i)$ and $\max(x_i)$ are the minimum and maximum values of the $i$th feature over the entire dataset.

$$x_i' = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)} \qquad (5)$$
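A minimal sketch of unity-based normalization, applied column by column to a small table of hypothetical feature values, is given below. The handling of constant features, which are mapped to 0.0 to avoid a division by zero, is an assumption; the paper does not state how such features are treated.

```python
def min_max_normalize(rows):
    """Scale every feature (column) to [0, 1] using its minimum and maximum."""
    columns = list(zip(*rows))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # Constant features have hi == lo; map them to 0.0 (assumed convention).
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return normalized


# Three hypothetical samples with three features each.
samples = [[1.0, 10.0, 5.0],
           [3.0, 20.0, 5.0],
           [5.0, 40.0, 5.0]]
scaled = min_max_normalize(samples)
print(scaled)
```

In practice the minimum and maximum would be computed once and reused for the test split, so that both splits share the same scale.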

4.3. Experimental Setup

We profile the memory usage of each model training procedure using the memory usage utility of the memory profiler [25]. We used Python 3.7.6 on a desktop computer with Intel Xeon E5-2695 (4 core) CPUs running at 2.10 GHz and 16.0 GB of installed memory. For model analytics, the Spyder scientific Integrated Development Environment (IDE) [26] was used to store the model for each dataset. During training, parameters remain constant to enable a fair comparison; this applies to both the baseline FCNN model and the optimized MEDNN. The code used for this study is accessible at [27].
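The kind of measurement described here can be approximated with the standard library alone. The sketch below substitutes Python's `tracemalloc` module for the memory profiler used in the study, so the numbers it reports cover only allocations made through the Python allocator and are not comparable to the figures in the result tables. The helper `peak_memory_mb` and the toy workload are hypothetical.

```python
import tracemalloc


def peak_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


def toy_training_step(n):
    """Stand-in workload: allocate n floats, as a batch of activations would."""
    activations = [float(i) for i in range(n)]
    return sum(activations)


_, small = peak_memory_mb(toy_training_step, 10_000)
_, large = peak_memory_mb(toy_training_step, 200_000)
print(f"small batch: {small:.2f} MB, large batch: {large:.2f} MB")
```

Wrapping both training procedures in the same helper keeps the comparison fair, since each run starts from a fresh trace.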

4.4. Implementation Details

FCNN and MEDNN Models. To build the sequential FCNN and MEDNN for each dataset, we used the scientific NumPy Python module [28]. Each sequential model consists of an input layer, three hidden layers and an output layer. For the eight device subsets of N-BaIoT, the topology consists of 83 neurons in the first and last hidden layers, with 128 neurons in the second hidden layer (83-128-83). The network architecture used with the Kitsune dataset consists of 83 neurons in the first and third hidden layers, with 141 neurons in the second hidden layer (83-141-83). For both of these topologies, the input layer has 115 neurons, representing the number of data features, while the output layer has one neuron.

The network architecture used with the WUSTL dataset has three hidden layers with 26 neurons each (26-26-26), while the input and output layers have 6 and 1 neurons, respectively. The model topology used with the IoT-DDoS dataset consists of 20 neurons in each of the three hidden layers (20-20-20), while the input and output layers have 12 and 1 neurons, respectively.

These topologies meet the requirements of the binary classification task. The settings are meant to minimize training computations while increasing the performance metrics. The architecture settings are identical for the baseline FCNN and the proposed MEDNN model. The only difference during training is that the FCNN uses Algorithm 1, while the MEDNN uses Algorithm 2. This indicates that the significant memory reduction is due to the optimization procedure in Algorithm 2.

For training each model, mini-batch gradient descent was used. The weight and bias parameters are initialized randomly within [0, 1]. The baseline and optimized training procedures utilized = 0.001. We used 0.01 values for , △ and threshold $w_0$ [29] with 4 micro-batches to build the MEDNN model. The activation function in the fully connected layers is ReLU, with sigmoid in the output layer. Models are trained in 128 batches over 100 epochs for accuracy to converge. Parameters and hyperparameters were chosen based on grid search. Binary cross-entropy was used as the loss function. See Figure 2a and Figure 2b for the learning process over the chosen epochs for the optimized and baseline training procedures. The optimized training algorithm provides better training accuracy even with fewer iterations than its baseline counterpart.
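The size of each model follows directly from the layer widths listed above. As a worked example, the sketch below counts the weights and biases of a fully connected network from its layer sizes, assuming one bias per neuron in every non-input layer. The helper name is hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # assumes one bias per non-input neuron
    return weights + biases


# Input, three hidden layers, output, as described in this section.
topologies = {
    "N-BaIoT": [115, 83, 128, 83, 1],
    "Kitsune": [115, 83, 141, 83, 1],
    "WUSTL": [6, 26, 26, 26, 1],
    "IoT-DDoS": [12, 20, 20, 20, 1],
}

parameter_counts = {name: count_parameters(sizes) for name, sizes in topologies.items()}
for name, count in parameter_counts.items():
    print(f"{name}: {count} parameters")
```

The N-BaIoT and Kitsune models are more than an order of magnitude larger than the other two, mostly because of their 115 input features.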

Low Precision 16-bit Implementation. In NumPy, training with 16-bit floating-point precision (FP16) requires casting all model parameters and input data to float16. We use FP16 both while training the baseline FCNN and when obtaining the efficient MEDNN model.
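The storage effect of FP16 can be illustrated without any numerical library. The sketch below uses the standard `struct` module, whose `e` format is the IEEE 754 half-precision type, to serialize the same hypothetical parameter values in 32-bit and 16-bit form and to measure the rounding error introduced by the narrower type. With NumPy arrays, the corresponding cast would be `astype(np.float16)`.

```python
import struct


def pack_parameters(values, fmt):
    """Serialize floats with a struct format: 'f' is 32-bit, 'e' is 16-bit."""
    return struct.pack(f"<{len(values)}{fmt}", *values)


# Hypothetical parameter values in [0, 1).
params = [0.001 * i for i in range(1000)]

fp32 = pack_parameters(params, "f")
fp16 = pack_parameters(params, "e")

# Round-trip through FP16 to see how much precision is lost.
restored = struct.unpack(f"<{len(params)}e", fp16)
max_error = max(abs(a - b) for a, b in zip(params, restored))

print(len(fp32), len(fp16), max_error)
```

Parameter storage is halved, at the cost of a small rounding error per value. Whether that error affects accuracy depends on the model.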

FL Setup. For the FL experimental settings, we used PyTorch version 1.4.0 [30] and PySyft version 0.2.9 [31]. The PySyft framework simplifies the creation of virtual workers. These workers emulate real virtual machines and can run as separate processes within the same Python program. Our federated training procedure used three virtual workers representing clients and a coordinating worker. Because we used Federated Averaging (FedAvg), Stochastic Gradient Descent (SGD) was used to optimize each model. Federated models are trained in 128 batches over four epochs in 30 worker iterations. After the client model training is complete, the averaged weight values are sent to the coordinating worker, which aggregates those weights to update the global model.
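The aggregation step performed by the coordinating worker can be sketched in a few lines. The example below is a plain-Python illustration of federated averaging over three clients and does not use PySyft or PyTorch. The function name, the flat weight lists and the optional weighting by client sample counts are assumptions made for the example.

```python
def federated_average(client_weights, client_sizes=None):
    """Average client parameter vectors, optionally weighted by sample counts."""
    if client_sizes is None:
        client_sizes = [1] * len(client_weights)
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * s for w, s in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# Hypothetical flattened weights reported by three clients after local training.
clients = [
    [0.2, -0.4, 1.0],
    [0.4, 0.0, 0.7],
    [0.6, 0.4, 0.1],
]

global_weights = federated_average(clients)
weighted = federated_average(clients, client_sizes=[1, 1, 2])
print(global_weights, weighted)
```

Only the parameter vectors cross the network in this scheme; the raw traffic records stay on each client.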

5. Results and Discussion

This section discusses the experimental results. It details the evaluation comparison of the optimized MEDNN and baseline FCNN models in centralized and federated settings across datasets.

5.1. MEDNN Model Training (Centralized Manner)

With 11 IoT datasets, we first examined the memory requirements for training the FCNN and MEDNN models in a centralised manner. Table 2 presents the memory profile in MB for each dataset. The optimized MEDNN model requires less memory to train. With the Philips B120N10 data, it reduces the training memory requirement by 97.60% and achieves a classification accuracy 84.10 percentage points higher than that of its baseline counterpart. These results show the advantage of regularization [32, 33] for accuracy on certain datasets. They indicate the lower complexity, faster learning capability and better performance of the optimized model. This reduction in resource usage makes it a better choice for IoT security monitoring.

5.2. Low Precision 16-bit Training of MEDNN

Training with reduced precision has become the de facto technique for increasing the energy efficiency of deep learning hardware [34]. We therefore investigated the memory efficiency of the proposed MEDNN with a low precision implementation. Table 3 presents the training memory usage with FP16 precision. For every dataset, memory consumption over the complete training iterations was reduced. For the Philips data, the reduction is 43.63% and 80.61% with the baseline and optimized training processes, respectively. With the same data, the accuracy increased by 68.18 percentage points using the optimized method. The results suggest that FP16 operations contribute to memory reduction with the optimized training method. They also show that FP16 integration does not reduce MEDNN accuracy in most cases, whereas it can reduce the FCNN classification accuracy on some datasets. As a result, the regularized MEDNN maintains better accuracy with FP16 computations.

5.3. MEDNN Model Training (Decentralized Manner)

The results in Table 4 are for the implemented FL method with the baseline (FCNN) and its optimized model (MEDNN). These results compare the training memory requirements and accuracy for each dataset. In federated training, the MEDNN model requires less memory on all datasets. It saves 99.46% of memory while training on the SimpleHome XCS7-1003-WHT dataset. Across all tested datasets, the classification accuracy is not degraded by the proposed method. This result demonstrates the advantage of the optimized model in building an efficient federated training method, and the usefulness of the proposed method for effective attack detection on resource-constrained devices.

We investigated the effect of the proposed method in federated learning using all the datasets. However, due to space constraints, we only present the result for the SimpleHome XCS7-1003-WHT device data to show the significant memory reduction (see Table 5). In addition to the significant memory reduction, the MEDNN model outperforms the FCNN model with the low precision 16-bit implementation. As shown in the table, FP16 integration reduces the accuracy of the FCNN by 0.05 percentage points while reducing that of the MEDNN by only 0.02 percentage points. In centralized and federated training procedures, both models demonstrate equal accuracy performance. These results suggest the significance of our optimized model compared with its benchmark counterpart. They indicate that the proposed method is efficient and effective for on-device training in a distributed manner.

5.4. Model Performances

Table 6 describes the federated model performance evaluated by test set accuracy, precision, recall and harmonic mean on randomly chosen datasets. As the chosen IoT datasets are often unbalanced, test accuracy alone is not a sufficient metric for measuring performance in security applications. Instead, the F1 score, which corresponds to the harmonic mean of precision and recall, is more appropriate because it accounts for the accuracy on each class. The metrics are computed from the True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) counts. Accuracy, precision, recall and F1 score are defined in Equations 6, 7, 8 and 9. In each scenario, the optimized MEDNN model maintains similar detection performance across all metrics. The performance metrics presented in Table 6 remained identical for models trained in centralized settings on each dataset. In each case, accuracy, precision, recall and F1 score remained similar. The results indicate that the number of virtual worker nodes used in the federated settings had a minor influence on model performance. This behaviour indicates the lightweight advantage and effectiveness of MEDNN in detecting IoT attacks with good F1 score performance.
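For reference, the four metrics can be computed from the confusion counts using their standard definitions, as in the sketch below. The counts are hypothetical and chosen only to show how accuracy can look strong on an unbalanced test set while recall and F1 are noticeably lower. Returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical unbalanced test set: 110 positives among 1000 samples.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
print(metrics)
```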

6. Conclusion

This paper investigated the possibility of reducing memory consumption during DNN training, with the aim of using DNN-based security solutions in resource-constrained environments. Using an FCNN, we proposed a memory-efficient MEDNN for the effective detection of cyber attacks on IoT devices. The effectiveness of MEDNN was tested using eleven IoT benchmark datasets in both centralized and federated learning settings. Experimental results showed that the proposed MEDNN can outperform its benchmark counterpart in memory efficiency and accuracy, especially with federated learning. This could be because many clients are involved in training in a federation, so the cumulative savings are higher than with centralized training on a single node. In addition, the aggregation of models in federated training can lead to faster learning compared with centralized training. These initial experimental results are encouraging and warrant further investigation, particularly with more computational nodes in a virtual and realistic federated environment. Therefore, in future, we plan to deploy the model in a real IoT network and examine its capability to detect IoT attacks in near real-time in a federated learning setting. In addition, we plan to investigate the impact of adversarial attacks on the proposed MEDNN.

Acknowledgments

This work was supported by the Petroleum Technology Development Fund (PTDF), Nigeria.

References

[1] Dong, Shi, Yang, Wen, Zhang, Lee, Technology evolution from self-powered sensors to aiot enabled smart homes, Nano Energy (2020) 105414.

[2] Antonakakis, April, Bailey, Bernhard, Bursztein, Cochran, Durumeric, J. A. Halderman, Invernizzi, Kallitsis, et al., Understanding the mirai botnet, in: 26th USENIX Security Symposium (USENIX Security 17), 2017, pp. 1093–1110.

[3] I. V. Kotenko, I. Saenko, Branitskiy, Applying big data processing and machine learning methods for mobile internet of things security monitoring, J. Internet Serv. Inf. Secur. 8 (2018) 54–63.

[4] Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, D. Bacon, Federated learning: Strategies for improving communication efficiency, arXiv preprint arXiv:1610.05492 (2016).

[5] Mohammadi, Al-Fuqaha, Sorour, Guizani, Deep learning for iot big data and streaming analytics: A survey, IEEE Communications Surveys & Tutorials 20 (2018) 2923–2960.

[6] Kodali, Hansen, Mulholland, Whatmough, Brooks, G.-Y. Wei, Applications of deep neural networks for ultra low power iot, in: 2017 IEEE International Conference on Computer Design (ICCD), IEEE, 2017, pp. 589–592.

[7] Shen, Li, Zhao, Liu, Liang, H. Zhang, Efficient deep structure learning for resource-limited iot devices, in: GLOBECOM 2020 - 2020 IEEE Global Communications Conference, IEEE, 2020, pp. 1–6.

[8] Preuveneers, Rimmer, I. Tsingenopoulos, Spooren, Joosen, Ilie-Zudor, Chained anomaly detection models for federated learning: An intrusion detection case study, Applied Sciences 8 (2018) 2663.

[9] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Jiao, Y.-C. Liang, Yang, Niyato, Miao, Federated learning in mobile edge networks: A comprehensive survey, IEEE Communications Surveys & Tutorials 22 (2020) 2031–2063.

[10] Imteaj, Thakker, Wang, Li, M. H. Amini, A survey on federated learning for resource-constrained iot devices, IEEE Internet of Things Journal (2021).

[11] T. D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, A.-R. Sadeghi, DÏoT: A federated self-learning anomaly detection system for iot, in: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), IEEE, 2019, pp. 756–767.

[12] Liu, Kumar, Xiong, W. Y. B. Lim, J. Kang, D. Niyato, Communication-efficient federated learning for anomaly detection in industrial internet of things, in: GLOBECOM 2020 - 2020 IEEE Global Communications Conference, IEEE, 2020, pp. 1–6.

[13] Jiang, Wang, Valls, B. J. Ko, W.-H. Lee, K. K. Leung, L. Tassiulas, Model pruning enables efficient federated learning on edge devices, arXiv preprint arXiv:1909.12326 (2019).

[14] Bonawitz, Eichner, Grieskamp, Huba, Ingerman, Ivanov, Kiddon, J. Konečný, Mazzocchi, H. B. McMahan, et al., Towards federated learning at scale: System design, arXiv preprint arXiv:1902.01046 (2019).

[15] Chauvin, D. E. Rumelhart, Backpropagation: theory, architectures, and applications, Psychology press, 2013.

[16] O. I. Abiodun, Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed, Arshad, State-of-the-art in artificial neural network applications: A survey, Heliyon 4 (2018) e00938.

[17] Oyama, T. Ben-Nun, T. Hoefler, S. Matsuoka, Accelerating deep learning frameworks with micro-batches, in: 2018 IEEE International Conference on Cluster Computing (CLUSTER), IEEE, 2018, pp. 402–412.

[18] Huang, Y. Cheng, A. Bapna, O. Firat, D. Chen, M. Chen, H. Lee, J. Ngiam, Q. V. Le, Y. Wu, et al., Gpipe: Efficient training of giant neural networks using pipeline parallelism, Advances in neural information processing systems 32 (2019) 103–112.

[19] Han, J. Pool, Tran, W. J. Dally, Learning both weights and connections for efficient neural networks, arXiv preprint arXiv:1506.02626 (2015).

[20] McMahan, Moore, Ramage, Hampson, B. A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Artificial intelligence and statistics, PMLR, 2017, pp. 1273–1282.

[21] Meidan, Bohadana, Mathov, Mirsky, Shabtai, Breitenbacher, Elovici, N-baiot-network-based detection of iot botnet attacks using deep autoencoders, IEEE Pervasive Computing 17 (2018) 12–22.

[22] Mirsky, Doitshman, Elovici, Shabtai, Kitsune: an ensemble of autoencoders for online network intrusion detection, arXiv preprint arXiv:1802.09089 (2018).

[23] Siddharth, IoT-DDoS dataset, 2020. URL: https://www.kaggle.com/siddharthm1698/ddos-botnet-attack-on-iot-devices, accessed: 2021-02-10.

[24] M. A. Teixeira, Salman, Zolanvari, Jain, Meskin, Samaka, Scada system testbed for cybersecurity research using machine learning approach, Future Internet 10 (2018) 76.

[25] Pedregosa, Gervais, Memory profiler (python), Python Software Foundation, https://pypi.org/project/memory-profiler/. Accessed March 25 (2019).

[26] Raybaut, Spyder-documentation, Available online at: pythonhosted.org (2009).

[27] I. Zakariyya, Memory efficient federated algorithm, 2021. URL: https://github.com/izakariyya/Robust_DNN_IoT.

[28] Johansson, Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib, Apress, 2018.

[29] Bosman, Engelbrecht, Helbig, Fitness landscape analysis of weight-elimination neural networks, Neural Processing Letters 48 (2018) 353–373.

[30] Paszke, Gross, Massa, Lerer, Bradbury, G. Chanan, Killeen, Lin, N. Gimelshein, L. Antiga, et al., Pytorch: An imperative style, high-performance deep learning library, Advances in neural information processing systems 32 (2019) 8026–8037.

[31] T. Ryffel, A. Trask, M. Dahl, B. Wagner, J. Mancuso, D. Rueckert, J. Passerat-Palmbach, A generic framework for privacy preserving deep learning, arXiv preprint arXiv:1811.04017 (2018).

[32] D. Krueger, R. Memisevic, Regularizing rnns by stabilizing activations, arXiv preprint arXiv:1511.08400 (2015).

[33] J. Lever, M. Krzywinski, N. Altman, Points of significance: Regularization, Nature methods 13 (2016) 803–805.

[34] X. Sun, N. Wang, C.-Y. Chen, J. Ni, A. Agrawal, X. Cui, S. Venkataramani, K. El Maghraoui, V. V. Srinivasan, K. Gopalakrishnan, Ultra-low precision 4-bit training of deep neural networks, Advances in Neural Information Processing Systems 33 (2020).