=Paper=
{{Paper
|id=Vol-3189/paper_11
|storemode=property
|title=Latency-Energy Tradeoffs in Federated Learning on Resource Constrained Edge Computing Systems
|pdfUrl=https://ceur-ws.org/Vol-3189/paper_11.pdf
|volume=Vol-3189
|authors=Simone Pampaloni,Nicola Tonellotto,Carlo Vallati
|dblpUrl=https://dblp.org/rec/conf/wcci/PampaloniTV22
}}
==Latency-Energy Tradeoffs in Federated Learning on Resource Constrained Edge Computing Systems==
Simone Pampaloni, Nicola Tonellotto and Carlo Vallati
Information Engineering Department, University of Pisa, Pisa, Italy
AI6G’22: First International Workshop on Artificial Intelligence in beyond 5G and 6G Wireless Networks, July 21, 2022, Padua, Italy
s.pampaloni@studenti.unipi.it (S. Pampaloni); nicola.tonellotto@unipi.it (N. Tonellotto); carlo.vallati@unipi.it (C. Vallati)
ORCID: 0000-0002-7427-1001 (N. Tonellotto); 0000-0002-7833-5471 (C. Vallati)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Artificial intelligence and machine learning have become of crucial importance in many scientific and industrial fields, thanks to their ability to extract information, make predictions and identify patterns in data. To create increasingly accurate predictive models, these technologies rely on the collection and control of large amounts of data within controlled systems. Federated learning is a new framework that exploits the computational capabilities and local data of a set of multiple resource-constrained devices, coordinated by a central server, to create a shared global predictive model without any centralised data collection. In this work, we focus on assessing the performance of federated learning executed on resource-constrained Edge computing systems. A set of experiments was executed to assess the energy consumption and processing times of a set of heterogeneous GPU-enabled embedded systems. Our analysis shows that, by varying the amount of data that each system is in charge of processing, it is possible to identify a trade-off between the overall energy consumption of the devices and the processing time required to train an effective predictive model.

Keywords: Federated Learning, Latency-Energy Tradeoffs, Performance Evaluation, Embedded Systems

1. Introduction

Machine- and user-generated data is rapidly increasing due to the rapid advancement of the Internet of Things (IoT), social networking applications and ubiquitous wireless connectivity via 5G/6G networks. The analysis of such amounts of data is often carried out via Artificial Intelligence (AI) techniques to create predictive models that support users in planning and decision making, or that improve the efficiency of cyber-physical systems via the detection of anomalies, e.g. for predictive maintenance [1]. This increasing amount of data, and the consequent demand to train AI systems on it, has generated new data management problems, especially in application fields where it is necessary to train predictive models on large quantities of data, but it is not convenient or feasible, in terms of communication efficiency or available bandwidth, to transmit them to the cloud, or it is not possible to collect them for legal, strategic or economic reasons. Recently a novel paradigm, Federated Learning (FL) [2, 3], has received significant attention as a solution to train a deep learning model from decentralized data. FL removes the need to move large amounts of data for the training of models, thus ensuring scalability and data privacy/security.
Initially proposed for data analysis on mobile devices, FL is also expected to be widely adopted in Edge computing, which extends the traditional centralized cloud computing paradigm with additional computing layers installed in proximity of users and cyber-physical systems, for instance exploiting computing devices installed at the edge of the network [4]. Edge computing systems can support the execution of new applications characterized by stringent latency and privacy requirements, such as industrial applications. The joint adoption of FL and Edge computing is also expected to play a central role in the implementation of novel network management strategies that ensure scalability and minimal human intervention in future 5G/6G networks [5]. In this context, FL can be exploited at the edge to develop applications that process data locally, resulting in a reduced latency for data analysis and deep learning model training. Edge systems, however, are characterized by limited computing and storage capabilities, as they are implemented via embedded systems, which offer a rather constrained environment if compared with a resource-rich cloud. In addition, some edge computing devices might be subject to energy constraints, as they might be installed in energy-restricted environments, e.g. edge devices installed in autonomous vehicles [6]. FL solutions deployed on edge systems will therefore require proper configuration, varying the amount of data assigned to each device to take into account heterogeneous computing capabilities and to master the latency-energy consumption tradeoff.

In this paper we analyze the performance of FL when executed on different edge computing systems characterized by heterogeneous resource capabilities. The goal is to analyze the overall performance in order to offer guidelines on how FL applications can be configured to optimize the processing time required to train an effective predictive model on different devices, while considering the energy consumption at the same time. In our experiments, the training of a convolutional neural network for image recognition, implemented in a distributed manner using FL, is considered as an example. A large set of experiments with different heterogeneous GPU-enabled embedded systems is executed to measure the training latency and energy consumption of the different devices. Our analysis highlights that, by varying the amount of data that each system is in charge of processing, it is possible to master a trade-off between the overall energy consumption of the devices and the processing time required to train an effective predictive model.

The rest of the paper is organized as follows: in Section 2 we overview some background concepts and review the related work, in Section 3 we present our experimental methodology, in Section 4 we analyze the results of our experiments, and in Section 5 we present the conclusions of this work.

2. Background & Related Works

The pervasive diffusion of personal mobile and IoT devices produces huge amounts of data. Currently, AI-based predictive analytics solutions built on such data are typically generated through centralised cloud-based services. The state-of-the-art approach for training a high-quality AI model over mobile and IoT smart devices leverages the delivery of the collected data to a service hosted on the Cloud, which is in charge of training the model [7].
However, these cloud-based solutions raise major concerns about the privacy of the users of online services. A recent approach to mitigate this issue is to move the computation where the data is stored, i.e., on personal IoT devices connected to the Internet. In edge computing [8], data processing and storage capabilities are not exclusive characteristics of centralized data centers: an additional layer, called edge, is placed in the middle between the Cloud and the IoT devices. This layer allows storing data and executing applications on resource-constrained edge computing systems directly connected with IoT devices. Moreover, edge computing helps preserve the confidentiality of user data. In this scenario, the recent introduction of artificial-intelligence-as-a-service (AIaaS) [9, 10, 11] drives significant innovations across all industrial sectors, in particular in the artificial intelligence of things (AIoT) [12, 13], where the data required to train AI solutions are kept local to the device without disclosing private or sensitive data.

Federated learning [2, 3] is a leading approach for training AI solutions, i.e., neural networks, in AIoT settings [14]. In FL, the computation involved in the training of a neural network is moved closer to where data are produced and stored. FL naturally applies to the IoT-Edge-Cloud scenario, which requires the preservation of data privacy and ownership. In an FL scenario applied to edge computing, every edge computing system receives a partially trained neural network from a central cloud server, performs additional training using data provided by the respective local IoT devices to refine the neural network without disclosing any private data, and sends back the refined neural network to the Cloud server. The Cloud server collects all the locally-trained neural networks, generates a new global neural network, and broadcasts the global network back to the edge computing systems for a new round of local training (a minimal sketch of this aggregation step is shown below). By doing so, the complex aggregation of Machine Learning (ML) models on a cloud server is decoupled from the storage of training data on edge computing systems, preserving data ownership and privacy.

The adoption of FL in Edge computing environments has already been proposed for many applications, ranging from IoT applications to crowd-sourced data analysis [1]. Among them, the application of FL to the management of the computing and networking infrastructure is envisioned, e.g. to implement user management solutions, drive content caching and manage task offloading from user devices to the infrastructure [15]. Edge computing, however, is characterized by resource-constrained systems, which are usually embedded devices with limited capabilities in terms of computing power, RAM and GPU performance, and, in some specific use-cases where the edge devices are battery powered, they might also be constrained in terms of energy. In this context, models and solutions for data and task management and allocation are crucial to ensure scalability and the proper usage of scarce resources. In [16], for instance, an energy-aware resource management strategy is proposed to manage the training process in an FL system, minimizing the training loss and overall time while also considering energy constraints. This work, however, analyzes the performance of the proposed approach only via simulations.
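To make the round structure described above concrete, the following minimal sketch (not taken from the paper's code) illustrates the FederatedAveraging aggregation step performed by the Cloud server: the locally-trained parameters are combined into a new global model as a weighted average, with weights proportional to the number of training samples held by each edge device. The function name and the list-of-arrays parameter representation are assumptions made for illustration.

```python
# Minimal, illustrative FedAvg aggregation step (sketch, not the authors' implementation).
# Each client's model is represented as a list of NumPy arrays (one per layer/tensor).
import numpy as np

def federated_average(client_weights, client_sizes):
    """Return the new global model as a sample-weighted average of client models.

    client_weights: list of models, each a list of np.ndarray layer tensors
    client_sizes:   number of local training samples used by each client
    """
    total = float(sum(client_sizes))
    num_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(num_layers)
    ]
```

In a synchronous setting such as the one used in this paper, the server waits for all participating devices before applying this step and then broadcasts the averaged parameters for the next round.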
In this paper, in contrast with [16], we carry out an analysis based on real experiments exploiting different heterogeneous GPU-enabled embedded systems, thus re-creating a realistic edge environment for FL execution. Our goal is to derive guidelines to master the trade-off between energy consumption and processing times, so that the overall process of distributing data for training can be characterized.

3. Experimental Setup

In this section we illustrate our experimental setup, in terms of the hardware exploited and the software used. Then we describe the task under evaluation, together with the neural network adopted, the dataset employed and the performance metrics we measured. Finally, we report the research questions that we investigate in our experiments, which are detailed in the following section.

Hardware. For our experiments we exploited a Cloud/Edge computing platform available at the University of Pisa. This platform implements both the cloud and the edge layers and provides a realistic environment to test distributed computing solutions that adopt the Cloud/Edge computing paradigm. In particular, the platform includes the following components: (i) a Cloud platform based on OpenStack, an open-source Infrastructure-as-a-Service platform that supports the creation of virtual machines, hosted at the datacenter of the university; (ii) an Edge/Fog testbed composed of heterogeneous embedded systems with GPU support, installed in the "Cloud Computing, Big Data and Cybersecurity" laboratory. The cloud platform and the testbed are connected through a fiber link. The following embedded systems are considered for our experiments:

• NVIDIA Jetson Nano: an embedded system equipped with a quad-core ARM processor, 4GB of RAM and a 128-core NVIDIA GPU;
• NVIDIA Jetson TX2: an embedded system with a quad-core ARM processor, 8GB of RAM and a 256-core NVIDIA GPU.

Two boards of each type are exploited in our experiments to test different configurations; the Cloud functionalities, instead, are deployed on a virtual machine hosted on the OpenStack platform. The boards' overall energy consumption is measured via a commercial smart plug, namely the MEROSS MSS210.

Software. We use Flower [17] (https://flower.dev) to implement and execute federated learning on our computational resources. Flower automatically manages the distribution of the neural network from the Cloud to the edge resources, as well as the collection of the locally-trained models from the edge resources to the Cloud and their aggregation through the FederatedAveraging algorithm [2]. In our experiments, the number of federated learning rounds is 10. The communications are synchronous, i.e., the Cloud server performs model aggregation only when both devices have communicated their local models, and then dispatches the new global model to the devices.

Task. In our experiments we focus on an image recognition task. This task is representative of different IoT-Edge-Cloud scenarios, such as intrusion detection [18] and occupational safety [19]. We exploit a convolutional neural network specifically designed for this task, called ResNet [20], available in PyTorch (https://pytorch.org). We use the default hyperparameter values for the CIFAR dataset, namely the cross-entropy loss function, a learning rate equal to 0.001, 4 epochs of local training at each edge resource per round, and a batch size of 32 samples.
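As a concrete illustration of this setup, the sketch below shows how one of the edge boards could be wired to the Cloud server as a Flower client running the local training just described. This is not the authors' code: the ResNet-18 variant, the SGD optimizer, the server address placeholder and the data-loading helper are assumptions made for the example, and the exact Flower API calls may differ between library versions (a Flower 1.x-style API is assumed).

```python
# Sketch of an edge-side Flower client running the local training described in Section 3.
# Assumptions (not from the paper): Flower 1.x API, torchvision ResNet-18, SGD optimizer.
from collections import OrderedDict

import flwr as fl
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def load_local_split(batch_size=32):
    # Each board would load only its own share of the CIFAR-10 training set here.
    trainset = datasets.CIFAR10(root="./data", train=True, download=True,
                                transform=transforms.ToTensor())
    return DataLoader(trainset, batch_size=batch_size, shuffle=True)


class CifarClient(fl.client.NumPyClient):
    def __init__(self, trainloader):
        self.model = models.resnet18(num_classes=10).to(DEVICE)
        self.trainloader = trainloader

    def get_parameters(self, config):
        return [v.cpu().numpy() for v in self.model.state_dict().values()]

    def set_parameters(self, parameters):
        keys = self.model.state_dict().keys()
        state = OrderedDict((k, torch.tensor(v)) for k, v in zip(keys, parameters))
        self.model.load_state_dict(state, strict=True)

    def fit(self, parameters, config):
        # One FL round: load the global model, train 4 local epochs, return the update.
        self.set_parameters(parameters)
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(self.model.parameters(), lr=0.001)
        self.model.train()
        for _ in range(4):
            for images, labels in self.trainloader:
                images, labels = images.to(DEVICE), labels.to(DEVICE)
                optimizer.zero_grad()
                loss = criterion(self.model(images), labels)
                loss.backward()
                optimizer.step()
        return self.get_parameters(config), len(self.trainloader.dataset), {}


if __name__ == "__main__":
    # "CLOUD_SERVER_IP" is a placeholder for the OpenStack virtual machine address.
    fl.client.start_numpy_client(server_address="CLOUD_SERVER_IP:8080",
                                 client=CifarClient(load_local_split()))
```

On the Cloud side, a matching sketch would start a synchronous FedAvg server for 10 rounds, e.g. `fl.server.start_server(config=fl.server.ServerConfig(num_rounds=10), strategy=fl.server.strategy.FedAvg(min_fit_clients=2, min_available_clients=2))` in recent Flower versions.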
Dataset. For the training and the evaluation of our task, we use the popular CIFAR-10 dataset [21]. CIFAR-10 contains 60,000 RGB images of 32 × 32 pixels, representing objects in 10 different categories, i.e., 6,000 images per category. In our experiments, we randomly split the whole dataset into a training set, with 50,000 images, and an evaluation set, with the remaining 10,000 images. The evaluation set is used to assess the effectiveness of the final global model trained with FL.

Metrics. For the analysis of the performance of federated learning in the edge computing scenario, we focus on the behaviour of the various system components during our experiments. To assess the effectiveness of the federated learning procedure when varying the distribution of the training set among the different resources, we use accuracy, i.e. the ratio of correct classifications over the total number of classifications, computed on the evaluation set. Our main efficiency metrics are the device training time and the device energy consumption. The device training time measures the mean time required by a given device to locally train the ResNet neural network for the 4 epochs of each FL round. The device energy consumption measures the mean energy consumed by the given device during local training.

Research Questions. In our experiments, we investigate the following research questions:

• RQ1. How does the mean training time per round vary with the computational load on each device and on the Cloud server?
• RQ2. How does the mean energy consumption per round vary with the computational load on each device and on the Cloud server?

In the following section, we illustrate and discuss the experimental results w.r.t. these RQs.

4. Experimental Analysis

Figure 1 reports the mean training time per round (in seconds) w.r.t. the computational load on each device and on the Cloud server. The red bars, resp. the blue bars, report the mean training times per round on the Jetson Nano, resp. on the Jetson TX. The purple line reports the average training time of the whole model on the Cloud server. Since the communication times are negligible, the total training time corresponds to the maximum time required among the different devices. The training dataset configuration that minimises the total training time corresponds to a split of 27.5% of the dataset on the Jetson Nano and the remaining 72.5% on the Jetson TX. In this case, the mean training times per round are almost identical on the two devices.

Figure 1: Mean training time per round, in seconds, on each device and on the Cloud server, by varying the training set splitting across the devices (x-axis: percentage of the training set on the Nano device).

Regarding RQ1, we conclude that, in order to minimise the total mean training time, a dataset should be distributed across the devices balancing their corresponding computational power; a back-of-the-envelope check of this rule is sketched below.
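The following small sketch (not from the paper) makes the balancing rule explicit: if one board trains roughly r times faster per sample than the other, assigning the slower board a fraction 1/(1+r) of the training set equalises the per-round times. The speed ratio r ≈ 2.5 between the Jetson TX2 and the Jetson Nano is the one reported in the discussion of Fig. 1; the helper names and the use of torch.utils.data.random_split are illustrative assumptions.

```python
# Sketch: compute-balanced split of the CIFAR-10 training set between the two boards.
# Assumption: the Jetson TX2 trains roughly r = 2.5x faster per sample than the Jetson Nano.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

def balanced_nano_fraction(speed_ratio_tx_over_nano):
    # If the TX is r times faster, giving the Nano 1/(1+r) of the data equalises round times.
    return 1.0 / (1.0 + speed_ratio_tx_over_nano)

fraction_nano = balanced_nano_fraction(2.5)   # ~0.286, close to the best measured split (27.5%)

trainset = datasets.CIFAR10(root="./data", train=True, download=True,
                            transform=transforms.ToTensor())
n_nano = int(fraction_nano * len(trainset))   # ~14,300 of the 50,000 training images
n_tx = len(trainset) - n_nano
nano_split, tx_split = random_split(trainset, [n_nano, n_tx],
                                    generator=torch.Generator().manual_seed(0))
print(f"Nano: {len(nano_split)} images, TX2: {len(tx_split)} images")
```

Note that this balances latency only; as discussed next for RQ2, the energy-optimal split can differ, since it also depends on the per-sample energy cost of each board.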
Figure 2 reports the mean energy consumption per round (in kilojoules) w.r.t. the computational load on each device and on the Cloud server. The red bars, resp. the blue bars, report the mean energy consumed per round on the Jetson Nano, resp. on the Jetson TX. The purple line reports the average energy consumed during a federated round to train the local models on all devices. The total energy consumed decreases as more training data is assigned to the Jetson TX. This is due to the higher computing performance of the Jetson TX w.r.t. the Jetson Nano. In fact, the Jetson TX is ∼2.5 times faster than the Jetson Nano (see Fig. 1), and therefore it completes the whole training in a shorter time.

Figure 2: Mean energy consumption per round, in kilojoules, on each device and on the Cloud server, by varying the training set splitting across the devices (x-axis: percentage of the training set on the Nano device).

Regarding RQ2, we conclude that the total energy consumption of a federated learning infrastructure depends on both the characteristics of the edge devices and the size of the dataset assigned to each of them.

Overall, we conclude that there is a tradeoff between training latency and energy consumption in federated learning infrastructures, depending on the characteristics of the dataset and its split, as well as on the specific devices. To minimise the overall training time, a system designer should adopt a load balancing strategy that takes into account both the computational power of the edge devices and the dataset available to each of them. The solution identified by this approach, however, might not minimise the overall energy consumption, which depends on the specific devices.

5. Conclusions

In this work a set of experiments on a realistic edge computing environment was carried out to measure the training time and energy consumption of an FL model. Two different board models were considered: one board with basic capabilities in terms of RAM and GPU, and another, more powerful board. Different configurations for the allocation of the training data were assessed in our experiments in order to measure both energy consumption and training time. Our experiments highlighted a tradeoff between training latency and energy consumption, which depends on the characteristics of the dataset and its partition between boards with different capabilities. A load balancing strategy could be adopted; however, the minimization of the training time might not result in the minimal energy consumption, as the latter mainly depends on specific device characteristics. As future work, we plan to further investigate the Pareto front highlighted by our results and to derive a model for both processing time and energy consumption, to be used for designing a load balancing strategy.

Acknowledgments

Work funded by the Italian Ministry of Education and Research in the framework of the CrossLab project (Departments of Excellence), and by the University of Pisa in the framework of the PRA 2020 program (AUTENS project, Sustainable Energy Autarky).

References

[1] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, K. Chan, Adaptive federated learning in resource constrained edge computing systems, IEEE Journal on Selected Areas in Communications 37 (2019) 1205–1221. doi:10.1109/JSAC.2019.2904348.
[2] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Proc. AISTATS, 2017.
[3] P. Kairouz, et al., Advances and open problems in federated learning, ArXiv abs/1912.04977 (2019).
[4] W. Shi, S. Dustdar, The promise of edge computing, Computer 49 (2016) 78–81. doi:10.1109/MC.2016.145.
[5] S. Niknam, H. S. Dhillon, J. H. Reed, Federated learning for wireless communications: Motivation, opportunities, and challenges, IEEE Communications Magazine 58 (2020) 46–51. doi:10.1109/MCOM.001.1900461.
[6] S. Liu, L. Liu, J. Tang, B. Yu, Y. Wang, W. Shi, Edge computing for autonomous driving: Opportunities and challenges, Proceedings of the IEEE 107 (2019) 1697–1716. doi:10.1109/JPROC.2019.2915983.
[7] H. Li, K. Ota, M. Dong, Learning IoT in edge: Deep learning for the Internet of Things with edge computing, IEEE Network 32 (2018) 96–101.
[8] V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, G. Srivastava, A survey on security and privacy of federated learning, Future Generation Computer Systems 115 (2021) 619–640.
[9] S. B. Calo, M. Touna, D. C. Verma, A. Cullen, Edge computing architecture for applying AI to IoT, in: Proc. BIG DATA, IEEE, 2017, pp. 3012–3016.
[10] M. S. Munir, S. F. Abedin, C. S. Hong, Artificial Intelligence-based Service Aggregation for Mobile-Agent in Edge Computing, in: Proc. APNOMS, IEEE, 2019, pp. 1–6.
[11] T. D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, A. Sadeghi, DÏoT: A federated self-learning anomaly detection system for IoT, in: Proc. ICDCS, 2019, pp. 756–767.
[12] F. Samie, L. Bauer, J. Henkel, From cloud down to things: An overview of machine learning in internet of things, IEEE Internet of Things Journal 6 (2019) 4921–4934.
[13] M. Mohammadi, A. Al-Fuqaha, S. Sorour, M. Guizani, Deep learning for IoT big data and streaming analytics: A survey, IEEE Communications Surveys & Tutorials 20 (2018) 2923–2960.
[14] J. Mills, J. Hu, G. Min, Communication-efficient federated learning for wireless edge intelligence in IoT, IEEE Internet of Things Journal 7 (2019) 5986–5994.
[15] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, M. Chen, In-Edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning, IEEE Network 33 (2019) 156–165. doi:10.1109/MNET.2019.1800286.
[16] C. W. Zaw, S. R. Pandey, K. Kim, C. S. Hong, Energy-aware resource management for federated learning in multi-access edge computing systems, IEEE Access 9 (2021) 34938–34950. doi:10.1109/ACCESS.2021.3055523.
[17] D. J. Beutel, T. Topal, A. Mathur, X. Qiu, T. Parcollet, N. D. Lane, Flower: A Friendly Federated Learning Research Framework, ArXiv abs/2007.14390 (2020).
[18] C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin, W.-Y. Lin, Intrusion detection by machine learning: A review, Expert Systems with Applications 36 (2009) 11994–12000. URL: https://www.sciencedirect.com/science/article/pii/S0957417409004801. doi:10.1016/j.eswa.2009.05.029.
[19] G. Gallo, F. Di Rienzo, P. Ducange, V. Ferrari, A. Tognetti, C. Vallati, A smart system for personal protective equipment detection in industrial environments based on deep learning, in: 2021 IEEE International Conference on Smart Computing (SMARTCOMP), 2021, pp. 222–227. doi:10.1109/SMARTCOMP52413.2021.00051.
[20] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778. doi:10.1109/CVPR.2016.90.
[21] A. Krizhevsky, G. Hinton, Learning multiple layers of features from tiny images, Master's thesis, Department of Computer Science, University of Toronto (2009).