On Analyzing the Household Energy Consumption Detection for Citizen Behavioral Analysis Carbon Footprint Awareness by Deep Residual Networks

Arijit Ukil (a), Antonio J. Jara (b) and Leandro Marin (c)
a TCS Research and Innovation, Tata Consultancy Services, Kolkata, India
b University of Applied Sciences Western Switzerland (HES-SO), Switzerland
c University of Murcia, Murcia, Spain

Abstract
With the exponential growth of household activities, particularly due to the lock-down in the COVID-19 pandemic, as well as the usual trend of amplified use of energy-consuming appliances, household energy usage is becoming extremely high. Consequently, the high energy consumption pattern results in a severe increase of air pollution and carbon footprint. Carbon footprint is mainly caused by the greenhouse gases emitted while burning fossil fuels to produce different forms of energy. One approach to restricting the carbon footprint is to analyze citizen behavioral patterns by detecting the household appliances in use. We propose a deep neural network based supervised learning algorithm that is capable of classifying household appliances from energy consumption data. More specifically, we use deep residual networks (ResNet), where learning the residual functions makes the trained model more robust by transforming the representation learning problem into a residual learning problem. Our empirical study on publicly available relevant datasets from the UCR time series archive demonstrates significantly better and consistent performance over baseline algorithms and state-of-the-art methods.

Keywords
Deep learning, time series, sensor, classification, residual networks, energy data, carbon footprint, appliance detection

1. Introduction
Global warming and adverse climatic change are supposed to be irreversible and affect human life to a large extent. Carbon dioxide (CO2) is a greenhouse gas and one of the primary causes of global warming. Restricting CO2 emission is the need of the hour, and individual citizens have to take on the required onus of controlled usage of appliances. Household appliances like refrigerators, washing machines, kitchen appliances and computing devices consume a lot of energy, which is produced from burning fossil fuels. Hence, carbon footprint reduction is an inevitable action that is to be predominantly taken up by various Governments and other associations [1]. Under the current lock-down in the COVID-19 pandemic, household electricity consumption has also increased to a large extent. In order to reduce the carbon footprint of a nation, different associations along with Government agencies attempt to understand the appliance usage pattern of individual households. Such analysis is performed over energy data like individual house smart meter readings. The household appliance usage can be linked to enable dynamic energy consumption charges as well as to inculcate awareness among citizens of their individual carbon footprints. We find that different human-centric applications for remote healthcare are proposed in the literature [12, 13, 16].

Proceedings of the CIKM 2020 Workshops, October 19-20, 2020, Galway, Ireland
Editors: Stefan Conrad, Ilaria Tiddi
EMAIL: arijit.ukil@tcs.com (a); jara@ieee.org (b); leandro@um.es (c)
ORCID: https://orcid.org/0000-0003-1794-6719 (a)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
In this paper, our focus is on a macro-level human benefit like carbon-footprint reduction. Technology advancements have led us to different breakthrough applications and developments. The challenge of detecting household appliances in a non-intrusive way needs to be met by (1) a strong analytics algorithm and (2) smart infrastructural support that collects data from the household smart meter and enables the provision for analysis and feedback. In this paper we assume that the smart infrastructure facility is supported by an Internet of Things (IoT) backbone. In this work, our main focus is to develop the strong analytic solution required for analyzing and detecting household appliances from smart meter time series data. We need to keep in mind that the household energy consumption data accessed from the smart energy meter is sensitive in nature, as it reveals in-house activities.
Unlike computer vision applications, which often enjoy the luxury of millions of training examples, energy consumption data with associated annotations are very tiny in number. In fact, the collection, annotation and distribution of such data is an expensive process. Owing to the scarcity of training examples, we feel that an appropriate regularization technique is required to optimally fit the network to the training examples.
Traditionally, the baseline algorithm for time series supervised learning is the dynamic time warping (DTW) based similarity measure with a k-nearest neighbor (kNN) classifier (DTW-1NN) [3, 4], which is a good benchmark. Symbolic Aggregate Approximation (SAX) is a symbolic representation of time series for dimensionality reduction [5], and SAX-VSM [6], a sliding window-based SAX with cosine similarity based supervised learning technique, has also provided much needed momentum to time series classification solutions. Multi-layer perceptron (MLP) algorithms have likewise been studied by researchers for similar types of classification problems [7]. In this work, we consider DTW-1NN, SAX-VSM and MLP as the relevant baseline algorithms.
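To make the DTW-1NN baseline concrete, the following is a minimal pure-Python sketch (not the code used in the paper, and without the indexing optimizations of [3]); the toy series and labels are invented for illustration.

```python
# Illustrative sketch of the DTW-1NN baseline: classic dynamic-time-warping
# distance plus a 1-nearest-neighbour classifier over raw series.

def dtw_distance(a, b):
    """O(len(a)*len(b)) dynamic-time-warping distance between two series."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def dtw_1nn_predict(train_X, train_y, query):
    """Label of the training series closest to `query` under DTW."""
    best = min(range(len(train_X)),
               key=lambda k: dtw_distance(train_X[k], query))
    return train_y[best]
```

On toy data such as `train_X = [[0, 1, 2, 3], [3, 2, 1, 0]]` with labels `["rising", "falling"]`, a slightly time-warped query like `[0, 0, 1, 2, 3]` still matches the first series, which is the behaviour that makes DTW-1NN a strong benchmark for misaligned appliance signatures.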
It is felt that an appropriate security and privacy infrastructure is required to be implemented [14, 15, 17, 18, 19] and should be made part of the complete analytics eco-system.
We propose a deep residual network based supervised learning method to classify different household appliances. The current trend of supervised learning by deep neural networks has demonstrated the success of deep residual learning, particularly in 2D (image) and 3D (video) analytics, mostly for computer vision applications. It is perceived that deep residual learning elegantly solves the menacing learning degradation problem, especially when the deep network architecture has a good number of layers [2]. With the evidence supporting the deep residual network as a candidate architecture, we propose a deep residual architecture for the household appliance detection problem using energy signals. It is to be noted that the deep residual network is largely used in visual analytics applications with 2D or 3D data; the current problem is supervised learning over 1D time series. In this paper, we further use regularization of the network parameters (weights) such that the deep neural network does not overfit the training datasets.
We present empirical evidence for the proposed deep residual networks, tailored for energy data analysis, through experimentation over the publicly available UCR time series archive [7, 8]. It is observed that our method conveniently performs better than the relevant baseline algorithms.

2. Proposed deep residual network architecture

A deep residual network [2] provides the layer-wise recursive learning (the basic transformation and layer mapping process is shown in Fig. 1) of ℋ_{l+1}(𝑋) = ℋ_l(𝑋) + 𝒒_l(ℋ_l(𝑋)), where 𝒒_l is a non-linear neural network (in our method, a convolution network), ℋ_l(𝑋) is the desired mapping at the l-th layer, and the initial conditions are ℋ_0(𝑋) = 0 and 𝒒_0(ℋ_0(𝑋)) = 𝑋. Here 𝑋 = [𝓍_1, 𝓍_2, 𝓍_3, …, 𝓍_𝒯] is the input univariate time series, where 𝑋 ∈ ℝ^𝒯 and 𝑋 is a time series energy consumption signal.
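The additive recursion ℋ_{l+1}(𝑋) = ℋ_l(𝑋) + 𝒒_l(ℋ_l(𝑋)) can be sketched numerically as follows. This is a toy illustration only: a single arbitrary 1-D convolution kernel followed by ReLU stands in for the trained block 𝒒_l, and the kernel values are invented, not learned weights.

```python
import numpy as np

def q(h, kernel):
    """A minimal stand-in for the non-linear block q_l: 1-D conv + ReLU."""
    z = np.convolve(h, kernel, mode="same")  # same-length convolution
    return np.maximum(z, 0.0)                # ReLU activation

def residual_forward(x, kernels):
    """Apply the recursion with identity shortcuts.

    Initial condition: H_0 = 0 and q_0(H_0) = X, so H_1 = X.
    Each later step computes H_{l+1} = H_l + q_l(H_l).
    """
    h = np.zeros_like(x) + x  # H_1 = H_0 + q_0(H_0) = X
    for k in kernels:
        h = h + q(h, k)       # additive update: shortcut + residual branch
    return h
```

Because the shortcut is additive, a block whose residual branch outputs zero (e.g. an all-zero kernel) leaves the representation unchanged, which is the property that eases optimization of deep stacks.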
The training data 𝒟 consists of 𝑁 time series signals, each of length 𝒯, and each training instance has a corresponding class label 𝐿_n ∈ [1, 𝐶], 𝐶 ∈ ℤ, with 𝕃 = [𝐿_1, 𝐿_2, 𝐿_3, …, 𝐿_𝑁], n = 1, 2, 3, …, 𝑁. Thus the complete training dataset is a collection of pairs (𝑋^n, 𝐿_n), where 𝕏 = [𝑋^1, 𝑋^2, 𝑋^3, …, 𝑋^𝑁], 𝑋^n ∈ ℝ^𝒯, 𝐿_n is the corresponding class label, and 𝒟 = [𝕏, 𝕃]. The learning algorithm constructs a function 𝐹: ℝ^𝒯 → {1, 2, …, 𝐶}; it requires the (training) dataset 𝒟 and generates the trained model 𝑀.
The residual blocks contain a number of batch normalization layers along with convolution layers followed by the Rectified Linear Unit (ReLU) activation function. Finally, a fully connected dense layer is placed. The final discrimination layer is the softmax function that predicts the output label 𝐿̂. The predicted label 𝐿̂ and the actual class label 𝐿 are compared by a loss function (cross-entropy). In this case, we minimize the cost function 𝐽 over the training examples 𝕏 consisting of 𝑁 instances, while the model is trained by using the stochastic gradient descent algorithm.
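The final discrimination step described above, softmax over the 𝐶 class scores followed by cross-entropy against the true label, can be sketched as below; the logits are invented for illustration and do not come from the trained network.

```python
import numpy as np

def softmax(z):
    """Softmax over class scores, shifted by the max for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(probs, true_label):
    """Negative log-likelihood of the true class (labels in 1..C)."""
    return -np.log(probs[true_label - 1])

logits = np.array([2.0, 0.5, -1.0])   # invented scores for C = 3 classes
p = softmax(logits)                    # class probability distribution
predicted = int(np.argmax(p)) + 1      # predicted label L-hat in [1, C]
loss = cross_entropy(p, true_label=1)  # loss minimized during training
```

Minimizing this loss by stochastic gradient descent over all 𝑁 training pairs is what fits the model 𝑀 to 𝒟.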
The learning algorithm is further a function of regularization factors ψ and functions Υ, along with a collection of necessary hyperparameters Θ, for constructing the trained model 𝑀. The trained model is generated as 𝑀: 𝑓(𝒟, ψ, Υ, Θ) → 𝐿̂, where 𝐿̂ ∈ [1, 𝐶] is the predicted inference outcome.
Given the possibility of an insufficient number of training examples, there exists a perpetual possibility of constructing an over-complex model with a very high number of network parameters, in terms of weight parameters ω, which is likely to be overfitted on the training distribution without attempting to approximate the source data generation function or the target function. In our earlier work [10], we proposed the strongly regularized convolution neural network SRDCNN for time series classification tasks, which shows the positive impact of regularized learning. Similarly, in this work, we control the deep network parameters by regularization techniques [11]. The proposed deep neural model minimizes the cost function 𝐽 through a regularized cost function 𝐽̂, denoted as:

𝐽̂(ω; 𝕏, 𝕃) = 𝐽(ω; 𝕏, 𝕃) + αΩ(ω)    (1)

where Ω is the regularization function and α ∈ [0, ∞) is the regularization factor.

Figure 1: Basic transformation and layer mapping in the deep residual network model.

The individual layers in residual networks modify the learnt representation from the previous layers to counter the vanishing gradient problem [9]. We further note that ℋ_{l+1}(𝑋) is an additive outcome, unlike the conventional deep neural network where the transfer function is multiplicative.
The underlying mapping at the l-th layer, ℋ_l(𝑋), casts to ℋ_l(𝑋) + 𝒒_l(ℋ_l(𝑋)), with 𝒒_l(ℋ_l(𝑋)) = ℋ_{l+1}(𝑋) − ℋ_l(𝑋) being the residual function. It is hypothesized [2] that the optimization of the residual mapping is easier than the optimization of the unreferenced raw mapping. Owing to the justification made by the authors and the supporting evidence of the superior performance of ResNet, we consider such a deep residual network to be a prudent deep neural architecture choice. Our deep neural architecture is shown in Fig. 2; it consists of three residual blocks.
In this paper, we use network parameter (ω) norm penalties as expressed above, particularly 𝐿2 and 𝐿1 regularizations [11]. We incorporate 𝐿2 regularization (Tikhonov regularization) as:

𝐽̂(ω; 𝕏, 𝕃) = 𝐽(ω; 𝕏, 𝕃) + (α/2) ω^Τ ω    (2)

where the network parameter gradient is:

∇_ω 𝐽̂(ω; 𝕏, 𝕃) = αω + ∇_ω 𝐽(ω; 𝕏, 𝕃)    (3)

Subsequently, the weights are updated as:

ω ← (1 − εα)ω − ε ∇_ω 𝐽(ω; 𝕏, 𝕃)    (4)

where ε is the learning rate. From equation (4), it is noted that the weight decay term (1 − εα) controls the overall weight vector. Similarly, the Lasso or 𝐿1 regularization is defined as [11]:

𝐽̂(ω; 𝕏, 𝕃) = 𝐽(ω; 𝕏, 𝕃) + α‖ω‖₁    (5)

and the corresponding network parameter gradient becomes:

∇_ω 𝐽̂(ω; 𝕏, 𝕃) = α sign(ω) + ∇_ω 𝐽(ω; 𝕏, 𝕃)    (6)

From equations (3) and (6), we note that 𝐿2 and 𝐿1 regularizations impact the network parameters differently: while Lasso or 𝐿1 regularization attempts to generate a sparser weight matrix, Tikhonov or 𝐿2 regularization clips or controls the network weight (ω) values.
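One regularized gradient step following equations (2) to (6) can be sketched as below. Here `grad_J` stands in for the data-loss gradient ∇_ω 𝐽 and its values are invented; the factors mirror the α1 and α2 settings discussed in this paper.

```python
import numpy as np

alpha2, alpha1, eps = 0.10, 0.01, 0.05   # L2 factor, L1 factor, learning rate
w = np.array([0.8, -0.3, 0.0])           # illustrative weight vector
grad_J = np.array([0.1, -0.2, 0.4])      # invented data-loss gradient

# L2 (Tikhonov), eqs. (3)-(4): gradient alpha2*w + grad_J, which rearranges
# into the weight-decay form with shrink factor (1 - eps*alpha2).
w_l2 = (1.0 - eps * alpha2) * w - eps * grad_J

# L1 (Lasso), eq. (6): gradient alpha1*sign(w) + grad_J, pushing small
# weights toward exact zero (sparser weight matrix).
w_l1 = w - eps * (alpha1 * np.sign(w) + grad_J)
```

The weight-decay form of `w_l2` is algebraically identical to the plain step `w - eps * (alpha2 * w + grad_J)`, which makes explicit why 𝐿2 shrinks all weights uniformly while 𝐿1 subtracts a constant-magnitude term per weight.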
Table 1
Hyperparameter description

Parameter | Brief explanation | Value/Type
Epoch | Number of times the entire training dataset is iterated | 1000
Optimizer | Adaptive learning rate optimization | Adam
Batch size | Number of training samples in one pass | min(⌊𝒯/10⌋, 16), where 𝒯 is the number of sample points in each instance
Number of residual blocks | Total number of residual blocks | 3
Number of convolution layers at each residual block | Residual block #1: 3; Residual block #2: 5; Residual block #3: 3
Kernel size | Residual block #1: {8, 5, 3}; Residual block #2: {8, 7, 6, 5, 4, 3}; Residual block #3: {8, 5, 3}
Number of filters | Residual block #1: {64, 64, 64}; Residual block #2: {128, 128, 128, 128, 128}; Residual block #3: {64, 64, 64}
α1 | 𝐿1 regularization factor | 0.01
α2 | 𝐿2 regularization factor | 0.10

Figure 2: Deep residual network architecture for energy consumption data analytics to detect household appliances. It consists of three consecutive residual blocks along with other required layers.

The hyperparameter set is described in Table 1. One noticeable observation is that the network is thinner at the initial and final residual blocks, with three convolution layers and 64 feature maps at each layer, while the middle residual block is deeper, with five convolution layers and 128 feature maps at each layer. From our understanding of the machine learning problem we attempt to solve, the regularization factor settings play an important role in constructing an effectively learned model. Accordingly, we set the 𝐿1 regularization factor hyperparameter (α1) lower than the 𝐿2 regularization factor hyperparameter (α2), with the intent of having less sparse weight vectors while the weight vector values are clipped or controlled.
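Assuming the batch-size rule in Table 1 reads min(⌊𝒯/10⌋, 16), the table's settings can be collected as plain data and the rule applied as below; the series lengths passed in are illustrative, not taken from the UCR datasets.

```python
# Table 1 hyperparameters as plain data (values as printed in the table).
RESIDUAL_BLOCKS = [
    {"conv_layers": 3, "kernel_sizes": [8, 5, 3], "filters": [64, 64, 64]},
    {"conv_layers": 5, "kernel_sizes": [8, 7, 6, 5, 4, 3],
     "filters": [128, 128, 128, 128, 128]},
    {"conv_layers": 3, "kernel_sizes": [8, 5, 3], "filters": [64, 64, 64]},
]
EPOCHS, OPTIMIZER = 1000, "Adam"
L1_FACTOR, L2_FACTOR = 0.01, 0.10  # alpha1 < alpha2, as discussed above

def batch_size(series_length):
    """Table 1 rule (as interpreted here): min(floor(T / 10), 16)."""
    return min(series_length // 10, 16)
```

For example, a short series of 80 sample points would yield a batch size of 8, while any series of 160 points or more saturates at 16.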
3. Experimental Analysis and Results

We consider the UCR time series archive with representative datasets aligned to the problem statement. The dataset description is given in Table 2. Five different types of energy consumption datasets are used for the experimentation. Each dataset consists of separate training and testing parts: our model is first trained over the training dataset, and the trained model is tested on the given testing dataset. We report the test accuracies. The experimental datasets represent different types of appliances like kitchen appliances, computing devices and others, and relate to the UK Government's initiative called 'Powering the Nation', where behavioral analysis of the citizens' electricity usage is used in an attempt to reduce the carbon footprint. The number of classes also varies among the different datasets. With the diverse types of appliance detection problems that these datasets (Table 2) represent, we can fairly justify that the experimental evaluation covers large problem areas of appliance detection from energy consumption data.

Table 2
Energy data (time series) from the UCR archive: properties

Dataset | Number of classes | Number of training instances | Number of testing instances
Computers | 2 | 250 | 250
Electric devices | 7 | 8926 | 7711
Italy power demand | 2 | 67 | 1029
Large kitchen appliances | 3 | 375 | 375
Small kitchen appliances | 3 | 375 | 375

Table 3
Performance in terms of the test accuracy metric of our proposed method and related state-of-the-art time series classification algorithms

Sensor name | MLP [7] | DTW-R1-1NN [3] | SAX-VSM [6] | SRDCNN [10] | Our method
Computers | 0.496 | 0.70 | 0.620 | 0.781 | 0.788
Electric devices | 0.641 | 0.602 | 0.705 | 0.707 | 0.723
Italy power demand | 0.946 | 0.950 | 0.816 | 0.955 | 0.964
Large kitchen appliances | 0.480 | 0.795 | 0.877 | 0.852 | 0.907
Small kitchen appliances | 0.333 | 0.653 | 0.579 | 0.795 | 0.709
Total count (best) | 0 | 0 | 0 | 1 | 4

In Table 3, we depict the experimental results of our proposed method.
The test accuracy of our method shows a significantly higher performance merit over the baseline algorithms MLP [7], DTW-R1-1NN [3] and SAX-VSM [6]. In fact, across the total of five different use cases, our method outperforms the rest of the baseline algorithms. In relative merit, DTW-R1-1NN and SAX-VSM are the closer competitors. With this supporting empirical evidence, we claim that our proposed deep residual network based model is an apt choice for energy data analysis to detect household appliances. We further consider SRDCNN as another state-of-the-art algorithm, which has demonstrated substantially better efficacy than other state-of-the-art methods [10]. In comparison with SRDCNN, we observe that our method works better in 80% of the datasets. One of the major differences with SRDCNN is the architecture of the deep neural network: SRDCNN is a convolution neural network architecture, whereas ours is a deep residual network. The performance table (Table 3) clearly indicates that the proposed method provides better learning and inferencing capability over energy consumption data to detect household appliances.

4. Conclusion

Carbon footprint reduction is one of the most important problems for creating an awareness drive to understand the carbon footprint of individual households, towards the goal of a manifold reduction of the overall carbon footprint. In that regard, we propose an analytic solution to detect the appliances in households using energy consumption data, which is available from smart energy meter recordings. We propose a robust detection algorithm using a deep residual network along with regularization. Our proposed method has shown considerably better test accuracy than the baseline algorithms for various appliance detection tasks. This proposed method is part of a larger eco-system that attempts to build a convergent human-centric application for the betterment of all of us. We hope that our analytics method provides the required impetus for such human-centered purposes and that global warming concerns can be addressed through citizen-level awareness.

5. Acknowledgements

Leandro Marin is partially supported by Research Project TIN2017-86885-R from the Spanish Ministry of Economy, Industry and Competitivity and Feder (European Union). Antonio J. Jara is funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 732679, ACTIVAGE project, https://www.activageproject.eu/.

6. References

[1] R. Aichele and G. Felbermayr, "Kyoto and the Carbon Footprint of Nations," Journal of Environmental Economics and Management, vol. 63, no. 3, pp. 336-354, 2012.
[2] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," CVPR, 2016.
[3] E. Keogh and C. A. Ratanamahatana, "Exact Indexing of Dynamic Time Warping," Knowledge and Information Systems, vol. 7, no. 3, pp. 358-386, 2005.
[4] R. J. Kate, "Using Dynamic Time Warping Distances as Features for Improved Time Series Classification," Data Mining and Knowledge Discovery, pp. 283-312, 2016.
[5] E. Keogh, J. Lin and A. Fu, "HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence," IEEE ICDM, pp. 226-233, 2005.
[6] P. Senin and S. Malinchik, "SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model," IEEE ICDM, pp. 1175-1180, 2013.
[7] A. Bagnall, J. Lines, W. Vickers and E. Keogh, "The UEA & UCR Time Series Classification Repository," www.timeseriesclassification.com
[8] A. Bagnall, J. Lines, A. Bostrom, J. Large and E. Keogh, "The Great Time Series Classification Bake Off: A Review and Experimental Evaluation of Recent Algorithmic Advances," Data Mining and Knowledge Discovery, 2017.
[9] Y. Shen and J. Gao, "Refine or Represent: Residual Networks with Explicit Channel-wise Configuration," IJCAI, 2018.
[10] A. Ukil, A. Jara and L. Marin, "SRDCNN: Strongly Regularized Deep Convolution Neural Network Architecture for Time-series Sensor Signal Classification Tasks," arXiv:2007.06909, 2020.
[11] I. Goodfellow, Y. Bengio and A. Courville, "Deep Learning," The MIT Press, 2016.
[12] C. Puri, A. Ukil, S. Bandyopadhyay, R. Singh, A. Pal and K. Mandana, "iCarMa: Inexpensive Cardiac Arrhythmia Management: An IoT Healthcare Analytics Solution," IoT-Enabled Healthcare and Wellness Technologies and Systems Workshop, 2016.
[13] A. Ukil, S. Bandyopadhyay, C. Puri and A. Pal, "Heart-trend: An Affordable Heart Condition Monitoring System Exploiting Morphological Pattern," IEEE ICASSP, pp. 6260-6264, 2016.
[14] A. Ukil, "Privacy Preserving Data Aggregation in Wireless Sensor Networks," IEEE International Conference on Wireless and Mobile Communications, pp. 435-440, 2010.
[15] A. Ukil, "Security and Privacy in Wireless Sensor Networks," INTECH Open Access Publisher, pp. 395-418, 2010.
[16] A. Ukil, A. J. Jara and L. Marin, "Data-Driven Automated Cardiac Health Management with Robust Edge Analytics and De-Risking," Sensors, June 2019.
[17] A. Ukil, "Secure Trust Management in Distributed Computing Systems," IEEE International Symposium on Electronic Design, Test and Application, 2011.
[18] T. Bose, S. Bandyopadhyay, A. Ukil, A. Bhattacharyya and A. Pal, "Why Not Keep Your Personal Data Secure Yet Private in IoT?: Our Lightweight Approach," IEEE ISSNIP, 2015.
[19] A. Bhattacharyya, T. Bose, S. Bandyopadhyay, A. Ukil and A. Pal, "LESS: Lightweight Establishment of Secure Session: A Cross-Layer Approach Using CoAP and DTLS-PSK Channel Encryption," IEEE International Conference on Advanced Information Networking and Applications Workshops, pp. 682-687, 2015.