On Analyzing the Household Energy Consumption Detection for Citizen Behavioral Analysis Carbon Footprint Awareness by Deep Residual Networks

Arijit Ukil (a), Antonio J. Jara (b) and Leandro Marin (c)
a TCS Research and Innovation, Tata Consultancy Services, Kolkata, India
b University of Applied Sciences Western Switzerland (HES-SO), Switzerland
c University of Murcia, Murcia, Spain

Abstract
With the exponential growth of household activities, particularly due to the lock-down in the COVID-19 pandemic, as well as the usual trend of amplified use of energy-consuming appliances, household energy usage is becoming extremely high. Consequently, the high energy consumption pattern results in a severe increase of air pollution and carbon footprint. Carbon footprint is mainly caused by the greenhouse gases emitted while burning fossil fuels to produce different forms of energy. One approach to restricting the carbon footprint is to analyze citizen behavioral patterns by detecting the household appliances in use. We propose a deep neural network based supervised learning algorithm that is capable of classifying household appliances from energy consumption data. More specifically, we use deep residual networks (ResNet), where learning the residual functions makes the trained model more robust by transforming the representation learning problem into a residual learning problem. Our empirical study on publicly available relevant datasets from the UCR time series archive demonstrates significantly better and consistent performance over baseline algorithms and state-of-the-art methods.

Keywords
Deep learning, time series, sensor, classification, residual networks, energy data, carbon footprint, appliance detection

1. Introduction
Global warming and adverse climatic change are supposed to be irreversible and affect human life to a large extent. Carbon dioxide (CO2) is a greenhouse gas and one of the primary causes of global warming. Restricting CO2 emission is the need of the hour, and individual citizens have to take on the required onus of controlled usage of appliances. Household appliances like refrigerators, washing machines, kitchen appliances and computing devices consume a lot of energy, which is produced from burning fossil fuels. Hence, carbon footprint reduction is an inevitable action that is to be predominantly taken up by various Governments and other associations [1]. Under the current lock-down in the COVID-19 pandemic, household electricity consumption has also increased to a large extent. In order to reduce the carbon footprint of a nation, different associations along with Government agencies attempt to understand the appliance usage pattern of individual households. Such analysis is performed over energy data like individual house smart meter readings. The household appliance usage can be linked to enable dynamic energy consumption charges as well as to inculcate awareness among citizens of their individual carbon footprints. We find that different human-centric applications for remote healthcare are proposed in the literature [12, 13, 16].

Proceedings of the CIKM 2020 Workshops, October 19-20, 2020, Galway, Ireland
Editors: Stefan Conrad, Ilaria Tiddi
EMAIL: arijit.ukil@tcs.com (a); jara@ieee.org (b); leandro@um.es (c)
ORCID: https://orcid.org/0000-0003-1794-6719 (a)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
In this paper, our focus is on a macro-level human benefit like carbon-footprint reduction. Technology advancements have led us to different breakthrough applications and developments. The challenge of detecting household appliances in a non-intrusive way needs to be met by (1) a strong analytics algorithm and (2) smart infrastructural support that collects data from the household smart meter and enables the provision for analysis and feedback. In this paper we assume that the smart infrastructure facility is supported by an Internet of Things (IoT) backbone. In this work, our main focus is to develop the strong analytic solution required for analyzing and detecting household appliances from smart meter time series data. We need to keep in mind that the household energy consumption data accessed from the smart energy meter is sensitive in nature, as it reveals in-house activities.
Unlike computer vision applications, which often enjoy the luxury of millions of training examples, energy consumption data with associated annotations are very tiny in number. In fact, the collection, annotation and distribution of such data is an expensive process. Owing to the scarcity of training examples, we feel that an appropriate regularization technique is required to optimally fit the network to the training examples.
Traditionally, the baseline algorithm for time series supervised learning is the dynamic time warping (DTW) based similarity measure with a k-nearest neighbor (kNN) classifier (DTW-1NN) [3, 4], which is a good benchmark. Symbolic Aggregate Approximation (SAX) is a symbolic representation of time series for dimensionality reduction [5], and SAX-VSM [6], a sliding window-based SAX with cosine similarity based supervised learning technique, has also provided much needed momentum to time series classification solutions. Multi-layer perceptron (MLP) algorithms have likewise been studied by researchers for similar types of classification problems [7]. In this work, we consider DTW-1NN, SAX-VSM and MLP as the relevant baseline algorithms.
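To make the DTW-1NN baseline concrete, the following is a minimal pure-Python sketch (not the code used in the paper, and without the indexing optimizations of [3]); the toy series and labels are invented for illustration.

```python
# Illustrative sketch of the DTW-1NN baseline: classic dynamic-time-warping
# distance plus a 1-nearest-neighbour classifier over raw series.

def dtw_distance(a, b):
    """O(len(a)*len(b)) dynamic-time-warping distance between two series."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def dtw_1nn_predict(train_X, train_y, query):
    """Label of the training series closest to `query` under DTW."""
    best = min(range(len(train_X)),
               key=lambda k: dtw_distance(train_X[k], query))
    return train_y[best]
```

On toy data such as `train_X = [[0, 1, 2, 3], [3, 2, 1, 0]]` with labels `["rising", "falling"]`, a slightly time-warped query like `[0, 0, 1, 2, 3]` still matches the first series, which is the behaviour that makes DTW-1NN a strong benchmark for misaligned appliance signatures.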
It is felt that an appropriate security and privacy infrastructure is required to be implemented [14, 15, 17, 18, 19] and should be made part of the complete analytics eco-system.
We propose a deep residual network based supervised learning method to classify different household appliances. The current trend of supervised learning by deep neural networks has demonstrated the success of deep residual learning, particularly in 2D (image) and 3D (video) analytics, mostly for computer vision applications. It is perceived that deep residual learning elegantly solves the menacing learning degradation problem, especially when the deep network architecture has a good number of layers [2]. With the evidence supporting the deep residual network as a candidate architecture, we propose a deep residual architecture for the household appliance detection problem using energy signals. It is to be noted that the deep residual network is largely used in visual analytics applications with 2D or 3D data; the current problem is supervised learning over 1D time series. In this paper, we further use regularization of the network parameters (weights) such that the deep neural network does not overfit the training datasets.
We present empirical evidence for the proposed deep residual networks, tailored for energy data analysis, through experimentation over the publicly available UCR time series archive [7, 8]. It is observed that our method conveniently performs better than the relevant baseline algorithms.

2. Proposed deep residual network architecture

A deep residual network [2] provides the layer-wise recursive learning (the basic transformation and layer mapping process is shown in Fig. 1) of ℋ_{l+1}(𝑋) = ℋ_l(𝑋) + 𝒒_l(ℋ_l(𝑋)), where 𝒒_l is a non-linear neural network (in our method, a convolution network), ℋ_l(𝑋) is the desired mapping at the l-th layer, and the initial conditions are ℋ_0(𝑋) = 0 and 𝒒_0(ℋ_0(𝑋)) = 𝑋. Here 𝑋 = [𝓍_1, 𝓍_2, 𝓍_3, …, 𝓍_𝒯] is the input univariate time series, where 𝑋 ∈ ℝ^𝒯 and 𝑋 is a time series energy consumption signal.
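The additive recursion ℋ_{l+1}(𝑋) = ℋ_l(𝑋) + 𝒒_l(ℋ_l(𝑋)) can be sketched numerically as follows. This is a toy illustration only: a single arbitrary 1-D convolution kernel followed by ReLU stands in for the trained block 𝒒_l, and the kernel values are invented, not learned weights.

```python
import numpy as np

def q(h, kernel):
    """A minimal stand-in for the non-linear block q_l: 1-D conv + ReLU."""
    z = np.convolve(h, kernel, mode="same")  # same-length convolution
    return np.maximum(z, 0.0)                # ReLU activation

def residual_forward(x, kernels):
    """Apply the recursion with identity shortcuts.

    Initial condition: H_0 = 0 and q_0(H_0) = X, so H_1 = X.
    Each later step computes H_{l+1} = H_l + q_l(H_l).
    """
    h = np.zeros_like(x) + x  # H_1 = H_0 + q_0(H_0) = X
    for k in kernels:
        h = h + q(h, k)       # additive update: shortcut + residual branch
    return h
```

Because the shortcut is additive, a block whose residual branch outputs zero (e.g. an all-zero kernel) leaves the representation unchanged, which is the property that eases optimization of deep stacks.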
The training data 𝒟 consists of 𝑁 time series signals, each of length 𝒯, and each training instance has a corresponding class label 𝐿_n ∈ [1, 𝐶], 𝐶 ∈ ℤ, with 𝕃 = [𝐿_1, 𝐿_2, 𝐿_3, …, 𝐿_𝑁], n = 1, 2, 3, …, 𝑁. Thus the complete training dataset is a collection of pairs (𝑋^n, 𝐿_n), where 𝕏 = [𝑋^1, 𝑋^2, 𝑋^3, …, 𝑋^𝑁], 𝑋^n ∈ ℝ^𝒯, 𝐿_n is the corresponding class label, and 𝒟 = [𝕏, 𝕃]. The learning algorithm constructs a function 𝐹: ℝ^𝒯 → {1, 2, …, 𝐶}; it requires the (training) dataset 𝒟 and generates the trained model 𝑀.
The residual blocks contain a number of batch normalization layers along with convolution layers followed by the Rectified Linear Unit (ReLU) activation function. Finally, a fully connected dense layer is placed. The final discrimination layer is the softmax function that predicts the output label 𝐿̂. The predicted label 𝐿̂ and the actual class label 𝐿 are compared by a loss function (cross-entropy). In this case, we minimize the cost function 𝐽 over the training examples 𝕏 consisting of 𝑁 instances, while the model is trained by using the stochastic gradient descent algorithm.
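The final discrimination step described above, softmax over the 𝐶 class scores followed by cross-entropy against the true label, can be sketched as below; the logits are invented for illustration and do not come from the trained network.

```python
import numpy as np

def softmax(z):
    """Softmax over class scores, shifted by the max for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(probs, true_label):
    """Negative log-likelihood of the true class (labels in 1..C)."""
    return -np.log(probs[true_label - 1])

logits = np.array([2.0, 0.5, -1.0])   # invented scores for C = 3 classes
p = softmax(logits)                    # class probability distribution
predicted = int(np.argmax(p)) + 1      # predicted label L-hat in [1, C]
loss = cross_entropy(p, true_label=1)  # loss minimized during training
```

Minimizing this loss by stochastic gradient descent over all 𝑁 training pairs is what fits the model 𝑀 to 𝒟.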
The learning algorithm is further a function of regularization factors ψ and functions Υ, along with a collection of necessary hyperparameters Θ, for constructing the trained model 𝑀. The trained model is generated as 𝑀: 𝑓(𝒟, ψ, Υ, Θ) → 𝐿̂, where 𝐿̂ ∈ [1, 𝐶] is the predicted inference outcome.
Given the possibility of an insufficient number of training examples, there exists a perpetual possibility of constructing an over-complex model with a very high number of network parameters, in terms of weight parameters ω, which is likely to be overfitted on the training distribution without attempting to approximate the source data generation function or the target function. In our earlier work [10], we proposed the strongly regularized convolution neural network SRDCNN for time series classification tasks, which shows the positive impact of regularized learning. Similarly, in this work, we control the deep network parameters by regularization techniques [11]. The proposed deep neural model minimizes the cost function 𝐽 through a regularized cost function 𝐽̂, denoted as:

𝐽̂(ω; 𝕏, 𝕃) = 𝐽(ω; 𝕏, 𝕃) + αΩ(ω)    (1)

where Ω is the regularization function and α ∈ [0, ∞) is the regularization factor.

Figure 1: Basic transformation and layer mapping in the deep residual network model.

The individual layers in residual networks modify the learnt representation from the previous layers to counter the vanishing gradient problem [9]. We further note that ℋ_{l+1}(𝑋) is an additive outcome, unlike the conventional deep neural network where the transfer function is multiplicative.
The underlying mapping at the l-th layer, ℋ_l(𝑋), casts to ℋ_l(𝑋) + 𝒒_l(ℋ_l(𝑋)), with 𝒒_l(ℋ_l(𝑋)) = ℋ_{l+1}(𝑋) − ℋ_l(𝑋) being the residual function. It is hypothesized [2] that the optimization of the residual mapping is easier than the optimization of the unreferenced raw mapping. Owing to the justification made by the authors and the supporting evidence of the superior performance of ResNet, we consider such a deep residual network to be a prudent deep neural architecture choice. Our deep neural architecture is shown in Fig. 2; it consists of three residual blocks.
In this paper, we use network parameter (ω) norm penalties as expressed above, particularly 𝐿2 and 𝐿1 regularizations [11]. We incorporate 𝐿2 regularization (Tikhonov regularization) as:

𝐽̂(ω; 𝕏, 𝕃) = 𝐽(ω; 𝕏, 𝕃) + (α/2) ω^Τ ω    (2)

where the network parameter gradient is:

∇_ω 𝐽̂(ω; 𝕏, 𝕃) = αω + ∇_ω 𝐽(ω; 𝕏, 𝕃)    (3)

Subsequently, the weights are updated as:

ω ← (1 − εα)ω − ε ∇_ω 𝐽(ω; 𝕏, 𝕃)    (4)

where ε is the learning rate. From equation (4), it is noted that the weight decay term (1 − εα) controls the overall weight vector. Similarly, the Lasso or 𝐿1 regularization is defined as [11]:

𝐽̂(ω; 𝕏, 𝕃) = 𝐽(ω; 𝕏, 𝕃) + α‖ω‖₁    (5)

and the corresponding network parameter gradient becomes:

∇_ω 𝐽̂(ω; 𝕏, 𝕃) = α sign(ω) + ∇_ω 𝐽(ω; 𝕏, 𝕃)    (6)

From equations (3) and (6), we note that 𝐿2 and 𝐿1 regularizations impact the network parameters differently: while Lasso or 𝐿1 regularization attempts to generate a sparser weight matrix, Tikhonov or 𝐿2 regularization clips or controls the network weight (ω) values.
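One regularized gradient step following equations (2) to (6) can be sketched as below. Here `grad_J` stands in for the data-loss gradient ∇_ω 𝐽 and its values are invented; the factors mirror the α1 and α2 settings discussed in this paper.

```python
import numpy as np

alpha2, alpha1, eps = 0.10, 0.01, 0.05   # L2 factor, L1 factor, learning rate
w = np.array([0.8, -0.3, 0.0])           # illustrative weight vector
grad_J = np.array([0.1, -0.2, 0.4])      # invented data-loss gradient

# L2 (Tikhonov), eqs. (3)-(4): gradient alpha2*w + grad_J, which rearranges
# into the weight-decay form with shrink factor (1 - eps*alpha2).
w_l2 = (1.0 - eps * alpha2) * w - eps * grad_J

# L1 (Lasso), eq. (6): gradient alpha1*sign(w) + grad_J, pushing small
# weights toward exact zero (sparser weight matrix).
w_l1 = w - eps * (alpha1 * np.sign(w) + grad_J)
```

The weight-decay form of `w_l2` is algebraically identical to the plain step `w - eps * (alpha2 * w + grad_J)`, which makes explicit why 𝐿2 shrinks all weights uniformly while 𝐿1 subtracts a constant-magnitude term per weight.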
Table 1
Hyperparameter description

Parameter | Brief explanation | Value/Type
Epoch | Number of times the entire training dataset is iterated | 1000
Optimizer | Adaptive learning rate optimization | Adam
Batch size | Number of training samples in one pass | min(⌊𝒯/10⌋, 16), where 𝒯 is the number of sample points in each instance
Number of residual blocks | Total number of residual blocks | 3
Number of convolution layers at each residual block | Residual block #1: 3; Residual block #2: 5; Residual block #3: 3
Kernel size | Residual block #1: {8, 5, 3}; Residual block #2: {8, 7, 6, 5, 4, 3}; Residual block #3: {8, 5, 3}
Number of filters | Residual block #1: {64, 64, 64}; Residual block #2: {128, 128, 128, 128, 128}; Residual block #3: {64, 64, 64}
α1 | 𝐿1 regularization factor | 0.01
α2 | 𝐿2 regularization factor | 0.10

Figure 2: Deep residual network architecture for energy consumption data analytics to detect household appliances. It consists of three consecutive residual blocks along with other required layers.

The hyperparameter set is described in Table 1. One noticeable observation is that the network is thinner at the initial and final residual blocks, with three convolution layers and 64 feature maps at each layer, while the middle residual block is deeper, with five convolution layers and 128 feature maps at each layer. From our understanding of the machine learning problem we attempt to solve, the regularization factor settings play an important role in constructing an effectively learned model. Accordingly, we set the 𝐿1 regularization factor hyperparameter (α1) lower than the 𝐿2 regularization factor hyperparameter (α2), with the intent of having less sparse weight vectors while the weight vector values are clipped or controlled.
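Assuming the batch-size rule in Table 1 reads min(⌊𝒯/10⌋, 16), the table's settings can be collected as plain data and the rule applied as below; the series lengths passed in are illustrative, not taken from the UCR datasets.

```python
# Table 1 hyperparameters as plain data (values as printed in the table).
RESIDUAL_BLOCKS = [
    {"conv_layers": 3, "kernel_sizes": [8, 5, 3], "filters": [64, 64, 64]},
    {"conv_layers": 5, "kernel_sizes": [8, 7, 6, 5, 4, 3],
     "filters": [128, 128, 128, 128, 128]},
    {"conv_layers": 3, "kernel_sizes": [8, 5, 3], "filters": [64, 64, 64]},
]
EPOCHS, OPTIMIZER = 1000, "Adam"
L1_FACTOR, L2_FACTOR = 0.01, 0.10  # alpha1 < alpha2, as discussed above

def batch_size(series_length):
    """Table 1 rule (as interpreted here): min(floor(T / 10), 16)."""
    return min(series_length // 10, 16)
```

For example, a short series of 80 sample points would yield a batch size of 8, while any series of 160 points or more saturates at 16.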
3. Experimental Analysis and Results

We consider the UCR time series archive with representative datasets aligned to the problem statement. The dataset description is given in Table 2. Five different types of energy consumption datasets are used for the experimentation. Each dataset consists of separate training and testing parts: our model is first trained over the training dataset, and the trained model is tested on the given testing dataset. We report the test accuracies. The experimental datasets represent different types of appliances like kitchen appliances, computing devices and others, and relate to the UK Government's initiative called 'Powering the Nation', where behavioral analysis of the citizens' electricity usage is used in an attempt to reduce the carbon footprint. The number of classes also varies among the different datasets. With the diverse types of appliance detection problems that these datasets (Table 2) represent, we can fairly justify that the experimental evaluation covers large problem areas of appliance detection from energy consumption data.

Table 2
Energy data (time series) from the UCR archive: properties

Dataset | Number of classes | Number of training instances | Number of testing instances
Computers | 2 | 250 | 250
Electric devices | 7 | 8926 | 7711
Italy power demand | 2 | 67 | 1029
Large kitchen appliances | 3 | 375 | 375
Small kitchen appliances | 3 | 375 | 375

Table 3
Performance in terms of the test accuracy metric of our proposed method and related state-of-the-art time series classification algorithms

Sensor name | MLP [7] | DTW-R1-1NN [3] | SAX-VSM [6] | SRDCNN [10] | Our method
Computers | 0.496 | 0.70 | 0.620 | 0.781 | 0.788
Electric devices | 0.641 | 0.602 | 0.705 | 0.707 | 0.723
Italy power demand | 0.946 | 0.950 | 0.816 | 0.955 | 0.964
Large kitchen appliances | 0.480 | 0.795 | 0.877 | 0.852 | 0.907
Small kitchen appliances | 0.333 | 0.653 | 0.579 | 0.795 | 0.709
Total count (best) | 0 | 0 | 0 | 1 | 4

In Table 3, we depict the experimental results of our proposed method.
The test accuracy of our method shows a significantly higher performance merit over the baseline algorithms MLP [7], DTW-R1-1NN [3] and SAX-VSM [6]. In fact, across the total of five different use cases, our method outperforms the rest of the baseline algorithms. In relative merit, DTW-R1-1NN and SAX-VSM are the closer competitors. With this supporting empirical evidence, we claim that our proposed deep residual network based model is an apt choice for energy data analysis to detect household appliances. We further consider SRDCNN as another state-of-the-art algorithm, which has demonstrated substantially better efficacy than other state-of-the-art methods [10]. In comparison with SRDCNN, we observe that our method works better in 80% of the datasets. One of the major differences with SRDCNN is the architecture of the deep neural network: SRDCNN is a convolution neural network architecture, whereas ours is a deep residual network. The performance table (Table 3) clearly indicates that the proposed method provides better learning and inferencing capability over energy consumption data to detect household appliances.

4. Conclusion

Carbon footprint reduction is one of the most important problems for creating an awareness drive to understand the carbon footprint of individual households, towards the goal of a manifold reduction of the overall carbon footprint. In that regard, we propose an analytic solution to detect the appliances in households using energy consumption data, which is available from smart energy meter recordings. We propose a robust detection algorithm using a deep residual network along with regularization. Our proposed method has shown considerably better test accuracy than the baseline algorithms for various appliance detection tasks. This proposed method is part of a larger eco-system that attempts to build a convergent human-centric application for the betterment of all of us. We hope that our analytics method provides the required impetus for such human-centered purposes and that global warming concerns can be addressed through citizen-level awareness.

5. Acknowledgements

Leandro Marin is partially supported by Research Project TIN2017-86885-R from the Spanish Ministry of Economy, Industry and Competitivity and Feder (European Union). Antonio J. Jara is funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 732679, ACTIVAGE project, https://www.activageproject.eu/.

6. References

[1] R. Aichele and G. Felbermayr, "Kyoto and the Carbon Footprint of Nations," Journal of Environmental Economics and Management, vol. 63, no. 3, pp. 336-354, 2012.
[2] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," CVPR, 2016.
[3] E. Keogh and C. A. Ratanamahatana, "Exact Indexing of Dynamic Time Warping," Knowledge and Information Systems, vol. 7, no. 3, pp. 358-386, 2005.
[4] R. J. Kate, "Using Dynamic Time Warping Distances as Features for Improved Time Series Classification," Data Mining and Knowledge Discovery, pp. 283-312, 2016.
[5] E. Keogh, J. Lin and A. Fu, "HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence," IEEE ICDM, pp. 226-233, 2005.
[6] P. Senin and S. Malinchik, "SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model," IEEE ICDM, pp. 1175-1180, 2013.
[7] A. Bagnall, J. Lines, W. Vickers and E. Keogh, "The UEA & UCR Time Series Classification Repository," www.timeseriesclassification.com
[8] A. Bagnall, J. Lines, A. Bostrom, J. Large and E. Keogh, "The Great Time Series Classification Bake Off: A Review and Experimental Evaluation of Recent Algorithmic Advances," Data Mining and Knowledge Discovery, 2017.
[9] Y. Shen and J. Gao, "Refine or Represent: Residual Networks with Explicit Channel-wise Configuration," IJCAI, 2018.
[10] A. Ukil, A. Jara and L. Marin, "SRDCNN: Strongly Regularized Deep Convolution Neural Network Architecture for Time-series Sensor Signal Classification Tasks," arXiv:2007.06909, 2020.
[11] I. Goodfellow, Y. Bengio and A. Courville, "Deep Learning," The MIT Press, 2016.
[12] C. Puri, A. Ukil, S. Bandyopadhyay, R. Singh, A. Pal and K. Mandana, "iCarMa: Inexpensive Cardiac Arrhythmia Management: An IoT Healthcare Analytics Solution," IoT-Enabled Healthcare and Wellness Technologies and Systems Workshop, 2016.
[13] A. Ukil, S. Bandyopadhyay, C. Puri and A. Pal, "Heart-trend: An Affordable Heart Condition Monitoring System Exploiting Morphological Pattern," IEEE ICASSP, pp. 6260-6264, 2016.
[14] A. Ukil, "Privacy Preserving Data Aggregation in Wireless Sensor Networks," IEEE International Conference on Wireless and Mobile Communications, pp. 435-440, 2010.
[15] A. Ukil, "Security and Privacy in Wireless Sensor Networks," INTECH Open Access Publisher, pp. 395-418, 2010.
[16] A. Ukil, A. J. Jara and L. Marin, "Data-Driven Automated Cardiac Health Management with Robust Edge Analytics and De-Risking," Sensors, June 2019.
[17] A. Ukil, "Secure Trust Management in Distributed Computing Systems," IEEE International Symposium on Electronic Design, Test and Application, 2011.
[18] T. Bose, S. Bandyopadhyay, A. Ukil, A. Bhattacharyya and A. Pal, "Why Not Keep Your Personal Data Secure Yet Private in IoT?: Our Lightweight Approach," IEEE ISSNIP, 2015.
[19] A. Bhattacharyya, T. Bose, S. Bandyopadhyay, A. Ukil and A. Pal, "LESS: Lightweight Establishment of Secure Session: A Cross-Layer Approach Using CoAP and DTLS-PSK Channel Encryption," IEEE International Conference on Advanced Information Networking and Applications Workshops, pp. 682-687, 2015.