

Residential Electricity Demand Prediction using Machine
Learning
Manpreet Kaur, Shalini Panwar, Ayush Joshi, and Kapil Gupta
National Institute of Technology, Kurukshetra, Haryana, India


Abstract

This paper analyses the usage of electric power in the residential sector and predicts the power consumption demand for the next day, aiming to improve prediction accuracy, identify the best model, and reduce the overall cost of power consumption in a building. Consumption of electric power can be broadly divided into two categories, the commercial and the residential sector. The procedure consists of three steps: feature extraction, normalization, and validation. Heavy fluctuations arising in the residential sector may damage electrical appliances, and a variable power pattern may put stress on the power grid, so prediction is necessary to match customer demand with the power produced at the generating unit. Electric consumption must be predicted in advance so that the load on the power grid can be balanced. To meet demand, appliances can be shifted from peak hours to off-peak hours, which also reduces the cost on the customers' side. The performance of the different models is compared using several evaluation indices: coefficient of determination (R2), mean absolute error (MAE), and mean squared error (MSE). Out of Linear Regression, Lasso Regression, Ridge Regression, Elastic Net Regression, Random Forest Regression, Extra Trees, Support Vector Regression, and Decision Tree, Lasso Regression and Support Vector Regression perform best, with accuracies of 99.99% and 99.89% and mean squared errors of 0.01% and 0.11% respectively.

Keywords
Prediction, Demand, Consumption, Residential, Generation, Power, Dynamic, Regression, Learning, Temperature, Building


1. Introduction

Electric power plays a vital role in today's era. Generating power so that it fulfils customer demand is not an easy task, since the natural sources of power generation are being exhausted day by day. The total generated power is distributed among the various sectors according to their requirements; sectors can be residential or commercial (offices, factories), and here we focus on the residential sector. Not all electrical appliances consume the same amount of power, and appliances may undergo various issues such as fluctuation: receiving more or less power than required can degrade the performance of the appliances. The power consumed by appliances depends on various input parameters, such as the type of day [12]. In the summer season air-conditioners consume more power, in the winter season heating appliances consume more power, whereas in the rainy season lighting consumes more power. The total cost of power in the residential sector also depends on the type of hour, i.e. peak hour, off-peak hour, or mid-peak hour [11], in addition to the amount of power consumed.

ISIC'21: International Semantic Intelligence Conference, February 25-27, 2021, New Delhi, India
EMAIL: mannri346@gmail.com (M. Kaur), panwarshalini40@gmail.com (S. Panwar), ayushjoshi75.aj@gmail.com (A. Joshi)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)


During peak hours the load on the power grid and the cost per unit are highest, whereas during off-peak hours they are lowest. To handle this, appliances can be shifted from peak hours to off-peak hours; in other words, appliances can be scheduled so as to manage the overall load on the power grid and the customer's total cost. To schedule the appliances, the power generating company and the customers make a deal to compensate through either a price-based [1][2][3][4] or an incentive-based [4][5][6] demand response scheme. To match the availability of electric power with its demand we have to make predictions; a wrong prediction, or no prediction at all, may lead to a violation of the service-level agreement. Consumption of electric power depends directly on climatic conditions, yet even with a prediction we cannot guarantee the weather of the next moment, as weather changes drastically. If customers know their expected power consumption, they can make long-term plans, and dynamic pricing may help them know the price of power in the next hour. The biggest challenge in building-level prediction lies in parameters such as indoor and outdoor temperature under a changing climate: it is very difficult to predict the effect that increasing or decreasing one parameter during the construction of a building will have on the others. As discussed, it is not possible to include every key feature, i.e. the whole range of building features and general principles, when predicting the demand for power in the residential sector. One study proposed a single model, dynamic high-resolution demand-side management, which combines the building features with general principles [15]; another analysed that most of the energy consumption is due to lighting and building-integrated photovoltaics [16]. Among machine learning algorithms such as Linear Regression, Elastic Net, Decision Tree, Random Forest, and SVM, Linear Regression is the benchmark model for prediction, and the performance of the different models can be compared using evaluation indices: coefficient of determination (R2) [7], mean absolute error (MAE) [7], and mean squared error (MSE).

This paper is organized as follows. Section 2 reviews related work, briefly explaining the 8 regression algorithms used for prediction. Section 3 presents the 4 evaluation indices used to evaluate the performance of the prediction models. Section 4 describes the system architecture. Section 5 covers the experimental settings, explaining the dataset and how the parameters of the models are tuned. Section 6 reports the experimental observations and results, and Section 7 concludes the paper.

2. Related Work

2.1 Linear Regression

Linear Regression is used to find the relationship between a predictor (independent) variable and a target (dependent) variable. If one variable can be expressed exactly in terms of another, the relationship is known as deterministic. The basic idea of the linear regression model is to derive the best-fit line, also known as the regression line. The sum of the distances between the points on the graph and the regression line gives the total prediction error over all data points: the smaller the error, the better the result, and vice versa [8].

The linear regression equation is:

    Y(pred) = b0 + b1 * x        (1)

b0 is the intercept whereas b1 is the slope of the regression line; their values should be chosen so that the error is minimal. The error between the predicted value and the actual value can be calculated as:

    Error = Σ (predicted_value − actual_value)        (2)

Exploring the value of b1: if b1 < 0 there is a negative relationship, meaning the target value decreases as the predictor value increases; if b1 > 0 there is a positive relationship, meaning the target value increases as the predictor value increases.

Exploring the value of b0: if the predictor can never meaningfully be 0, then the equation at x = 0 is meaningless and of no use on its own.
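As a concrete illustration, fitting equation (1) takes only a few lines with scikit-learn, the library used in Section 5; the small x and y arrays below are made-up numbers purely for illustration.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Made-up illustrative data: one predictor x and one target y.
    x = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([2.1, 3.9, 6.2, 7.8])

    model = LinearRegression().fit(x, y)
    print("b0 (intercept):", model.intercept_)
    print("b1 (slope):", model.coef_[0])
    print("Y(pred) at x = 5:", model.predict([[5.0]])[0])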
Figure 1 shows the graph of the predicted power values in a residential sector.

Figure 1: Linear Regression

2.2 Lasso Regression

Lasso stands for Least Absolute Shrinkage and Selection Operator. As the full name makes clear, it uses shrinkage and is a type of Linear Regression. Shrinkage here means that the values of the dataset are shrunk towards a central point, similar to the mean. The performance of this model is good when the dataset contains multicollinearity.

Lasso Regression applies L1 regularization, a penalty equal to the sum of the absolute values of the coefficient magnitudes. A few of the coefficients can become exactly zero, and those values can be eliminated from the dataset. Larger penalties push the coefficient values towards zero, whereas smaller penalties leave the coefficient values far from zero. The aim of the algorithm is to minimize the error:

    Σ_{i=1}^{n} (y_i − Σ_j x_ij β_j)² + λ Σ_{j=1}^{p} |β_j|        (3)

λ is the amount of shrinkage and is the parameter to tune. If λ = 0 there is no regularization, no parameters are eliminated, and we obtain the Ordinary Least Squares solution; when λ → ∞ the coefficients tend to 0 and the model degenerates to a constant function. When λ increases, bias increases, and when λ decreases, variance increases: bias and variance are inversely proportional to each other. L1 regularization results in a sparse model.

It is a challenging task to select the one predictor variable that particularly suits the property of Lasso Regression. The selection can be done haphazardly, but that can result in very bad decisions and a very time-consuming process.

2.3 Ridge Regression

If an overfitting or underfitting type of problem arises in linear regression, Ridge Regression offers a way out. Ridge Regression is a method for creating a parsimonious model when the number of predictor values exceeds the number of observations, or when there is correlation between the predictor values, i.e. when the dataset has multicollinearity.

Tikhonov's method covers a larger set of problems than the parsimonious-model setting, but it is similar to ridge regression. Even if a dataset contains statistical noise, this model can still produce a solution.

Ridge regression applies L2 regularization, also known as the L2 penalty: the coefficients of the data values are shrunk by the same factor and none of them is eliminated. Unlike L1 regularization, L2 will not result in a sparse model.

    Σ_{i=1}^{n} (y_i − y_i')² = Σ_{i=1}^{n} (y_i − Σ_{j=0}^{p} w_j x_ij)² + λ Σ_{j=0}^{p} w_j²        (4)

To strengthen the penalty term we tune the parameter λ. When λ is 0, least squares and ridge regression are equal; when λ is ∞, all coefficients become zero; the overall penalty thus ranges from 0 to ∞. Ordinary least squares uses the following equation:

    B' = (X'X)⁻¹ X'Y        (5)

Here X is a scaled and centered matrix. When the columns of X have high multicollinearity, the cross-product matrix (X'X) is singular or nearly singular. Including the ridge parameter k in the above equation gives:

    B' = (X'X + kI)⁻¹ X'Y        (6)
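As a minimal sketch of the two penalties in scikit-learn, where the alpha argument plays the role of λ in equations (3) and (4), and the data is made up for illustration:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)                              # made-up predictors
    y = 3 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.rand(100)

    # L1 penalty: some coefficients are driven exactly to zero (sparse model).
    lasso = Lasso(alpha=0.05).fit(X, y)
    # L2 penalty: coefficients shrink toward zero but none is eliminated.
    ridge = Ridge(alpha=1.0).fit(X, y)

    print("Lasso coefficients:", lasso.coef_)   # exact zeros for weak predictors
    print("Ridge coefficients:", ridge.coef_)   # all non-zero, shrunk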
2.4 Elastic Net Regression

This technique uses the properties of both the L1 penalty (Lasso Regression) and the L2 penalty (Ridge Regression); to improve regularization, we combine lasso and ridge regression. It is a 2-step process: in the first step the coefficients of ridge regression are found by selecting the group of features, and in the second step a lasso-style shrinking of the coefficients is performed through feature selection. The objective of this model is to minimize the following function:

    L_enet(β) = Σ_{i=1}^{n} (y_i − x_i β)² / (2n) + λ ( (1 − α)/2 · Σ_{j=1}^{m} β_j² + α · Σ_{j=1}^{m} |β_j| )        (7)

Here α is the mixing parameter: α = 1 reduces the function to lasso regression, whereas α = 0 reduces it to ridge regression, and the parameter λ is highly dependent on α. Elastic Net has better predictive potential than lasso regression, but one of its biggest disadvantages is that it may or may not remove all the irrelevant coefficients.

Figure 2 shows the relationship between Lasso Regression, Ridge Regression and Elastic Net Regression.

Figure 2: Lasso Regression, Ridge Regression, Elastic Net Regression
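A minimal scikit-learn sketch of equation (7); scikit-learn's l1_ratio argument corresponds to the mixing parameter α, and the data here is made up for illustration.

    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)                    # made-up predictors
    y = 3 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.rand(100)

    # l1_ratio=1 reduces to the Lasso penalty, l1_ratio=0 to the Ridge penalty,
    # and intermediate values mix the two, as in equation (7).
    enet = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X, y)
    print("Elastic Net coefficients:", enet.coef_)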
2.5 Decision Tree

The decision tree is a supervised machine learning algorithm. As the name suggests, it is a decision-making tool, and it uses a flowchart-like tree structure [8]. It supports both continuous and discrete output values: a continuous-output example is predicting the required power of a building, where the ultimate goal is to reduce the overall cost of power, whereas a discrete-output example is predicting whether or not it will rain on a particular day.

Decision nodes correspond to the conditions of a flowchart, whereas the terminals correspond to its results. The root node, being the topmost decision node, is called the best predictor node. Every machine learning model has its advantages and disadvantages, but an advantage of the decision tree is that it is very good at handling tabular data, both numerical data and categorical features with fewer than hundreds of categories.

A decision tree can capture the non-linear interaction between the predictors and the target value. Suppose the target variable is the air-conditioner and the predictor variables are room occupancy (empty or not) and outdoor air temperature (<= 26 °C); see Figure 3.

Figure 3: Decision Tree

2.6 Random Forest

Random forest is a supervised machine learning algorithm that can perform both classification and regression. A random forest contains multiple decision trees, and its output depends not on one decision tree but on every single tree. Every tree is independent: no tree interacts with any other while the model is built, and all the trees run in parallel. Every tree makes its own prediction, the predictions are aggregated, and their arithmetic mean produces a single final result. It can be formulated as:

    g(x) = f_0(x) + f_1(x) + f_2(x) + ... + f_n(x)        (8)

Here g(x) is the single final result, whereas each f_i(x) is a decision tree.

Each decision tree is built on a random sample split off from the original dataset, and this added randomness prevents overfitting. Random forest is one of the most accurate models and can handle thousands of predictors without deleting any variable. As Figure 4 shows, a Random Forest is multiple Decision Trees over multiple features.

Figure 4: Random Forest

2.7 Extra Trees

Extra Trees is also known as Extremely Randomized Trees. Unlike Random Forest and Decision Tree, Extra Trees picks its next split from uniformly random splits over subsets of the features, and samples cannot be substituted with another sample. Extra Trees creates a greater number of unpruned decision trees and, unlike Random Forest, makes random splits; in addition to optimizing the algorithm, this adds randomization. The model is faster than the others: it takes less time to compute because it does not have to select the optimal split, only a random one.

2.8 Support Vector Regression

This algorithm is one of the most popular for regression problems. Basically [8], it draws a boundary line or straight line so that the n-dimensional space can be segregated into classes. The boundary line, known as a hyperplane, is drawn in such a way that it covers the maximum number of data points. There are two types of SVR:

Linear SVR: the data is linearly separable, and a single straight line suffices to differentiate two classes.

Non-Linear SVR: the data is not linearly separable, and it is not possible to segregate the data into classes with just one single line.

Linear and non-linear data are handled by the SVR kernel. The kernel helps to find and draw the hyperplane without increasing the cost in n-dimensional space; when it is not possible to find the hyperplane in n dimensions, we move to an (n+1)-dimensional space. The value of the kernel can be poly, RBF, sigmoid, or gaussian for non-linear datasets, whereas for a linear dataset only the linear kernel should be used.

Cross-validation is another technique that can be used with Support Vector Regression to train and then evaluate the model. A single training run may fail to generalize the pattern of the dataset even though it can detect over- or under-fitting, whereas cross-validation is used to find the most accurate values, though it may fail to enhance the accuracy.

Figure 5: Support Vector Regression

Figure 5 shows the result of Support Vector Regression.
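The four non-linear models of Sections 2.5 to 2.8 share the same fit/predict interface in scikit-learn. The sketch below uses made-up data, and the hyperparameter values are illustrative rather than the tuned values of Section 5.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    X = rng.rand(200, 4)                               # made-up predictors
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.rand(200)

    models = {
        # A single flowchart-like tree of decision nodes and leaves.
        "Decision tree": DecisionTreeRegressor(random_state=42),
        # Averages many independent trees, as in equation (8).
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
        # Like a random forest, but with random rather than optimal split points.
        "Extra Trees": ExtraTreesRegressor(n_estimators=100, random_state=42),
        # RBF kernel for non-linear data; kernel="linear" for linearly separable data.
        "SVR": SVR(kernel="rbf"),
    }
    for name, model in models.items():
        print(name, model.fit(X, y).score(X, y))       # R^2 on the training data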
3. Evaluation Indices

Evaluation indices are what various authors use to evaluate a model. Performance can be checked by finding the accuracy and the error of the models: the smaller the error, the higher the accuracy and the better the model.

R2 [7][13] (coefficient of determination) and RMSE [13][14] (root mean squared error) are two methods for model optimization, whereas MAE [7][14] (mean absolute error) and MSE (mean squared error) are two methods for evaluating the model, i.e. measuring the error so that it can be reduced. The value of R2 varies between 0 and 1 and describes the accuracy of the model:

    R2 = 1 − res/tot        (9)

where res is the sum of the squared residual errors and tot is the total sum of squares:

    res = Σ_{i=1}^{n} (y_i − y_i')²        (9 i)

    tot = Σ_{i=1}^{n} (y_i − y_mean)²        (9 ii)

An R2 of 1 means fully accurate results, an R2 of 0 means the model does no better than predicting the mean, and values in between indicate correspondingly ambiguous results.

RMSE is a metric that depends on the scale of the data:

    RMSE = √( Σ_{i=1}^{n} (y_i − y_i')² / n )        (10)

MAE also depends on the scale. It takes the absolute value at every data point, whether the error there is negative or positive, so no error cancels out the effect of another:

    MAE = ( Σ_{i=1}^{n} |y_i − y_i'| ) / n        (11)

MSE is similar to MAE, but instead of taking absolute values it squares the values and then finds the mean error:

    MSE = ( Σ_{i=1}^{n} (y_i − y_i')² ) / n        (12)

In the above equations, y_i is the predicted value, y_i' is the actual value, and n is the number of data points on the graph.
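All four indices are available in scikit-learn's metrics module; the y arrays below are made up simply to show the calls.

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_pred = np.array([2.8, 2.7, 4.0, 5.3])   # predicted values (made up)
    y_true = np.array([3.0, 2.5, 4.1, 5.0])   # actual values (made up)

    mse = mean_squared_error(y_true, y_pred)              # equation (12)
    print("R2  :", r2_score(y_true, y_pred))              # equation (9)
    print("RMSE:", np.sqrt(mse))                          # equation (10)
    print("MAE :", mean_absolute_error(y_true, y_pred))   # equation (11)
    print("MSE :", mse)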
4. System Architecture

Figure 6: System Architecture

As the system architecture in Figure 6 shows, the input parameters for power prediction include the type of day, i.e. summer, winter, or rainy, and the material used for construction (an insulating material would be best). The dataset can be on a daily, hourly, or yearly basis, and other details of the building can also be included, such as height, width, illumination, and occupancy. The dataset itself can be of 3 types: real data, simulated data, or sensor-based data [8].

After being analysed, the dataset undergoes the feature extraction phase, in which it is filtered: unusual data and noise are discarded and only useful data is left behind. It then undergoes a transformation process, in which the dataset is transformed according to the requirements of the algorithm; after that, the size of the dataset is decreased to increase performance, a process known as dataset reduction.

After feature extraction, transformation, and reduction, the entire dataset is divided into training and testing sets, giving the model a training and a testing phase. To train a model we first select an appropriate algorithm for the prediction. Training can then follow one of two approaches: the first-principles approach, in which the prediction of power is based on the current situation rather than on observed history, or the data-driven approach, which works from detailed information about the building. The results are then validated and the accuracy measured; if the accuracy is good, the algorithm is ready to predict the power demand of a building from an unknown dataset. Finally, the results are compared on the basis of the evaluation metrics, and one model is declared the best model, with the greatest accuracy and minimum error.
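As a minimal sketch of this pipeline in Python with scikit-learn (the toolchain of Section 5), the steps below stand in for the feature extraction, transformation, and splitting stages; the file path and exact column names are assumptions based on the dataset description in Section 5.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    # Assumed file and column names, following the dataset description in Section 5.
    df = pd.read_csv("pwrpred.csv")
    df = df.dropna()  # feature extraction: discard unusual rows and noise

    X = df.drop(columns=["DateTime", "Global_active_power"])  # input parameters
    y = df["Global_active_power"]                             # output parameter

    # 90% / 10% train/test division, then normalization to a common range.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=42)
    scaler = MinMaxScaler().fit(X_train)   # transformation: fit on training data only
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)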
5. Experimental Settings

The analysis is carried out using various Python libraries such as pandas and scipy. For the basic implementation of data mining and machine learning we use the library "sklearn" (scikit-learn), a Python module that integrates the standard machine learning algorithms with other Python libraries such as NumPy, SciPy, and Matplotlib. We use it because studies show that it gives efficient and simple solutions. Training and testing can then be carried out with the different models, which are compared with each other in order to get the best outcomes.

From our review of different authors and our own study, ensemble models of machine learning give better performance than others. Gradient boosting, for instance, may give accurate results: it trains various models in a gradual, additive, and sequential manner, and since the algorithm is prone to overfitting it relies on hyperparameter tuning; how far this helps depends on the accuracy demanded and on the dataset available. Random forests also prove good for efficient results; they can use a process known as "early stopping", in which training stops once the performance on the testing data stops improving. This optimized technique avoids overfitting, is beneficial for categorization as well as regression problems, and can be modelled for categorical values, so it too is a candidate model for our project, to be applied, analysed, and compared on further datasets for better accuracy and performance.

Experiments are needed to declare one model the best, and experiments need a dataset. Datasets in the literature generally vary from 2 weeks [9] to 4 years [10]. Our dataset is taken from Kaggle and uploaded on GitHub: https://raw.githubusercontent.com/navkapil/googlecolab/master/pwrpred.csv

The dataset consists of 1048576 rows and 9 columns. It contains per-minute data covering approximately 2 years, from 16-12-2006 17:24:00 to 13-12-2008 21:38:00. The 9 columns are DateTime, global active power, global reactive power, voltage, global intensity, sub-metering 1, sub-metering 2, sub-metering 3, and sub-metering 4. Out of all these parameters, global active power is taken as the output and all the others as input parameters. The dataset is divided into training and testing sets with percentages of 90% and 10% respectively; after this division it undergoes normalization so that all the parameters lie in the same range.

The models for the experiments are Linear Regression, Decision Tree, Random Forest, Extra Trees, Lasso Regression, Ridge Regression, Elastic Net Regression, and Support Vector Regression. The result also depends on whether the dataset is linear or non-linear. All parameters of the models start from their default values. The performance of SVR suffers if the data is linear while the kernel is set to a non-linear one (RBF, poly, gaussian), and vice versa; the kernel we have taken is RBF. We varied C from 0.01 to 100 and gamma from 0.01 to 10 in multiples of 10 and got good results at C = 100 and gamma = 0.1, with the degree left at its default of 3. Linear Regression is the baseline model for this project because it gives very good results with the default values of all its parameters. Decision Tree, Random Forest, and Extra Trees are somewhat similar models. In the Decision Tree the default value of random_state is None, but we set it to 42, as this parameter governs the randomness of the estimator. In the Random Forest, n_estimators gives the number of trees to be formed; we checked the performance of the model by varying it from 10 to 100 and got the best result at 50, whereas in Extra Trees we set n_estimators to 150 instead of its default of 100. Setting alpha to 0 in Lasso Regression makes it work as a Linear Regression, but this is not advised; for better performance we set it to 0.01. Likewise, in Ridge Regression the untuned alpha does not perform well, so it is better tuned, and we use 0.1. Basically, alpha regularization improves the conditioning of the problem and hence lowers the variance of the estimates. For Elastic Net Regression we tuned two parameters, alpha and l1_ratio. With alpha = 0 the problem is solved by plain linear regression; l1_ratio = 0 gives the l2 penalty, i.e. Ridge Regression, l1_ratio = 1 gives the l1 penalty, i.e. Lasso Regression, and 0 < l1_ratio < 1 combines the l1 and l2 penalties. All the remaining parameters of the models keep their default values.
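Collected as code, the settings above translate roughly into the following scikit-learn estimators; parameters not mentioned in the text are left at their defaults, and the Elastic Net is shown with defaults since its tuned alpha and l1_ratio values are not stated.

    from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
    from sklearn.svm import SVR

    models = {
        "Linear Regression": LinearRegression(),   # baseline, all defaults
        "Elastic Net": ElasticNet(),                # alpha, l1_ratio tuned (values not stated)
        "Random Forest": RandomForestRegressor(n_estimators=50),
        "Extra trees": ExtraTreesRegressor(n_estimators=150),
        "Support Vector": SVR(kernel="rbf", C=100, gamma=0.1, degree=3),
        "Decision tree": DecisionTreeRegressor(random_state=42),
        "Ridge": Ridge(alpha=0.1),
        "Lasso": Lasso(alpha=0.01),
    }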

6. Experiment Result

Table 1 shows the results of the different models using the 4 evaluation indices, RMSE, R2_score, MSE, and MAE, for the prediction of next-day power consumption.

From Table 1 it is clear that Support Vector Regression and Lasso Regression give better accuracy with the minimum error, while Linear Regression is the worst performer of these 8 models; that is why Linear Regression is taken as the baseline model and Support Vector Regression as the benchmark system.

Figures 7 and 8 show that Elastic Net, Random Forest, Extra Trees, Support Vector Regression, and Lasso all have nearly the same results, but Support Vector Regression and Lasso give the best results.

Table 1: Summary of results of models

    Models               RMSE      R2_score   MSE       MAE
    Linear Regression    0.4088    0.9999     0.1671    0.235
    Elastic Net          0.11204   0.9862     0.01255   0.0916
    Random Forest        0.1451    0.9998     0.02      0.0345
    Extra trees          0.01088   0.99988    0.012     0.0248
    Support Vector       0.03404   0.9989     0.0011    0.02706
    Decision tree        0.01952   0.9999     0.3026    0.0413
    Ridge                0.2112    0.9999     0.1238    0.7048
    Lasso                0.04626   0.99990    0.0001    0.01119



Figure 7: RMSE and R2 for the various regressors



Figure 8: MSE and MAE (test error in the prediction with respect to the various regressors)
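For reference, a sketch of how entries like those in Table 1 can be produced, assuming the models dictionary from the Section 5 sketch and the X_train/X_test/y_train/y_test arrays from the Section 4 sketch:

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    for name, model in models.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        print(f"{name}: RMSE={np.sqrt(mse):.5f}  R2={r2_score(y_test, y_pred):.5f}  "
              f"MSE={mse:.5f}  MAE={mean_absolute_error(y_test, y_pred):.5f}")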

7. Conclusion

This paper focuses on the implementation of various machine learning algorithms for predicting the power demand of buildings. A model will not necessarily show good results on every particular dataset; it may sometimes give uncertain results, since every model has its pros and cons. The effect that varying one or more building parameters has on the other parameters is very difficult to predict, as we cannot set all the parameters according to our requirements. Prediction makes it much easier to make long-term plans, and weather data plays a vital role in predicting the power demand of a building. According to our analysis of the results, Support Vector Regression and Lasso Regression are the best models. To increase their accuracy further, we could use Long Short-Term Memory (LSTM), a very robust deep learning algorithm for time-based forecasting with the potential to give accurate predictions, or a hybrid approach.

8. References

[1] Setlhaolo, D., Xia, X., & Zhang, J. (2014). Optimal scheduling of household appliances for demand response. Electric Power Systems Research, 116, 24-28.
[2] Gayatri, P., Sukumar, G. D., & Jithendranah, J. (2015, December). Effect of load change on source parameters in power system. In 2015 Conference on Power, Control, Communication and Computational Technologies for Sustainable Growth (PCCCTSG) (pp. 178-182). IEEE.
[3] Amasyali, K., & El-Gohary, N. M. (2018). A review of data-driven building energy consumption prediction studies. Renewable and Sustainable Energy Reviews, 81, 1192-1205.
[4] Gomes, Á., Antunes, C. H., & Oliveira, E. (2011). Direct load control in the perspective of an electricity retailer: a multi-objective evolutionary approach. In Soft Computing in Industrial Applications (pp. 13-26). Springer, Berlin, Heidelberg.
[5] Babar, M., Ahamed, T. I., AlAmmar, E. A., & Shah, A. (2013). A novel algorithm for demand reduction bid based incentive program in direct load control. Energy Procedia, 42, 607-613.
[6] Liu, D., & Chen, Q. (2013, June). Prediction of building lighting energy consumption based on support vector regression. In 2013 9th Asian Control Conference (ASCC) (pp. 1-5). IEEE.
[7] Muralitharan, K., Sakthivel, R., & Shi, Y. (2016). Multiobjective optimization technique for demand side management with load balancing approach in smart grid. Neurocomputing, 177, 110-119.
[8] Amasyali, K., & El-Gohary, N. (2016). Building lighting energy consumption prediction for supporting energy data analytics. Procedia Engineering, 145, 511-517.
[9] Liu, D., & Chen, Q. (2013, June). Prediction of building lighting energy consumption based on support vector regression. In 2013 9th Asian Control Conference (ASCC) (pp. 1-5). IEEE.
[10] Dagnely, P., Ruette, T., Tourwé, T., Tsiporkova, E., & Verhelst, C. (2015, September). Predicting hourly energy consumption. Can regression modeling improve on an autoregressive baseline? In International Workshop on Data Analytics for Renewable Energy Integration (pp. 105-122). Springer, Cham.
[11] Naji, S., Çelik, O. C., Alengaram, U. J., Jumaat, M. Z., & Shamshirband, S. (2014). Structure, energy and cost efficiency evaluation of three different lightweight construction systems used in low-rise residential buildings. Energy and Buildings, 84, 727-739.
[12] Ali, S., Ahmad, R., & Kim, D. (2012, December). A study of pricing policy for demand response of home appliances in smart grid based on M2M. In 2012 10th International Conference on Frontiers of Information Technology (pp. 231-236). IEEE.
[13] Hahn, H., Meyer-Nieberg, S., & Pickl, S. (2009). Electric load forecasting methods: Tools for decision making. European Journal of Operational Research, 199(3), 902-907.
[14] Gonzalez-Romera, E., Jaramillo-Moran, M. A., & Carmona-Fernandez, D. (2006). Monthly electric energy demand forecasting based on trend extraction. IEEE Transactions on Power Systems, 21(4), 1946-1953.
[15] Stavrakas, V., & Flamos, A. (2020). A modular high-resolution demand-side management model to quantify benefits of demand-flexibility in the residential sector. Energy Conversion and Management, 205, 112339.
[16] Luo, X. J., Oyedele, L. O., Ajayi, A. O., & Akinade, O. O. (2020). Comparative study of machine learning-based multi-objective prediction framework for multiple building energy loads. Sustainable Cities and Society, 61, 102283.