

Residential Electricity Demand Prediction using Machine
Learning
Manpreet Kaur, Shalini Panwar, Ayush Joshi, and Kapil Gupta
National Institute of Technology, Kurukshetra, Haryana, India


Abstract

This paper analyses the usage of electric power in the residential sector and predicts the power consumption demand for the next day, aiming to improve prediction accuracy, identify the best model, and reduce the overall cost of power consumption in a building. Consumption of electric power can be broadly divided into two categories, the commercial and the residential sector. The procedure consists of three steps: feature extraction, normalization, and validation. Heavy fluctuations arising in the residential sector may damage electrical appliances, and a variable power pattern may put stress on the power grid, so prediction is necessary to match customer demand with the power produced at the generating unit. Electric consumption must be predicted in advance so that the load on the power grid can be balanced. To meet demand, appliances can be shifted from peak hours to off-peak hours, which also reduces the cost on the customers' side. The performance of the different models is compared using several evaluation indices: coefficient of determination (R2), mean absolute error (MAE), and mean squared error (MSE). Out of Linear Regression, Lasso Regression, Ridge Regression, Elastic Net Regression, Random Forest Regression, Extra Trees, Support Vector Regression, and Decision Tree, Lasso Regression and Support Vector Regression perform best, with accuracies of 99.99% and 99.89% and mean squared errors of 0.01% and 0.11% respectively.

Keywords
Prediction, Demand, Consumption, Residential, Generation, Power, Dynamic, Regression, Learning, Temperature, Building


1. Introduction

Electric power plays a vital role in today's era. Generating power so that it fulfils customer demand is not an easy task, since the natural sources of power generation are being exhausted day by day. The total generated power is distributed among the various sectors according to their requirements; sectors can be residential or commercial (offices, factories), and here we focus on the residential sector. Not all electrical appliances consume the same amount of power, and appliances may undergo various issues such as fluctuation: receiving more or less power than required can degrade the performance of the appliances. The power consumed by appliances depends on various input parameters, such as the type of day [12]. In the summer season air-conditioners consume more power, in the winter season heating appliances consume more power, whereas in the rainy season lighting consumes more power. The total cost of power in the residential sector also depends on the type of hour, i.e. peak hour, off-peak hour, or mid-peak hour [11], in addition to the amount of power consumed.

ISIC'21: International Semantic Intelligence Conference, February 25-27, 2021, New Delhi, India
EMAIL: mannri346@gmail.com (M. Kaur), panwarshalini40@gmail.com (S. Panwar), ayushjoshi75.aj@gmail.com (A. Joshi)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)


During peak hours the load on the power grid and the cost per unit are highest, whereas during off-peak hours they are lowest. To handle this, appliances can be shifted from peak hours to off-peak hours; in other words, appliances can be scheduled so as to manage the overall load on the power grid and the customer's total cost. To schedule the appliances, the power generating company and the customers make a deal to compensate through either a price-based [1][2][3][4] or an incentive-based [4][5][6] demand response scheme. To match the availability of electric power with its demand we have to make predictions; a wrong prediction, or no prediction at all, may lead to a violation of the service-level agreement. Consumption of electric power depends directly on climatic conditions, yet even with a prediction we cannot guarantee the weather of the next moment, as weather changes drastically. If customers know their expected power consumption, they can make long-term plans, and dynamic pricing may help them know the price of power in the next hour. The biggest challenge in building-level prediction lies in parameters such as indoor and outdoor temperature under a changing climate: it is very difficult to predict the effect that increasing or decreasing one parameter during the construction of a building will have on the others. As discussed, it is not possible to include every key feature, i.e. the whole range of building features and general principles, when predicting the demand for power in the residential sector. One study proposed a single model, dynamic high-resolution demand-side management, which combines the building features with general principles [15]; another analysed that most of the energy consumption is due to lighting and building-integrated photovoltaics [16]. Among machine learning algorithms such as Linear Regression, Elastic Net, Decision Tree, Random Forest, and SVM, Linear Regression is the benchmark model for prediction, and the performance of the different models can be compared using evaluation indices: coefficient of determination (R2) [7], mean absolute error (MAE) [7], and mean squared error (MSE).

This paper is organized as follows. Section 2 reviews related work, briefly explaining the 8 regression algorithms used for prediction. Section 3 presents the 4 evaluation indices used to evaluate the performance of the prediction models. Section 4 describes the system architecture. Section 5 covers the experimental settings, explaining the dataset and how the parameters of the models are tuned. Section 6 reports the experimental observations and results, and Section 7 concludes the paper.

2. Related Work

2.1 Linear Regression

Linear Regression is used to find the relationship between a predictor (independent) variable and a target (dependent) variable. If one variable can be expressed exactly in terms of another, the relationship is known as deterministic. The basic idea of the linear regression model is to derive the best-fit line, also known as the regression line. The sum of the distances between the points on the graph and the regression line gives the total prediction error over all data points: the smaller the error, the better the result, and vice versa [8].

The linear regression equation is:

    Y(pred) = b0 + b1 * x        (1)

b0 is the intercept whereas b1 is the slope of the regression line; their values should be chosen so that the error is minimal. The error between the predicted value and the actual value can be calculated as:

    Error = Σ (predicted_value − actual_value)        (2)

Exploring the value of b1: if b1 < 0 there is a negative relationship, meaning the target value decreases as the predictor value increases; if b1 > 0 there is a positive relationship, meaning the target value increases as the predictor value increases.

Exploring the value of b0: if the predictor can never meaningfully be 0, then the equation at x = 0 is meaningless and of no use on its own.
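As a concrete illustration, fitting equation (1) takes only a few lines with scikit-learn, the library used in Section 5; the small x and y arrays below are made-up numbers purely for illustration.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Made-up illustrative data: one predictor x and one target y.
    x = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([2.1, 3.9, 6.2, 7.8])

    model = LinearRegression().fit(x, y)
    print("b0 (intercept):", model.intercept_)
    print("b1 (slope):", model.coef_[0])
    print("Y(pred) at x = 5:", model.predict([[5.0]])[0])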
Figure 1 shows the graph of the predicted power values in a residential sector.

Figure 1: Linear Regression

2.2 Lasso Regression

Lasso stands for Least Absolute Shrinkage and Selection Operator. As the full name makes clear, it uses shrinkage and is a type of Linear Regression. Shrinkage here means that the values of the dataset are shrunk towards a central point, similar to the mean. The performance of this model is good when the dataset contains multicollinearity.

Lasso Regression applies L1 regularization, a penalty equal to the sum of the absolute values of the coefficient magnitudes. A few of the coefficients can become exactly zero, and those values can be eliminated from the dataset. Larger penalties push the coefficient values towards zero, whereas smaller penalties leave the coefficient values far from zero. The aim of the algorithm is to minimize the error:

    Σ_{i=1}^{n} (y_i − Σ_j x_ij β_j)² + λ Σ_{j=1}^{p} |β_j|        (3)

λ is the amount of shrinkage and is the parameter to tune. If λ = 0 there is no regularization, no parameters are eliminated, and we obtain the Ordinary Least Squares solution; when λ → ∞ the coefficients tend to 0 and the model degenerates to a constant function. When λ increases, bias increases, and when λ decreases, variance increases: bias and variance are inversely proportional to each other. L1 regularization results in a sparse model.

It is a challenging task to select the one predictor variable that particularly suits the property of Lasso Regression. The selection can be done haphazardly, but that can result in very bad decisions and a very time-consuming process.

2.3 Ridge Regression

If an overfitting or underfitting type of problem arises in linear regression, Ridge Regression offers a way out. Ridge Regression is a method for creating a parsimonious model when the number of predictor values exceeds the number of observations, or when there is correlation between the predictor values, i.e. when the dataset has multicollinearity.

Tikhonov's method covers a larger set of problems than the parsimonious-model setting, but it is similar to ridge regression. Even if a dataset contains statistical noise, this model can still produce a solution.

Ridge regression applies L2 regularization, also known as the L2 penalty: the coefficients of the data values are shrunk by the same factor and none of them is eliminated. Unlike L1 regularization, L2 will not result in a sparse model.

    Σ_{i=1}^{n} (y_i − y_i')² = Σ_{i=1}^{n} (y_i − Σ_{j=0}^{p} w_j x_ij)² + λ Σ_{j=0}^{p} w_j²        (4)

To strengthen the penalty term we tune the parameter λ. When λ is 0, least squares and ridge regression are equal; when λ is ∞, all coefficients become zero; the overall penalty thus ranges from 0 to ∞. Ordinary least squares uses the following equation:

    B' = (X'X)⁻¹ X'Y        (5)

Here X is a scaled and centered matrix. When the columns of X have high multicollinearity, the cross-product matrix (X'X) is singular or nearly singular. Including the ridge parameter k in the above equation gives:

    B' = (X'X + kI)⁻¹ X'Y        (6)
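As a minimal sketch of the two penalties in scikit-learn, where the alpha argument plays the role of λ in equations (3) and (4), and the data is made up for illustration:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)                              # made-up predictors
    y = 3 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.rand(100)

    # L1 penalty: some coefficients are driven exactly to zero (sparse model).
    lasso = Lasso(alpha=0.05).fit(X, y)
    # L2 penalty: coefficients shrink toward zero but none is eliminated.
    ridge = Ridge(alpha=1.0).fit(X, y)

    print("Lasso coefficients:", lasso.coef_)   # exact zeros for weak predictors
    print("Ridge coefficients:", ridge.coef_)   # all non-zero, shrunk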
2.4 Elastic Net Regression

This technique uses the properties of both the L1 penalty (Lasso Regression) and the L2 penalty (Ridge Regression); to improve regularization, we combine lasso and ridge regression. It is a 2-step process: in the first step the coefficients of ridge regression are found by selecting the group of features, and in the second step a lasso-style shrinking of the coefficients is performed through feature selection. The objective of this model is to minimize the following function:

    L_enet(β) = Σ_{i=1}^{n} (y_i − x_i β)² / (2n) + λ ( (1 − α)/2 · Σ_{j=1}^{m} β_j² + α · Σ_{j=1}^{m} |β_j| )        (7)

Here α is the mixing parameter: α = 1 reduces the function to lasso regression, whereas α = 0 reduces it to ridge regression, and the parameter λ is highly dependent on α. Elastic Net has better predictive potential than lasso regression, but one of its biggest disadvantages is that it may or may not remove all the irrelevant coefficients.

Figure 2 shows the relationship between Lasso Regression, Ridge Regression and Elastic Net Regression.

Figure 2: Lasso Regression, Ridge Regression, Elastic Net Regression
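A minimal scikit-learn sketch of equation (7); scikit-learn's l1_ratio argument corresponds to the mixing parameter α, and the data here is made up for illustration.

    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)                    # made-up predictors
    y = 3 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.rand(100)

    # l1_ratio=1 reduces to the Lasso penalty, l1_ratio=0 to the Ridge penalty,
    # and intermediate values mix the two, as in equation (7).
    enet = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X, y)
    print("Elastic Net coefficients:", enet.coef_)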
2.5 Decision Tree

The decision tree is a supervised machine learning algorithm. As the name suggests, it is a decision-making tool, and it uses a flowchart-like tree structure [8]. It supports both continuous and discrete output values: a continuous-output example is predicting the required power of a building, where the ultimate goal is to reduce the overall cost of power, whereas a discrete-output example is predicting whether or not it will rain on a particular day.

Decision nodes correspond to the conditions of a flowchart, whereas the terminals correspond to its results. The root node, being the topmost decision node, is called the best predictor node. Every machine learning model has its advantages and disadvantages, but an advantage of the decision tree is that it is very good at handling tabular data, both numerical data and categorical features with fewer than hundreds of categories.

A decision tree can capture the non-linear interaction between the predictors and the target value. Suppose the target variable is the air-conditioner and the predictor variables are room occupancy (empty or not) and outdoor air temperature (<= 26 °C); see Figure 3.

Figure 3: Decision Tree

2.6 Random Forest

Random forest is a supervised machine learning algorithm that can perform both classification and regression. A random forest contains multiple decision trees, and its output depends not on one decision tree but on every single tree. Every tree is independent: no tree interacts with any other while the model is built, and all the trees run in parallel. Every tree makes its own prediction, the predictions are aggregated, and their arithmetic mean produces a single final result. It can be formulated as:

    g(x) = f_0(x) + f_1(x) + f_2(x) + ... + f_n(x)        (8)

Here g(x) is the single final result, whereas each f_i(x) is a decision tree.

Each decision tree is built on a random sample split off from the original dataset, and this added randomness prevents overfitting. Random forest is one of the most accurate models and can handle thousands of predictors without deleting any variable. As Figure 4 shows, a Random Forest is multiple Decision Trees over multiple features.

Figure 4: Random Forest

2.7 Extra Trees

Extra Trees is also known as Extremely Randomized Trees. Unlike Random Forest and Decision Tree, Extra Trees picks its next split from uniformly random splits over subsets of the features, and samples cannot be substituted with another sample. Extra Trees creates a greater number of unpruned decision trees and, unlike Random Forest, makes random splits; in addition to optimizing the algorithm, this adds randomization. The model is faster than the others: it takes less time to compute because it does not have to select the optimal split, only a random one.

2.8 Support Vector Regression

This algorithm is one of the most popular for regression problems. Basically [8], it draws a boundary line or straight line so that the n-dimensional space can be segregated into classes. The boundary line, known as a hyperplane, is drawn in such a way that it covers the maximum number of data points. There are two types of SVR:

Linear SVR: the data is linearly separable, and a single straight line suffices to differentiate two classes.

Non-Linear SVR: the data is not linearly separable, and it is not possible to segregate the data into classes with just one single line.

Linear and non-linear data are handled by the SVR kernel. The kernel helps to find and draw the hyperplane without increasing the cost in n-dimensional space; when it is not possible to find the hyperplane in n dimensions, we move to an (n+1)-dimensional space. The value of the kernel can be poly, RBF, sigmoid, or gaussian for non-linear datasets, whereas for a linear dataset only the linear kernel should be used.

Cross-validation is another technique that can be used with Support Vector Regression to train and then evaluate the model. A single training run may fail to generalize the pattern of the dataset even though it can detect over- or under-fitting, whereas cross-validation is used to find the most accurate values, though it may fail to enhance the accuracy.

Figure 5: Support Vector Regression

Figure 5 shows the result of Support Vector Regression.
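The four non-linear models of Sections 2.5 to 2.8 share the same fit/predict interface in scikit-learn. The sketch below uses made-up data, and the hyperparameter values are illustrative rather than the tuned values of Section 5.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    X = rng.rand(200, 4)                               # made-up predictors
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.rand(200)

    models = {
        # A single flowchart-like tree of decision nodes and leaves.
        "Decision tree": DecisionTreeRegressor(random_state=42),
        # Averages many independent trees, as in equation (8).
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
        # Like a random forest, but with random rather than optimal split points.
        "Extra Trees": ExtraTreesRegressor(n_estimators=100, random_state=42),
        # RBF kernel for non-linear data; kernel="linear" for linearly separable data.
        "SVR": SVR(kernel="rbf"),
    }
    for name, model in models.items():
        print(name, model.fit(X, y).score(X, y))       # R^2 on the training data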
3. Evaluation Indices

Evaluation indices are what various authors use to evaluate a model. Performance can be checked by finding the accuracy and the error of the models: the smaller the error, the higher the accuracy and the better the model.

R2 [7][13] (coefficient of determination) and RMSE [13][14] (root mean squared error) are two methods for model optimization, whereas MAE [7][14] (mean absolute error) and MSE (mean squared error) are two methods for evaluating the model, i.e. measuring the error so that it can be reduced. The value of R2 varies between 0 and 1 and describes the accuracy of the model:

    R2 = 1 − res/tot        (9)

where res is the sum of the squared residual errors and tot is the total sum of squares:

    res = Σ_{i=1}^{n} (y_i − y_i')²        (9 i)

    tot = Σ_{i=1}^{n} (y_i − y_mean)²        (9 ii)

An R2 of 1 means fully accurate results, an R2 of 0 means the model does no better than predicting the mean, and values in between indicate correspondingly ambiguous results.

RMSE is a metric that depends on the scale of the data:

    RMSE = √( Σ_{i=1}^{n} (y_i − y_i')² / n )        (10)

MAE also depends on the scale. It takes the absolute value at every data point, whether the error there is negative or positive, so no error cancels out the effect of another:

    MAE = ( Σ_{i=1}^{n} |y_i − y_i'| ) / n        (11)

MSE is similar to MAE, but instead of taking absolute values it squares the values and then finds the mean error:

    MSE = ( Σ_{i=1}^{n} (y_i − y_i')² ) / n        (12)

In the above equations, y_i is the predicted value, y_i' is the actual value, and n is the number of data points on the graph.
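All four indices are available in scikit-learn's metrics module; the y arrays below are made up simply to show the calls.

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_pred = np.array([2.8, 2.7, 4.0, 5.3])   # predicted values (made up)
    y_true = np.array([3.0, 2.5, 4.1, 5.0])   # actual values (made up)

    mse = mean_squared_error(y_true, y_pred)              # equation (12)
    print("R2  :", r2_score(y_true, y_pred))              # equation (9)
    print("RMSE:", np.sqrt(mse))                          # equation (10)
    print("MAE :", mean_absolute_error(y_true, y_pred))   # equation (11)
    print("MSE :", mse)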
4. System Architecture

Figure 6: System Architecture

As the system architecture in Figure 6 shows, the input parameters for power prediction include the type of day, i.e. summer, winter, or rainy, and the material used for construction (an insulating material would be best). The dataset can be on a daily, hourly, or yearly basis, and other details of the building can also be included, such as height, width, illumination, and occupancy. The dataset itself can be of 3 types: real data, simulated data, or sensor-based data [8].

After being analysed, the dataset undergoes the feature extraction phase, in which it is filtered: unusual data and noise are discarded and only useful data is left behind. It then undergoes a transformation process, in which the dataset is transformed according to the requirements of the algorithm; after that, the size of the dataset is decreased to increase performance, a process known as dataset reduction.

After feature extraction, transformation, and reduction, the entire dataset is divided into training and testing sets, giving the model a training and a testing phase. To train a model we first select an appropriate algorithm for the prediction. Training can then follow one of two approaches: the first-principles approach, in which the prediction of power is based on the current situation rather than on observed history, or the data-driven approach, which works from detailed information about the building. The results are then validated and the accuracy measured; if the accuracy is good, the algorithm is ready to predict the power demand of a building from an unknown dataset. Finally, the results are compared on the basis of the evaluation metrics, and one model is declared the best model, with the greatest accuracy and minimum error.
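As a minimal sketch of this pipeline in Python with scikit-learn (the toolchain of Section 5), the steps below stand in for the feature extraction, transformation, and splitting stages; the file path and exact column names are assumptions based on the dataset description in Section 5.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    # Assumed file and column names, following the dataset description in Section 5.
    df = pd.read_csv("pwrpred.csv")
    df = df.dropna()  # feature extraction: discard unusual rows and noise

    X = df.drop(columns=["DateTime", "Global_active_power"])  # input parameters
    y = df["Global_active_power"]                             # output parameter

    # 90% / 10% train/test division, then normalization to a common range.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=42)
    scaler = MinMaxScaler().fit(X_train)   # transformation: fit on training data only
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)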
5. Experimental Settings

The analysis is carried out using various Python libraries such as pandas and scipy. For the basic implementation of data mining and machine learning we use the library "sklearn" (scikit-learn), a Python module that integrates the standard machine learning algorithms with other Python libraries such as NumPy, SciPy, and Matplotlib. We use it because studies show that it gives efficient and simple solutions. Training and testing can then be carried out with the different models, which are compared with each other in order to get the best outcomes.

From our review of different authors and our own study, ensemble models of machine learning give better performance than others. Gradient boosting, for instance, may give accurate results: it trains various models in a gradual, additive, and sequential manner, and since the algorithm is prone to overfitting it relies on hyperparameter tuning; how far this helps depends on the accuracy demanded and on the dataset available. Random forests also prove good for efficient results; they can use a process known as "early stopping", in which training stops once the performance on the testing data stops improving. This optimized technique avoids overfitting, is beneficial for categorization as well as regression problems, and can be modelled for categorical values, so it too is a candidate model for our project, to be applied, analysed, and compared on further datasets for better accuracy and performance.

Experiments are needed to declare one model the best, and experiments need a dataset. Datasets in the literature generally vary from 2 weeks [9] to 4 years [10]. Our dataset is taken from Kaggle and uploaded on GitHub: https://raw.githubusercontent.com/navkapil/googlecolab/master/pwrpred.csv

The dataset consists of 1048576 rows and 9 columns. It contains per-minute data covering approximately 2 years, from 16-12-2006 17:24:00 to 13-12-2008 21:38:00. The 9 columns are DateTime, global active power, global reactive power, voltage, global intensity, sub-metering 1, sub-metering 2, sub-metering 3, and sub-metering 4. Out of all these parameters, global active power is taken as the output and all the others as input parameters. The dataset is divided into training and testing sets with percentages of 90% and 10% respectively; after this division it undergoes normalization so that all the parameters lie in the same range.

The models for the experiments are Linear Regression, Decision Tree, Random Forest, Extra Trees, Lasso Regression, Ridge Regression, Elastic Net Regression, and Support Vector Regression. The result also depends on whether the dataset is linear or non-linear. All parameters of the models start from their default values. The performance of SVR suffers if the data is linear while the kernel is set to a non-linear one (RBF, poly, gaussian), and vice versa; the kernel we have taken is RBF. We varied C from 0.01 to 100 and gamma from 0.01 to 10 in multiples of 10 and got good results at C = 100 and gamma = 0.1, with the degree left at its default of 3. Linear Regression is the baseline model for this project because it gives very good results with the default values of all its parameters. Decision Tree, Random Forest, and Extra Trees are somewhat similar models. In the Decision Tree the default value of random_state is None, but we set it to 42, as this parameter governs the randomness of the estimator. In the Random Forest, n_estimators gives the number of trees to be formed; we checked the performance of the model by varying it from 10 to 100 and got the best result at 50, whereas in Extra Trees we set n_estimators to 150 instead of its default of 100. Setting alpha to 0 in Lasso Regression makes it work as a Linear Regression, but this is not advised; for better performance we set it to 0.01. Likewise, in Ridge Regression the untuned alpha does not perform well, so it is better tuned, and we use 0.1. Basically, alpha regularization improves the conditioning of the problem and hence lowers the variance of the estimates. For Elastic Net Regression we tuned two parameters, alpha and l1_ratio. With alpha = 0 the problem is solved by plain linear regression; l1_ratio = 0 gives the l2 penalty, i.e. Ridge Regression, l1_ratio = 1 gives the l1 penalty, i.e. Lasso Regression, and 0 < l1_ratio < 1 combines the l1 and l2 penalties. All the remaining parameters of the models keep their default values.
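Collected as code, the settings above translate roughly into the following scikit-learn estimators; parameters not mentioned in the text are left at their defaults, and the Elastic Net is shown with defaults since its tuned alpha and l1_ratio values are not stated.

    from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
    from sklearn.svm import SVR

    models = {
        "Linear Regression": LinearRegression(),   # baseline, all defaults
        "Elastic Net": ElasticNet(),                # alpha, l1_ratio tuned (values not stated)
        "Random Forest": RandomForestRegressor(n_estimators=50),
        "Extra trees": ExtraTreesRegressor(n_estimators=150),
        "Support Vector": SVR(kernel="rbf", C=100, gamma=0.1, degree=3),
        "Decision tree": DecisionTreeRegressor(random_state=42),
        "Ridge": Ridge(alpha=0.1),
        "Lasso": Lasso(alpha=0.01),
    }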

6. Experiment Result

Table 1 shows the results of the different models using the 4 evaluation indices, RMSE, R2_score, MSE, and MAE, for the prediction of next-day power consumption.

From Table 1 it is clear that Support Vector Regression and Lasso Regression give better accuracy with the minimum error, while Linear Regression is the worst performer of these 8 models; that is why Linear Regression is taken as the baseline model and Support Vector Regression as the benchmark system.

Figures 7 and 8 show that Elastic Net, Random Forest, Extra Trees, Support Vector Regression, and Lasso all have nearly the same results, but Support Vector Regression and Lasso give the best results.

Table 1: Summary of results of models

    Models               RMSE      R2_score   MSE       MAE
    Linear Regression    0.4088    0.9999     0.1671    0.235
    Elastic Net          0.11204   0.9862     0.01255   0.0916
    Random Forest        0.1451    0.9998     0.02      0.0345
    Extra trees          0.01088   0.99988    0.012     0.0248
    Support Vector       0.03404   0.9989     0.0011    0.02706
    Decision tree        0.01952   0.9999     0.3026    0.0413
    Ridge                0.2112    0.9999     0.1238    0.7048
    Lasso                0.04626   0.99990    0.0001    0.01119



Figure 7: RMSE and R2 for the various regressors



Figure 8: MSE and MAE (test error in the prediction with respect to the various regressors)
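For reference, a sketch of how entries like those in Table 1 can be produced, assuming the models dictionary from the Section 5 sketch and the X_train/X_test/y_train/y_test arrays from the Section 4 sketch:

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    for name, model in models.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        print(f"{name}: RMSE={np.sqrt(mse):.5f}  R2={r2_score(y_test, y_pred):.5f}  "
              f"MSE={mse:.5f}  MAE={mean_absolute_error(y_test, y_pred):.5f}")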

7. Conclusion

This paper focuses on the implementation of various machine learning algorithms for predicting the power demand of buildings. A model will not necessarily show good results on every particular dataset; it may sometimes give uncertain results, since every model has its pros and cons. The effect that varying one or more building parameters has on the other parameters is very difficult to predict, as we cannot set all the parameters according to our requirements. Prediction makes it much easier to make long-term plans, and weather data plays a vital role in predicting the power demand of a building. According to our analysis of the results, Support Vector Regression and Lasso Regression are the best models. To increase their accuracy further, we could use Long Short-Term Memory (LSTM), a very robust deep learning algorithm for time-based forecasting with the potential to give accurate predictions, or a hybrid approach.

8. References

[1] Setlhaolo, D., Xia, X., & Zhang, J. (2014). Optimal scheduling of household appliances for demand response. Electric Power Systems Research, 116, 24-28.
[2] Gayatri, P., Sukumar, G. D., & Jithendranah, J. (2015, December). Effect of load change on source parameters in power system. In 2015 Conference on Power, Control, Communication and Computational Technologies for Sustainable Growth (PCCCTSG) (pp. 178-182). IEEE.
[3] Amasyali, K., & El-Gohary, N. M. (2018). A review of data-driven building energy consumption prediction studies. Renewable and Sustainable Energy Reviews, 81, 1192-1205.
[4] Gomes, Á., Antunes, C. H., & Oliveira, E. (2011). Direct load control in the perspective of an electricity retailer: a multi-objective evolutionary approach. In Soft Computing in Industrial Applications (pp. 13-26). Springer, Berlin, Heidelberg.
[5] Babar, M., Ahamed, T. I., AlAmmar, E. A., & Shah, A. (2013). A novel algorithm for demand reduction bid based incentive program in direct load control. Energy Procedia, 42, 607-613.
[6] Liu, D., & Chen, Q. (2013, June). Prediction of building lighting energy consumption based on support vector regression. In 2013 9th Asian Control Conference (ASCC) (pp. 1-5). IEEE.
[7] Muralitharan, K., Sakthivel, R., & Shi, Y. (2016). Multiobjective optimization technique for demand side management with load balancing approach in smart grid. Neurocomputing, 177, 110-119.
[8] Amasyali, K., & El-Gohary, N. (2016). Building lighting energy consumption prediction for supporting energy data analytics. Procedia Engineering, 145, 511-517.
[9] Liu, D., & Chen, Q. (2013, June). Prediction of building lighting energy consumption based on support vector regression. In 2013 9th Asian Control Conference (ASCC) (pp. 1-5). IEEE.
[10] Dagnely, P., Ruette, T., Tourwé, T., Tsiporkova, E., & Verhelst, C. (2015, September). Predicting hourly energy consumption. Can regression modeling improve on an autoregressive baseline? In International Workshop on Data Analytics for Renewable Energy Integration (pp. 105-122). Springer, Cham.
[11] Naji, S., Çelik, O. C., Alengaram, U. J., Jumaat, M. Z., & Shamshirband, S. (2014). Structure, energy and cost efficiency evaluation of three different lightweight construction systems used in low-rise residential buildings. Energy and Buildings, 84, 727-739.
[12] Ali, S., Ahmad, R., & Kim, D. (2012, December). A study of pricing policy for demand response of home appliances in smart grid based on M2M. In 2012 10th International Conference on Frontiers of Information Technology (pp. 231-236). IEEE.
[13] Hahn, H., Meyer-Nieberg, S., & Pickl, S. (2009). Electric load forecasting methods: Tools for decision making. European Journal of Operational Research, 199(3), 902-907.
[14] Gonzalez-Romera, E., Jaramillo-Moran, M. A., & Carmona-Fernandez, D. (2006). Monthly electric energy demand forecasting based on trend extraction. IEEE Transactions on Power Systems, 21(4), 1946-1953.
[15] Stavrakas, V., & Flamos, A. (2020). A modular high-resolution demand-side management model to quantify benefits of demand-flexibility in the residential sector. Energy Conversion and Management, 205, 112339.
[16] Luo, X. J., Oyedele, L. O., Ajayi, A. O., & Akinade, O. O. (2020). Comparative study of machine learning-based multi-objective prediction framework for multiple building energy loads. Sustainable Cities and Society, 61, 102283.