Residential Electricity Demand Prediction using Machine Learning

Manpreet Kaur, Shalini Panwar, Ayush Joshi, and Kapil Gupta
National Institute of Technology, Kurukshetra, Haryana, India

Abstract

This paper analyses the usage of electric power in the residential sector and predicts the power demand of the next day. It aims to improve the prediction accuracy, find the best model, and reduce the overall cost of power consumption in a building. Consumption of electric power can be broadly divided into two categories, i.e. the commercial and residential sectors. The procedure consists of three steps, i.e. feature extraction, normalization, and validation. Heavy fluctuations may arise in the residential sector and may damage electrical appliances. Prediction is necessary to match the demand of customers with the power generated at the generating unit. A variable power pattern may stress the power grid, so electricity consumption must be predicted in advance so that the load at the power grid can be balanced. To meet the demand, appliances can be shifted from peak hours to off-peak hours, which also reduces the cost on the customers' side. The performance of the different models is compared using the following evaluation indices: coefficient of determination (R2), mean absolute error (MAE), and mean squared error (MSE). Out of Linear Regression, Lasso Regression, Ridge Regression, Elastic Net Regression, Random Forest Regression, Extra Trees, Support Vector Regression, and Decision Tree, Lasso Regression and Support Vector Machine perform best, with an accuracy of 99.99% and 99.89% and a mean squared error of 0.01% and 0.11%, respectively.

Keywords
Prediction, Demand, Consumption, Residential, Generation, Power, Dynamic, Regression, Learning, Temperature, Building.

ISIC'21: International Semantic Intelligence Conference, February 25-27, 2021, New Delhi, India
EMAIL: mannri346@gmail.com (M. Kaur), panwarshalini40@gmail.com (S. Panwar), ayushjoshi75.aj@gmail.com (A. Joshi)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Electric power plays a vital role in today's era. Generating enough power to fulfill the demand of customers is not an easy task, as the natural sources of power generation are being depleted day by day. The total generated power is distributed among various sectors according to their requirements. Sectors can be residential or commercial (offices, factories). For now, we focus only on the residential sector. Not all electrical appliances consume the same amount of power. Appliances may be subjected to various issues, i.e. fluctuation: receiving more or less power than required can degrade the performance of the appliances.

The power consumed by appliances depends on various input parameters, i.e. the type of day [12]. In the summer season, air conditioners consume more power. In the winter season, heating appliances consume more power, whereas in the rainy season lighting consumes more power. The total cost of power in the residential sector also depends on the type of hour, i.e. peak hour, off-peak hour, or mid-peak hour [11], in addition to the amount of power consumed. During peak hours, the load at the power grid and the cost of power per unit hour are higher, whereas during off-peak hours the load at the power grid and the cost per unit hour are lowest. To handle this issue, appliances can be shifted from peak hours to off-peak hours, i.e. appliances can be scheduled in order to manage the overall load of the power grid and the total cost to the customer. To schedule the appliances, the power generating company and the customers make a deal to compensate through either a price-based [1][2][3][4] or an incentive-based [4][5][6] demand response type.

To match the availability of electric power with its demand, we have to make predictions. A wrong or missing prediction may lead to a violation of the service level agreement. Consumption of electric power depends directly on climatic conditions. Even if we make a prediction, we cannot guarantee the weather of the next second, as the weather changes drastically. If customers know how much power they need, they can make long-term plans. Dynamic pricing may even help them to know the price of power in the next hour. The biggest challenge is the prediction of building parameters such as indoor temperature and outdoor temperature, because of climate change. It is very difficult to predict the effect on other parameters of increasing or decreasing the value of one parameter during the construction of the building. As already discussed, it is not possible to include all the key features, i.e. the wide range of building features and general principles, to predict the power demand in the residential sector. However, one model has been proposed, i.e. dynamic high-resolution demand-side management, which combines all the building features and general principles [15]. It has also been analysed that most of the energy is consumed by lights and building-integrated photovoltaics [16]. Among the different machine learning algorithms, i.e. Linear Regression, Elastic Net, Decision Tree, Random Forest, SVM, etc., Linear Regression is the benchmark model for the prediction, and the performance of the different models can be compared using the following evaluation indices: coefficient of determination (R2) [7], mean absolute error (MAE) [7], and mean squared error (MSE).

This paper is organized as follows. Section II covers related work on machine learning algorithms, in which the 8 regression algorithms used for prediction are briefly explained. Section III describes the 4 evaluation indices used to evaluate the performance of the prediction models. Section IV presents the experimental settings, which explain the dataset and how the parameters of the models are tuned. Section V gives the experimental observations and results. Section VI concludes the paper.

2. Related Work

2.1 Linear Regression

Linear Regression is used to find the relationship between a predictor (independent) variable and a target (dependent) variable. If one variable can be expressed exactly in terms of another variable, the relationship is known as deterministic. The basic idea of the linear regression model is to derive the best-fit line, also known as the regression line. The sum of the distances between the points on the graph and the regression line is the total prediction error over all the data points. The smaller the error, the better the result, and vice versa [8].

Linear regression equation:

$$Y_{pred} = b_0 + b_1 x \tag{1}$$

Here b0 is the intercept, whereas b1 is the slope of the regression line. To get the minimum error, the values of b0 and b1 should be chosen so that this error is minimized.

The error between the predicted value and the actual value can be calculated as:

$$Error = \sum (\text{predicted value} - \text{actual value}) \tag{2}$$

Exploring the value of b1: If b1 < 0, the relationship is negative, which means the target value decreases as the predictor value increases. If b1 > 0, the relationship is positive, which means the target value increases as the predictor value increases.

Exploring the value of b0: If the predictor is 0, the equation reduces to the intercept b0, which on its own may be meaningless and of no use.

Figure 1 shows the graph of the predicted value of power in a residential sector.

L1 regularization, as used in Lasso Regression, results in a sparse model. It is a challenging task to select one variable as the predictor that particularly suits the property of Lasso Regression. The selection can be done haphazardly, but that can lead to a very bad decision and a very time-consuming process.
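As a minimal sketch of Equation (1) in Section 2.1, the snippet below fits b0 and b1 by ordinary least squares on a handful of made-up points. The squared-error criterion, the function names (`fit_line`, `predict`, `squared_error`) and the sample values are illustrative assumptions and are not taken from the paper's experiments.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # slope of the regression line
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


def predict(b0, b1, x):
    """Equation (1): Y(pred) = b0 + b1 * x."""
    return b0 + b1 * x


def squared_error(b0, b1, xs, ys):
    """Total squared prediction error over all data points (assumed criterion)."""
    return sum((predict(b0, b1, x) - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: outdoor temperature (deg C) vs. power demand (kW).
temps = [20.0, 24.0, 28.0, 32.0, 36.0]
demand = [1.1, 1.5, 2.1, 2.4, 3.0]

b0, b1 = fit_line(temps, demand)
print(f"b0={b0:.3f}, b1={b1:.3f}, error={squared_error(b0, b1, temps, demand):.4f}")
```

With these sample points the fitted slope is positive, which corresponds to the case b1 > 0 described above: the predicted demand rises with the predictor.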
Figure 1: Linear Regression

2.2 Lasso Regression

Lasso stands for Least Absolute Shrinkage and Selection Operator. As its full form makes clear, it uses shrinkage and is a type of linear regression. Here, shrinkage means that the values are shrunk towards a central point, similar to the mean. This model performs well when the dataset contains multicollinearity.

Lasso Regression uses L1 regularization, i.e. the penalty is the sum of the absolute values of the magnitudes of the coefficients. Here, a few of the coefficients can become zero, and the corresponding values can be eliminated from the dataset. Larger penalties result in coefficient values near zero, whereas smaller penalties result in coefficient values far away from zero. The aim of this algorithm is to minimize the error

$$\sum_{i=1}^{n}\Big(y_i - \sum_{j} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \tag{3}$$

If λ = 0, there is no regularization and we get the Ordinary Least Squares solution. When λ tends to infinity, the coefficients tend to 0 and the model is left as a constant function. λ is the tuning parameter that sets the amount of shrinkage: when λ = 0, no parameters are eliminated. When λ increases, bias increases and variance decreases, whereas when λ decreases, variance increases. Bias and variance are inversely related to each other.

2.3 Ridge Regression

If an overfitting or underfitting problem arises, there are chances that it works as linear regression. Ridge Regression is a method to create a parsimonious model, i.e. when the number of predictors is larger than the number of observations, or when there is correlation between the predictors, i.e. the dataset has multicollinearity. Tikhonov's method has a larger set than the parsimonious model, but it is similar to ridge regression. Even if a dataset contains statistical noise, this model can still produce a solution.

Ridge regression uses L2 regularization, also known as the L2 penalty. The coefficients are shrunk by the same factor and none of them is eliminated. Unlike L1 regularization, L2 does not result in a sparse model.

$$\sum_{i=1}^{n}(y_i - y_i')^2 = \sum_{i=1}^{n}\Big(y_i - \sum_{j=0}^{p} w_j x_{ij}\Big)^2 + \lambda \sum_{j=0}^{p} w_j^2 \tag{4}$$

The strength of the penalty term is controlled by tuning the parameter λ. When λ is 0, least squares and ridge regression are equal. As λ tends to ∞, all coefficients tend to zero. The overall penalty ranges from 0 to ∞. Least squares uses the following equation:

$$B' = (X'X)^{-1} X'Y \tag{5}$$

Here, X is a scaled and centered matrix. When the columns of the X matrix have high multicollinearity, the cross-product matrix (X'X) is singular or nearly singular. Including the ridge parameter k in the above equation gives the new equation

$$B' = (X'X + kI)^{-1} X'Y \tag{6}$$

2.4 Elastic Net Regression

Elastic Net is a technique that uses properties of both the L1 penalty (Lasso Regression) and the L2 penalty (Ridge Regression). To improve the regularization, we combine lasso and ridge regression. It is a 2-step process: in the first step it finds the ridge regression coefficients by selecting grouped features, and in the second step it performs a lasso-type shrinkage of the coefficients, which carries out feature selection. The objective of this model is to minimize the following equation:

$$L_{enet}(\beta) = \frac{\sum_{i=1}^{n}(y_i - x_i\beta)^2}{2n} + \lambda\Big(\frac{1-\alpha}{2}\sum_{j=1}^{m}\beta_j^2 + \alpha\sum_{j=1}^{m}|\beta_j|\Big) \tag{7}$$

Here, α is the mixing parameter: α = 1 reduces the function to lasso regression, whereas α = 0 reduces the function to ridge regression. The parameter λ is highly dependent on the α parameter. Elastic Net has better predictive potential than lasso regression. One of its biggest disadvantages is that it may or may not remove all the irrelevant coefficients.

Figure 2 shows the relationship between Lasso Regression, Ridge Regression and Elastic Net Regression.

Figure 2: Lasso Regression, Ridge Regression, Elastic Net Regression

2.5 Decision Tree

The decision tree is a supervised machine learning algorithm. As the name suggests, it is a decision-making tool that uses a flowchart-like tree structure [8]. It supports both continuous and discrete output values. An example of a continuous output is predicting the required power of a building, where our ultimate goal is to reduce the overall cost of power, whereas an example of a discrete output is predicting whether or not it rains on a particular day. Decision nodes are the conditions of the flowchart, whereas terminals are its results. The root node is called the best predictor node, as it is the topmost decision node. Every machine learning model has its advantages and disadvantages; the advantage of the decision tree is that it is very good at handling tabular data with numerical features and with categorical features that have fewer than hundreds of categories.

The decision tree can capture the non-linear interaction between the predictors and the target value. Suppose the target variable is the air conditioner and the predictor variables are room occupancy (empty or not) and outdoor air temperature (<= 26 °C); see Figure 3.

Figure 3: Decision Tree

2.6 Random forest

The random forest is a supervised machine learning algorithm. It can handle both classification and regression problems. The random forest contains multiple decision trees, and its output depends not on one decision tree but on every single decision tree. Every tree is independent; no tree interacts with any other while the model is being built. All these trees run in parallel but independently. Every tree makes its own prediction, and these predictions are aggregated by taking their arithmetic mean to produce a single final result. It can be formulated as:

$$g(x) = f_0(x) + f_1(x) + f_2(x) + \dots + f_n(x) \tag{8}$$

Here, g(x) is the aggregated result, which is divided by the number of trees to obtain the arithmetic mean, whereas f_i(x) is a decision tree.

Each decision tree is built from a random sample of the original dataset; splitting the data and adding randomness helps to prevent overfitting. Random forest is one of the highly accurate models and can handle thousands of predictors without deleting any variable. From Figure 4, it is clear that a Random Forest consists of multiple Decision Trees with multiple features.

Figure 4: Random Forest

2.8 Support Vector Regression

This algorithm is one of the most popular algorithms for regression problems. Basically [8], it draws a boundary line or straight line so that the n-dimensional space can be segregated into classes. The boundary line is drawn in such a way that it covers the maximum number of data points. This boundary line is known as a hyperplane. There are two types of SVR:

Linear SVR: This type of data is known as linearly separable data. A single straight line is drawn to differentiate two classes.

Non-Linear SVR: This type of data is known as non-linearly separable data. It is not possible to segregate the data into classes with just one single line.

Linear and non-linear data are handled by the SVR kernel. The kernel helps to find and draw the hyperplane without increasing the cost in n-dimensional space. Sometimes it is not possible to find the hyperplane in n-dimensional space, so an (n+1)-dimensional space is used. The kernel can be poly, RBF, sigmoid, or gaussian for non-linear datasets, whereas for a linear dataset only the linear kernel should be used. Cross-validation is also one of the techniques that can be used in Support Vector Regression for training the model and then evaluating it.
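The kernel choices named in Section 2.8 can be illustrated with a short sketch. The formulas follow the common definitions of the linear, polynomial, RBF (Gaussian) and sigmoid kernels; the function names, the `coef0` offset and the sample vectors are assumptions for illustration, and `gamma = 0.1` and `degree = 3` are used only as example values.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    """Linear kernel: <u, v>."""
    return dot(u, v)


def poly_kernel(u, v, gamma=0.1, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * <u, v> + coef0) ** degree."""
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.1):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <u, v> + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)


# Two hypothetical, already normalized feature vectors.
a = [0.2, 0.5, 0.1]
b = [0.4, 0.1, 0.3]

for name, k in [("linear", linear_kernel), ("poly", poly_kernel),
                ("rbf", rbf_kernel), ("sigmoid", sigmoid_kernel)]:
    print(f"{name:8s} k(a, b) = {k(a, b):.4f}")
```

An RBF value close to 1 means the two points are close in the feature space, and a larger gamma makes that similarity decay faster with distance, which is one reason the choice of kernel and gamma affects how well a non-linear dataset is fitted.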
In Support Vector Regression, the model may fail to generalize the pattern of the dataset but can detect the fitting, whereas cross-validation is used to find the most accurate value, although it may fail to enhance the accuracy.

Figure 5: Support Vector Regression

Figure 5 shows the result of Support Vector Regression.

2.7 Extra Trees

Extra Trees is also known as Extremely Randomized Trees. Unlike Random Forest and Decision Tree, Extra Trees makes the next best split from uniform random splits over subsets of the features, and a sample cannot be substituted with another sample (i.e. sampling is done without replacement). Extra Trees creates a greater number of unpruned Decision Trees. Unlike Random Forest, it makes random splits. In addition to the optimization of the algorithm, it also adds randomization. This model is faster than other models: it takes less time to compute, as it does not have to select the optimal split but only a random split.

3. Evaluation indices

Evaluation indices are used by various authors to evaluate a model. The performance can be checked by finding the accuracy and error of the models. The smaller the error, the better the accuracy of the model. R2 [7][13] (coefficient of determination) and RMSE [13][14] (root mean squared error) are the two indices used for model optimization, whereas MAE [7][14] (mean absolute error) and MSE (mean squared error) are the two indices used to evaluate the model, i.e. to check the error and try to reduce it.

The value of R2 varies between 0 and 1 and defines the accuracy of the model.

$$R^2 = 1 - \frac{res}{tot} \tag{9}$$

Here, res is the sum of the squares of the residual errors, whereas tot is the total sum of squares. If R2 > 0, the result is accurate; other values of R2 mean the same result or ambiguous results.

$$res = \sum_{i=1}^{n}(y_i - y_i')^2 \tag{9 i}$$

$$tot = \sum_{i=1}^{n}(y_i - y_{mean})^2 \tag{9 ii}$$

RMSE is a metric that depends on the scale of the data.

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i - y_i')^2}{n}} \tag{10}$$

MAE also depends on the scale. It takes the absolute value of the error at every data point, whether the error is negative or positive, so no error cancels out the effect of another.

$$MAE = \frac{\sum_{i=1}^{n}|y_i - y_i'|}{n} \tag{11}$$

MSE is similar to MAE, but instead of taking absolute values it squares the errors and then finds their mean.

$$MSE = \frac{\sum_{i=1}^{n}(y_i - y_i')(y_i - y_i')}{n} \tag{12}$$

In the above equations, y_i is the actual value, y_i' is the predicted value, y_mean is the mean of the actual values, and n is the number of data points on the graph.

4. System Architecture

Figure 6: System Architecture

As shown in the system architecture in Figure 6, the input parameters for power prediction include the type of day, i.e. summer, winter, or rainy. The material used for construction is also used, i.e. an insulating material would be best. The dataset can be on a daily, hourly, or yearly basis, etc. Other details of buildings can also be included, i.e. height, width, illumination, occupancy, etc. The dataset can be of 3 types, i.e. real data, simulated data, and sensor-based data [8].

After the dataset is analysed, it undergoes the feature extraction phase, in which the dataset is filtered, i.e. unusual data and noise are discarded and only useful data is left behind. It then undergoes the transformation process, i.e. the dataset is transformed according to the requirements of the algorithm. After that, the size of the dataset is decreased to increase the performance; this process is known as reduction of the dataset.

After feature extraction, transformation and reduction, the entire dataset is divided into training and testing datasets, and there is a training and a testing phase of the model. In training a model, we first have to select an appropriate algorithm for the prediction. Training can be done in two ways: the first-principle approach, i.e. the prediction of power is based on the current situation rather than on observed history, or the data-driven approach, i.e. detailed information about the building is given. The results are then validated and the accuracy is measured. If the accuracy is good, the algorithm is ready for an unknown dataset of the building and for predicting its power demand. Finally, the results are compared based on the evaluation metrics, and one model is declared the best model, with greater accuracy and minimum error.

5. Experimental Settings

The analysis can be carried out using various Python libraries such as pandas and scipy. For the basic implementation of data mining or ML, we analyse the library "SKlearn". Sklearn is a Python module that integrates ML algorithms with different Python libraries such as NumPy, SciPy, and Matplotlib. We use it because the study shows that it gives efficient and simple solutions. Training and testing can be done with different models, which are then compared with each other in order to get better outcomes. From the review of different authors and according to our online study, we can say that ensemble models of machine learning give better performance than others. For the time being, the study of different models is being done and a dataset is collected. Gradient boosting may also give accurate results. Gradient boosting is an algorithm that trains various models in a gradual, additive and sequential manner. Since this algorithm is prone to overfitting, it uses hyperparameter tuning. This analysis depends entirely on how much accuracy we demand given the dataset available. On the other hand, Random Forests also prove to be good for efficient results. It is an algorithm that uses a special process known as "Early Stopping", in which training stops once the performance on the testing data stops improving further. This is an optimized technique, and it also avoids overfitting. Therefore, this can also be the model for our project, to be applied and analysed. It is beneficial for categorization as well as regression problems, and it can also be modelled for categorical values. After the search results, this model can also be compared for study purposes by gathering more data and then visualizing it, for more accuracy and better performance. The search results obtained after training can be gathered in a document and the result concluded.

Experiments should be done to declare one model as the best model, and for experiments we need a dataset. Generally, the dataset can vary from 2 weeks [9] to 4 years [10]. So, the dataset is taken from Kaggle and we uploaded it on GitHub, link: https://raw.githubusercontent.com/navkapil/googlecolab/master/pwrpred.csv
The dataset consists of 1048576 rows and 9 columns. It contains per-minute data for approx. 2 years, from 16-12-2006 17:24:00 to 13-12-2008 21:38:00. The 9 columns of the dataset are DateTime, global active power, global reactive power, Voltage, Global intensity, sub-metering 1, sub-metering 2, sub-metering 3, and sub-metering 4. Out of all these parameters, global active power is taken as the output, and all the others as input parameters. The dataset is divided into training and testing sets with a percentage of 90% and 10% respectively. After this division, it undergoes normalization so that all the parameters lie in the same range.

The models for the experiments are Linear Regression, Decision Tree, Random Forest, Extra Trees, Lasso Regression, Ridge Regression, Elastic Net Regression, and Support Vector Regression. The result also depends on the type of the dataset, i.e. whether it is a linear or non-linear dataset. All the parameters of a model have default values. The performance of SVR is affected if the data is linear and we set the kernel to a non-linear one (RBF, poly, gaussian), and vice versa. The kernel we have taken is RBF. We checked values of C from 0.01 to 100 and of gamma from 0.01 to 10, in steps of 10, and got good results at C = 100 and gamma = 0.1, with degree as 3 (default). Linear Regression is the baseline model for this project because it gives very good results with the default values of all the parameters. Decision Tree, Random Forest, and Extra Trees are somewhat similar models. In the decision tree, the default value of random_state is None, but we set it to 42, as this parameter controls the randomness of the estimator. In the random forest, n_estimators gives the number of trees to be formed; we checked the performance of the model by varying its value from 10 to 100 and got a better result for a value of 50, whereas the value of n_estimators in Extra Trees is 150 (its default value is 100). If alpha is set to 0 in Lasso Regression, it works as a Linear Regression, but this is not advised. For better performance, we set its value to 0.01, whereas in Ridge Regression, where alpha plays a role analogous to the inverse of the parameter C in models such as Logistic Regression, we tuned its value to 0.1. Basically, alpha regularization improves the conditioning of the problem and hence lowers the variance of the estimates. For Elastic Net Regression, we tuned two parameters, alpha and l1_ratio. For alpha = 0, the problem is solved by Linear Regression. If l1_ratio = 0, the penalty is l2, i.e. Ridge Regression, and if l1_ratio = 1, the penalty is l1, i.e. Lasso Regression. For 0 < l1_ratio < 1, the penalty is a combination of the l1 and l2 penalties.

R2_score, MSE, and MAE are used for the prediction of next-day power consumption. From Table 1, it is clear that Support Vector Machine and Lasso Regression give better accuracy with the minimum error, whereas Linear Regression is the worst performer out of these 8 models.
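The evaluation indices reported in Table 1 can be computed as in the following sketch. It assumes the standard definitions (R2 as one minus the ratio of the residual sum of squares to the total sum of squares); the function names and the sample values are hypothetical and do not come from the paper's dataset.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, on the same scale as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error; positive and negative errors cannot cancel."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - res / tot."""
    y_mean = sum(y_true) / len(y_true)
    res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - res / tot


# Hypothetical actual vs. predicted global active power (kW).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print("MSE :", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("MAE :", mae(actual, predicted))
print("R2  :", r2_score(actual, predicted))
```

Because RMSE is the square root of MSE, the two indices computed on the same predictions should always agree in this way, which gives a quick consistency check when comparing models.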
All remaining parameters of the models keep their default values.

6. Experiment Result

This section shows the results of the different models using 4 evaluation indices, i.e. RMSE, R2_score, MSE, and MAE. Linear Regression is taken as the baseline model and Support Vector Machine as the benchmark system. Figures 7 and 8 show that Elastic Net, Random Forest, Extra Trees, Support Vector Regression, and Lasso all have nearly the same results, but Support Vector Regressor and Lasso give good results.

Table 1: Summary of results of models

Models              RMSE      R2_score   MSE       MAE
Linear Regression   0.4088    0.9999     0.1671    0.235
Elastic Net         0.11204   0.9862     0.01255   0.0916
Random Forest       0.1451    0.9998     0.02      0.0345
Extra trees         0.01088   0.99988    0.012     0.0248
Support Vector      0.03404   0.9989     0.0011    0.02706
Decision tree       0.01952   0.9999     0.3026    0.0413
Ridge               0.2112    0.9999     0.1238    0.7048
lasso               0.04626   0.99990    0.0001    0.01119

Figure 7: RMSE and R2 (chart title: RMSE and R2 for various regressors; series: Rmse, R2_score)

Figure 8: MSE and MAE (chart title: Test error in the prediction with respect to various regressors; series: MSE, MAE)

7. Conclusion

This paper focuses on the implementation of various machine learning algorithms for predicting the power of buildings. A model will not necessarily show a good result for every dataset; sometimes it may show uncertain results, as every model has its pros and cons. It is very difficult to predict the effect on the other parameters of varying one or more parameters of the building, as we cannot set all the parameters according to our requirements. With prediction, it becomes much easier to make long-term plans. Weather data plays a vital role in the prediction of the power of a building. According to our analysis of the models' results, Support Vector Regression and Lasso Regression are the best models. To further increase the accuracy of these models, we can use Long Short Term Memory (LSTM), as it is a very robust deep learning algorithm for time-based forecasting and has the potential to give accurate prediction results, or a hybrid approach.

8. References

[1] Setlhaolo, D., Xia, X., & Zhang, J. (2014). Optimal scheduling of household appliances for demand response. Electric Power Systems Research, 116, 24-28.
[2] Gayatri, P., Sukumar, G. D., & Jithendranah, J. (2015, December). Effect of load change on source parameters in power system. In 2015 Conference on Power, Control, Communication and Computational Technologies for Sustainable Growth (PCCCTSG) (pp. 178-182). IEEE.
[3] Amasyali, K., & El-Gohary, N. M. (2018). A review of data-driven building energy consumption prediction studies. Renewable and Sustainable Energy Reviews, 81, 1192-1205.
[4] Gomes, Á., Antunes, C. H., & Oliveira, E. (2011). Direct load control in the perspective of an electricity retailer - a multi-objective evolutionary approach. In Soft Computing in Industrial Applications (pp. 13-26). Springer, Berlin, Heidelberg.
[5] Babar, M., Ahamed, T. I., AlAmmar, E. A., & Shah, A. (2013). A novel algorithm for demand reduction bid based incentive program in direct load control. Energy Procedia, 42, 607-613.
[6] Liu, D., & Chen, Q. (2013, June). Prediction of building lighting energy consumption based on support vector regression. In 2013 9th Asian Control Conference (ASCC) (pp. 1-5). IEEE.
[7] Muralitharan, K., Sakthivel, R., & Shi, Y. (2016). Multiobjective optimization technique for demand side management with load balancing approach in smart grid. Neurocomputing, 177, 110-119.
[8] Amasyali, K., & El-Gohary, N. (2016). Building lighting energy consumption prediction for supporting energy data analytics. Procedia Engineering, 145, 511-517.
[9] Liu, D., & Chen, Q. (2013, June). Prediction of building lighting energy consumption based on support vector regression. In 2013 9th Asian Control Conference (ASCC) (pp. 1-5). IEEE.
[10] Dagnely, P., Ruette, T., Tourwé, T., Tsiporkova, E., & Verhelst, C. (2015, September). Predicting hourly energy consumption. Can regression modeling improve on an autoregressive baseline? In International Workshop on Data Analytics for Renewable Energy Integration (pp. 105-122). Springer, Cham.
[11] Naji, S., Çelik, O. C., Alengaram, U. J., Jumaat, M. Z., & Shamshirband, S. (2014). Structure, energy and cost efficiency evaluation of three different lightweight construction systems used in low-rise residential buildings. Energy and Buildings, 84, 727-739.
[12] Ali, S., Ahmad, R., & Kim, D. (2012, December). A study of pricing policy for demand response of home appliances in smart grid based on M2M. In 2012 10th International Conference on Frontiers of Information Technology (pp. 231-236). IEEE.
[13] Hahn, H., Meyer-Nieberg, S., & Pickl, S. (2009). Electric load forecasting methods: Tools for decision making. European Journal of Operational Research, 199(3), 902-907.
[14] Gonzalez-Romera, E., Jaramillo-Moran, M. A., & Carmona-Fernandez, D. (2006). Monthly electric energy demand forecasting based on trend extraction. IEEE Transactions on Power Systems, 21(4), 1946-1953.
[15] Stavrakas, V., & Flamos, A. (2020). A modular high-resolution demand-side management model to quantify benefits of demand-flexibility in the residential sector. Energy Conversion and Management, 205, 112339.
[16] Luo, X. J., Oyedele, L. O., Ajayi, A. O., & Akinade, O. O. (2020). Comparative study of machine learning-based multi-objective prediction framework for multiple building energy loads. Sustainable Cities and Society, 61, 102283.