UDC 925.17

Modeling the energy consumption of smart buildings using artificial intelligence

Eugene Yu. Shchetinin
Department of Data Analysis, Decision Making and Financial Technologies
Financial University under the Government of the Russian Federation
Leningradsky pr. 49, Moscow, 117198, Russia
Email: riviera-molto@mail.ru

Intelligent energy saving and energy efficiency technologies are a major global trend in the development of energy systems. The demand for smart buildings is growing not only worldwide but also in Russia, especially in the construction and operation of large business centers, shopping centers and other commercial projects. Accurate cost estimates are important for promoting energy-efficient construction projects and demonstrating their economic attractiveness. The growing digital measurement infrastructure deployed in commercial buildings has increased the availability of high-frequency data that can be used for anomaly detection, equipment diagnostics, and the optimization of heating, ventilation, and air conditioning. This has encouraged the use of modern and efficient machine learning methods, which offer promising opportunities to obtain more accurate forecasts of building energy consumption and thus increase energy efficiency. In this paper, a method for modeling and forecasting the energy consumption of buildings based on the gradient boosting model is proposed, and computer algorithms implementing it are developed. An energy consumption dataset of 300 commercial buildings was used to assess the effectiveness of the proposed algorithms. Computer simulations showed that these algorithms increased the accuracy of energy consumption prediction in more than 80 percent of cases compared to other machine learning algorithms.

Key words and phrases: energy consumption, smart buildings, smart meters, machine learning, random forest, gradient boosting.

Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: K. E. Samouylov, L. A. Sevastianov, D. S. Kulyabov (eds.): Selected Papers of the IX Conference "Information and Telecommunication Technologies and Mathematical Modeling of High-Tech Systems", Moscow, Russia, 19-Apr-2019, published at http://ceur-ws.org

1. Introduction

The most important direction of world economic development is improving the energy efficiency of the industrial and consumer sectors of the economy. The state program of the Russian Federation "Energy saving and increase of energy efficiency for the period up to 2030" was approved as one of the most important directions of development of the Russian economy. Several energy efficiency solutions have been implemented to reduce the environmental impact and costs of the commercial building sector. For example, long-term energy saving targets have been set at the state and federal levels in Russia, and these targets should be achieved with the help of energy efficiency programs. Measurement and verification of energy efficiency is the process of assessing savings and is therefore crucial to determining the value of energy efficiency for building owners, utility customers and service providers.
Today, the growing availability of data from smart meters and devices, combined with data mining, allows this process to be optimized by increasing the level of automation while maintaining or improving the accuracy of consumption forecasting. The development of intelligent networks in industry, finance and services creates new opportunities for the development and application of effective methods of machine learning and big data analysis [1, 2]. The introduction of smart meters benefits end users, energy suppliers and network operators by providing consumers with near-real-time consumption information that helps them manage their actual energy consumption, save money and reduce greenhouse gas emissions. At the same time, smart meters facilitate the planning and operation of the distribution network as well as demand management. In this regard, smart metering data allow more accurate demand forecasting, increase the efficiency of distribution networks, reduce supply recovery time, and reduce network operating costs. Intelligent technologies for collecting, recording and monitoring energy consumption data create a wealth of data of different kinds for use by energy providers and network operators. The amount of data varies depending on the number of smart meters installed, the number of smart meter messages received, the size of each message, and the frequency of measurement records, for example, every 15 or 30 minutes. These data can be used for optimal network management, improving the accuracy of load forecasting, identifying anomalous conditions of electricity supply (such as peak loads), and forming flexible price tariffs for different groups of consumers.

The basic models used in estimating energy consumption profiles are regression models that link the energy consumption of buildings to parameters such as outdoor temperature, humidity, building characteristics, etc. Traditionally, monthly utility bill data have been used to build such models. However, the increasing availability of data from smart meters at hourly and 15-minute intervals has made it possible to create new methods for more accurate energy consumption estimation and forecasting. In recent years, several approaches to energy consumption modeling using smart meter data have appeared. These methods are based on traditional linear regression, nonlinear regression, and machine learning methods. The linear regression model described in paper [3] includes the time of day, the day of the week, and two temperature functions that allow different combinations of heating and cooling. It can also include humidity and holidays as variables. This model is well described by ordinary regression, which is common practice in such cases [7, 15]. Over the past two decades, significant progress has been made in the development of new machine learning methods, among which the most promising in terms of prediction accuracy are the approaches of the family of ensemble learning algorithms. Ensemble methods construct a model by training several relatively simple models and then combining them into a more complex model with better predictive properties. The most well-known ensemble learning algorithms are bagging, random forests and gradient boosting (GBM) [4, 5, 13].
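To make the regression baseline of [3] concrete, the following Python sketch constructs time-of-day, day-of-week and two-branch temperature features for a hypothetical 15-minute dataset. The column names load_kw and temp_c, the balance-point temperature of 18 °C and the helper make_features are illustrative assumptions, not details taken from [3]:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical 15-minute data: load in kW and outdoor temperature.
idx = pd.date_range("2012-01-01", periods=4 * 24 * 30, freq="15min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"temp_c": 10 + 10 * rng.random(len(idx)),
                   "load_kw": 50 + 5 * rng.random(len(idx))}, index=idx)

def make_features(df, balance_point=18.0):
    X = pd.DataFrame(index=df.index)
    X["hour"] = df.index.hour      # time of day
    X["dow"] = df.index.dayofweek  # day of the week
    # Two temperature functions: heating and cooling branches around
    # an assumed balance-point temperature.
    X["heating"] = (balance_point - df["temp_c"]).clip(lower=0.0)
    X["cooling"] = (df["temp_c"] - balance_point).clip(lower=0.0)
    # One-hot encode the categorical time variables.
    return pd.get_dummies(X, columns=["hour", "dow"], drop_first=True)

model = LinearRegression().fit(make_features(df), df["load_kw"])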
Bagging, decision trees and random forests are based on simple averaging of the base models, while boosting algorithms are built on a constructive iterative strategy. Although these ensemble machine learning algorithms have historically been used with great success in many fields, they are just beginning to be applied to problems of modeling the energy consumption of buildings. In this paper, we present a new computer algorithm for estimating and fine-tuning the parameters of the energy consumption profile model of a conglomerate of commercial buildings [11]. For this we used ensemble machine learning algorithms, such as random forest and GBM gradient boosting [4, 5, 13], and compared them with a regression model. We also developed effective numerical algorithms for estimating and fine-tuning the parameters of the GBM model and for estimating the accuracy of energy consumption forecasting in buildings.

2. Methods of energy consumption modeling using machine learning algorithms

The growing availability of data from smart meters, in combination with data mining, makes it possible to optimize the power consumption process by increasing the level of automation in storage and improving the accuracy of forecasting [1-3]. The main models used in the assessment of energy consumption profiles are regression models that relate energy consumption in buildings to parameters such as ambient temperature, humidity, individual characteristics of the building, etc. [3, 7]. Data from monthly utility bills have traditionally been used to build such models, but the growth of up-to-date data from smart meters at hourly and 30-minute intervals allows new methods to be developed for more accurate forecasting. With the introduction of intelligent technologies for the collection, analysis and control of energy consumption data, significant progress has been made in the development of new methods using machine learning algorithms, among which the most promising in terms of prediction accuracy are the approaches of the family of ensemble algorithms. Ensemble methods train several relatively simple models and then combine them to create a more complex model with better predictive properties. Well-known learning algorithms used for this purpose include regression trees, random forests [4], bagging [5], and boosting [13]. Although these machine learning algorithms are used with great success in many areas, they are only beginning to be applied to energy saving modeling problems. For example, in papers [8, 12, 15] the authors used a random forest to predict hourly energy consumption, and in [19] the random forest algorithm was applied to detect anomalies in the energy consumption of a building. In this paper we propose a new algorithm for selecting and fine-tuning the hyperparameters of the energy consumption model and test it on the example of a conglomerate of commercial buildings. To solve this task, we used the GBM gradient boosting algorithm and developed algorithms to estimate the accuracy of energy consumption forecasting.

Using decision trees as a regression method has several advantages, one of which is that the splitting rules provide an intuitive and simple graphical way to visualize the results. In addition, by design, they can simultaneously process numerical, categorical and other types of input parameters. They are also robust to outliers and can effectively handle missing data in the input parameter space [18, 19].
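As a minimal illustration of these properties, the scikit-learn sketch below fits a shallow regression tree to synthetic data and prints its splitting rules in readable form; the data and parameter values are illustrative only:

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_text

# A regression tree as base learner; its split rules are directly readable.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=[f"x{i}" for i in range(4)]))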
The hierarchical structure of the decision tree automatically models the interactions between input parameters and performs variable selection: for example, if an input parameter is never used in a split, the prediction is independent of that parameter. Finally, decision tree algorithms are easy to implement and computationally efficient on large amounts of data [9, 10]. The GBM algorithm was first proposed for classification problems [13]. Its basic principle is that several simple models, called "weak learners", are combined in an iterative parameter-selection scheme to obtain a so-called "strong learner", that is, a model with better forecast accuracy. Thus, the GBM algorithm iteratively adds at each step the new decision tree that best reduces the loss function. The algorithm continues to run until the maximum number of iterations or the specified precision is reached. The decision trees added to the model in previous steps remain fixed at each new step, so the model can be improved in those parts where it does not yet fit the residuals well enough. The GBM algorithm is more efficient if at each iteration the contribution of the added decision tree is weighted by a certain hyperparameter that describes one of the important characteristics of the algorithm, namely the learning rate 𝜈. One of the problems in choosing the hyperparameter 𝜈 is that, to achieve the required accuracy 𝜖, an appropriate number of iterations 𝑚 is needed depending on the value of 𝜈: the smaller the value of 𝜈, the greater the number of iterations 𝑚 that must be performed. Thus, it is necessary to develop a numerical procedure for the optimal choice of the hyperparameters 𝜈, 𝑚 of the GBM model. Another problem is the presence of autocorrelation in the datasets under study, which, as is known, introduces additional distortions into the model parameter estimates [10, 16]. The solution to the above problems, in our opinion, is the use of randomization in the construction of the GBM algorithm: at each iteration, instead of the entire sample, a randomly extracted subsample is used to fit the decision tree. In practice, however, it is hard to allocate a sufficient number of data points to accurately assess the predictive performance of the models without affecting the quality of the fit. When the number of observations is small, reducing the size of the training set can lead to a significant decrease in accuracy [6, 9, 17]. Therefore, to take into account the influence of the subsample size on the quality of the model fit, it is necessary to use subsamples of different sizes 𝑘. In this situation, we developed a numerical algorithm for selecting the optimal values of the hyperparameters of the GBM model using 𝑘-fold cross-validation and a randomization procedure. The 𝑘-fold cross-validation method involves splitting the data set into 𝑘 subsamples (folds) of approximately the same size. First, the model is estimated using (𝑘 − 1) folds as a training sample, and the 𝑘-th fold (test sample) is used to determine the accuracy of the forecast. Then the procedure is repeated 𝑘 times, each time using a new fold as the test sample.
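The iterative scheme with shrinkage 𝜈 and per-iteration subsampling described above can be written compactly. The following is a minimal Python sketch for squared-error loss, where the residuals coincide with the negative gradient; the function names and default values are ours, not the paper's implementation:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, m=200, d=3, nu=0.5, subsample=0.5, random_state=0):
    """Stochastic gradient boosting for squared-error loss (sketch)."""
    rng = np.random.default_rng(random_state)
    f0 = y.mean()                 # initialize with the mean of y
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(m):
        z = y - pred              # residuals = negative gradient
        # randomization: fit each tree on a random subsample
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        t = DecisionTreeRegressor(max_depth=d).fit(X[idx], z[idx])
        pred += nu * t.predict(X) # shrunken update with learning rate nu
        trees.append(t)
    return f0, trees

def gbm_predict(f0, trees, X, nu=0.5):
    return f0 + nu * sum(t.predict(X) for t in trees)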
To estimate the accuracy of the CV procedure, we use the root mean square error

$$\mathrm{RMSE}_k = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \tilde{y}_i\right)^2}, \qquad (1)$$

where $y_i$ is the value from the real data set, $\tilde{y}_i$ is the predicted value, and $N$ is the number of measurements, corresponding to the size of the 𝑘-th subsample. The 𝑘-fold cross-validation method then uses the root mean square error as follows:

$$\mathrm{RMSE(CV)} = \frac{1}{k}\sum_{i=1}^{k}\mathrm{RMSE}_i.$$

Our GBM model has four hyperparameters that need to be configured: 1) 𝑑, the depth of the decision trees; 2) 𝑚, the number of iterations; 3) 𝜈, the learning rate, which usually takes values between 0 and 1; 4) 𝑘, the number of splits of the sample into subsamples used at each iteration step of the cross-validation procedure. A general problem of machine learning algorithms is overfitting, which is usually a consequence of an overly complex model. In the case of the GBM model, this can happen if too many iterations 𝑚 or too large a tree depth 𝑑 are selected. Thus, it is necessary to choose the optimal combination of hyperparameters to avoid overfitting while at the same time ensuring the best forecast accuracy. The most effective method to solve this problem is the grid search method [14]. This approach consists of defining a grid of combinations of hyperparameter values, constructing a model for each combination, and selecting the optimal combination using indicators of the model's prediction accuracy. In the 𝑘-fold cross-validation algorithm described above, we used a grid search to find the optimal values of the hyperparameters of the model (1). The pseudocode of the presented algorithm is as follows:
1. The user specifies the depth of the decision trees 𝑑, the number of iterations 𝑚, the learning rate 𝜈 and the forecast accuracy 𝜖;
2. Initialization: the average value of 𝑦 is used as the initial prediction 𝑓̃₀, and the initial residuals are set to 𝑧⁰ = 𝑦 − 𝑓̃₀;
3. For 𝑗 = 1, ..., 𝑚 the following steps are performed:
a) Randomly select a subsample (𝑥ᵢ, 𝑦ᵢ), 𝑖 = 1, ..., 𝑁 from the training dataset, where 𝑁 is the size of the subsample;
b) Build the decision tree 𝑓̃ⱼ of depth 𝑑 on the values (𝑥ᵢ, 𝑦ᵢ), fitting the residuals 𝑧ʲ⁻¹;
c) Update the model by adding 𝜈𝑓̃ⱼ to the prediction and recompute the residuals 𝑧ʲ;
4. If RMSE(CV) < 𝜖 or 𝑗 = 𝑚, go to step 5; else go to step 3;
5. End.

3. Computer experiments and analysis of the results

To test the algorithms considered in the article, we used power consumption data of an urban conglomerate of buildings in Dublin, Republic of Ireland [11]. The data are 15-minute measurements of electricity consumption (in kW) for the period 29.03.2011-20.02.2013. A typical graph of the studied time series is shown in Fig. 1. For each building, the time series data were split into training and forecasting periods. The forecast period was defined as different periods within the last 12 months of the explored data set. Models were trained using two different training periods of 6 and 12 months.

Figure 1. Energy consumption time series for one of the buildings. The measurements are indicated on a logarithmic scale, in days.
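The split of each building's series into training and forecasting periods can be sketched in pandas as follows; the Series of 15-minute readings and the helper name split_periods are hypothetical, not part of the paper's implementation:

import pandas as pd

# Hold out the last 12 months for forecasting; train on the preceding
# 6 or 12 months. `series` is a hypothetical pd.Series of kW readings
# with a DatetimeIndex.
def split_periods(series, train_months=12):
    forecast_start = series.index.max() - pd.DateOffset(months=12)
    train_start = forecast_start - pd.DateOffset(months=train_months)
    train = series[(series.index >= train_start) &
                   (series.index < forecast_start)]
    test = series[series.index >= forecast_start]
    return train, test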
As for the results of the evaluation of accuracy with RMSE(CV), the GBM-1d and GBM-7d models are superior to the Random Forest and GBM models for both training periods. The accuracy of the GBM models improved significantly when their training period was increased from 6 to 12 months, while the accuracy 𝑅² of the regression model decreased slightly. Recall that higher values are desirable for 𝑅², whereas RMSE(CV) values should be close to zero. Table 1 shows the percentage of buildings for which the GBM models proved more accurate than the regression and RF models, respectively. For RMSE(CV), the columns represent the percentage of buildings with a lower RMSE(CV) than the regression model. These results confirm that the GBM-1d and GBM-7d algorithms are more accurate than the regression and RF models.

Table 1. Estimates of the accuracy of the energy consumption models in the test sample for 6 and 12 months (percentage of buildings).

Model         | 𝑅² 6m | RMSE(CV) 6m | 𝑅² 12m | RMSE(CV) 12m
GBM           | 33    | 47          | 62     | 76
GBM-1d        | 57    | 63          | 67     | 81
GBM-7d        | 77    | 70          | 81     | 86
Random Forest | 28    | 42          | 35     | 48
Regression    | 17    | 30          | 28     | 38

Hyperparameters of the gradient boosting model were tuned automatically with a grid search algorithm and the various implementations of the cross-validation method mentioned above. The depth of the regression trees 𝑑 was chosen from the set (3, ..., 10), the learning rate 𝜈 was chosen between 0.05 and 1, and the number of iterations 𝑚 took values from 1 to 1000 in steps of 10 iterations. Three variants of the standard CV were used: 5-fold CV; 5-fold CV with one day as a block (CV-1day); and 5-fold CV with one week as a block (CV-7day). Thus, taking into account the cross-validation algorithms with fine-tuning of hyperparameters proposed above, we have three different implementations of the gradient boosting model: 1) the GBM model with parameters chosen by standard 5-fold CV (GBM); 2) the GBM model with CV-1day (GBM-1d); 3) the GBM model with CV-7day (GBM-7d). The results of computer experiments showed that a learning rate of 𝜈 = 0.1 is too low, because the algorithm turned out to be too sensitive to both the number of iterations and the depth of the decision trees. It was also found that with learning rate 𝜈 = 0.2 and decision tree depth 𝑑 = 5, the algorithm did not reach the optimal number of iterations at 𝑚 = 500. When the learning rate was increased to 𝜈 = 1, the algorithm began to overfit due to the need to increase the number of iterations and the increased complexity of the model. Thus, the optimal learning rate was taken to be 𝜈 = 0.5. The behavior of the learning error for different learning rates of the GBM algorithm is shown in Fig. 2. Accuracy indicators such as the coefficient of determination 𝑅² and the root mean square error RMSE(CV) were calculated for the entire data set. Their values demonstrated a decrease in accuracy when the training period was reduced from 12 to 6 months. Computer experiments showed that the values of 𝑅² for the GBM models exceed the corresponding values of 𝑅² for the Random Forest and regression models. This is especially noticeable for the GBM-1d and GBM-7d models. As expected, the GBM model using standard CV is less accurate than the other two versions, which use blocked 𝑘-fold CV. With the reduction of the training period to 6 months, there was a slight decrease in 𝑅² for the standard GBM model as well as for the GBM-1d, GBM-7d and RF models. However, the accuracy of the regression model did not improve, which means that for this dataset the regression model does not gain accuracy as the number of observations increases.

Figure 2. RMSE standard error for different values of the learning rate 𝜈 during training of the GBM model.
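A possible scikit-learn rendering of this tuning scheme is sketched below, with GroupKFold groups formed by calendar day or ISO week standing in for the CV-1day and CV-7day blocks. The helper blocked_search and the inputs X, y, times are assumptions, not the paper's code, and the full grid is shown only for illustration (in practice a coarser grid would be searched):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, GroupKFold

# Blocked 5-fold CV: observations from the same day (CV-1day) or the same
# week (CV-7day) always fall into the same fold, so autocorrelated
# neighbours do not leak between training and test folds.
# `times` is a hypothetical DatetimeIndex aligned with the rows of X.
def blocked_search(X, y, times, block="1d"):
    groups = times.floor("D") if block == "1d" else times.to_period("W")
    groups = groups.astype(str)
    param_grid = {
        "max_depth": list(range(3, 11)),               # tree depth d
        "learning_rate": [0.05, 0.1, 0.2, 0.5, 1.0],   # learning rate nu
        "n_estimators": list(range(10, 1001, 10)),     # iterations m
    }
    search = GridSearchCV(GradientBoostingRegressor(subsample=0.5),
                          param_grid,
                          cv=GroupKFold(n_splits=5),
                          scoring="neg_root_mean_squared_error")
    search.fit(X, y, groups=groups)
    return search.best_params_, search.best_estimator_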
Based on the constructed models, modeling and forecasting of energy consumption were also carried out on the example of one of the buildings. The results of modeling and forecasting the energy consumption for 120 days using the GBM-7d algorithm are presented in Fig. 1.

4. Discussion and conclusions

In this paper, we propose a method for estimating the electric power consumption of large commercial centers and business buildings, based on the GBM gradient boosting algorithm with adaptive tuning of hyperparameters using 𝑘-fold cross-validation and randomization procedures. To find the optimal values of the hyperparameters, a computer algorithm was developed using a grid search over sets of hyperparameter values. The capacity and effectiveness of GBM in solving the energy efficiency problem was tested on both model data and real energy consumption data from the smart meters of a building conglomerate. On the whole, the GBM model showed higher forecasting accuracy than the regression and random forest models for all tested training periods. The results of computer experiments showed that the use of the GBM model can improve the accuracy of the energy efficiency assessment both for a separate building and for a complex of buildings as a whole. It also turned out that using a 6-month training period to build the GBM models led to only a slight decrease in the accuracy of the energy consumption forecast compared to the 12-month training period that is usually used for an entire building. As can be seen from Table 1, this result does not hold for the regression models or for the Random Forest algorithm. Similar conclusions were also drawn for a number of machine learning algorithms in papers [7, 15]. Thus, the application of GBM algorithms makes it possible not only to improve the accuracy of the energy saving assessment in general, but also to reduce the total time required for the energy saving assessment of an entire complex of buildings. A comparison of different hyperparameter tuning algorithms showed the importance of taking time series autocorrelation into account. Indeed, the results showed that the use of standard 𝑘-fold cross-validation reduces the accuracy of the GBM algorithm. The reason is that with the standard cross-validation approach, the observations in the test and training data sets are not independent (due to the autocorrelation of measurements obtained from smart sensors), which leads to model overfitting. To overcome the effect of autocorrelation, a randomization procedure was included in the algorithm for estimating the accuracy of the energy consumption forecast.
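The leakage effect of standard 𝑘-fold CV on autocorrelated data can be reproduced on synthetic data. In the sketch below (all data and settings illustrative), shuffled 5-fold CV typically reports a lower, optimistic RMSE than day-blocked CV, mirroring the overfitting described above:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

# Synthetic autocorrelated series: 60 days of 15-minute readings with a
# daily cycle plus a random-walk component.
rng = np.random.default_rng(1)
n = 96 * 60
t = np.arange(n)
y = np.sin(2 * np.pi * t / 96) + 0.1 * np.cumsum(rng.normal(size=n))
X = np.column_stack([t, t % 96])   # time index and time-of-day features
days = t // 96                     # group label: calendar day

gbm = GradientBoostingRegressor(subsample=0.5, random_state=0)
standard = cross_val_score(gbm, X, y,
                           cv=KFold(5, shuffle=True, random_state=0),
                           scoring="neg_root_mean_squared_error")
blocked = cross_val_score(gbm, X, y, cv=GroupKFold(5), groups=days,
                          scoring="neg_root_mean_squared_error")
print("standard 5-fold RMSE:", -standard.mean())
print("day-blocked RMSE:   ", -blocked.mean())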
It was also shown that the choice between one week and one day as the cross-validation block does not have a significant impact on the results of the assessment and forecast of energy consumption. Therefore, we can conclude that in most cases, using one day as the default block when estimating the accuracy of the energy forecast is a good choice. It is known that one of the main advantages of ensemble models is their flexibility and reliability when a large number of input parameters is used [18]. In addition, compared to models such as regression, there is no need to modify the algorithm to handle additional input parameters such as building occupancy, humidity or illumination. Instead, it is sufficient to include these variables in the input table of the algorithm, without the need to determine a specific model form for each of the parameters, as is the case with most of the standard regression algorithms used in modern practical applications (see the short sketch at the end of this section). In addition, the variable selection capability of the GBM model makes it possible to exclude parameters that do not affect the model without reducing its predictive power. In conclusion, the GBM model has a number of obvious advantages over regression models: its ability to maintain accuracy for shorter training periods, improved overall accuracy with respect to energy efficiency indicators, and the ease of including additional explanatory variables. Key areas of future work will be the application of the GBM model to other energy efficiency problems [16, 17, 21], such as forecasting energy consumption for long-term load planning, continuous anomaly detection, and quantifying demand-side load reduction [19, 20]. Further development of the GBM model is also aimed at expanding its capabilities by applying deep learning and neural network methods.
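The following sketch illustrates the point about additional explanatory variables: new measured inputs are appended as columns and the GBM is simply refit, with no change to the model specification. All column names here are hypothetical:

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical baseline features: time of day, day of week, temperature.
idx = pd.date_range("2012-01-01", periods=96 * 30, freq="15min")
rng = np.random.default_rng(2)
X = pd.DataFrame({"hour": idx.hour, "dow": idx.dayofweek,
                  "temp_c": 10 + 10 * rng.random(len(idx))}, index=idx)
y = 50 + 0.5 * X["temp_c"] + rng.normal(size=len(idx))

gbm = GradientBoostingRegressor().fit(X, y)

# Later, simply append new measured inputs as columns and refit:
X["humidity"] = 40 + 20 * rng.random(len(idx))
X["occupancy"] = rng.integers(0, 200, len(idx))
gbm = GradientBoostingRegressor().fit(X, y)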
5. Program Code

PYTHON code for parameter estimation using grid search with blocked K-fold cross-validation:

from __future__ import print_function

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC

# Loading the dataset. In the paper the electricity-consumption dataset [11]
# is assumed to be loaded here; the scikit-learn digits dataset is used as a
# stand-in so that the example runs as-is.
digits = datasets.load_digits()

# To apply a classifier on this data, we need to flatten the images,
# to turn the data into a (samples, features) matrix:
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target

# Split the dataset into two equal parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Set the parameters by cross-validation
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    clf = GridSearchCV(SVC(), tuned_parameters, cv=5,
                       scoring='%s_macro' % score)
    clf.fit(X_train, y_train)

    print("Best parameters set found on development set:")
    print()
    print(clf.best_params_)
    print()
    print("Grid scores on development set:")
    print()
    means = clf.cv_results_['mean_test_score']
    stds = clf.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))
    print()

    print("Detailed classification report:")
    print()
    print("The model is trained on the full development set.")
    print("The scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = y_test, clf.predict(X_test)
    print(classification_report(y_true, y_pred))
    print()

PYTHON code for the Gradient Boosting Machine algorithm:

import numpy as np
import matplotlib.pyplot as plt

from sklearn import ensemble
from sklearn import datasets
from sklearn.metrics import log_loss

X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
X = X.astype(np.float32)

# map labels from {-1, 1} to {0, 1}
labels, y = np.unique(y, return_inverse=True)

X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

original_params = {'n_estimators': 1000, 'max_leaf_nodes': 4,
                   'max_depth': None, 'random_state': 2,
                   'min_samples_split': 5}

plt.figure()

for label, color, setting in [
        ('No shrinkage', 'orange',
         {'learning_rate': 1.0, 'subsample': 1.0}),
        ('learning_rate=0.1', 'turquoise',
         {'learning_rate': 0.1, 'subsample': 1.0}),
        ('subsample=0.5', 'blue',
         {'learning_rate': 1.0, 'subsample': 0.5}),
        ('learning_rate=0.1, subsample=0.5', 'gray',
         {'learning_rate': 0.1, 'subsample': 0.5}),
        ('learning_rate=0.1, max_features=2', 'magenta',
         {'learning_rate': 0.1, 'max_features': 2})]:
    params = dict(original_params)
    params.update(setting)

    clf = ensemble.GradientBoostingClassifier(**params)
    clf.fit(X_train, y_train)

    # compute test set deviance (binomial deviance via log-loss)
    test_deviance = np.zeros((params['n_estimators'],), dtype=np.float64)
    for i, y_proba in enumerate(clf.staged_predict_proba(X_test)):
        test_deviance[i] = 2 * log_loss(y_test, y_proba[:, 1])

    plt.plot((np.arange(test_deviance.shape[0]) + 1)[::5],
             test_deviance[::5], '-', color=color, label=label)

plt.legend(loc='upper left')
plt.xlabel('Boosting Iterations')
plt.ylabel('Test Set Deviance')
plt.show()

References

1. C. W. Gellings, The smart grid: enabling energy efficiency and demand response, The Fairmont Press Inc., 2009.
2. N. Arghira, L.
Hawarah, S. Ploix, Prediction of appliances energy use in smart homes, Energy, 2012, V. 48, P. 128-134.
3. R. K. Jain, K. M. Smith, P. J. Culligan, Forecasting energy consumption of multi-family residential buildings using support vector regression: investigating the impact of temporal and spatial monitoring granularity on performance accuracy, Appl. Energy, 2014, V. 123, P. 168-178.
4. L. Breiman, Random forests, Machine Learning, 2001, V. 45(1), P. 5-32.
5. L. Breiman, Bagging predictors, Machine Learning, 1996, V. 24(2), P. 123-140.
6. C. Bergmeir, J. M. Benítez, On the use of cross-validation for time series predictor evaluation, Information Sciences, 2012, V. 191, P. 192-213.
7. V. Ediger, S. Akar, ARIMA forecasting of primary energy demand by fuel in Turkey, Energy Policy, 2007, V. 35, P. 1701-1708.
8. S. Cincotti, G. Gallo, L. Ponta, Modeling and forecasting of electricity spot-prices: Computational intelligence vs. classical econometrics, AI Commun., 2014, V. 27, P. 301-314.
9. F. J. Ardakani, M. M. Ardehali, Novel effects of demand side management data on accuracy of electrical energy consumption modeling and long-term forecasting, Energy Convers. Manag., 2014, V. 78, P. 745-752.
10. E. Yu. Shchetinin, Cluster-based energy consumption forecasting in smart grids, Springer Communications in Computer and Information Science (CCIS), 2018, V. 919, P. 446-456.
11. https://data.gov.ie/dataset/energy-consumption-gas-and-electricity-civic-offices-2009-2012/
12. A. Liaw, M. Wiener, Classification and regression by randomForest, R News, 2002, V. 2(3), P. 18-22.
13. J. Friedman, Greedy function approximation: a gradient boosting machine, Ann. Stat., 2001, V. 29(5), P. 1189-1232.
14. K. P. Burnham, D. R. Anderson, Model Selection and Multi-Model Inference: A Practical, Information-theoretic Approach, Springer Verlag, 2002.
15. M. J. Kane, N. Price, M. Scotch, Comparison of ARIMA and Random Forest time series models for prediction of avian influenza H5N1 outbreaks, BMC Bioinformatics, 2014, V. 15, P. 276, doi:10.1186/1471-2105-15-276.
16. J. Granderson, S. Touzani, C. Custodio, Accuracy of automated measurement and verification techniques for energy savings in commercial buildings, Applied Energy, 2016, V. 173, P. 296-308.
17. R. K. Jain, K. M. Smith, P. J. Culligan, J. E. Taylor, Forecasting energy consumption of multi-family residential buildings using support vector regression: investigating the impact of temporal and spatial monitoring granularity on performance accuracy, Appl. Energy, 2014, V. 123, P. 168-178.
18. Z. H. Zhou, Ensemble learning, Encycl. Biom., 2015, P. 411-416.
19. A. L. Zorita, M. A. Fernández-Temprano, L. A. García-Escudero, O. Duque-Perez, A statistical modeling approach to detect anomalies in energetic efficiency of buildings, Energy Build., 2016, V. 110, P. 377-386.
20. M. Amozegar, K. Khorasani, An ensemble of dynamic neural network identifiers for fault detection and isolation of gas turbine engines, Neural Networks, 2016, V. 76, P. 106-121.
21. E. Yu. Shchetinin, V. S. Melezhik, L. A. Sevastyanov, Improving the energy efficiency of the smart buildings with the boosting algorithms, Proceedings of the Selected Papers of the 12th International Workshop on Applied Problems in Theory of Probabilities and Mathematical Statistics in the framework of the Conference on Information and Telecommunication Technologies and Mathematical Modeling of High-Tech Systems (APTP+MS'2018), CEUR Workshop Proceedings, 2018, V. 2332, P. 69-78.
22. E. Yu. Shchetinin, E. A.
Popova, Smart buildings energy savings with gradient boosting algorithm, Proceedings of the Selected Papers of the 8th International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID-2018), CEUR Workshop Proceedings, 2018, V. 2267, P. 318-322.