=Paper=
{{Paper
|id=Vol-3806/S_4_Doroshenko_Zhora_Haidukevych_Yatsenko
|storemode=property
|title=
Predicting 24-Hour Nationwide Electrical Energy Consumption Based on Regression Techniques
|pdfUrl=https://ceur-ws.org/Vol-3806/S_4_Doroshenko_Zhora_Haidukevych_Yatsenko.pdf
|volume=Vol-3806
|authors=Anatoliy Doroshenko,Dmytro Zhora,Vladyslav Haidukevych,Yaroslav Haidukevych,Olena Yatsenko
|dblpUrl=https://dblp.org/rec/conf/ukrprog/DoroshenkoZHHY24
}}
==
Predicting 24-Hour Nationwide Electrical Energy Consumption Based on Regression Techniques
==
Anatoliy Doroshenko1,2,†, Dmytro Zhora1,†, Vladyslav Haidukevych1,†, Yaroslav
Haidukevych1,†, and Olena Yatsenko1,*,†
1 Institute of Software Systems of the National Academy of Sciences of Ukraine, Glushkov ave. 40, build. 5, Kyiv, 03187,
Ukraine
2 National Technical University "Ihor Sikorsky Kyiv Polytechnic Institute", Polytechnichna str. 41, build. 18, Kyiv, 03056,
Ukraine
Abstract
This paper applies standard regression techniques to forecast the country-wide consumption of electrical
energy. All considered machine learning algorithms are available as a part of the Scikit-learn library.
Besides the fine-tuning of regression hyperparameters, several data preparation techniques are employed
to improve the forecasting accuracy. It is demonstrated that forecasting for 24 hours ahead is possible with
good accuracy and has practical significance.
Keywords
Electricity markets, forecasting, machine learning, regression
1. Introduction
For a long time, Ukraine had only one market for electrical energy. That was the market of bilateral
agreements that wasn’t flexible enough to balance the interests of consumers and suppliers of
electricity. Such agreements could span weeks, months, or even years. On July 1st, 2019, Ukraine
adopted the European model [1] that assumes the following four markets: bilateral, day-ahead,
intraday, and balancing. Despite the electricity market models in Europe having some differences
[2], this was also a significant step forward in liberalizing electricity trading between countries.
The bilateral market is also referred to as a futures or forward market. In Ukraine, as shown
in Figure 1, the total amount of deals is recorded every hour. At the same time, some European
markets allow 15-minute contracts. If we consider four electricity markets in the order they are
mentioned above (from bilateral to balancing), the properties of these markets can be formulated as
follows:
• the volume of the market decreases,
• the price of the electricity increases,
• the volatility of the volume increases.
The laws of physics apply to electrical circuits regardless of the scale. There are some electricity
losses associated with resistance, but usually, they are negligible. If the amount of electrical energy
traded and transmitted is measured on substations, we can conclude that the amount of produced
electricity is exactly equal to the amount of consumed electricity. That is, for the purpose of this
paper we can use the following terms interchangeably: energy production, energy consumption,
and market volume. When the country is considered an open system, the following equation
applies.
production + import = consumption + export. (1)
14th International Scientific and Practical Conference from Programming UkrPROG’2024, May 14-15, 2024, Kyiv, Ukraine
* Corresponding author.
† These authors contributed equally.
Emails: doroshenkoanatoliy2@gmail.com (A. Doroshenko); dmitry.zhora@gmx.com (D. Zhora); gaidukevichvlad@gmail.com (V. Haidukevych); yarmcfly@gmail.com (Y. Haidukevych); oayat@ukr.net (O. Yatsenko)
ORCID: 0000-0002-8435-1451 (A. Doroshenko); 0009-0006-6073-7751 (D. Zhora); 0000-0002-0614-6778 (V. Haidukevych); 0000-0002-6300-1778 (Y. Haidukevych); 0000-0002-4700-6704 (O. Yatsenko)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
The dataset used in this research represents the time range from July 1st, 2020, to December 31st,
2021. For historical reasons, the time range from July 1st, 2019, to June 30th, 2020, did not contain
bilateral market data [3]. The market volume data were provided by the Institute of Energy
Modelling, Ukraine. Figure 2 shows the dynamics of all four market components in time.
Figure 1: Hourly data of electricity market volumes, in megawatt-hours (MWh)
Figure 2: Market volume dependency on time, in megawatt-hours (MWh)
2. Volume Data Augmentation
The modeled process is often affected by external factors that are not represented via
input parameters from the original dataset. The outside temperature influences the consumption of
electricity as more energy is needed in winter for heating and in summer for air-conditioning. Two
columns with hourly data were added to the dataset representing the temperature for Ukraine and
its capital, see the dependencies below in Figure 3. The location representing the country was
selected as its linear geographic center with decimal GPS coordinates 48.379433N 31.165580E.
Figure 3: Dependency of outside temperature in Ukraine, hourly representation
Another important factor is the periodicity in the consumption of electrical energy. For example,
at night people need less electricity than in the daytime. Similarly, on weekends the electricity
consumption is lower than on weekdays. This paper considers four cycle types: daily, weekly,
monthly, and yearly. One of the next sections will analyze whether these additions are helpful.
The problem is how to feed time representation to the machine learning algorithm in a way that
similar moments in time would be interpreted as close by the algorithm. As shown in Figure 4, hour
values 23 and 0 are close on the timescale, but they are distant in real-valued representation. One of
the possible solutions to this problem is to calculate the sine and cosine of the cycle phase [4].
Figure 5 demonstrates how every hour in the daily cycle can be represented without gaps. In
particular, close values on the timescale are represented by close values of sine and cosine
functions.
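The sine and cosine encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `add_cyclic_columns` and `distance`, and the assumption that hours are numbered 0 to 23, are hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd


def add_cyclic_columns(frame, column, period, name):
    """Append the sine and cosine of the cycle phase for a numeric time column."""
    phase = 2.0 * np.pi * frame[column] / period
    result = frame.copy()
    result["Sin" + name] = np.sin(phase)
    result["Cos" + name] = np.cos(phase)
    return result


def distance(frame, first, second):
    """Euclidean distance between two rows in the (SinDay, CosDay) plane."""
    a = frame.loc[first, ["SinDay", "CosDay"]].to_numpy(dtype=float)
    b = frame.loc[second, ["SinDay", "CosDay"]].to_numpy(dtype=float)
    return float(np.linalg.norm(a - b))


# Assumption: hours are numbered 0..23 (the real dataset may use another convention)
hours = pd.DataFrame({"TradeHour": np.arange(24)})
encoded = add_cyclic_columns(hours, "TradeHour", period=24, name="Day")

gap_23_0 = distance(encoded, 23, 0)   # neighbours across midnight
gap_0_1 = distance(encoded, 0, 1)     # ordinary neighbours
gap_0_12 = distance(encoded, 0, 12)   # opposite points of the daily cycle
print(gap_23_0, gap_0_1, gap_0_12)
```

In this representation hours 23 and 0 are exactly as close as hours 0 and 1, while midnight and noon are the most distant pair. The same helper could be reused for the weekly, monthly, and yearly cycles by changing the period.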
Figure 4: Raw hour data as can be submitted to the machine learning algorithm
Figure 5: Sine and cosine time series for representation of temporal cycles
The augmented dataset is shown in Figure 6. The first two columns can be interpreted as
a composite primary key. In addition to the original 4 attribute columns with market volume data,
we now have 10 more columns. The temperature data were downloaded from the site
https://openweathermap.org, and the periodic columns were calculated using an algorithm written in
Python.
Figure 6: Augmented market volume dataset with temperature and periodic data
3. Resampling of Temporal Data
The usage of additional input parameters typically provides better regression results. If we need to
forecast market volumes for 24 hours ahead then it makes sense to take into account the available
data for the last 24 hours (at least). The machine learning algorithms and library functions expect
that both input and output parameters are represented as one record. So, as a data preparation step,
the data displayed in Figure 6 were resampled into the following columns, where the M1 suffix means
the parameter was taken one hour earlier, the P1 suffix means the parameter was taken one hour later, etc.
Primary key: TradeDate, TradeHour
Input columns: SinDay, CosDay, SinWeek, CosWeek, SinMonth, CosMonth, SinYear, CosYear,
Bilateral, DayAhead, Intraday, Balancing, TempUkr, TempKiev, BilateralM1, DayAheadM1,
IntradayM1, BalancingM1, TempUkrM1, TempKievM1, BilateralM2, DayAheadM2, IntradayM2,
BalancingM2, TempUkrM2, TempKievM2, ..., BilateralM23, DayAheadM23, IntradayM23,
BalancingM23, TempUkrM23, TempKievM23
Output columns: BilateralP1, DayAheadP1, IntradayP1, BalancingP1, BilateralP2, DayAheadP2,
IntradayP2, BalancingP2, ..., BilateralP24, DayAheadP24, IntradayP24, BalancingP24
The obtained dataset had 13'129 records as the first 24 records and the last 24 records after
resampling were not fully qualified. The dataset was split into training and testing parts using the
standard library function train_test_split from sklearn.model_selection namespace [5]. The obtained
datasets were saved into files, so different regression algorithms mentioned further in the paper
were evaluated on the same data.
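The resampling step can be sketched with pandas `shift`, as shown below. This is an illustration under stated assumptions rather than the code used in the paper: the function `resample_history`, the synthetic data, and the `test_size` and `random_state` values are hypothetical, and only the column naming (M and P suffixes) and the use of `train_test_split` follow the text.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def resample_history(frame, input_columns, output_columns, history=24, horizon=24):
    """Return (inputs, outputs) with one complete record per hour."""
    inputs, outputs = {}, {}
    for column in input_columns:
        inputs[column] = frame[column]                       # value at the current hour
        for lag in range(1, history):
            inputs[f"{column}M{lag}"] = frame[column].shift(lag)      # 'lag' hours earlier
    for step in range(1, horizon + 1):
        for column in output_columns:
            outputs[f"{column}P{step}"] = frame[column].shift(-step)  # 'step' hours later
    inputs, outputs = pd.DataFrame(inputs), pd.DataFrame(outputs)
    # Rows at both ends lack the full history or the full forecast horizon
    complete = inputs.notna().all(axis=1) & outputs.notna().all(axis=1)
    return inputs[complete], outputs[complete]


# Synthetic hourly data standing in for the real market volumes (assumption)
rng = np.random.default_rng(0)
hourly = pd.DataFrame({
    "Bilateral": rng.normal(10000.0, 500.0, size=200),
    "TempUkr": rng.normal(10.0, 5.0, size=200),
})
inputs, outputs = resample_history(hourly, ["Bilateral", "TempUkr"], ["Bilateral"])

# Hypothetical split parameters; the text only names the library function
train_in, test_in, train_out, test_out = train_test_split(
    inputs, outputs, test_size=0.2, random_state=1)
print(inputs.shape, outputs.shape, len(train_in), len(test_in))
```

Each resulting row carries the current values, the lagged values, and the future values to predict, so any regressor with `fit` and `predict` methods can consume it directly.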
4. Model Evaluation Metrics
To measure the influence of input parameters, we used the nearest neighbors regression model
represented by class KNeighborsRegressor from sklearn.neighbors namespace. This machine
learning algorithm provides quite competitive results and has a small number of hyperparameters
to optimize.
The Python code snippets that implement this functionality are provided in Appendix 1. The
complexity of the algorithm is hidden behind the fit and predict methods. Other regression and
classification algorithms expose the same methods, so substituting one algorithm for
another is relatively simple.
The metrics used to measure the discrepancy between the test set and forecasted data are given
in Table 1. Here $y_i$ is the output value from the i-th record in the testing dataset, $f_i$ is the predicted
value for the i-th record, $\bar{y}$ is the average output value over the test dataset. These formulas are
considered in the context of one selected output column representing the market volume.
Table 1
The name and definition of standard metrics for regression task
Metric Name Metric Formula Formula Number
R2 score (or determination coefficient) $R^2 = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \bar{y})^2}$ (2)
Mean absolute percentage error $\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - f_i}{y_i} \right|$ (3)
Mean absolute error $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - f_i \right|$ (4)
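As a cross-check of formulas (2)-(4), the short sketch below computes the three metrics directly with NumPy and compares them with the corresponding Scikit-learn functions. The function name `regression_metrics` and the sample numbers are made up for illustration.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error, r2_score)


def regression_metrics(y, f):
    """Compute R2, MAPE and MAE for one output column, following formulas (2)-(4)."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    r2 = 1.0 - np.sum((y - f) ** 2) / np.sum((y - y.mean()) ** 2)
    mape = np.mean(np.abs((y - f) / y))
    mae = np.mean(np.abs(y - f))
    return r2, mape, mae


# Illustrative values only, not taken from the dataset
actual = np.array([100.0, 120.0, 90.0, 110.0])
forecast = np.array([98.0, 125.0, 93.0, 108.0])
r2, mape, mae = regression_metrics(actual, forecast)

print(r2, r2_score(actual, forecast))
print(mape, mean_absolute_percentage_error(actual, forecast))
print(mae, mean_absolute_error(actual, forecast))
```

Note that MAPE is returned as a fraction, so a value of 0.01 corresponds to a 1 % error.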
5. Manual Feature Selection
Now we need to evaluate the effect of additional parameters and history length on prediction
accuracy. Table 2 shows the accuracy improvements after adding temperature and periodic
parameters. It appears all additional parameters are useful, but the overall effect is rather minor.
Here are the parameters for the starting model.
Input columns: Bilateral, DayAhead, Intraday, Balancing
Output columns: BilateralP24, DayAheadP24, IntradayP24, BalancingP24
Table 2
The R2 score obtained for different input parameter sets
Input Parameter Set Bilateral DayAhead Intraday Balancing
Starting Model 0.93291021 0.90164708 0.72947093 0.77183561
Temperature Data 0.93439629 0.90410349 0.73455079 0.77676317
Daily Cycle 0.93455756 0.90418401 0.73498679 0.77679022
Weekly Cycle 0.93465155 0.90445549 0.73560531 0.77761618
Monthly Cycle 0.93471811 0.90462339 0.73591691 0.77787893
Yearly Cycle 0.93479404 0.90470696 0.73575183 0.77860211
And the following is the intermediate input parameter set obtained.
Input columns: Bilateral, DayAhead, Intraday, Balancing, TempUkr, TempKiev, SinDay,
CosDay, SinWeek, CosWeek, SinMonth, CosMonth, SinYear, CosYear
Figure 7 shows the improvements in forecasting results when more historical data is added to the
input dataset. The full history for the last 24 hours provides better results. The full set of
input parameters now contains 106 entries (96 market volume columns, 2 temperature columns, and 8 periodic columns), which are listed below.
Input columns: Bilateral, DayAhead, Intraday, Balancing, BilateralM1, DayAheadM1,
IntradayM1, BalancingM1, ..., BilateralM23, DayAheadM23, IntradayM23, BalancingM23, TempUkr,
TempKiev, SinDay, CosDay, SinWeek, CosWeek, SinMonth, CosMonth, SinYear, CosYear
Figure 7: The dependency of the R2 score on the history length in hours
6. Automatic Feature Selection
The high dimensionality of input space is typically considered a problem, especially with noisy
data. On the other hand, not all input parameters explored so far have equal contribution to the
quality of results. So, it would be helpful to try removing the parameters that provide less useful
information than others.
This is straightforward with the class SelectFromModel from sklearn.feature_selection
namespace [6]. This meta-transformer should be provided with an estimator object that, in turn, can
calculate the array of feature importances. One such class is RandomForestRegressor, which
derives feature importances from the impurity reduction (here, the decrease in squared error) that
each feature achieves across the trees. The Python code that implements
this approach is given in Appendix 2. The constructor of the SelectFromModel class also
takes the threshold parameter, which allows varying the number of features selected. The best
results were obtained with 60 features taken out of 106; see Table 3 for the results and Appendix 3
for the feature list itself.
Table 3
The R2 score improvements obtained using input feature selection
Input Parameter Set Bilateral DayAhead Intraday Balancing
Full Set: 106 Features 0.96129509 0.94019898 0.83718184 0.86971889
60 Selected Features 0.96322701 0.94024491 0.85536199 0.87121345
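The effect of the threshold parameter can be illustrated on synthetic data. The sketch below is not the paper's experiment: the data, the helper `selected_mask`, and the forest size are assumptions chosen so that the example runs quickly, while the threshold string "0.12 * mean" matches the value used in Appendix 2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic data: only the first two of ten columns drive the target
rng = np.random.default_rng(2)
features = rng.normal(size=(400, 10))
target = 3.0 * features[:, 0] + 2.0 * features[:, 1] + 0.1 * rng.normal(size=400)

forest = RandomForestRegressor(n_estimators=50, random_state=1)
forest.fit(features, target)


def selected_mask(threshold):
    """Boolean mask of the features whose importance reaches the threshold."""
    selector = SelectFromModel(forest, threshold=threshold, prefit=True)
    return selector.get_support()


strict = selected_mask("mean")          # keep features above the mean importance
loose = selected_mask("0.12 * mean")    # lower threshold, as in Appendix 2

print("importances:", np.round(forest.feature_importances_, 3))
print("selected with 'mean':", int(strict.sum()))
print("selected with '0.12 * mean':", int(loose.sum()))
```

Lowering the threshold can only keep the same features or add more, so scanning a few threshold values and re-evaluating the model is a simple way to look for a good feature count.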
7. Hourly Forecasting Results
So far, all the results were related to 24-hour forecasting. Figures 8 and 9 below show the R2
score and mean absolute error for forecast ranges from 1 to 24 hours. The one-hour
forecasting provides the best results. It is also worth noting that the bilateral and day-ahead markets
have much better predictability than the balancing market. As for the intraday market, it has a low
mean absolute error just because the size of this market is small.
Figure 8: The dependency of the R2 score on the forecast range in hours
Figure 9: The dependency of mean absolute error on the forecast range (MWh)
8. Forecasting Error Distribution
The 24-hour prediction error for all four markets can be measured on the test set, which
represents 20 % of the original dataset. For convenience in representation and analysis, the test set
was sorted by real market volume. The predicted values are shown in Figures 10–17 with dots. The
probability distribution of error is shown using histograms. An interesting finding is that
forecasting error is not always Gaussian. In particular, this is the case for bilateral and intraday
market volumes.
The curve representing balancing market volume in Figure 16 crosses the zero line and has
more negative values than positive ones. This can be interpreted as a tendency of market players to overbuy
electricity in other markets, so on average they need to sell more at the last moment. Note that
this inefficiency can be mitigated by the usage of forecasting models.
Figure 10: Prediction error for 24 hours ahead, bilateral market volume (MWh)
Figure 11: Residuals histogram for 24-hour forecasting, bilateral market volume (MWh)
Figure 12: Prediction error for 24 hours ahead, day-ahead market volume (MWh)
Figure 13: Residuals histogram for 24-hour forecasting, day-ahead market volume (MWh)
Figure 14: Prediction error for 24 hours ahead, intraday market volume (MWh)
Figure 15: Residuals histogram for 24-hour forecasting, intraday market volume (MWh)
Figure 16: Prediction error for 24 hours ahead, balancing market volume (MWh)
Figure 17: Residuals histogram for 24-hour forecasting, balancing market volume (MWh)
9. Comparison of Regression Algorithms
So far, all the results were obtained with the nearest neighbors regressor. It also makes sense to
explore the performance of other algorithms on the same column configuration that is represented
in Appendix 3. The output parameters were selected for 24-hour forecasting. The results shown in
Table 4 and Table 5 include the comparison with classic instruments like multi-layer perceptron [7],
support vector machine [8], and linear regression [9]. The constructors of Python objects
representing regression algorithms with corresponding manually optimized hyperparameters are
provided in Appendix 4.
Table 4
Comparison of R2 scores for regression algorithms on the testing dataset
Regression Algorithm Bilateral DayAhead Intraday Balancing
Histogram Gradient Boosting 0.98734425 0.97273813 0.87836457 0.91963280
Ada Boost Regressor 0.98008607 0.96134363 0.85172910 0.90325404
Gradient Boosting Regressor 0.97878979 0.96317970 0.84666374 0.90112536
Extra Trees Regressor 0.97461940 0.95963273 0.86484512 0.89815645
Nearest Neighbors Regressor 0.96751227 0.94895676 0.86066507 0.87555149
Random Forest Regressor 0.96680397 0.94718425 0.83167183 0.87304825
Support Vector Machine 0.93841639 0.90790177 0.78281964 0.78573216
Multi-Layer Perceptron (QNO) 0.93589612 0.90409299 0.75444413 0.79110787
Multi-Layer Perceptron (SGD) 0.93414003 0.90877942 0.77358025 0.81562885
Elastic Net Regressor 0.92924816 0.90300302 0.75547081 0.77908284
Linear Regression 0.92921485 0.90297901 0.75552627 0.77906737
Bayes Ridge Regressor 0.92502565 0.89258447 0.74195841 0.77884534
Table 5
Comparison of mean absolute percentage errors for regression algorithms
Regression Algorithm Bilateral DayAhead Intraday Balancing
Histogram Gradient Boosting 0.00970813 0.03555078 0.30680050 3.41473968
Ada Boost Regressor 0.01043662 0.03988947 0.29964852 3.70352770
Gradient Boosting Regressor 0.01167197 0.04196347 0.33108930 4.30691288
Extra Trees Regressor 0.01340342 0.04470638 0.39715721 3.68779356
Nearest Neighbors Regressor 0.01484254 0.04741409 0.31222159 4.16012392
Random Forest Regressor 0.01538395 0.05016324 0.44490383 4.21424456
Support Vector Machine 0.02049737 0.06506308 0.44628820 5.01011223
Multi-Layer Perceptron (QNO) 0.02201155 0.06895532 0.48465482 4.25173649
Multi-Layer Perceptron (SGD) 0.02328186 0.06766103 0.49768136 4.58588195
Elastic Net Regressor 0.02164488 0.06785682 0.46013951 5.91742745
Linear Regression 0.02167965 0.06799572 0.46071594 5.92935640
Bayes Ridge Regressor 0.02222508 0.06981429 0.50172640 5.90361401
It is worth noting that some algorithms do not natively support multi-output configuration, so
the class MultiOutputRegressor was used to overcome this problem and cover four
electrical energy markets with one machine learning model.
Tables 4–6 present the following characteristics obtained for different machine learning
models: R2 score, mean absolute percentage error, and mean absolute error. For this
specific task, the ensemble methods perform much better than the others, and the winning algorithm,
Histogram Gradient Boosting, is one of them. It is also one of the fastest, and it natively
handles datasets with missing values. On the current dataset, the training phase takes about 20
seconds.
Two different training approaches were used for the multi-layer perceptron: a quasi-Newton
optimizer (QNO) and stochastic gradient descent (SGD). The first is the L-BFGS solver, an iterative
method that approximates second-order curvature information and processes the whole training set at
each step, while the second is the Adam solver, a stochastic gradient-based method that updates the
weights on mini-batches (see Appendix 4). In both cases the architecture was the same: the perceptron had four
layers (that is, two hidden layers). Empirically, this architecture was more successful than three- or
five-layer perceptrons. Meanwhile, all these configurations are universal approximators.
The high mean absolute percentage error for the balancing column in
Table 5 deserves an explanation. It is not a calculation error; moreover, a couple of zero values in the input dataset were replaced to
improve the MAPE figures. As shown in Figure 16, most of the values for this column are located close
to zero, so the calculations according to formula (3) involve division by a small value. In other
words, the MAPE metric is not adequate for the balancing column.
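The effect can be reproduced with a few made-up numbers. In the sketch below the same absolute error of 50 MWh is applied to a series of large volumes and to a series of values near zero; all values are illustrative and are not taken from the dataset.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction, following formula (3)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)))


def mae(actual, forecast):
    """Mean absolute error, following formula (4)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast)))


error = np.array([50.0, -50.0, 50.0])          # identical absolute errors, MWh
large = np.array([10000.0, 12000.0, 11000.0])  # volumes far from zero
small = np.array([10.0, -20.0, 25.0])          # volumes close to zero

mape_large, mae_large = mape(large, large + error), mae(large, large + error)
mape_small, mae_small = mape(small, small + error), mae(small, small + error)
print(mape_large, mae_large)
print(mape_small, mae_small)
```

Both series have the same MAE, yet the MAPE of the near-zero series is several hundred times larger, which is why MAE and the R2 score are more informative for the balancing column.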
Table 6
Comparison of mean absolute errors for regression algorithms (MWh)
Regression Algorithm Bilateral DayAhead Intraday Balancing
Histogram Gradient Boosting 114.528749 136.017198 107.623324 287.495480
Ada Boost Regressor 122.856359 151.671858 107.042688 308.036252
Gradient Boosting Regressor 137.181056 161.254944 119.274015 324.798985
Extra Trees Regressor 156.724684 165.609354 118.335948 320.904618
Nearest Neighbors Regressor 175.124238 183.691670 112.622368 344.727996
Random Forest Regressor 180.816224 187.724606 131.586872 360.659798
Support Vector Machine 239.541471 247.547199 150.333357 475.954169
Multi-Layer Perceptron (QNO) 257.264807 260.785606 159.517580 481.078516
Multi-Layer Perceptron (SGD) 270.583353 256.261356 157.399092 445.710338
Elastic Net Regressor 253.327126 259.857349 155.518920 487.916805
Linear Regression 253.691852 260.242048 155.743585 488.161734
Bayes Ridge Regressor 260.956241 269.083319 160.748064 487.498900
10. Conclusion
This paper demonstrates that proper forecasting model selection is a multi-stage process that
may involve data selection, data preprocessing, data augmentation, selection of machine learning
algorithm, optimization of hyperparameters, etc. While all computations for this work were done on
a regular 8-core machine, the creation of the MLOps pipeline may require much more powerful
computation resources.
The pre-trained model can be saved into a file for subsequent reuse in the production
environment. There are two formats popular among Python developers: .joblib and .pickle. In
addition, there is .onnx format that can be loaded not just in Python, but also in faster .NET or Java-
based applications [10]. It is worth noting that Microsoft and other vendors invest significant
resources into the development of multi-platform capabilities for machine learning [11].
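Saving and restoring a fitted model with joblib can be sketched as follows. The file name, the synthetic data, and the choice of KNeighborsRegressor are assumptions made for the example; export to the .onnx format requires additional converter packages and is not shown here.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic training data standing in for the prepared dataset (assumption)
rng = np.random.default_rng(3)
inputs = rng.normal(size=(100, 6))
outputs = rng.normal(size=(100, 4))

model = KNeighborsRegressor(n_neighbors=3, weights="distance")
model.fit(inputs, outputs)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "volume_model.joblib")   # hypothetical file name
    joblib.dump(model, path)          # serialize the fitted model
    restored = joblib.load(path)      # load it back, e.g. in a production service

same = np.allclose(model.predict(inputs[:5]), restored.predict(inputs[:5]))
print("Restored model reproduces predictions:", same)
```

A file written this way should only be loaded from a trusted source and with a compatible Scikit-learn version, since joblib and pickle files execute code on load and are not guaranteed to be portable across library releases.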
There are dedicated solutions, such as Seldon Core [12], that can host machine learning models using a microservice
approach. In this architecture, the serialized models are preloaded
within Docker containers and expose an HTTPS interface. The application can send the input data
vector as a JSON document in a REST API request, and the HTTP response will contain a JSON
document with the predicted values.
The forecasting accuracy differs across the four electrical energy markets.
Nevertheless, the 1 % error for 24-hour forecasting of the bilateral market looks impressive. Such a
forecast can be useful at the country scale to ensure required fuel supply, plan import/export
operations, reduce electricity costs, etc. Similar research can be done for more specific datasets from
commercial and state energy enterprises.
References
[1] A new model of the electricity market has been launched in Ukraine. URL: https://expro.com.ua/en/tidings/a-new-model-of-the-electricity-market-has-been-launched-in-ukraine.
[2] M. Osińska, M. Kyzym, V. Khaustova, O. Ilyash, T. Salashenko, Does the Ukrainian electricity market correspond to the European model?, Utilities Policy 79 (2022), 1–14. doi: 10.1016/j.jup.2022.101436.
[3] A. Doroshenko, D. Zhora, O. Savchuk, O. Yatsenko, Application of machine learning techniques for forecasting electricity generation and consumption in Ukraine, in: Proceedings of IT&I 2023, 2023, pp. 136–146. URL: https://ceur-ws.org/Vol-3624/Paper_12.pdf.
[4] Van Wyk, Encoding Cyclical Features for Deep Learning. URL: https://www.kaggle.com/code/avanwyk/encoding-cyclical-features-for-deep-learning.
[5] Scikit-learn: Machine Learning in Python. URL: https://scikit-learn.org/stable/
[6] Feature selection with Scikit-learn library. URL: https://scikit-learn.org/stable/modules/feature_selection.html.
[7] S. Haykin, Neural networks: a comprehensive foundation, Prentice Hall, Upper Saddle River, NJ, 1998.
[8] V. N. Vapnik, Statistical learning theory, Wiley, Hoboken, NJ, 1998.
[9] C. M. Bishop, Pattern recognition and machine learning, Springer, New York, NY, 2006. URL: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf.
[10] G. Novack, Deploy Sci-kit Learn models in .NET Core Applications. URL: https://towardsdatascience.com/deploy-sci-kit-learn-models-in-net-core-applications-90e24e572f64.
[11] X. Dupre, O. Grisel, Accelerate and simplify Scikit-learn model inference with ONNX Runtime. URL: https://cloudblogs.microsoft.com/opensource/2020/12/17/accelerate-simplify-scikit-learn-model-inference-onnx-runtime/
[12] V. Shanawad, Optimizing Custom Model Deployment with Seldon Core. URL: https://medium.com/@vinayakshanawad/serving-hugging-face-transformers-optimizing-custom-model-deployment-with-seldon-core-a593f6ea7549.
Appendix 1
The function that evaluates the input parameter set using the nearest neighbors regressor:
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neighbors import KNeighborsRegressor

def evaluate_input_features(training_inputs, testing_inputs,
                            training_outputs, testing_outputs):
    # Fit on the training part, then predict for both parts
    print("Evaluating the datasets using nearest neighbors regression model")
    evaluation_regressor = KNeighborsRegressor(
        n_neighbors=5, weights='distance', algorithm='auto',
        p=1, metric='minkowski', n_jobs=8)
    evaluation_regressor.fit(training_inputs, training_outputs)
    predicted_training_outputs = evaluation_regressor.predict(training_inputs)
    predicted_testing_outputs = evaluation_regressor.predict(testing_inputs)
    print_evaluation_metrics(training_outputs, testing_outputs,
                             predicted_training_outputs, predicted_testing_outputs)

def print_evaluation_metrics(training_outputs, testing_outputs,
                             predicted_training_outputs, predicted_testing_outputs):
    # Every metric is reported per output column (multioutput='raw_values')
    training_score = r2_score(training_outputs, predicted_training_outputs,
                              multioutput='raw_values', force_finite=True)
    print(f"Training dataset R2 score(s): {training_score}")
    training_percentage_error = mean_absolute_percentage_error(
        training_outputs, predicted_training_outputs, multioutput='raw_values')
    print(f"Training percentage error(s): {training_percentage_error}")
    # Root of the mean squared error; np.sqrt replaces the squared=False
    # argument, which recent scikit-learn releases have deprecated
    train_standard_deviation = np.sqrt(mean_squared_error(
        training_outputs, predicted_training_outputs, multioutput='raw_values'))
    print("Training standard deviation(s):", train_standard_deviation)
    train_absolute_error = mean_absolute_error(
        training_outputs, predicted_training_outputs, multioutput='raw_values')
    print("Training mean absolute error(s):", train_absolute_error)
    # The same four metrics for the testing dataset
    test_score = r2_score(testing_outputs, predicted_testing_outputs,
                          multioutput='raw_values', force_finite=True)
    print(f"Testing dataset R2 score(s): {test_score}")
    test_percentage_error = mean_absolute_percentage_error(
        testing_outputs, predicted_testing_outputs, multioutput='raw_values')
    print(f"Testing percentage error(s): {test_percentage_error}")
    test_standard_deviation = np.sqrt(mean_squared_error(
        testing_outputs, predicted_testing_outputs, multioutput='raw_values'))
    print("Testing standard deviation(s):", test_standard_deviation)
    test_absolute_error = mean_absolute_error(
        testing_outputs, predicted_testing_outputs, multioutput='raw_values')
    print("Testing mean absolute error(s):", test_absolute_error)
Appendix 2
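Identifying the most informative features by random forest importance:
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# dataset_inputs, dataset_outputs and input_names are prepared beforehand
random_forest = RandomForestRegressor(
    n_estimators=100, criterion='squared_error', ccp_alpha=0.0)
random_forest.fit(dataset_inputs, dataset_outputs)
random_forest.feature_importances_
# Keep the features whose importance is at least 0.12 of the mean importance
optimized_model = SelectFromModel(
    random_forest, threshold="0.12 * mean", prefit=True)
optimized_inputs = optimized_model.transform(dataset_inputs)
optimized_model.get_feature_names_out(input_names)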
Appendix 3
Most informative input parameters selected with a random forest model:
selected_names = \
['Bilateral', 'DayAhead', 'Intraday', 'Balancing', 'BilateralM1',
'DayAheadM1', 'IntradayM1', 'BalancingM1', 'DayAheadM2',
'IntradayM2', 'BalancingM2', 'IntradayM6', 'BilateralM7',
'DayAheadM7', 'IntradayM7', 'BalancingM7', 'BilateralM8',
'DayAheadM8', 'IntradayM8', 'BilateralM9', 'DayAheadM9',
'IntradayM9', 'BilateralM10', 'DayAheadM10', 'IntradayM10',
'BilateralM13', 'BilateralM14', 'IntradayM14', 'BalancingM14',
'BilateralM15', 'DayAheadM15', 'IntradayM15', 'BalancingM15',
'BilateralM16', 'DayAheadM16', 'IntradayM16', 'BilateralM17',
'DayAheadM17', 'IntradayM17', 'BilateralM18', 'DayAheadM18',
'IntradayM18', 'IntradayM19', 'BilateralM21', 'BilateralM22',
'DayAheadM22', 'IntradayM22', 'BilateralM23', 'DayAheadM23',
'IntradayM23', 'BalancingM23', 'TempUkr', 'TempKiev', 'CosDay',
'SinWeek', 'CosWeek', 'SinMonth', 'CosMonth', 'SinYear', 'CosYear']
Appendix 4
Regressor constructors with corresponding hyperparameters:
from sklearn.ensemble import AdaBoostRegressor, ExtraTreesRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.linear_model import BayesianRidge, ElasticNet, LinearRegression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import NuSVR
from sklearn.tree import DecisionTreeRegressor

# Histogram Gradient Boosting
MultiOutputRegressor(HistGradientBoostingRegressor(
    loss='squared_error', learning_rate=0.20, max_iter=300,
    early_stopping=False, scoring='loss', random_state=1))
# Ada Boost Regressor
MultiOutputRegressor(AdaBoostRegressor(
    estimator=DecisionTreeRegressor(criterion='squared_error',
        splitter='best', max_depth=None, min_samples_split=2,
        max_features=None, random_state=1), n_estimators=10,
    learning_rate=1.0, loss='square', random_state=1))
# Gradient Boosting Regressor
MultiOutputRegressor(GradientBoostingRegressor(
    loss='squared_error', learning_rate=0.39, n_estimators=100,
    subsample=1.0, criterion='squared_error', max_depth=6,
    random_state=1, max_leaf_nodes=None, ccp_alpha=0.0))
# Extra Trees Regressor
ExtraTreesRegressor(n_estimators=100,
    criterion='squared_error', max_depth=None, max_features=1.0,
    bootstrap=False, n_jobs=8, random_state=1, ccp_alpha=0.0)
# Nearest Neighbors Regressor
KNeighborsRegressor(n_neighbors=3, weights='distance',
    algorithm='auto', p=1, metric='minkowski', n_jobs=8)
# Random Forest Regressor
RandomForestRegressor(n_estimators=100,
    criterion='squared_error', max_features=1.0, bootstrap=True,
    ccp_alpha=0.0, n_jobs=8, random_state=1)
# Support Vector Machine
MultiOutputRegressor(NuSVR(nu=0.4, C=1000000.0,
    kernel='rbf', gamma='scale', shrinking=True, max_iter=-1))
# Multi-Layer Perceptron (QNO)
MLPRegressor(hidden_layer_sizes=(200, 200,),
    activation='relu', solver='lbfgs', alpha=0.0000, max_iter=5000,
    random_state=1)
# Multi-Layer Perceptron (SGD); training_set_size is defined elsewhere
MLPRegressor(hidden_layer_sizes=(200, 200,),
    activation='relu', solver='adam', alpha=0.0002, max_iter=200,
    batch_size=min(50, training_set_size), shuffle=True, random_state=1,
    early_stopping=False)
# Elastic Net Regressor
ElasticNet(alpha=1.0, l1_ratio=1.0, fit_intercept=True,
    max_iter=1000, positive=False, random_state=1, selection='cyclic')
# Linear Regression
LinearRegression(fit_intercept=True, n_jobs=8)
# Bayes Ridge Regressor
MultiOutputRegressor(BayesianRidge(max_iter=300,
    tol=0.001, alpha_init=None, lambda_init=1.0, fit_intercept=True))