Predicting 24-Hour Nationwide Electrical Energy Consumption Based on Regression Techniques Anatoliy Doroshenko1,2,†, Dmytro Zhora1,†, Vladyslav Haidukevych1,†, Yaroslav Haidukevych1,†, and Olena Yatsenko1,*,† 1 Institute of Software Systems of the National Academy of Sciences of Ukraine, Glushkov ave. 40, build. 5, Kyiv, 03187, Ukraine 2 National Technical University "Ihor Sikorsky Kyiv Polytechnic Institute", Polytechnichna str. 41, build. 18, Kyiv, 03056, Ukraine Abstract This paper applies standard regression techniques to forecast the country-wide consumption of electrical energy. All considered machine learning algorithms are available as a part of the Scikit-learn library. Besides the fine-tuning of regression hyperparameters, several data preparation techniques are employed to improve the forecasting accuracy. It is demonstrated that forecasting for 24 hours ahead is possible with good accuracy and has practical significance. Keywords 1 Electricity markets, forecasting, machine learning, regression 1. Introduction For a long time, Ukraine had only one market for electrical energy. That was the market of bilateral agreements that wasn’t flexible enough to balance the interests of consumers and suppliers of electricity. Such agreements could span weeks, months, or even years. On July 1st, 2019, Ukraine adopted the European model [1] that assumes the following four markets: bilateral, day-ahead, intraday, and balancing. Despite the electricity market models in Europe having some differences [2], this was also a significant step forward in liberalizing electricity trading between countries. The bilateral market can be referenced also as a future or forward market. In Ukraine, as shown in Figure 1, the total amount of deals is recorded every hour. At the same time, some European markets allow 15-minute contracts. If we consider four electricity markets in the order they are mentioned above (from bilateral to balancing), the properties of these markets can be formulated as follows: • the volume of the market decreases, • the price of the electricity increases, • the volatility of the volume increases. The laws of physics apply to electrical circuits regardless of the scale. There are some electricity losses associated with resistance, but usually, they are negligible. If the amount of electrical energy 14th International Scientific and Practical Conference from Programming UkrPROG’2024, May 14-15, 2024, Kyiv, Ukraine * Corresponding author. † These authors contributed equally. doroshenkoanatoliy2@gmail.com (A. Doroshenko); dmitry.zhora@gmx.com (D. Zhora); gaidukevichvlad@gmail.com (V. Haidukevych); yarmcfly@gmail.com (Y. Haidukevych); oayat@ukr.net (O. Yatsenko) 0000-0002-8435-1451 (A. Doroshenko); 0009-0006-6073-7751 (D. Zhora); 0000-0002-0614-6778 (V. Haidukevych); 0000- 0002-6300-1778 (Y. Haidukevych); 0000-0002-4700-6704 (O. Yatsenko) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings traded and transmitted is measured on substations, we can conclude that the amount of produced electricity is exactly equal to the amount of consumed electricity. That is, for the purpose of this paper we can use the following terms interchangeably: energy production, energy consumption, and market volume. When the country is considered an open system, the following equation applies. production + import = consumption + export. (1) The dataset used in this research represents the time range from July 1st, 2020, to December 31st, 2021. For historical reasons, the time range from July 1st, 2019, to June 30th, 2020, did not contain bilateral market data [3]. The market volume data were provided by the Institute of Energy Modelling, Ukraine. Figure 2 shows the dynamics of all four market components in time. Figure 1: Hourly data of electricity market volumes, in megawatt-hours (MWh) Figure 2: Market volume dependency on time, in megawatt-hours (MWh) 2. Volume Data Augmentation It is often the case the modeled process is affected by other external factors not represented via input parameters from the original dataset. The outside temperature influences the consumption of electricity as more energy is needed in winter for heating and in summer for air-conditioning. Two columns with hourly data were added to the dataset representing the temperature for Ukraine and its capital, see the dependencies below in Figure 3. The location representing the country was selected as its linear geographic center with decimal GPS coordinates 48.379433N 31.165580E. Figure 3: Dependency of outside temperature in Ukraine, hourly representation Another important factor is the periodicity in the consumption of electrical energy. For example, at night people need less electricity than in the daytime. Similarly, on weekends the electricity consumption is lower than on weekdays. This paper considers four cycle types: daily, weekly, monthly, and yearly. One of the next sections will analyze whether these additions are helpful. The problem is how to feed time representation to the machine learning algorithm in a way that similar moments in time would be interpreted as close by the algorithm. As shown in Figure 4, hour values 23 and 0 are close on the timescale, but they are distant in real-valued representation. One of the possible solutions to this problem is to calculate the sine and cosine of the cycle phase [4]. Figure 5 demonstrates how every hour in the daily cycle can be represented without gaps. In particular, close values on the timescale are represented by close values of sine and cosine functions. Figure 4: Raw hour data as can be submitted to the machine learning algorithm Figure 5: Sine and cosine time series for representation of temporal cycles The augmented dataset is shown in Figure 6. The first two columns can be interpreted as composite primary key. In addition to the original 4 attribute columns with market volume data now we have 10 more columns. The temperature data were downloaded from the site https://openweathermap.org, the periodic columns were calculated using an algorithm written in Python. Figure 6: Augmented market volume dataset with temperature and periodic data 3. Resampling of Temporal Data The usage of additional input parameters typically provides better regression results. If we need to forecast market volumes for 24 hours ahead then it makes sense to take into account the available data for the last 24 hours (at least). The machine learning algorithms and library functions expect that both input and output parameters are represented as one record. So, as a data preparation step, the data displayed in Figure 6 were resampled into the following columns, where M1 suffix means the parameter was taken one hour ago, P1 suffix means the parameter was taken one hour later, etc. Primary key: TradeDate, TradeHour Input columns: SinDay, CosDay, SinWeek, CosWeek, SinMonth, CosMonth, SinYear, CosYear, Bilateral, DayAhead, Intraday, Balancing, TempUkr, TempKiev, BilateralM1, DayAheadM1, IntradayM1, BalancingM1, TempUkrM1, TempKievM1, BilateralM2, DayAheadM2, IntradayM2, BalancingM2, TempUkrM2, TempKievM2, ..., BilateralM23, DayAheadM23, IntradayM23, BalancingM23, TempUkrM23, TempKievM23 Output columns: BilateralP1, DayAheadP1, IntradayP1, BalancingP1, BilateralP2, DayAheadP2, IntradayP2, BalancingP2, ..., BilateralP24, DayAheadP24, IntradayP24, BalancingP24 The obtained dataset had 13'129 records as the first 24 records and the last 24 records after resampling were not fully qualified. The dataset was split into training and testing parts using the standard library function train_test_split from sklearn.model_selection namespace [5]. The obtained datasets were saved into files, so different regression algorithms mentioned further in the paper were evaluated on the same data. 4. Model Evaluation Metrics To measure the influence of input parameters, we used the nearest neighbors regression model represented by class KNeighborsRegressor from sklearn.neighbors namespace. This machine learning algorithm provides quite competitive results and has a small number of hyperparameters to optimize. The Python code snippets that implement this functionality are provided in Appendix 1. The complexity of the algorithm is hidden behind fit and prediction methods. Other regression and classification algorithms also reuse these methods, so the substitution of one algorithm instead of another is relatively simple. The metrics used to measure the discrepancy between the test set and forecasted data are given in Table 1. Here yi is the output value from the i-th record in the testing dataset, fi is the predicted value for the i-th record, y is the average output value over the test dataset. These formulas are considered in the context of one selected output column representing the market volume. Table 1 The name and definition of standard metrics for regression task Metric Name Metric Formula Formula Number 2 R2 score (or determination R2 = 1 − ∑i ( yi − fi ) (2) coefficient) 2 ∑i ( yi − y ) n 1 yi − fi Mean absolute percentage error MAPE = ∑ (3) n i =1 yi n 1 Mean absolute error MAE = ∑ yi − fi (4) n i =1 5. Manual Feature Selection Now we need to evaluate the effect of additional parameters and history length on prediction accuracy. Table 2 shows the accuracy improvements after adding temperature and periodic parameters. It appears all additional parameters are useful, but the overall effect is rather minor. Here are the parameters for the starting model. Input columns: Bilateral, DayAhead, Intraday, Balancing Output columns: BilateralP24, DayAheadP24, IntradayP24, BalancingP24 Table 2 The R2 score obtained for different input parameter sets Bilateral DayAhead Intraday Balancing Starting Model 0.93291021 0.90164708 0.72947093 0.77183561 Temperature Data 0.93439629 0.90410349 0.73455079 0.77676317 Daily Cycle 0.93455756 0.90418401 0.73498679 0.77679022 Weekly Cycle 0.93465155 0.90445549 0.73560531 0.77761618 Monthly Cycle 0.93471811 0.90462339 0.73591691 0.77787893 Yearly Cycle 0.93479404 0.90470696 0.73575183 0.77860211 And the following is the intermediate input parameter set obtained. Input columns: Bilateral, DayAhead, Intraday, Balancing, TempUkr, TempKiev, SinDay, CosDay, SinWeek, CosWeek, SinMonth, CosMonth, SinYear, CosYear Figure 7 shows the improvements in forecasting results when more historical data is added to the input dataset. The full history for the last 24 hours provides better results. And now the full set of input parameters contains 106 entries that are listed below. Input columns: Bilateral, DayAhead, Intraday, Balancing, BilateralM1, DayAheadM1, IntradayM1, BalancingM1, ..., BilateralM23, DayAheadM23, IntradayM23, BalancingM23, TempUkr, TempKiev, SinDay, CosDay, SinWeek, CosWeek, SinMonth, CosMonth, SinYear, CosYear Figure 7: The dependency of the R2 score on the history length in hours 6. Automatic Feature Selection The high dimensionality of input space is typically considered a problem, especially with noisy data. On the other hand, not all input parameters explored so far have equal contribution to the quality of results. So, it would be helpful to try removing the parameters that provide less useful information than others. It appears this is not complex with the class SelectFromModel from sklearn.feature_selection namespace [6]. This meta-transformer should be provided with an estimator object that, in turn, can calculate the array of feature importances. One of such classes is RandomForestRegressor which gets feature importances as a function of informational entropy. The Python code that implements this approach is demonstrated in Appendix 2. The constructor for the SelectFromModel class also takes the threshold parameter that allows to vary the number of features selected. The optimal results were obtained with 60 features taken out of 106, see the results in Table 3 and Appendix 3 for the feature list itself. Table 3 The R2 score improvements obtained using input feature selection Bilateral DayAhead Intraday Balancing Full Set: 106 Features 0.96129509 0.94019898 0.83718184 0.86971889 60 Selected Features 0.96322701 0.94024491 0.85536199 0.87121345 7. Hourly Forecasting Results So far, all the results were related to 24-hour forecasting. Figures 8 and 9 below show the R2 score and mean absolute percentage error for the range from 1 and up to 24 hours. The one-hour forecasting provides the best results. It is also worth noting that bilateral and day-ahead markets have much better predictability than balancing markets. As for the intraday market, it has a low mean absolute error just because the size of this market is small. Figure 8: The dependency of the R2 score from the forecast range in hours Figure 9: The dependency of mean absolute error from the forecast range (MWh) 8. Forecasting Error Distribution The 24-hour prediction error for all four markets can be measured on the test set, which represents 20 % of the original dataset. For convenience in representation and analysis, the test set was sorted by real market volume. The predicted values are shown in Figures 10–17 with dots. The probability distribution of error is shown using histograms. An interesting finding is that forecasting error is not always Gaussian. In particular, this is the case for bilateral and intraday market volumes. The curve representing balancing market volume in Figure 16 crosses the zero line, also it has more negative values than positive. This can be interpreted as that market players tend to overbuy electricity in other markets, so they need to sell more on average at the last moment. Let’s note that this inefficiency can be mitigated by the usage of forecasting models. Figure 10: Prediction error for 24 hours ahead, bilateral market volume (MWh) Figure 11: Residuals histogram for 24-hour forecasting, bilateral market volume (MWh) Figure 12: Prediction error for 24-hours ahead, day-ahead market volume (MWh) Figure 13: Residuals histogram for 24-hour forecasting, day-ahead market volume (MWh) Figure 14: Prediction error for 24 hours ahead, intraday market volume (MWh) Figure 15: Residuals histogram for 24-hour forecasting, intraday market volume (MWh) Figure 16: Prediction error for 24 hours ahead, balancing market volume (MWh) Figure 17: Residuals histogram for 24-hour forecasting, balancing market volume (MWh) 9. Comparison of Regression Algorithms So far, all the results were obtained with the nearest neighbor regressor. And it makes sense to explore the performance of other algorithms on the same column configuration that is represented in Appendix 3. The output parameters were selected for 24-hour forecasting. The results shown in Table 4 and Table 5 include the comparison with classic instruments like multi-layer perceptron [7], support vector machine [8], and linear regression [9]. The constructors of Python objects representing regression algorithms with corresponding manually optimized hyperparameters are provided in Appendix 4. Table 4 Comparison of R2 scores for regression algorithms on the testing dataset Regression Algorithm Bilateral DayAhead Intraday Balancing Histogram Gradient Boosting 0.98734425 0.97273813 0.87836457 0.91963280 Ada Boost Regressor 0.98008607 0.96134363 0.85172910 0.90325404 Gradient Boosting Regressor 0.97878979 0.96317970 0.84666374 0.90112536 Extra Trees Regressor 0.97461940 0.95963273 0.86484512 0.89815645 Nearest Neighbors Regressor 0.96751227 0.94895676 0.86066507 0.87555149 Random Forest Regressor 0.96680397 0.94718425 0.83167183 0.87304825 Support Vector Machine 0.93841639 0.90790177 0.78281964 0.78573216 Multi-Layer Perceptron (QNO) 0.93589612 0.90409299 0.75444413 0.79110787 Multi-Layer Perceptron (SGD) 0.93414003 0.90877942 0.77358025 0.81562885 Elastic Net Regressor 0.92924816 0.90300302 0.75547081 0.77908284 Linear Regression 0.92921485 0.90297901 0.75552627 0.77906737 Bayes Ridge Regressor 0.92502565 0.89258447 0.74195841 0.77884534 Table 5 Comparison of mean absolute percentage errors for regression algorithms Regression Algorithm Bilateral DayAhead Intraday Balancing Histogram Gradient Boosting 0.00970813 0.03555078 0.30680050 3.41473968 Ada Boost Regressor 0.01043662 0.03988947 0.29964852 3.70352770 Gradient Boosting Regressor 0.01167197 0.04196347 0.33108930 4.30691288 Extra Trees Regressor 0.01340342 0.04470638 0.39715721 3.68779356 Nearest Neighbors Regressor 0.01484254 0.04741409 0.31222159 4.16012392 Random Forest Regressor 0.01538395 0.05016324 0.44490383 4.21424456 Support Vector Machine 0.02049737 0.06506308 0.44628820 5.01011223 Multi-Layer Perceptron (QNO) 0.02201155 0.06895532 0.48465482 4.25173649 Multi-Layer Perceptron (SGD) 0.02328186 0.06766103 0.49768136 4.58588195 Elastic Net Regressor 0.02164488 0.06785682 0.46013951 5.91742745 Linear Regression 0.02167965 0.06799572 0.46071594 5.92935640 Bayes Ridge Regressor 0.02222508 0.06981429 0.50172640 5.90361401 It is worth noting that some algorithms do not natively support multi-output configuration, so it was needed to use the class MultiOutputRegressor to overcome this problem and cover four electrical energy markets with one machine learning model. Tables 4–6 represent the following characteristics obtained for different machine learning models: R2 score, mean absolute percentage error, and mean absolute error. It appears, that for this specific task, the ensemble methods are much better than others, and the winning algorithm Histogram Gradient Boosting is one of them. Also, it is one of the fastest and it can flawlessly handle datasets with missing values. On the current dataset, the training phase takes about 20 seconds. Two different training approaches were used for multi-layer perceptron: quasi-Newton optimizer (QNO) and stochastic gradient descent (SGD). The first algorithm uses analytic solution to weight optimization problem, while the second algorithm employs an iterative process to find the minimum of error function. In both cases the architecture was the same, the perceptron had four layers (that is two hidden layers). Empirically, this architecture was more successful than three or five-layer perceptrons. Meanwhile, all these configurations are universal approximators. It makes sense to explain the high mean absolute percentage error for the balancing column in Table 5. This is not the error, also a couple of zero values in the input dataset were replaced to improve MAPE figures. As shown in Figure 16, most of the values for this column are located close to zero. So, the calculations according to formula (3) involve the division by a small value. In other words, the MAPE metric is just not that adequate for the balancing column. Table 6 Comparison of mean absolute errors for regression algorithms (MWh) Regression Algorithm Bilateral DayAhead Intraday Balancing Histogram Gradient Boosting 114.528749 136.017198 107.623324 287.495480 Ada Boost Regressor 122.856359 151.671858 107.042688 308.036252 Gradient Boosting Regressor 137.181056 161.254944 119.274015 324.798985 Extra Trees Regressor 156.724684 165.609354 118.335948 320.904618 Nearest Neighbors Regressor 175.124238 183.691670 112.622368 344.727996 Random Forest Regressor 180.816224 187.724606 131.586872 360.659798 Support Vector Machine 239.541471 247.547199 150.333357 475.954169 Multi-Layer Perceptron (QNO) 257.264807 260.785606 159.517580 481.078516 Multi-Layer Perceptron (SGD) 270.583353 256.261356 157.399092 445.710338 Elastic Net Regressor 253.327126 259.857349 155.518920 487.916805 Linear Regression 253.691852 260.242048 155.743585 488.161734 Bayes Ridge Regressor 260.956241 269.083319 160.748064 487.498900 10.Conclusion This paper demonstrates that proper forecasting model selection is a multi-stage process that may involve data selection, data preprocessing, data augmentation, selection of machine learning algorithm, optimization of hyperparameters, etc. While all computations for this work were done on a regular 8-core machine, the creation of the MLOps pipeline may require much more powerful computation resources. The pre-trained model can be saved into a file for subsequent reuse in the production environment. There are two formats popular among Python developers: .joblib and .pickle. In addition, there is .onnx format that can be loaded not just in Python, but also in faster .NET or Java- based applications [10]. It is worth noting that Microsoft and other vendors invest significant resources into the development of multi-platform capabilities for machine learning [11]. There are dedicated solutions that can host the machine learning models using a microservice approach like Seldon Core [12]. According to this architecture, the serialized models are preloaded within docker containers and expose the HTTPS interface. So, the application can send input data vector as JSON document in REST API request. The HTTP response will contain the JSON document with predicted values. The forecasting accuracy that was obtained for electrical energy markets is different. Nevertheless, the 1 % error for 24-hour forecasting of the bilateral market looks impressive. Such a forecast can be useful at the country scale to ensure required fuel supply, plan import/export operations, reduce electricity costs, etc. Similar research can be done for more specific datasets from commercial and state energy enterprises. References [1] A new model of the electricity market has been launched in Ukraine. URL: https://expro.com.ua/en/tidings/a-new-model-of-the-electricity-market-has-been-launched-in- ukraine. [2] M. Osińska, M. Kyzym, V. Khaustova, O. Ilyash, T. Salashenko, Does the Ukrainian electricity market correspond to the European model?, Utilities Policy 79 (2022), 1–14. doi: 10.1016/j.jup.2022.101436. [3] A. Doroshenko, D. Zhora, O. Savchuk, O. Yatsenko, Application of machine learning techniques for forecasting electricity generation and consumption in Ukraine, in: Proceedings of IT&I 2023, 2023, pp. 136–146. URL: https://ceur-ws.org/Vol-3624/Paper_12.pdf. [4] Van Wyk, Encoding Cyclical Features for Deep Learning. URL: https://www.kaggle.com/code/avanwyk/encoding-cyclical-features-for-deep-learning. [5] Scikit-learn: Machine Learning in Python. URL: https://scikit-learn.org/stable/ [6] Feature selection with Scikit-learn library. URL: https://scikit- learn.org/stable/modules/feature_selection.html. [7] S. Haykin, Neural networks: a comprehensive foundation, Prentice Hall, Upper Saddle River, NJ, 1998. [8] V. N. Vapnik, Statistical learning theory, Wiley, Hoboken, NJ, 1998. [9] C. M. Bishop, Pattern recognition and machine learning, Springer, New York, NY, 2006. URL: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition- and-Machine-Learning-2006.pdf. [10] G. Novack, Deploy Sci-kit Learn models in .NET Core Applications. URL: https://towardsdatascience.com/deploy-sci-kit-learn-models-in-net-core-applications- 90e24e572f64. [11] X. Dupre, O. Grisel, Accelerate and simplify Scikit-learn model inference with ONNX Runtime. URL: https://cloudblogs.microsoft.com/opensource/2020/12/17/accelerate-simplify-scikit-learn- model-inference-onnx-runtime/ [12] V. Shanawad, Optimizing Custom Model Deployment with Seldon Core. URL: https://medium.com/@vinayakshanawad/serving-hugging-face-transformers-optimizing- custom-model-deployment-with-seldon-core-a593f6ea7549. Appendix 1 The function that evaluates the input parameter set using nearest neighbors regressor: def evaluate_input_features \ (training_inputs, testing_inputs, training_outputs, testing_outputs): print("Evaluating the datasets using nearest neighbors regression model") evaluation_regressor = KNeighborsRegressor(n_neighbors = 5, weights = 'distance', algorithm = 'auto', p = 1, metric = 'minkowski', n_jobs = 8) evaluation_regressor.fit(training_inputs, training_outputs) predicted_training_outputs = evaluation_regressor.predict(training_inputs) predicted_testing_outputs = evaluation_regressor.predict(testing_inputs) print_evaluation_metrics(training_outputs, testing_outputs, \ predicted_training_outputs, predicted_testing_outputs) def print_evaluation_metrics(training_outputs, testing_outputs, \ predicted_training_outputs, predicted_testing_outputs): training_score = r2_score(training_outputs, predicted_training_outputs, multioutput = 'raw_values', force_finite = True) print(F"Training dataset R2 score(s): {training_score}") training_percentage_error = mean_absolute_percentage_error \ (training_outputs, predicted_training_outputs, multioutput = 'raw_values') print(F"Training percentage error(s): {training_percentage_error}") train_standard_deviation = mean_squared_error \ (training_outputs, predicted_training_outputs, \ multioutput = "raw_values", squared = False) print("Training standard deviation(s):", train_standard_deviation) train_absolute_error = mean_absolute_error \ (training_outputs, predicted_training_outputs, multioutput = 'raw_values') print("Training mean absolute error(s):", train_absolute_error) test_score = r2_score(testing_outputs, predicted_testing_outputs, multioutput = 'raw_values', force_finite = True) print(F"Testing dataset R2 score(s): {test_score}") test_percentage_error = mean_absolute_percentage_error \ (testing_outputs, predicted_testing_outputs, multioutput = 'raw_values') print(F"Testing percentage error(s): {test_percentage_error}") test_standard_deviation = mean_squared_error \ (testing_outputs, predicted_testing_outputs, \ multioutput = "raw_values", squared = False) print("Testing standard deviation(s):", test_standard_deviation) test_absolute_error = mean_absolute_error \ (testing_outputs, predicted_testing_outputs, multioutput = 'raw_values') print("Testing mean absolute error(s):", test_absolute_error) Appendix 2 Identifying the features that provide higher information entropy: random_forest = RandomForestRegressor \ (n_estimators = 100, criterion = 'squared_error', ccp_alpha = 0.0) random_forest.fit(dataset_inputs, dataset_outputs) random_forest.feature_importances_ optimized_model = SelectFromModel(random_forest, \ threshold = "0.12 * mean", prefit = True) optimized_inputs = optimized_model.transform(dataset_inputs) optimized_model.get_feature_names_out(input_names) Appendix 3 Most informative input parameters selected with a random forest model: selected_names = \ ['Bilateral', 'DayAhead', 'Intraday', 'Balancing', 'BilateralM1', 'DayAheadM1', 'IntradayM1', 'BalancingM1', 'DayAheadM2', 'IntradayM2', 'BalancingM2', 'IntradayM6', 'BilateralM7', 'DayAheadM7', 'IntradayM7', 'BalancingM7', 'BilateralM8', 'DayAheadM8', 'IntradayM8', 'BilateralM9', 'DayAheadM9', 'IntradayM9', 'BilateralM10', 'DayAheadM10', 'IntradayM10', 'BilateralM13', 'BilateralM14', 'IntradayM14', 'BalancingM14', 'BilateralM15', 'DayAheadM15', 'IntradayM15', 'BalancingM15', 'BilateralM16', 'DayAheadM16', 'IntradayM16', 'BilateralM17', 'DayAheadM17', 'IntradayM17', 'BilateralM18', 'DayAheadM18', 'IntradayM18', 'IntradayM19', 'BilateralM21', 'BilateralM22', 'DayAheadM22', 'IntradayM22', 'BilateralM23', 'DayAheadM23', 'IntradayM23', 'BalancingM23', 'TempUkr', 'TempKiev', 'CosDay', 'SinWeek', 'CosWeek', 'SinMonth', 'CosMonth', 'SinYear', 'CosYear'] Appendix 4 Regressor constructors with corresponding hyperparameters: MultiOutputRegressor(HistGradientBoostingRegressor (loss = 'squared_error', learning_rate = 0.20, max_iter = 300, early_stopping = False, scoring = 'loss', random_state = 1)) MultiOutputRegressor(AdaBoostRegressor (estimator = DecisionTreeRegressor(criterion = 'squared_error', splitter = 'best', max_depth = None, min_samples_split = 2, max_features = None, random_state = 1), n_estimators = 10, learning_rate = 1.0, loss = 'square', random_state = 1)) MultiOutputRegressor(GradientBoostingRegressor (loss = 'squared_error', learning_rate = 0.39, n_estimators = 100, subsample = 1.0, criterion = 'squared_error', max_depth = 6, random_state = 1, max_leaf_nodes = None, ccp_alpha = 0.0)) ExtraTreesRegressor(n_estimators = 100, criterion = 'squared_error', max_depth = None, max_features = 1.0, bootstrap = False, n_jobs = 8, random_state = 1, ccp_alpha = 0.0) KNeighborsRegressor(n_neighbors = 3, weights = 'distance', algorithm = 'auto', p = 1, metric = 'minkowski', n_jobs = 8) RandomForestRegressor(n_estimators = 100, criterion = 'squared_error', max_features = 1.0, bootstrap = True, ccp_alpha = 0.0, n_jobs = 8, random_state = 1) MultiOutputRegressor(NuSVR(nu = 0.4, C = 1000000.0, kernel = 'rbf', gamma = 'scale', shrinking = True, max_iter = -1)) MLPRegressor(hidden_layer_sizes = (200, 200,), activation = 'relu', solver = 'lbfgs', alpha = 0.0000, max_iter = 5000, random_state = 1) MLPRegressor(hidden_layer_sizes = (200, 200,), activation = 'relu', solver = 'adam', alpha = 0.0002, max_iter = 200, batch_size = min(50, training_set_size), shuffle = True, random_state = 1, early_stopping = False) ElasticNet(alpha = 1.0, l1_ratio = 1.0, fit_intercept = True, max_iter = 1000, positive = False, random_state = 1, selection = 'cyclic') LinearRegression(fit_intercept = True, n_jobs = 8) MultiOutputRegressor(BayesianRidge(max_iter = 300, tol = 0.001, alpha_init = None, lambda_init = 1.0, fit_intercept = True))