<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>The Comparison of Machine Learning Algorithms for the Task of Weather and Air Pollution Forecasting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anatoliy Doroshenko</string-name>
          <email>doroshenkoanatoliy2@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Zhora</string-name>
          <email>dmitry.zhora@gmx.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavlo Ivanenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Yatsenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Software Systems of the National Academy of Sciences of Ukraine</institution>
          ,
          <addr-line>Glushkov Ave. 40, Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"</institution>
          ,
          <addr-line>Peremohy Ave, 37, Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The task of weather forecasting becomes more important under conditions of global warming. Similarly, air pollution prediction gains value when industrial enterprises neglect environmental pollution issues. This research demonstrates how hourly weather and air pollution data can be restructured for forecasting up to 24 hours ahead, and studies the cross-influence of parameters, as all of them represent the atmosphere as a single object from the physical world. The parameter differences calculated for different points in time are considered as additional inputs and outputs of the machine learning model. The prediction accuracy is analyzed for twelve regression algorithms using popular metrics such as MASE, R² and MAE.</p>
      </abstract>
      <kwd-group>
        <kwd>machine learning</kwd>
        <kwd>regression algorithms</kwd>
        <kwd>weather forecasting</kwd>
        <kwd>air pollution forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Weather and Air Pollution Data</title>
      <p>The weather and air pollution data were downloaded from the website openweathermap.org. This
service allows retrieving multiple atmospheric characteristics for arbitrary GPS coordinates. The
main columns of this dataset for the city of Kyiv are described below in Table 1. The table contains
hourly data, 33,863 records overall, from Nov 25, 2020 to Oct 05, 2024.</p>
      <sec id="sec-2-1">
        <title>Table 1: Dataset Columns</title>
        <p>The dataset columns are: the number of seconds elapsed since 1970-01-01T00:00:00 GMT; the local date of measurement (Kyiv); the local hour from 0 to 23 (Kyiv); the air temperature in degrees Celsius; the dew point in degrees Celsius; the atmospheric pressure in millibars; the air humidity as a percentage; the wind speed in meters per second; the wind direction azimuth in degrees; the sine and cosine of the wind direction angle; the sky cloudiness as a percentage; the CO, NO, NO2, O3, SO2 and NH3 pollution levels in μg/m³; the dust pollution with particles less than 2.5 micrometers and with particles less than 10 micrometers, both in μg/m³; and the sine and cosine values for the daily, weekly, monthly and yearly cycles.</p>
        <p>Although this work accounts only for data from one city, the first UTC time column in Table 1 above is helpful for synchronizing records from multiple locations. Correspondingly, the local date and time columns are important for customers. The air temperature and dew point are presented in degrees Celsius. The atmospheric pressure is measured in millibars (or hectopascals). The humidity and cloudiness are both represented as percentages.</p>
        <p>
          The next subset of weather-related parameters comprises the wind characteristics. Degrees are
typically used to register the wind direction. However, this format is not convenient for machine learning
algorithms [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] due to the representation gap between 359° and 0°. One of the popular approaches
to solving this problem is to use the sine and cosine of the corresponding angle [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. These
columns were calculated using an algorithm written in Python. The reverse transformation is also
possible when the forecasted values of the wind sine and cosine are properly normalized. The wind
speed is measured in meters per second.
        </p>
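        <p>A minimal sketch of this encoding and its reverse transformation; the function and variable names are illustrative assumptions, not taken from the paper's code.</p>
        <preformat>
```python
import numpy as np

def encode_wind_direction(angle_degrees):
    # Map the azimuth onto the unit circle to remove the 359°/0° gap.
    radians = np.deg2rad(angle_degrees)
    return np.sin(radians), np.cos(radians)

def decode_wind_direction(sine, cosine):
    # Normalize the (possibly forecasted) components, then recover the azimuth.
    norm = np.hypot(sine, cosine)
    angle = np.degrees(np.arctan2(sine / norm, cosine / norm))
    return angle % 360.0
```
        </preformat>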
        <p>The air pollution levels for various indicators shown in Table 1 are measured in micrograms
per cubic meter (μg/m³). Carbon monoxide stands out as the most significant pollutant due to its
high concentration. The parameters LevelPM2 and LevelPM10 denote dust pollution with particles
up to 2.5 and 10 micrometers, respectively. It's important to note that the PM10 value includes the
PM2.5 level. The particles that are 2.5 micrometers or smaller are particularly harmful as they can
directly enter the bloodstream. Mid-sized particles can easily pass through the airways and settle in
the lungs. Lastly, particles larger than 10 micrometers are typically filtered out by the respiratory
tract and do not reach the lungs.</p>
        <p>
          The accuracy of the forecast can be enhanced by incorporating cyclical parameters [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which are
presented in the lower section of Table 1. For instance, the cosine of the daily cycle represents the
temperature and light variations between day and night. Likewise, the cosine of the yearly cycle
captures the changes between winter and summer.
        </p>
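        <p>A sketch of how such cyclical columns can be derived from the UTC timestamp; the column names and the 365.25-day year are illustrative assumptions, not the exact conventions of the paper's dataset.</p>
        <preformat>
```python
import numpy as np
import pandas as pd

def add_cyclical_features(frame, time_column):
    # Project the timestamp onto sine/cosine pairs for the daily and yearly cycles.
    seconds = frame[time_column]
    day = 24 * 3600
    year = 365.25 * day
    frame['DailySine'] = np.sin(2 * np.pi * seconds / day)
    frame['DailyCosine'] = np.cos(2 * np.pi * seconds / day)
    frame['YearlySine'] = np.sin(2 * np.pi * seconds / year)
    frame['YearlyCosine'] = np.cos(2 * np.pi * seconds / year)
    return frame

# Midnight and noon of 1970-01-01: the daily cosine flips sign.
frame = add_cyclical_features(pd.DataFrame({'UnixTime': [0, 43200]}), 'UnixTime')
```
        </preformat>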
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data Imputation and Resampling</title>
      <p>
        The weather dataset included all necessary records for the specified period. At the same time, the
pollution data lacked 275 records and contained several negative and outlier values, which were
removed. The missing entries were subsequently recalculated using the KNNImputer class [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
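      <p>A minimal sketch of how missing pollution values can be filled with the KNNImputer class; the miniature frame below is hypothetical, not the paper's dataset.</p>
      <preformat>
```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical fragment with a missing CO reading.
frame = pd.DataFrame({
    'LevelCO': [220.0, np.nan, 240.0, 230.0],
    'LevelNO2': [10.0, 11.0, 12.0, 11.5],
})
# Each gap is filled from the nearest records according to the remaining features.
imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(imputer.fit_transform(frame), columns=frame.columns)
```
      </preformat>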
      <p>
        The machine learning algorithms in the scikit-learn library [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] require that all input and output
parameters be represented in separate columns. However, this structure is not ideal for time series
forecasting, where past and future data vary by record number and occupy the same columns. So,
the dataset was restructured for training and forecasting purposes, with additional weather and
pollution parameters included. The suffix notation used is detailed in the example below.
      </p>
      <list list-type="bullet">
        <list-item><p>Temperature-P1, the temperature in 1 hour</p></list-item>
        <list-item><p>…</p></list-item>
        <list-item><p>Temperature-P24, the temperature in 24 hours</p></list-item>
        <list-item><p>Temperature-M1, the temperature 1 hour ago</p></list-item>
        <list-item><p>…</p></list-item>
        <list-item><p>Temperature-M24, the temperature 24 hours ago</p></list-item>
      </list>
      <p>
        Similarly, the dataset was augmented with parameter differences, denoted with the Diff suffix (for
example, Temperature-Diff-M1, the temperature change between one hour ago and the current hour).
Strictly speaking, this information is redundant, but the layout of samples in the multi-dimensional
space can differ with respect to the internal computations of the regression algorithm [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>In time series terminology, the two groups of parameters above are often referred to as lags and diffs.
The periodic parameters do not need to be duplicated, as they precisely represent the moment in
time for machine learning purposes. The dataset was divided into training and testing segments in
an 80% to 20% ratio. All training data chronologically precede the testing records, with the split
date being December 28, 2023.</p>
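      <p>The lag and diff construction and the chronological split can be sketched as follows; the helper names are illustrative assumptions, while the column suffixes follow the paper's notation.</p>
      <preformat>
```python
import pandas as pd

def add_lags_and_diffs(frame, column, horizon):
    # -P: future values, -M: past values, -Diff-M: change relative to the past value.
    result = frame.copy()
    for step in range(1, horizon + 1):
        result[f'{column}-P{step}'] = result[column].shift(-step)
        result[f'{column}-M{step}'] = result[column].shift(step)
        result[f'{column}-Diff-M{step}'] = result[column] - result[column].shift(step)
    return result

def chronological_split(frame, train_fraction=0.8):
    # All training records precede the testing records in time.
    split = int(len(frame) * train_fraction)
    return frame.iloc[:split], frame.iloc[split:]

frame = add_lags_and_diffs(pd.DataFrame({'Temperature': [1.0, 2.0, 3.0, 4.0, 5.0]}),
                           'Temperature', horizon=1)
train, test = chronological_split(frame)
```
      </preformat>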
      <p>In total, there are 8 weather parameters and 8 pollution parameters available for the current hour.
In particular, the feature WindAngle was excluded due to its discontinuous nature. If the past and
future hours are considered, then differences can be added. So, overall 16 weather and 16 pollution
parameters can be used as inputs and outputs of a machine learning algorithm. When the whole
24-hour history is taken into account and periodic parameters are added, the total number of inputs
becomes 8 + 8 + (16 + 16) * 24 + 8 = 792. Thus, the total number of possible input combinations is
2<sup>792</sup>. Clearly, this work does not attempt to explore this combinatorial space and aims to use more
affordable approaches to optimize the forecasting accuracy.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Regression Performance Metrics</title>
      <p>The mean absolute scaled error (MASE) is regarded as a superior alternative to the mean absolute
percentage error (MAPE). A major drawback of the MAPE metric is that it can produce excessively
large values when the dataset includes samples that are near zero. A classic example of this issue is
temperature measured in degrees Celsius.</p>
      <p>The main idea behind the MASE metric is to compare the performance of a regression algorithm with
the naïve forecast approach, where the current value of the time series is used as the forecast for the next
step. In the terminology of capital markets, this is also called the null hypothesis. The formula that
implements this approach is the following.</p>
      <disp-formula id="eq1"><label>(1)</label><tex-math>\mathrm{MASE}=\frac{\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right|}{\frac{1}{N-h}\sum_{i=h+1}^{N}\left|y_i-y_{i-h}\right|}</tex-math></disp-formula>
      <p>Here <italic>N</italic> designates the number of records in the test set, <italic>h</italic> the number of steps the forecast is
made for, <italic>y<sub>i</sub></italic> the actual component output value from the test set, and <italic>ŷ<sub>i</sub></italic> the predicted component
output value. The numerator represents the mean absolute error, and the denominator represents the error
of the naïve forecast. As can be concluded from the formula, the MASE metric is greater than or equal
to 0, and the lower its value, the more accurate the predictions. The forecast can be considered
successful when the MASE metric is lower than 1. Correspondingly, when the MASE value is higher
than 1, the forecast cannot be considered useful, as the regression algorithm performs even worse
than the naïve method. The algorithm that calculates the MASE metric is presented in Appendix A.</p>
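      <p>The definition can be illustrated with a toy 1-step example; the numbers below are invented for illustration only.</p>
      <preformat>
```python
import numpy as np

# Toy check of the MASE definition for a 1-step forecast (h equal to 1).
actual = np.array([10.0, 12.0, 11.0, 13.0])
predicted = np.array([10.5, 11.5, 11.5, 12.5])
mae = np.mean(np.abs(actual - predicted))            # numerator: 0.5
naive = np.mean(np.abs(actual[1:] - actual[:-1]))    # denominator: 5/3
mase = mae / naive                                   # 0.3, i.e. better than naive
```
      </preformat>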
      <p>Another popular metric for regression tasks is the R² score, also called the coefficient of determination.
It has some similarities with the correlation coefficient in terms of interpretation. Nevertheless, the
calculation formula is different.</p>
      <disp-formula id="eq2"><label>(2)</label><tex-math>R^2=1-\frac{\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i-\bar{y}\right)^2}</tex-math></disp-formula>
      <p>Here <italic>ȳ</italic> designates the mean value of the actual component output from the test set. The higher
the value of the R² score the better; its maximum possible value is 1 for a precise forecast. If the R² score is
higher than 0, the prediction can be considered successful. If it is lower than 0, the forecast is
rather harmful and its results are better avoided.</p>
      <p>The mean absolute error (MAE) is the simplest metric. It is convenient for field engineers, as its
values are represented in the corresponding measurement units, making it easy to verify whether the error
matches real-world constraints. The calculation formula for the MAE error is presented below.</p>
      <disp-formula id="eq3"><label>(3)</label><tex-math>\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right|</tex-math></disp-formula>
      <p>As demonstrated in Table 1, up to 16 parameters can be selected as the outputs of a regression
algorithm. Meanwhile, this research does not attempt to address the multi-objective optimization
problem. All parameters of the machine learning algorithm are optimized solely to minimize the
sum of MASE metrics for individual output parameters.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Prediction of Combined Outputs</title>
      <p>
        The evaluation of input features was accomplished with the ExtraTreesRegressor algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] from the
scikit-learn library [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. It has a limited number of hyperparameters to tune and provides an array of
feature importances that enables individual feature selection.
      </p>
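      <p>A minimal sketch of ranking inputs by impurity-based importances with ExtraTreesRegressor; the synthetic data below are invented for illustration.</p>
      <preformat>
```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(1)
inputs = rng.normal(size=(200, 3))
# Only the first feature carries signal in this synthetic target.
outputs = 2.0 * inputs[:, 0] + 0.1 * rng.normal(size=200)
model = ExtraTreesRegressor(n_estimators=50, random_state=1).fit(inputs, outputs)
# Indices of the features sorted from most to least important.
ranked = np.argsort(model.feature_importances_)[::-1]
```
      </preformat>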
      <p>The starting point of this research is to employ a single machine learning model that forecasts
all 16 output parameters. Users are typically interested in all forecast ranges from 1 hour
up to 24 hours ahead. In order to reduce the computational burden and balance the quality of
short-term and long-term forecasting, it was decided to tune the model initially for 12-hour forecasting.</p>
      <p>The MASE metric dependencies on the history length in hours are illustrated in Figure 2. It is
evident that difference inputs noticeably improve the quality of prediction. Additionally, periodic
parameters are quite important for shorter histories. Nevertheless, the best results were achieved
with a 13-hour history and without periodic parameters. Below are the lists representing the
input-output configuration for this scenario (400 inputs vs 16 outputs).</p>
      <p>Input features: ['Temperature', 'DewPoint', 'Pressure', 'Humidity', 'WindSpeed',
'WindSine', 'WindCosine', 'CloudLevel', 'LevelCO', 'LevelNO', 'LevelNO2', 'LevelO3',
'LevelSO2', 'LevelNH3', 'LevelPM2', 'LevelPM10', 'Temperature-M1', 'DewPoint-M1',
'Pressure-M1', 'Humidity-M1', 'WindSpeed-M1', 'WindSine-M1', 'WindCosine-M1',
'CloudLevel-M1', 'LevelCO-M1', 'LevelNO-M1', 'LevelNO2-M1', 'LevelO3-M1',
'LevelSO2-M1', 'LevelNH3-M1', 'LevelPM2-M1', 'LevelPM10-M1', 'Temperature-Diff-M1',
'DewPoint-Diff-M1', 'Pressure-Diff-M1', 'Humidity-Diff-M1', 'WindSpeed-Diff-M1',
'WindSine-Diff-M1', 'WindCosine-Diff-M1', 'CloudLevel-Diff-M1', 'LevelCO-Diff-M1',
'LevelNO-Diff-M1', 'LevelNO2-Diff-M1', 'LevelO3-Diff-M1', 'LevelSO2-Diff-M1',
'LevelNH3-Diff-M1', 'LevelPM2-Diff-M1', 'LevelPM10-Diff-M1', ... , 'Temperature-M13',
'DewPoint-M13', 'Pressure-M13', 'Humidity-M13', 'WindSpeed-M13', 'WindSine-M13',
'WindCosine-M13', 'CloudLevel-M13', 'LevelCO-M13', 'LevelNO-M13', 'LevelNO2-M13',
'LevelO3-M13', 'LevelSO2-M13', 'LevelNH3-M13', 'LevelPM2-M13', 'LevelPM10-M13',
'Temperature-Diff-M13', 'DewPoint-Diff-M13', 'Pressure-Diff-M13', 'Humidity-Diff-M13',
'WindSpeed-Diff-M13', 'WindSine-Diff-M13', 'WindCosine-Diff-M13',
'CloudLevel-Diff-M13', 'LevelCO-Diff-M13', 'LevelNO-Diff-M13', 'LevelNO2-Diff-M13', 'LevelO3-Diff-M13',
'LevelSO2-Diff-M13', 'LevelNH3-Diff-M13', 'LevelPM2-Diff-M13', 'LevelPM10-Diff-M13']
Output features: ['Temperature-P12', 'DewPoint-P12', 'Pressure-P12', 'Humidity-P12',
'WindSpeed-P12', 'WindSine-P12', 'WindCosine-P12', 'CloudLevel-P12', 'LevelCO-P12',
'LevelNO-P12', 'LevelNO2-P12', 'LevelO3-P12', 'LevelSO2-P12', 'LevelNH3-P12',
'LevelPM2-P12', 'LevelPM10-P12']</p>
      <p>The performance of this model for different forecast ranges is demonstrated in Figure 4.
The R² score is more relevant in this case, and the best results were obtained for 1-hour forecasting.
As shown in Equation 1, the MASE metric depends on the forecast range, making the comparison
of nearby samples unfair. This dependency is presented here for illustrative purposes.</p>
      <p>The feature importances calculated by the ExtraTreesRegressor class for a full 24-hour history with
periodic parameters are presented in Figure 5. It appears that cloudiness and CO concentration are
the most predictive parameters. Additionally, the cosine representations of the yearly and daily cycles
are quite important.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Prediction of Weather Outputs</title>
      <p>While preserving the same input features, there is a way to split the output parameters into weather
and air pollution groups. The MASE metrics for the forecasting of weather parameters are shown
in Figure 6. The best results were obtained again for a 12-hour history and without periodic
parameters, which is an improvement over the combined forecast.</p>
      <p>Testing mean scaled error(s) (MASE): [0.37630252 0.98280471 1.15055085 0.37381643
0.79749537 1.05510771 1.09868261 1.05295743], sum = 6.887717634</p>
    </sec>
    <sec id="sec-7">
      <title>7. Prediction of Pollution Outputs</title>
      <p>The MASE metrics for the prediction of pollution parameters are shown below in Figure 9. The best
results were obtained for 17-hour history with differences and with periodic parameters.</p>
      <p>Testing mean scaled error(s) (MASE): [0.85229725 1.02571192 0.75076816 0.49912568
0.75825543 0.76679182 1.01558379 1.0260866], sum = 6.694620651</p>
      <p>This is another improvement in comparison to the combined forecast. Regarding the shape of the
MASE graph, there is a general rule that the prediction accuracy initially improves as more
useful information is provided to the machine learning algorithm. However, when parameters become
redundant or start introducing noise into the system, the forecast quality decreases.</p>
      <p>
        As for input feature selection, it is possible to select the most important features using the
SelectFromModel class [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. At the same time, this analysis is particularly difficult for weather and
air pollution datasets, and it did not become part of this article.
      </p>
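      <p>For reference, a minimal sketch of SelectFromModel on synthetic data; the data and threshold are invented for illustration, and this is not the selection the article performs.</p>
      <preformat>
```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(1)
inputs = rng.normal(size=(300, 4))
# Only the first feature carries signal in this synthetic target.
outputs = 3.0 * inputs[:, 0] + 0.1 * rng.normal(size=300)
# Keep only features whose importance reaches the median importance.
selector = SelectFromModel(
    ExtraTreesRegressor(n_estimators=50, random_state=1), threshold='median')
selector.fit(inputs, outputs)
mask = selector.get_support()
```
      </preformat>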
    </sec>
    <sec id="sec-8">
      <title>8. Comparison of Regression Algorithms</title>
      <p>Once the split of output parameters allowed the prediction accuracy to be improved, it makes sense to
consider forecasting a single output. Besides, this can be done using the other regression algorithms
available in the scikit-learn library; the MASE metrics obtained are presented in Table 2.</p>
      <p>The prediction accuracy has been improved again. The hyperparameters for the machine learning
algorithms listed in the table were manually optimized and are available in Appendix B. The
R² scores and MAE metrics for the same experiments are presented in Appendices C and D.</p>
      <p>It was quite expected that decision-tree-based ensemble methods would take the top of the chart.
The negative surprises are that KNeighborsRegressor provided poor results and AdaBoostRegressor
failed to forecast many output characteristics. The positive surprise is that the Support Vector Machine
(class NuSVR) took second place. However, this was achieved at the cost of a high training time,
which takes tens of minutes on an 8-core machine.</p>
      <p>The winning algorithm for this dataset is GradientBoostingRegressor; its training time for each
model takes about 5 minutes. The HistGradientBoostingRegressor provides similar results but runs
much faster, with a training time of about 5 seconds per model. As for ExtraTreesRegressor, the time to
train a model is also short and takes tens of seconds.</p>
      <p>The linear methods occupy the middle of the list, and this emphasizes the complexity of the current
task. It is quite unexpected that linear regression outperforms classic machine learning instruments
like DecisionTreeRegressor and the Multi-Layer Perceptron with a quasi-Newton optimizer.</p>
      <p>The prediction accuracy is not the only factor for selection of machine learning model. Other
factors include the training time and the size of the serialized model on the disk. These aspects
become especially important in cloud environments. Additionally, for selecting an input-output
model that requires many iterations to complete, faster algorithms are preferred.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Prediction of Parameter Differences</title>
      <p>So far, the parameter differences were used only as inputs. At the same time, the differences can be
forecasted the same way as the direct parameters. The future value of a parameter can then be calculated as
the sum of the current parameter value and the forecasted difference.</p>
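      <p>The reconstruction step can be sketched in one line; the numbers below are hypothetical.</p>
      <preformat>
```python
# Forecasted difference added back to the current value (hypothetical numbers).
current_temperature = 18.4      # temperature now
predicted_difference = -2.1     # forecasted change over the forecast range
predicted_temperature = current_temperature + predicted_difference
```
      </preformat>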
      <p>Table 3 below compares these two approaches. Because of Equations 1 and 2, the MASE and
R² metrics are not directly comparable. However, the MAE error for differences is calculated using an
equivalent formula, and this metric allows comparing the forecasting accuracy. It appears that the
forecast of differences provides an improvement for many weather parameters and some pollution
parameters, and this happens more often for characteristics with good predictability.</p>
      <p>[Table 3: for each output parameter, from Temperature-P12 through LevelPM10-P12, the direct forecast and the difference forecast are compared using the MASE, R² and MAE metrics.]</p>
      <sec id="sec-9-1">
        <title>Conclusions</title>
        <p>This work proposes modern approaches for the forecasting of weather and air pollution parameters
that define the input history length, the output parameter configuration and the selection of the machine
learning algorithm. The best results were obtained with the GradientBoostingRegressor class.</p>
        <p>The usage of differences on both the input and output sides of the algorithm helps to improve the
results. The forecasting accuracy varies considerably across output parameters. In particular, wind,
cloudiness and air pollution characteristics are quite difficult to predict.</p>
        <p>The selection of output parameters has a significant influence on the accuracy of the algorithm.
The best results were obtained when an individual machine learning model was trained for every
output feature. Correspondingly, the selection of a single multi-output regression algorithm is not
the optimal choice. As expected, better results require more computational resources.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-11">
      <title>A. Appendix: MASE Metric</title>
      <p>The function to calculate the mean absolute scaled error is missing in version 1.6 of the scikit-learn
library, so one of the options is to implement it manually.</p>
      <preformat>
import numpy
import pandas

def mean_absolute_scaled_error(dataset_outputs,
        predicted_dataset_outputs, multioutput = 'raw_values', forecast_range = 1):
    assert multioutput == 'raw_values', "Only multi-output mode is supported for now"
    if isinstance(dataset_outputs, pandas.DataFrame):
        dataset_outputs = dataset_outputs.to_numpy()
    if isinstance(predicted_dataset_outputs, pandas.DataFrame):
        predicted_dataset_outputs = predicted_dataset_outputs.to_numpy()
    if len(dataset_outputs.shape) == 1:
        dataset_outputs = numpy.array([[number] for number in dataset_outputs])
    if len(predicted_dataset_outputs.shape) == 1:
        predicted_dataset_outputs = numpy.array(
            [[number] for number in predicted_dataset_outputs])
    record_count = dataset_outputs.shape[0]
    assert record_count == predicted_dataset_outputs.shape[0], \
        "The original and predicted dataset outputs should have the same record count"
    column_count = dataset_outputs.shape[1]
    assert column_count == predicted_dataset_outputs.shape[1], \
        "The original and predicted dataset outputs should have the same column count"
    assert record_count &gt; forecast_range, \
        "The number of dataset records should be higher than forecast range"
    scaled_errors = []
    for j in range(0, column_count):
        # Denominator: mean absolute error of the naive forecast shifted by forecast_range.
        naive_prediction_mismatch = 0.0
        for i in range(forecast_range, record_count):
            diff = dataset_outputs[i, j] - dataset_outputs[i - forecast_range, j]
            naive_prediction_mismatch += abs(diff)
        mase_denominator = naive_prediction_mismatch / (record_count - forecast_range)
        # Numerator: mean absolute error of the model predictions.
        current_prediction_mismatch = 0.0
        for i in range(0, record_count):
            diff = predicted_dataset_outputs[i, j] - dataset_outputs[i, j]
            current_prediction_mismatch += abs(diff)
        mase_numerator = current_prediction_mismatch / record_count
        scaled_errors.append(mase_numerator / mase_denominator)
    return numpy.array(scaled_errors)
      </preformat>
    </sec>
    <sec id="sec-12">
      <title>B. Appendix: Hyperparameters</title>
      <p>The Python-based expressions below represent the constructors of the regression algorithm objects
with the corresponding hyperparameters, random number generation and parallelization settings.</p>
      <preformat>
ExtraTreesRegressor(n_estimators = 100, criterion = 'squared_error',
    ccp_alpha = 0.0, random_state = 1, n_jobs = 8)
RandomForestRegressor(n_estimators = 100, criterion = 'squared_error',
    max_features = 0.2, min_samples_split = 6, ccp_alpha = 0.0,
    random_state = 1, n_jobs = 8)
HistGradientBoostingRegressor(loss = 'squared_error', learning_rate = 0.1,
    max_iter = 100, min_samples_leaf = 20, l2_regularization = 0.1, random_state = 1)
GradientBoostingRegressor(loss = 'huber', learning_rate = 0.15,
    n_estimators = 100, subsample = 0.9, criterion = 'friedman_mse',
    max_depth = 5, alpha = 0.85, random_state = 1)
AdaBoostRegressor(estimator = initial_estimator,
    n_estimators = 100, loss = 'linear', random_state = 1)
DecisionTreeRegressor(criterion = 'squared_error', max_depth = 7,
    min_samples_leaf = 2, min_weight_fraction_leaf = 0.011, random_state = 1)
KNeighborsRegressor(n_neighbors = 24, weights = 'distance',
    algorithm = 'auto', p = 1, metric = 'minkowski', n_jobs = 8)
NuSVR(nu = 0.8, C = 1000.0, kernel = 'rbf')
MLPRegressor(hidden_layer_sizes = (200,), activation = 'relu',
    solver = 'lbfgs', alpha = 0.0, max_iter = 1000, random_state = 1)
ElasticNet(alpha = 0.01, l1_ratio = 0.01, fit_intercept = True, precompute = True,
    max_iter = 1000, tol = 0.001, selection = 'cyclic', random_state = 1)
Ridge(alpha = 1.0, fit_intercept = True, solver = 'svd', random_state = 1)
LinearRegression(fit_intercept = True, n_jobs = 8)
      </preformat>
    </sec>
    <sec id="sec-13">
      <title>C. Appendix: R2 Scores</title>
      <p>The R² scores below were calculated for the experiments covered in section 8, when the machine
learning algorithm had just one output parameter configured. The best algorithm according to this
metric is still the gradient boosting regressor.</p>
      <p>Table 4a: R2 scores obtained for weather parameters and 12-hour forecasting.</p>
      <p>[Tables 4a and 4b: the rows are the regression algorithms (Gradient Boosting, Support Vector Machine, Histo-Gradient Boosting, Extra Trees Regressor, Random Forest Regressor, Elastic Net Regression, Linear Regression, Bayes Ridge Regression, Decision Tree Regressor, Multi-Layer Perceptron, Nearest Neighbors, Ada Boost Regressor); the columns are the output parameters from Temperature-P12 through LevelPM10-P12.]</p>
    </sec>
    <sec id="sec-14">
      <title>D. Appendix: MAE Results</title>
      <p>The MAE errors below were calculated for the experiments covered in section 8, when the machine
learning algorithm had just one output parameter configured. The measurement units correspond
to the original parameters listed in Table 1.</p>
      <p>[Per-algorithm MAE table: only a single surviving numeric column is recoverable: 16.238159, 16.389989, 17.017623, 17.189998, 17.594297, 17.661253, 17.667499, 17.671916, 18.572952, 18.063861, 18.868328, 48.326031.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <source>Pattern Recognition and Machine Learning</source>
          , Springer, New York, NY,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Haykin</surname>
          </string-name>
          ,
          <source>Neural Networks: A Comprehensive Foundation</source>
          , Prentice Hall, Hoboken, NJ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          , Statistical Learning Theory, Wiley, Hoboken, NJ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <source>A Hands-On Introduction to Machine Learning</source>
          , 1st. ed., Cambridge University Press, Cambridge,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <article-title>Machine learning methods for predicting wind generation</article-title>
          ,
          <source>Electricity Authority Te Mana Hiko, Wellington</source>
          ,
          <year>2022</year>
          . URL: https://www.ea.govt.nz/documents/2385/Machine-learning-methods-for-predicting-wind-generation_MkxN3ZL.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Levinson</surname>
          </string-name>
          ,
          <article-title>Three approaches to encoding time information as features for ML models</article-title>
          ,
          <source>Nvidia Developer Technical Blog</source>
          ,
          <year>2022</year>
          . URL: https://developer.nvidia.com/blog/three-approaches-to-encoding-time-information-as-features-for-ml-models/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Van Wyk</surname>
          </string-name>
          ,
          <article-title>Encoding cyclical features for deep learning</article-title>
          . URL: https://www.kaggle.com/code/avanwyk/encoding-cyclical-features-for-deep-learning.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <article-title>Scikit-learn: imputation of missing values</article-title>
          . URL: https://scikit-learn.org/stable/modules/impute.html.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <article-title>Scikit-learn: machine learning in Python</article-title>
          . URL: https://scikit-learn.org/stable/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>Skforecast: a Python library for time series forecasting</article-title>
          . URL: https://skforecast.org/0.14.0/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <article-title>Mlforecast: scalable machine learning for time series forecasting</article-title>
          . URL: https://nixtlaverse.nixtla.io/mlforecast/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <article-title>Scikit-learn: ExtraTreesRegressor</article-title>
          . URL: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wohlwend</surname>
          </string-name>
          ,
          <article-title>Regression model evaluation metrics: R-squared, adjusted R-squared, MSE, RMSE, and MAE</article-title>
          ,
          <year>2023</year>
          . URL: https://medium.com/@brandon93.w/regression-model-evaluation-metrics-r-squared-adjusted-r-squared-mse-rmse-and-mae-24dcc0e4cbd3.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <article-title>Feature selection with scikit-learn library</article-title>
          . URL: https://scikit-learn.org/stable/modules/feature_selection.html.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <article-title>Scikit-learn: SelectFromModel class</article-title>
          . URL: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <article-title>Get started with time series forecasting in Python</article-title>
          ,
          <year>2020</year>
          . URL: https://medium.com/analytics-vidhya/get-started-with-time-series-forecasting-in-python-c8ca78ee84a5.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ravindiran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hayder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kanagarathinam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alagumalai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sonne</surname>
          </string-name>
          ,
          <article-title>Air quality prediction by machine learning models: A predictive study on the Indian coastal city of Visakhapatnam</article-title>
          ,
          <source>Chemosphere</source>
          <volume>338</volume>
          (
          <year>2023</year>
          ). doi: 10.1016/j.chemosphere.2023.139518.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Samad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garuda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Vogt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Air pollution prediction using machine learning techniques - an approach to replace existing monitoring stations with virtual monitoring stations</article-title>
          ,
          <source>Atmospheric Environment</source>
          <volume>310</volume>
          (
          <year>2023</year>
          ). doi: 10.1016/j.atmosenv.2023.119987.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>