<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>The Comparison of Machine Learning Algorithms for the Task of Weather and Air Pollution Forecasting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anatoliy Doroshenko</string-name>
          <email>doroshenkoanatoliy2@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Zhora</string-name>
          <email>dmitry.zhora@gmx.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavlo Ivanenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Yatsenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Software Systems of the National Academy of Sciences of Ukraine</institution>
          ,
          <addr-line>Glushkov Ave. 40, Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"</institution>
          ,
          <addr-line>Peremohy Ave, 37, Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The task of weather forecasting becomes more important under conditions of global warming. Similarly, air pollution prediction gains value when industrial enterprises neglect environmental pollution issues. This research demonstrates how hourly weather and air pollution data can be restructured for forecasting up to 24 hours ahead, and studies the cross-influence of parameters, as all of them represent the atmosphere as a single object from the physical world. The parameter differences calculated for different points in time are considered as additional inputs and outputs of the machine learning model. The prediction accuracy is analyzed for twelve regression algorithms using popular metrics such as MASE, R² and MAE.</p>
      </abstract>
      <kwd-group>
        <kwd>machine learning</kwd>
        <kwd>regression algorithms</kwd>
        <kwd>weather forecasting</kwd>
        <kwd>air pollution forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Weather and Air Pollution Data</title>
      <p>The weather and air pollution data were downloaded from the website openweathermap.org. This
service allows retrieving multiple atmospheric characteristics for arbitrary GPS coordinates. The
main columns of this dataset for the city of Kyiv are described below in Table 1. The table contains
hourly data, 33,863 records overall, from Nov 25, 2020 to Oct 05, 2024.</p>
      <sec id="sec-2-1">
        <title>Table 1: Dataset Columns</title>
        <p>The dataset columns are: the number of seconds elapsed since 1970-01-01T00:00:00 GMT; the local date of measurement (Kyiv); the local hour from 0 to 23 (Kyiv); the air temperature in degrees Celsius; the dew point in degrees Celsius; the atmospheric pressure in millibars; the air humidity as a percentage; the wind speed in meters per second; the wind direction azimuth in degrees; the sine and cosine of the wind direction angle; the sky cloudiness as a percentage; the CO, NO, NO2, O3, SO2 and NH3 pollution levels in μg/m³; the dust pollution with particles less than 2.5 micrometers and with particles less than 10 micrometers, both in μg/m³; and the sine and cosine values for the daily, weekly, monthly and yearly cycles.</p>
        <p>Although this work accounts only for data from one city, the first UTC time column in Table 1 above is helpful for synchronizing records from multiple locations. Correspondingly, the local date and time columns are important for customers. The air temperature and dew point are presented in degrees Celsius. The atmospheric pressure is measured in millibars (or hectopascals). The humidity and cloudiness are both represented as percentages.</p>
        <p>
          The next subset of weather-related parameters comprises the wind characteristics. Degrees are
typically used to register the wind direction. However, this format is not convenient for machine learning
algorithms [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] due to the representation gap between 359° and 0°. One of the popular approaches
to solving this problem is to use the sine and cosine of the corresponding angle [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. These
columns were calculated using an algorithm written in Python. The reverse transformation is also
possible when the forecasted values of the wind sine and cosine are properly normalized. The wind
speed is measured in meters per second.
        </p>
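        <p>A minimal sketch of this encoding and its reverse transformation; the function and variable names are illustrative assumptions, not taken from the paper's code.</p>
        <preformat>
```python
import numpy as np

def encode_wind_direction(angle_degrees):
    # Map the azimuth onto the unit circle to remove the 359°/0° gap.
    radians = np.deg2rad(angle_degrees)
    return np.sin(radians), np.cos(radians)

def decode_wind_direction(sine, cosine):
    # Normalize the (possibly forecasted) components, then recover the azimuth.
    norm = np.hypot(sine, cosine)
    angle = np.degrees(np.arctan2(sine / norm, cosine / norm))
    return angle % 360.0
```
        </preformat>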
        <p>The air pollution levels for various indicators shown in Table 1 are measured in micrograms
per cubic meter (μg/m³). Carbon monoxide stands out as the most significant pollutant due to its
high concentration. The parameters LevelPM2 and LevelPM10 denote dust pollution with particles
up to 2.5 and 10 micrometers, respectively. It's important to note that the PM10 value includes the
PM2.5 level. The particles that are 2.5 micrometers or smaller are particularly harmful as they can
directly enter the bloodstream. Mid-sized particles can easily pass through the airways and settle in
the lungs. Lastly, particles larger than 10 micrometers are typically filtered out by the respiratory
tract and do not reach the lungs.</p>
        <p>
          The accuracy of the forecast can be enhanced by incorporating cyclical parameters [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which are
presented in the lower section of Table 1. For instance, the cosine of the daily cycle represents the
temperature and light variations between day and night. Likewise, the cosine of the yearly cycle
captures the changes between winter and summer.
        </p>
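        <p>A sketch of how such cyclical columns can be derived from the UTC timestamp; the column names and the 365.25-day year are illustrative assumptions, not the exact conventions of the paper's dataset.</p>
        <preformat>
```python
import numpy as np
import pandas as pd

def add_cyclical_features(frame, time_column):
    # Project the timestamp onto sine/cosine pairs for the daily and yearly cycles.
    seconds = frame[time_column]
    day = 24 * 3600
    year = 365.25 * day
    frame['DailySine'] = np.sin(2 * np.pi * seconds / day)
    frame['DailyCosine'] = np.cos(2 * np.pi * seconds / day)
    frame['YearlySine'] = np.sin(2 * np.pi * seconds / year)
    frame['YearlyCosine'] = np.cos(2 * np.pi * seconds / year)
    return frame

# Midnight and noon of 1970-01-01: the daily cosine flips sign.
frame = add_cyclical_features(pd.DataFrame({'UnixTime': [0, 43200]}), 'UnixTime')
```
        </preformat>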
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data Imputation and Resampling</title>
      <p>
        The weather dataset included all necessary records for the specified period. At the same time, the
pollution data lacked 275 records and contained several negative and outlier values, which were
removed. The missing entries were subsequently recalculated using the KNNImputer class [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
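      <p>A minimal sketch of how missing pollution values can be filled with the KNNImputer class; the miniature frame below is hypothetical, not the paper's dataset.</p>
      <preformat>
```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical fragment with a missing CO reading.
frame = pd.DataFrame({
    'LevelCO': [220.0, np.nan, 240.0, 230.0],
    'LevelNO2': [10.0, 11.0, 12.0, 11.5],
})
# Each gap is filled from the nearest records according to the remaining features.
imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(imputer.fit_transform(frame), columns=frame.columns)
```
      </preformat>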
      <p>
        The machine learning algorithms in the scikit-learn library [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] require that all input and output
parameters be represented in separate columns. However, this structure is not ideal for time series
forecasting, where past and future data vary by record number and occupy the same columns. So,
the dataset was restructured for training and forecasting purposes, with additional weather and
pollution parameters included. The suffix notation used is detailed in the example below.
      </p>
      <list list-type="bullet">
        <list-item><p>Temperature-P1, the temperature in 1 hour</p></list-item>
        <list-item><p>…</p></list-item>
        <list-item><p>Temperature-P24, the temperature in 24 hours</p></list-item>
        <list-item><p>Temperature-M1, the temperature 1 hour ago</p></list-item>
        <list-item><p>…</p></list-item>
        <list-item><p>Temperature-M24, the temperature 24 hours ago</p></list-item>
      </list>
      <p>
        Similarly, the dataset was augmented with parameter differences, denoted with the Diff suffix (for
example, Temperature-Diff-M1, the temperature change between one hour ago and the current hour).
Strictly speaking, this information is redundant, but the layout of samples in the multi-dimensional
space can differ with respect to the internal computations of the regression algorithm [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>In time series terminology, the two groups of parameters above are often referred to as lags and diffs.
The periodic parameters do not need to be duplicated, as they precisely represent the moment in
time for machine learning purposes. The dataset was divided into training and testing segments in
an 80% to 20% ratio. All training data chronologically precede the testing records, with the split
date being December 28, 2023.</p>
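      <p>The lag and diff construction and the chronological split can be sketched as follows; the helper names are illustrative assumptions, while the column suffixes follow the paper's notation.</p>
      <preformat>
```python
import pandas as pd

def add_lags_and_diffs(frame, column, horizon):
    # -P: future values, -M: past values, -Diff-M: change relative to the past value.
    result = frame.copy()
    for step in range(1, horizon + 1):
        result[f'{column}-P{step}'] = result[column].shift(-step)
        result[f'{column}-M{step}'] = result[column].shift(step)
        result[f'{column}-Diff-M{step}'] = result[column] - result[column].shift(step)
    return result

def chronological_split(frame, train_fraction=0.8):
    # All training records precede the testing records in time.
    split = int(len(frame) * train_fraction)
    return frame.iloc[:split], frame.iloc[split:]

frame = add_lags_and_diffs(pd.DataFrame({'Temperature': [1.0, 2.0, 3.0, 4.0, 5.0]}),
                           'Temperature', horizon=1)
train, test = chronological_split(frame)
```
      </preformat>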
      <p>In total, there are 8 weather parameters and 8 pollution parameters available for the current hour.
In particular, the feature WindAngle was excluded due to its discontinuous nature. If the past and
future hours are considered, then differences can be added. So, overall 16 weather and 16 pollution
parameters can be used as inputs and outputs of a machine learning algorithm. When the whole
24-hour history is taken into account and periodic parameters are added, the total number of inputs
becomes 8 + 8 + (16 + 16) * 24 + 8 = 792. Thus, the total number of possible input combinations is
2<sup>792</sup>. Clearly, this work does not attempt to explore this combinatorial space and aims to use more
affordable approaches to optimize the forecasting accuracy.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Regression Performance Metrics</title>
      <p>The mean absolute scaled error (MASE) is regarded as a superior alternative to the mean absolute
percentage error (MAPE). A major drawback of the MAPE metric is that it can produce excessively
large values when the dataset includes samples that are near zero. A classic example of this issue is
temperature measured in degrees Celsius.</p>
      <p>The main idea behind the MASE metric is to compare the performance of a regression algorithm with
the naïve forecast approach, where the current value of the time series is used as the forecast for the next
step. In the terminology of capital markets, this is also called the null hypothesis. The formula that
implements this approach is the following.</p>
      <disp-formula id="eq1"><label>(1)</label><tex-math>\mathrm{MASE}=\frac{\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right|}{\frac{1}{N-h}\sum_{i=h+1}^{N}\left|y_i-y_{i-h}\right|}</tex-math></disp-formula>
      <p>Here <italic>N</italic> designates the number of records in the test set, <italic>h</italic> the number of steps the forecast is
made for, <italic>y<sub>i</sub></italic> the actual component output value from the test set, and <italic>ŷ<sub>i</sub></italic> the predicted component
output value. The numerator represents the mean absolute error, and the denominator represents the error
of the naïve forecast. As can be concluded from the formula, the MASE metric is greater than or equal
to 0, and the lower its value, the more accurate the predictions. The forecast can be considered
successful when the MASE metric is lower than 1. Correspondingly, when the MASE value is higher
than 1, the forecast cannot be considered useful, as the regression algorithm performs even worse
than the naïve method. The algorithm that calculates the MASE metric is presented in Appendix A.</p>
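      <p>The definition can be illustrated with a toy 1-step example; the numbers below are invented for illustration only.</p>
      <preformat>
```python
import numpy as np

# Toy check of the MASE definition for a 1-step forecast (h equal to 1).
actual = np.array([10.0, 12.0, 11.0, 13.0])
predicted = np.array([10.5, 11.5, 11.5, 12.5])
mae = np.mean(np.abs(actual - predicted))            # numerator: 0.5
naive = np.mean(np.abs(actual[1:] - actual[:-1]))    # denominator: 5/3
mase = mae / naive                                   # 0.3, i.e. better than naive
```
      </preformat>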
      <p>Another popular metric for regression tasks is the R² score, also called the coefficient of determination.
It has some similarities with the correlation coefficient in terms of interpretation. Nevertheless, the
calculation formula is different.</p>
      <disp-formula id="eq2"><label>(2)</label><tex-math>R^2=1-\frac{\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i-\bar{y}\right)^2}</tex-math></disp-formula>
      <p>Here <italic>ȳ</italic> designates the mean value of the actual component output from the test set. The higher
the value of the R² score the better; its maximum possible value is 1 for a precise forecast. If the R² score is
higher than 0, the prediction can be considered successful. If it is lower than 0, the forecast is
rather harmful and its results are better avoided.</p>
      <p>The mean absolute error (MAE) is the simplest metric. It is convenient for field engineers, as its
values are represented in the corresponding measurement units, making it easy to verify whether the error
matches real-world constraints. The calculation formula for the MAE error is presented below.</p>
      <disp-formula id="eq3"><label>(3)</label><tex-math>\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right|</tex-math></disp-formula>
      <p>As demonstrated in Table 1, up to 16 parameters can be selected as the outputs of a regression
algorithm. Meanwhile, this research does not attempt to address the multi-objective optimization
problem. All parameters of the machine learning algorithm are optimized solely to minimize the
sum of MASE metrics for individual output parameters.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Prediction of Combined Outputs</title>
      <p>
        The evaluation of input features was accomplished with the ExtraTreesRegressor algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] from the
scikit-learn library [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. It has a limited number of hyperparameters to tune and provides an array of
feature importances that enables individual feature selection.
      </p>
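      <p>A minimal sketch of ranking inputs by impurity-based importances with ExtraTreesRegressor; the synthetic data below are invented for illustration.</p>
      <preformat>
```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(1)
inputs = rng.normal(size=(200, 3))
# Only the first feature carries signal in this synthetic target.
outputs = 2.0 * inputs[:, 0] + 0.1 * rng.normal(size=200)
model = ExtraTreesRegressor(n_estimators=50, random_state=1).fit(inputs, outputs)
# Indices of the features sorted from most to least important.
ranked = np.argsort(model.feature_importances_)[::-1]
```
      </preformat>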
      <p>The starting point of this research is to employ a single machine learning model that forecasts
all 16 output parameters. Users are typically interested in all forecast ranges from 1 hour
up to 24 hours ahead. In order to reduce the computational burden and balance the quality of
short-term and long-term forecasting, it was decided to tune the model initially for 12-hour forecasting.</p>
      <p>The MASE metric dependencies on the history length in hours are illustrated in Figure 2. It is
evident that difference inputs noticeably improve the quality of prediction. Additionally, periodic
parameters are quite important for shorter histories. Nevertheless, the best results were achieved
with a 13-hour history and without periodic parameters. Below are the lists representing the
input-output configuration for this scenario (400 inputs vs 16 outputs).</p>
      <p>Input features: ['Temperature', 'DewPoint', 'Pressure', 'Humidity', 'WindSpeed',
'WindSine', 'WindCosine', 'CloudLevel', 'LevelCO', 'LevelNO', 'LevelNO2', 'LevelO3',
'LevelSO2', 'LevelNH3', 'LevelPM2', 'LevelPM10', 'Temperature-M1', 'DewPoint-M1',
'Pressure-M1', 'Humidity-M1', 'WindSpeed-M1', 'WindSine-M1', 'WindCosine-M1',
'CloudLevel-M1', 'LevelCO-M1', 'LevelNO-M1', 'LevelNO2-M1', 'LevelO3-M1',
'LevelSO2-M1', 'LevelNH3-M1', 'LevelPM2-M1', 'LevelPM10-M1', 'Temperature-Diff-M1',
'DewPoint-Diff-M1', 'Pressure-Diff-M1', 'Humidity-Diff-M1', 'WindSpeed-Diff-M1',
'WindSine-Diff-M1', 'WindCosine-Diff-M1', 'CloudLevel-Diff-M1', 'LevelCO-Diff-M1',
'LevelNO-Diff-M1', 'LevelNO2-Diff-M1', 'LevelO3-Diff-M1', 'LevelSO2-Diff-M1',
'LevelNH3-Diff-M1', 'LevelPM2-Diff-M1', 'LevelPM10-Diff-M1', ... , 'Temperature-M13',
'DewPoint-M13', 'Pressure-M13', 'Humidity-M13', 'WindSpeed-M13', 'WindSine-M13',
'WindCosine-M13', 'CloudLevel-M13', 'LevelCO-M13', 'LevelNO-M13', 'LevelNO2-M13',
'LevelO3-M13', 'LevelSO2-M13', 'LevelNH3-M13', 'LevelPM2-M13', 'LevelPM10-M13',
'Temperature-Diff-M13', 'DewPoint-Diff-M13', 'Pressure-Diff-M13', 'Humidity-Diff-M13',
'WindSpeed-Diff-M13', 'WindSine-Diff-M13', 'WindCosine-Diff-M13',
'CloudLevel-Diff-M13', 'LevelCO-Diff-M13', 'LevelNO-Diff-M13', 'LevelNO2-Diff-M13', 'LevelO3-Diff-M13',
'LevelSO2-Diff-M13', 'LevelNH3-Diff-M13', 'LevelPM2-Diff-M13', 'LevelPM10-Diff-M13']
Output features: ['Temperature-P12', 'DewPoint-P12', 'Pressure-P12', 'Humidity-P12',
'WindSpeed-P12', 'WindSine-P12', 'WindCosine-P12', 'CloudLevel-P12', 'LevelCO-P12',
'LevelNO-P12', 'LevelNO2-P12', 'LevelO3-P12', 'LevelSO2-P12', 'LevelNH3-P12',
'LevelPM2-P12', 'LevelPM10-P12']</p>
      <p>The performance of this model for different forecast ranges is demonstrated in Figure 4.
The R² score is more relevant in this case, and the best results were obtained for 1-hour forecasting.
As shown in Equation 1, the MASE metric depends on the forecast range, making the comparison
of nearby samples unfair. This dependency is presented here for illustrative purposes.</p>
      <p>The feature importances calculated by the ExtraTreesRegressor class for a full 24-hour history with
periodic parameters are presented in Figure 5. It appears that cloudiness and CO concentration are
the most predictive parameters. Additionally, the cosine representations of the yearly and daily cycles
are quite important.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Prediction of Weather Outputs</title>
      <p>While preserving the same input features, there is a way to split the output parameters into weather
and air pollution groups. The MASE metrics for the forecasting of weather parameters are shown
in Figure 6. The best results were obtained again for a 12-hour history and without periodic
parameters, which is an improvement over the combined forecast.</p>
      <p>Testing mean scaled error(s) (MASE): [0.37630252 0.98280471 1.15055085 0.37381643
0.79749537 1.05510771 1.09868261 1.05295743], sum = 6.887717634</p>
    </sec>
    <sec id="sec-7">
      <title>7. Prediction of Pollution Outputs</title>
      <p>The MASE metrics for the prediction of pollution parameters are shown below in Figure 9. The best
results were obtained for 17-hour history with differences and with periodic parameters.</p>
      <p>Testing mean scaled error(s) (MASE): [0.85229725 1.02571192 0.75076816 0.49912568
0.75825543 0.76679182 1.01558379 1.0260866], sum = 6.694620651</p>
      <p>This is another improvement in comparison to the combined forecast. Regarding the shape of the
MASE graph, there is a general rule that the prediction accuracy initially improves as more
useful information is provided to the machine learning algorithm. However, when parameters become
redundant or start introducing noise into the system, the forecast quality decreases.</p>
      <p>
        As for input feature selection, it is possible to select the most important features using the
SelectFromModel class [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. At the same time, this analysis is particularly difficult for weather and
air pollution datasets, and it did not become part of this article.
      </p>
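      <p>For reference, a minimal sketch of SelectFromModel on synthetic data; the data and threshold are invented for illustration, and this is not the selection the article performs.</p>
      <preformat>
```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(1)
inputs = rng.normal(size=(300, 4))
# Only the first feature carries signal in this synthetic target.
outputs = 3.0 * inputs[:, 0] + 0.1 * rng.normal(size=300)
# Keep only features whose importance reaches the median importance.
selector = SelectFromModel(
    ExtraTreesRegressor(n_estimators=50, random_state=1), threshold='median')
selector.fit(inputs, outputs)
mask = selector.get_support()
```
      </preformat>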
    </sec>
    <sec id="sec-8">
      <title>8. Comparison of Regression Algorithms</title>
      <p>Once the split of output parameters allowed the prediction accuracy to be improved, it makes sense to
consider forecasting a single output. Besides, this can be done using the other regression algorithms
available in the scikit-learn library; the MASE metrics obtained are presented in Table 2.</p>
      <p>The prediction accuracy has been improved again. The hyperparameters for the machine learning
algorithms listed in the table were manually optimized and are available in Appendix B. The
R² scores and MAE metrics for the same experiments are presented in Appendices C and D.</p>
      <p>It was quite expected that decision-tree-based ensemble methods would take the top of the chart.
The negative surprises are that KNeighborsRegressor provided poor results and AdaBoostRegressor
failed to forecast many output characteristics. The positive surprise is that the Support Vector Machine
(class NuSVR) took second place. However, this was achieved at the cost of a high training time,
which takes tens of minutes on an 8-core machine.</p>
      <p>The winning algorithm for this dataset is GradientBoostingRegressor; its training time for each
model takes about 5 minutes. The HistGradientBoostingRegressor provides similar results but runs
much faster, with a training time of about 5 seconds per model. As for ExtraTreesRegressor, the time to
train a model is also short and takes tens of seconds.</p>
      <p>The linear methods occupy the middle of the list, and this emphasizes the complexity of the current
task. It is quite unexpected that linear regression outperforms classic machine learning instruments
like DecisionTreeRegressor and the Multi-Layer Perceptron with a quasi-Newton optimizer.</p>
      <p>The prediction accuracy is not the only factor for selection of machine learning model. Other
factors include the training time and the size of the serialized model on the disk. These aspects
become especially important in cloud environments. Additionally, for selecting an input-output
model that requires many iterations to complete, faster algorithms are preferred.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Prediction of Parameter Differences</title>
      <p>So far, the parameter differences were used only as inputs. At the same time, the differences can be
forecasted the same way as the direct parameters. The future value of a parameter can then be calculated as
the sum of the current parameter value and the forecasted difference.</p>
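      <p>The reconstruction step can be sketched in one line; the numbers below are hypothetical.</p>
      <preformat>
```python
# Forecasted difference added back to the current value (hypothetical numbers).
current_temperature = 18.4      # temperature now
predicted_difference = -2.1     # forecasted change over the forecast range
predicted_temperature = current_temperature + predicted_difference
```
      </preformat>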
      <p>Table 3 below compares these two approaches. Because of Equations 1 and 2, the MASE and
R² metrics are not directly comparable. However, the MAE error for differences is calculated using an
equivalent formula, and this metric allows comparing the forecasting accuracy. It appears that the
forecast of differences provides an improvement for many weather parameters and some pollution
parameters, and this happens more often for characteristics with good predictability.</p>
      <p>[Table 3: for each output parameter, from Temperature-P12 through LevelPM10-P12, the direct forecast and the difference forecast are compared using the MASE, R² and MAE metrics.]</p>
      <sec id="sec-9-1">
        <title>Conclusions</title>
        <p>This work proposes modern approaches for the forecasting of weather and air pollution parameters
that define the input history length, the output parameter configuration and the selection of the machine
learning algorithm. The best results were obtained with the GradientBoostingRegressor class.</p>
        <p>The usage of differences on both the input and output sides of the algorithm helps to improve the
results. The forecasting accuracy varies considerably across output parameters. In particular, wind,
cloudiness and air pollution characteristics are quite difficult to predict.</p>
        <p>The selection of output parameters has a significant influence on the accuracy of the algorithm.
The best results were obtained when an individual machine learning model was trained for every
output feature. Correspondingly, the selection of a single multi-output regression algorithm is not
the optimal choice. As expected, better results require more computational resources.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-11">
      <title>A. Appendix: MASE Metric</title>
      <p>The function to calculate the mean absolute scaled error is missing in version 1.6 of the scikit-learn
library, so one of the options is to implement it manually.</p>
      <preformat>
import numpy
import pandas

def mean_absolute_scaled_error(dataset_outputs,
        predicted_dataset_outputs, multioutput = 'raw_values', forecast_range = 1):
    assert multioutput == 'raw_values', "Only multi-output mode is supported for now"
    if isinstance(dataset_outputs, pandas.DataFrame):
        dataset_outputs = dataset_outputs.to_numpy()
    if isinstance(predicted_dataset_outputs, pandas.DataFrame):
        predicted_dataset_outputs = predicted_dataset_outputs.to_numpy()
    if len(dataset_outputs.shape) == 1:
        dataset_outputs = numpy.array([[number] for number in dataset_outputs])
    if len(predicted_dataset_outputs.shape) == 1:
        predicted_dataset_outputs = numpy.array(
            [[number] for number in predicted_dataset_outputs])
    record_count = dataset_outputs.shape[0]
    assert record_count == predicted_dataset_outputs.shape[0], \
        "The original and predicted dataset outputs should have the same record count"
    column_count = dataset_outputs.shape[1]
    assert column_count == predicted_dataset_outputs.shape[1], \
        "The original and predicted dataset outputs should have the same column count"
    assert record_count &gt; forecast_range, \
        "The number of dataset records should be higher than forecast range"
    scaled_errors = []
    for j in range(0, column_count):
        # Denominator: mean absolute error of the naive forecast shifted by forecast_range.
        naive_prediction_mismatch = 0.0
        for i in range(forecast_range, record_count):
            diff = dataset_outputs[i, j] - dataset_outputs[i - forecast_range, j]
            naive_prediction_mismatch += abs(diff)
        mase_denominator = naive_prediction_mismatch / (record_count - forecast_range)
        # Numerator: mean absolute error of the model predictions.
        current_prediction_mismatch = 0.0
        for i in range(0, record_count):
            diff = predicted_dataset_outputs[i, j] - dataset_outputs[i, j]
            current_prediction_mismatch += abs(diff)
        mase_numerator = current_prediction_mismatch / record_count
        scaled_errors.append(mase_numerator / mase_denominator)
    return numpy.array(scaled_errors)
      </preformat>
    </sec>
    <sec id="sec-12">
      <title>B. Appendix: Hyperparameters</title>
      <p>The Python-based expressions below represent the constructors of the regression algorithm objects
with the corresponding hyperparameters, random number generation and parallelization settings.</p>
      <preformat>
ExtraTreesRegressor(n_estimators = 100, criterion = 'squared_error',
    ccp_alpha = 0.0, random_state = 1, n_jobs = 8)
RandomForestRegressor(n_estimators = 100, criterion = 'squared_error',
    max_features = 0.2, min_samples_split = 6, ccp_alpha = 0.0,
    random_state = 1, n_jobs = 8)
HistGradientBoostingRegressor(loss = 'squared_error', learning_rate = 0.1,
    max_iter = 100, min_samples_leaf = 20, l2_regularization = 0.1, random_state = 1)
GradientBoostingRegressor(loss = 'huber', learning_rate = 0.15,
    n_estimators = 100, subsample = 0.9, criterion = 'friedman_mse',
    max_depth = 5, alpha = 0.85, random_state = 1)
AdaBoostRegressor(estimator = initial_estimator,
    n_estimators = 100, loss = 'linear', random_state = 1)
DecisionTreeRegressor(criterion = 'squared_error', max_depth = 7,
    min_samples_leaf = 2, min_weight_fraction_leaf = 0.011, random_state = 1)
KNeighborsRegressor(n_neighbors = 24, weights = 'distance',
    algorithm = 'auto', p = 1, metric = 'minkowski', n_jobs = 8)
NuSVR(nu = 0.8, C = 1000.0, kernel = 'rbf')
MLPRegressor(hidden_layer_sizes = (200,), activation = 'relu',
    solver = 'lbfgs', alpha = 0.0, max_iter = 1000, random_state = 1)
ElasticNet(alpha = 0.01, l1_ratio = 0.01, fit_intercept = True, precompute = True,
    max_iter = 1000, tol = 0.001, selection = 'cyclic', random_state = 1)
Ridge(alpha = 1.0, fit_intercept = True, solver = 'svd', random_state = 1)
LinearRegression(fit_intercept = True, n_jobs = 8)
      </preformat>
    </sec>
    <sec id="sec-13">
      <title>C. Appendix: R2 Scores</title>
      <p>The R² scores below were calculated for the experiments covered in section 8, when the machine
learning algorithm had just one output parameter configured. The best algorithm according to this
metric is still the gradient boosting regressor.</p>
      <p>Table 4a: R2 scores obtained for weather parameters and 12-hour forecasting.</p>
      <p>[Tables 4a and 4b: the rows are the regression algorithms (Gradient Boosting, Support Vector Machine, Histo-Gradient Boosting, Extra Trees Regressor, Random Forest Regressor, Elastic Net Regression, Linear Regression, Bayes Ridge Regression, Decision Tree Regressor, Multi-Layer Perceptron, Nearest Neighbors, Ada Boost Regressor); the columns are the output parameters from Temperature-P12 through LevelPM10-P12.]</p>
    </sec>
    <sec id="sec-14">
      <title>D. Appendix: MAE Results</title>
      <p>The MAE errors below were calculated for the experiments covered in section 8, when the machine
learning algorithm had just one output parameter configured. The measurement units correspond
to the original parameters listed in Table 1.</p>
      <p>[Per-algorithm MAE table: only a single surviving numeric column is recoverable: 16.238159, 16.389989, 17.017623, 17.189998, 17.594297, 17.661253, 17.667499, 17.671916, 18.572952, 18.063861, 18.868328, 48.326031.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <source>Pattern Recognition and Machine Learning</source>
          , Springer, New York, NY,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Haykin</surname>
          </string-name>
          ,
          <source>Neural Networks: A Comprehensive Foundation</source>
          , Prentice Hall, Hoboken, NJ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          , Statistical Learning Theory, Wiley, Hoboken, NJ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <source>A Hands-On Introduction to Machine Learning</source>
          , 1st. ed., Cambridge University Press, Cambridge,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <article-title>Machine learning methods for predicting wind generation</article-title>
          ,
          <source>Electricity Authority Te Mana Hiko, Wellington</source>
          ,
          <year>2022</year>
          . URL: https://www.ea.govt.nz/documents/2385/Machine-learning-methods-for-predicting-wind-generation_MkxN3ZL.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Levinson</surname>
          </string-name>
          ,
          <article-title>Three approaches to encoding time information as features for ML models</article-title>
          ,
          <source>Nvidia Developer Technical Blog</source>
          ,
          <year>2022</year>
          . URL: https://developer.nvidia.com/blog/three-approaches-to-encoding-time-information-as-features-for-ml-models/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Van Wyk</surname>
          </string-name>
          ,
          <article-title>Encoding cyclical features for deep learning</article-title>
          . URL: https://www.kaggle.com/code/avanwyk/encoding-cyclical-features-for-deep-learning.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <article-title>Scikit-learn: imputation of missing values</article-title>
          . URL: https://scikit-learn.org/stable/modules/impute.html.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <article-title>Scikit-learn: machine learning in Python</article-title>
          . URL: https://scikit-learn.org/stable/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>Skforecast: a Python library for time series forecasting</article-title>
          . URL: https://skforecast.org/0.14.0/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <article-title>Mlforecast: scalable machine learning for time series forecasting</article-title>
          . URL: https://nixtlaverse.nixtla.io/mlforecast/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <article-title>Scikit-learn: ExtraTreesRegressor</article-title>
          . URL: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wohlwend</surname>
          </string-name>
          ,
          <article-title>Regression model evaluation metrics: R-squared, adjusted R-squared, MSE, RMSE, and MAE</article-title>
          ,
          <year>2023</year>
          . URL: https://medium.com/@brandon93.w/regression-model-evaluation-metrics-r-squared-adjusted-r-squared-mse-rmse-and-mae-24dcc0e4cbd3.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <article-title>Feature selection with scikit-learn library</article-title>
          . URL: https://scikit-learn.org/stable/modules/feature_selection.html.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <article-title>Scikit-learn: SelectFromModel class</article-title>
          . URL: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <article-title>Get started with time series forecasting in Python</article-title>
          ,
          <year>2020</year>
          . URL: https://medium.com/analytics-vidhya/get-started-with-time-series-forecasting-in-python-c8ca78ee84a5.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ravindiran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hayder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kanagarathinam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alagumalai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sonne</surname>
          </string-name>
          ,
          <article-title>Air quality prediction by machine learning models: A predictive study on the Indian coastal city of Visakhapatnam</article-title>
          ,
          <source>Chemosphere</source>
          <volume>338</volume>
          (
          <year>2023</year>
          ). doi: 10.1016/j.chemosphere.2023.139518.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Samad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garuda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Vogt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Air pollution prediction using machine learning techniques - an approach to replace existing monitoring stations with virtual monitoring stations</article-title>
          ,
          <source>Atmospheric Environment</source>
          <volume>310</volume>
          (
          <year>2023</year>
          ). doi: 10.1016/j.atmosenv.2023.119987.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>