
Personal Air Quality Index Prediction Using Inverse Distance Weighting Method

Trung-Quan Nguyen

Dang-Hieu Nguyen

Loc Tai Tan Nguyen

locntt.12@grad.uit.edu.vn, Vietnam National University, Ho Chi Minh City, Vietnam

2020

14 15

In this paper, we propose a method to predict the personal air quality index in an area using only the levels of three pollutants: PM2.5, NO2, and O3. All of them are measured at weather stations near that area. Our approach uses one of the most well-known interpolation methods in spatial analysis, the Inverse Distance Weighted (IDW) technique, to estimate the missing air pollutant levels. We then use those levels to calculate the Air Quality Index (AQI). The results show that the proposed method is suitable for the prediction of those air pollutant levels.

INTRODUCTION

Personal air pollution data matter because air quality data for an individual's immediate surroundings are likely to be more accurate than global data measured at distant weather stations. The problem is finding a suitable method to predict air quality in a local area from the global data. This paper reports our solution to this challenge.

For more details about this challenge and the dataset we use, see the overview paper of the MediaEval 2020 Insight for Wellbeing task: Multimodal personal health lifelog data analysis [1].

RELATED WORK

The inverse distance weighting method [4] is commonly used in spatial interpolation [3]. This paper applies the basic form of IDW without any modification.

APPROACH

Because we had limited time to experiment with algorithms that take longer to train, such as neural network-based algorithms, we chose IDW. Moreover, because it involves no statistical assumptions [2], it is simpler than Kriging or other statistical interpolation methods. The way it works is easy to understand. Based on the assumption that closer points have more similar values than points further away, it uses the measured values surrounding the unknown point to predict its value. Each known point is given a weight, and the predicted value is the weighted average of those points.

The weight $w_i$ for a known point $x_i$ is the inverse of the distance from that point to the unknown point $x$, which is computed as:

$$w_i = \frac{1}{d(x, x_i)^{p}} \tag{1}$$

where $p$ is the power value that is used to control the value of the weight. Note that the Haversine method is used to calculate the distance $d(x, x_i)$ between the two coordinates.

The value $u(x)$ of an unknown point is calculated from the $N$ known points as:

$$u(x) = \frac{\sum_{i=1}^{N} w_i \, u_i}{\sum_{i=1}^{N} w_i} \tag{2}$$

where $w_i$ is the weight and $u_i$ is the value of the $i$-th known point.
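The weight and weighted-average formulas can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `haversine_km` and `idw_predict`, the `(lat, lon, value)` tuple layout, the Earth radius constant, and the handling of a zero distance are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_predict(target, stations, p=2.0):
    """Estimate the value at target = (lat, lon) by inverse distance weighting.

    stations is a list of (lat, lon, value) tuples (hypothetical layout).
    """
    if not stations:
        raise ValueError("at least one known point is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, value in stations:
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0:
            # Target coincides with a station: return its value
            # instead of dividing by zero (an assumed convention).
            return value
        w = 1.0 / d ** p          # weight of this known point
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total  # weighted average of known values


# Usage with made-up station readings around Tokyo (illustrative values only).
stations = [(35.69, 139.75, 12.0), (35.60, 139.60, 18.0), (35.80, 139.90, 9.0)]
print(idw_predict((35.68, 139.70), stations, p=2.0))
```

Because the weights are normalized by their sum, the estimate always lies between the smallest and largest station values.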

Prediction

First, we list all hourly time frames by grouping the training data by hour. Then, we loop through the training data one time frame at a time.

In each iteration, we get the coordinates of all unknown points that need to be predicted. After that, we get the values of the known points and their coordinates, for the same time frame, from the public air pollution data provided by 26 weather stations surrounding the Tokyo area.

With all the necessary data gathered, we use the IDW formula to make the prediction. Note that the initial power value of the IDW formula is 2.

After repeating those steps for each air pollutant (PM2.5, NO2, O3), we have the final results.
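The per-time-frame loop described above could be organized roughly as follows. This is a sketch under stated assumptions, not the authors' pipeline: the dictionary keys (`"time"`, `"lat"`, `"lon"`, and one key per pollutant), the names `hour_key` and `predict_by_hour`, and the `interpolate` callable (any function taking a target, a list of `(lat, lon, value)` tuples, and a power value, such as an IDW implementation) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

POLLUTANTS = ("PM2.5", "NO2", "O3")


def hour_key(ts):
    """Truncate a datetime to the start of its hour (the time-frame key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def predict_by_hour(queries, station_rows, interpolate, p=2.0):
    """Predict each pollutant at each unknown point, one hourly frame at a time.

    queries:      list of dicts with keys "time", "lat", "lon"
    station_rows: list of dicts with keys "time", "lat", "lon" and one per pollutant
    interpolate:  callable (target, stations, p) -> float
    """
    # Group unknown points and station readings by hourly time frame.
    queries_by_hour = defaultdict(list)
    for i, q in enumerate(queries):
        queries_by_hour[hour_key(q["time"])].append(i)
    stations_by_hour = defaultdict(list)
    for row in station_rows:
        stations_by_hour[hour_key(row["time"])].append(row)

    results = [dict() for _ in queries]
    for frame, indices in queries_by_hour.items():
        rows = stations_by_hour.get(frame, [])
        for pollutant in POLLUTANTS:
            # Known points for this pollutant in this frame, skipping missing readings.
            known = [(r["lat"], r["lon"], r[pollutant])
                     for r in rows if r.get(pollutant) is not None]
            if not known:
                continue  # no station data in this frame: leave the prediction missing
            for i in indices:
                q = queries[i]
                results[i][pollutant] = interpolate((q["lat"], q["lon"]), known, p)
    return results
```

Passing the interpolation function as a parameter keeps the grouping logic separate from the choice of interpolator, so the same loop can be reused when the power value or the method changes.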

Optimization

To get the best performance, we search for the optimal power value p by trying different values of p until IDW produces acceptable values of SMAPE/RMSE/MAE.

After evaluating p values ranging from 0 to 5, we find that the best power values for PM2.5, NO2, and O3 are 1.5, 3.5, and 0, respectively.
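A search of this kind can be sketched as a simple grid search. The code below is illustrative only: the step size of 0.5, the function names, the `(target, stations, observed)` sample layout, and the particular SMAPE variant (the form ranging from 0 to 2) are assumptions, since the paper does not specify them. The `interpolate` argument stands for any function with the signature `(target, stations, p) -> float`.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - f) for t, f in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - f) ** 2 for t, f in zip(y_true, y_pred)) / len(y_true))


def smape(y_true, y_pred):
    """Symmetric MAPE in the 0..2 form (assumed variant)."""
    total = 0.0
    for t, f in zip(y_true, y_pred):
        denom = abs(t) + abs(f)
        total += 0.0 if denom == 0 else 2 * abs(t - f) / denom
    return total / len(y_true)


def best_power(samples, interpolate, metric=rmse, candidates=None):
    """Return (best_p, scores) for the power value minimizing the given metric.

    samples: list of (target, stations, observed_value) tuples (hypothetical layout).
    """
    if candidates is None:
        candidates = [k * 0.5 for k in range(11)]  # 0.0, 0.5, ..., 5.0 (assumed step)
    observed = [obs for _, _, obs in samples]
    scores = {}
    for p in candidates:
        predicted = [interpolate(target, stations, p) for target, stations, _ in samples]
        scores[p] = metric(observed, predicted)
    best_p = min(scores, key=scores.get)
    return best_p, scores
```

With p = 0 every weight equals 1, so IDW reduces to the plain average of all station values, regardless of distance.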

RESULTS AND ANALYSIS

The evaluation results for PM2.5, NO2, O3, and AQI prediction, provided by the MediaEval task organizers, are shown in Table 1, Table 2, Table 3, and Table 4, respectively.

In general, PM2.5 prediction is acceptable, but there is a big gap in the NO2 and O3 prediction results. This is mainly because the IDW formula does not have any offset parameters to compensate for the big difference between the weather stations' public data and the data collected by the personal equipment used by volunteers. This could be due to differences in the methods and devices of those two data providers.

Each table has eight result columns; the column headers are not preserved in this text.

Table 1
MAE    5.190319201  3.720370835  1.619832154  2.874009812  3.233921439  1.695290448  6.465190052  4.815504659
RMSE   8.732748788  5.511739014  2.095919331  4.055352722  4.341966928  1.707219278  9.724716828  7.436923815
SMAPE  0.45931373   0.406428735  0.133032135  0.35371517   0.468214919  0.625317245  0.444137991  0.400557289

Table 2
MAE    30.15104  13.80071  18.85267  12.69285  11.92978  14.99076  12.27167  7.664357
RMSE   34.62797  18.2614   20.40416  16.3694   14.12164  15.85102  15.1809   9.571268
SMAPE  0.729989  0.399087  1.218212  0.411915  0.452494  0.562354  0.364154  0.257642

Table 3
MAE    11.14697072  13.71316126  12.15603603  12.91552723  15.72452576  30.3013034   14.62686484  22.0919231
RMSE   16.74763774  18.17918429  14.13207772  15.99672071  19.40818331  31.07255621  18.79131409  31.69232972
SMAPE  0.474838877  0.595873229  0.554840783  0.53328839   0.728461886  1.600495059  0.490170718  0.58440423

Table 4
(unlabeled)  18.21506046  18.10474466  30.32401094  10.79848535  14.29939129  23.5094483   16.31585216  12.93598111
SMAPE        0.496721967  0.49921946   0.311432437  0.389208159  0.44466795   0.521219253  0.4097449    0.378573048

We intend to explore more advanced algorithms in our future work, such as more advanced forms of IDW [4] and the combination of IDW with multiple regression. Also, we plan to utilize more weather

[1] M. S. Dao, P. J. Zhao, N. T. Nguyen, T. B. Nguyen, D. T. Dang-Nguyen, and C. Gurrin. 2020. Overview of MediaEval 2020: Insights for Wellbeing Task - Multimodal Personal Health Lifelog Data Analysis. In MediaEval Benchmarking Initiative for Multimedia Evaluation, CEUR Workshop Proceedings.

[2] Leonardo Ramos Emmendorfer and Graçaliz Pereira Dimuro. 2020. A Novel Formulation for Inverse Distance Weighting from Weighted Linear Regression. In Computational Science - ICCS 2020, Valeria Krzhizhanovskaya, Gábor Závodszky, Michael H. Lees, Jack J. Dongarra, Peter M. A. Sloot, Sérgio Brissos, and João Teixeira (Eds.). Springer International Publishing, Cham, 576-589.

[3] Jin Li and Andrew D. Heap. 2011. A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors. Ecological Informatics 6, 3 (2011), 228-241. https://doi.org/10.1016/j.ecoinf.2010.12.003

[4] Donald Shepard. 1968. A Two-Dimensional Interpolation Function for Irregularly-Spaced Data. In Proceedings of the 1968 23rd ACM National Conference (ACM '68). Association for Computing Machinery, New York, NY, USA, 517-524. https://doi.org/10.1145/800186.810616