<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Personal Air Quality Index Prediction Using Inverse Distance Weighting Method</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Trung-Quan Nguyen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dang-Hieu Nguyen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Loc Tai Tan Nguyen</string-name>
          <email>locntt.12@grad.uit.edu.vn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vietnam National University</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>In this paper, we propose a method to predict the personal air quality index in an area using only the levels of the following pollutants: PM2.5, NO2, and O3, all measured at the weather stations near that area. Our approach uses one of the most well-known interpolation methods in spatial analysis, the Inverse Distance Weighted (IDW) technique, to estimate the missing air pollutant levels. We then use those levels to calculate the Air Quality Index (AQI). The results show that the proposed method is suitable for the prediction of those air pollutant levels.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Knowing personal air pollution data is vital because air quality data
for an individual's immediate area is likely to be more accurate than the
global data measured at faraway weather stations. The problem is finding a
suitable method to predict air quality data in a local area from the global
data. This paper reports our solution to this challenge.</p>
      <p>
        For more details about this challenge and the dataset that we use,
refer to the overview paper of MediaEval 2020 - Insight
for Wellbeing: Multimodal personal health lifelog data analysis [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        The inverse distance weighting (IDW) method [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is commonly used in
spatial interpolation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This paper applies the basic form of
IDW without any modification.
      </p>
    </sec>
    <sec id="sec-3">
      <title>APPROACH</title>
      <p>
        Because we had limited time to experiment with algorithms that
need longer training, such as neural network-based
algorithms, we chose IDW. Moreover, because it involves no
statistical assumptions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], it is simpler than Kriging or
other statistical interpolation methods. The way it works is easy to
understand. Based on the assumption that closer points have
more similar values than farther points, it uses the measured values
surrounding the unknown point to predict its value. Each known point
is given a weight, and the predicted value is the weighted average
of those points.
      </p>
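      <p>The weight <italic>w</italic><sub><italic>i</italic></sub> for a known point <italic>x</italic><sub><italic>i</italic></sub> is the inverse of the distance
<italic>d</italic> from that point to the unknown point <italic>x</italic>, which is computed as:
        <disp-formula><label>(1)</label><tex-math>w_i = \frac{1}{d(x, x_i)^{p}}</tex-math></disp-formula>
where <italic>p</italic> is the power value that is used to control the value of the
weight. Note that the Haversine formula is used to
calculate the distance between the two coordinates.</p>
      <p>The value <italic>u</italic> of an unknown point <italic>x</italic> is calculated as:
        <disp-formula><label>(2)</label><tex-math>u(x) = \frac{\sum_{i=1}^{N} w_i u_i}{\sum_{i=1}^{N} w_i}</tex-math></disp-formula>
where <italic>w</italic><sub><italic>i</italic></sub> is the weight, <italic>u</italic><sub><italic>i</italic></sub> is the value of the <italic>i</italic>-th known point, and <italic>N</italic> is the number of known points.</p>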
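      <p>The two formulas above, together with the Haversine distance, can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in our experiments: the function names, the Earth radius constant, the (latitude, longitude, value) tuple layout, and the handling of a zero distance are all assumptions made for the example.</p>
      <preformat>
```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius in km (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def idw_predict(target, stations, power=2.0):
    """Estimate the value at target = (lat, lon) from stations,
    a list of (lat, lon, value) tuples, using Eq. (1) and Eq. (2)."""
    if not stations:
        raise ValueError("at least one known point is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, value in stations:
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0:
            # Assumed convention: a target that coincides with a station
            # takes that station's value, avoiding a division by zero.
            return value
        w = 1.0 / d ** power  # Eq. (1)
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total  # Eq. (2)
```
      </preformat>
      <p>With two stations at equal distance from the target, the sketch returns the plain mean of their values; as one station moves closer, its value dominates the estimate, and a larger power makes that effect stronger.</p>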
    </sec>
    <sec id="sec-4">
      <title>Prediction</title>
      <p>First, we list all possible one-hour time frames by grouping
the training data. Then, we loop through the training data
one time frame at a time.</p>
      <p>In each iteration, we get the coordinates of all unknown points that
need to be predicted. After that, we get the values of the known
points and their respective coordinates, for the same time frame, from the
public air pollution data provided by 26 weather stations surrounding the
Tokyo area.</p>
      <p>With all the necessary data gathered, we use the IDW
formula to make the prediction. Note that the initial power
value <italic>p</italic> of the IDW formula is 2.</p>
      <p>After repeating those steps for each air pollutant (PM2.5,
NO2, O3), we have the final results.</p>
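      <p>The per-time-frame loop described above might look like the following sketch, run once per pollutant. It is an illustration under stated assumptions: the record layouts, the function names, and the choice to leave a prediction empty when no station reported in that hour are hypothetical, and the interpolator is passed in as a callable so that any IDW routine can be plugged in. The timestamps, coordinates, and values in the small example are made up.</p>
      <preformat>
```python
from collections import defaultdict
from datetime import datetime


def hour_frame(timestamp):
    """Truncate a datetime to the start of its hour (the time-frame key)."""
    return timestamp.replace(minute=0, second=0, microsecond=0)


def predict_by_time_frame(queries, station_readings, interpolate, power=2.0):
    """queries: list of (timestamp, lat, lon) points to predict.
    station_readings: list of (timestamp, lat, lon, value) for one pollutant.
    interpolate: callable(target, stations, power) returning a float.
    Returns one prediction per query, in query order."""
    # Group the known station values by one-hour time frame.
    stations_by_frame = defaultdict(list)
    for ts, lat, lon, value in station_readings:
        stations_by_frame[hour_frame(ts)].append((lat, lon, value))

    # Group the unknown points by the same key, remembering their position.
    queries_by_frame = defaultdict(list)
    for index, (ts, lat, lon) in enumerate(queries):
        queries_by_frame[hour_frame(ts)].append((index, (lat, lon)))

    predictions = [None] * len(queries)
    for frame, frame_queries in queries_by_frame.items():
        stations = stations_by_frame.get(frame, [])
        if not stations:
            continue  # no station data for this hour: leave the gap as None
        for index, target in frame_queries:
            predictions[index] = interpolate(target, stations, power)
    return predictions


# Placeholder interpolator for the illustration: plain mean of station values.
def mean_interpolate(target, stations, power):
    return sum(value for _, _, value in stations) / len(stations)


example_predictions = predict_by_time_frame(
    queries=[(datetime(2020, 1, 1, 9, 15), 35.68, 139.76),
             (datetime(2020, 1, 1, 10, 40), 35.70, 139.70)],
    station_readings=[(datetime(2020, 1, 1, 9, 0), 35.60, 139.70, 10.0),
                      (datetime(2020, 1, 1, 9, 0), 35.75, 139.80, 20.0),
                      (datetime(2020, 1, 1, 10, 0), 35.60, 139.70, 30.0)],
    interpolate=mean_interpolate,
)
```
      </preformat>
      <p>Passing the interpolator as an argument keeps the grouping logic independent of the interpolation method, so the same loop can be reused when the power value changes.</p>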
    </sec>
    <sec id="sec-5">
      <title>Optimization</title>
      <p>To obtain the best performance, we search for the optimal
power value <italic>p</italic> by trying different values of <italic>p</italic> until IDW
produces acceptable values of SMAPE, RMSE, and MAE.</p>
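      <p>After evaluating <italic>p</italic> values ranging from 0 to 5, we find that
the best power values for PM2.5, NO2, and O3 are 1.5, 3.5, and 0,
respectively. With <italic>p</italic> = 0, every station receives the same weight, so the
O3 prediction reduces to the plain average of the station values.</p>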
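      <p>A search over the power value can be sketched as below. The sketch is only indicative: the 0.5 step of the candidate grid, the function names, and the SMAPE definition (a fraction between 0 and 2, one common variant) are assumptions for the example, not details confirmed by the task organizers' evaluation.</p>
      <preformat>
```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def smape(actual, predicted):
    """Symmetric MAPE as a fraction in [0, 2] (assumed definition)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += abs(a - p) / denom if denom else 0.0
    return total / len(actual)


def search_power(predict, actual, candidates, metric=rmse):
    """Return (best_power, best_score) over the candidate power values.
    predict: callable(power) returning predictions aligned with actual."""
    scores = {power: metric(actual, predict(power)) for power in candidates}
    best_power = min(scores, key=scores.get)
    return best_power, scores[best_power]


# Candidate grid from 0 to 5; the 0.5 step is an assumption.
candidate_powers = [step * 0.5 for step in range(11)]
```
      </preformat>
      <p>Here predict would wrap an IDW run over the training data for a given power, and the search is repeated separately for each pollutant, since the best power differs between them.</p>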
    </sec>
    <sec id="sec-6">
      <title>RESULTS AND ANALYSIS</title>
      <p>The evaluation results for the PM2.5, NO2, O3, and AQI predictions, provided by
the MediaEval task organizers, are shown in Table 1, Table 2, Table 3,
and Table 4, respectively.</p>
      <p>In general, the PM2.5 prediction is acceptable, but there is a big gap
in the NO2 and O3 prediction results. This is mainly because the IDW
formula does not have any offset parameters to compensate for the
big difference between the weather stations’ public data and
the data collected by volunteers with personal equipment. This
difference could stem from differences in the methods and devices of
those two data providers.
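      </p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>MAE</th><th>RMSE</th><th>SMAPE</th></tr>
          </thead>
          <tbody>
            <tr><td>5.190319201</td><td>8.732748788</td><td>0.45931373</td></tr>
            <tr><td>3.720370835</td><td>5.511739014</td><td>0.406428735</td></tr>
            <tr><td>1.619832154</td><td>2.095919331</td><td>0.133032135</td></tr>
            <tr><td>2.874009812</td><td>4.055352722</td><td>0.35371517</td></tr>
            <tr><td>3.233921439</td><td>4.341966928</td><td>0.468214919</td></tr>
            <tr><td>1.695290448</td><td>1.707219278</td><td>0.625317245</td></tr>
            <tr><td>6.465190052</td><td>9.724716828</td><td>0.444137991</td></tr>
            <tr><td>4.815504659</td><td>7.436923815</td><td>0.400557289</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <table>
          <thead>
            <tr><th>MAE</th><th>RMSE</th><th>SMAPE</th></tr>
          </thead>
          <tbody>
            <tr><td>30.15104</td><td>34.62797</td><td>0.729989</td></tr>
            <tr><td>13.80071</td><td>18.2614</td><td>0.399087</td></tr>
            <tr><td>18.85267</td><td>20.40416</td><td>1.218212</td></tr>
            <tr><td>12.69285</td><td>16.3694</td><td>0.411915</td></tr>
            <tr><td>11.92978</td><td>14.12164</td><td>0.452494</td></tr>
            <tr><td>14.99076</td><td>15.85102</td><td>0.562354</td></tr>
            <tr><td>12.27167</td><td>15.1809</td><td>0.364154</td></tr>
            <tr><td>7.664357</td><td>9.571268</td><td>0.257642</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <table>
          <thead>
            <tr><th>MAE</th><th>RMSE</th><th>SMAPE</th></tr>
          </thead>
          <tbody>
            <tr><td>11.14697072</td><td>16.74763774</td><td>0.474838877</td></tr>
            <tr><td>13.71316126</td><td>18.17918429</td><td>0.595873229</td></tr>
            <tr><td>12.15603603</td><td>14.13207772</td><td>0.554840783</td></tr>
            <tr><td>12.91552723</td><td>15.99672071</td><td>0.53328839</td></tr>
            <tr><td>15.72452576</td><td>19.40818331</td><td>0.728461886</td></tr>
            <tr><td>30.3013034</td><td>31.07255621</td><td>1.600495059</td></tr>
            <tr><td>14.62686484</td><td>18.79131409</td><td>0.490170718</td></tr>
            <tr><td>22.0919231</td><td>31.69232972</td><td>0.58440423</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <table>
          <thead>
            <tr><th></th><th>SMAPE</th></tr>
          </thead>
          <tbody>
            <tr><td>18.21506046</td><td>0.496721967</td></tr>
            <tr><td>18.10474466</td><td>0.49921946</td></tr>
            <tr><td>30.32401094</td><td>0.311432437</td></tr>
            <tr><td>10.79848535</td><td>0.389208159</td></tr>
            <tr><td>14.29939129</td><td>0.44466795</td></tr>
            <tr><td>23.5094483</td><td>0.521219253</td></tr>
            <tr><td>16.31585216</td><td>0.4097449</td></tr>
            <tr><td>12.93598111</td><td>0.378573048</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>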
We intend to explore more advanced algorithms in our future work,
such as the advanced form of IDW [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or the combination of IDW
with multiple regression. Also, we plan to utilize more weather
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
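          [1]
          <string-name><given-names>P. J.</given-names> <surname>Zhao</surname></string-name>,
          <string-name><given-names>N. T.</given-names> <surname>Nguyen</surname></string-name>,
          <string-name><given-names>T. B.</given-names> <surname>Nguyen</surname></string-name>,
          <string-name><given-names>D. T.</given-names> <surname>Dang-Nguyen</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Gurrin</surname></string-name>, and
          <string-name><given-names>M. S.</given-names> <surname>Dao</surname></string-name>
          .
          <year>2020</year>
          .
          <article-title>Overview of MediaEval 2020: Insights for Wellbeing Task - Multimodal Personal Health Lifelog Data Analysis</article-title>
          .
          In
          <source>MediaEval Benchmarking Initiative for Multimedia Evaluation, CEUR Workshop Proceedings</source>
          .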
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>Leonardo Ramos</given-names> <surname>Emmendorfer</surname></string-name>
          and
          <string-name><given-names>Graçaliz Pereira</given-names> <surname>Dimuro</surname></string-name>
          .
          <year>2020</year>
          .
          <article-title>A Novel Formulation for Inverse Distance Weighting from Weighted Linear Regression</article-title>
          .
          In
          <source>Computational Science - ICCS 2020</source>
          , Valeria V. Krzhizhanovskaya, Gábor Závodszky, Michael H. Lees, Jack J. Dongarra, Peter M. A. Sloot, Sérgio Brissos, and João Teixeira (Eds.).
          <publisher-name>Springer International Publishing</publisher-name>
          ,
          <publisher-loc>Cham</publisher-loc>
          ,
          <fpage>576</fpage>
          -
          <lpage>589</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Jin</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew D.</given-names>
            <surname>Heap</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors</article-title>
          .
          <source>Ecological Informatics</source>
          <volume>6</volume>
          ,
          <issue>3</issue>
          (
          <year>2011</year>
          ),
          <fpage>228</fpage>
          -
          <lpage>241</lpage>
          . https://doi.org/10.1016/j.ecoinf.2010.12.003
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Donald</given-names>
            <surname>Shepard</surname>
          </string-name>
          .
          <year>1968</year>
          .
          <article-title>A Two-Dimensional Interpolation Function for Irregularly-Spaced Data</article-title>
          .
          In
          <source>Proceedings of the 1968 23rd ACM National Conference (ACM '68)</source>
          .
          <publisher-name>Association for Computing Machinery</publisher-name>
          ,
          <publisher-loc>New York, NY, USA</publisher-loc>
          ,
          <fpage>517</fpage>
          -
          <lpage>524</lpage>
          . https://doi.org/10.1145/800186.810616
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>