<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Personal Air Quality Index Prediction Using Inverse Distance Weighting Method</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Trung-Quan</forename><surname>Nguyen</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Dang-Hieu</forename><surname>Nguyen</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Loc</forename><surname>Tai</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Tan</forename><surname>Nguyen</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Information Technology</orgName>
								<address>
									<settlement>Ho Chi Minh City</settlement>
									<country key="VN">Vietnam</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Vietnam National University</orgName>
								<address>
									<settlement>Ho Chi Minh City</settlement>
									<country key="VN">Vietnam</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Personal Air Quality Index Prediction Using Inverse Distance Weighting Method</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">07AE8E6EFC9B8FBC5F2EEF370AC59789</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we propose a method to predict the personal air quality index in an area using only the levels of three pollutants: PM2.5, NO2, and O3. All of them are measured at weather stations near that area. Our approach uses one of the best-known interpolation methods in spatial analysis, Inverse Distance Weighting (IDW), to estimate the missing air pollutant levels. We then use those levels to calculate the Air Quality Index (AQI). The results show that the proposed method is suitable for the prediction of those air pollutant levels.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Personal air pollution data matters because air quality data for an individual's immediate surroundings is likely to be more accurate than global data measured at faraway weather stations. The problem is finding a suitable method to predict air quality in a local area from the global data. This paper reports our solution to this challenge.</p><p>For more about the challenge and the dataset we use, see the overview paper of MediaEval 2020 - Insights for Wellbeing: Multimodal Personal Health Lifelog Data Analysis <ref type="bibr" target="#b0">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">RELATED WORK</head><p>The inverse distance weighting method <ref type="bibr" target="#b3">[4]</ref> is commonly used in spatial interpolation <ref type="bibr" target="#b2">[3]</ref>. This paper applies the basic form of IDW without any modification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">APPROACH</head><p>We chose IDW because we had limited time to experiment with algorithms that need longer training, such as neural networks. Moreover, because it involves no statistical assumptions <ref type="bibr" target="#b1">[2]</ref>, it is simpler than Kriging or other statistical interpolation methods, and it is easy to understand. Based on the assumption that closer points have more similar values than points farther apart, it uses the measured values surrounding the unknown point to predict its value. Each known point is given a weight, and the predicted value is the weighted average of the known values.</p><p>The weight 𝑤ᵢ for the 𝑖-th known point 𝑥ᵢ is the inverse of the distance 𝑑 from that point to the unknown point 𝑥, raised to the power 𝑝:</p><formula xml:id="formula_0">w_i = \frac{1}{d(x, x_i)^{p}}<label>(1)</label></formula><p>where 𝑝 is the power value that controls how quickly the weight falls off with distance. Note that the Haversine formula is used to calculate the distance between two coordinates.</p><p>The value 𝑦 at an unknown point 𝑥 is calculated as:</p><formula xml:id="formula_1">y(x) = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}<label>(2)</label></formula><p>where 𝑤ᵢ is the weight and 𝑋ᵢ is the value of the 𝑖-th known point.</p></div>
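<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equations (1) and (2), together with the Haversine distance, can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the submitted runs: the names haversine_km, idw, and EARTH_RADIUS_KM, the choice of kilometres, and the handling of a station that coincides with the target point are all assumptions made for the example.</p>
<p><code lang="python">
```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw(target, stations, p=2.0):
    """Weighted average of station values, following equations (1) and (2).

    target   -- (lat, lon) of the unknown point
    stations -- iterable of (lat, lon, value) for the known points
    p        -- power value of equation (1)
    """
    num = 0.0
    den = 0.0
    for lat, lon, value in stations:
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0:
            # Assumption: a station at the target location returns its own
            # value, which avoids dividing by zero in equation (1).
            return value
        w = 1.0 / d ** p      # equation (1)
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("at least one known point is required")
    return num / den          # equation (2)
```
</code></p>
<p>For example, idw((35.68, 139.76), stations) with stations given as (lat, lon, value) triples returns a single interpolated value for that location; the coordinates here are illustrative only.</p></div>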
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Prediction</head><p>First, we list all hourly time frames by grouping the training data by hour. We then loop through the training data one time frame at a time.</p><p>In each iteration, we get the coordinates of all unknown points that need to be predicted. We then get the values of the known points and their coordinates for the same time frame from the public air pollution data provided by 26 weather stations surrounding the Tokyo area.</p><p>With all the necessary data gathered, we use the IDW formula to make the prediction. The initial power value 𝑝 of the IDW formula is 2.</p><p>After repeating those steps for each air pollutant (PM2.5, NO2, O3), we have the final results.</p></div>
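<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-time-frame loop described above could look roughly like the following Python sketch. The tuple layouts, the names hour_key and predict_by_hour, and the decision to leave a prediction empty when no station reported in a given hour are assumptions made for illustration; the interpolation itself is passed in as a callable so that any IDW implementation can be plugged in.</p>
<p><code lang="python">
```python
from collections import defaultdict
from datetime import datetime


def hour_key(ts):
    """Truncate a datetime to the start of its hour (the time-frame key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def predict_by_hour(targets, readings, interpolate):
    """Interpolate each target point from station readings of the same hour.

    targets     -- list of (timestamp, lat, lon) for the unknown points
    readings    -- list of (timestamp, lat, lon, value) from the stations
    interpolate -- callable((lat, lon), [(lat, lon, value), ...]) returning a float
    """
    # Group the station readings by hourly time frame.
    stations_by_hour = defaultdict(list)
    for ts, lat, lon, value in readings:
        stations_by_hour[hour_key(ts)].append((lat, lon, value))

    # Group the unknown points the same way, remembering their input order.
    targets_by_hour = defaultdict(list)
    for idx, (ts, lat, lon) in enumerate(targets):
        targets_by_hour[hour_key(ts)].append((idx, lat, lon))

    predictions = [None] * len(targets)
    for frame, points in targets_by_hour.items():
        known = stations_by_hour.get(frame)
        if not known:
            continue  # assumption: keep None when no station reported this hour
        for idx, lat, lon in points:
            predictions[idx] = interpolate((lat, lon), known)
    return predictions
```
</code></p>
<p>Under these assumptions the function would be called once per pollutant (PM2.5, NO2, O3), each time with that pollutant's station readings.</p></div>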
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Optimization</head><p>To improve performance, we search for the optimal power value 𝑝 by trying different values until IDW produces acceptable SMAPE, RMSE, and MAE values.</p><p>After evaluating values of 𝑝 from 0 to 5, we find that the best power values for PM2.5, NO2, and O3 are 1.5, 3.5, and 0, respectively. Note that with 𝑝 = 0 every weight equals 1, so the IDW prediction reduces to the unweighted mean of the station values.</p></div>
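<div xmlns="http://www.tei-c.org/ns/1.0"><p>A search over the power value can be sketched as below. Several details are assumptions, since the text does not specify them: the step of 0.5 between candidate values, the use of RMSE as the single selection criterion, and the SMAPE definition on a 0 to 2 scale (chosen because it is a common form, not because the task organizers are known to use it). The names mae, rmse, smape, search_power, and powers are hypothetical.</p>
<p><code lang="python">
```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - q) for t, q in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true))


def smape(y_true, y_pred):
    """Symmetric MAPE on a 0..2 scale (one common definition; assumed here)."""
    total = 0.0
    for t, q in zip(y_true, y_pred):
        denom = abs(t) + abs(q)
        total += 0.0 if denom == 0 else 2.0 * abs(t - q) / denom
    return total / len(y_true)


def search_power(predict, y_true, powers, score=rmse):
    """Return (best_p, best_score) over the candidate power values.

    predict -- callable(p) returning predictions aligned with y_true
    score   -- error metric to minimise (RMSE by default; an assumption)
    """
    best_p, best_score = None, math.inf
    for p in powers:
        s = score(y_true, predict(p))
        if best_score > s:
            best_p, best_score = p, s
    return best_p, best_score


# Candidate values 0.0, 0.5, ..., 5.0; the step size is an assumption.
powers = [i * 0.5 for i in range(11)]
```
</code></p>
<p>Here predict would wrap the IDW prediction for one pollutant with a given 𝑝, and the search would be repeated per pollutant.</p></div>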
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">RESULTS AND ANALYSIS</head><p>The evaluation results for the PM2.5, NO2, O3, and AQI predictions, provided by the MediaEval task organizers, are shown in Table <ref type="table" target="#tab_0">1</ref>, Table <ref type="table" target="#tab_1">2</ref>, Table <ref type="table" target="#tab_2">3</ref>, and Table <ref type="table" target="#tab_3">4</ref>, respectively.</p><p>In general, the PM2.5 prediction is acceptable, but there is a big gap in the NO2 and O3 prediction results. This is mainly because the IDW formula has no offset parameter to compensate for the large difference between the weather stations' public data and the measurements taken with the personal equipment used by volunteers. That difference could stem from differences in the methods and devices of the two data providers.</p><p>MediaEval'20, December 14-15 2020, Online Quan N.T. et al. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">DISCUSSION AND OUTLOOK</head><p>In future work, we intend to explore more advanced algorithms, such as the advanced form of IDW <ref type="bibr" target="#b3">[4]</ref> and the combination of IDW with multiple regression. We also plan to use more weather data, such as wind direction, wind speed, and temperature, to improve accuracy.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Evaluation of the PM2.5 prediction</figDesc><table><row><cell>Sensor</cell><cell>MAE</cell><cell>RMSE</cell><cell>SMAPE</cell></row><row><cell>100001</cell><cell>5.190319201</cell><cell>8.732748788</cell><cell>0.45931373</cell></row><row><cell>100002</cell><cell>3.720370835</cell><cell>5.511739014</cell><cell>0.406428735</cell></row><row><cell>100003</cell><cell>1.619832154</cell><cell>2.095919331</cell><cell>0.133032135</cell></row><row><cell>100005</cell><cell>2.874009812</cell><cell>4.055352722</cell><cell>0.35371517</cell></row><row><cell>100006</cell><cell>3.233921439</cell><cell>4.341966928</cell><cell>0.468214919</cell></row><row><cell>100007</cell><cell>1.695290448</cell><cell>1.707219278</cell><cell>0.625317245</cell></row><row><cell>200003</cell><cell>6.465190052</cell><cell>9.724716828</cell><cell>0.444137991</cell></row><row><cell>200004</cell><cell>4.815504659</cell><cell>7.436923815</cell><cell>0.400557289</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Evaluation of the NO2 prediction</figDesc><table><row><cell>Sensor</cell><cell>MAE</cell><cell>RMSE</cell><cell>SMAPE</cell></row><row><cell>100001</cell><cell>30.15104</cell><cell>34.62797</cell><cell>0.729989</cell></row><row><cell>100002</cell><cell>13.80071</cell><cell>18.2614</cell><cell>0.399087</cell></row><row><cell>100003</cell><cell>18.85267</cell><cell>20.40416</cell><cell>1.218212</cell></row><row><cell>100005</cell><cell>12.69285</cell><cell>16.3694</cell><cell>0.411915</cell></row><row><cell>100006</cell><cell>11.92978</cell><cell>14.12164</cell><cell>0.452494</cell></row><row><cell>100007</cell><cell>14.99076</cell><cell>15.85102</cell><cell>0.562354</cell></row><row><cell>200003</cell><cell>12.27167</cell><cell>15.1809</cell><cell>0.364154</cell></row><row><cell>200004</cell><cell>7.664357</cell><cell>9.571268</cell><cell>0.257642</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Evaluation of the O3 prediction</figDesc><table><row><cell>Sensor</cell><cell>MAE</cell><cell>RMSE</cell><cell>SMAPE</cell></row><row><cell>100001</cell><cell>11.14697072</cell><cell>16.74763774</cell><cell>0.474838877</cell></row><row><cell>100002</cell><cell>13.71316126</cell><cell>18.17918429</cell><cell>0.595873229</cell></row><row><cell>100003</cell><cell>12.15603603</cell><cell>14.13207772</cell><cell>0.554840783</cell></row><row><cell>100005</cell><cell>12.91552723</cell><cell>15.99672071</cell><cell>0.53328839</cell></row><row><cell>100006</cell><cell>15.72452576</cell><cell>19.40818331</cell><cell>0.728461886</cell></row><row><cell>100007</cell><cell>30.3013034</cell><cell>31.07255621</cell><cell>1.600495059</cell></row><row><cell>200003</cell><cell>14.62686484</cell><cell>18.79131409</cell><cell>0.490170718</cell></row><row><cell>200004</cell><cell>22.0919231</cell><cell>31.69232972</cell><cell>0.58440423</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Evaluation of the AQI prediction</figDesc><table><row><cell>Sensor</cell><cell>MAE</cell><cell>RMSE</cell><cell>SMAPE</cell></row><row><cell>100001</cell><cell>18.21506046</cell><cell>34.20371647</cell><cell>0.496721967</cell></row><row><cell>100002</cell><cell>18.10474466</cell><cell>38.8695944</cell><cell>0.49921946</cell></row><row><cell>100003</cell><cell>30.32401094</cell><cell>78.4465017</cell><cell>0.311432437</cell></row><row><cell>100005</cell><cell>10.79848535</cell><cell>19.6665506</cell><cell>0.389208159</cell></row><row><cell>100006</cell><cell>14.29939129</cell><cell>34.48844262</cell><cell>0.44466795</cell></row><row><cell>100007</cell><cell>23.5094483</cell><cell>60.19537217</cell><cell>0.521219253</cell></row><row><cell>200003</cell><cell>16.31585216</cell><cell>22.42326978</cell><cell>0.4097449</cell></row><row><cell>200004</cell><cell>12.93598111</cell><cell>19.188617</cell><cell>0.378573048</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of MediaEval 2020: Insights for Wellbeing Task - Multimodal Personal Health Lifelog Data Analysis</title>
		<author>
			<persName><forename type="first">P</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">B</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">T</forename><surname>Dang-Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gurrin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Dao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval Benchmarking Initiative for Multimedia Evaluation</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A Novel Formulation for Inverse Distance Weighting from Weighted Linear Regression</title>
		<author>
			<persName><forename type="first">Leonardo</forename><forename type="middle">Ramos</forename><surname>Emmendorfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Graçaliz</forename><forename type="middle">Pereira</forename><surname>Dimuro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Science - ICCS 2020</title>
				<editor>
			<persName><forename type="first">Valeria</forename><forename type="middle">V</forename><surname>Krzhizhanovskaya</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Gábor</forename><surname>Závodszky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Michael</forename><forename type="middle">H</forename><surname>Lees</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Jack</forename><forename type="middle">J</forename><surname>Dongarra</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Peter</forename><forename type="middle">M A</forename><surname>Sloot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Sérgio</forename><surname>Brissos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">João</forename><surname>Teixeira</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="576" to="589" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors</title>
		<author>
			<persName><forename type="first">Jin</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">D</forename><surname>Heap</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ecoinf.2010.12.003</idno>
		<ptr target="https://doi.org/10.1016/j.ecoinf.2010.12.003" />
	</analytic>
	<monogr>
		<title level="j">Ecological Informatics</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="228" to="241" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A Two-Dimensional Interpolation Function for Irregularly-Spaced Data</title>
		<author>
			<persName><forename type="first">Donald</forename><surname>Shepard</surname></persName>
		</author>
		<idno type="DOI">10.1145/800186.810616</idno>
		<ptr target="https://doi.org/10.1145/800186.810616" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1968 23rd ACM National Conference (ACM &apos;68)</title>
				<meeting>the 1968 23rd ACM National Conference (ACM &apos;68)<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="1968">1968</date>
			<biblScope unit="page" from="517" to="524" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
