Leveraging Egocentric and Surrounding Environment Data to Adaptively Measure a Personal Air Quality Index

Dang-Hieu Nguyen1, Minh-Tam Nguyen2, Loc Tai Tan Nguyen3
1,2,3 University of Information Technology, Vietnam
hieund.12@grad.uit.edu.vn, tamnm.12@grad.uit.edu.vn, locntt.12@grad.uit.edu.vn

Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval'19, 27-29 October 2019, Sophia Antipolis, France

ABSTRACT
This paper introduces a new solution for measuring the personal air quality index that reflects the egocentric perspective of human beings with respect to their surrounding environment. Two instances of the solution are introduced and evaluated using the MediaEval 2019 Insights for wellbeing task dataset and evaluation metric. The first instance calculates the Air Quality Index (AQI) from sensor data and uses the user's tags and visual features to measure the personal AQI adaptively. The second instance leverages the average value of the user's tags and the feature of the route to determine the personal AQI. The performance of these two instances is also discussed.

1 INTRODUCTION
In [2], the authors present evidence gathered from many reference sources and point out the impact of air pollution on individuals from several perspectives (health, psychology). The pollution factors mentioned include environmental factors (e.g., fine particulate matter PM2.5, nitrogen dioxide NO2, ozone O3, sulfur dioxide SO2), weather variables (e.g., temperature, humidity), urban nature, and traffic. Unfortunately, most investigations in this domain focus on measuring the air quality index from sensor data, without considering how people perceive the air quality around them.

The MediaEval 2019 Insights for wellbeing task [1] introduces an interesting subtask of measuring the personal air quality index (PAQI). The PAQI is defined as the personal feeling of AQI compared with the real AQI calculated from sensor data. The subtask requires measuring the PAQI using egocentric data (e.g., lifelog images, heartbeat, step counts, user's annotations) and surrounding environment data (e.g., air pollution, weather).

The definitions, dataset, and evaluation metric of this subtask are described in [1].

2 METHODOLOGY
As mentioned above, environmental factors, weather variables, urban nature, and traffic all have an impact on individuals. Observing the dataset provided by the subtask, we found that main streets with lots of traffic and fewer trees have a low PAQI, and vice versa. This observation suggests measuring PAQI using AQI, user's tags, and visual features. Two instances of the solution are introduced and evaluated using the MediaEval 2019 Insights for wellbeing task dataset and evaluation metric. The first instance calculates the Air Quality Index (AQI) from sensor data and uses the user's tags and visual features to measure the personal AQI adaptively. The second instance leverages the average value of the user's tags and the feature of the route to determine the personal AQI.

2.1 Data Processing
First, data along each route are pre-processed to remove noise and outliers. The necessary interpolations are carried out to compensate for missing data. Then, two instances (runs) of the proposed solution are constructed as follows:

Run 1: From the dataset, we can identify a group of users walking along a specific route. Since the 2018 dataset is recorded per second, we convert the recording time to minutes, retaining the highest value of each factor within each 1-minute window. Then we calculate AQI using these factors (e.g., PM2.5, NO2, O3). Next, visual features are extracted from images.

Run 2: We first collect all data in the same group, then keep only the data associated with user's tags. Next, we divide each segment of a route into four smaller segments. The aim is to make each sub-segment as straight as possible so that a circular sweep can cover all the points tagged on it. Then, we sweep a circle whose radius equals the distance between the small segments; if any tag points fall within this range, we collect them and calculate the average value of those user's tags. For example, assume the distance between line_start and line_end is 100 m: we divide it into four road segments of 25 m each, which yields three new points in between.

2.2 Visual Features Extraction
We use the visual features provided by the task's organizers. In addition, we developed a tool that crawls images from Google Street View using the coordinates provided in the dataset, in order to enrich the image dataset. Finally, we extract traffic density and tree density from these images.

2.3 PAQI Measurement
In this section, we use the input data obtained in Section 2.1 for each run.

2.3.1 Run 1: The PAQI measurement starts from the value given by the AQI calculation formula. We then use the user's tags, traffic density, and tree density to adjust the AQI values. We build a function to adaptively adjust AQI into PAQI as follows:

$$f(x) = \frac{\sum_{i=1}^{n} (factor_i \cdot \alpha_i)}{n} \qquad (1)$$

Where: $\sum_{i=1}^{n} \alpha_i = 1$; $factor_i$: input data such as user's tags and visual features.

The PAQI value is given by:

$$\mathrm{PAQI} = \mathrm{AQI} \cdot f(x) \qquad (2)$$

Finally, we adjust the value of $\alpha_i$ for each route to get the final PAQI. The value of $\alpha_i$ is calculated based on the factors' values. If the factors' values are high, $\alpha_i$ increases and the PAQI is high. If the factors' values are low, $\alpha_i$ decreases and the PAQI is low.

We set the parameters as follows: $factor_1 \leftarrow$ user's tags, $factor_2 \leftarrow$ traffic density, $factor_3 \leftarrow$ tree density, with $\alpha_1 + \alpha_2 + \alpha_3 = 1$. First, we define $\alpha_1 = \alpha_2 = \alpha_3 = \frac{1}{3}$. Then we use an ad hoc approach to calculate the factors' values and adjust the value of $\alpha$ corresponding to each factor. For $factor_1$, if its value is larger than the predefined threshold (2.5 in our case), $\alpha_1$ increases; otherwise $\alpha_1$ decreases. For $factor_2$, if its value is high, $\alpha_2$ decreases; otherwise $\alpha_2$ increases. For $factor_3$, if its value is high, $\alpha_3$ increases; otherwise $\alpha_3$ decreases. This optimization loop continues until convergence, with each $\alpha$ constrained to a maximum of 1 and a minimum of 0.

2.3.2 Run 2: First, we use the line_start and line_end points to determine the features of routes 1, 2, and 3; these features are listed in Table 1. Second, we calculate the average value of the user's tags. Third, we calculate the weight of each route:

$$w_r = \frac{\mathrm{PAQI}_{input}}{\mathrm{avg}(\text{user's tags})} \qquad (3)$$

Where: $w_r$: the weight of the route; $\mathrm{PAQI}_{input}$: taken from the Development Dataset for routes 1, 2, and 3; $\mathrm{avg}(\text{user's tags})$: the average of the user's tags on routes 1, 2, and 3.

Because we cannot find the "Mountain trail" feature in the Development Dataset, we assume that the "Mountain trail" feature has the same weight as the "Bayside path" feature. Table 1 shows the weights obtained for routes 1, 2, and 3 on the development dataset.

Table 1. The weight of features
Feature        Weight    Feature               Weight
Main street    0.33      Shopping street       0.5
Path           0.67      Underground arcade    0.25
Sightseeing    1.5       Garden                1.5
Street         0.5       Bayside path          2
Park           1.5       Mountain trail        2

We use the route weights in Table 1 to infer the PAQI values of routes 4 and 5 with the formula below:

$$\mathrm{PAQI}_{output} = w_r \cdot \mathrm{avg}(\text{user's tags}) \qquad (4)$$

Where: $\mathrm{PAQI}_{output}$: the predicted value for routes 4 and 5; $w_r$: the weight from Table 1; $\mathrm{avg}(\text{user's tags})$: the average of the user's tags on routes 4 and 5.

3 RESULTS AND ANALYSIS
The experimental results on the training dataset are given in Table 2. The results show that we can measure PAQI with acceptable accuracy. Table 3 shows the results on the testing dataset. Table 2 does not include the result of run 2 because we only obtain the route weights from the Development data and use them to infer PAQI for the Testing data.

Table 2. Results running on the training dataset
Route / ground-truth list      Run 1 list
Course 1 / (1,2,3,4,1)         1,2,2,3,2
Course 2 / (1,2,3,4,1)         1,2,2,4,2
Course 3 / (2,1,2,0,3,3,2)     2,1,2,2,4,2,2

Table 3. Results running on the testing dataset
Group_id    Subtask_id    Run_id    Score
SHT_UIT     2             1         0.8
SHT_UIT     2             2         1

In Table 3, we can see that run 1 performs better than run 2. This shows that measuring PAQI by adaptively adjusting AQI with user's tags, traffic density, and tree density is more effective than the approach that uses the average of the user's tags and the feature of the route. In run 2, since we round numbers when calculating the average of the user's tags, the rounding could affect the overall performance.

4 CONCLUSION
In this paper, we report our solution for the challenge raised by the MediaEval 2019 Insights for wellbeing task, subtask 2. We introduce an ad hoc approach that adaptively uses user's tags, traffic density, and tree density observed along a route to adjust the AQI value towards an acceptable personal AQI value.

REFERENCES
[1] Minh-Son Dao, Peijiang Zhao, Tomohiro Sato, Koji Zettsu, Duc-Tien Dang-Nguyen, Cathal Gurrin, and Ngoc-Thanh Nguyen. 2019. Overview of MediaEval 2019: Insights for Wellbeing Task: Multimodal Personal Health Lifelog Data Analysis. In MediaEval2019 Working Notes (CEUR Workshop Proceedings). CEUR-WS.org, Sophia Antipolis, France.
[2] Siqi Zheng, Jianghao Wang, Cong Sun, Xiaonan Zhang, and Matthew E. Kahn. 2019. Air pollution lowers Chinese urbanites' expressed happiness on social media. Nature Human Behaviour 3, 3 (2019), 237.
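
As an illustration of Equations (1) to (4) in Section 2.3, the following minimal Python sketch shows how the two runs could be computed once the inputs are available. It is not the authors' implementation: the function names, the input scales, and the sample numbers are hypothetical and chosen only for illustration. The iterative update of the α values in Run 1 is not reproduced, because the paper does not state the step size or the convergence criterion.

```python
from statistics import mean


def adjustment(factors, alphas):
    """Eq. (1): f(x) = sum(factor_i * alpha_i) / n, with sum(alpha_i) = 1."""
    if len(factors) != len(alphas):
        raise ValueError("factors and alphas must have the same length")
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("alphas must sum to 1")
    n = len(factors)
    return sum(f * a for f, a in zip(factors, alphas)) / n


def paqi_run1(aqi, factors, alphas):
    """Eq. (2): PAQI = AQI * f(x)."""
    return aqi * adjustment(factors, alphas)


def route_weight(paqi_input, user_tags):
    """Eq. (3): w_r = PAQI_input / avg(user's tags), from development routes."""
    return paqi_input / mean(user_tags)


def paqi_run2(weight, user_tags):
    """Eq. (4): PAQI_output = w_r * avg(user's tags), for unseen routes."""
    return weight * mean(user_tags)


if __name__ == "__main__":
    # Hypothetical inputs: user's tag, traffic density, tree density.
    factors = [3.0, 1.0, 2.0]
    # Hypothetical alphas after adjustment; they must still sum to 1.
    alphas = [0.5, 0.25, 0.25]
    print(paqi_run1(4.0, factors, alphas))

    # Run 2: learn a weight on a development route, reuse it on a new route.
    w = route_weight(2.0, [1, 1])
    print(paqi_run2(w, [1, 2, 3]))
```

Keeping Equation (3) and Equation (4) as separate functions mirrors the split described in Section 2.3.2, where the weight is estimated on routes 1 to 3 and then applied to routes 4 and 5.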