Leveraging Egocentric and Surrounding Environment Data to Adaptively Measure a Personal Air Quality Index
Dang-Hieu Nguyen1, Minh-Tam Nguyen2, Loc Tai Tan Nguyen3
1,2,3 University of Information Technology, VietNam

hieund.12@grad.uit.edu.vn, tamnm.12@grad.uit.edu.vn, locntt.12@grad.uit.edu.vn
Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’19, 27-29 October 2019, Sophia Antipolis, France

ABSTRACT
This paper introduces a new solution for measuring the personal air quality index, which reflects the egocentric perspective of human beings on their surrounding environment. Two instances of the solution are introduced and evaluated using the MediaEval 2019 Insights for Wellbeing task dataset and evaluation metric. The first instance calculates the Air Quality Index (AQI) from sensor data, then uses the user’s tags and visual features to measure the personal AQI adaptively. The second instance leverages the average value of the user’s tags and the feature of the route to determine the personal AQI. The performance of these two instances is also discussed.

1    INTRODUCTION
In [2], the authors gather evidence from many reference sources and point out the impact of air pollution on individuals from several perspectives (health, psychology). The pollution factors mentioned include environmental factors (e.g., fine particulate matter PM2.5, nitrogen dioxide NO2, ozone O3, sulfur dioxide SO2), weather variables (e.g., temperature, humidity), urban nature, and traffic. Unfortunately, most investigations in this domain focus on measuring the air quality index from sensor data, without considering how people feel about the air quality around them.
   The MediaEval 2019 Insights for Wellbeing task [1] introduces an interesting subtask of measuring the personal air quality index (PAQI). The PAQI is defined as the personal feeling of AQI compared with the real AQI calculated from sensor data. The subtask asks participants to measure the PAQI using egocentric data (e.g., lifelog images, heartbeat, step counts, user’s annotations) and surrounding environment data (e.g., air pollution, weather).
   The definitions, dataset, and evaluation metric of this subtask are described in [1].

2     METHODOLOGY
As mentioned above, environmental factors, weather variables, urban nature, and traffic all have an impact on individuals. Observing the dataset provided by the subtask, we found that main streets with lots of traffic and fewer trees tend to have a low PAQI, and vice versa. This observation suggests measuring PAQI using AQI, user’s tags, and visual features. Two instances of the solution are introduced and evaluated using the MediaEval 2019 Insights for Wellbeing task dataset and evaluation metric. The first instance calculates the Air Quality Index (AQI) from sensor data, then uses the user’s tags and visual features to measure the personal AQI adaptively. The second instance leverages the average value of the user’s tags and the feature of the route to determine the personal AQI.

2.1    Data Processing
First, data along each route are pre-processed to remove noise and outliers. Necessary interpolations are conducted to compensate for missing data. Then, two instances (runs) of the proposed solution are constructed as follows:
Run 1: From the dataset, we can identify a group of users walking along a specific route. Since the 2018 dataset is recorded by seconds, we convert each recording time to the minute, retaining the highest value of each factor within each 1-minute interval. Then we calculate AQI using these factors (e.g., PM2.5, NO2, O3). Next, visual features are extracted from images.
Run 2: We first collect all data in the same group, then keep only the data associated with user’s tags. Next, we divide each segment of one route into four smaller segments. The aim is to make each segment as straight as possible so that a radius can sweep all the points tagged on the segment. Then, we scan with a radius equal to the distance between the small segments; if any tag points are within this range, we collect them and calculate the average value of that user’s tags (e.g., assume the distance between line_start and line_end is 100 m. We divide it into four road segments of 25 m each and obtain 3 new points in between).
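The two pre-processing steps of Section 2.1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used for the runs: the function names (`per_minute_max`, `split_segment`, `average_tags_near`) are hypothetical, timestamps are assumed to be integer seconds, and coordinates are assumed to be planar (x, y) values in metres rather than the latitude/longitude pairs of the actual dataset.

```python
import math
from collections import defaultdict


def per_minute_max(records):
    """Run 1 (assumed form): keep the highest value of each factor per minute.

    records: iterable of (timestamp_in_seconds, {factor_name: value}).
    Returns {minute_index: {factor_name: max_value_in_that_minute}}.
    """
    buckets = defaultdict(dict)
    for ts, factors in records:
        minute = ts // 60
        for name, value in factors.items():
            prev = buckets[minute].get(name)
            if prev is None or value > prev:
                buckets[minute][name] = value
    return dict(buckets)


def split_segment(start, end, parts=4):
    """Run 2 (assumed form): interior points dividing start->end into equal parts.

    With parts=4 this returns the 3 new points mentioned in the text.
    """
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * k / parts, y0 + (y1 - y0) * k / parts)
        for k in range(1, parts)
    ]


def average_tags_near(points, tags, radius):
    """Average the tag values lying within `radius` of any scan point.

    tags: list of ((x, y), value). Returns None when no tag is in range.
    """
    selected = [
        value
        for (tx, ty), value in tags
        if any(math.hypot(tx - px, ty - py) <= radius for px, py in points)
    ]
    return sum(selected) / len(selected) if selected else None
```

For the 100 m example in the text, `split_segment((0, 0), (100, 0))` yields the points at 25 m, 50 m, and 75 m, and a scan radius of 25 m around the five points (both ends plus the three new ones) covers the whole segment.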

2.2    Visual Features Extraction
We use the visual features provided by the task’s organizers. In addition, we develop a tool that crawls images from Google Street View using the coordinates provided in the dataset, in order to enrich the image dataset. Finally, we extract traffic and tree density from these images.

2.3    PAQI Measurement
In this section, we use the input data obtained from Section 2.1 for each run.
   2.3.1 Run 1: The PAQI is first calculated using the AQI calculation formula. Then we use the user’s tags, traffic density, and tree density to adjust the AQI values. We build a function to adaptively adjust AQI into PAQI as follows:

$$f(x) = \frac{\sum_{i=1}^{n} (\mathit{factor}_i \cdot \alpha_i)}{n} \tag{1}$$

   Where: $\sum_{i=1}^{n} \alpha_i = 1$; $\mathit{factor}_i$: input data such as user’s tags and visual features.
   The PAQI value is specified by:

$$\mathit{PAQI} = \mathit{AQI} \cdot f(x) \tag{2}$$

   Finally, we adjust the value of $\alpha_i$ according to the route to get the final PAQI. The value of $\alpha_i$ is calculated based on the factors’ values. If the factors’ values are high, $\alpha_i$ increases and the PAQI is high. If the factors’ values are low, $\alpha_i$ decreases and the PAQI is low.
   We set the parameters as follows: $\mathit{factor}_1$ ← user’s tags, $\mathit{factor}_2$ ← traffic density, $\mathit{factor}_3$ ← tree density, $\alpha_1 + \alpha_2 + \alpha_3 = 1$. First, we define $\alpha_1 = \alpha_2 = \alpha_3 = \frac{1}{3}$. Then we use an ad-hoc approach to calculate the factors’ values and adjust the value of $\alpha$ corresponding to each factor. For $\mathit{factor}_1$, if its value is larger than the predefined threshold (2.5 in our case), $\alpha_1$ increases; otherwise $\alpha_1$ decreases. For $\mathit{factor}_2$, if its value is high, $\alpha_2$ decreases; otherwise $\alpha_2$ increases. For $\mathit{factor}_3$, if its value is high, $\alpha_3$ increases; otherwise $\alpha_3$ decreases. This optimization loop is repeated until convergence, with each $\alpha$ bounded between a minimum of 0 and a maximum of 1.
   2.3.2 Run 2: First, we use the line_start and line_end points to determine the features of routes 1, 2, and 3, which are listed in Table 1. Second, we calculate the average value of the user’s tags. Third, we calculate the weight of each route:

$$w_r = \frac{\mathit{PAQI}_{input}}{\mathrm{avg}(\text{user's tags})} \tag{3}$$

Where: $w_r$ is the weight of the route; $\mathit{PAQI}_{input}$ is taken from the Development Dataset of routes 1, 2, and 3; $\mathrm{avg}(\text{user's tags})$ is the average of the user’s tags on routes 1, 2, and 3.
Because we cannot find the “Mountain trail” feature in the Development Dataset, we assume that the weight of the “Mountain trail” feature has the same value as that of the “Bayside path” feature. Table 1 shows the weight of routes 1, 2, and 3 when running on the development dataset.

                 Table 1. The weight of features
    Feature        Weight    Feature               Weight
    Main street    0.33      Shopping street       0.5
    Path           0.67      Underground arcade    0.25
    Sightseeing    1.5       Garden                1.5
    Street         0.5       Bayside path          2
    Park           1.5       Mountain trail        2

We use the weights of routes in Table 1 to infer the PAQI value of routes 4 and 5 by the formula below:

$$\mathit{PAQI}_{output} = w_r \cdot \mathrm{avg}(\text{user's tags}) \tag{4}$$

Where: $\mathit{PAQI}_{output}$ is the predicted value for routes 4 and 5; $w_r$ is the weight from Table 1; $\mathrm{avg}(\text{user's tags})$ is the average of the user’s tags on routes 4 and 5.

3   RESULTS AND ANALYSIS
The experimental results on the training dataset are shown in Table 2. The results show that we can measure PAQI with acceptable accuracy. Table 3 shows the results on the testing dataset. Table 2 does not include the result of run 2 because we only obtain the weights of routes from the Development data and use them to infer PAQI for the Testing data.

        Table 2. Results running on the training dataset
    Route / List of course ground truth    List of course run 1
    Course 1 / (1, 2, 3, 4, 1)             1, 2, 2, 3, 2
    Course 2 / (1, 2, 3, 4, 1)             1, 2, 2, 4, 2
    Course 3 / (2, 1, 2, 0, 3, 3, 2)       2, 1, 2, 2, 4, 2, 2

        Table 3. Results running on the testing dataset
    Group_id    Subtask_id    Run_id    Score
    SHT_UIT     2             1         0.8
    SHT_UIT     2             2         1

   In Table 3, we can see that run 1 has better performance than run 2. This shows that measuring PAQI by adaptively adjusting with the user’s tags, traffic density, and tree density is more effective than the approach that uses the average of the user’s tags and the feature of the route. In run 2, we round the numbers when calculating the average of the user’s tags, which could affect the overall performance.

4   CONCLUSION
In this paper, we report our solution for the challenge raised by the MediaEval 2019 Insights for Wellbeing task, subtask 2. We introduce an ad-hoc approach that adaptively weights the user’s tags and the traffic and tree density observed along a route to re-adjust the AQI value towards an acceptable personal AQI value.
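As an illustration of Equations (1) to (4) in Section 2.3, the arithmetic of the two runs can be sketched as below. This is only a sketch under stated assumptions: the function names are hypothetical, the scaling of the factors (how tags and densities are normalised before weighting) is not specified in the text and is left to the caller, and the ad-hoc loop that updates the weights is not reproduced because its step size and stopping rule are not given.

```python
def adjustment(factors, alphas):
    """Equation (1): f(x) = sum(factor_i * alpha_i) / n, with sum(alpha_i) = 1."""
    if len(factors) != len(alphas):
        raise ValueError("factors and alphas must have the same length")
    if any(a < 0.0 or a > 1.0 for a in alphas):
        raise ValueError("each alpha must lie in [0, 1]")
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("alphas must sum to 1")
    n = len(factors)
    return sum(f * a for f, a in zip(factors, alphas)) / n


def paqi_run1(aqi, factors, alphas):
    """Equation (2): PAQI = AQI * f(x)."""
    return aqi * adjustment(factors, alphas)


def mean(values):
    """Plain arithmetic mean (no rounding, unlike the rounding noted for run 2)."""
    return sum(values) / len(values)


def route_weight(paqi_input, user_tags):
    """Equation (3): w_r = PAQI_input / avg(user's tags), from development routes."""
    return paqi_input / mean(user_tags)


def paqi_run2(weight, user_tags):
    """Equation (4): PAQI_output = w_r * avg(user's tags), for routes 4 and 5."""
    return weight * mean(user_tags)


# Illustrative numbers only (not taken from the dataset).
f_x = adjustment([4.0, 2.0, 2.0], [0.5, 0.25, 0.25])  # (2.0 + 0.5 + 0.5) / 3
w = route_weight(3.0, [2, 2, 2])                       # 3.0 / 2.0
```

Note that Equations (3) and (4) are inverses on the same route: applying `paqi_run2(route_weight(p, tags), tags)` returns `p`, so the method only generalises through the assumption that routes sharing a feature in Table 1 share a weight.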

REFERENCES
[1] Minh-Son Dao, Peijiang Zhao, Tomohiro Sato, Koji Zettsu, Duc-Tien Dang-Nguyen, Cathal Gurrin, and Ngoc-Thanh Nguyen. 2019. Overview of MediaEval 2019: Insights for Wellbeing Task: Multimodal Personal Health Lifelog Data Analysis. In MediaEval2019 Working Notes (CEUR Workshop Proceedings). CEUR-WS.org <http://ceur-ws.org>, Sophia Antipolis, France.
[2] Siqi Zheng, Jianghao Wang, Cong Sun, Xiaonan Zhang, and Matthew E Kahn. 2019. Air pollution lowers Chinese urbanites’ expressed happiness on social media. Nature Human Behaviour 3, 3 (2019), 237.