<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Healthism@MediaEval 2019 - Insights for Wellbeing Task: Factors related to Subjective and Objective Health</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qi Huang</string-name>
          <email>huangqi@stu.scu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ailin Sheng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lei Pang</string-name>
          <email>pangl01@pcl.ac.cn</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaoyong Wei</string-name>
          <email>weixy@pcl.ac.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ramesh Jain</string-name>
          <email>jain@ics.uci.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff2">
          <label>2</label>
          <institution>Peng Cheng Lab</institution>
          ,
          <addr-line>Shenzhen</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Sichuan University</institution>
          ,
          <addr-line>Chengdu</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of California Irvine</institution>
          ,
          <addr-line>Irvine</addr-line>
          ,
          <country country="US">US</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>This paper presents an overview of our methods for the two subtasks of MediaEval 2019 “Insights for Wellbeing”. The goal is to investigate the factors related to personal environmental health conditions (PEH). We model this as a regression problem in which environmental factors, measured by both physical sensors and psychological perception ratings, serve as indicators for predicting objective or subjective PEH measures. A variety of models (e.g., GBDT and LSTM) are employed for the regressions. The experimental results indicate that objective PEH is mainly determined by the factors measured by physical sensors, with temperature and humidity contributing the most. Subjective PEH is dominated by the perceptual factors collected from questionnaires and by the urban natures mined from a GIS dataset.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Many studies have investigated the relationship between
environmental conditions and human wellbeing [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ]. Although useful findings have been obtained, these demographic
conclusions provide limited guidance when applied to individual cases
(e.g., when studying diseases that are closely tied to personal exposure) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Therefore, there is an urgent need to measure personal environmental
health conditions (PEH). Thanks to recent progress in wearable devices,
PEH can now be measured through a wide range of sensors, such as those
for general conditions (e.g., temperature, humidity) and those for
specific air pollutants (e.g., PM2.5, NO2, O3). This makes datasets
like SEPHLA [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] (the first PEH dataset)
possible.
      </p>
      <p>Using SEPHLA, this paper presents our methods for the
two subtasks of MediaEval 2019 “Insights for Wellbeing”. The goal
is to investigate the main factors related to personal environmental
health conditions (PEH). In this study, PEH is measured
objectively by PM2.5 and subjectively by P-AQI (a personal air
quality index determined by a human rating system). We
model this as a regression problem in which the PEH measures are
the targets and the factors are the indicators. We investigate a
wide range of factors, including those measured by physical sensors
(e.g., temperature, humidity, heart rate), those collected from
questionnaires that measure people’s psychological perceptions of the PEH,
and urban natures extracted from OpenStreetMap (a third-party
GIS dataset). A variety of models (e.g., GBDT and LSTM) are
employed to perform the regressions.</p>
    </sec>
    <sec id="sec-2">
      <title>FACTORS OVERVIEW</title>
    </sec>
    <sec id="sec-3">
      <title>Physical Sensors</title>
      <p>Since PEH focuses on a road-level area, the factors from city-level
sensors (NO2, O3 and NO) are discarded. The remaining
factors are collected from wearable devices: PM2.5,
temperature, humidity and heart rate. For PM2.5, outliers are filtered
with a calibration strategy: any value that deviates from the mean by
more than three times the variation is replaced with the mean of its
30 surrounding values. In total, there are 43,684 and 24,055
samples in the training and testing sets, respectively.</p>
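      <p>The PM2.5 calibration step can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "variation" means the standard deviation of the whole series, that the 30 surrounding values are the 15 samples on each side of the outlier, and that neighbours are taken from the raw (unfiltered) series. The function name filter_outliers and its parameters are hypothetical.</p>
      <preformat>
```python
from statistics import mean, pstdev


def filter_outliers(values, k=3.0, window=30):
    """Replace values farther than k standard deviations from the global
    mean with the mean of up to `window` neighbouring raw values."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: "variation" = standard deviation
    half = window // 2      # assumption: 15 neighbours on each side
    cleaned = list(values)
    for i, v in enumerate(values):
        if abs(v - mu) > k * sigma:
            lo = max(0, i - half)
            hi = min(len(values), i + half + 1)
            # The outlier itself is excluded from its own neighbourhood.
            neighbours = [values[j] for j in range(lo, hi) if j != i]
            cleaned[i] = mean(neighbours)
    return cleaned
```
      </preformat>
      <p>Near the start or end of a series the window is simply truncated, so fewer than 30 neighbours are averaged there.</p>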
    </sec>
    <sec id="sec-4">
      <title>Questionnaires</title>
      <p>
        In SEPHLA, the participants answer five questions at each
checkpoint, following the data collection instructions. The questions
ask for their subjective perception of the segment:
quietness, calmness, fun, ease of walking and
crowdedness. To assign questionnaires to their corresponding checkpoints,
we use k-means to group them into clusters based on their time
distribution, with the number of clusters equal to the number of
checkpoints. In the end, we have 307 and 197 valid questionnaires for
training and testing, respectively, and each questionnaire is
represented by a 5-dimensional vector. The label (i.e., P-AQI) of each
questionnaire is derived directly from that of the corresponding segment,
following a multiple instance learning strategy [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], since the P-AQI is in fact
generated from the perceptions of all participants.
      </p>
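      <p>The checkpoint assignment can be sketched as one-dimensional k-means over questionnaire timestamps. This is an illustration only. It assumes the clustering feature is a single submission time in seconds and uses a deterministic quantile initialisation, neither of which is stated in the text. The name kmeans_1d is hypothetical.</p>
      <preformat>
```python
def kmeans_1d(times, k, iters=100):
    """Lloyd's algorithm on scalar timestamps; returns (labels, centers)."""
    ordered = sorted(times)
    # Deterministic start: k evenly spaced quantiles of the sorted times.
    centers = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = [0] * len(times)
    for _ in range(iters):
        # Assignment step: nearest centre for each timestamp.
        new_labels = [
            min(range(k), key=lambda c: abs(t - centers[c])) for t in times
        ]
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [t for t, lab in zip(times, new_labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
    return labels, centers
```
      </preformat>
      <p>With k set to the number of checkpoints, each cluster index then stands for one checkpoint, provided that participants answer at clearly separated times.</p>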
    </sec>
    <sec id="sec-5">
      <title>Urban Natures</title>
      <p>
        Since questionnaire responses can be inaccurate because of individual
differences, we use OpenStreetMap (OSM) to obtain a relatively accurate
description of the surrounding environment. In OpenStreetMap, each location is
described by a JSON file that describes the
surrounding buildings, roads and scenery. We sample locations along the
route with a sliding window whose stride is 20 meters, and draw a circle with
a radius of 25 meters around each location. We then collect
all the urban nature descriptions inside this circle and extract all named
entities with Stanford CoreNLP [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The named
entities are then manually split into road, building and scenery. Since
people have different subjective perceptions under different
urban natures, we further cluster the entities into groups,
as shown in Table 1. As with the questionnaires, every location
is assigned the same label as its corresponding
segment. In the end, each point is represented by a three-dimensional
vector, and there are 479 and 367 samples for training and testing,
respectively.
      </p>
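      <p>The location sampling described above can be sketched as follows. This is a simplified illustration that assumes the route is a list of (latitude, longitude) pairs in degrees and that each map feature has already been reduced to a name with a single coordinate. Real OpenStreetMap data would first need to be parsed into that form. All function names here are hypothetical.</p>
      <preformat>
```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sample_route(route, stride_m=20.0):
    """Keep the first point, then every point at least stride_m from the last kept one."""
    kept = [route[0]]
    for point in route[1:]:
        if haversine_m(kept[-1], point) >= stride_m:
            kept.append(point)
    return kept


def features_within(center, features, radius_m=25.0):
    """Return the names of (name, coordinate) features inside the circle."""
    return [name for name, coord in features
            if radius_m >= haversine_m(center, coord)]
```
      </preformat>
      <p>Because the radius (25 meters) exceeds the stride (20 meters), the circles of neighbouring samples overlap, so a feature near the route can contribute to more than one location.</p>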
    </sec>
    <sec id="sec-6">
      <title>RUN DESIGN AND RESULT ANALYSIS</title>
      <p>We submitted five runs for each task to explore the impact of
different factors on PEH.</p>
    </sec>
    <sec id="sec-7">
      <title>Segment Replacement</title>
      <p>
        The five runs are as follows:
• MEAN: The mean PM2.5 value of the other
participants on the same route is calculated and used to fill the hidden
values in the replaced segment. This is the baseline
run, based on the assumption that PM2.5 should not
fluctuate much within a small region.
• LOCA: The hidden values are replaced with the PM2.5 value
of the nearest point from other participants. Note that the
map distance between two points is used rather than the L2
distance. This run rests on the same assumption as
MEAN but further narrows the region.
• GBDT: LightGBM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is adopted to model the relationship
between temperature, humidity and PM2.5. By inspecting
the data, we find that the distributions of the development
and testing data are quite different. Hence, we train
the model directly on the data collected from other participants on
the same route as the hidden segment.
• FLSTM: Since the distributions of the development and testing
data are different, we propose to predict the
fluctuations obtained by subtracting the mean value of each route. An LSTM is
adopted to incorporate contextual information for
more precise prediction, and the input features are again temperature
and humidity. The hyperparameters are set as follows:
a batch size of 1,000, a learning rate of 0.0006, and 40 hidden
units. The Adam optimizer is used for 500 epochs.
• LSTM: As in run GBDT, the LSTM is trained directly on the
temperature and humidity collected from other
participants on the same route as the hidden segment. The
hyperparameter settings are the same as those of FLSTM.
As shown in Table 2, GBDT achieves the best performance, and
LSTM performs even worse than LOCA. We attribute the low
performance of LSTM to the randomness of PM2.5, which causes the
LSTM to overfit when modeling the contextual information. As for
      </p>
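      <p>The two baseline runs, MEAN and LOCA, can be sketched as below. This is an illustration under stated assumptions, not the submitted code. The readings of other participants are assumed to be available as (location, PM2.5) pairs, and the map distance is passed in as a caller-supplied function because the text does not define it. The names mean_fill and loca_fill are hypothetical.</p>
      <preformat>
```python
def mean_fill(other_values, n_hidden):
    """MEAN: fill every hidden value with the mean PM2.5 of other participants."""
    fill = sum(other_values) / len(other_values)
    return [fill] * n_hidden


def loca_fill(hidden_locations, known_points, distance):
    """LOCA: fill each hidden value with the PM2.5 of the nearest known point.

    known_points is a list of (location, pm25) pairs; distance is a
    caller-supplied function standing in for the map distance.
    """
    filled = []
    for loc in hidden_locations:
        nearest = min(known_points, key=lambda kp: distance(loc, kp[0]))
        filled.append(nearest[1])
    return filled
```
      </preformat>
      <p>Both functions use only the PM2.5 readings of other participants, which matches the shared assumption that PM2.5 changes little within a small region.</p>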
      <p>• QUEST: As described in Section 2.2 (Questionnaires), each questionnaire is
represented by a five-dimensional vector with the same
label as the corresponding segment.
• WEAR: The physical sensor factors, including temperature,
humidity and PM2.5, are assigned the same label as the
corresponding segment.
• OSM: As described in Section 2.3 (Urban Natures), the locations inside a
segment, described by their urban natures, are used as training samples.
• WOMER: The results of WEAR and OSM are fused
by average voting.
• QOMER: The results of QUEST and OSM are fused
by average voting.</p>
      <p>The performance is listed in Table 3. WEAR is the worst run, which
means that physical sensors provide essentially no clues
for P-AQI prediction. In contrast, QUEST and OSM achieve better
performance. Questionnaires are easily affected by individual
differences, whereas OSM provides a better description of the environment.
Hence, OSM achieves the best performance as a single feature. In
addition, by fusing different features, we find that the
questionnaires provide complementary information to OSM: their fusion achieves
the best performance among all five runs.</p>
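      <p>The average-voting fusion used in WOMER and QOMER can be sketched as an element-wise mean over the predictions of the fused runs. This sketch assumes each run outputs one numeric P-AQI prediction per segment, in the same segment order. The function name average_vote is hypothetical.</p>
      <preformat>
```python
def average_vote(*run_predictions):
    """Fuse per-segment predictions from several runs by taking their mean."""
    lengths = {len(run) for run in run_predictions}
    if len(lengths) != 1:
        raise ValueError("all runs must predict the same segments")
    # zip groups the predictions that refer to the same segment.
    return [sum(vals) / len(vals) for vals in zip(*run_predictions)]
```
      </preformat>
      <p>If the target is treated as a discrete level, the averaged value would still need to be rounded or thresholded. The text does not say how this was handled.</p>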
    </sec>
    <sec id="sec-8">
      <title>FUTURE WORK</title>
      <p>While the findings in this study are interesting, they are
not conclusive, because they are based on a biased set of
factors, a limited number of sensors and a limited coverage of
participants. In the future, we will extend the study by including
more sensors, such as ECG, EEG and blood sugar, and a broader
range of participants across age, gender, occupation and education.</p>
    </sec>
    <sec id="sec-9">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported by the National Natural Science
Foundation of China under Project 61906108.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>Marc-André</given-names> <surname>Carbonneau</surname></string-name>,
          <string-name><given-names>Veronika</given-names> <surname>Cheplygina</surname></string-name>,
          <string-name><given-names>Eric</given-names> <surname>Granger</surname></string-name>, and
          <string-name><given-names>Ghyslain</given-names> <surname>Gagnon</surname></string-name>.
          <year>2018</year>.
          <article-title>Multiple Instance Learning</article-title>.
          <source>Pattern Recogn.</source>
          <volume>77</volume>, <issue>C</issue> (May 2018),
          <fpage>329</fpage>-<lpage>353</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>Minh-Son</given-names> <surname>Dao</surname></string-name>,
          <string-name><given-names>Peijiang</given-names> <surname>Zhao</surname></string-name>,
          <string-name><given-names>Tomohiro</given-names> <surname>Sato</surname></string-name>, and
          <string-name><given-names>Koji</given-names> <surname>Zettsu</surname></string-name>.
          <year>2019</year>.
          <article-title>Overview of MediaEval 2019: Insights for Wellbeing Task Multimodal Personal Health Lifelog Data Analysis</article-title>.
          In <source>MediaEval 2019 workshop</source>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
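          [3]
          <string-name><given-names>Guolin</given-names> <surname>Ke</surname></string-name>,
          <string-name><given-names>Qi</given-names> <surname>Meng</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>Finley</surname></string-name>,
          <string-name><given-names>Taifeng</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>Wei</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Weidong</given-names> <surname>Ma</surname></string-name>,
          <string-name><given-names>Qiwei</given-names> <surname>Ye</surname></string-name>, and
          <string-name><given-names>Tie-Yan</given-names> <surname>Liu</surname></string-name>.
          <year>2017</year>.
          <article-title>LightGBM: A Highly Efficient Gradient Boosting Decision Tree</article-title>.
          In <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>.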
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Christopher D.</given-names> <surname>Manning</surname></string-name>,
          <string-name><given-names>Mihai</given-names> <surname>Surdeanu</surname></string-name>,
          <string-name><given-names>John</given-names> <surname>Bauer</surname></string-name>,
          <string-name><given-names>Jenny</given-names> <surname>Finkel</surname></string-name>,
          <string-name><given-names>Steven J.</given-names> <surname>Bethard</surname></string-name>, and
          <string-name><given-names>David</given-names> <surname>McClosky</surname></string-name>.
          <year>2014</year>.
          <article-title>The Stanford CoreNLP Natural Language Processing Toolkit</article-title>.
          In <source>Association for Computational Linguistics (ACL) System Demonstrations</source>.
          <fpage>55</fpage>-<lpage>60</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>Darshan</given-names> <surname>Santani</surname></string-name>,
          <string-name><given-names>Salvador</given-names> <surname>Ruiz-Correa</surname></string-name>, and
          <string-name><given-names>Daniel</given-names> <surname>Gatica-Perez</surname></string-name>.
          <year>2018</year>.
          <article-title>Looking South: Learning Urban Perception in Developing Cities</article-title>.
          <source>Trans. Soc. Comput.</source>
          <volume>1</volume>, <issue>3</issue> (Dec. 2018).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>Tomohiro</given-names> <surname>Sato</surname></string-name>,
          <string-name><given-names>Minh-Son</given-names> <surname>Dao</surname></string-name>,
          <string-name><given-names>Kota</given-names> <surname>Kuribayashi</surname></string-name>, and
          <string-name><given-names>Koji</given-names> <surname>Zettsu</surname></string-name>.
          <year>2019</year>.
          <article-title>SEPHLA: Challenges and Opportunities Within Environment - Personal Health Archives</article-title>.
          In <source>MMM</source>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
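          [7]
          <string-name><given-names>Hyeonjin</given-names> <surname>Song</surname></string-name>,
          <string-name><given-names>Kevin James</given-names> <surname>Lane</surname></string-name>,
          <string-name><given-names>Honghyok</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>Hyomi</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>Garam</given-names> <surname>Byun</surname></string-name>,
          <string-name><given-names>Minh</given-names> <surname>Le</surname></string-name>,
          <string-name><given-names>Yongsoo</given-names> <surname>Choi</surname></string-name>,
          <string-name><given-names>Chan Ryul</given-names> <surname>Park</surname></string-name>, and
          <string-name><given-names>Jong-Tae</given-names> <surname>Lee</surname></string-name>.
          <year>2019</year>.
          <article-title>Association between Urban Greenness and Depressive Symptoms: Evaluation of Greenness Using Various Indicators</article-title>.
          <source>Int J Environ Res Public Health</source> (Jan. 2019).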
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>