<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of Insight for Wellbeing Task at MediaEval 2021: Cross-Data Analytics for Transboundary Haze Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Asem Kasem</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Son Dao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Efa Nabilla Aziz</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Duc-Tien Dang-Nguyen</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cathal Gurrin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Triet Tran</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thanh-Binh Nguyen</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wida Suhaili</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Information and Communications Technology</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universiti Teknologi Brunei</institution>
          ,
          <country country="BN">Brunei</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Bergen</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Science VNU-HCMUS</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper provides an overview of the MediaEval 2021 task on "Insight for Wellbeing: Cross-Data Analytics for Transboundary Haze Prediction". The task targets researchers in multimedia information retrieval, machine learning, data science, and environmental and atmospheric sciences. The term "cross-data" refers to approaches that utilize multimodal data across domains, platforms, and prediction models to make predictions. The main objective of this task is to perform accurate multi-day haze prediction in several neighboring countries of the ASEAN region. There are three subtasks: (i) 3-Day Localized Air Pollution Prediction, with emphasis on accurate predictions in each country depending only on its weather and air quality data; (ii) 3-Day Transboundary Air Pollution Prediction, with emphasis on addressing transboundary haze effects through the use of multiple cross-data sources; (iii) Transfer Learning, focusing on transfer learning techniques to demonstrate that patterns learned from certain regions' data sources help in improving predictions for other regions. The task aims to utilize cross-data sources to provide accurate predictions that can help mitigate the adverse effects of haze air pollution.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Haze air pollution describes the pollution consisting of particulate
matter of smoke, dust, and other vapors present in the air, which
originate from large-scale forest and land fires, factories, and cars.
This mixture of air-borne pollutants, when it reaches high levels,
causes respiratory health problems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and has negative impacts
on visibility [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], economic production [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], transportation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and
tourism [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The transboundary haze problem refers to the situation
where high levels of haze remain measurable after crossing into
another country’s air space. It is a recurrent issue in many
regions of the world, especially in Southeast Asia, where the sources
contributing to haze pollution differ in each country, with varying
proportions coming from localized or transboundary sources. For
example, transboundary haze pollution episodes are often attributed
to the long-range transport of smoke from biomass fires, caused by slash-and-burn
activities during dry seasons or by forest fires, which travels
depending on weather conditions to affect several neighboring
countries.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORKS</title>
      <p>
        Particulate matter concentrations (PM10, PM2.5) are usually used to
calculate air quality index measures that describe pollutant severity.
Past recorded data of air pollution and meteorological parameters
have been used by researchers and practitioners from academia and
government agencies to develop air pollution prediction models
to forecast changes in air pollution. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the authors aimed to
predict the first three hours during transboundary haze events by
using three different stepwise Multiple Linear Regression (MLR)
models for predicting the PM10 concentration, one for each hour.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the authors emphasized modeling diverse inter-station
relationships for air quality prediction of city-wide stations using an
Attentive Temporal Graph Convolutional Network (ATGCN) model.
The method could be extended to transboundary haze prediction
if inter-station relationships among region-wide stations are considered.
In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the authors generated ordered city clusters by higher-order
spectral clustering on pollution-transport networks among cities,
then projected those clusters into one-dimensional Euclidean space.
The clusters contributed to the partial differential equation (PDE)
model for predicting PM2.5. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the authors introduced a
hierarchical graph neural network-based air quality forecasting method
working on data collected from stations. Here, transboundary haze
can be regarded as a diffusion process of air pollutants
between cities and monitoring stations. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the authors analyzed
transboundary haze from mainland China to Fukuoka, Japan,
using atmospheric sensing data. They used a CRNN-LSTM model to
predict PM2.5.
      </p>
    </sec>
    <sec id="sec-3">
      <title>TASK DESCRIPTION</title>
      <p>The task is organized into three subtasks that share a common
objective but differ in the approach and data sources used to
achieve it. The objective for all subtasks is to perform 3-day
prediction of air pollution, as measured by PM10, in three different
ASEAN countries, namely Brunei, Singapore, and Thailand. There
are training and testing time-series data for each country, with daily
or hourly readings of weather and air pollution parameters. The
training data mostly covers the 2010-2017 period (for Singapore, only
2016-2017), and the testing data mostly covers the 2018-2019 period.</p>
      <p>Participants need to predict, as accurately as possible, the PM10
values for certain temporal gaps (of consecutive days) in the testing
data. The same objective is revisited in each subtask, but with
differences in the accessible data sources and the prediction
approach, as explained below.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1 3-Day Localized Air Pollution Prediction</title>
      <p>In this subtask, the objective is to predict the PM10 value for the
first 3 days in each gap in the testing files, at the location of each
air-quality station in a country. To do that, participants are required
to develop 3 different predictive models, one per country, each based only on the localized
training data from that country. This subtask explores the
accuracy of predicting air pollution 3 days ahead, and it is
designed to evaluate how well this objective can be achieved if each
country depends only on its own weather and air pollution data.</p>
      <p>The main challenges in this subtask are expected to be
addressing missing reading values, capturing the time-series changes
across multiple stations in a country, and finding insights that may
improve prediction quality for each country separately. For
example, in some countries, only certain stations have wind speed
and direction readings, and certain stations might be faulty for a long
duration. To encourage multi-disciplinary research, we note that
spatial interpolation methods used in environmental studies and
GIS-based solutions, or insights that consider the type of parameter
and/or geographical distance information, can be helpful in filling
missing parameter readings or in increasing the spatial granularity
of readings in each country.</p>
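      <p>As an illustration of the spatial interpolation idea mentioned above, the following minimal sketch fills a missing reading at one station using inverse distance weighting (IDW) over neighboring stations. IDW is only one of several possible interpolation methods and is not prescribed by the task; the function names, the tuple layout of the station records, and the default power of 2 are assumptions made for this example.</p>
      <preformat>
```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def idw_fill(target, neighbours, power=2.0):
    """Estimate a missing reading at `target` = (lat, lon).

    `neighbours` is a list of (lat, lon, value) tuples (hypothetical layout);
    a value of None marks a station that is also missing the reading.
    Returns None when no neighbour has a usable value.
    """
    num = 0.0
    den = 0.0
    for lat, lon, value in neighbours:
        if value is None:
            continue  # skip stations with no reading for this timestamp
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0:
            return value  # co-located station: use its reading directly
        w = 1.0 / d ** power  # closer stations get larger weights
        num += w * value
        den += w
    return num / den if den else None
```
      </preformat>
      <p>A larger power makes the estimate depend more strongly on the nearest stations. Whether distance-based weighting is appropriate depends on the parameter being filled; wind direction, for instance, would need different handling.</p>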
    </sec>
    <sec id="sec-6">
      <title>3.2 3-Day Transboundary Air Pollution Prediction</title>
      <p>The objective in this subtask is the same as in the previous subtask
with respect to the values to predict, but participants are required
to consider other data sources available from the same country
or neighboring countries (including training data of other
countries, remote sensing, terrain information, social media
streams, news reports, etc.). This subtask attempts to address
transboundary haze effects by observing the improvements in prediction
accuracy once the haze and weather situation (e.g., wind/fire
information) in neighboring countries is taken into account, or through
other insights and conclusions that the participants may find.</p>
      <p>There will be additional weather and air pollution data from
stations in Indonesia, containing daily weather and weekly PM10
readings, which participants are encouraged to utilize in modeling
or pre-processing steps. A challenge in this subtask could be the
synchronization of multiple data sources, since each country has
a different subset of parameters, reading frequency, distribution
of stations, and missing periods of recordings. One approach to
reconciling the differences in reading frequency could be to either
summarize the more frequent readings, or apply some form of
temporal interpolation to the less frequent readings. Participants can
consider additional data sources (cross-data) to improve
prediction and/or find insights based on environmental factors, satellite
remote sensing, social/news data, etc.</p>
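      <p>The two synchronization options described above can be sketched as follows: hourly readings are summarized into daily means, and sparse (e.g., weekly) readings are linearly interpolated onto a daily grid. This is a minimal sketch, not the organizers' procedure; the function names and the input layouts are assumptions, and the code further assumes that each weekly value can be treated as a point sample on its recording date.</p>
      <preformat>
```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def hourly_to_daily_mean(readings):
    """Summarize (datetime, value) readings into {date: mean value}.

    Readings with value None are treated as missing and skipped.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        if value is not None:
            buckets[ts.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


def weekly_to_daily_linear(readings):
    """Linearly interpolate sparse {date: value} readings onto every day
    between the first and the last recorded date (inclusive)."""
    days = sorted(readings)
    daily = {}
    for start, end in zip(days, days[1:]):
        span = (end - start).days
        for k in range(span):
            frac = k / span  # position between the two known readings
            step = readings[end] - readings[start]
            daily[start + timedelta(days=k)] = readings[start] + frac * step
    if days:
        daily[days[-1]] = readings[days[-1]]  # keep the final known reading
    return daily
```
      </preformat>
      <p>Interpolating between two known readings uses a value that lies in the future relative to the days in between, so this form of interpolation suits the fully observed periods rather than the temporal gaps to be predicted.</p>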
    </sec>
    <sec id="sec-7">
      <title>3.3 Use of Transfer Learning</title>
      <p>In this subtask, participants revisit either subtask 1 or subtask 2
above by applying transfer learning techniques in their
solutions. For example, participants can reattempt subtask 1 to answer
the question: can the use of pre-trained models from one country
improve the 3-day prediction for another country?</p>
      <p>The application of transfer learning can demonstrate that
patterns learnt from certain regions’ data sources (e.g., via access to
larger datasets) help to improve predictions in other regions (e.g.,
where data is scarce or less granular).</p>
    </sec>
    <sec id="sec-8">
      <title>DATASETS AND EVALUATION</title>
      <p>All datasets are provided in CSV format, but with a different structure
per country, as the granularity, the weather and pollution parameters,
and the number of stations differ across countries. For each
country, the testing datasets have a structure similar to the training
datasets, but contain multiple temporal gaps (of 8 consecutive
days) where only the dates and station IDs are given. Each gap
is preceded by a consecutive period (of about 10 days) where
all available readings are provided (with some potential missing
values). Participants need to predict the PM10 values of the first
3 days in each temporal gap. Predicting more than 3 days is also
welcomed, but will not be used for ranking the submitted results.</p>
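      <p>To make the structure of the testing data concrete, the following sketch groups the dates that belong to temporal gaps for one station into runs of consecutive days and returns the first 3 days of each run, which are the days that count for ranking. The function name and the assumption that the gap dates have already been extracted from the CSV files as a list are illustrative only.</p>
      <preformat>
```python
from datetime import date, timedelta


def first_days_of_gaps(gap_dates, horizon=3):
    """Group dates into runs of consecutive days (one run per gap) and
    return the first `horizon` days of each run, in chronological order."""
    targets = []
    run = []
    for day in sorted(gap_dates):
        # A jump of more than one day closes the current gap.
        if run and day - run[-1] != timedelta(days=1):
            targets.extend(run[:horizon])
            run = []
        run.append(day)
    targets.extend(run[:horizon])  # flush the last gap
    return targets
```
      </preformat>
      <p>For two 8-day gaps starting on 11 January and 1 February, the function returns 11 to 13 January and 1 to 3 February.</p>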
      <p>For evaluation, predicted PM10 values at each station in a
country will be compared with ground truth values using the Mean
Absolute Error (MAE). Submissions will be ranked (based on
MAE) for each country’s testing data separately, and points will be
assigned based on the lowest average MAE across all stations (there is
a roughly equal number of required predictions for each station in
a country). For each subtask, the points for all countries will be
summed and used to determine the overall ranking in that subtask.
Furthermore, based on the submitted working notes, the approach
used will be evaluated by the task organizers in terms of
innovativeness, motivation of the methods used, and insights gained. In any
subtask, the data and methods used to make predictions should
adhere to the given subtask’s descriptions and conditions.</p>
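      <p>The MAE-based comparison for a single country can be sketched as below. This is not the organizers' evaluation script: the function names and the dictionary layout are assumptions, the average is taken over per-station MAE values (one possible reading of "average MAE across all stations"), and the conversion of ranks into points is left out because the point scheme is not specified here.</p>
      <preformat>
```python
def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def country_score(predictions, ground_truth):
    """Average of per-station MAE values for one country.

    Both arguments map a station ID to a list of PM10 values given in
    the same date order (hypothetical layout).
    Returns (average MAE, {station ID: MAE}).
    """
    per_station = {s: mae(predictions[s], ground_truth[s]) for s in ground_truth}
    return sum(per_station.values()) / len(per_station), per_station


def rank_submissions(submissions, ground_truth):
    """Return submission names ordered from lowest to highest average MAE."""
    scores = {
        name: country_score(pred, ground_truth)[0]
        for name, pred in submissions.items()
    }
    return sorted(scores, key=scores.get)
```
      </preformat>
      <p>Because each station contributes a roughly equal number of predictions, averaging per-station MAE values gives nearly the same result as pooling all predictions of a country into one MAE.</p>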
    </sec>
    <sec id="sec-9">
      <title>DISCUSSION AND OUTLOOK</title>
      <p>Prediction of haze air pollution is an important task that can guide
policy and decision making in many countries that are affected by
it. However, its usefulness depends on accurate prediction for a
sufficient duration of time, hence the 3-day-ahead requirement
of the task. Given the transboundary haze effect, it is expected that
neighbouring countries can benefit from each other’s data sources
to improve their own predictions. In practice, however, agencies in
different countries will have data sources that differ in the
parameters they record and in the (spatial and temporal)
granularity of recordings. In addition, non-traditional data sources such
as satellite images or social media streams may help the overall
objective of accurate prediction. The subtasks were organized to
address the practical challenges related to transboundary haze
pollution, and we hope to motivate researchers to present innovative
approaches to tackling them. Details on the methods and results of
each participating team can be found in the working note papers
of the MediaEval 2021 workshop proceedings.</p>
    </sec>
    <sec id="sec-10">
      <title>ACKNOWLEDGMENTS</title>
      <p>We would like to thank Brunei’s Meteorological Department (BDMD)
and Department of Environment, Parks and Recreation (JASTRE);
Singapore’s Meteorological Service Singapore (MSS) and National
Environment Agency (NEA); Thailand’s Meteorological
Department (TMD) and Pollution Control Department (PCD); and
Indonesia’s Meteorology, Climatology, and Geophysical Agency (BMKG),
for providing the meteorological and air quality data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Samsuri</given-names>
            <surname>Abdullah</surname>
          </string-name>
          , Nur Nazmi Liyana Mohd Napi, Ali Najah Ahmed, Wan Nurdiyana Wan Mansor, Amalina Abu Mansor, Marzuki Ismail, Ahmad Makmom Abdullah, and Zamzam Tuah Ahmad Ramly
          .
          <year>2020</year>
          .
          <article-title>Development of Multiple Linear Regression for Particulate Matter (PM10) Forecasting during Episodic Transboundary Haze Event in Malaysia</article-title>
          .
          <source>Atmosphere</source>
          <volume>11</volume>
          ,
          <issue>3</issue>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Kang Hao</given-names>
            <surname>Cheong</surname>
          </string-name>
          et al.
          <year>2019</year>
          .
          <article-title>Acute Health Impacts of the Southeast Asian Transboundary Haze Problem: A Review</article-title>
          .
          <source>Int. J. Environ Res Public Health</source>
          <volume>16</volume>
          ,
          <issue>18</issue>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Tim</given-names>
            <surname>Forsyth</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Public concerns about transboundary haze: A comparison of Indonesia, Singapore, and Malaysia</article-title>
          .
          <source>Global Environmental Change</source>
          <volume>25</volume>
          (
          <year>2014</year>
          ),
          <fpage>76</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , et al.
          <year>2021</year>
          .
          <article-title>Amplified transboundary transport of haze by aerosol-boundary layer interaction in China</article-title>
          .
          <source>Nat. Geosci</source>
          .
          <volume>13</volume>
          (
          <year>2021</year>
          ),
          <fpage>428</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mostafanezhad</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Evrard</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Geopolitical Ecologies of Tourism and the Transboundary Haze Disaster in Thailand, Laos and Myanmar</article-title>
          .
          <source>The Tourism-Disaster-Conflict Nexus (Community, Environment and Disaster Risk Management)</source>
          <volume>19</volume>
          (
          <year>2018</year>
          ),
          <fpage>53</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Euston</given-names>
            <surname>Quah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wai-Mun</given-names>
            <surname>Chia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tsiat-Siong</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Economic impact of 2015 transboundary haze on Singapore</article-title>
          .
          <source>Journal of Asian Economics</source>
          <volume>75</volume>
          (
          <year>2021</year>
          ),
          <fpage>101329</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Chunyang</given-names>
            <surname>Wang</surname>
          </string-name>
          , Yanmin Zhu, Tianzi Zang, Haobing Liu, and
          <string-name>
            <given-names>Jiadi</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Modeling Inter-Station Relationships with Attentive Temporal Graph Convolutional Network for Air Quality Prediction</article-title>
          .
          <source>In WSDM '21</source>
          .
          <publisher-name>Association for Computing Machinery</publisher-name>
          ,
          <fpage>616</fpage>
          -
          <lpage>634</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Yufang</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Haiyan</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shuhua</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Quantifying prediction and intervention measures for PM2.5 by a PDE model</article-title>
          .
          <source>Journal of Cleaner Production</source>
          <volume>268</volume>
          (
          <year>2020</year>
          ),
          <fpage>122</fpage>
          -
          <lpage>131</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jiahui</given-names>
            <surname>Xu</surname>
          </string-name>
          , Ling Chen, Mingqi Lv, Chaoqun Zhan, Sanjian Chen, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>HighAir: A Hierarchical Graph Neural Network-Based Air Quality Forecasting Method</article-title>
          .
          (
          <year>2021</year>
          ).
          <source>arXiv:cs.LG/2101.04264</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Zettsu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Convolution recurrent neural networks for short-term prediction of atmospheric sensing data</article-title>
          .
          <source>In IEEE CPSCom-SmartData.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>