<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of MediaEval 2022 Urban Air: Urban Life and Air Pollution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Minh-Son Dao</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thanh-Hai Dang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tan-Loc Nguyen-Tai</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thanh-Binh Nguyen</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Duc-Tien Dang-Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bergen University</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dalat University</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LOC GOLD Technology MTV Ltd. Co</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>National Institute of Information and Communications Technology</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Science, Vietnam National University in HCM City</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Air pollution and urban life mutually influence each other, posing a challenge for urban management. Monitoring air pollution and urban activities through systems such as air pollution stations and CCTV has led to the development of prediction methods and applications. However, limited access to data and narrow datasets collected under ideal conditions hinder research efforts. To address this, we introduce the UrbanAir task, which provides a streaming dataset from CCTV and air stations in Dalat City, Vietnam. This task focuses on multimodal and crossmodal prediction of air pollution, even in the absence of data from specific stations. It targets researchers in fields such as multimedia information retrieval, machine learning, AI, data science, urban management, and environmental science, among others.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Air pollution poses grave threats to human health due to factors such as industrial development,
climate change, agrochemical use, and urbanization. In the EU alone, at least 238,000 premature
deaths were reported in 2020 due to PM2.5 pollution [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Fuel combustion in various sectors,
including residential, commercial, institutional, and transportation, was identified as a significant
source of particulate matter pollution. Road transport, agriculture, and industry were the
primary sources of increasing nitrogen dioxide (NO2) emissions, resulting in 49,000 deaths in the
EU. Exposure to ozone also contributed to 24,000 premature deaths in the EU.
      </p>
      <p>
        Despite these dangers, 99% of the global population lived in areas where WHO’s air quality
guidelines were not met in 2019 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In 2019, around four million people died due to fine
particulate outdoor air pollution, with the highest death rates observed in East Asia and Central
Europe.
      </p>
      <p>Hence, there is an urgent need for robust models to predict air pollution and uncover
correlations with human activities, especially in urban settings. This task aims to develop a novel
framework to uncover localized correlations between traffic factors, weather conditions, and air
pollution, with the goal of enhancing Air Quality Index (AQI) prediction accuracy and deepening
understanding of the mutual impact between urban life and air pollution.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data Description</title>
      <p>The data utilized in this study is directly obtained from various sources, including air pollution,
weather, and CCTV stations that are strategically installed across Dalat City, Vietnam.
Specifically, there are ten air pollution stations (referred to as sensor01-10), with three of them attached
to weather stations (sensor01, sensor02, sensor03), as depicted in Figure 1. In addition, there are
fourteen CCTV cameras that provide traffic-related data. In this paper, "environmental data"
refers to the air pollution and weather data recorded by the stations, and "traffic data" refers
to the data provided by the CCTV network.</p>
      <p>At each air pollution station, sensory data is collected and recorded, encompassing parameters
such as Temperature, Humidity, PM1.0, PM2.5, PM10, CO, NO2, SO2, O3, and UV. Similarly, the
weather stations capture data on WindSpeed, WindGust, Direction, and Rainfall. All stations are
identified by SensorID, SensorCode, SensorName, Latitude, Longitude, and Altitude; the last three
give each station’s location. Readings are recorded at frequent intervals and timestamped
with Date and Time.</p>
      <p>On the other hand, each CCTV camera is identified by distinct codes, including CameraID,
CameraCode, CameraName, Latitude, and Longitude, which denote their location information.
Data from these cameras, including Date, Time, and Image, can be accessed by individuals.
Notably, only images are stored instead of videos to conserve storage space.</p>
      <p>All the collected data from the air pollution, weather, and CCTV stations are continuously
streamed and made accessible to task participants through the project’s website and via FTP
for research and analysis purposes.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description</title>
      <p>We have designed two subtasks that encourage participants to build multimodal and crossmodal
prediction models for air pollution. Subtask 1 challenges participants to predict air pollution
levels using only environmental data. Subtask 2 requires predicting air pollution from traffic
data: a model may be trained on both environmental and traffic data, but only traffic data may be
used as input at prediction time. Subtask 1 requires predicting both the exact concentration
value and the AQI (Air Quality Index) level for each pollutant, whereas subtask 2 only requires
AQI levels.</p>
      <p>Subtask 1 presents a traditional prediction problem where predicted results are influenced
by historical data. However, in this challenge, participants are encouraged to use data from
one or a group of stations to predict data from other stations. The real challenge here lies in
dealing with live-dead circumstances, where there is no guarantee that a particular station will
operate consistently for an extended period of time. Relying solely on data from one sensor
may result in poor model performance if that sensor is offline for some time, or if it produces
inaccurate data due to unexpected local effects caused by human activities or natural factors.
Furthermore, this subtask aims to develop robust models that can predict air pollution in areas
without dedicated monitoring stations by utilizing data from neighboring stations.</p>
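      <p>As a point of reference for predicting a location from neighboring stations, the sketch below shows inverse distance weighting (IDW) over the stations that are currently live. IDW is a generic spatial-interpolation baseline chosen here for illustration; it is not the organizers' method. The function names, the reading layout, and the exponent of 2 are assumptions, and only the Latitude and Longitude fields come from the dataset description.</p>
      <preformat><![CDATA[
```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def idw_estimate(target, readings, power=2.0):
    """Estimate a value at target (lat, lon) from live neighbouring stations.

    readings maps a station name to (lat, lon, value). A value of None
    marks a station that is offline; such stations are skipped.
    (This layout is hypothetical, not the dataset's file format.)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, value in readings.values():
        if value is None:
            continue  # "dead" station: contributes nothing
        distance = haversine_km(target[0], target[1], lat, lon)
        if distance == 0.0:
            return value  # target coincides with a live station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return None  # no live station available
    return weighted_sum / weight_total


# Made-up coordinates and PM2.5 readings, for illustration only.
example_readings = {
    "sensor01": (11.94, 108.44, 10.0),
    "sensor02": (11.96, 108.44, 20.0),
    "sensor03": (11.95, 108.46, None),  # offline
}
example_estimate = idw_estimate((11.95, 108.44), example_readings)
```
]]></preformat>
      <p>Because offline stations are simply skipped, the estimate degrades gracefully as stations drop out, which makes this a convenient baseline against which learned models can be compared.</p>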
      <p>
        The original idea behind subtask 2 is to use images to estimate AQI levels concurrently [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
For example, participants can utilize images, either current or historical, captured by one camera
or a group of cameras to estimate AQI levels at nearby locations. Scenarios for this subtask
could include using images from smartphones to estimate AQI levels based on images captured
around a location, or utilizing CCTV images to estimate AQI levels in the surrounding areas.
      </p>
      <p>Participants are required to predict air pollution levels for three time windows of the day
(8 am-9 am, 11 am-12 pm, and 5 pm-6 pm) on days D+1, D+5, and D+7, where D
is the day on which the predicted results are submitted.</p>
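      <p>The prediction schedule can be enumerated mechanically. The sketch below lists the nine target windows for a given submission day D; the function name, the use of naive local datetimes, and the example date are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from datetime import date, datetime, time, timedelta

# Day offsets after submission day D and daily windows, as stated in the task.
DAY_OFFSETS = (1, 5, 7)
WINDOWS = ((8, 9), (11, 12), (17, 18))  # (start hour, end hour), 24-hour clock


def target_windows(submission_day):
    """Return (start, end) datetimes for every window to be predicted."""
    targets = []
    for offset in DAY_OFFSETS:
        day = submission_day + timedelta(days=offset)
        for start_hour, end_hour in WINDOWS:
            start = datetime.combine(day, time(start_hour))
            end = datetime.combine(day, time(end_hour))
            targets.append((start, end))
    return targets


# Arbitrary example date standing in for D.
windows = target_windows(date(2022, 11, 1))
for start, end in windows:
    print(start.isoformat(), "to", end.isoformat())
```
]]></preformat>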
    </sec>
    <sec id="sec-4">
      <title>4. Quest for Insight</title>
      <p>In addition to the challenges outlined in the subtasks, participants can explore several
research questions that move beyond a mere focus on evaluation metrics:</p>
      <list list-type="bullet">
        <list-item>
          <p>Which factors from weather, air pollution, and CCTV data contribute to the air pollution
prediction model?</p>
        </list-item>
        <list-item>
          <p>When building the correlation hypothesis between traffic, weather, and air pollution, what
are the differences and similarities between people’s experience and the model’s knowledge?</p>
        </list-item>
      </list>
      <p>This is the first open dataset made available to the public that encompasses
real-world challenges and unforeseen effects absent from previous in-lab
datasets. It is also a rich source of information that can offer insights into
urban air pollution and the associated challenges of urban life. Many
extraneous factors, such as the influx of tourists during weekends or festival events, the unique
weather patterns, and the mountain-valley geography of Dalat City, can significantly influence
air pollution levels and city dynamics. Participants are encouraged to explore and leverage
external information beyond the dataset to enhance their models and uncover novel findings.
These factors are only examples, and participants are encouraged to look for further
influences in their quest for solutions.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Ground Truth and Evaluation</title>
      <p>One of the unique aspects of this task is that participants have the opportunity to self-evaluate
their predicted results using incoming or recorded data from the system as ground truth. For
example, a prediction model can forecast the air pollution levels for days D+1, D+5, and D+7
on day D. When those days arrive, the predicted results can be compared against the actual
air pollution values. Therefore, there is no predefined ground truth for evaluation, as the data
from the sensors on the day of prediction may contain noise, outliers, or even be missing due to
sensor shutdowns.</p>
      <p>
        For evaluation, we utilize the incoming data to assess participants’ results. The Mean Absolute
Error (MAE) and Mean Squared Error (MSE) are used to evaluate the accuracy of prediction
models for air pollutant values, while the F1-score is used for evaluating the accuracy of AQI
levels, which are calculated based on the instructions provided in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The AQI levels are divided
into seven categories, namely Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy,
Very Unhealthy, Hazardous, and Extreme Hazardous. Participants are required to predict the
values and AQI levels for six air pollutants, namely PM2.5, PM10, CO, NO2, SO2, and O3.
      </p>
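      <p>The three metrics named above can be computed as in the following minimal sketch. Two points are assumptions rather than part of the task definition: timestamps whose observed value is missing (None) are skipped, and the F1-score is macro-averaged over the AQI levels that occur, since the averaging mode is not specified here. The function names and the sample values are illustrative.</p>
      <preformat><![CDATA[
```python
def paired(y_true, y_pred):
    """Keep only (observed, predicted) pairs whose observation is present."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t is not None]
    if not pairs:
        raise ValueError("no observed values to evaluate against")
    return pairs


def mae(y_true, y_pred):
    """Mean Absolute Error over pairs with an observed value."""
    pairs = paired(y_true, y_pred)
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def mse(y_true, y_pred):
    """Mean Squared Error over pairs with an observed value."""
    pairs = paired(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-level F1 over levels seen in truth or prediction."""
    pairs = paired(y_true, y_pred)
    labels = sorted({t for t, _ in pairs} | {p for _, p in pairs})
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero
        # because each label occurs at least once in truth or prediction.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Made-up PM2.5 concentrations; the second observation is missing.
observed = [12.0, None, 30.0, 8.0]
predicted = [10.0, 25.0, 33.0, 8.0]
value_mae = mae(observed, predicted)
value_mse = mse(observed, predicted)

# Made-up AQI levels for the same kind of comparison.
true_levels = ["Good", "Moderate", "Moderate", "Good"]
pred_levels = ["Good", "Moderate", "Good", "Good"]
level_f1 = macro_f1(true_levels, pred_levels)
```
]]></preformat>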
      <p>Participants must submit their predicted results for both subtasks for
days D+1, D+5, and D+7. The submission format is communicated to all participants via
email.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion and Outlook</title>
      <p>
        Numerous methods have been proposed in the literature for air pollution prediction, utilizing
various data sources such as air pollution stations [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], mobile air pollution devices [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], lifelog
cameras [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and satellites [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, these methods often rely solely on datasets provided
by third parties or collected by the authors themselves. Such datasets tend to be offline, in-lab,
and incomplete, and may not capture the unexpected errors that occur in reality. In
contrast, the task presented in this paper provides a streaming dataset that covers a wide range
of scenarios, challenging participants to develop effective solutions that are
applicable to real-world situations.
      </p>
      <p>As the organizers of this task, we believe it can pave the way for a new direction in air
pollution prediction research. Instead of solely relying on historical data from one sensor to
predict its own values, we encourage participants to explore the use of subsets of sensors to
predict values of other sensors or to incorporate data from different locations with unique
characteristics. This approach may enable better detection of outliers and sudden changes in
air pollution with higher accuracy. Furthermore, understanding the mutual impact of urban
life and air pollution can lead to the development of explainable AI models for air pollution
prediction, which can support improved urban management and healthcare services.</p>
      <p>By leveraging additional data from local sources and considering the interplay between urban
life and air pollution, we believe that this task can contribute to the advancement of air pollution
prediction research and foster the development of innovative approaches that are better aligned
with real-world scenarios.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>
        ASEAN-IVO [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] sponsors the air pollution and weather stations installed in Dalat City, Vietnam.
The stations are the result of the international project "Reusable, Shareable, and Transferable Smart
Data Platform for Collaborative Development of Data-Driven Smart City" [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], conducted by
partners from Japan, Vietnam, Singapore, Brunei, and the Philippines from 2020 to 2022. Dalat
University, Vietnam, and LOC GOLD Technology MTV Ltd. Co, Vietnam, operate and maintain
the stations. The CCTV system is sponsored by local citizens and operated by the police force of Dalat
City, Vietnam.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>European Environment Agency</collab>
          ,
          <source>Air quality in Europe 2022</source>
          , HTML - TH-AL-22-011-EN-Q - ISBN 978-92-9480-515-7 - ISSN 1977-8449, 05/2022 (
          <year>2022</year>
          ). doi:
          <pub-id pub-id-type="doi">10.2800/488115</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>U. N. Environment Programme</collab>
          ,
          <source>The UNEP pollution action note</source>
          ,
          <year>2022</year>
          . URL: https://www.unep.org/interactive/air-pollution-note.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Tejima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-S.</given-names>
            <surname>Dao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zettsu</surname>
          </string-name>
          ,
          <article-title>MM-AQI: A novel framework to understand the associations between urban traffic, visual pollution, and air pollution</article-title>
          ,
          <source>in: Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>597</fpage>
          -
          <lpage>608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>U. S. Environmental Protection Agency</collab>
          ,
          <source>Technical assistance document for the reporting of daily air quality - the air quality index (AQI)</source>
          ,
          <year>2020</year>
          . URL: https://www.airnow.gov/sites/default/files/2020-05/aqi-technical-assistance-document-sept2018.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tsao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-H.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-H.</given-names>
            <surname>Fei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <article-title>Forecasting air quality in Taiwan by using machine learning</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>10</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.-V.</given-names>
            <surname>La</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-S.</given-names>
            <surname>Dao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tejima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. U.</given-names>
            <surname>Kiran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zettsu</surname>
          </string-name>
          ,
          <article-title>Improving the awareness of sustainable smart cities by analyzing lifelog images and IoT air pollution data</article-title>
          ,
          <source>in: 2021 IEEE International Conference on Big Data (Big Data)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>3589</fpage>
          -
          <lpage>3594</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/BigData52589.2021.9671403</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.-S.</given-names>
            <surname>Dao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zettsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. K.</given-names>
            <surname>Rage</surname>
          </string-name>
          ,
          <article-title>Image-2-AQI: Aware of the surrounding air qualification by a few images</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Fujita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Selamat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.-W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ali</surname>
          </string-name>
          (Eds.),
          <source>Advances and Trends in Artificial Intelligence. From Theory to Practice</source>
          , Springer International Publishing, Cham,
          <year>2021</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Gutiérrez-Avila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Arfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Carrión</surname>
          </string-name>
          ,
          <article-title>Prediction of daily mean and one-hour maximum PM2.5 concentrations and applications in Central Mexico using satellite-based machine-learning models</article-title>
          ,
          <source>Journal of Exposure Science &amp; Environmental Epidemiology</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1038/s41370-022-00471-4</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <collab>NICT</collab>
          , ASEAN-IVO,
          <year>2013</year>
          -. URL: https://www.nict.go.jp/en/asean_ivo/about_asean_ivo.html.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.-S.</given-names>
            <surname>Dao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-H.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kasem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Aguinaldo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Biljecki</surname>
          </string-name>
          ,
          <article-title>Reusable, sharable, and transferable smart data platform for collaborative development of data-driven smart cities</article-title>
          , 2020-2022. URL: https://www.nict.go.jp/en/asean_ivo/ASEAN_IVO_2020_Project02.html.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>