=Paper= {{Paper |id=Vol-1392/paper-10 |storemode=property |title=Airvlc: An Application for Real-Time Forecasting Urban Air Pollution |pdfUrl=https://ceur-ws.org/Vol-1392/paper-10.pdf |volume=Vol-1392 |dblpUrl=https://dblp.org/rec/conf/icml/OchandoJOR15 }} ==Airvlc: An Application for Real-Time Forecasting Urban Air Pollution== https://ceur-ws.org/Vol-1392/paper-10.pdf
          Airvlc: An application for real-time forecasting urban air pollution


Lidia Contreras Ochando                                  LICONOC@UPV.ES
Universitat Politècnica de València. Spain
Cristina I. Font Julián                                  CRIFONJU@EI.UPV.ES
Universitat Politècnica de València. Spain
Francisco Contreras Ochando                              FRACONOC@GMAIL.COM
Universitat Politècnica de València. Spain
Cèsar Ferri                                              CFERRI@DSIC.UPV.ES
DSIC. Universitat Politècnica de València. Spain



Abstract

This paper presents Airvlc, an application for producing real-time urban air pollution
forecasts for the city of Valencia in Spain. Although many cities provide air quality data,
in many cases this information is presented with significant delays (three hours for the
city of Valencia) and is limited to the area where the measurement stations are located.
The application employs regression models able to predict the levels of four different
pollutants (CO, NO, PM2.5, NO2) in three different locations of the city. These models are
trained using features that represent traffic intensity, persistence of pollutants and
meteorological parameters such as wind speed and temperature. We compare different learning
techniques to obtain the best performance in the prediction of pollutants. According to our
experiments, ensembles of decision trees (Random Forest) outperform the remaining methods in
almost all of our tests. Airvlc incorporates the best regression models and, by a
distance-weighted combination of the predictions, is able to generate a real-time pollution
map of the city of Valencia. The application also includes a warning system that sends
notifications to users when a hazardous pollution concentration is detected nearby.

Proceedings of the 2nd International Workshop on Mining Urban Data, Lille, France, 2015.
Copyright © 2015 for this paper by its authors. Copying permitted for private and academic
purposes.

1. Introduction

Air pollution can have an important impact (short and long-term) on people's health. For
instance, urban air pollution increases the risk of acute respiratory diseases, such as
pneumonia, and of chronic diseases, such as lung cancer or cardiovascular disease (World
Health Organisation, 2015). A recent work (Wilker et al., 2015) relates long-term exposure
to ambient air pollution to structural changes in the brain. The SOER 2015 report (The
European Environment Agency, 2015), with data about the European Union countries' air
quality in 2011, concludes that although the atmosphere in the continent has improved in the
last decades, there are significant traces of the most harmful contaminants. In fact, the
report estimates that 430,000 Europeans died prematurely in 2011 because of pollution.

Although some governments are introducing restriction policies that limit the use of
vehicles (the main source of pollution in most cases), within Europe alone major cities such
as Paris, Naples, Moscow, Milan or Barcelona still reported significant levels of urban
pollution in 2015 (The European Environment Agency, 2015). In this context, it is important
for citizens of urban agglomerations to reduce their exposure to urban air pollution as much
as possible. This is especially relevant for high-risk populations such as children, elderly
people, asthmatics or people suffering from respiratory diseases.

In this work we present an application that predicts urban air pollution in real time by
employing historical data. The application is based on the city of Valencia in Spain. This
city can be considered a medium-size urban agglomeration (around 1,000,000 inhabitants). The
city provides an open data site containing real-time information about different aspects of
the city, such as traffic data, noise sensors, pollen





sensors and so on. Although different urban air pollution sensors are included in the site,
this information needs to be carefully verified and it is published with a delay of three
hours. This delay can represent a problem, since dangerously high pollution levels are not
detected in real time. Additionally, the network of sensors is limited (six in the city of
Valencia).

Considering these limitations, we have developed an application able to display, in real
time, the foreseeable levels of pollution at a large number of points in the city. The
application is based on the predictions of regression models that are trained using features
that represent traffic intensity, persistence of pollutants and meteorological parameters.

The paper is organised as follows. Section 2 details the process of collecting data about
pollution particles and the factors that affect the generation, concentration or dispersion
of these pollutants. Experiments in learning regression models for predicting the pollutant
concentrations are included in Section 3. The Airvlc application is detailed in Section 4.
Related works are discussed in Section 5. Finally, Section 6 closes the paper with a
discussion of the main conclusions and some plans for future work.

2. Data collection

Different particles are associated with urban air pollution. In order to measure air
contamination, pollutant parameters found in the lower levels of the troposphere are
monitored. Air quality sensors measure concentrations of particles that have an
anthropogenic origin and produce effects during or after inhalation by humans. The
historical pollution data for this work has been obtained from the open data web of the
Generalitat Valenciana [1]. Following the recommendations of (The European Environment
Agency, 2015), we concentrate on the following particles:

  • PM2.5 (Suspended particles below 2.5 microns): This parameter has been chosen because of
    its pollutant power. It is one of the most dangerous particles, since its size makes it
    almost unstoppable by the natural filters of the body. This means that PM2.5 particles
    are usually able to reach the pulmonary alveoli and, in some cases, attach to these
    alveoli with a consequent reduction of lung capacity; in the worst cases, the particles
    cross the alveolar membranes and reach the blood stream. Considering that PM2.5
    particles have their origin in anthropogenic activities (especially in the use of fuels
    in motor vehicles), it is not surprising that their composition includes heavy metals,
    extremely toxic to the human body. Atmospheric conditions on the Mediterranean coast of
    Spain can influence particle levels (both PM10 and PM2.5), owing to lower rainfall and
    wind action than in northern European countries and to the arrival of North African
    particles (Saharan dust).

  • NO (Nitrogen monoxide): Nitrogen monoxide is a highly unstable compound; it quickly
    reacts in the atmosphere to form nitrogen dioxide. This instability makes nitrogen
    monoxide a radical, namely, a highly reactive molecule, whose effects on the body
    include damage to DNA, lipids and proteins. In the medium and long term, such changes
    translate into a greater chance of developing cancer. It originates largely from vehicle
    engines.

  • NO2 (Nitrogen dioxide): Nitrogen dioxide is not a directly generated pollutant, since
    its presence in the atmosphere is caused by the oxidation of nitrogen monoxide. In the
    presence of moisture, this compound results in nitric acid, and its inhalation, even in
    low concentrations, can cause lung tissue degradation and reduce the efficacy of the
    immune system, especially in children.

  • CO (Carbon monoxide): Carbon monoxide is a primary pollutant. CO is toxic; it prevents
    oxygen transport in the blood, since it binds to haemoglobin in place of oxygen. People
    with cardiovascular and cerebrovascular problems could suffer heart attacks or strokes
    because of problems related to high concentrations of CO.

The distribution of air pollution is decisively influenced by climatic conditions. We have
collected climatological observations for Valencia city from the Meteorological Agency of
the Government of Spain (AEMET) [2]. We consider the following parameters:

  • Temperature: In an ordinary atmospheric situation, temperature decreases with altitude,
    favouring the ascent of warmer (and less dense) air, which drags contaminants upwards.
    In a thermal inversion, a warmer layer of air lies over the colder surface air and
    prevents the latter (which is denser) from rising, so the contamination is confined and
    its concentration increases.

  • Humidity: Humidity is a weather factor to be considered; in its presence, nitrogen
    dioxide turns into nitric acid, which is harmful to human health.

  • Wind speed: Strong winds can disperse pollutants and transport them away from their
    emission point.

  [1] http://www.cma.gva.es/cidam/emedio/atmosfera/jsp/historicos.jsp
  [2] http://www.aemet.es/





  • Precipitations: Precipitations wash out contaminants and can dissolve substances and
    gases.

The two main sources of pollution in developed countries are motor vehicles and industry.
Vehicles release large amounts of nitrogen oxides, carbon oxides, hydrocarbons and
particulates when burning gasoline and diesel. Therefore, we need to measure the level of
traffic in the city in order to predict the air pollution. For this purpose, the City of
Valencia provides a network of sensors (electromagnetic coils) that measure the intensity of
traffic (vehicles/hour) in the city. This data can be found in the open data site of the
Valencia City Council [3].

  [3] http://www.valencia.es/ayuntamiento/DatosAbiertos.nsf/

3. Experiments

With all the selected parameters, we have built datasets aimed at predicting the
concentration of pollutants from the intensity of traffic and weather parameters.
Concretely, we have collected data for a period of two years (2013 and 2014). Data was
collected every 60 minutes, 24 hours a day during those two years. Although Valencia city
has six stations for the detection and measurement of air pollution, three of them did not
have sufficient data for the analysed period and were discarded. We therefore collected data
from these stations: Molí, Avd Francia and Pista de Silla. These three stations are located
inside the urban agglomeration, and thus most of the pollutants measured by the sensors
should be generated by urban activities (mainly traffic). For each one of these stations, we
create a dataset with the level of the pollutants measured and the parameters that can
affect these measurements; we concentrate on traffic level (measured by electromagnetic
coils) and weather conditions. In order to measure the traffic related to each air pollution
station, we average the traffic intensity of the six closest traffic measurement sensors.
This is a simplification since, certainly, all the traffic of the city has an effect on the
level measured at every station in the city.

We can see a summary of the three datasets in Table 1. This table includes, for the three
stations, averages and standard deviations of the pollutant particles measured and of the
intensity of traffic associated with each station. If we analyse traffic intensity, Avd
Francia is the busiest station, while the other two have similar values. With regard to
pollution levels, Pista de Silla station presents the maximum levels for three parameters.
The only exception is PM2.5. This behaviour can probably be associated with the specific
location of the stations: while Pista de Silla station is located in the central part of the
city, and is therefore more vulnerable to the overall city pollution, the other two are in
the suburbs of the city, where external air streams can reduce the levels of pollutants.

We first study the weekly evolution of pollutants in the three stations. Figure 1 shows the
evolution of the average of the four pollution parameters analysed and the average traffic
intensity for Molí station depending on the day of the week. Figure 2 presents the same plot
for Avd Francia station and Figure 3 corresponds to Pista de Silla station. In order to make
the values comparable in the plot, we normalise each parameter by the maximum value of that
parameter. The levels of pollutants and traffic reach their maximum during the working days
of the week for the three stations (Friday seems to be the worst day). We can clearly see
the dependency of the four pollution parameters on the traffic intensity level. During the
weekend, the level of traffic descends drastically and, associated with this reduction, the
levels of pollutants drop significantly. Again, the exception is PM2.5. This behaviour may
arise because these particles can be generated by all types of combustion activities (motor
vehicles, power plants, wood burning, etc.) and certain industrial processes (US
Environmental Protection Agency, 2015).

We have performed a similar analysis considering the evolution of pollutants, traffic
intensity and meteorological variables (humidity and wind) during a day. Figure 4 shows the
evolution of the daily average of these parameters for Molí station depending on the hour of
the day. Figure 5 corresponds to Avd Francia station and Figure 6 to Pista de Silla station.
Again, we normalise each parameter by the maximum value of that parameter. If we observe
traffic intensity, we find a similar behaviour in all three plots: there are three peaks in
traffic intensity, corresponding to the hours when workers travel to their workplaces
(around 9 am), lunch time (around 2 pm) and an evening period (around 8 pm). In the three
stations the maximum of the pollution parameters is found in the same period as the first
peak in traffic intensity (around 9 am). In the second peak of traffic intensity (around
2 pm) the levels of pollutants do not follow the increase in traffic. In fact, after the
maximum period around 9 am, pollutant levels decrease until around 4 pm, when they change
behaviour and start to increase again. The second peak in pollutant values is found around
9 pm. Our intuition with respect to this behaviour is that wind disperses part of the
pollutants in the sunniest hours. Valencia is on the Mediterranean coast and in this city it
is easy to find sea breezes (especially in summer). This kind of wind is created over bodies
of water (usually the sea or big lakes) near land, due to differences in air pressure
created by their different heat capacities. This phenomenon can be detected in the plots if
we observe the increase in wind strength during the midday hours. Finally, we observe a
strange and different behaviour of the CO particle in Molí station. For this pollutant there
is a second peak in the midday period.





This behaviour probably corresponds to an extra source of pollution that needs to be further
studied.

As stated previously, we are interested in predicting pollution levels in real time. Since
these levels are only made public with a delay of three hours, we need to produce a
prediction model from real-time features. We extract the following set of features from the
data collected from different sources (detailed in the previous section):

  • Climatological features: Temperature (degrees Celsius), Relative humidity (percentage),
    Pressure (hPa), Wind speed (km/h), Rain (mm/h)

  • Calendar features: Year, Month, Day in the month, Day in the week, Hour

  • Traffic intensity features: Traffic level in the surrounding stations (vehicles/hour),
    traffic level 1, 2, 3 and 24 hours before

  • Pollution features: Pollution level in the target station 3 and 24 hours before

With this goal we compare several regression learning techniques from R (R Core Team, 2015)
in order to identify the technique that best predicts the levels of pollution. To test the
prediction ability of different models, we learn the models using as training data the
records of 2013 and the first nine months of 2014. We test the models with the last three
months of 2014. We use Mean Squared Error (MSE) as a performance measure. Concretely, we
employ the following techniques for learning regression models (all of them with the default
parameters, unless stated otherwise): Linear Regression (lr) (Hornik et al., 2009), quantile
regression (qr) (Koenker, 2015) with lasso method, K nearest neighbours (IBKreg) with k = 10
(Hornik et al., 2009), a decision tree for regression (M5P) (Hornik et al., 2009), Random
Forest (RF) (Liaw & Wiener, 2002), Support Vector Machines (SVM) (Meyer et al., 2014) and
Neural Networks (Venables & Ripley, 2002). In order to compare the predictive performance of
these models, we also introduce three baseline models: a model that always predicts the mean
of the train data (TrainMean), a model that always predicts the mean of the test data
(TestMean), and a basic model that predicts the value of the target pollutant 3 hours before
(X3H).

Table 2 contains the MSE of the regression models for the prediction of the four target
pollution levels of the Molí station. Results for Pista de Silla station and Avd Francia
station are shown in Tables 4 and 3, respectively. If we analyse these results, we can
conclude that the learned models improve on the performance of the basic baseline models in
almost all cases. When we compare the learning techniques in the three tables, the ensemble
of decision trees technique (random forest) is the best model in almost all cases. These
results are in concordance with (Singh et al., 2013), where ensembles of trees outperformed
other approaches such as SVMs.

Figure 1. Average weekly traffic intensity and pollution parameters measured in Molí
station. [Plot: normalised levels of Traffic, CO, NO, NO2 and PM2.5 by day of the week,
Sunday to Saturday.]

Figure 2. Average weekly traffic intensity and pollution parameters measured in Avd Francia
station. [Plot: normalised levels of Traffic, CO, NO, NO2 and PM2.5 by day of the week,
Sunday to Saturday.]

4. Airvlc

In the previous section we have analysed how to obtain real-time air pollution predictions
from a given set of features. In this section we summarise Airvlc, a mobile app for Android
and iOS and a web application [4]. From the regression models, this application generates a
map of the city of Valencia showing the predicted intensity of pollution levels. The
application also allows the user to configure a set of automatic warnings that are triggered
every time a pollution threshold is reached near the position of the mobile device.

  [4] http://airvlc.lidiacontreras.com/






                                         Table 1. Averages and standard deviation of the three pollution detection sensors.
                                                       Traffic                              CO                       NO                           NO2                       PM2.5
                                                 ave         sd                     ave          sd         ave           sd                ave      sd                ave       sd
                                  Molı́          442.333     339.489                0.116        0.093      8.642         20.311            26.608   21.003            10.650    6.926
                                  Francia        631.569     431.412                0.185        0.122      9.092         19.271            27.840   23.992            7.909     4.235
                                  Silla          484.722     298.768                0.228        0.187      23.559        33.024            45.631   25.376            8.309     6.020
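
The Traffic column above is, according to Section 3, an average over the six traffic sensors closest to each pollution station. A minimal sketch of that aggregation follows, assuming sensors are described by latitude, longitude and a current intensity reading. The function names, the dictionary layout and the use of great-circle distance are illustrative assumptions; the paper does not state how closeness was measured.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def station_traffic(station, sensors, k=6):
    """Mean intensity (vehicles/hour) of the k sensors closest to the station.

    `station` is a (lat, lon) tuple; each sensor is a dict with the
    hypothetical keys "lat", "lon" and "intensity".
    """
    ranked = sorted(
        sensors,
        key=lambda s: haversine_km(station[0], station[1], s["lat"], s["lon"]),
    )
    nearest = ranked[:k]
    return sum(s["intensity"] for s in nearest) / len(nearest)


if __name__ == "__main__":
    # Made-up coordinates and readings, for illustration only.
    demo_sensors = [
        {"lat": 39.47 + 0.001 * i, "lon": -0.38, "intensity": 100.0 * i}
        for i in range(1, 8)
    ]
    print(station_traffic((39.47, -0.38), demo_sensors))
```

Averaging only the nearest sensors keeps the feature cheap to compute every hour, at the cost of ignoring traffic elsewhere in the city, which the text itself acknowledges as a simplification.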



      Table 2. Results in MSE of different regression models for Molı́ Station. The best prediction model is highlighted in bold.
                                          TrainMean              TestMean             X3h              lr         qr        IBkreg                 M5p            RF         SVM                NN
                                CO             0.086                 0.061           0.067         0.057       0.060         0.068                0.071        0.057         0.063            0.182
                                NO            30.202               28.739           36.516        25.200      29.805        27.821               25.555       20.655        25.870           32.944
                               NO2            19.918               19.914           25.258        19.680      17.370        15.683               31.242       14.877        14.488           32.152
                              PM2.5            8.803                 8.803           8.634         6.889       6.564         6.674                7.248        6.072         6.135           13.089
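
The evaluation behind this table (lagged features, a chronological train/test split, MSE, and the TrainMean, TestMean and X3H baselines) can be sketched as follows. This is not the authors' R code: the function names are hypothetical, the series is assumed to be a plain list of hourly values with no gaps, and only the baselines are reproduced, not the learned models.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def lagged_rows(series, lags=(3, 24)):
    """Build (features, target) pairs from an hourly series.

    The features of hour t are the values observed `lag` hours earlier,
    mirroring the "3 and 24 hours before" pollution features.
    """
    start = max(lags)
    return [
        ([series[t - lag] for lag in lags], series[t])
        for t in range(start, len(series))
    ]


def baselines(series, split, lag=3):
    """MSE of the three baselines on the hours from index `split` onwards.

    Hours before `split` play the role of the training period.
    Assumes split >= lag so that every test hour has a lagged value.
    """
    train, test = series[:split], series[split:]
    train_mean = sum(train) / len(train)
    test_mean = sum(test) / len(test)
    persistence = [series[t - lag] for t in range(split, len(series))]
    return {
        "TrainMean": mse(test, [train_mean] * len(test)),
        "TestMean": mse(test, [test_mean] * len(test)),
        "X3H": mse(test, persistence),
    }


if __name__ == "__main__":
    toy = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up hourly values
    print(baselines(toy, split=4))
    print(lagged_rows(list(range(30)))[0])
```

Splitting by time rather than at random matters here: with lagged pollution values among the features, a random split would let the model see hours adjacent to the test hours during training.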
Figure 3. Average weekly traffic intensity and pollution parameters measured in Pista Silla
station. [Plot: normalised levels of Traffic, CO, NO, NO2 and PM2.5 by day of the week,
Sunday to Saturday.]

Figure 4. Average daily traffic intensity and pollution parameters measured in Molí station.
[Plot: normalised levels of Traffic, Humidity, Wind, CO, NO, NO2 and PM2.5 by hour of the
day.]

4.1. Contamination intensity map

The results of Section 3 show that random forest models obtain the best performance in most cases. Therefore, twelve random forest models (one per pollutant and station) are implemented in the Airvlc application. These models predict, every hour, the level of the four analysed pollutants at the three pollution detection stations. We want, however, to predict pollution levels at the points of the city where traffic is measured (1245 points around the city). For that purpose, given any of these points, we extract the features related to traffic intensity from the six nearest traffic sensors. The meteorological features are the same for the whole city. The prediction of pollutants at that exact location is computed by combining the models corresponding to the three stations. The combination is weighted by the distance from the target point to the measurement stations, giving more importance to the closest models. A simpler approach would be to learn a single model from the concatenation of the data from the three stations and then apply it to the whole set of target points.

By computing the pollution predictions for a set of strategic and well-distributed locations we are able to estimate a real-time pollution map of the city. The map is generated with Google Maps technology. For each location, the map shows the pollution level as a dot whose colour varies among green, yellow and red depending on the calculated pollution level. If the user selects one of these dots, an extended window opens showing the exact predicted levels. Figure 7 includes a screenshot of the pollution map of the Airvlc application. The user can also select a second frame in the window of the Airvlc application to enter a specific location, and the application then computes the predicted pollution levels for that location. An example of this process is included in Figure 8.

4.2. Risk levels

Figure 8 shows how the pollution levels are presented to users. However, showing just a concentration value for each parameter is not very useful for most users, since most of them are not experts in pollutants and could not interpret these numbers correctly. In order to improve the comprehensibility of the predictions we have established three ranges of risk, represented as a speedometer: Low risk (green) corresponds to a measurement that is safe; Medium risk (yellow) when concentrations reach levels that can cause harmful effects in people sensitive to air pollution exposure (kids, elderly people...); High risk (red) when concen-





Table 3. Results in MSE of different regression models for Avd. Francia Station. The best prediction model (lowest MSE) is marked with *.

       TrainMean  TestMean       X3h        lr        qr    IBkreg       M5p        RF       SVM        NN
CO         0.195     0.165     0.224     0.159     0.163     0.168     0.160    *0.153     0.156     0.324
NO        35.493    33.634    44.955    32.992    36.049    34.092    30.350   *29.517    33.364    38.262
NO2       23.443    20.900    27.162    16.299    16.718    18.494    23.782   *14.851    19.100    41.929
PM2.5      3.721     3.718     3.879     3.326    *3.132     3.523     4.655     3.214     3.265     7.974


Table 4. Results in MSE of different regression models for Pista Silla Station. The best prediction model (lowest MSE) is marked with *.

       TrainMean  TestMean       X3h        lr        qr    IBkreg       M5p        RF       SVM        NN
CO         0.278     0.222     0.304     0.221     0.227     0.221     0.235    *0.218     0.268     0.278
NO        49.149    46.232    60.353    39.863    43.524    42.760    44.150   *36.332    52.438    58.798
NO2       23.135    23.122    30.167    20.861    19.031    18.487    25.972   *16.722    23.313    49.699
PM2.5      6.911     6.660     7.119     5.663     5.342     5.750     7.189    *5.339     7.368    11.061
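
As a reading aid for Tables 3 and 4, the following minimal Python sketch shows how a mean squared error (MSE) can be computed and compared against a constant predictor. It assumes that a column such as TrainMean corresponds to always predicting the mean of the training targets; that reading, the function names and the sample values are all illustrative assumptions, not the evaluation code used in the paper.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def train_mean_baseline_mse(y_train, y_test):
    """MSE on the test set when always predicting the training mean.

    Assumption: this is what a 'TrainMean'-style baseline measures.
    """
    constant = fmean(y_train)
    return mse(y_test, [constant] * len(y_test))


# Made-up hourly CO readings, for illustration only.
y_train = [0.2, 0.4, 0.6]
y_test = [0.3, 0.5]
model_predictions = [0.35, 0.45]  # hypothetical output of a regression model

model_error = mse(y_test, model_predictions)
baseline_error = train_mean_baseline_mse(y_train, y_test)
print(f"model MSE={model_error:.4f}, baseline MSE={baseline_error:.4f}")
```

A model is only useful in this sense when its MSE is clearly below that of such a constant baseline, which is how the baseline columns of the tables can be read.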

[Figure: Avd Francia station. x-axis: Hours; y-axis: Levels; series: Traffic, Humidity, Wind, CO, NO, NO2, PM2.5.]

Figure 5. Average daily traffic intensity and pollution parameters measured in Avd Francia station.

[Figure: Pista Silla station. x-axis: Hours; y-axis: Levels; series: Traffic, Humidity, Wind, CO, NO, NO2, PM2.5.]

Figure 6. Average daily traffic intensity and pollution parameters measured in Pista Silla station.


trations can cause acute and chronic effects to anyone, especially those with sensitivity.

The ranges of risk shown by the application from the predicted values of the four pollutants are based on the recommendations of the Directive 2008/50/EC (European Commission, 2008). The variable NOx (oxides of nitrogen) refers to NO or NO2, since the directive establishes the same limits for both.

  • Green level: [NOx] < 14.0 µg/m3 ∧ [CO] < 30.0 mg/m3 ∧ [PM 2.5] < 7.5 µg/m3.
  • Yellow level: We establish medium risk (yellow level) if the levels satisfy neither the conditions of the green level nor those of the red level.
  • Red level: [NOx] ≥ 190.0 µg/m3 ∨ [CO] ≥ 55.0 mg/m3 ∨ [PM 2.5] ≥ 25.0 µg/m3.

4.3. Risk warnings

The Airvlc mobile application can be configured to send warnings to users if the device is near (approximately 200 meters) a zone where a high risk level is predicted. These warnings can be personalised by the user in different ways. For example, the user can establish personal limits for warnings or modify the range of distance for the detection of high risk levels of pollutant concentration. Obviously, the user needs to allow the application to know the actual GPS location of the device.

In the case of the web application, given that it is more complex to know the exact location of the user, we adopt a different strategy. We are working on an automated warning system where the user defines a set of areas, and the system then sends an email whenever a dangerous situation (high risk level by default) is detected.

5. Related work

A large number of works employ machine learning techniques or statistical approaches for predicting pollution levels. A classical work is (Yi & Prybutok, 1996). In this paper, the authors propose ozone prediction models. Specifically, they develop a neural network model for forecasting daily maximum ozone levels and compare it to previous approaches based on regression and Box-Jenkins ARIMA. The results show that the neural network model improves on the performance of the regression and Box-Jenkins ARIMA models tested. Neural network models have been widely








employed in this field; a review of these approaches can be found in (Khare & Nagendra, 2006).

Figure 7. Airvlc application. Real-time pollution map.

Figure 8. Frame where the user can enter specific locations to see the predicted levels of pollution.

A more closely related work is (Karppinen et al., 2000a). Here the authors propose a modelling system for predicting traffic volumes, emissions from stationary and vehicular sources, and atmospheric dispersion of pollution in an urban area. They employ four monitoring stations in the Helsinki metropolitan area in 1993. The paper compares the predicted NOx and NO2 concentrations with the results of an urban air quality monitoring network. The agreement of model predictions was better for the two suburban monitoring stations than for the two urban stations. Some applications of these models are introduced in (Karppinen et al., 2000b). A similar work for the city of Izmir in Turkey is (Elbir, 2003). Here, the author compares the CALMET meteorological model and its puff dispersion model CALPUFF for predicting the dispersion of sulphur dioxide emissions from industrial and domestic sources.

Another related and very recent work is (Donnelly et al., 2015). This paper presents a model for real-time air quality forecasts. The predictions focus on nitrogen dioxide (NO2) and are used to estimate air quality 48 hours in advance. The model is based on a multiple linear regression which uses linearised factors describing variations in concentrations together with meteorological parameters and persistence as predictors.

Our comparison of regression techniques reaches similar conclusions to the work presented in (Singh et al., 2013). In this study, principal components analysis (PCA) is performed to identify air pollution sources. From the extracted features, tree-based ensemble learning models are induced to predict the urban air quality of Lucknow (India), using air quality and meteorological databases covering a period of five years.

6. Conclusions

Air pollution can decrease life expectancy since contamination raises the risk of suffering respiratory diseases. Although policies motivating the reduction of emissions of pollutant particles have been introduced in recent years, many cities still frequently present risky levels of air pollution. In these situations, reducing exposure to ambient air pollution is highly recommended. In this work, we have presented Airvlc, an application that predicts in real time the levels of four dangerous pollutants at a wide set of points in the city of Valencia. The system is able to predict these pollution levels by applying regression models trained on data containing information about traffic intensity, persistence of pollutants and meteorological parameters. Airvlc can be a useful tool for avoiding risky locations in terms of air pollution.

As future work we propose the integration of the application in middleware platforms such as Fi-Ware (footnote 5); this could help to extend the applicability of the system to other cities or regions. We are also interested in incorporating additional features in order to improve the prediction models: wind direction, sand storms, forest wildfires and agricultural burnings... Finally, we propose using the tool to recommend routes that minimise the exposure to air pollution.

5 http://www.fiware.org/
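
The steps described in Sections 4.1 to 4.3 (distance-weighted combination of the station models, mapping of predicted concentrations to a risk colour, and proximity warnings) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Airvlc implementation: the paper only says that closer stations receive more importance, so the 1/distance weights are an assumption, as are the use of the larger of NO and NO2 for the NOx check, the haversine distance, all function names and all sample coordinates and values. The thresholds are copied from the bullet list in Section 4.2.

```python
import math

# Thresholds copied from the bullet list in Section 4.2.
# NOx and PM2.5 are in ug/m3, CO is in mg/m3.
GREEN_BELOW = {"nox": 14.0, "co": 30.0, "pm25": 7.5}
RED_FROM = {"nox": 190.0, "co": 55.0, "pm25": 25.0}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def combine_station_predictions(point, stations, predictions):
    """Weighted mean of per-station predictions; weights are 1/distance (assumed)."""
    weights = []
    for (lat, lon), value in zip(stations, predictions):
        dist = haversine_m(point[0], point[1], lat, lon)
        if dist == 0.0:
            return value  # the target point coincides with a station
        weights.append(1.0 / dist)
    weighted_sum = sum(w * p for w, p in zip(weights, predictions))
    return weighted_sum / sum(weights)


def risk_level(no, no2, co, pm25):
    """Map predicted concentrations to 'green', 'yellow' or 'red'."""
    nox = max(no, no2)  # assumption: the NOx limit is checked against the larger value
    if nox >= RED_FROM["nox"] or co >= RED_FROM["co"] or pm25 >= RED_FROM["pm25"]:
        return "red"
    if nox < GREEN_BELOW["nox"] and co < GREEN_BELOW["co"] and pm25 < GREEN_BELOW["pm25"]:
        return "green"
    return "yellow"


def should_warn(device, zone, zone_level, radius_m=200.0):
    """True when a red zone lies within radius_m metres of the device."""
    distance = haversine_m(device[0], device[1], zone[0], zone[1])
    return zone_level == "red" and distance <= radius_m


# Made-up coordinates and concentrations, for illustration only.
stations = [(39.46, -0.38), (39.48, -0.34), (39.43, -0.38)]
no2_by_station = [20.0, 35.0, 50.0]  # hypothetical per-station model outputs
target = (39.47, -0.37)

no2_here = combine_station_predictions(target, stations, no2_by_station)
level = risk_level(no=10.0, no2=no2_here, co=0.4, pm25=6.0)
print(f"NO2 at target: {no2_here:.1f}, risk level: {level}")
```

The default radius of 200 metres mirrors the approximate distance mentioned in Section 4.3; exposing it as a parameter reflects the statement that users can modify the detection range.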





Acknowledgments

We thank the anonymous reviewers for their comments, which have helped to improve this paper significantly. We are also grateful to Ajuntament de València, InnDEA València and especially to Ramón Ferri, Ruth López and Paula Llobet for their help in providing traffic data. This work was supported by the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economía y Competitividad in Spain (PCIN-2013-037). It has also been partially supported by the EU (FEDER) and the Spanish MINECO project ref. TIN2013-45732-C4-01 (DAMAS), and by Generalitat Valenciana ref. PROMETEOII/2015/013 (SmartLogic).

References

Donnelly, Aoife, Misstear, Bruce, and Broderick, Brian. Real time air quality forecasting using integrated parametric and non-parametric regression techniques. Atmospheric Environment, 103:53–65, 2015.

Elbir, Tolga. Comparison of model predictions with the data of an urban air quality monitoring network in Izmir, Turkey. Atmospheric Environment, 37(15):2149–2157, 2003.

European Commission. Directive 2008/50/EC of the European Parliament on ambient air quality and cleaner air for Europe. http://ec.europa.eu/environment/air/quality/legislation/directive.htm, 2008.

Hornik, Kurt, Buchta, Christian, and Zeileis, Achim. Open-source machine learning: R meets Weka. Computational Statistics, 24(2):225–232, 2009. doi: 10.1007/s00180-008-0119-7.

Karppinen, A, Kukkonen, J, Elolähde, T, Konttinen, M, and Koskentalo, T. A modelling system for predicting urban air pollution: comparison of model predictions with the data of an urban measurement network in Helsinki. Atmospheric Environment, 34(22):3735–3743, 2000a.

Karppinen, A, Kukkonen, J, Elolähde, T, Konttinen, M, Koskentalo, T, and Rantakrans, E. A modelling system for predicting urban air pollution: model description and applications in the Helsinki metropolitan area. Atmospheric Environment, 34(22):3723–3733, 2000b.

Khare, Mukesh and Nagendra, SM Shiva. Artificial neural networks in vehicular pollution modelling, volume 41. Springer, 2006.

Koenker, Roger. quantreg: Quantile Regression, 2015. URL http://CRAN.R-project.org/package=quantreg. R package version 5.11.

Liaw, Andy and Wiener, Matthew. Classification and regression by randomForest. R News, 2(3):18–22, 2002. URL http://CRAN.R-project.org/doc/Rnews/.

Meyer, David, Dimitriadou, Evgenia, Hornik, Kurt, Weingessel, Andreas, and Leisch, Friedrich. e1071: Misc Functions of the Department of Statistics (e1071), TU Wien, 2014. URL http://CRAN.R-project.org/package=e1071. R package version 1.6-4.

R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2015. URL http://www.R-project.org/.

Singh, Kunwar P, Gupta, Shikha, and Rai, Premanjali. Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80:426–437, 2013.

The European Environment Agency. SOER 2015 — the European environment — state and outlook 2015. http://www.eea.europa.eu/soer, 2015.

US Environmental Protection Agency. Particulate matter (PM) regulations. http://www.epa.gov/airquality/particlepollution/index.html, 2015.

Venables, W. N. and Ripley, B. D. Modern Applied Statistics with S. Springer, New York, fourth edition, 2002. URL http://www.stats.ox.ac.uk/pub/MASS4. ISBN 0-387-95457-0.

Wilker, Elissa H., Preis, Sarah R., Beiser, Alexa S., Wolf, Philip A., Au, Rhoda, Kloog, Itai, Li, Wenyuan, Schwartz, Joel, Koutrakis, Petros, DeCarli, Charles, Seshadri, Sudha, and Mittleman, Murray A. Long-Term Exposure to Fine Particulate Matter, Residential Proximity to Major Roads and Measures of Brain Structure. Stroke, April 2015. doi: 10.1161/strokeaha.114.008348. URL http://dx.doi.org/10.1161/strokeaha.114.008348.

World Health Organisation. Public health, environmental and social determinants of health. http://www.who.int/phe/health_topics/outdoorair/databases/health_impacts/en/, 2015.

Yi, Junsub and Prybutok, Victor R. A neural network model forecasting for prediction of daily maximum ozone concentration in an industrialized urban area. Environmental Pollution, 92(3):349–357, 1996.



