=Paper=
{{Paper
|id=Vol-1392/paper-10
|storemode=property
|title=Airvlc: An Application for Real-Time Forecasting Urban Air Pollution
|pdfUrl=https://ceur-ws.org/Vol-1392/paper-10.pdf
|volume=Vol-1392
|dblpUrl=https://dblp.org/rec/conf/icml/OchandoJOR15
}}
==Airvlc: An Application for Real-Time Forecasting Urban Air Pollution==
* Lidia Contreras Ochando (liconoc@upv.es), Universitat Politècnica de València, Spain
* Cristina I. Font Julián (crifonju@ei.upv.es), Universitat Politècnica de València, Spain
* Francisco Contreras Ochando (fraconoc@gmail.com), Universitat Politècnica de València, Spain
* Cèsar Ferri (cferri@dsic.upv.es), DSIC, Universitat Politècnica de València, Spain

Proceedings of the 2nd International Workshop on Mining Urban Data, Lille, France, 2015. Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

===Abstract===
This paper presents Airvlc, an application that produces real-time urban air pollution forecasts for the city of Valencia, Spain. Although many cities provide air quality data, this information is often published with significant delays (three hours in the case of Valencia) and is limited to the areas where the measurement stations are located. The application employs regression models that predict the levels of four pollutants (CO, NO, PM2.5 and NO2) at three locations in the city. These models are trained on features that represent traffic intensity, persistence of pollutants and meteorological parameters such as wind speed and temperature. We compare several learning techniques to find the one that best predicts pollutant levels. In our experiments, ensembles of decision trees (Random Forest) outperform the other methods in almost all tests. Airvlc incorporates the best regression models and, through a distance-weighted combination of their predictions, generates a real-time pollution map of Valencia. The application also includes a warning system that notifies users when a risky pollution concentration is detected nearby.

===1. Introduction===
Air pollution can have a significant short- and long-term impact on people's health. For instance, urban air pollution increases the risk of respiratory diseases such as pneumonia, and of chronic diseases such as lung cancer or cardiovascular disease (World Health Organisation, 2015). A recent study (Wilker et al., 2015) relates long-term exposure to ambient air pollution to structural changes in the brain. The SOER 2015 report (The European Environment Agency, 2015), which covers the air quality of European Union countries in 2011, concludes that although the atmosphere on the continent has improved over the last decades, significant traces of the most harmful contaminants remain. In fact, the report estimates that 430,000 Europeans died prematurely because of pollution in 2011.

Although some governments are introducing policies that restrict the use of vehicles (the main source of pollution in most cases), in Europe alone major cities such as Paris, Naples, Moscow, Milan and Barcelona still reported significant levels of urban pollution in 2015 (The European Environment Agency, 2015). In this context, it is important for citizens of urban agglomerations to reduce their exposure to air pollution as much as possible. This is especially relevant for high-risk groups such as children, elderly people, asthmatics and people with respiratory diseases.

In this work we present an application that predicts urban air pollution in real time from historical data. The application focuses on the city of Valencia, Spain, a medium-sized urban agglomeration of around 1,000,000 inhabitants. The city runs an open data site with real-time information on different aspects of the city, such as traffic data and noise and pollen sensors.
Although the site also includes urban air pollution sensors, this information needs to be carefully verified and is published with a delay of three hours. This delay is a problem, since risky levels of pollution are not detected in real time. Additionally, the network of sensors is limited (six stations in the city of Valencia). Considering these limitations, we have developed an application that displays, in real time, the foreseeable levels of pollution at a large number of points in the city. The application relies on the predictions of regression models trained on features that represent traffic intensity, persistence of pollutants and meteorological parameters.

The paper is organised as follows. Section 2 details the collection of pollution data and of the factors that affect the generation, concentration or dispersion of these pollutants. Section 3 presents experiments on learning regression models to predict pollutant concentrations. The Airvlc application is detailed in Section 4. Related work is discussed in Section 5. Finally, Section 6 closes the paper with the main conclusions and some plans for future work.

===2. Data collection===
Different particles are associated with urban air pollution. To measure air contamination, pollutant parameters found in the lower levels of the troposphere are monitored. Air quality sensors measure concentrations of particles that have an anthropogenic origin and produce effects during or after their inhalation by humans. The historical pollution data for this work has been obtained from the open data web of the Generalitat Valenciana (http://www.cma.gva.es/cidam/emedio/atmosfera/jsp/historicos.jsp). Following the recommendations of The European Environment Agency (2015), we concentrate on the following pollutants:

* PM2.5 (suspended particles below 2.5 microns): This parameter has been chosen because of its polluting power. It is one of the most dangerous particles, since its size makes it almost unstoppable by the natural filters of the body. As a result, PM2.5 particles are usually able to reach the pulmonary alveoli; in some cases they attach to the alveoli, reducing lung capacity, and in the worst cases they cross the alveolar membranes and reach the bloodstream. Since PM2.5 particles originate in anthropogenic activities (especially the burning of fuels in motor vehicles), it is not surprising that their composition includes heavy metals, which are extremely toxic to the human body. Atmospheric conditions on the Mediterranean coast of Spain can influence particle levels, due to lower rainfall and wind action than in northern European countries, and to North African particles (Saharan dust), both PM10 and PM2.5.
* NO (nitrogen monoxide): Nitrogen monoxide is a highly unstable compound; it reacts quickly in the atmosphere to form nitrogen dioxide. This instability makes nitrogen monoxide a radical, that is, a highly reactive molecule, whose effects on the body include abnormal changes in DNA, lipids and proteins. In the medium and long term, these changes result in a greater chance of developing cancer. It originates largely from vehicle engines.
* NO2 (nitrogen dioxide): Nitrogen dioxide is not a directly generated pollutant, since its presence in the atmosphere is caused by the oxidation of nitrogen monoxide. In the presence of moisture, this compound forms nitric acid, and its inhalation, even in low concentrations, can degrade lung tissue and reduce the efficacy of the immune system, especially in children.
* CO (carbon monoxide): Carbon monoxide is a primary pollutant. CO is toxic: it prevents oxygen transport by poisoning the blood, since it binds to haemoglobin in place of oxygen. People with cardiovascular and cerebrovascular problems could suffer heart attacks or strokes because of high concentrations of CO.

The distribution of air pollution is decisively influenced by climatic conditions. We have collected climatological observations for Valencia from the Meteorological Agency of the Government of Spain (AEMET, http://www.aemet.es/). We consider the following parameters:

* Temperature: In ordinary atmospheric conditions, temperature decreases with altitude, favouring the ascent of warmer (less dense) air, which drags contaminants upwards. In a thermal inversion, a warmer layer of air lies over the colder surface air and prevents the denser surface air from rising, so the contamination is confined and increases.
* Humidity: In the presence of humidity, nitrogen dioxide forms nitric acid, which is harmful to human health.
* Wind speed: Strong winds can disperse pollutants and transport them away from their emission point.
* Precipitation: Precipitation washes out contaminants and can dissolve substances and gases.

The two main sources of pollution in developed countries are motor vehicles and industry. Vehicles release large amounts of nitrogen oxides, carbon oxides, hydrocarbons and particulates when burning gasoline and diesel. Therefore, we need to measure the level of traffic in the city in order to predict air pollution.
For this purpose, the City of Valencia provides a network of sensors (electromagnetic coils) that measure traffic intensity (vehicles/hour) across the city. This data can be found in the open data site of the Valencia City Council (http://www.valencia.es/ayuntamiento/DatosAbiertos.nsf/).

===3. Experiments===
With all the selected parameters, we have built datasets aimed at predicting the concentration of pollutants from traffic intensity and weather parameters. Concretely, we have collected data for a period of two years (2013 and 2014), sampled every 60 minutes, 24 hours a day. Although Valencia has six stations for the detection and measurement of air pollution, three of them do not have sufficient data for the analysed period and were discarded. We therefore collected data from three stations: Molí, Avd Francia and Pista de Silla. These three stations are located inside the urban agglomeration, so most of the pollutants they measure should be generated by urban activities (mainly traffic).

For each station, we create a dataset with the measured pollutant levels and the parameters that can affect these measurements, namely traffic level (measured by the electromagnetic coils) and weather conditions. To measure the traffic related to each air pollution station, we average the traffic intensity of the six closest traffic sensors. This is a simplification since, certainly, all the traffic in the city affects the levels measured at every station.

Table 1 summarises the three datasets, giving, for each station, the average and standard deviation of the measured pollutants and of the traffic intensity associated with the station. Regarding traffic intensity, Avd Francia is the busiest station, while the other two have similar values. Regarding pollution, Pista de Silla presents the highest levels for three of the four pollutants, the only exception being PM2.5. This can probably be attributed to the location of the stations: Pista de Silla is located in the central part of the city and is therefore more vulnerable to the overall city pollution, whereas the other two are in the suburbs, where external air streams can reduce the levels of pollutants.

We first study the weekly evolution of pollutants at the three stations. Figure 1 shows the average of the four pollution parameters and the average traffic intensity at Molí station for each day of the week. Figure 2 presents the same plot for Avd Francia station and Figure 3 for Pista de Silla station. To make the values comparable in the plots, we normalise each parameter by its maximum value. Pollutant and traffic levels reach their maximum on working days at all three stations (Friday seems to be the worst day). The plots clearly show the dependency of the pollution parameters on traffic intensity: at weekends, traffic drops drastically and, with it, pollutant levels drop significantly. Again, the exception is PM2.5. This behaviour may be caused by the fact that these particles can be generated by all types of combustion (motor vehicles, power plants, wood burning, etc.) and by certain industrial processes (US Environmental Protection Agency, 2015).

We have performed a similar analysis of the evolution of pollutants, traffic intensity and two meteorological variables (humidity and wind) over the day. Figure 4 shows the hourly averages of these parameters at Molí station, Figure 5 at Avd Francia station and Figure 6 at Pista de Silla station. Again, we normalise each parameter by its maximum value. Traffic intensity behaves similarly in all three plots, with three peaks: when workers travel to their workplaces (around 9 am), at lunch time (around 2 pm) and in the evening (around 8 pm). At all three stations, the pollution parameters peak at the same time as the first traffic peak (around 9 am). At the second traffic peak (around 2 pm), however, pollutant levels do not follow the increase in traffic. In fact, after the maximum around 9 am, pollutant levels decrease until around 4 pm, when they start increasing again, reaching a second peak around 9 pm. Our intuition is that wind disperses part of the pollutants during the sunniest hours. Valencia is on the Mediterranean coast, where sea breezes are common (especially in summer). These winds arise over bodies of water (usually the sea or large lakes) near land because of differences in air pressure caused by the different heat capacities of water and land. This phenomenon can be seen in the plots as an increase in wind strength during the midday hours. Finally, we observe an unusual behaviour of CO at Molí station: this pollutant has a second peak at midday. This probably corresponds to an extra source of pollution that needs further study.
''Figure 1. Average weekly traffic intensity and pollution parameters measured in Molí station (normalised levels of Traffic, CO, NO, NO2 and PM2.5 by day of the week, Sunday to Saturday).''

''Figure 2. Average weekly traffic intensity and pollution parameters measured in Avd Francia station (same series and axes as Figure 1).''

As stated previously, we are interested in predicting pollution levels in real time. Since measured levels are only made public with a delay of three hours, we need to build prediction models from features that are available in real time. We extract the following features from the data sources described in Section 2:

* Climatological features: temperature (°C), relative humidity (%), pressure (hPa), wind speed (km/h) and rain (mm/h).
* Calendar features: year, month, day of the month, day of the week and hour.
* Traffic intensity features: traffic level at the surrounding traffic sensors (vehicles/hour), and traffic level 1, 2, 3 and 24 hours before.
* Pollution features: pollution level at the target station 3 and 24 hours before.

With these features, we compare several regression techniques available in R (R Core Team, 2015) in order to identify the technique that best predicts pollution levels. To test the predictive ability of the models, we train them on the records of 2013 and the first nine months of 2014, and test them on the last three months of 2014. We use the Mean Squared Error (MSE) as performance measure. Concretely, we employ the following techniques for learning regression models (all with default parameters, unless stated otherwise): Linear Regression (lr) (Hornik et al., 2009), quantile regression (qr) (Koenker, 2015) with the lasso method, K nearest neighbours (IBkreg) with k = 10 (Hornik et al., 2009), a decision tree for regression (M5P) (Hornik et al., 2009), Random Forest (RF) (Liaw & Wiener, 2002), Support Vector Machines (SVM) (Meyer et al., 2014) and Neural Networks (NN) (Venables & Ripley, 2002).
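The feature list relies on lagged values, and the evaluation uses a chronological rather than a random split. The following is a minimal Python sketch of both steps, assuming hourly readings stored in dictionaries keyed by `datetime`. The helper names `build_row` and `chronological_split`, the dictionary keys and the toy data are hypothetical; the authors worked in R and their code is not shown in the paper.

```python
from datetime import datetime, timedelta

TRAFFIC_LAGS = (1, 2, 3, 24)   # hours before the prediction time
POLLUTION_LAGS = (3, 24)       # measured levels are only published 3 h late


def build_row(ts, traffic, pollution, weather):
    """Feature dict for hour `ts`, or None if any input is missing.

    traffic and pollution map hourly datetimes to values; weather holds the
    climatological readings for `ts` (a hypothetical layout)."""
    row = {
        "year": ts.year, "month": ts.month, "day": ts.day,
        "weekday": ts.weekday(), "hour": ts.hour,
        "traffic": traffic.get(ts),  # current traffic is available in real time
    }
    row.update(weather)
    for lag in TRAFFIC_LAGS:
        row[f"traffic_lag{lag}"] = traffic.get(ts - timedelta(hours=lag))
    for lag in POLLUTION_LAGS:
        row[f"pollution_lag{lag}"] = pollution.get(ts - timedelta(hours=lag))
    return None if any(v is None for v in row.values()) else row


def chronological_split(timestamps, cutoff=datetime(2014, 10, 1)):
    """Train on 2013 plus January-September 2014, test on October-December 2014."""
    train = [ts for ts in timestamps if ts < cutoff]
    test = [ts for ts in timestamps if ts >= cutoff]
    return train, test


# Toy usage with three days of synthetic hourly data.
start = datetime(2014, 9, 29)
hours = [start + timedelta(hours=h) for h in range(72)]
traffic = {ts: 400.0 + ts.hour for ts in hours}
pollution = {ts: 20.0 for ts in hours}
weather = {"temperature": 18.5, "humidity": 60.0, "pressure": 1015.0,
           "wind_speed": 12.0, "rain": 0.0}
row = build_row(hours[-1], traffic, pollution, weather)
train, test = chronological_split(hours)
```

Rows whose lags fall before the start of the data are dropped (here, the first 24 hours), which is one simple way to handle missing history; the paper does not say how gaps were treated.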
{| class="wikitable"
|+ Table 1. Averages (ave) and standard deviations (sd) for the three pollution detection stations.
! rowspan="2" | Station !! colspan="2" | Traffic !! colspan="2" | CO !! colspan="2" | NO !! colspan="2" | NO2 !! colspan="2" | PM2.5
|-
! ave !! sd !! ave !! sd !! ave !! sd !! ave !! sd !! ave !! sd
|-
| Molí || 442.333 || 339.489 || 0.116 || 0.093 || 8.642 || 20.311 || 26.608 || 21.003 || 10.650 || 6.926
|-
| Avd Francia || 631.569 || 431.412 || 0.185 || 0.122 || 9.092 || 19.271 || 27.840 || 23.992 || 7.909 || 4.235
|-
| Pista de Silla || 484.722 || 298.768 || 0.228 || 0.187 || 23.559 || 33.024 || 45.631 || 25.376 || 8.309 || 6.020
|}

To compare the predictive performance of these models, we also introduce three baseline models: a model that always predicts the mean of the training data (TrainMean), a model that always predicts the mean of the test data (TestMean), and a basic model that predicts the value of the target pollutant measured 3 hours before (X3H).

Table 2 contains the MSE of the regression models for the four target pollutants at Molí station. Results for Avd Francia and Pista de Silla stations are shown in Tables 3 and 4, respectively. These results show that the learned models improve on the baseline models in almost all cases. Comparing the learning techniques across the three tables, the ensemble of decision trees (Random Forest) is the best model in almost all cases. These results agree with Singh et al. (2013), where ensembles of trees outperformed other approaches such as SVMs.
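The three baselines are simple enough to state in a few lines. The Python sketch below assumes each pollutant series is a plain list of hourly values aligned in time; the function names and the numbers are illustrative only and do not come from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted levels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def baseline_scores(train_y, test_y, test_lag3):
    """MSE of the three reference predictors on the test period.

    test_lag3[i] is the level of the target pollutant 3 hours before test_y[i].
    TestMean is the best possible constant prediction for the test period, so
    it is a reference point rather than a deployable model."""
    train_mean = sum(train_y) / len(train_y)
    test_mean = sum(test_y) / len(test_y)
    return {
        "TrainMean": mse(test_y, [train_mean] * len(test_y)),
        "TestMean": mse(test_y, [test_mean] * len(test_y)),
        "X3H": mse(test_y, test_lag3),
    }


scores = baseline_scores(train_y=[10.0, 14.0],
                         test_y=[11.0, 13.0, 15.0],
                         test_lag3=[9.0, 11.0, 13.0])
model_mse = mse([11.0, 13.0, 15.0], [11.5, 12.5, 14.0])  # hypothetical model output
beats_all = all(model_mse < s for s in scores.values())
```

A learned model is only worth deploying if its error is below all three references, which is the comparison the results tables make per station and pollutant.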
{| class="wikitable"
|+ Table 2. Results in MSE of different regression models for Molí station. The best prediction model is highlighted in bold.
! Pollutant !! TrainMean !! TestMean !! X3H !! lr !! qr !! IBkreg !! M5P !! RF !! SVM !! NN
|-
| CO || 0.086 || 0.061 || 0.067 || '''0.057''' || 0.060 || 0.068 || 0.071 || '''0.057''' || 0.063 || 0.182
|-
| NO || 30.202 || 28.739 || 36.516 || 25.200 || 29.805 || 27.821 || 25.555 || '''20.655''' || 25.870 || 32.944
|-
| NO2 || 19.918 || 19.914 || 25.258 || 19.680 || 17.370 || 15.683 || 31.242 || 14.877 || '''14.488''' || 32.152
|-
| PM2.5 || 8.803 || 8.803 || 8.634 || 6.889 || 6.564 || 6.674 || 7.248 || '''6.072''' || 6.135 || 13.089
|}

{| class="wikitable"
|+ Table 3. Results in MSE of different regression models for Avd Francia station. The best prediction model is highlighted in bold.
! Pollutant !! TrainMean !! TestMean !! X3H !! lr !! qr !! IBkreg !! M5P !! RF !! SVM !! NN
|-
| CO || 0.195 || 0.165 || 0.224 || 0.159 || 0.163 || 0.168 || 0.160 || '''0.153''' || 0.156 || 0.324
|-
| NO || 35.493 || 33.634 || 44.955 || 32.992 || 36.049 || 34.092 || 30.350 || '''29.517''' || 33.364 || 38.262
|-
| NO2 || 23.443 || 20.900 || 27.162 || 16.299 || 16.718 || 18.494 || 23.782 || '''14.851''' || 19.100 || 41.929
|-
| PM2.5 || 3.721 || 3.718 || 3.879 || 3.326 || '''3.132''' || 3.523 || 4.655 || 3.214 || 3.265 || 7.974
|}

{| class="wikitable"
|+ Table 4. Results in MSE of different regression models for Pista de Silla station. The best prediction model is highlighted in bold.
! Pollutant !! TrainMean !! TestMean !! X3H !! lr !! qr !! IBkreg !! M5P !! RF !! SVM !! NN
|-
| CO || 0.278 || 0.222 || 0.304 || 0.221 || 0.227 || 0.221 || 0.235 || '''0.218''' || 0.268 || 0.278
|-
| NO || 49.149 || 46.232 || 60.353 || 39.863 || 43.524 || 42.760 || 44.150 || '''36.332''' || 52.438 || 58.798
|-
| NO2 || 23.135 || 23.122 || 30.167 || 20.861 || 19.031 || 18.487 || 25.972 || '''16.722''' || 23.313 || 49.699
|-
| PM2.5 || 6.911 || 6.660 || 7.119 || 5.663 || 5.342 || 5.750 || 7.189 || '''5.339''' || 7.368 || 11.061
|}

''Figure 3. Average weekly traffic intensity and pollution parameters measured in Pista de Silla station (same series and axes as Figure 1).''

''Figure 4. Average daily traffic intensity and pollution parameters measured in Molí station (normalised levels of Traffic, Humidity, Wind, CO, NO, NO2 and PM2.5 by hour of the day).''

''Figure 5. Average daily traffic intensity and pollution parameters measured in Avd Francia station (same series and axes as Figure 4).''

''Figure 6. Average daily traffic intensity and pollution parameters measured in Pista de Silla station (same series and axes as Figure 4).''

===4. Airvlc===
In the previous section we analysed how to obtain real-time air pollution predictions from a given set of features. In this section we summarise Airvlc, a mobile app for Android and iOS that is also available as a web application (http://airvlc.lidiacontreras.com/). Using the regression models, the application generates a map of Valencia showing the predicted pollution levels. It also lets users configure automatic warnings that are triggered whenever a pollution threshold is reached near the position of the mobile device.

====4.1. Contamination intensity map====
The results of Section 3 show that random forest models obtain the best performance in most cases. Therefore, twelve random forest models (one per pollutant and station) are implemented in the Airvlc application. Every hour, these models predict the levels of the four analysed pollutants at the three pollution detection stations. However, we want to predict pollution levels at the points of the city where traffic is measured (1245 points). For that purpose, for each of these points we extract the traffic intensity features from the six nearest traffic sensors; the meteorological features are the same for the whole city. The pollutant predictions at that exact location are computed by combining the models of the three stations, weighting each model by the distance from the target point to its station so that the closest stations receive more importance. A simpler approach would be to learn a single model from the concatenated data of the three stations and then apply it to all target points.

By computing pollution predictions for a set of strategic, well-distributed locations, we can estimate a real-time pollution map of the city. The map is generated with Google Maps technology. For each location, it shows the pollution level as a dot whose colour varies among green, yellow and red depending on the predicted level. If the user selects one of these dots, an extended window opens showing the exact predicted levels. Figure 7 shows a screenshot of the Airvlc pollution map. The user can also select a second frame in the Airvlc window, enter a specific location, and obtain the predicted pollution levels for that location. An example of this process is shown in Figure 8.

====4.2. Risk levels====
Figure 8 shows how pollution levels are presented to users. However, showing only a concentration value for each parameter is not very useful for most users, since most of them are not experts in pollutants and may not interpret these numbers correctly. To make the predictions easier to understand, we have established three risk ranges, represented as a speedometer: low risk (green) corresponds to a safe measurement; medium risk (yellow) applies when concentrations reach levels that can harm people sensitive to air pollution exposure (children, elderly people, etc.); and high risk (red) applies when concentrations can cause acute and chronic effects in anyone, especially sensitive people.

The risk ranges shown by the application for the predicted values of the four pollutants are based on the recommendations of Directive 2008/50/EC (European Commission, 2008). The variable NOx (oxides of nitrogen) refers to either NO or NO2, since the regulation establishes the same limits for both.

* Green level: [NOx] < 14.0 µg/m³ ∧ [CO] < 30.0 mg/m³ ∧ [PM2.5] < 7.5 µg/m³.
* Yellow level: medium risk, assigned when the levels satisfy neither the green nor the red conditions.
* Red level: [NOx] ≥ 190.0 µg/m³ ∨ [CO] ≥ 55.0 mg/m³ ∨ [PM2.5] ≥ 25.0 µg/m³.

====4.3. Risk warnings====
The Airvlc mobile application can be configured to warn users when the device is near (approximately 200 metres) a zone where a high risk level is predicted. Users can personalise these warnings in different ways; for example, they can set personal warning limits or change the distance range used to detect high risk levels of pollutant concentration. Naturally, the user needs to allow the application to access the current GPS location of the device. For the web application, where it is harder to know the exact location of the user, we adopt a different strategy: we are working on an automated warning system in which the user defines a set of areas, and the system sends an email whenever a dangerous situation (by default, a high risk level) is detected.
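Sections 4.1 to 4.3 describe a per-point pipeline: traffic features from the six nearest sensors, a distance-weighted combination of the three station models, a mapping to risk colours, and a proximity warning. The Python sketch below illustrates that pipeline under stated assumptions. The paper does not give the exact weighting formula, so inverse-distance weighting is assumed; great-circle (haversine) distance is assumed for "nearest" and "200 metres"; the NOx limits are applied to the larger of NO and NO2; and all function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest_traffic(point, sensors, k=6):
    """Mean flow (vehicles/hour) of the k sensors closest to `point`.
    sensors: list of ((lat, lon), flow) pairs."""
    closest = sorted(sensors, key=lambda s: haversine_m(point, s[0]))[:k]
    return sum(flow for _, flow in closest) / len(closest)


def combine_stations(point, station_predictions):
    """Inverse-distance weighted mean of the station models' predictions.
    station_predictions: list of ((lat, lon) of station, predicted level),
    where each model has already been applied to this point's features."""
    weight_sum, weighted_total = 0.0, 0.0
    for location, level in station_predictions:
        d = haversine_m(point, location)
        if d < 1.0:  # target is (almost) on a station: use that model alone
            return level
        weight_sum += 1.0 / d
        weighted_total += level / d
    return weighted_total / weight_sum


def risk_level(no, no2, co, pm25):
    """Green/yellow/red ranges of Section 4.2 (NOx limits apply to NO and NO2)."""
    nox = max(no, no2)
    if nox >= 190.0 or co >= 55.0 or pm25 >= 25.0:
        return "red"
    if nox < 14.0 and co < 30.0 and pm25 < 7.5:
        return "green"
    return "yellow"


def should_warn(device, map_points, radius_m=200.0):
    """True if some map point within radius_m of the device is red.
    map_points: list of ((lat, lon), risk) pairs."""
    return any(risk == "red" and haversine_m(device, loc) <= radius_m
               for loc, risk in map_points)


# Toy usage with made-up coordinates (not the real station locations).
target = (0.0, 0.0)
sensors = [((0.0, 0.001 * i), 100.0 * i) for i in range(1, 8)]
traffic_feature = nearest_traffic(target, sensors)
no2 = combine_stations(target, [((0.0, 0.001), 10.0), ((0.0, -0.002), 40.0)])
risk = risk_level(no=8.0, no2=no2, co=0.2, pm25=6.0)
warn = should_warn(target, [((0.0, 0.001), "red")])
```

Inverse-distance weighting falls back to the nearest station's prediction as a point approaches that station and to a plain average when the stations are equidistant, which matches the stated aim of giving more importance to the closest models.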
''Figure 7. Airvlc application. Real-time pollution map.''

''Figure 8. Frame where the user can introduce specific locations to know the predicted levels of pollution.''

===5. Related work===
Many works employ machine learning techniques or statistical approaches to predict pollution levels. A classical work is Yi & Prybutok (1996), where the authors propose ozone prediction models. Specifically, they develop a neural network model for forecasting daily maximum ozone levels and compare it with previous approaches based on regression and Box-Jenkins ARIMA. Their results show that the neural network model outperforms the regression and Box-Jenkins ARIMA models tested. Neural network models have been widely employed in this field; a review of these approaches can be found in Khare & Nagendra (2006).

A more closely related work is Karppinen et al. (2000a). There, the authors propose a modelling system for predicting traffic volumes, emissions from stationary and vehicular sources, and the atmospheric dispersion of pollution in an urban area, using four monitoring stations in the Helsinki metropolitan area in 1993. The paper compares the predicted NOx and NO2 concentrations with the measurements of an urban air quality monitoring network; the model predictions agreed better at the two suburban monitoring stations than at the two urban ones. Some applications of these models are presented in Karppinen et al. (2000b). A similar work for the city of Izmir, Turkey, is Elbir (2003), where the author compares the predictions of the CALMET meteorological model and its puff dispersion model CALPUFF for the dispersion of sulphur dioxide emissions from industrial and domestic sources with the data of an urban air quality monitoring network.

Another, very recent, related work is Donnelly et al. (2015), which presents a model for real-time air quality forecasting. The predictions focus on nitrogen dioxide (NO2) and are used to estimate air quality 48 hours in advance. The model is based on multiple linear regression, using linearised factors describing variations in concentrations together with meteorological parameters and persistence as predictors.

Our comparison of regression techniques reaches conclusions similar to those of Singh et al. (2013). In that study, principal components analysis (PCA) is performed to identify air pollution sources, and tree-based ensemble learning models are then induced from the extracted features, together with air quality and meteorological data covering five years, to predict the urban air quality of Lucknow (India).

===6. Conclusions===
Air pollution can decrease life expectancy, since contamination raises the risk of suffering respiratory diseases. Although policies encouraging the reduction of pollutant emissions have been introduced in recent years, many cities still frequently present risky levels of air pollution. In these situations, reducing exposure to ambient air pollution is highly recommended. In this work, we have presented Airvlc, an application that predicts in real time the levels of four dangerous pollutants at a wide set of points in the city of Valencia. The system predicts these pollution levels by applying regression models trained on data containing information about traffic intensity, persistence of pollutants and meteorological parameters. Airvlc can be a useful tool for avoiding locations with risky levels of air pollution.

As future work, we propose integrating the application into middleware platforms such as Fi-Ware (http://www.fiware.org/), which could help to extend the system to other cities or regions. We are also interested in incorporating additional features to improve the prediction models, such as wind direction, sand storms, forest wildfires and agricultural burning. Finally, we propose using the tool to recommend routes that minimise exposure to air pollution.

===Acknowledgments===
We thank the anonymous reviewers for their comments, which have helped to improve this paper significantly. We are also grateful to Ajuntament de València, InnDEA València and especially to Ramón Ferri, Ruth López and Paula Llobet for their help in providing traffic data. This work was supported by the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economía y Competitividad in Spain (PCIN-2013-037). It has also been partially supported by the EU (FEDER) and the Spanish MINECO project ref. TIN2013-45732-C4-01 (DAMAS), and by Generalitat Valenciana ref. PROMETEOII/2015/013 (SmartLogic).

===References===
* Donnelly, Aoife, Misstear, Bruce, and Broderick, Brian. Real time air quality forecasting using integrated parametric and non-parametric regression techniques. Atmospheric Environment, 103:53–65, 2015.
* Elbir, Tolga. Comparison of model predictions with the data of an urban air quality monitoring network in Izmir, Turkey. Atmospheric Environment, 37(15):2149–2157, 2003.
* European Commission. Directive 2008/50/EC of the European Parliament on ambient air quality and cleaner air for Europe. http://ec.europa.eu/environment/air/quality/legislation/directive.htm, 2008.
* Hornik, Kurt, Buchta, Christian, and Zeileis, Achim. Open-source machine learning: R meets Weka. Computational Statistics, 24(2):225–232, 2009. doi: 10.1007/s00180-008-0119-7.
* Karppinen, A, Kukkonen, J, Elolähde, T, Konttinen, M, and Koskentalo, T. A modelling system for predicting urban air pollution: comparison of model predictions with the data of an urban measurement network in Helsinki. Atmospheric Environment, 34(22):3735–3743, 2000a.
* Karppinen, A, Kukkonen, J, Elolähde, T, Konttinen, M, Koskentalo, T, and Rantakrans, E. A modelling system for predicting urban air pollution: model description and applications in the Helsinki metropolitan area. Atmospheric Environment, 34(22):3723–3733, 2000b.
* Khare, Mukesh and Nagendra, SM Shiva. Artificial neural networks in vehicular pollution modelling, volume 41. Springer, 2006.
* Koenker, Roger. quantreg: Quantile Regression, 2015. URL http://CRAN.R-project.org/package=quantreg. R package version 5.11.
* Liaw, Andy and Wiener, Matthew. Classification and regression by randomForest. R News, 2(3):18–22, 2002. URL http://CRAN.R-project.org/doc/Rnews/.
* Meyer, David, Dimitriadou, Evgenia, Hornik, Kurt, Weingessel, Andreas, and Leisch, Friedrich. e1071: Misc Functions of the Department of Statistics (e1071), TU Wien, 2014. URL http://CRAN.R-project.org/package=e1071. R package version 1.6-4.
* R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2015. URL http://www.R-project.org/.
* Singh, Kunwar P, Gupta, Shikha, and Rai, Premanjali. Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80:426–437, 2013.
* The European Environment Agency. SOER 2015: The European environment, state and outlook 2015. http://www.eea.europa.eu/soer, 2015.
* US Environmental Protection Agency. Particulate matter (PM) regulations. http://www.epa.gov/airquality/particlepollution/index.html, 2015.
* Venables, W. N. and Ripley, B. D. Modern Applied Statistics with S. Springer, New York, fourth edition, 2002. URL http://www.stats.ox.ac.uk/pub/MASS4. ISBN 0-387-95457-0.
* Wilker, Elissa H., Preis, Sarah R., Beiser, Alexa S., Wolf, Philip A., Au, Rhoda, Kloog, Itai, Li, Wenyuan, Schwartz, Joel, Koutrakis, Petros, DeCarli, Charles, Seshadri, Sudha, and Mittleman, Murray A. Long-Term Exposure to Fine Particulate Matter, Residential Proximity to Major Roads and Measures of Brain Structure. Stroke, April 2015. doi: 10.1161/strokeaha.114.008348. URL http://dx.doi.org/10.1161/strokeaha.114.008348.
* World Health Organisation. Public health, environmental and social determinants of health. http://www.who.int/phe/health_topics/outdoorair/databases/health_impacts/en/, 2015.
* Yi, Junsub and Prybutok, Victor R. A neural network model forecasting for prediction of daily maximum ozone concentration in an industrialized urban area. Environmental Pollution, 92(3):349–357, 1996.