=Paper=
{{Paper
|id=Vol-2841/BigVis_3
|storemode=property
|title= SenseBoard: Sensor Monitoring for Air Quality Experts
|pdfUrl=https://ceur-ws.org/Vol-2841/BigVis_3.pdf
|volume=Vol-2841
|authors=Federica Rollo,Laura Po
|dblpUrl=https://dblp.org/rec/conf/edbt/RolloP21
}}
== SenseBoard: Sensor Monitoring for Air Quality Experts==
<pdf width="1500px">https://ceur-ws.org/Vol-2841/BigVis_3.pdf</pdf>
<pre>
SenseBoard: Sensor Monitoring for Air Quality Experts

Federica Rollo, ‘Enzo Ferrari’ Engineering Department, Modena, Italy
federica.rollo@unimore.it
Laura Po, ‘Enzo Ferrari’ Engineering Department, Modena, Italy
laura.po@unimore.it

ABSTRACT
Air quality monitoring is crucial within cities, since air pollution is one
of the main causes of premature death in Europe. However, performing
trustworthy monitoring of urban air quality is not a simple process,
especially when the goal is extensive and timely monitoring of the entire
urban area using low-cost sensors.
    In order to collect reliable measurements from low-cost sensors, a lot of
work is required from the environmental experts who deploy and maintain the
air quality network and who daily calibrate, control, and clean up the data
generated by these sensors. In this paper, we describe SenseBoard, an
interactive dashboard created to support environmental experts in controlling
the sensor network, managing sensor data calibration, and detecting anomalies.

1 INTRODUCTION
Air pollution is a global threat with large impacts on human health and
ecosystems, particularly in urban areas. In Europe, air quality remains poor
in many cities that experience exceedances of the regulated limits for air
pollutants [1]. The urgency of limiting air pollution is also reflected in
the Sustainable Development Goals (SDGs) defined in the 2030 Agenda for
Sustainable Development [2].
    Effective action to reduce air pollution and its impact on the quality of
life requires a good understanding and extensive monitoring of urban air
quality. In recent years, Internet of Things (IoT) technologies have
advanced, and cities around the world have exploited this enabling technology
to monitor and manage multiple aspects of citizens' lives. IoT has been used
to monitor traffic congestion [3, 10], detect and classify road accidents
[7], manage car parking [9], support decision-making in agriculture [4],
evaluate energy consumption [12], and monitor air quality [13]. Data
generated by IoT are used to improve city services and the living experience
of citizens.
    In this context, data coming from a group of low-cost sensors spread
around a city can provide widespread, hyperlocal insights into air
pollution. However, a network of low-cost air quality sensors is not enough
to monitor urban air quality. Since these sensors are complex and sensitive,
operating them requires specific environmental expertise. The data they
generate need to be converted into relevant insights that allow politicians
to monitor air quality and that support the achievement of the
sustainability goals. For this reason, environmental experts need a control
platform to perform sensor data calibration and anomaly detection.
    The maintenance and control of an urban air quality network is crucial to
provide the good-quality information on which extensive air quality
monitoring depends. Moreover, effective visualizations are essential to
process and interpret the collected data.
    In this paper, we present SenseBoard, an interactive tool aimed at
environmental experts that brings together heterogeneous and dynamic data for
the real-time analysis and management of an air quality network. The tool
was conceived within the TRAFAIR project¹, which led to the creation of an
urban network of low-cost air quality sensors. The low-cost sensors employed
are cheaper and less reliable than the legal Air Quality Monitoring (AQM)
stations managed by the Environmental Agencies. The reliability of their
measurements can be improved by calibrating each device beforehand, placing
it near an air quality station for some weeks. Low-cost sensors provide "raw"
measures, i.e., values in millivolts; to convert such a value into a reliable
pollutant concentration, a calibration period is needed, during which Machine
Learning algorithms are trained to generate, from the raw measurements,
pollutant concentrations in line with those measured by the AQM stations.
SenseBoard is devoted to supporting environmental experts in the monitoring
and control of the air quality sensor network, in the supervision of the
calibration process, and in the detection of anomalous values. It is an
enabling tool to detect anomalies, update sensor status, monitor the proper
functioning of the sensors, manage changes of device location and, above
all, provide feedback on the calibration process. The calibration results
obtained with the Machine Learning algorithm are shown and compared with both
the raw data and the data of the AQM station; thus, it is possible to
understand whether the algorithm works appropriately or whether the
co-location period of the device needs to be extended.
    SenseBoard is a general and flexible dashboard that can be adapted to
monitor any air quality sensor network. Its scalability allows it to be
replicated in cities of different sizes with a variable number of sensors.
The dashboard does not depend on the type of sensors employed, and it can
easily be modified to visualize other parameters measured by the sensors. In
this paper, we use the city of Modena as a use case.
    The rest of the paper is organised as follows. Section 2 presents the
background. Then, Section 3 introduces the dashboard and describes some of
its views (data visualizations) for the city of Modena. Finally, Section 5
provides conclusions.

© 2021 Copyright for this paper by its author(s). Published in the Workshop
Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021,
Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0)

1 https://trafair.eu

Figure 1: An air quality device (on the left) and its content inside (on the
right): 4 cells/sensors for measuring the level of 4 air pollutants (NO, NO₂,
CO, and O₃ in this case).

2 BACKGROUND
TRAFAIR ("Understanding Traffic Flows to Improve Air Quality") [11] is a
project co-financed by the European Commission that brings together 10
partners from two European countries (Italy and Spain) to develop innovative
and sustainable services combining air quality, weather condition, and
traffic flow data. The goal is to increase awareness of urban air quality for
the benefit of citizens and government decision-makers. The project aims
to supervise the level of pollution on an urban scale in 6 European cities
(Modena, Florence, Pisa and Livorno in Italy, and Santiago de Compostela and
Zaragoza in Spain) by producing real-time estimates of air pollution through
a network of air quality devices and by developing a service for forecasting
urban air quality based on weather forecasts and traffic flows [5].
    The sensors employed are low-cost devices, cheaper and less reliable than
the AQM stations managed by the Environmental Agencies. However, these
devices can provide reliable measurements if they are co-located for a
certain period of time close to the stations, where they "learn" how to
measure air quality.
    A device is a box containing several sensors. Each sensor, also called a
cell, measures a specific pollutant. Figure 1 shows an example device.
    The approach used in TRAFAIR is to install the devices close to the AQM
stations for a certain period. During this period (the calibration period), a
Machine Learning algorithm is trained on the measurements provided in
millivolts by our devices (raw observations) together with the AQM station
measurements. Then, when the environmental experts judge the device to be
"ready" (usually after 3 weeks of co-location), it can be moved to a
different location and start providing air quality measurements. From then
on, the previously trained Machine Learning algorithm generates calibrated
data (concentrations) from the raw observations. Usually every 6 months, the
devices are once again co-located near the AQM stations for re-calibration,
which maintains a good quality of measurements.
    In this scenario, a tool for managing changes in the location or status
of the devices is clearly important. Moreover, it must be possible to compare
the calibrated data with the measurements of the AQM stations at any time
(regardless of where the sensor is located), to determine when the device
needs to be re-calibrated (usually when the calibrated data and the AQM
measurements differ substantially).
    Since devices are moved frequently, when they are switched off and then
switched on in a new location they may go through a "warm-up" lasting some
minutes or even hours. The warm-up is the period during which the device
tries to reach a thermo-mechanical balance in the measuring system as well as
an optimal operating temperature of its electronic components. Since the
warm-up period has a variable duration, a timeline of the raw observations
enables the user to detect when the warm-up period is over.
    Figure 2 shows the raw measurements collected from one device. As can be
seen, the device was switched off at approximately 9 a.m. One hour later, it
was switched on in a new location. The zoomed area of the graph shows the
first two measurements for each gas made in the new location. These values
drop off sharply because of the warm-up. Consequently, they have to be
excluded from the reliable observations and flagged by the environmental
experts as "not reliable". This operation can be performed through
SenseBoard, as described in Section 3. Accordingly, the environmental experts
need to constantly visualize the data produced by each device and monitor its
behavior.

Figure 2: Raw observations of one air quality device made in two different
locations.

    The need for a monitoring tool comes from the environmental experts. For
this reason, they have been directly involved in defining the requirements
that the dashboard has to satisfy. After some discussions, 6 requirements
have been outlined:
  R1 providing an overview of the current position and status of each sensor,
  R2 recording a change of location or status for a given sensor without
     hand-writing the SQL query that stores the modification in the database,
  R3 visualizing sensor observations without data aggregation, to detect
     anomalous values and compare them with each other, and with data
     aggregation, to better understand the trend,
  R4 comparing observations of sensors co-located in the same place,
  R5 showing the concentrations produced by the calibration algorithm and
     comparing them with the certified values of the legal AQM stations,
  R6 displaying the anomalies identified by the anomaly detection algorithm,
     to check how well the automated algorithm performs.
    After the development of SenseBoard, additional feedback from the
environmental experts has been continuously collected during its use to
improve the functionalities of the dashboard.

2.1 Air quality sensor network
In Modena, an Italian city of 186,000 inhabitants and 183 km², 13 air quality
devices (52 low-cost sensors) have been installed in different locations. Two
locations for calibration have been identified close to the AQM stations (the
red dots in Figure 3), together with 10 locations of interest (the blue dots
in Figure 3) placed in areas of different kinds, such as residential,
industrial, or green areas.
    Two types of low-cost devices have been used: 12 Decentlab Air Cubes and
1 Libelium Smart Environment PRO. All the devices are equipped with 4 cells
(sensors), one for each gas (NO, NO₂, CO, and Oₓ). Each cell measures the gas
concentration
through 2 channels (the auxiliary and the working channels). In addition, the
Libelium device measures the levels of PM2.5 and PM10. For each channel, the
raw observations are provided in mV; in addition, a basic concentration based
on the original factory calibration² is provided in μg/m³. These devices also
measure air temperature and humidity, and report their battery voltage.
Therefore, the total number of measurements provided by one device is 19 for
the Decentlab cubes and 21 for the Libelium device.
    The sensor data acquisition is managed by the Long Range Wide Area Network
(LoRaWAN) deployed in the city of Modena. LoRaWAN [6] is a media access
control (MAC) protocol widely used in smart city applications thanks to its
easy installation and cost-effectiveness. It relies on gateways, i.e.,
antennas that receive messages broadcast by the enabled devices (the air
quality devices, in our use case) and forward them to the server. A message
from one device can be received by several gateways at the same time; the
server deals with the duplicates. LoRaWAN uses low radio frequencies and
provides long-range communication (up to five kilometers in urban areas, and
up to 15 kilometers or more in rural areas). The network coverage depends
strongly on the geographic landscape.
    Our air quality devices have been registered on the LoRaWAN network of
Modena through their identifier (DevEUI), following the Over-the-Air
Activation (OTAA) procedure. The data rate has been configured with a
bandwidth of 125 kHz and a spreading factor of 7, which allows the devices to
transmit data every 2 minutes.

Figure 3: Points of interest for air quality monitoring (blue dots) and
positions of the AQM stations (red dots). Map data: Google, 2020.
Figure 4: Sensor data acquisition from the low-cost air quality sensor
network.

    Figure 4 shows the data acquisition process. The devices encapsulate the
acquired measurements into LoRa packets and send them to the LoRaWAN server
through the gateways. Every LoRa packet contains 19 (or 21) measurements
(both mV values and basic concentrations for each gas and channel, plus air
temperature, humidity, and battery voltage); since a packet is sent every 2
minutes (720 packets per day), each Decentlab device produces 13,680
(19 × 720) measurements per day. Four gateways have been installed in Modena,
mainly on the roofs of the highest buildings, to cover the whole urban area
of the city and ensure LoRaWAN coverage at our points of interest. The
gateways have been registered on the LoRa server. When a gateway receives a
message, it forwards it to the LoRa server, where the Mosquitto [8] MQTT
(Message Queuing Telemetry Transport) broker is running. MQTT is a
publish/subscribe messaging transport protocol, and the data are published on
an MQTT topic. We used the paho-mqtt Python library³ to implement, in a
Python script, a client of this open-source message broker. The script is
always running and uses the library's client class to connect to the MQTT
broker, publish messages, subscribe to topics, and receive messages. The
messages are then decoded and the measurements stored in a PostgreSQL
database, the TRAFAIR database (more details are provided in the following
subsection). The same script also applies an anomaly detection algorithm to
the time series of air quality measurements to determine whether each
measurement is anomalous. This algorithm uses majority voting among three
different Machine Learning algorithms, and the anomalous data are flagged in
the TRAFAIR database. When a device is moved from one location to another, it
automatically connects to the nearest gateway and resumes sending messages.
Since the messages received by the LoRa server are identified by the device
identifier, the change of gateway is completely transparent: the LoRa server
keeps storing the measurements of each device no matter where it moves in the
urban area.

2.2 Data platform
Data from the low-cost air quality sensors are stored in real time in the
TRAFAIR database. This database uses the PostGIS extension to handle
geospatial data and the TimescaleDB extension to make SQL scalable for
time-series data.
    The database contains more than 60 tables and 190 GB of data collected
from the beginning of the TRAFAIR project (November 2018) up to now (February
2021). Air quality measurements and device-related information are stored in
11 tables that take up 3 GB. These tables store the technical characteristics
of each device, its position, its status (running, calibration, offline,
broken, warm-up), the raw observations, the concentrations obtained by both
the original factory calibration and our calibration algorithm, and the
anomalies identified by several anomaly detection algorithms applied to both
raw and calibrated observations.

2 This is obtained by applying to the raw observations a formula provided by
the manufacturer, with calibration parameters that differ for each device.
3 https://pypi.org/project/paho-mqtt/

Figure 5: SenseBoard architecture.
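The packet and daily-volume figures in Section 2.1 follow from the record layout and the 2-minute transmission period. The sketch below recomputes them as a quick consistency check. It is a hypothetical helper and not part of SenseBoard. It assumes one record per LoRa packet and the per-device measurement layout described in the text.

```python
# Sketch: recompute the per-device measurement counts quoted in Section 2.1.
# Assumption: one LoRa packet (= one record) every 2 minutes, as stated in
# the text; function and constant names are hypothetical.

MINUTES_PER_DAY = 24 * 60
TX_INTERVAL_MIN = 2      # transmission period reported for the devices
GASES = 4                # NO, NO2, CO, Ox
CHANNELS = 2             # auxiliary and working channel


def measurements_per_record(has_pm: bool) -> int:
    """Number of measurements carried by one LoRa packet."""
    environment = 3                 # air temperature, humidity, battery voltage
    raw_mv = GASES * CHANNELS       # raw observations in mV
    factory = GASES * CHANNELS      # factory-calibrated concentrations in ug/m3
    if has_pm:
        factory += 2                # PM2.5 and PM10 (Libelium device only)
    return environment + raw_mv + factory


def measurements_per_day(has_pm: bool) -> int:
    """Daily measurements for one device transmitting every TX_INTERVAL_MIN."""
    packets_per_day = MINUTES_PER_DAY // TX_INTERVAL_MIN  # 720 packets
    return measurements_per_record(has_pm) * packets_per_day


print("Decentlab:", measurements_per_record(False), measurements_per_day(False))
print("Libelium: ", measurements_per_record(True), measurements_per_day(True))
```

The Decentlab line reproduces the 19-measurement records and the 13,680 daily measurements stated in the text. The Libelium figure (21 × 720 = 15,120) is derived with the same arithmetic and is not reported in the paper.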
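The acquisition script flags each measurement by majority voting among three Machine Learning algorithms, but the text does not name those algorithms. The sketch below illustrates only the voting rule. It uses three simple statistical detectors (z-score, interquartile-range fences, rolling-median deviation) as clearly hypothetical stand-ins with made-up thresholds; they are not the TRAFAIR models.

```python
"""Majority voting over three anomaly detectors (illustrative sketch).

The detectors are statistical stand-ins with hypothetical thresholds,
NOT the Machine Learning algorithms used in TRAFAIR.
"""
from statistics import mean, median, pstdev, quantiles
from typing import Callable, List, Sequence

Detector = Callable[[Sequence[float]], List[bool]]


def zscore_detector(values: Sequence[float], k: float = 2.5) -> List[bool]:
    """Flag values more than k population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > k for v in values]


def iqr_detector(values: Sequence[float], k: float = 1.5) -> List[bool]:
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


def rolling_median_detector(values: Sequence[float], window: int = 5,
                            tol: float = 0.5) -> List[bool]:
    """Flag values deviating from the local median by more than tol (relative)."""
    half = window // 2
    flags = []
    for i, v in enumerate(values):
        local = median(values[max(0, i - half): i + half + 1])
        flags.append(abs(v - local) > tol * abs(local) if local else v != 0)
    return flags


def majority_vote(values: Sequence[float],
                  detectors: Sequence[Detector]) -> List[bool]:
    """A value is anomalous when more than half of the detectors flag it."""
    votes = [detector(values) for detector in detectors]
    return [sum(column) > len(detectors) / 2 for column in zip(*votes)]


# Hypothetical raw readings (mV) with one sharp drop, as after a warm-up.
readings = [210, 212, 209, 211, 40, 210, 213, 208, 211, 212]
flags = majority_vote(readings,
                      [zscore_detector, iqr_detector, rolling_median_detector])
print([i for i, flagged in enumerate(flags) if flagged])  # -> [4]
```

In the pipeline described above, detection runs inside the acquisition script on the incoming time series. This batch version is only meant to show how three independent verdicts are combined into one flag.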
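Section 2.2 lists what the device-related tables hold: status values (running, calibration, offline, broken, warm-up), raw and calibrated concentrations, and anomaly flags. Requirement R2 asks that a change of location or status be recorded without hand-writing SQL. The sketch below shows one way such a change could be stored behind a single parameterized function. Several parts of it are assumptions. The stdlib sqlite3 module stands in for the PostgreSQL/PostGIS/TimescaleDB stack, and every table and column name is hypothetical, so this is not the TRAFAIR schema. Keying calibrated values by algorithm is also an assumed design, chosen so that more than one calibration can coexist for the same raw measurement.

```python
"""Minimal sketch of device-status and calibration tables.

sqlite3 stands in for PostgreSQL/PostGIS/TimescaleDB; all table and column
names are hypothetical.
"""
import sqlite3
from typing import Optional, Tuple

STATUSES = ("running", "calibration", "offline", "broken", "warm-up")

SCHEMA = """
CREATE TABLE device_status (
    device_id  TEXT NOT NULL,
    valid_from TEXT NOT NULL,      -- ISO-8601 timestamp of the change
    location   TEXT NOT NULL,      -- point of interest
    status     TEXT NOT NULL CHECK (status IN
               ('running', 'calibration', 'offline', 'broken', 'warm-up')),
    notes      TEXT,
    PRIMARY KEY (device_id, valid_from)
);
CREATE TABLE calibrated_observation (
    device_id     TEXT NOT NULL,
    ts            TEXT NOT NULL,
    gas           TEXT NOT NULL,
    algorithm     TEXT NOT NULL,   -- which calibration produced the value
    concentration REAL NOT NULL,   -- ug/m3
    is_anomalous  INTEGER NOT NULL DEFAULT 0,  -- boolean anomaly flag
    PRIMARY KEY (device_id, ts, gas, algorithm)
);
"""


def record_status_change(conn: sqlite3.Connection, device_id: str,
                         valid_from: str, location: str, status: str,
                         notes: Optional[str] = None) -> None:
    """Store a location/status change through a parameterized INSERT."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO device_status (device_id, valid_from, location, "
            "status, notes) VALUES (?, ?, ?, ?, ?)",
            (device_id, valid_from, location, status, notes),
        )


def current_status(conn: sqlite3.Connection,
                   device_id: str) -> Optional[Tuple[str, str]]:
    """Return (location, status) from the most recent change, if any."""
    return conn.execute(
        "SELECT location, status FROM device_status WHERE device_id = ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (device_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_status_change(conn, "dev-01", "2021-01-10T09:00:00",
                     "AQM station A", "calibration")
record_status_change(conn, "dev-01", "2021-02-01T10:00:00",
                     "Point of interest 3", "running")
print(current_status(conn, "dev-01"))  # -> ('Point of interest 3', 'running')
```

Keeping each change as a new timestamped row, rather than overwriting the old one, preserves the history of locations and statuses. Because ISO-8601 strings sort chronologically, the latest row gives the current state.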
    At any moment, every device has a status and is located at a point of
interest (see Figure 3). Its measurements are stored continuously, as soon as
they are parsed by the LoRa server. Each raw measurement can be calibrated by
multiple calibration algorithms. Thus, calibrated data are identified not only
by the date of the measurement and the sensor that provided it, but also by
the algorithm that was used. Finally, several anomaly detection algorithms are
applied to both raw and calibrated data. The results are stored in dedicated
tables of the TRAFAIR database, using boolean values to indicate whether each
value is anomalous.
    Considering only the measurements coming from our devices, we have
collected 3.3 million measurement records (1.8 GB) since their installation.
Each record includes 19 or 21 measurements: air temperature, humidity,
battery voltage, 8 raw measurements (2 channels for each of the 4 gases), and
8 or 10 concentrations from the original factory calibration (2 channels for
each of the 4 gases, plus one value each for PM2.5 and PM10 on the Libelium
device).

3 SENSEBOARD
SenseBoard⁴ is a Python web application built on the Tornado⁵ web framework.
It runs on a Debian 9 machine with 32 logical processors (Intel(R) Xeon(R)
Silver 4108 CPU at 1.80 GHz) and 256 GB of RAM. Figure 5 shows the
architecture of the dashboard. First, users need to log in to access the
dashboard. Authentication is performed through the Lightweight Directory
Access Protocol (LDAP), and access is currently limited to the environmental
experts working in TRAFAIR.
    After authentication, the user can view the current status of each device
and send other requests through the navigation bar at the top: observations
(raw measurements), anomalies, calibration (calibrated measurements), and AQM
station (measurements from the AQM stations). For each request, the dashboard
queries the TRAFAIR database to obtain the appropriate data and plots the time
series using the matplotlib Python library⁶. More complex plots are generated
periodically by ad hoc Python scripts⁷, which query the database and save the
plots to the file system as HTML files through the save_html function of the
mpld3 library⁸. This library also provides the InteractiveLegendPlugin⁹,
which connects a plot to an interactive legend. The legend is very useful in
our plots, since it lets users customize the visualization by adding or
removing lines. The user clicks on the rectangle next to each label in the
legend: if the rectangle is colored, the corresponding data are shown on the
plot; if it is white, those data are removed from the plot. The HTML files
are then included in the HTML page of the corresponding request. By “more
complex plots” we mean those that require processing of the data stored in
the database and handle a large amount of data (e.g., the raw observations of
each sensor over one month). This choice was made to save time when
displaying the plots: it reduces the server response time by 35 seconds for
the most time-consuming request.
    Some examples of visualizations (views) are described in Section 3.2. The
views are static, so that users can navigate and explore all the plots in a
view without any interference; the user can click on the “update” button to
see the updated views.

3.1 Users and scope
The scope of SenseBoard is the monitoring and control of the air quality
sensor network and the supervision of the calibration and anomaly detection
processes.
    Regarding the monitoring of the network, SenseBoard allows users to
identify and update the status of the sensors, change their location when
they are moved to a different position, and perform any maintenance, if
necessary.
    Regarding the supervision of the calibration process, SenseBoard lets
users compare raw measurements of co-located sensors, raw and calibrated
measurements of each sensor and, in particular, the calibrated observations
generated during the calibration period with the legal observations from the
AQM stations. This last comparison is the crucial one in the calibration
process, because it allows experts to understand whether the training period
of the Machine Learning algorithm is sufficient, i.e., whether the
concentrations computed by the Machine Learning algorithm are in line with
those of the AQM station.
    Other tasks are the detection of issues in network communication, the
discovery of disruptions or failures in sensor behaviour, the identification
of anomalous gas concentrations, the comparison of measurements from
co-located sensors, and the study of the correlation of pollution levels in
the area where the sensors are installed.
    The primary users of our visual analytics dashboard are the environmental
experts in charge of the installation, maintenance, and calibration of air
quality sensors.

3.2 Views
In SenseBoard, we have developed 6 views that give environmental experts
complete control over the status of the air quality sensor network and the
operations performed on the sensor data. Each view is described in detail in
the following subsections.

3.2.1 Sensor status and position
The first view, i.e., the homepage of the dashboard after login, addresses
requirements R1 and R2. Here, users see a table summarizing the main
information about the air quality devices. For each device, the table lists
its identifier, the name of the location where it is currently installed, the
installation timestamp, the name of the person in charge of the installation,
the sensor status, and any notes.
    In addition, as shown in Figure 6, two buttons are available for each
device: the “edit” button lets the user update the location and/or the status
of the corresponding device. After clicking on it, the user has to specify the
timestamp of the update (of the location or status), the location (one of the
points of interest in Figure 3), the status, and, optionally, its name and
notes. The “save” button stores the information in the TRAFAIR database. The
status update is used in different situations. For example, if a device is
moved from a point of interest to an AQM station, its status changes from
“running” to “calibration”. In addition, if an environmental expert notices
abnormal behavior of the device, he/she can modify the status in

4 https://trafair-srv.ing.unimo.it/aqsensors
5 https://www.tornadoweb.org/
6 https://matplotlib.org/
7 The scripts run every 2 minutes and generate the plots in 4-27 seconds.
8 https://mpld3.github.io/
9 https://mpld3.github.io/examples/interactive_legend.html

Figure 6: “Sensor status and position” view.


“broken”, using as timestamp the date of the first abnormal measurement. Then, he/she needs to add the “running” status starting from the first regular measurement.
    In addition to the “edit” button, the “check data” button links to the “sensor observations” view.
    Besides, in the “sensor status and position” view, the user can interact with a map (Figure 7), where the current position of each device is shown by an icon whose color depends on the status of the device. If several devices are in the same location, a larger icon showing the number of devices at that position is displayed on the map; clicking on it displays an icon for each device. Clicking on the icon of a device shows its name, its status, the name of the location, and the link to the “sensor observations” view of that specific sensor. Folium¹⁰ is the Python library used to create the map.

Figure 7: Position of the devices on January 4th, 2021.

10 https://pypi.org/project/folium/0.1.5/

   3.2.2 Sensor observations
The “sensor observations” view satisfies requirement R3 and includes 6 plots with the observations of one device. At the top of the page, the name of the device, its status and location, and the timestamp of the last observation with the battery voltage level are reported. This allows the managers of the sensor network to check immediately whether the sensor has stopped sending data or the batteries need to be changed.
   The 6 plots show the measurements of:
    (1) the relative humidity in percentage (%),
    (2) the temperature in degrees Celsius (°C),
    (3) the battery voltage in volts (V),
    (4) the raw observations of the 4 gases for the auxiliary and working channels in mV,
    (5) the observations of the 4 gases for the auxiliary and working channels calibrated through the original factory calibration in μg/m³,
    (6) the observations calibrated through the TRAFAIR calibration algorithms in μg/m³.
   For the Libelium device only, an additional plot shows the levels of PM2.5 and PM10.
   Each plot can visualize data for different time intervals (last 24 hours, week, or month) and data aggregations (2 minutes, which means no aggregation, 5 minutes, and 15 minutes), giving 9 different combinations for each plot. The visualization changes according to the option selected by the user.
   There are altogether 711 plots (13 devices × 6 plots × 3 time intervals × 3 data aggregations + 1 PM plot × 3 time intervals × 3 data aggregations). Since the creation of a plot took 12 seconds on average, we decided to generate these plots asynchronously through a Python script. This means that the plots are generated independently of the user's choice, and when the user selects an option (for time interval and data aggregation), a ready-made plot is displayed. This time-saving design choice is also motivated by user behavior. Three months after the first release of SenseBoard, we noticed that users were very likely to visualize several plots, exploring different gases with different aggregations or different time intervals. If the plots were created synchronously with the user's choice, jumping from one plot to another would require waiting for the generation of the corresponding plot each time. In agreement with the environmental experts, we therefore decided to switch to an asynchronous generation that reloads the 711 plots every 2 minutes.
   Figures 8 and 9 are two examples of the visualizations available in the “sensor observations” view. In Figure 8, the measurements of the 4 gases for the auxiliary and working channels of one device are plotted in a line chart. An anomalous behavior of the device has been highlighted in red: the values of the measurements in that time interval are very different from the previous ones. SenseBoard makes such erroneous data easy to detect. After maintenance by the environmental experts, the device becomes stable again and the measurements return to the expected values. Through the “edit” button of the “sensor status and position” view, the time period related to the red
area is flagged with the “broken” status. Figure 9 highlights an abnormal behavior of another device. In this case, the anomalous measurements are due to a drastic reduction in the battery level. At approximately 2 a.m., the battery ran out and the device stopped sending data. After the battery was changed at 10 a.m., the device resumed providing reliable measurements.

Figure 8: An anomalous behavior of a device detected on January 13th, 2021.
Figure 9: An anomalous behavior of a device detected on January 15th, 2021, due to a drastic reduction in the battery level.

   3.2.3 Gas observations
In the “gas observations” view, a plot is generated for each gas and channel, as shown in Figure 10 for NO. This view meets requirements R3 and R4. The user can choose to visualize the data of the last 24 hours, week, or month. The visualization may seem cluttered; however, the user can hide one or more lines in the plot thanks to the interactive legend, and zoom in on a specific area of the plot. In the web page, the plots related to the two channels of the same gas are placed next to each other (as in Figure 10) to facilitate the comparison of these measurements and detect the correlation between the two channels. Thanks to this view, the behavior of the cells can be regularly checked and maintenance planned.

Figure 10: A visualization of the “gas observations” view which shows the measurements of NO channels.

   3.2.4 Sensor anomalies
The accuracy of the raw measurements can be influenced by multiple factors, e.g. a low battery voltage, the weather conditions, and the air humidity. Identifying incorrect data makes it possible to provide more reliable data and could improve the results of the calibration task.
    We have implemented a majority voting system which combines 3 classifiers: (1) the Sliding Window anomaly detection, which considers consecutive measurements and the interquartile range (IQR) to find anomalies far from the normal behavior of the system, (2) FFIDCAD (Forgetting Factor Iterative Data Capture Anomaly Detection), which is an iterative algorithm, and (3) an algorithm based on the correlation between the values of each gas (NO, NO2, CO, and O3) and the measurements of air temperature and humidity. Every time a sensor produces a new measurement, the three classifiers are applied to it just after it is stored in the TRAFAIR database.
    The search for anomalous data is performed independently on both channels of each pollutant and for each device, since each device is individual and performs differently from the other devices, even if they are in the same location.
    The “sensor anomalies” view consists of one plot for each sensor with the raw observations and the anomalies identified by the majority voting system (requirement R6). Also in this case, the user can choose to visualize the observations of the last 24 hours, week, or month.
    Figure 11 is an example of anomalies visualization for sensor 4006. Anomalies are marked by a point. As can be seen in the figure, in most cases anomalies are detected in the upper peaks of the time series.

   3.2.5 Calibrated observations
The results of the calibration process consist of the concentrations of the 4 measured gases. Starting from 2 values in millivolts for each gas (one value for each of the two channels), the calibration provides one value in μg/m³. Currently, we are using Random Forest to calibrate our data. However, this algorithm can be improved over time, since more and more data are collected and used to re-train the calibration algorithm.
    The “calibrated observations” view shows the results of the latest calibration algorithm, that is the most recent and accurate
algorithm available for the visualized data. This view meets requirement R5. The calibrated observations are organized in 4 plots, one for each gas, and the user can distinguish the measurements of each device through the interactive legend.
    The calibrated values can be directly compared with the measurements of the AQM stations since they are expressed in the same unit of measure. To validate our calibrated data, we have defined one local warning threshold for each gas based on the measurements of the AQM stations. Each threshold has been calculated as 1.25 · M, where M is the maximum value measured by the AQM stations for the specific gas in the year preceding the date of the observation to be compared. If the concentration of the gas exceeds the corresponding threshold, the measurement is automatically flagged as “anomalous” in the TRAFAIR database by a Python process running in real time. The warning threshold is valid only in the area of Modena, since it is derived from the certified values of the AQM stations of Modena, and it changes every year. This threshold makes it possible to exclude very high values that are most likely due to sensor malfunction. It is not to be confused with the alert thresholds of the European Commission¹¹ or the reference levels of the European Environment Agency¹², which define the values used to assess the level of pollution in an area.

11 https://ec.europa.eu/environment/legal/law/5/e_learning/module_2_18.htm
12 https://www.eea.europa.eu/themes/air/air-quality/resources/air-quality-map-thresholds

    In the plots of the “calibrated observations” view, a line at the threshold value is drawn only if at least one measurement exceeds the threshold. The plots in Figure 12 show the measurements of NO made by 5 different devices installed in the same location, named “Parco Ferrari” (this is also the location of an AQM station). We have selected only the devices in that location through the interactive legend. As expected, the concentrations measured by the devices are very similar. In the “last month” plot, the blue line indicates the above-mentioned local warning threshold, and only one value is higher than this threshold.

Figure 11: Anomalies of sensor 4006 for the last 24 hours (A), the last week (B), and the last month (C) available on January 4th, 2021 at 11 a.m.
Figure 12: Calibrated NO observations by 5 devices located in the same place (“Parco Ferrari”) visualized on January 4th, 2021 at 4 p.m.

   3.2.6 Certified AQM station measurements
The sixth view of SenseBoard is devoted to the visualization of AQM station observations. These are certified hourly data on the concentrations of NO, NO2, NOx, and O3 measured by the two AQM stations installed in Modena (red points in Figure 3).

4    EXPERT EVALUATION
SenseBoard has been used regularly by 4 environmental experts from January 2020 to date, and it is still active. It has allowed:
    (1) the recording of 250 location/status updates,
    (2) the identification of network malfunctions in real time (which occurred twice in the last year and caused the loss of 1-2 days of data),
    (3) the detection of sensor faults in semi-real time and of anomalous cell behaviour (which occurred 4 times and led to cell replacement),
    (4) the identification of low battery levels which caused anomalous observations (33 times in around 14 months),
    (5) the daily comparison of concentrations from low-cost sensors and certified measurements from AQM stations to evaluate the calibration algorithm,
    (6) the detection of strange behaviour in the anomaly detection process, which made it possible to retrain and restart the algorithm.
   The effectiveness of SenseBoard was widely appreciated by the environmental engineers, who would otherwise not have had the opportunity to compare sensor measurements and calibrations or to carry out such prompt checks and maintenance.

5    CONCLUSION
SenseBoard is a data visualization and management platform for air quality sensors. It is a flexible tool that can be integrated into specific IoT environments. In this paper, its architecture, users, scope, and exemplary views have been presented. Moreover, details on the sensor data acquisition and storage processes have been given.
   SenseBoard is a multi-purpose tool: it serves to manage and maintain the air quality sensor network and to supervise the calibration process and the identification of anomalies. The management of the network requires the deployment and frequent relocation of devices close to the AQM stations or at specific points of interest. Data coming in real time from the sensors need to be constantly monitored by experts in order to check that the sensors are functioning normally.
   The dashboard integrates a large amount of heterogeneous data, both geo-spatial and time series data. The position of each sensor is visualized in an interactive map. The measurements of the sensors are plotted in different line charts with two main types of visualization: the same air pollutant measured by all the sensors in the same plot, and all the air pollutants measured by the same sensor in the same plot. Besides, anomalous data are highlighted in other plots. The visualization of such a large number of plots is sped up by Python scripts which generate the plots asynchronously and independently of SenseBoard.
   The dashboard is accessible anywhere and anytime to allow constant monitoring of the network. Besides, it can be generalized to visualize other kinds of geo-spatial and time series data. Indeed, the dashboard is not affected by the type of sensors employed in the network (in our case, we integrate two different types of sensors) and can be easily adapted to monitor pollutants beyond the ones described in our use case. The flexibility and scalability of SenseBoard allow it to monitor networks with a variable number of sensors in cities of different sizes. In addition, in our use case we manage a dynamic sensor network, since the sensors are moved frequently. This is an additional challenge, but the dashboard also works with static sensor networks. SenseBoard can be adapted to query a different data platform, which can be a PostgreSQL database or a data model of a different type. Queries and plots can be easily modified to visualize data in another way or to show additional data that are not included in our use case.
   SenseBoard has been developed according to the technical requirements provided by the environmental experts. Thus, it is not comparable with the dashboards developed for citizens and public administrations. Indeed, the scope of these dashboards is not the monitoring of the sensor network, but the provision of pollution levels to raise people's awareness of the situation in their city. As future work, we will compare SenseBoard with the technical tools provided by the air quality sensor suppliers. In addition, we will integrate an additional view to allow the creation of custom plots, starting from the selection of one or more sensors, pollutants, and AQM stations, and of the time interval. This will allow for further data comparison.

ACKNOWLEDGMENTS
Research reported in this paper was partially supported by the TRAFAIR project 2017-EU-IA-0167, co-financed by the Connecting Europe Facility of the European Union. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the EU Commission. The authors would like to thank the City of Modena, which contributes to the deployment of the LoRa network, and the LARMA research group for providing requirements and feedback on SenseBoard. Moreover, special thanks go to ARPAE, which shared the real-time air quality observations used for the calibration of the low-cost air quality sensors.

REFERENCES
[1] European Environment Agency. 2020. Air quality in Europe — 2020 report. Issue 9. https://doi.org/10.2800/786656 Available at https://www.eea.europa.eu//publications/air-quality-in-europe-2020-report.
[2] United Nations General Assembly. 2015. Transforming our world: The 2030 Agenda for Sustainable Development. Available at http://www.un.org/ga/search/view_doc.asp?symbol=A/RES/70/1&Lang=E.
[3] Chiara Bachechi, Federica Rollo, Federico Desimoni, and Laura Po. 2020. Using Real Sensors Data to Calibrate a Traffic Model for the City of Modena. In Intelligent Human Systems Integration 2020, Tareq Ahram, Waldemar Karwowski, Alberto Vergnano, Francesco Leali, and Redha Taiar (Eds.). Springer International Publishing, Cham, 468–473.
[4] Titus Balan, Catalin Dumitru, Gabriela Dudnik, Enrico Alessi, Suzanne Lesecq, Marc Correvon, Fabio Passaniti, and Antonella Licciardello. 2020. Smart Multi-Sensor Platform for Analytics and Social Decision Support in Agriculture. Sensors 20, 15 (2020), 4127. https://doi.org/10.3390/s20154127
[5] A. Bigi, G. Veratti, S. Fabbi, L. Po, and G. Ghermandi. 2019. Forecast of the impact by local emissions at an urban micro scale by the combination of Lagrangian modelling and low cost sensing technology: The TRAFAIR project. 19th International Conference on Harmonisation within Atmospheric Dispersion Modelling for Regulatory Purposes, Harmo 2019 (2019). https://www.scopus.com/inward/record.uri?eid=2-s2.0-85084160462&partnerID=40&md5=3b37303a7af769206777d87ab30a2541
[6] Mehmet Ali Ertürk, Muhammed Ali Aydin, M. Talha Buyukakkaslar, and Hayrettin Evirgen. 2019. A Survey on LoRaWAN Architecture, Protocol and Technologies. Future Internet 11, 10 (2019), 216. https://doi.org/10.3390/fi11100216
[7] N. Kumar, D. Acharya, and D. Lohani. 2021. An IoT-Based Vehicle Accident Detection and Classification System Using Sensor Fusion. IEEE Internet of Things Journal 8, 2 (2021), 869–880. https://doi.org/10.1109/JIOT.2020.3008896
[8] Xiangtao Liu, Tianle Zhang, Ning Hu, Peng Zhang, and Yu Zhang. 2020. The method of Internet of Things access and network communication based on MQTT. Computer Communications 153 (2020), 169–176. https://doi.org/10.1016/j.comcom.2020.01.044
[9] Luis F. Luque-Vega, David A. Michel-Torres, Emmanuel López-Neri, Miriam A. Carlos-Mancilla, and Luis Enrique González Jiménez. 2020. IoT Smart Parking System Based on the Visual-Aided Smart Vehicle Presence Sensor: SPIN-V. Sensors 20, 5 (2020), 1476. https://doi.org/10.3390/s20051476
[10] L. Po, F. Rollo, C. Bachechi, and A. Corni. 2019. From Sensors Data to Urban Traffic Flow Analysis. In 2019 IEEE International Smart Cities Conference (ISC2). IEEE, Casablanca, Morocco, 478–485. https://doi.org/10.1109/ISC246665.2019.9071639
[11] L. Po, F. Rollo, J. R. R. Viqueira, R. T. Lado, A. Bigi, J. C. López, M. Paolucci, and P. Nesi. 2019. TRAFAIR: Understanding Traffic Flow to Improve Air Quality. In 2019 IEEE International Smart Cities Conference (ISC2). IEEE, Casablanca, Morocco, 36–43. https://doi.org/10.1109/ISC246665.2019.9071661
[12] N. Shivaraman, S. Saki, Z. Liu, S. Ramanathan, A. Easwaran, and S. Steinhorst. 2020. Real-Time Energy Monitoring in IoT-enabled Mobile Devices. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, Grenoble, France, 991–994. https://doi.org/10.23919/DATE48585.2020.9116577
[13] D. Zhang and S. S. Woo. 2020. Real Time Localized Air Quality Monitoring and Prediction Through Mobile and Fixed IoT Sensing Network. IEEE Access 8 (2020), 89584–89594. https://doi.org/10.1109/ACCESS.2020.2993547

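The majority voting scheme of Section 3.2.4 can likewise be sketched in Python. Only the voting rule and a simple sliding-window IQR detector are shown. FFIDCAD and the correlation-based classifier are not reproduced. The function names, the window size (30 measurements) and the IQR multiplier (k = 1.5) are hypothetical choices, not the values used in SenseBoard.

```python
import statistics
from typing import Callable, Sequence

# A detector receives the recent history of one channel and the new value,
# and returns True when it considers the new value anomalous.
Detector = Callable[[Sequence[float], float], bool]


def sliding_window_iqr(window: int = 30, k: float = 1.5) -> Detector:
    """Illustrative sliding-window IQR detector: the new value is anomalous if it
    falls outside [Q1 - k*IQR, Q3 + k*IQR] of the last `window` measurements.
    The window size and k are assumed values, not those used in TRAFAIR."""
    def detect(history: Sequence[float], value: float) -> bool:
        recent = list(history)[-window:]
        if len(recent) < 4:
            return False  # too few points to estimate the quartiles
        q1, _, q3 = statistics.quantiles(recent, n=4)
        iqr = q3 - q1
        return value < q1 - k * iqr or value > q3 + k * iqr
    return detect


def majority_vote(detectors: Sequence[Detector],
                  history: Sequence[float], value: float) -> bool:
    """Flag the value as anomalous when more than half of the detectors agree."""
    votes = sum(1 for detect in detectors if detect(history, value))
    return votes > len(detectors) / 2
```

In the setting described in the paper, the three detectors would be the sliding window, FFIDCAD and the correlation-based classifier. The last two would be plugged in as additional callables, and the vote would run separately for each channel of each pollutant and for each device.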
</pre>