=Paper=
{{Paper
|id=Vol-3762/468
|storemode=property
|title=Developing a Decision Support System with a Georeferenced Smart City Security Index (SCSI): A Case Study of Messina
|pdfUrl=https://ceur-ws.org/Vol-3762/468.pdf
|volume=Vol-3762
|authors=Giuseppe Accardo,Roberta Marino,Valentina Esposito
|dblpUrl=https://dblp.org/rec/conf/ital-ia/AccardoME24
}}
==Developing a Decision Support System with a Georeferenced Smart City Security Index (SCSI): A Case Study of Messina==
Giuseppe Accardo1,*,†, Roberta Marino1,*,† and Valentina Esposito1,*,†
1 Data Jam srl, Centro Direzionale Isola F8, Via F. Lauria, Naples, 80143, Italy
Abstract
With the rapid growth of urban population, cities are facing increasing challenges in terms of
mobility, sustainability, and living conditions. Smart cities leverage advanced technologies to
improve urban efficiency and citizens' quality of life.
This work aims to empower the Public Administration (PA) of Messina, a medium-sized Italian
city, with a georeferenced Smart City Security Index (SCSI) to monitor urban security and inform
decision-making processes.
To achieve this, we trained a Random Forest Regressor using open data alongside
territory-specific key performance indicators (KPIs) and insecurity indicators. The model assigns a
security score from 0 to 100 to each city area, achieving a Root Mean Squared Error (RMSE) of
5.6 on the test set.
Furthermore, integrating the model with a Decision Support System (DSS) allows PA members
to assess changes in the SCSI in response to adjustments made to the input factors, supporting
decision-making.
Keywords
smart city, open data, decision support system
1. Introduction
This work aims to leverage Artificial Intelligence (AI) to develop a specific smart city index
for monitoring urban security in Messina, ultimately contributing to a smarter city.
The concept of a "smart city" encompasses the integration of technology and urban planning to
enhance a city's sustainability, efficiency, and innovation. Several Smart City Indices (SCIs)
have been developed in the literature to assess and quantify these aspects. These indices
typically consider a range of services and projects that contribute to a city's "smartness,"
encompassing areas like public safety (e.g., reduced traffic accidents) and environmental
sustainability.
SCIs function by aggregating multiple variables and indicators into a single score, providing a
statistical summary of a city's overall performance. Monitoring this score over time allows for
evaluation of a city's progress in achieving its "smart city" goals. Table 1 summarizes some of
the most widely recognized SCIs from the literature.
AI, on the other hand, has become a crucial tool for researchers in smart city initiatives.
This, coupled with the open data movement, has spurred further research using these
sophisticated techniques to unlock the potential of data in realizing smart city goals.
There is some evidence of positive impacts in the transportation, sustainability, or security
fields [7][8][9][10][11][12].
Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI,
May 29-30, 2024, Naples, Italy
∗ Corresponding author.
† These authors contributed equally.
gi.accardo@almaviva.it; r.marino@almaviva.it; v.esposito@almaviva.it
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073
This work aims to equip the Public Administration (PA) of Messina with a tool for monitoring
urban security and informing decision-making processes. This tool leverages a georeferenced
and machine learning-based Smart City Security Index (SCSI).
Table 1
Smart City Indices in the literature.
Index                                   KPI
Arcadis Sustainable Cities Index [1]    20 indicators
Innovation Cities Index [2]             162 indicators
ISO 37120 [3]                           100 indicators
ITU FG-SSC [4]                          88 indicators
Networked Society City Index [5]        35 indicators
Siemens Green City Index [6]            30 indicators
2. Materials and Methods
This section details the data sources utilized for this study. We describe the steps involved
in constructing the variables that will be employed by the machine learning (ML) model.
Additionally, we present an overview of the exploratory analyses conducted to gain insights
into the characteristics of the dataset.
The city is subdivided into 287 spatial units (tiles), each encompassing an area of 1 km². The
SCSI will be used to assess the security level of each tile over time. It follows that each
feature within the dataset must adhere to a specific structure, consisting of a unique triad:
geometry_id, month, and year. The year and month fields represent the reference time, while the
geometry_id field uniquely identifies a tile.
2.1. Open data
We utilized open data from the city of Messina, which are described below.
Municipal Police measures gather data on accidents involving traffic violations. As an initial
data preprocessing step, we addressed missing geospatial coordinates. We leveraged the
Nominatim open-source API [13] to geocode these locations using the information provided in the
"Luogo Incidente" (incident location) text column. Prior to geocoding, the text data underwent
cleaning procedures using natural language processing (NLP) techniques. This process
successfully assigned geographic coordinates to 84% of the previously unknown locations. Next,
we extracted the variables of interest by aggregating the data by geometry_id, year and month,
based on the articles of the Italian traffic regulations that were violated.
This resulted in the following features:
• “prov_precedenza” (precedence), obtained as the sum of incidents with violations of
  Articles 145 and 150.
• “prov_velocita” (speed), which considers only violations of Article 141.
• “prov_posizione” (position), obtained as the sum of violations of Articles 154, 149, 143,
  148 and 144.
• “prov_documenti” (documents), obtained as the sum of violations of Articles 80, 193, 116,
  180, 126, 94 and 93.
• “prov_sosta” (stop), derived as the sum of violations of Articles 158 and 157.
• “prov_segnaletica” (signals), derived as the sum of incidents with violations of Articles
  40, 41 and 146.
As with the Municipal Police measures data, we addressed missing geospatial coordinates within
the Lighting Points data. We employed the Nominatim open-source API for geocoding, using the
information provided in the "Ubicazione toponomastica" (toponomastic location) text column. As
with the previous data source, text cleaning procedures based on NLP techniques were necessary
prior to geocoding. This process successfully assigned geographic coordinates to 78% of the
locations where coordinates were previously missing. Next, the feature of interest, namely the
number of public lighting poles present in a given tile and time period (“n_pali_luce”), was
calculated by summing the poles whose geospatial coordinates fall within the analyzed tile.
Urban Video surveillance details the closed-circuit television (CCTV) system operating within
the Municipality. The data concern only administration-owned cameras, all of which are
georeferenced, and have no missing values. Here, the variable of interest is the number of
cameras present in a given tile and time period (“n_telecamere”). We
obtained this value by summing the CCTVs that fall within the analyzed tile, based on their
geospatial coordinates.
2.2. Digital exhaust data
For the construction of the features, in addition to the open data, we derived the following
geolocated indicators that can characterize tiles in the city of Messina.
The “sentiment” index is a measure of sentiment calculated on online content from the analysis
period within the selected tile. It ranges from 0 to 100.
The “footfall” score is an absolute, unbounded index that measures the foot traffic and
popularity of a tile. This indicator considers various factors, such as the number of
geolocated reviews, content on social media, and aggregated and anonymized data originating
from mobile devices.
The remaining features, "degrado" (degradation), "incendio" (arson), "incidente" (accident),
and "crimini" (crimes), count the number of events linked to each of these categories per
tile, year, and month. We collected this information by web-scraping open and
licensed/authorized closed sources such as websites, blogs, social media and Police.
2.3. Data Preparation
After integrating the data described in the previous sections into a single table, we obtained
a dataset with 12628 records, each representing a unique triad of geometry_id, month, and year.
The dataset refers to the time frame January 2019-August 2022, extremes included.
We then proceeded to analyze the content of this dataset, focusing initially on the target
variable for the machine learning model, namely the "Security_Target".
This variable is a weighted average of a qualitative and a quantitative index, representing
the security level of each tile. The qualitative index considers the sentiment of online
reviews related to security falling within each tile, while the quantitative index reflects
the number of crimes committed. The qualitative index is weighted by the number of reviews in
each tile, normalized between 0 and 1, while the quantitative index has a constant weight of
1. Values of the target variable range from 0 (lowest security) to 100 (highest security).
Figure 1 illustrates that, for a specific month and year, the target variable often takes the
value of 100, which corresponds to the highest security level. Furthermore, as shown in
Figure 2, the distribution of the target variable, considering the entire dataset, exhibits a
significant imbalance, with the value 100 being the most frequent by a considerable margin.
To further explore the distribution of the target variable, we visualized it after excluding
tiles with the highest security level (value 100). As shown in Figure 3, the remaining values
exhibited a wider range, suggesting a more informative distribution for analysis.
Nevertheless, it was necessary to consider how to correct the imbalance in the values assumed
by the target.
To understand the cause of this imbalance, we examined the features associated with tiles
having the highest “Security_Target” (value 100). Interestingly, we discovered that 7812
records possessed identical features. In all these cases, the feature values were either 0
(indicating no events such as arson) or NaN (meaning data on factors like footfall and
sentiment was unavailable). Due to these missing or non-informative features, we opted to
remove these duplicate rows.
We obtained a dataset with 4816 records, 3398 of which had a target of 100.
Following the initial data exploration, we analyzed the prevalence of missing values across
all features (percentages shown in Table 2). To address this issue, we excluded observations
where both sentiment and footfall data were missing. This exclusion step resulted in a dataset
of 4654 records.
Subsequently, the data was split into training and test sets. The training set
comprised 3257 records, while the test set contained 1397 records.
Table 2
Percentage of missing values
Feature             Percentage of missing values
prov_precedenza     0
prov_velocita       0
prov_posizione      0
prov_documenti      0
prov_sosta          0
prov_segnaletica    0
n_telecamere        0
n_pali_luce         0
sentiment           3.36
footfall            3.36
degrado             0
incendio            0
incidente           0
crimini             0
Figure 1: “Security_Target” distribution in Messina. This figure depicts the spatial
distribution of the target. Color intensity is used to represent the “Security_Target” value,
with light yellow indicating areas with the highest security level and dark red indicating
areas with the lowest security level.
3. Results
This section details the ML model which was selected to compute the SCSI. This is a random
forest regressor from the scikit-learn library, whose hyperparameters are indicated in
Table 3. We assessed its goodness of fit by analyzing the performance metrics in Table 4, the
residuals on the test set in Table 5, and the distribution of observed and predicted values in
Figure 4.
Figure 2: “Security_Target” Histogram.
Figure 3: “Security_Target” with values less than 100. Histogram.
Having established the validity of the chosen model, we proceeded to analyze the impact of
each feature on the target variable. Shapley Additive exPlanations (SHAP) values provide a
useful graphical representation of these feature importances [14]. A beeswarm plot effectively
visualizes the distribution of SHAP values, highlighting the features that exert the strongest
influence on the model's predictions. Our analysis in Figure 5 reveals that the "degrado"
feature has the greatest impact. High values of "degrado" (represented by red in the beeswarm
plot) are associated with a lower SCSI, and vice versa. Similarly, the "n_pali_luce" feature
is the second most important, with lower values corresponding to a reduced SCSI.
This analysis of feature importance provides key insights into the behavior of the
decision-support system (DSS). Following model development, we equipped the Public
Administration of Messina with a DSS that enables them to simulate how the SCSI changes when
features within selected city tiles are modified (see Figure 6 and Figure 7). In essence,
these features function as controllable parameters that can be adjusted to improve the
security level in specific areas.
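The what-if mechanism behind such a DSS can be sketched as follows: the index is predicted for a tile, some controllable features are overridden, and the index is predicted again. This is only an illustrative sketch under stated assumptions. The function `simulate_change`, the stand-in predictor `toy_predict` and its coefficients are hypothetical and are not taken from the system delivered to the PA; in practice the `predict` callable would wrap the trained regressor.

```python
from typing import Callable, Dict

FeatureRow = Dict[str, float]


def simulate_change(
    predict: Callable[[FeatureRow], float],
    tile: FeatureRow,
    changes: FeatureRow,
) -> Dict[str, float]:
    """Compare the predicted index before and after overriding some features."""
    unknown = set(changes) - set(tile)
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    baseline = predict(tile)
    scenario = predict({**tile, **changes})  # the original tile is left unmodified
    return {"baseline": baseline, "scenario": scenario, "delta": scenario - baseline}


def toy_predict(row: FeatureRow) -> float:
    """Hypothetical stand-in for the trained model, clipped to the 0-100 range.

    The signs follow the SHAP discussion (more "degrado" lowers the index,
    fewer lighting poles lower it); the coefficients are invented.
    """
    score = 100.0 - 4.0 * row["degrado"] + 0.5 * row["n_pali_luce"]
    return max(0.0, min(100.0, score))


# Example: what happens to one tile if ten lighting poles are added?
tile = {"degrado": 5.0, "n_pali_luce": 10.0, "n_telecamere": 2.0}
result = simulate_change(toy_predict, tile, {"n_pali_luce": 20.0})
print(result)
```

Passing the predictor as a callable keeps the simulation logic independent of the specific model, so the same routine could be reused if the regressor is retrained.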
Table 3
Hyperparameters for the Random Forest Regressor
Hyperparameter       Value
n_estimators         100
oob_score            True
criterion            'squared_error'
max_depth            None
random_state         0
max_features         None
min_samples_split    6
Table 4
Performance metrics for the Random Forest Regressor, namely MAE (Mean Absolute Error), MSE
(Mean Squared Error), and RMSE (Root Mean Squared Error). The Validation errors represent the
mean of errors calculated during the 5-Fold cross-validation process.
        Train    Validation (mean)    Test
MAE     1.01     2.08                 1.78
MSE     9.76     40.11                31.09
RMSE    3.12     6.28                 5.58
Table 5
Distribution of observed, predicted values and residuals considering data in the test set.
Residuals are the difference between observed values and predicted values.
Value    observed    predicted    residual
count    1397        1397         1397
min      0           0            -40.99
25%      97.04       97.39        0
50%      100         99.95        0
75%      100         100          0.39
max      100         100          80.04
Building on a similar approach, we developed a georeferenced green index (GI) for the PA of
Messina (see equation (1)). This index assigns a score between 0 and 100, quantifying the
overall quality and quantity of urban green space for each spatial unit. Similar to the SCSI,
the green index is designed for integration with a DSS (see Figure 8 and Figure 9). However,
unlike the SCSI, it does not employ machine learning techniques.
The GI is calculated as follows:
$$
GI(tile) = \frac{w_1 \cdot UG + w_2 \cdot \left( \dfrac{HGA + TCA \cdot \alpha}{ELA} \cdot 100 \right)}{w_1 + w_2} \tag{1}
$$
Explanation of variables:
1. UG (Urban green perception index): This index reflects the perceived quality and user
   experience of urban green spaces, derived from analyzing online reviews.
2. HGA (Horizontal green area, m²): Represents the area of gardens, parks, and forests within
   the spatial unit.
3. TCA (Tree canopy area, m²): Calculated as the sum of canopy area for all trees in the
   spatial unit.
4. ELA (Emerged land area, m²): Represents the total land area excluding water bodies within
   the spatial unit.
5. α (Weight relative to the vegetative state of the canopy area): Derived from Visual Tree
   Assessment (VTA) data. It is calculated as the weighted sum of the areas of tree crowns
   within a tile, adjusted for their vegetative state, divided by the total area of all tree
   crowns in the tile.
6. w1 and w2: Weights assigned such that the quantitative dimension (HGA and TCA) contributes
   twice as much as the qualitative dimension (UG) to the overall GI score.
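Equation (1) and the definition of α can be evaluated directly, as in the minimal sketch below. Several details are assumptions made for illustration: the weights are set to w1 = 1 and w2 = 2 (one choice that makes the quantitative term count twice as much as UG), the per-tree vegetative-state weights are hypothetical values in [0, 1], α is returned as 0 when a tile has no recorded crowns, and the function names and input values are invented.

```python
from typing import Iterable, Tuple


def canopy_weight(crowns: Iterable[Tuple[float, float]]) -> float:
    """Compute alpha: state-weighted crown area divided by total crown area.

    Each item is (crown_area_m2, state_weight); the state weights are hypothetical.
    """
    crowns = list(crowns)
    total_area = sum(area for area, _ in crowns)
    if total_area == 0:
        return 0.0  # assumption: no crowns recorded in the tile
    return sum(area * weight for area, weight in crowns) / total_area


def green_index(
    ug: float,
    hga: float,
    tca: float,
    alpha: float,
    ela: float,
    w1: float = 1.0,
    w2: float = 2.0,
) -> float:
    """Evaluate equation (1) for one tile (areas in square metres)."""
    if ela <= 0:
        raise ValueError("ELA must be positive")
    quantitative = (hga + tca * alpha) / ela * 100.0  # green share of emerged land, in %
    return (w1 * ug + w2 * quantitative) / (w1 + w2)


# Illustrative tile: 1 km² of emerged land, 0.1 km² of parks, 0.05 km² of canopy.
alpha = canopy_weight([(30.0, 1.0), (20.0, 0.5)])
gi = green_index(ug=60.0, hga=100_000.0, tca=50_000.0, alpha=alpha, ela=1_000_000.0)
print(round(alpha, 3), round(gi, 2))
```

With these inputs α is 0.8, the quantitative term is 14, and the index is (60 + 2 · 14) / 3, which shows how a tile with a good perception score but little green area still receives a low GI.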
Overall, this project demonstrates the value of data-
driven approaches in urban planning. The SCSI and
DSS empower the PA to make informed decisions
regarding security, and the future integration of
machine learning into the Green Index holds further
promise for comprehensive urban management.
Figure 4: Distribution of observed and predicted values in the test set.
Figure 5: The "beeswarm" graph for the Random Forest regression related to the Smart City
Security Index.
Figure 6: Example of an implementation of the SCSI in the Municipality of Messina. Empty tiles
indicate areas with missing data for footfall and sentiment and the remaining features equal
to 0.
Figure 7: Example of DSS application (security)
Figure 8: Example of the GI implemented in the Municipality of Messina. Empty tiles represent
areas with missing data for the municipal tree inventory. The number of tiles displayed will
increase as the census continues.
Figure 9: Example of DSS application (urban green condition).
References
[1] Arcadis. (2022, June 21). The Arcadis Sustainable Cities Index 2022. [Member Spotlight].
Retrieved from
https://www.arcadis.com/en/knowledge-hub/perspectives/global/sustainable-cities-index.
[2] 2thinknow. (2023). Innovation Cities™ Index. Retrieved from
https://innovation-cities.com/worlds-most-innovative-cities-2022-2023-city-rankings/26453/
[3] International Organization for Standardization. (2018). ISO 37120:2018 Sustainable
development of communities - Indicators for city services and quality of life.
[4] International Telecommunication Union (ITU). (n.d.). The Telecommunication Standardization
Sector (ITU-T). Retrieved from
https://www.itu.int/en/ITU-T/Pages/default.aspx
[5] Ericsson. (n.d.). Networked Society City Index. Retrieved from
https://www.ericsson.com/en/reports-and-papers/networked-society-insights
[6] Siemens AG. (n.d.). Siemens Green City Index. Retrieved from
https://assets.new.siemens.com/siemens/assets/api/uuid:cf26889b-3254-4dcb-bc50-fef7e99cb3c7/gci-report-summary.pdf
[7] Agarwal, P. K., Gurjar, J., Agarwal, A. K., & Birla, R.
(2015). Application of artificial intelligence for
development of intelligent transport system in
smart cities. Journal of Traffic and Transportation
Engineering, 1(1), 20-30.
[8] Bharadiya, J. (2023). Artificial intelligence in
transportation systems a critical
review. American Journal of Computing and
Engineering, 6(1), 34-45.
[9] De Las Heras, A., Luque-Sendra, A., & Zamora-
Polo, F. (2020). Machine learning technologies
for sustainability in smart cities in the post-covid
era. Sustainability, 12(22), 9320.
[10] Hassan, S. I., & Agarwal, P. (2020). Analytical
approach to sustainable smart city using IoT and
machine learning. In Big Data, IoT, and Machine
Learning (pp. 277-294). CRC Press.
[11] Lourenço, V., Mann, P., Guimaraes, A., Paes, A., &
de Oliveira, D. (2018, July). Towards safer
(smart) cities: Discovering urban crime patterns
using logic-based relational machine learning.
In 2018 International Joint Conference on Neural
Networks (IJCNN) (pp. 1-8). IEEE.
[12] Butt, U. M., Letchmunan, S., Hassan, F. H., Ali, M.,
Baqir, A., Koh, T. W., & Sherazi, H. H. R. (2021).
Spatio-temporal crime predictions by leveraging
artificial intelligence for citizens security in smart
cities. IEEE Access, 9, 47516-47529.
[13] OpenStreetMap contributors, "Nominatim,"
OpenStreetMap wiki, 2023,
https://nominatim.openstreetmap.org/.
[14] Scott M. Lundberg and Su-In Lee. 2017. A unified
approach to interpreting model predictions. In
Proceedings of the 31st International Conference
on Neural Information Processing Systems
(NIPS'17). Curran Associates Inc., Red Hook, NY,
USA, 4768–4777.