=Paper= {{Paper |id=Vol-3762/468 |storemode=property |title=Developing a Decision Support System with a Georeferenced Smart City Security Index (SCSI): A Case Study of Messina |pdfUrl=https://ceur-ws.org/Vol-3762/468.pdf |volume=Vol-3762 |authors=Giuseppe Accardo,Roberta Marino,Valentina Esposito |dblpUrl=https://dblp.org/rec/conf/ital-ia/AccardoME24 }} ==Developing a Decision Support System with a Georeferenced Smart City Security Index (SCSI): A Case Study of Messina== https://ceur-ws.org/Vol-3762/468.pdf
                                Developing a Decision Support System with a
                                Georeferenced Smart City Security Index (SCSI): A Case
                                Study of Messina
                                Giuseppe Accardo1,*,†, Roberta Marino1,*,† and Valentina Esposito1,*,†

                                1 Data Jam srl, Centro Direzionale Isola F8, Via F. Lauria, Naples, 80143, Italy




                                                    Abstract
                                                    With the rapid growth of urban population, cities are facing increasing challenges in terms of
                                                    mobility, sustainability, and living conditions. Smart cities leverage advanced technologies to
                                                    improve urban efficiency and citizens' quality of life.
                                                    This work aims to empower the Public Administration (PA) of Messina, a medium-sized Italian
                                                    city, with a georeferenced Smart City Security Index (SCSI) to monitor urban security and inform
                                                    decision-making processes.
                                                    To achieve this, we trained a Random Forest Regressor using open data alongside territory
                                                    specific key performance indicators (KPIs) and insecurity indicators. The model assigns a
                                                    security score from 0 to 100 to each city area, achieving a Root Mean Squared Error (RMSE) of
                                                    5.6 on the test set.
                                                    Furthermore, integrating the model with a Decision Support System (DSS) allows PA members
                                                    to assess changes in the SCSI in response to adjustments made to the input factors, supporting
                                                    decision-making.

                                                    Keywords
                                                    smart city, open data, decision support system



1. Introduction

This work aims to leverage Artificial Intelligence (AI) to develop a specific
smart city index for monitoring urban security in Messina, ultimately
contributing to a smarter city.
The concept of a "smart city" encompasses the integration of technology and
urban planning to enhance a city's sustainability, efficiency, and innovation.
Several Smart City Indices (SCIs) have been developed in the literature to
assess and quantify these aspects. These indices typically consider a range of
services and projects that contribute to a city's "smartness," encompassing
areas like public safety (e.g., reduced traffic accidents) and environmental
sustainability.
SCIs function by aggregating multiple variables and indicators into a single
score, providing a statistical summary of a city's overall performance.
Monitoring this score over time allows for evaluation of a city's progress in
achieving its "smart city" goals.
Table 1 summarizes some of the most widely recognized SCIs from the literature.
AI, on the other hand, has become a crucial tool for researchers in smart city
initiatives. This, coupled with the open data movement, has spurred further
research using these sophisticated techniques to unlock the potential of data
in realizing smart city goals.
There is some evidence of positive impacts in the transportation,
sustainability, or security fields [7][8][9][10][11][12].


Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
∗ Corresponding author.
† These authors contributed equally.
gi.accardo@almaviva.it; r.marino@almaviva.it; v.esposito@almaviva.it
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




This work aims to equip the Public Administration (PA) of Messina with a tool
for monitoring urban security and informing decision-making processes. This
tool leverages a georeferenced and machine learning-based Smart City Security
Index (SCSI).

Table 1
Smart city indices in the literature.

Index                                   KPI
Arcadis Sustainable Cities Index [1]    20 indicators
Innovation Cities Index [2]             162 indicators
ISO 37120 [3]                           100 indicators
ITU FG-SSC [4]                          88 indicators
Networked Society City Index [5]        35 indicators
Siemens Green City Index [6]            30 indicators

2. Materials and Methods

This section details the data sources utilized for this study. We describe the
steps involved in constructing the variables that will be employed by the
machine learning (ML) model. Additionally, we present an overview of the
exploratory analyses conducted to gain insights into the characteristics of the
dataset.
The city is subdivided into 287 spatial units (tiles), each encompassing an
area of 1 km². The SCSI will be used to assess the security level of each tile
over time. It follows that each feature within the dataset must be indexed by a
unique triad: geometry_id, month, and year. The year and month fields represent
the reference time, while the geometry_id field uniquely identifies a tile.

2.1. Open data

We utilized open data from the city of Messina, which are described below.
Municipal Police measures gather data on accidents involving traffic
violations. As an initial data preprocessing step, we addressed missing
geospatial coordinates. We leveraged the Nominatim open-source API [13] to
geocode these locations using the information provided in the "Luogo Incidente"
(incident location) text column. Prior to geocoding, the text data underwent
cleaning procedures using natural language processing (NLP) techniques. This
process successfully assigned geographic coordinates to 84% of the previously
unknown locations. Next, we extracted the variables of interest by aggregating
the data by geometry_id, year and month, grouping incidents by the article of
the traffic regulations that was violated, according to Italian traffic
regulations.
This resulted in the following features:

    •    “prov_precedenza” (precedence): the sum of incidents with violations
         of Articles 145 and 150.
    •    “prov_velocita” (speed): the sum of incidents with violations of
         Article 141 only.
    •    “prov_posizione” (position): the sum of incidents with violations of
         Articles 154, 149, 143, 148 and 144.
    •    “prov_documenti” (documents): the sum of incidents with violations of
         Articles 80, 193, 116, 180, 126, 94 and 93.
    •    “prov_sosta” (stopping and parking): the sum of incidents with
         violations of Articles 158 and 157.
    •    “prov_segnaletica” (signals): the sum of incidents with violations of
         Articles 40, 41 and 146.

As with the Municipal Police measures data, we addressed missing geospatial
coordinates within the Lighting Points data. We employed the Nominatim
open-source API for geocoding, using the information provided in the
"Ubicazione toponomastica" (toponomastic location) text column. As with the
previous data source, text cleaning procedures leveraging NLP techniques were
necessary prior to geocoding. This process successfully assigned geographic
coordinates to 78% of the locations where coordinates were previously missing.
Next, the feature of interest, namely the number of public lighting poles
present in a given tile and time period (“n_pali_luce”), was calculated by
counting the poles whose geospatial coordinates fall within the analyzed tile.
Urban Video surveillance details the closed-circuit television (CCTV) system
operating within the Municipality. The data concern only administration-owned
cameras, all of which are georeferenced, and have no missing values. Here, the
variable of interest is the number of cameras present in a given tile and time
period (“n_telecamere”). We
obtained this value by counting the CCTV cameras that fall within the analyzed
tile, based on their geospatial coordinates.

2.2. Digital exhaust data

For the construction of the features, in addition to the open data, we derived
the following geolocated indicators that can characterize tiles in the city of
Messina.
The “sentiment” index is a measure of sentiment calculated on online content
from the analysis period within the selected tile. It ranges from 0 to 100.
The “footfall” score is an absolute, unbounded index that measures the foot
traffic and popularity of a tile. This indicator considers various factors,
such as the number of geolocated reviews, content on social media, and
aggregated and anonymized data originating from mobile devices.
The remaining features, "degrado" (degradation), "incendio" (arson),
"incidente" (accident), and "crimini" (crimes), count the number of events
linked to each of these categories per tile, year, and month. We collected this
information by web-scraping open sources and licensed/authorized closed
sources, such as websites, blogs, social media, and police sources.

2.3. Data Preparation

After integrating the data described in the previous sections into a single
table, we obtained a dataset with 12628 records, each representing a unique
triad of geometry_id, month, and year.
The dataset covers the time frame from January 2019 to August 2022, inclusive.
We then proceeded to analyze the content of this dataset, focusing initially on
the target variable for the machine learning model, namely the
"Security_Target".
This variable is a weighted average of a qualitative and a quantitative index,
representing the security level of each tile. The qualitative index considers
the sentiment of online reviews related to security falling within each tile,
while the quantitative index reflects the number of crimes committed. The
qualitative index is weighted by the number of reviews in each tile, normalized
between 0 and 1, while the quantitative index has a constant weight of 1.
Values of the target variable range from 0 (lowest security) to 100 (highest
security).
Figure 1 illustrates that, for a specific month and year, the target variable
often takes the value of 100, which corresponds to the highest security level.
Furthermore, as shown in Figure 2, the distribution of the target variable,
considering the entire dataset, exhibits a significant imbalance, with the
value 100 being the most frequent by a considerable margin.
To further explore the distribution of the target variable, we visualized it
after excluding tiles with the highest security level (value 100). As shown in
Figure 3, the remaining values exhibited a wider range, suggesting a more
informative distribution for analysis.
Nevertheless, it was necessary to consider how to correct the imbalance in the
values assumed by the target.
To understand the cause of this imbalance, we examined the features associated
with tiles having the highest “Security_Target” (value 100). We discovered that
7812 records possessed identical features. In all these cases, the feature
values were either 0 (indicating no events, such as arson) or NaN (meaning that
data on factors like footfall and sentiment were unavailable). Because these
features were missing or non-informative, we opted to remove these duplicate
rows.
This left a dataset of 4816 records, 3398 of which had a target value of 100.
Following the initial data exploration, we analyzed the prevalence of missing
values across all features (percentages shown in Table 2). To address this
issue, we excluded observations where both sentiment and footfall data were
missing. This exclusion step resulted in a dataset of 4654 records.
Subsequently, the data was split into training and test sets. The training set
comprised 3257 records, while the test set contained 1397 records.

Table 2
Percentage of missing values

Feature             Missing values (%)
prov_precedenza     0
prov_velocita       0
prov_posizione      0
prov_documenti      0
prov_sosta          0
prov_segnaletica    0
n_telecamere        0
n_pali_luce         0
sentiment           3.36
footfall            3.36
degrado             0
incendio            0
incidente           0
crimini             0

Figure 1: “Security_Target” distribution in Messina. This figure depicts the
spatial distribution of the target. Color intensity is used to represent the
“Security_Target” value, with light yellow indicating areas with the highest
security level and dark red indicating areas with the lowest security level.

Figure 2: “Security_Target” histogram.

Figure 3: Histogram of “Security_Target” values less than 100.

3. Results

This section details the ML model which was selected to compute the SCSI. This
is a random forest regressor from the scikit-learn library, whose
hyperparameters are indicated in Table 3. We assessed its goodness of fit by
analyzing the performance metrics in Table 4, the residuals on the test set in
Table 5, and the distribution of observed and predicted values in Figure 4.
Having established the validity of the chosen model, we proceeded to analyze
the impact of each feature on the target variable. Shapley Additive
exPlanations (SHAP) values provide a useful graphical representation of these
feature importances [14]. A beeswarm plot effectively visualizes the
distribution of SHAP values, highlighting the features that exert the strongest
influence on the model's predictions. Our analysis in Figure 5 reveals that the
"degrado" feature has the greatest impact. High values of "degrado"
(represented by red in the beeswarm plot) are associated with a lower SCSI, and
vice versa. Similarly, the "n_pali_luce" feature is the second most important,
with lower values corresponding to a reduced SCSI.
This analysis of feature importance provides key insights into the behavior of
the decision support system (DSS). Following model development, we equipped the
Public Administration of Messina with a DSS that enables them to simulate how
the SCSI changes when features within selected city tiles are modified (see
Figure 6 and Figure 7). In essence, these features function as controllable
parameters that can be adjusted to improve the security level in specific
areas.
Building on a similar approach, we developed a georeferenced green index (GI)
for the PA of Messina
(see equation (1)). This index assigns a score between 0 and 100, quantifying
the overall quality and quantity of urban green space for each spatial unit.
Similar to the SCSI, the green index is designed for integration with a DSS
(see Figure 8 and Figure 9). However, unlike the SCSI, it does not employ
machine learning techniques.
The GI is calculated as follows:

$$ GI(tile) = \frac{w_1 \cdot UG + w_2 \cdot \left( \frac{HGA + TCA \cdot \alpha}{ELA} \cdot 100 \right)}{w_1 + w_2} \qquad (1) $$

Explanation of variables:

    1.      UG (Urban green perception index): This index reflects the
            perceived quality and user experience of urban green spaces,
            derived from analyzing online reviews.
    2.      HGA (Horizontal green area, m2): Represents the area of gardens,
            parks, and forests within the spatial unit.
    3.      TCA (Tree canopy area, m2): Calculated as the sum of canopy area
            for all trees in the spatial unit.
    4.      ELA (Emerged land area, m2): Represents the total land area
            excluding water bodies within the spatial unit.
    5.      α (Weight relative to the vegetative state of the canopy area):
            Derived from Visual Tree Assessment (VTA) data. It is calculated as
            the weighted sum of the areas of tree crowns within a tile,
            adjusted for their vegetative state, divided by the total area of
            all tree crowns in the tile.
    6.      w1 and w2: Weights assigned such that the quantitative dimension
            (HGA and TCA) contributes twice as much as the qualitative
            dimension (UG) to the overall GI score.

Overall, this project demonstrates the value of data-driven approaches in urban
planning. The SCSI and DSS empower the PA to make informed decisions regarding
security, and the future integration of machine learning into the Green Index
holds further promise for comprehensive urban management.

Table 3
Hyperparameters for the Random Forest Regressor

Hyperparameter       Value
n_estimators         100
oob_score            True
criterion            'squared_error'
max_depth            None
random_state         0
max_features         None
min_samples_split    6

Table 4
Performance metrics for the Random Forest Regressor, namely MAE (Mean Absolute
Error), MSE (Mean Squared Error), and RMSE (Root Mean Squared Error). The
Validation errors represent the mean of errors calculated during the 5-fold
cross-validation process.

Measure    Train    Validation (mean)    Test
MAE        1.01     2.08                 1.78
MSE        9.76     40.11                31.09
RMSE       3.12     6.28                 5.58

Table 5
Distribution of observed values, predicted values, and residuals in the test
set. Residuals are the difference between observed values and predicted values.

Value    observed    predicted    residual
count    1397        1397         1397
min      0           0            -40.99
25%      97.04       97.39        0
50%      100         99.95        0
75%      100         100          0.39
max      100         100          80.04
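
The hyperparameters listed in Table 3 map directly onto the constructor
arguments of scikit-learn's RandomForestRegressor. The following is a minimal
sketch, assuming scikit-learn 1.0 or later is installed (the 'squared_error'
criterion name was introduced in that release). The names RF_PARAMS and
build_model are illustrative and do not come from the original work.

```python
# Hyperparameters copied from Table 3; anything not listed keeps the
# scikit-learn default (for example bootstrap=True, which oob_score needs).
RF_PARAMS = {
    "n_estimators": 100,
    "oob_score": True,
    "criterion": "squared_error",
    "max_depth": None,
    "random_state": 0,
    "max_features": None,
    "min_samples_split": 6,
}


def build_model():
    """Return an unfitted regressor configured as in Table 3."""
    # Imported lazily so the parameter dictionary can be inspected
    # even where scikit-learn is not installed.
    from sklearn.ensemble import RandomForestRegressor

    return RandomForestRegressor(**RF_PARAMS)
```

Calling build_model().fit(X_train, y_train) would then train the model, where
X_train and y_train are placeholders for the feature matrix and the
"Security_Target" column of the training set.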

Figure 4: Distribution of observed and predicted values in the test set.
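
The three error measures reported in Table 4 can be computed from paired
observed and predicted scores as sketched below in plain Python. The function
name regression_errors and the sample values are illustrative assumptions, not
part of the original work. Because RMSE is the square root of MSE, the Train
and Test columns of Table 4 can be cross-checked directly: sqrt(9.76) ≈ 3.12
and sqrt(31.09) ≈ 5.58. The Validation column need not satisfy this identity
exactly, since it averages per-fold values.

```python
import math


def regression_errors(observed, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residuals follow the convention of Table 5: observed minus predicted.
    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return mae, mse, math.sqrt(mse)


# Toy scores on the 0-100 scale (made up for illustration).
mae, mse, rmse = regression_errors([100, 90, 80], [98, 93, 80])
```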
Figure 5: Beeswarm plot of SHAP values for the Random Forest regressor of the
Smart City Security Index.

Figure 8: Example of the GI implemented in the Municipality of Messina. Empty
tiles represent areas with missing data for the municipal tree inventory. The
number of tiles displayed will increase as the census continues.

Figure 9: Example of DSS application (urban green condition).
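
Equation (1) can be evaluated tile by tile with a few lines of code. The sketch
below is a plain-Python illustration under stated assumptions: the function
name green_index and the example inputs are hypothetical, UG is taken to be on
a 0-100 scale and α to lie between 0 and 1, and the weights default to w1 = 1
and w2 = 2 so that the quantitative term counts twice as much as the
qualitative one, as stated in the explanation of variables. The actual weight
values are not reported in the text.

```python
def green_index(ug, hga, tca, ela, alpha, w1=1.0, w2=2.0):
    """Evaluate equation (1) for a single tile.

    ug    -- urban green perception index (assumed 0-100)
    hga   -- horizontal green area, m2
    tca   -- tree canopy area, m2
    ela   -- emerged land area, m2 (must be positive)
    alpha -- vegetative-state weight of the canopy (assumed 0-1)
    """
    if ela <= 0:
        raise ValueError("emerged land area must be positive")
    # Quantitative term: share of the emerged land that is green, in percent.
    green_share = (hga + tca * alpha) / ela * 100
    return (w1 * ug + w2 * green_share) / (w1 + w2)


# Hypothetical 1 km2 tile: 150,000 m2 of parks, 50,000 m2 of canopy in
# fairly good condition, and a perception score of 60.
score = green_index(ug=60, hga=150_000, tca=50_000, ela=1_000_000, alpha=0.8)
```

In a what-if setting such as the DSS, re-evaluating the function after changing
hga or tca would show how a planned intervention moves the score of a tile.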

Figure 6: Example of an implementation of the SCSI in the Municipality of
Messina. Empty tiles indicate areas with missing data for footfall and
sentiment and the remaining features equal to 0.

Figure 7: Example of DSS application (security).

References
[1] Arcadis. (2022, June 21). The Arcadis Sustainable Cities Index 2022.
     [Member Spotlight]. Retrieved from
     https://www.arcadis.com/en/knowledge-hub/perspectives/global/sustainable-cities-index
[2] 2thinknow. (2023). Innovation Cities™ Index. Retrieved from
     https://innovation-cities.com/worlds-most-innovative-cities-2022-2023-city-rankings/26453/
[3] International Organization for Standardization. (2018). ISO 37120:2018
     Sustainable development of communities - Indicators for city services
     and quality of life.
[4] International Telecommunication Union (ITU). (n.d.). The
     Telecommunication Standardization Sector (ITU-T). Retrieved from
     https://www.itu.int/en/ITU-T/Pages/default.aspx
[5] Ericsson. (n.d.). Networked Society City Index. Retrieved from
     https://www.ericsson.com/en/reports-and-papers/networked-society-insights
[6] Siemens AG. (n.d.). Siemens Green City Index. Retrieved from
     https://assets.new.siemens.com/siemens/assets/api/uuid:cf26889b-3254-4dcb-bc50-fef7e99cb3c7/gci-report-summary.pdf
[7] Agarwal, P. K., Gurjar, J., Agarwal, A. K., & Birla, R.
     (2015). Application of artificial intelligence for
     development of intelligent transport system in
     smart cities. Journal of Traffic and Transportation
     Engineering, 1(1), 20-30.
[8] Bharadiya, J. (2023). Artificial intelligence in
     transportation systems a critical
     review. American Journal of Computing and
     Engineering, 6(1), 34-45.
[9] De Las Heras, A., Luque-Sendra, A., & Zamora-
     Polo, F. (2020). Machine learning technologies
     for sustainability in smart cities in the post-covid
     era. Sustainability, 12(22), 9320.
[10] Hassan, S. I., & Agarwal, P. (2020). Analytical
     approach to sustainable smart city using IoT and
     machine learning. In Big Data, IoT, and Machine
     Learning (pp. 277-294). CRC Press.
[11] Lourenço, V., Mann, P., Guimaraes, A., Paes, A., &
     de Oliveira, D. (2018, July). Towards safer
     (smart) cities: Discovering urban crime patterns
     using logic-based relational machine learning.
     In 2018 International Joint Conference on Neural
     Networks (IJCNN) (pp. 1-8). IEEE.
[12] Butt, U. M., Letchmunan, S., Hassan, F. H., Ali, M.,
     Baqir, A., Koh, T. W., & Sherazi, H. H. R. (2021).
     Spatio-temporal crime predictions by leveraging
     artificial intelligence for citizens security in smart
     cities. IEEE Access, 9, 47516-47529.
[13] OpenStreetMap contributors. (2023). Nominatim.
     OpenStreetMap wiki. Retrieved from
     https://nominatim.openstreetmap.org/
[14] Lundberg, S. M., & Lee, S.-I. (2017). A unified
     approach to interpreting model predictions. In
     Proceedings of the 31st International Conference
     on Neural Information Processing Systems
     (NIPS'17) (pp. 4768-4777). Curran Associates Inc.