<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IEEE Journal of Selected Topics in Signal Processing</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/JSTSP.2012.2237381</article-id>
      <title-group>
        <article-title>A Deep Learning Approach for False Data Injection Attacks Detection in Smart Water Infrastructure</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Giannubilo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommaso Giorgeschi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Carminati</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Zanero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Longari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano</institution>
          ,
          <addr-line>Via Ponzio 34/5, 20133, Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <volume>13914</volume>
      <fpage>38</fpage>
      <lpage>49</lpage>
      <abstract>
        <p>Cyber-Physical Systems (CPSs) represent a sophisticated integration of digital technologies with physical processes and are particularly vital in critical environments such as smart water infrastructures, which require advanced monitoring and control systems to guarantee safe and resilient operations, especially in the context of attacks. This study introduces a novel unsupervised deep learning approach for detecting false data injection (FDI) attacks in smart water infrastructures. The method employs Long Short-Term Memory (LSTM) networks and Autoencoders to learn the legitimate behavior of time-series water level sensor data. We evaluate this approach using the Mincio River water system in Italy as a case study, employing publicly available data augmented with synthetic yet realistic random, replay, and advanced attack scenarios. The experimental results demonstrate the effectiveness of the proposed method in distinguishing anomalies from legitimate data, highlighting its potential for enhancing the security of smart water systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Intrusion Detection Systems</kwd>
        <kwd>Cyber Physical Systems</kwd>
        <kwd>Critical Infrastructure Security</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Cyber-Physical systems (CPS) integrate the cyber domain, comprising networked components and
servers, with the physical domain, consisting of sensors and actuators [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These systems monitor
and control physical processes through feedback loops, where physical processes influence cyber
operations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Widely employed across critical infrastructure such as industrial control, automotive
systems, smart grids, and water treatment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], CPSs form the backbone of
modern society. Given that disrupting their operations may severely affect society, the security
of such systems is paramount. However, the integration of physical and cyber components opens
such systems to cyber-physical attacks. These attacks manipulate control systems governing physical
processes, such as power grids or transportation systems, by exploiting digital interfaces to cause
tangible disruptions. For example, attackers could compromise smart water infrastructure to manipulate
water flow or disrupt operations.
      </p>
      <p>Water and river systems are critical applications of CPSs, particularly in managing resources and
mitigating environmental risks. These systems are especially vulnerable to False Data Injection (FDI)
attacks, which would allow, for instance, an adversary to alter sensor readings at a critical station,
leading to the premature release of excess water downstream. This could result in catastrophic flooding
and significant infrastructure damage.</p>
      <p>To protect such systems, robust anomaly detection methods are necessary. Early solutions relied
on mathematical models, but recent advancements incorporate Machine Learning (ML) and Deep
Learning (DL) techniques. These approaches, including Long Short-Term Memory (LSTM) networks
and Autoencoders (AEs), excel at capturing temporal and spatial correlations in sensor data. By learning
legitimate sensor behavior from historical data, these methods can identify anomalies indicative of
potential attacks.</p>
      <p>Our work focuses on the design of an Intrusion Detection System (IDS) leveraging LSTM autoencoders
to detect anomalies in smart water infrastructure systems. The system reconstructs expected sensor
behavior and calculates anomaly scores to flag deviations. We validate our approach on a use case based
on the Mincio River water infrastructure in Italy. This system monitors and controls water flow from
the Garda Lake to the Po River using strategically located control points and dams. Sensors measure
water height at 15-minute intervals, and a critical station regulates flow into an artificial canal based on
real-time sensor data. We use publicly available data and simulated FDI attacks crafted for this purpose.</p>
      <p>The contributions of this study are threefold:
• The introduction of a novel Intrusion Detection System (IDS) for smart water systems.
• The creation of a new dataset specifically for validating IDS approaches.
• The evaluation of the performance of the proposed algorithm on a real-world case study, the
Mincio River infrastructure.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Primer on Smart Water Infrastructures</title>
      <p>In recent years, research on water management systems has predominantly focused on traditional water
infrastructures, which rely on fewer advanced technologies and smart devices. However, the advent of
smart water infrastructures – a distinct category of cyber-physical systems – is transforming the manner
in which water systems are managed. A smart water infrastructure is defined as a system that integrates
a range of cutting-edge technologies, including sensors, real-time communication capabilities (via
wireless networks, satellite communications, etc.), automated controls, and artificial intelligence. Such
improvements facilitate more efficient monitoring and management of water resources, representing a
significant upgrade over traditional systems.</p>
      <p>The impetus behind the transition to smart water systems can be attributed to the pressing global
challenges of rising water scarcity and the considerable financial burden associated with the provision
of potable water to expanding populations. The deployment of smart water infrastructures offers a
number of advantages over conventional systems. These include more accurate measurement of water
consumption, improved water quality control, advanced flood monitoring, and effective prevention
of water wastage. To illustrate, a typical smart water infrastructure may entail the collection of water
from rivers or seas, followed by its transportation to a water treatment facility where it undergoes
purification. Subsequently, the water is either stored or delivered to consumers via a water distribution
system, with real-time monitoring and optimization facilitated by advanced technologies.</p>
      <p>A fully realized smart water system comprises a multitude of components and devices, each
of which plays a crucial role in ensuring the system’s efficiency, reliability, and security. The process
starts with a set of connected sensors, such as water pressure, flow, and height sensors, which allow
for real-time assessments of the amount of water being provided downstream, alongside allowing the
prediction of floods and droughts. The data retrieved from these components is then sent through
SCADA or similar networks to command centers, where fully or partially automated control feedback
loops are used to process the data to forecast demand, optimize water flow, and adjust operations. Such
a decision-making process is then transformed into a set of instructions for the system’s connected
actuators, such as valves, dams, and gates, that - given the feedback provided by the sensors - allow
automated regulation of the flow and pressure of water and its distribution in multiple emissaries.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Motivation and Threat Model</title>
      <p>In cyber-physical attacks, the assets at risk encompass both the digital and physical elements of the
system. To illustrate, manipulating water level sensor data through malware or network tampering may
lead to inappropriate responses to water level changes. This may, in turn, cause flooding or structural
damage. Cyber-physical attacks require ad hoc threat modeling and defensive measures to mitigate
them, especially because many critical infrastructures are CPSs. Fortunately, the predictable
and structured nature of CPSs aids in developing certain security measures and mitigations, specifically
in the field of attack and intrusion detection. In fact, these systems can leverage the regularity of
communication patterns, sensor readings, and control signals inherent to CPS to detect anomalies that
indicate intrusions or tampering. In particular, ML models can be trained on normal system behavior
to identify deviations indicative of cyber intrusions or data manipulation, which might otherwise go
unnoticed in conventional rule-based systems.</p>
      <p>Threat Model. We consider an attacker with the capability to compromise the water level sensor data
in a smart water infrastructure system, either through direct tampering with sensors, injecting false
data into communication channels, or exploiting software vulnerabilities in the data acquisition system.
Such an attacker could possess varying access levels, ranging from physical proximity to the sensor
nodes for hardware-based tampering to remote access via network intrusion or exploitation of insecure
communication protocols. The attacker’s goal may include disrupting the system’s functionality, such
as causing overflows or shortages that can lead to physical damage, service outages, or safety hazards.
Additionally, they might aim to manipulate reported metrics to mislead operators or decision-making
algorithms, potentially inducing incorrect system responses or masking other malicious activities.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Related works</title>
      <p>
        Earlier approaches to intrusion detection for CPSs have built mathematical models to detect anomalous
behaviors. This kind of approach has been analyzed in several surveys, including Giraldo et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and
Cardenas et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Although mathematical approaches can achieve high detection accuracy, they often
struggle when applied to complex cyber-physical systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Building accurate models for intricate
physical processes can be challenging due to the need for substantial expertise and deep knowledge
of system dynamics during the initial development phase [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. In recent years, several detection
methods for CPSs have been developed using machine learning techniques that do not rely on expert
knowledge or specific domain expertise, including methods for detecting misbehavior in individual
sensors, where each sensor is analyzed by applying a separate instance of the model. An example of
this approach is Process-Aware Stealthy Attack Detection (PASAD), proposed by Aoudi et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which interprets time series of sensor data in industrial control systems through Singular
Spectrum Analysis (SSA).
However, the original version of PASAD handles only univariate data, which limits its scalability to
environments with multiple sensors [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. As a result, researchers have devoted considerable effort
to developing multi-sensor misbehavior detection methods. Zhang et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] propose a multilayer
data-driven cyber-attack detection system to improve the security of industrial control systems. Several
of these methods rely on supervised machine learning, which is not applicable in the absence of labeled
data [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In a real-world scenario, labeled datasets are not always available, so semi-supervised and
unsupervised techniques are used to fill the gap [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. Recent unsupervised methods are based on
ML techniques. Paffenroth et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] introduce a methodology for detecting weak, distributed patterns
in sensor networks through space-time signal processing. However, temporal information is crucial, as
sensor observations are time-dependent, and historical data play a key role in reconstructing current
states, helping to determine whether a system is operating normally or abnormally [14]. The latest
applications of deep learning methods in multi-sensor domains have focused on capturing temporal
dependencies within time-series data. The LSTM approach analyses time-series data to model temporal
sequences and identify long-term dependencies [15, 16, 17]. Zhu et al. [18] propose an approach for
anomaly detection in complex time-series data. The method uses an LSTM
Encoder-Decoder architecture combined with adversarial training. Wei et al. [19] develop a deep learning
approach for detecting anomalies in indoor air quality data by combining LSTM networks with an
autoencoder architecture. Shrestha et al. [20] introduce a framework for detecting anomalies in smart
electric grid systems. The authors present an anomaly detection system that employs LSTM and AEs to
process sensor data from smart grids.
      </p>
      <p>Detection in smart water infrastructures. Amin et al. [21, 22] propose a model for detecting
anomalies in distributed control systems using a hydrodynamic model based on the Shallow Water
Equations, which captures flow dynamics and accounts for propagation delays. Wei Gao et al. [23] develop
an IDS for smart water utilities using a three-stage backpropagation neural network based on Modbus
features. Their IDS monitors sensor and actuator data, focusing on water levels and valve settings. Raman
et al. [24] introduce an anomaly detection method based on a Physics-based Neural Network (PbNN)
approach combining deep CNNs with Industrial Control System (ICS) design knowledge, leveraging
physical interactions to detect anomalies by comparing predicted and actual behavior in real time.
Meleshko et al. [25] propose a hybrid anomaly detection method for wireless sensor networks in water
management systems, employing classifiers like AdaBoost, Random Forest, and SVM to detect attacks on
water level and flow sensors. Ramotsoela et al. [26] develop a behavioural intrusion detection system for
water distribution systems using a voting-based ensemble of neural networks (ANN, RNN, LSTM, GRU,
CNN). Nayak et al. [27] present an IDS for Smart Water Infrastructure (SWI) that combines fog computing
with a fuzzy logic-based Intuitionistic System for feature selection, followed by a voting classifier using
algorithms like Random Forest, SVM, and K-NN. Finally, Moazeni et al. [28] propose a deep learning
approach for detecting FDI attacks in water distribution systems. They develop a supervised deep
neural network optimized for identifying random FDIs targeting water level measurements.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Approach</title>
      <p>In a multi-sensor cyber-physical system within a smart water infrastructure, sensors are positioned
at strategic points along rivers, reservoirs, or artificial canals to measure, for instance, the height of the water.
Usually, these points are sequential, and the temporal correlation between the measurements
is essential for the system to function correctly. Our approach employs an LSTM-Autoencoder model
to detect abnormal patterns within spatially and temporally correlated data, thus reinforcing the
resilience of water distribution operations against both cyber and physical attacks. Figure 1 provides
an overview of the entire process and its phases. The initial
stage is the preprocessing of the data, which involves the removal of any erroneous or anomalous
measurements. After cleaning, the data are organized into fixed-length sequences
that include all sensors simultaneously, to effectively capture both temporal and cross-sensor
correlations. The resulting sequences are
passed through a stack of LSTM layers in the encoder phase to generate a latent representation. This
representation is then decoded to attempt the reconstruction of the original input sequence. Next,
the reconstruction error, measured as Mean Squared Error (MSE), is calculated for each sensor time series,
and the resulting anomaly scores are compared to a threshold. If the
anomaly score exceeds the threshold, the data point is flagged as anomalous.</p>
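      <p>The scoring step at the end of this pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the autoencoder output is replaced by a hand-written stand-in, and the function names, the toy values, and the threshold of 0.1 are all hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Sequence

# One window has shape (sequence_length, n_sensors).
Window = Sequence[Sequence[float]]


def per_sensor_mse(original: Window, reconstructed: Window) -> List[float]:
    """Mean squared reconstruction error of each sensor over one window."""
    n_steps = len(original)
    n_sensors = len(original[0])
    errors = [0.0] * n_sensors
    for row_true, row_rec in zip(original, reconstructed):
        for s in range(n_sensors):
            errors[s] += (row_true[s] - row_rec[s]) ** 2
    return [e / n_steps for e in errors]


def flag_anomalies(scores: Sequence[float], threshold: float) -> List[bool]:
    """A sensor is flagged when its anomaly score exceeds the threshold."""
    return [score > threshold for score in scores]


# Toy window with 3 time steps and 2 sensors (made-up values).
window = [[1.0, 2.0], [1.1, 2.1], [1.2, 2.2]]
# Stand-in for the autoencoder output: sensor 0 is reconstructed exactly,
# sensor 1 is off by about 0.5 at every step.
reconstruction = [[1.0, 2.5], [1.1, 2.6], [1.2, 2.7]]

scores = per_sensor_mse(window, reconstruction)
flags = flag_anomalies(scores, threshold=0.1)  # hypothetical threshold
print(scores, flags)
```
]]></preformat>
      <p>In this toy case only the second sensor is flagged, because its reconstruction error of roughly 0.25 exceeds the assumed threshold.</p>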
      <sec id="sec-5-1">
        <title>5.1. Preprocessing</title>
        <p>During the data cleaning phase, any missing, invalid, or outlier entries are addressed to ensure data
quality. When a missing value, an invalid reading, or an outlier occurs, it is replaced with the mean of the first
preceding and first following valid measurements. This process creates a
continuous and realistic data stream, accounting for outliers and inconsistencies without introducing
anything that could affect the analysis. The resulting dataset, augmented with the synthetic attacks
presented in the next section, provides a reliable foundation for the anomaly detection approach and
can be found at https://github.com/necst/ITASEC_SWI_dataset.</p>
        <p>Following the completion of the cleaning process, it is necessary to create time-series sequences for
our DL architecture. These sequences are of a fixed length and are designed to capture the temporal
and spatial correlations between sensors. Once these sequences are created, a 3D matrix (n_rows
x sequence_length x n_sensors) is produced, where ’n_rows’ represents the number of individual
measurements, ’sequence_length’ is the selected sequence size, and ’n_sensors’ is the number of sensors available.
In this manner, each row consists of a matrix (sequence_length x n_sensors), which corresponds to
a specific group of measurements.</p>
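        <p>The two preprocessing steps can be sketched as follows. This is an illustrative sketch only: it assumes that missing, invalid, and outlier readings have already been marked as None, that a gap at the edge of a series reuses its single valid neighbour, and that windows slide with a stride of one. The names fill_gaps and make_sequences are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Optional


def fill_gaps(series: List[Optional[float]]) -> List[float]:
    """Replace each None with the mean of the nearest valid neighbours."""
    filled: List[float] = []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(value)
            continue
        # First valid measurement before and after position i.
        prev_valid = next((v for v in reversed(series[:i]) if v is not None), None)
        next_valid = next((v for v in series[i + 1:] if v is not None), None)
        if prev_valid is None and next_valid is None:
            raise ValueError("series contains no valid measurement")
        # Assumption: at the edges only one neighbour exists, so it is reused.
        if prev_valid is None:
            prev_valid = next_valid
        if next_valid is None:
            next_valid = prev_valid
        filled.append((prev_valid + next_valid) / 2)
    return filled


def make_sequences(rows: List[List[float]],
                   sequence_length: int) -> List[List[List[float]]]:
    """Sliding windows, each of shape (sequence_length, n_sensors)."""
    return [rows[start:start + sequence_length]
            for start in range(len(rows) - sequence_length + 1)]


# Two toy sensors sampled at the same four instants (made-up values).
sensor_a = fill_gaps([1.0, None, 3.0, 4.0])
sensor_b = fill_gaps([10.0, 11.0, None, 13.0])

# One row per timestamp, one column per sensor.
rows = [list(pair) for pair in zip(sensor_a, sensor_b)]
sequences = make_sequences(rows, sequence_length=2)
print(len(sequences), sequences[0])
```
]]></preformat>
        <p>With a stride of one, four timestamps and a sequence length of two yield three windows, each holding the readings of both sensors.</p>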
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Attacks Generation</title>
        <p>Based on similar approaches in the literature, we design three synthetic attacks: random, replay, and
gradual decrement attacks. Attack intervals are chosen to ensure each starting point supports the full
duration of the attack sequence. For each of these attacks, we devise a scenario where an attacker
selects a sensor before a dam and moments when there is a high water level. Once the target is chosen, the attacker
injects lower-than-actual values into one or more sensors, causing the control system to interpret
the water level as low and, consequently, to open the dam wider than necessary.</p>
        <p>Random attacks aim at generating random values for a fixed-length sequence. Within each chosen
interval, the water height values are deliberately altered. Specifically, the attack simulates intentional
deviations by replacing the sensor readings with new, randomly generated values that fall within a
predefined range, as shown in Eq. 1.</p>
        <p>Random Range = _ + (_ + _) / 3    (1)</p>
        <p>These new values are specifically chosen within this range to ensure low water height readings while
still remaining within the acceptable range for each sensor.</p>
        <p>Replay attacks simulate a scenario in which historical values of sensor data are reused with the
intention of misleading the control system. Within each selected interval, sensor values are replaced
with prior valid readings from within a defined range, effectively replaying earlier water-level data.
Initially, we tested the same range used for the random attack (Eq. 1). However, we found that no
historical values fell within this narrower range. As a result, we adjusted it as shown in Eq. 2.</p>
        <p>Replay Range = _ + (_ + _) / 2.9    (2)</p>
        <p>This modification maintains realistic fluctuations within the targeted range while subtly introducing
misleading data into the system.</p>
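        <p>The random and replay injections can be illustrated with the sketch below. It is not the generator used to build the dataset: the range bounds low and high are supplied by the caller rather than derived from Eq. 1 or Eq. 2, the replayed values are drawn individually from the history before the attack (an assumption), and all names and numbers are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random
from typing import List


def random_attack(readings: List[float], start: int, length: int,
                  low: float, high: float, rng: random.Random) -> List[float]:
    """Overwrite the attacked interval with uniform values in [low, high]."""
    attacked = list(readings)
    # The interval is assumed to fit inside the series.
    for i in range(start, start + length):
        attacked[i] = rng.uniform(low, high)
    return attacked


def replay_attack(readings: List[float], start: int, length: int,
                  low: float, high: float, rng: random.Random) -> List[float]:
    """Overwrite the attacked interval with earlier readings in [low, high]."""
    history = [v for v in readings[:start] if low <= v <= high]
    if not history:
        # Mirrors the situation where no historical value falls in the range.
        raise ValueError("no historical reading falls within the range")
    attacked = list(readings)
    for i in range(start, start + length):
        attacked[i] = rng.choice(history)
    return attacked


# Made-up water heights: low levels first, then a high-water period.
readings = [0.4, 0.5, 0.6, 1.8, 1.9, 2.0, 2.1, 2.0]
rng = random.Random(0)  # fixed seed so the example is repeatable

random_series = random_attack(readings, start=5, length=3,
                              low=0.3, high=0.7, rng=rng)
replay_series = replay_attack(readings, start=5, length=3,
                              low=0.3, high=0.7, rng=rng)
print(random_series)
print(replay_series)
```
]]></preformat>
        <p>Both functions leave the readings outside the attacked interval untouched, so the injected low values appear only during the high-water period.</p>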
        <p>Gradual decrement attacks simulate a slow and progressive reduction in sensor values as an
adversarial attempt to bypass the intrusion detection process. For each identified interval, we decrease each
subsequent reading by a small, predefined amount until reaching a target threshold. This target is set
just above the minimum measurable water level, ensuring the data remains plausible. Once the target
threshold is reached in every targeted sensor, the sensor value remains constant at this target level for
the remaining duration of the attack. This gradual decrease avoids causing abrupt changes, which may
increase the likelihood of bypassing simple detection mechanisms and obscuring the true water level
trends over time.</p>
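        <p>A gradual decrement attack of this kind can be sketched as follows. The step size, the target level, and the choice to start decreasing from the first attacked reading are assumptions made for the example, and the function name is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List


def gradual_decrement_attack(readings: List[float], start: int, length: int,
                             step: float, target: float) -> List[float]:
    """Lower the readings by `step` per sample until `target` is reached."""
    attacked = list(readings)
    current = attacked[start]
    for i in range(start, start + length):
        # Decrease by a small fixed amount, but never go below the target,
        # and hold the target level once it has been reached.
        current = max(current - step, target)
        attacked[i] = current
    return attacked


# Made-up high-water readings; the attack starts at index 1.
readings = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
attacked = gradual_decrement_attack(readings, start=1, length=5,
                                    step=0.5, target=0.75)
print(attacked)  # [2.0, 1.5, 1.0, 0.75, 0.75, 0.75]
```
]]></preformat>
        <p>The injected series contains no abrupt jump larger than the chosen step, which is what makes this attack harder to spot than the random one.</p>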
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Training and parameter tuning</title>
        <p>Our methodology entails comprehensive training and parameter tuning to ascertain the optimal
configuration for our LSTM-based model in anomaly detection within smart water infrastructures. To achieve
this, a range of hyper-parameters is experimented with, including the number of units in each LSTM
layer, regularisation parameters, dropout rates, sequence length, and batch size. This tuning process
is crucial for achieving an appropriate balance between the model’s complexity and its capacity to
generalize effectively across different operational states.</p>
        <p>In particular, the effectiveness of models with varying architectures, including 2, 4, and 6 LSTM
layers, is evaluated in order to assess their capacity to capture the temporal dependencies inherent in
multi-sensor time-series data. Each model variant is subjected to hyper-parameter tuning, whereby
parameters such as the number of LSTM units per layer and regularisation penalties are iteratively
adjusted with the objective of minimizing validation loss. These evaluations are presented in Section 7.</p>
        <p>As a loss function, we empirically found the MSE, which measures
the average squared difference between the model’s predictions and the actual values, to be the most effective. As an evaluation
metric, we use Mean Absolute Error (MAE), which calculates the mean of the absolute differences
between predicted and actual values: <inline-formula><tex-math>\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|</tex-math></inline-formula>, where <inline-formula><tex-math>n</tex-math></inline-formula> is the total number
of data points, <inline-formula><tex-math>y_i</tex-math></inline-formula> is the actual value of one data point, and <inline-formula><tex-math>\hat{y}_i</tex-math></inline-formula> is the predicted value for the same data
point. MAE provides an indication of the model’s average deviation without excessively penalizing larger
errors, making it a useful metric for understanding how much the model’s predictions deviate from the
actual values in general. Finally, an early stopping technique is employed to prevent overfitting, thereby
ensuring that the model retains the optimal weights once the validation performance has stabilized.</p>
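        <p>The difference between the two measures can be seen in a small worked example. The values below are made up for illustration: both predictions have the same MAE, but the one with a single large error has a four times larger MSE.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [1.0, 2.0, 3.0, 4.0]
spread = [1.5, 2.5, 3.5, 4.5]  # every prediction is off by 0.5
spike = [1.0, 2.0, 3.0, 6.0]   # a single prediction is off by 2.0

print(mae(actual, spread), mse(actual, spread))  # 0.5 0.25
print(mae(actual, spike), mse(actual, spike))    # 0.5 1.0
```
]]></preformat>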
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Use Case: the Mincio River Water Infrastructure</title>
      <p>Our use case, developed under the SARIL European project [29], takes into consideration the water
infrastructure that traverses the Mincio River, which flows through the city of Mantua. It functions as a
cyber-physical system with the objective of monitoring and managing the river’s flow. As illustrated in
Figure 2, the river originates from Garda Lake and traverses Mantua. The system comprises a network
of strategically located control points, each of which is equipped with a small dam and a collection of
sensors that continuously monitor water levels. The sensors are responsible for measuring the height of
the water, thereby providing the system with real-time data that is essential for the effective management
of the water flow, the maintenance of optimal levels, and the assurance of the system’s responsiveness
to environmental changes. This configuration enables precise control and timely adjustments across
the infrastructure, thereby augmenting its ability to prevent overflow and manage resources efficiently.</p>
      <p>The "Pozzolo" station represents the most critical juncture, where the river’s course divides into two
branches. This bifurcation occurs in front of a dam that regulates the flow of water in downstream areas.
On the opposite side of the bifurcation, there is an artificial channel designed as a bypass to prevent
flooding and facilitate the transfer of water to other areas. The functionality of the dam is contingent
upon the real-time data collected by the water level sensors, which are indispensable for the precise
and responsive regulation of the water flow. The proportion of water to be directed into the canal and
the amount to be allowed to flow through the main river downstream of Pozzolo are determined by the
control system based on the water levels detected by the sensors. The city of Mantua is crossed by three
basins called respectively Lago Superiore, Lago di Mezzo, and Lago Inferiore.</p>
      <p>[Figure 2: map of the Mincio River water infrastructure. Labeled locations: Salionze, Pozzolo, Scaricatore, C. di Goito, Derivazione, Seriola Prevaldesca, Diversivo Mincio, Vasarone, Lago Superiore, Lago di Mezzo, Lago Inferiore, Vallazza, City of Mantua. Legend: 1 - Masetti dam lock and weir; 2 - Valdaro lock; 3 - Formigosa pumping station; 4 - Formigosa sluice and counter-sluice; A - Medio Mantovano hydroelectric plant; B - Buse hydroelectric plant; C - Montecorno hydroelectric plant; D - Torre hydroelectric plant.]</p>
      <p>Within the stations, there are sensors that measure water levels every 15 minutes. These
measurements are stored in a public repository accessible via the Agenzia Interregionale per il fiume PO (AIPO)
website [30]. This accessibility provides open access to both real-time and historical information.</p>
      <p>While other cybersecurity measures are in place, the absence of an IDS renders the infrastructure
incapable of automatically discerning anomalous behavior and identifying potential attacks. An
adversary that obtains control of the sensor data could therefore exploit these
conditions to carry out an FDI attack.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Experimental Evaluation</title>
      <p>The assessment of our algorithm focuses on demonstrating the efficacy of LSTM models in anomaly
detection within a multi-sensor system and on presenting the reasoning behind the design choices
and parameter tuning presented in Section 5. For our evaluation, we used precision, recall, F1-Score,
ROC-AUC, and the average detection time of an attack.</p>
      <p>We collect data from a public repository [30], comprising water level measurements recorded every
15 minutes across various sensors. We divide this dataset into training and testing sets. Since our
approach is unsupervised, the testing dataset consists of both real measurements and manipulated
data generated through the attacks we have implemented. For each attack type, we generate 50 attack
instances on different sensors to thoroughly assess detection capabilities. Our dataset can be accessed
at https://github.com/necst/ITASEC_SWI_dataset.</p>
      <p>To select the optimal architecture for our LSTM-AE model, we conduct initial tests exploring various
configurations. First, we employ KerasTuner to determine the best settings for LSTM units per layer,
dropout rate, and L2 regularisation, evaluating these configurations based on validation loss. Notably,
this tuning process is conducted separately for models with 2, 4, and 6 layers, allowing us to identify
the top-performing configuration for each model depth. From these three optimal configurations (one
for each layer setup), we test different validation split values, finding that a validation split of 0.2 yields
the best results, meaning 20% of the training dataset is allocated for validation while 80% is used for the
actual training.</p>
      <p>[Figure 3: bar charts of the scores (Global Precision, Global Recall, Global F1 Score, Global ROC_AUC) and of the number of detected attacks per sensor for each configuration. Recoverable panel titles: "Replay attack - Pozzolo Monte" and "Replay attack - All sensors".]</p>
      <p>After identifying the top three models, we conduct extensive tests to evaluate the performance of
each architecture in terms of sequence length, batch size, and anomaly detection parameters, specifically
percentile, Z-Score threshold, and lower and upper quantiles, across all types of attacks we implemented
(random, replay, and gradual decrement). Additionally, for each type of attack, we test both attacks
targeting all sensors and attacks focusing solely on the Pozzolo Monte sensor, which is the most critical
one. To evaluate the best setup regarding sensor reading sequence length and batch size, we perform
over a thousand runs for each of the three best-performing models. It is not feasible to present
all these results; given the limitations of space and readability, we focus on the three best
configurations to highlight the key differences.</p>
      <p>To achieve a balanced detection across all three types of attacks, we focus on identifying configurations
that yield effective results for all of them. We evaluate the configurations on recall, precision, F1-Score,
and whether the complete attack was detected at least once. From this analysis, we identify three optimal
configurations. Configuration 1: 2 layers, batch size of 32, sequence length of 6, using Mahalanobis
Distance with a percentile threshold of 98.5. Configuration 2: 4 layers, batch size of 32, sequence
length of 6, using Z-Score with a threshold of 2. Configuration 3: 6 layers, batch size of 64, sequence
length of 4, using Euclidean Distance with a percentile threshold of 98.5.</p>
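      <p>The two thresholding rules named in these configurations can be sketched on scalar anomaly scores. The sketch below is an assumption-laden illustration: the Mahalanobis and Euclidean distance computations are omitted, the reference scores stand in for errors observed on legitimate data, the percentile uses linear interpolation between ranks, and all names and values are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import statistics
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 to 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def z_scores(values: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Standardize values with the mean and deviation of the reference scores."""
    mean = statistics.fmean(reference)
    std = statistics.pstdev(reference)
    return [(v - mean) / std for v in values]


# Made-up anomaly scores observed on legitimate data.
reference_scores = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.11, 0.12]
# Made-up scores to classify: one ordinary, one clearly larger.
new_scores = [0.11, 0.40]

# Rule 1: flag scores above the 98.5th percentile of the reference scores.
percentile_threshold = percentile(reference_scores, 98.5)
percentile_flags = [s > percentile_threshold for s in new_scores]

# Rule 2: flag scores whose absolute Z-Score exceeds 2.
z_flags = [abs(z) > 2 for z in z_scores(new_scores, reference_scores)]

print(percentile_threshold, percentile_flags, z_flags)
```
]]></preformat>
      <p>Both rules agree on this toy input and flag only the second score; on real data they can differ, which is one reason the configurations are compared empirically.</p>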
      <sec id="sec-7-1">
        <title>7.1. Results</title>
        <p>Figures 3a and 3b show the variations in precision, recall, F1-Score, and ROC-AUC for each configuration
during attacks on a single sensor or on all sensors. Figure 3c displays the performance of the configurations in
detecting 50 attacks of each type against all the sensors. An attack is considered detected if at least one of
the manipulated measurements is classified as anomalous.</p>
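        <p>The difference between the point-wise metrics and the attack-level detection criterion can be made concrete with a small sketch. The labels, predictions, and attack spans below are invented for illustration, and the helper names are hypothetical; only the metric definitions and the "at least one flagged measurement" rule come from the text.</p>
        <preformat>
```python
import numpy as np

def point_metrics(y_true, y_pred):
    """Precision, recall and F1-Score computed over individual measurements."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(np.logical_and(y_true, y_pred)))
    fp = int(np.sum(np.logical_and(np.logical_not(y_true), y_pred)))
    fn = int(np.sum(np.logical_and(y_true, np.logical_not(y_pred))))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1

def detected_attacks(y_pred, attack_spans):
    """Count attacks with at least one manipulated measurement flagged.

    attack_spans: list of (start, end) index pairs, end exclusive.
    """
    y_pred = np.asarray(y_pred, dtype=bool)
    return sum(bool(y_pred[start:end].any()) for start, end in attack_spans)

# Invented example: two attacks, covering indices 2-4 and 7-8.
y_true = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
attack_spans = [(2, 5), (7, 9)]

precision, recall, f1 = point_metrics(y_true, y_pred)
n_detected = detected_attacks(y_pred, attack_spans)
print(precision, recall, round(f1, 4), n_detected)
```
        </preformat>
        <p>In this toy case the first attack counts as detected even though only one of its three manipulated measurements is flagged, which is why a configuration can detect every attack while still showing low point-wise recall or precision.</p>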
        <p>All three configurations perform well in detecting single-sensor random attacks. In multi-sensor
attacks, all three maintain high recall, but configuration 2 also achieves good precision, resulting in the
highest F1-Score. Note that, as shown in the graphs, all three configurations nonetheless successfully
detect all the random attacks.</p>
        <p>Regarding replay attacks, performance drops significantly. Configuration 3 appears to be effective
against single-sensor attacks, but its F1-Score drops sharply when evaluating multi-sensor ones.
Configuration 1 appears to detect all attacks, but this is primarily due to its high recall combined
with very low precision: the system raises alerts for all 50 real attacks
while also producing a large number of false positives. Configuration 2, on the other hand, misses a few
attacks, although only on three specific sensors.</p>
        <p>Finally, for gradual decrement attacks, all three configurations achieve good precision but relatively
low recall, which lowers the overall F1-Score to moderate values. The ROC-AUC stabilizes around 0.7 across
all configurations, indicating similar performance. In detecting the 50 gradual decrement attacks,
configuration 2 proves to be the most effective, missing only a few attacks, specifically on sensor
4. In contrast, configuration 3 fails to detect all attacks on sensor 4 and misses some on sensor 5, while
configuration 1 misses a few attacks on sensor 4.</p>
        <p>Discussion. Based on the experimental results, configuration 2 appears to be the most effective in
terms of overall performance. Specifically, it demonstrates superior performance in detecting both
random and replay attacks. While its performance in identifying gradual decrement attacks is slightly
lower than that of configuration 3, it still manages to detect almost all attacks, unlike configuration 3.
As observed in the experiments, increasing the complexity of the attack type leads to a reduction in
performance, particularly in terms of recall. Nonetheless, configuration 2, when faced with complex
attacks like replay or gradual decrement, still manages to identify the vast majority of the attacks
implemented.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>This research introduced an innovative approach for detecting anomalies in cyber-physical systems,
focusing on a smart water infrastructure use case along the Mincio River. By utilizing advanced deep
learning techniques, specifically Long Short-Term Memory (LSTM) networks and Autoencoder models,
the study demonstrated the feasibility of accurately capturing temporal dependencies within time-series
data, enabling the effective detection of False Data Injection (FDI) attacks. Extensive evaluations of
multiple model configurations were conducted to identify the most effective design, showcasing the
method’s ability to detect a variety of attack types. The deployment of this model within the Mincio
River infrastructure illustrated its potential for real-world application while offering valuable insights
into the broader domain of anomaly-based intrusion detection for critical infrastructure systems. Future
work will explore the potential integration of distributed ledger technologies, inspired by Maffiola
et al. [31], to enhance security and transparency in cyber-physical domains. Additionally, federated
learning will be investigated to facilitate distributed and secure training within CPS environments,
leveraging collaborative frameworks to strengthen anomaly detection capabilities [32].</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>The presented work was performed in the context of the Horizon Europe project SARIL [29], which is
funded by the European Union under grant agreement ID 101103978. Views and opinions expressed
are, however, those of the authors only and do not necessarily reflect those of the European Union or
the European Climate, Infrastructure and Environment Executive Agency. Neither the European Union
nor the granting authority can be held responsible for them.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>Generative AI tools such as Grammarly and ChatGPT 4o were utilized solely for proofreading and
grammar refinement in the preparation of this manuscript. The authors retain full responsibility for the
content presented in the final version.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Baheti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gill</surname>
          </string-name>
          ,
          <article-title>Cyber-physical systems</article-title>
          ,
          <source>The impact of control technology 12</source>
          (
          <year>2011</year>
          )
          <fpage>161</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <article-title>Intrusion detection in cyber-physical systems: Techniques and challenges</article-title>
          ,
          <source>IEEE Systems Journal</source>
          <volume>8</volume>
          (
          <year>2014</year>
          )
          <fpage>1052</fpage>
          -
          <lpage>1062</lpage>
          . URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84913586160&amp;doi=10.1109%2fJSYST.2013.2257594&amp;partnerID=40&amp;md5=b266efa26c010ca73a63315023a5a230. doi:10.1109/JSYST.2013.2257594.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. Z.</given-names>
            <surname>Yong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Q.</given-names>
            <surname>Foo</surname>
          </string-name>
          , E. Frazzoli,
          <article-title>Robust and resilient estimation for cyber-physical systems under adversarial attacks</article-title>
          ,
          <source>in: 2016 American Control Conference (ACC)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>308</fpage>
          -
          <lpage>315</lpage>
          . doi:10.1109/ACC.2016.7524933.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Giraldo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Urbina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cardenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Valente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Faisal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ruths</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. O.</given-names>
            <surname>Tippenhauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sandberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Candell</surname>
          </string-name>
          ,
          <article-title>A survey of physics-based attack detection in cyber-physical systems</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>51</volume>
          (
          <year>2018</year>
          ). URL: https://doi.org/10.1145/3203245. doi:10.1145/3203245.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Cárdenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-L.</given-names>
            <surname>Huang</surname>
          </string-name>
          , C.-Y. Huang,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <article-title>Attacks against process control systems: risk assessment, detection, and response</article-title>
          ,
          <source>in: Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security</source>
          , ASIACCS '11, Association for Computing Machinery, New York, NY, USA,
          <year>2011</year>
          , pp.
          <fpage>355</fpage>
          -
          <lpage>366</lpage>
          . URL: https://doi.org/10.1145/1966913.1966959. doi:10.1145/1966913.1966959.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Palleti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mathur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chana</surname>
          </string-name>
          ,
          <article-title>A systematic framework to generate invariants for anomaly detection in industrial control systems</article-title>
          , in: NDSS,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Aoudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iturbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Almgren</surname>
          </string-name>
          ,
          <article-title>Truth will out: Departure-based process-level detection of stealthy attacks on control systems</article-title>
          ,
          <source>in: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security</source>
          , CCS '18, Association for Computing Machinery, New York, NY, USA,
          <year>2018</year>
          , pp.
          <fpage>817</fpage>
          -
          <lpage>831</lpage>
          . URL: https://doi.org/10.1145/3243734.3243781. doi:10.1145/3243734.3243781.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Poskitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Learning from mutants: Using code mutation to learn and monitor invariants of a cyber-physical system</article-title>
          ,
          <source>in: 2018 IEEE Symposium on Security and Privacy (SP)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>648</fpage>
          -
          <lpage>660</lpage>
          . doi:10.1109/SP.2018.00016.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Aoudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Almgren</surname>
          </string-name>
          ,
          <article-title>A scalable specification-agnostic multi-sensor anomaly detection system for iiot environments</article-title>
          ,
          <source>International Journal of Critical Infrastructure Protection</source>
          <volume>30</volume>
          (
          <year>2020</year>
          ) 100377. URL: https://www.sciencedirect.com/science/article/pii/S187454822030041X. doi:10.1016/j.ijcip.2020.100377.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A. D. E.</given-names>
            <surname>Kodituwakku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Hines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Coble</surname>
          </string-name>
          ,
          <article-title>Multilayer data-driven cyber-attack detection system for industrial control systems based on network, system, and process data</article-title>
          ,
          <source>IEEE Transactions on Industrial Informatics</source>
          <volume>15</volume>
          (
          <year>2019</year>
          )
          <fpage>4362</fpage>
          -
          <lpage>4369</lpage>
          . doi:10.1109/TII.2019.2891261.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Suaboot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fahad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grundy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Mahmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Almalawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Zomaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Drira</surname>
          </string-name>
          ,
          <article-title>A taxonomy of supervised learning for idss in scada environments</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>53</volume>
          (
          <year>2020</year>
          ). URL: https://doi.org/10.1145/3379499. doi:10.1145/3379499.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ndubuaku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Di Mauro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fortino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bagdasar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liotta</surname>
          </string-name>
          ,
          <article-title>Smart anomaly detection in sensor systems: A multi-perspective review</article-title>
          ,
          <source>Information Fusion</source>
          <volume>67</volume>
          (
          <year>2021</year>
          )
          <fpage>64</fpage>
          -
          <lpage>79</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1566253520303717. doi:10.1016/j.inffus.2020.10.001.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Paffenroth</surname>
          </string-name>
          , P. du Toit,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Scharf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Jayasumana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bandara</surname>
          </string-name>
          ,
          <article-title>Space-time signal</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>