<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Counterfactual Explanation for Time-series Anomaly Detection using Graph Neural Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiangyu Shi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abhishek Srinivasan</string-name>
          <email>srini@kth.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sepideh Pashami</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, KTH University</institution>
          ,
          <addr-line>Stockholm</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Halmstad University</institution>
          ,
          <addr-line>Halmstad</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>RISE AB</institution>
          ,
          <addr-line>Isafjordsgatan 28 A, Kista</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Scania CV AB</institution>
          ,
          <addr-line>Vagnmakarvägen 1, Södertälje</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In industrial settings, anomalies often indicate critical events such as equipment failures or system faults. These events are rare but highly impactful, demand urgent attention, and often have financial or safety consequences. Deep learning models, especially Graph Neural Networks (GNNs), have gained prominence due to their ability to capture intricate dependencies between sensor signals as graphs. Understanding the reasons behind predicted anomalies is essential for an effective response; however, the black-box nature of GNNs poses a significant challenge.</p>
      </abstract>
      <kwd-group>
        <kwd>Graph Neural Network</kwd>
        <kwd>Time-series Anomaly Detection</kwd>
        <kwd>Counterfactual Explanation</kwd>
        <kwd>Graph Node selection</kwd>
        <kwd>GNN</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>In the age of cyber-physical systems (CPS), where physical processes are tightly integrated with
computation and communication infrastructure, ensuring reliable and safe operation of these systems
is of paramount importance. As these systems become increasingly complex, continuous monitoring of
their health has emerged as a vital component of operational safety and performance optimization. One
of the key techniques employed in this context is anomaly detection (AD), which involves identifying
patterns in system behavior that deviate from expected norms. Accurate anomaly detection enables
early fault diagnosis, minimizes downtime, and helps prevent catastrophic failures.</p>
      <p>Traditional anomaly detection methods encompass a wide range of statistical and machine learning
techniques. These include clustering approaches such as k-means, and density estimation methods
like One-Class SVM and Isolation Forests [1]. More recently, deep learning-based methods such as
autoencoders, recurrent neural networks (RNNs), and variational autoencoders (VAEs) have been
employed to model the normal behavior of time-series data and identify deviations [2]. While effective
in many cases, these approaches generally operate under the assumption that sensor observations
are independent or sequentially dependent, and they often fail to account for the structural
interdependencies among sensors in a system.</p>
      <p>Traditional anomaly detection methods often treat sensor observations independently or assume
simplistic temporal dependencies, ignoring the inherent structural relationships between different
sensing components. In many CPS applications, such as industrial automation, energy distribution
networks, and autonomous vehicles, the behavior of a sensor is often influenced by the states of its
neighboring sensors due to underlying physical or logical connections. Capturing these interactions is
essential for robust modeling of system behavior. Graph-based representations provide a natural and
expressive framework to encode these inter-sensor relationships. Recent advances in Graph Neural
Networks (GNNs) have made it possible to effectively leverage graph-structured data for tasks like
classification, prediction, and anomaly detection in multivariate time series data [3].</p>
      <p>Several studies have demonstrated that modeling sensor dependencies through graph structures can
significantly enhance the performance of anomaly detection systems in CPS settings [4, 5, 3]. Despite
these promising results, a major limitation persists: the lack of explainability. GNN-based anomaly
detection models are often treated as black boxes, offering little insight into why a particular anomaly
was detected. This is particularly problematic in safety-critical domains, where human operators must
understand and trust the decisions made by automated systems.</p>
      <p>To bridge this gap, the machine learning community has increasingly focused on explainability, with
methods generally categorized into local explanations, which target individual predictions, and global
explanations, which describe overall model behavior [6]. While several explanation techniques have been
proposed for standard deep learning models, the explainability of GNNs, especially in time-series
contexts, remains an underexplored area. Moreover, existing explanation methods often rely on
feature attribution or saliency maps, which may lack causal grounding and are limited in the types of
counterfactual insights they can provide. Our primary focus is to explore whether the graph structure in
GNNs can be harnessed to better explain the model's decisions.</p>
      <p>In this work, we propose a novel framework for counterfactual explanation tailored to GNN-based
anomaly detection models operating on time-series sensor data. Counterfactual explanations aim to
answer the question: “What minimal change to the input would alter the model’s prediction?”—thus
providing actionable and intuitive insights into model decisions. Counterfactual explanations can give
clues about the root causes of the anomalies.</p>
      <p>Our approach comprises a two-stage process. In the first stage, we identify the most influential
sensors that contribute to an anomaly, along with their local graph neighborhoods. This localization
step leverages node-level deviations and GNN attention mechanisms to pinpoint regions of the graph
that are most responsible for the prediction. In the second stage, we generate counterfactual instances
by perturbing sensor readings in a minimal and plausible manner, aiming to flip the model’s prediction
from anomalous to normal (or vice versa). These counterfactuals serve as transparent, case-specific
explanations that can assist operators in understanding failure modes and potential corrective actions.
Integrating such support systems reduces the cognitive load on human decision-makers while
allowing them to effectively validate model outputs.</p>
      <p>By combining the structural strengths of GNNs with the intuitive clarity of counterfactual reasoning,
our method advances the state of the art in explainable anomaly detection for cyber-physical systems.
On the SWaT and WADI benchmarks, our two-stage approach alters fewer than 6% of sensors, yet
still delivers a strong sparsity-versus-proximity balance that makes the counterfactuals concise
and actionable. This not only enhances trust and accountability but also opens new avenues for
troubleshooting and diagnostics.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <sec id="sec-3-1">
        <title>2.1. Counterfactual Explanation for Time Series</title>
        <p>Several recent studies have explored counterfactual explanation techniques for time series data, with
the aim of explaining model decisions by identifying minimal changes in input features that would alter
the model output.</p>
        <p>For instance, Karlsson et al. propose a technique for generating counterfactuals using models like
k-nearest neighbors and random shapelet forests. In another approach, Wang et al. focus on univariate
time series by mapping data to a latent space, identifying counterfactuals there, and decoding them
back to the input space. Native-Guide [9] identifies the nearest contrasting instance, extracts its most
influential subsequence, and substitutes it into the original time series. CoMTE [10] selects alternative
series from the training set to replace parts of the input in order to induce prediction changes. More
recently, CFWoT [11] introduces a model-agnostic framework for both static and multivariate time
series, capable of handling continuous and categorical features without needing access to training data
or similar samples.</p>
        <p>These approaches often do not focus on relational structures present in multivariate time series data,
which is the focus of this work.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Counterfactual Explanation of Graph Neural Networks</title>
        <p>Counterfactual explanation methods for graph neural networks aim to identify the smallest possible
modifications to the input that would lead to a different model output. By pinpointing which features
must be altered to change a prediction, these methods offer valuable insights into the model's decision
boundaries and causal reasoning.</p>
        <p>A representative method in this category is CF-GNNExplainer [12], which introduces a learnable
binary mask over the model's computational graph to indicate edge presence or removal. The mask is
optimized to (1) alter predictions (prediction loss) and (2) minimize structural changes (distance loss).
The final explanation highlights edges with the highest importance scores from the learned mask.</p>
        <p>Another thread of counterfactual explanation methods is to generate counterfactual instances that are
close to the original instance but lead to a different prediction. CLEAR [13] employs a graph variational
autoencoder (GVAE) to learn a latent representation of the input graph and generate counterfactual
graphs by making minimal changes to the original structure or features. The GVAE is trained to
reconstruct the original graph while ensuring that the generated counterfactual samples result in a
different model prediction, maintaining both proximity (closeness to the original instance) and validity
(changing the prediction). RCExplainer [14] uses a neural network that predicts the existence of an edge
between two nodes based on their embeddings. To generate counterfactual explanations, RCExplainer
modifies these pairwise node embeddings, effectively simulating the addition or removal of edges that
lead to a change in the model's prediction. This approach allows for a structured and interpretable way
of understanding which edges influence the decision of the GNN model.</p>
        <p>However, these methods primarily focus on structural changes to the graph, such as edge addition or
removal, rather than utilizing graph structures for time series data, which is the focus of our work. Our
approach leverages the inherent relationships between sensors in a time series context, enabling us to
generate counterfactual explanations that are both interpretable and relevant to the specific anomalies
detected by GNN-based models.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Method</title>
      <sec id="sec-4-1">
        <title>3.1. Problem Statement</title>
        <p>This paper addresses the task of explaining anomalies in multivariate time series data through
counterfactual explanation generation. To support this, we incorporate an initial anomaly detection component
as a foundation.</p>
        <p>We begin by employing an unsupervised time series anomaly detection model that learns the normal
behavior of a system from historical data and detects deviations in unseen data. The input consists of
multivariate sensor data from a set of N sensors. The training data is denoted as
s_train = [s(1), s(2), …, s(T_train)], where each s(t) ∈ ℝ^N represents the sensor readings at time t. The model
assumes the training data to be free of anomalies and captures normal system patterns to flag abnormal
points in the test data.</p>
        <p>The core focus of this work lies in generating counterfactual explanations for the data points identified
as anomalous. The counterfactual explanation provides human-interpretable insights into the model's
decision-making process by answering the question: What minimal change would make an anomalous
instance be considered normal? Formally, given a test data sequence s_test = [s(1), s(2), …, s(T_test)] and a
set of anomaly predictions, the goal is to generate, for each detected anomaly s_test(t), a modified version
s′_test(t) such that the model classifies s′_test(t) as normal, and s′_test(t) remains as close as possible to
s_test(t) under a suitable distance metric.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Overview</title>
        <p>[Figure 1: Overview of the proposed pipeline. 1) A GNN-based anomaly detection model produces
graph information and anomaly samples; these feed the two-stage approach, consisting of 2.a) node
extraction of an informative subgraph and 2.b) counterfactual explanation generation.]</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.3. GNN-based Model for Time-Series Anomaly Detection</title>
        <p>This section presents the GNN-based model for time-series anomaly detection, which follows the
methodology proposed by Deng and Hooi [4]. The model produces an anomaly score for time-series data, labeling
it as anomalous if the score exceeds a specified threshold. Following the GDN architecture [4], the
implementation integrates structural learning techniques with graph neural networks, comprising four
interconnected modules: sensor embedding, graph structure learning, graph attention-based forecasting,
and graph deviation scoring.</p>
        <p>Each sensor i is represented by a trainable embedding vector e_i ∈ ℝ^d, learned jointly with the
forecasting objective. These embeddings capture the behavior patterns of the sensors and can be used
to identify which sensors are similar to each other. Sensors that are highly correlated will have similar
embedding vectors.</p>
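        <p>Since highly correlated sensors obtain similar embeddings, a candidate graph can be built from embedding similarity, as described next. A minimal NumPy sketch of this idea follows; the function name and array layout are illustrative assumptions, not the authors' code.</p>
        <preformat>
```python
import numpy as np

def topk_adjacency(E, k):
    # Pairwise cosine similarity between sensor embeddings (rows of E).
    norm = np.linalg.norm(E, axis=1, keepdims=True)
    sim = (E @ E.T) / (norm @ norm.T)
    np.fill_diagonal(sim, -np.inf)      # exclude self-loops
    # Keep each node's top-k most similar sensors as directed edges.
    A = np.zeros(sim.shape)
    for i in range(sim.shape[0]):
        A[i, np.argsort(-sim[i])[:k]] = 1.0
    return A
```
        </preformat>
        <p>Each row of the returned matrix marks that sensor's k most similar peers, giving a sparse directed graph.</p>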
        <p>To explicitly represent inter-sensor relationships, we build a data-driven directed graph. For every
pair of sensor embeddings e_i and e_j, we compute the cosine similarity as:
A′_ij = (e_i ⋅ e_j) / (‖e_i‖ ‖e_j‖),  (1)
and retain the top-k neighbors of each node to obtain the adjacency matrix A. The resulting directed
graph explicitly encodes dominant inter-sensor relationships, informing the subsequent forecasting and
anomaly scoring steps.</p>
        <p>With the learned adjacency matrix A, graph-attention layers process each time window x(t) ∈ ℝ^{N×w}.
For node i at time t, the hidden state is
h_i(t) = ReLU( α_{i,i} W x_i(t) + Σ_{j ∈ N(i)} α_{i,j} W x_j(t) ),  (2)
where the attention weights α_{i,j} are softmax-normalized cosine similarities of concatenated node features.</p>
        <p>A fully connected layer then maps the sensor representations into predicted sensor values:
ŝ(t) = f_θ([e_1 ⋅ h_1(t), e_2 ⋅ h_2(t), …, e_N ⋅ h_N(t)]),  (3)
where f_θ is a fully connected layer. The output ŝ(t) contains the predicted values of the sensors at time t. The
model is trained using a mean squared error (MSE) loss function:
ℒ_MSE = (1 / (T_train − w)) Σ_{t=w+1}^{T_train} ‖ŝ(t) − s(t)‖²_2,  (4)
where T_train is the total number of training samples and w is the window length.</p>
        <p>Deviations between predicted and actual values are calculated as the deviation score for each sensor i:
Err_i(t) = |ŝ_i(t) − s_i(t)|, where ŝ_i(t) is the predicted value and s_i(t) is the actual value of sensor i at time t.</p>
        <p>To ensure that all deviation scores are on the same scale, we normalize the deviation score as follows:
a_i(t) = (Err_i(t) − μ̃_i) / σ̃_i,  (5)
where μ̃_i and σ̃_i are the median and inter-quartile range (IQR) of the deviation scores of sensor i over
the training set, following [4].</p>
        <p>The final anomaly score at time t is given by taking the maximum across all sensors:
AS(t) = max_{i=1,…,N} a_i(t),  (6)
where N is the number of sensors. The system is flagged as anomalous if the score exceeds a predefined
threshold.</p>
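        <p>The deviation scoring described above (per-sensor absolute forecast error, robust normalization by the training-set median and IQR, then a maximum over sensors) reduces to a few array operations. The following is a minimal NumPy sketch; names and array shapes are illustrative assumptions.</p>
        <preformat>
```python
import numpy as np

def anomaly_score(pred, actual, train_err):
    # Err_i(t): absolute deviation between forecast and observation.
    err = np.abs(pred - actual)
    # Robust per-sensor normalization by training-set median and IQR.
    med = np.median(train_err, axis=0)
    q75, q25 = np.percentile(train_err, [75, 25], axis=0)
    a = (err - med) / (q75 - q25)
    # System-level score: maximum normalized deviation across sensors.
    return a.max(axis=-1)
```
        </preformat>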
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Two-stage Approach</title>
        <p>Anomaly samples detected by the GNN-based anomaly detection model are fed into the two-stage
approach for generating an explanation. The first stage involves node extraction, which identifies
the most relevant sensors to guide the counterfactual explanation method. The second stage uses a
counterfactual explanation method that generates counterfactual instances by altering only the sensors
identified in the extracted node set from the first stage.</p>
        <sec id="sec-4-4-1">
          <title>3.4.1. Node Extraction</title>
          <p>To generate counterfactual explanations focused on the most relevant sensors, we need to extract a
node set containing only the most important sensors from the original graph. The node extraction
module uses the anomaly score for each sensor to identify the most important sensors in the graph.
The extraction process consists of three steps:
1. Select the top k₁ sensors with the highest anomaly scores, where k₁ is a hyperparameter that
controls the size of the initial node set.
2. Select k₂ additional sensors that are connected to the selected k₁ sensors in the graph, where k₂ is
a hyperparameter that controls the size of the extended node set.
3. Combine both sets of sensors to form the final set of selected sensors S(t).</p>
          <p>In the first step, we select the sensor set
S₁(t) = {i ∣ rank(a_i(t)) ≤ k₁},
where rank(⋅) ranks sensors by anomaly score in descending order. This step selects the sensors with the
highest anomaly scores, as they are most likely to contribute to the detected anomaly and are therefore
most relevant for generating counterfactual explanations.</p>
          <p>In the second step, we select k₂ sensors S₂(t) that are connected to the selected k₁ sensors:
N(S₁(t)) = {j ∈ V ∖ S₁(t) ∣ ∃i ∈ S₁(t) : A_ij = 1},
S₂(t) ⊆ N(S₁(t)), |S₂(t)| = k₂,
where N(S₁(t)) represents the neighboring sensors of the selected k₁ sensors. Several strategies exist for
selecting S₂(t) from N(S₁(t)), including choosing sensors with the highest anomaly scores, those most
connected to S₁(t), or random selection. We choose the top k₂ sensors with the highest anomaly scores,
as this provides a simple and effective way to select the most relevant sensors.</p>
          <p>Finally, we combine both sets to form the final selected sensor set: S(t) = S₁(t) ∪ S₂(t).</p>
          <p>The selected node set S(t) is then used as input to the counterfactual explanation method, which
generates counterfactual instances by altering only the sensors in the extracted set.</p>
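          <p>The three-step extraction above can be sketched in a few lines of NumPy, assuming a binary adjacency matrix where A[i, j] = 1 denotes an edge from sensor i to sensor j; the function name and interface are illustrative, not the authors' implementation.</p>
          <preformat>
```python
import numpy as np

def extract_nodes(anomaly_scores, adjacency, k1, k2):
    # Step 1: top-k1 sensors by anomaly score (descending).
    order = np.argsort(-anomaly_scores)
    s1 = set(order[:k1].tolist())
    # Step 2: neighbors of S1 (A[i, j] = 1), excluding already-selected sensors.
    neighbors = set()
    for i in s1:
        for j in np.nonzero(adjacency[i])[0].tolist():
            if j not in s1:
                neighbors.add(j)
    # Rank candidate neighbors by anomaly score and keep the top k2.
    ranked = sorted(neighbors, key=lambda j: -anomaly_scores[j])
    s2 = set(ranked[:k2])
    # Step 3: the union forms the final selected node set S(t).
    return s1 | s2
```
          </preformat>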
        </sec>
        <sec id="sec-4-4-2">
          <title>3.4.2. Counterfactual Explanation Generation</title>
          <p>The counterfactual explanation generation module creates counterfactual instances by altering the
signals of the sensors in the extracted node set. We use a perturbation-based approach that generates
counterfactual instances by adding small changes to the original signal.</p>
          <p>We employ gradient optimization, a technique commonly used in adversarial attacks, to compute these
perturbations effectively. The perturbation δ is found by minimizing the objective function ℒ(x, x + δ),
where x is the original signal and δ is the perturbation. The objective function is defined as:
ℒ(x, x + δ) = ℒ_CE(f(x + δ), y_target) + λ ⋅ ‖δ‖,
where λ controls the trade-off between the two terms, f(⋅) is the model, y_target is the target class, and
ℒ_CE is the cross-entropy loss. The first term pushes the model to produce a specific output (the target
class), while the second term keeps the perturbation small.</p>
          <p>The perturbation is computed using gradient descent:
δ^(k+1) = δ^(k) − η ∇_δ ℒ(x, x + δ^(k)),
where η is the learning rate and k is the iteration number. We initialize the perturbation to zero: δ^(0) = 0.</p>
          <p>To focus only on the extracted sensors, we apply a mask to the gradient. The mask m is defined as:
m_i = 1 if i ∈ S(t), and m_i = 0 otherwise.
This mask zeros out the gradients for sensors not in the extracted node set. The masked gradient is
computed as:
∇′_δ ℒ(x, x + δ) = ∇_δ ℒ(x, x + δ) ⊙ m,
where ⊙ denotes element-wise multiplication. The perturbation is then updated using the masked
gradient:
δ^(k+1) = δ^(k) − η ∇′_δ ℒ(x, x + δ^(k)).</p>
          <p>This process continues until we reach the maximum number of iterations or obtain a valid
counterfactual instance. The final step adds the perturbation to the original signal: x_cf = x + δ.</p>
          <p>The generated counterfactual instance x_cf is a modified version of the original signal that produces a
different model output. This counterfactual instance explains the model's decision by showing how the
prediction changes when influential sensors are altered. By only perturbing sensors in the extracted
node set, we focus on the most relevant sensors, which helps minimize the perturbation size and
improve the quality of explanations.</p>
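          <p>The masked-perturbation search can be illustrated end to end. The sketch below is framework-free: it uses finite-difference gradients on a black-box anomaly score in place of backpropagating a cross-entropy objective through the GNN, so the scalar score interface and all names are assumptions for illustration only.</p>
          <preformat>
```python
import numpy as np

def masked_counterfactual(score_fn, x, node_set, lam=0.1, lr=0.05,
                          max_iter=200, tau=0.5, eps=1e-4):
    # score_fn maps a signal to an anomaly score; the paper instead takes the
    # gradient of a cross-entropy objective through the GNN (illustrative only).
    mask = np.zeros(x.shape)
    mask[list(node_set)] = 1.0          # m_i = 1 only for extracted sensors
    delta = np.zeros_like(x)            # delta^(0) = 0
    obj = lambda d: score_fn(x + d) + lam * np.linalg.norm(d)
    for _ in range(max_iter):
        base = obj(delta)
        grad = np.zeros_like(delta)
        for i in np.nonzero(mask)[0]:   # gradient only where the mask is 1
            step = np.zeros_like(delta)
            step[i] = eps
            grad[i] = (obj(delta + step) - base) / eps
        delta = delta - lr * grad * mask    # masked perturbation update
        if bool(np.less(score_fn(x + delta), tau)):
            break                       # valid counterfactual: prediction flipped
    return x + delta                    # x_cf = x + delta
```
          </preformat>
          <p>Sensors outside the extracted node set are provably untouched, since both the gradient and the update are zeroed there.</p>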
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experiments</title>
      <sec id="sec-5-1">
        <title>4.1. Experiment Setup</title>
        <sec id="sec-5-1-1">
          <title>4.1.1. Datasets</title>
          <p>We evaluate our approach on two multivariate time series datasets from industrial control systems,
both of which are public benchmarks. Dataset statistics are summarized in Table 1.</p>
          <p>We use two widely-adopted water treatment testbed datasets: SWaT [15] and WADI [16]. The Secure
Water Treatment (SWaT) dataset contains data from a scaled water treatment plant with 51 sensors
monitoring various physical processes. The Water Distribution (WADI) dataset extends SWaT with a
more comprehensive 128-sensor water distribution system. Both datasets include two weeks of normal
operations followed by controlled attack scenarios that simulate real-world anomalies through physical
system manipulations.</p>
          <p>We apply consistent preprocessing across all datasets following [4]: (1) median downsampling
to 0.1 Hz (one sample per 10 seconds) to reduce noise and computational overhead, (2) sensor-wise
min-max normalization to the [0, 1] range, and (3) sliding-window segmentation into 50-second chunks (5
downsampled measurements) for model input.</p>
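          <p>The three preprocessing steps can be sketched as follows, assuming the raw data arrives as a (time, sensors) array sampled at 1 Hz; the function and parameter names are illustrative assumptions.</p>
          <preformat>
```python
import numpy as np

def preprocess(raw, factor=10, win=5):
    # (1) Median downsampling by `factor` (1 Hz to 0.1 Hz for factor=10).
    t = (raw.shape[0] // factor) * factor
    ds = np.median(raw[:t].reshape(-1, factor, raw.shape[1]), axis=1)
    # (2) Per-sensor min-max scaling to the [0, 1] range.
    lo, hi = ds.min(axis=0), ds.max(axis=0)
    scaled = (ds - lo) / np.where(hi - lo == 0, 1.0, hi - lo)
    # (3) Sliding windows of `win` downsampled steps (50 s for win=5).
    return np.stack([scaled[i:i + win]
                     for i in range(scaled.shape[0] - win + 1)])
```
          </preformat>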
        </sec>
        <sec id="sec-5-1-2">
          <title>4.1.2. Baseline Methods</title>
          <p>We compare the GNN anomaly detection approach against several baseline models, including six
traditional machine learning models, and one GNN-based model. The compared models are listed as
follows:
• KNN: K Nearest Neighbors utilizes the distance of each point to its k nearest neighbors as the
anomaly score and classifies the point as anomalous if the score is greater than a specified
threshold.
• IForest: Isolation Forest is an ensemble-based anomaly detection model that isolates anomalies
by randomly partitioning the data into smaller subsets. It builds an ensemble of isolation trees
and uses the average path length of the trees to compute the anomaly score.
• OCSVM: One-Class SVM is a support vector machine-based anomaly detection model that learns
a decision boundary around the normal data points and classifies points outside the boundary as
anomalous.
• AutoEncoder: AutoEncoder consists of an encoder and a decoder which reconstruct data samples
from the input data. The reconstruction error is used as the anomaly score.
• VAE: Variational AutoEncoder is an improved version of the AutoEncoder, which learns a probabilistic
model of the data.
• PCA: Principal Component Analysis looks for a low-dimensional projection of the data that
captures most of the variance of the data. The reconstruction error is used as the anomaly score.
• FuSAGNet [17]: FuSAGNet introduces Fused Sparse Autoencoder and Graph Net, which jointly
optimizes reconstruction and forecasting while explicitly modeling the relationships within
multivariate time series.</p>
          <p>For counterfactual explanation generation, we compare against two additional baselines: (1)
Reconstruction, which directly uses autoencoder reconstructions as counterfactual explanations under the
assumption that reconstructions project onto the normal space, and (2) Without Node Extraction, which
represents our method without the node extraction component.</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>4.1.3. Evaluation Metrics</title>
          <p>We evaluate our approach using two sets of metrics: anomaly detection performance and counterfactual
explanation quality.</p>
          <p>Anomaly Detection Performance. We assess the anomaly detection model using standard
classification metrics: precision, recall, F1-score, AUC-ROC, and PRC-AUC. AUC-ROC and PRC-AUC provide
a comprehensive assessment of the model's performance across different threshold values and are
widely used metrics for evaluating classification models.</p>
        </sec>
        <sec id="sec-5-1-4">
          <title>Counterfactual Explanation Quality.</title>
          <p>We evaluate generated counterfactuals using three quantitative metrics alongside qualitative visual
inspection. Validity measures the fraction of counterfactuals that successfully flip the model's prediction:
Validity = (1 / N_cf) Σ_{j=1}^{N_cf} 1( f(x_cf^j) &lt; τ ),  (16)
where N_cf is the number of counterfactuals, f(⋅) is the model, τ is the classification threshold, and 1(⋅) is
the indicator function.</p>
          <p>Sparsity quantifies the average fraction of sensors modified per counterfactual:
Sparsity = (1 / N_cf) Σ_{j=1}^{N_cf} (1 / N) Σ_{i=1}^{N} 1( |δ_i^j| &gt; ε ),  (17)
where N is the number of sensors, δ_i^j is the perturbation for sensor i in counterfactual j, and ε is a
minimal-change threshold.</p>
          <p>Proximity measures the average magnitude of the perturbations:
Proximity = (1 / N_cf) Σ_{j=1}^{N_cf} ‖δ^j‖,  (18)
where δ^j is the perturbation vector for counterfactual j.</p>
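          <p>The three metrics reduce to simple array reductions over the perturbations and counterfactual scores. A minimal NumPy sketch, with illustrative names and a scalar anomaly-score interface assumed, is:</p>
          <preformat>
```python
import numpy as np

def cf_metrics(deltas, scores_cf, tau=0.5, eps=1e-6):
    # deltas: (n_cf, n_sensors) perturbations; scores_cf: model scores of the
    # counterfactuals. Names are illustrative, not the authors' code.
    validity = np.mean(np.less(scores_cf, tau))          # fraction flipped
    sparsity = np.mean(np.greater(np.abs(deltas), eps))  # fraction of sensors changed
    proximity = np.mean(np.linalg.norm(deltas, axis=1))  # avg perturbation size
    return validity, sparsity, proximity
```
          </preformat>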
          <p>Higher validity indicates more effective counterfactuals, while lower sparsity and proximity reflect
better explainability through minimal, localized changes.</p>
        </sec>
        <sec id="sec-5-1-5">
          <title>4.1.4. Implementation Details</title>
          <p>We implement the proposed approach using PyTorch and PyTorch Geometric. The model is trained
with the Adam optimizer with learning rate 1 × 10⁻³ and (β₁, β₂) = (0.9, 0.99) for 50 epochs. We include
early stopping with a patience of 10 epochs. The embedding dimension for the sensors is 128 for the WADI
dataset and 64 for the SWaT dataset. Training is performed on a single Tesla T4 GPU with 16 GB memory.
For the node extraction module, we set k₁ = 2 and k₂ = 1. The perturbation is computed using gradient
descent with a learning rate of 0.001 and a maximum of 100 iterations, with the Adam optimizer. λ in the
objective function is 0.1 for the SWaT dataset and 0.001 for the WADI dataset.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Benchmark Comparison</title>
        <p>In this section, we conduct two benchmark comparisons. The first compares the anomaly detection
performance of the proposed GNN-based model with the other baseline models; this benchmarking
acts as a sanity check for anomaly detection. The second compares the generated counterfactual
explanations with those of the baseline models.</p>
        <sec id="sec-5-2-1">
          <title>Anomaly Detection Performance</title>
          <p>As a sanity check for the GNN model, we compare the anomaly detection performance
of the proposed GNN-based model and the other baseline models on the two
datasets. The results are shown in Table 2.</p>
          <p>On the WADI dataset, GDN achieves the highest F1, precision and PRC-AUC, while FuSAGNet leads
in ROC-AUC. VAE achieves the best recall. These results suggest that GNN-based models offer more
balanced performance.</p>
          <p>On the SWaT dataset, GDN consistently outperforms others across nearly all metrics. PCA achieves
the highest precision but with lower recall, indicating a stricter anomaly boundary that may misclassify
normal instances.</p>
          <p>Explanation Performance. We compare the performance of counterfactual explanations across
different models. In addition to our proposed method, we apply the two-stage approach using
FuSAGNet. For baseline models without graph structures, we skip the node extraction step and apply the
counterfactual method directly. We also evaluate a reconstruction-based counterfactual approach on
both GDN and FuSAGNet. Results are shown in Table 3. Note that KNN and IForest are excluded, as
their non-differentiable nature prevents gradient-based counterfactual generation.</p>
          <p>On the WADI dataset, the proposed approach achieves a validity score of 0.5718, which is not
significantly higher than that of the other models, but still acceptable. Notably, it outperforms the others in sparsity
and proximity, indicating that the generated counterfactuals are both sparse and close to the original
instances. In contrast, baseline models show poor performance, with a sparsity score of 1.0000 and much
higher proximity values. While FuSAGNet with node extraction achieves a higher validity score, its
sparsity and proximity do not improve significantly. This suggests that generating valid counterfactuals
is easier for these models, but requires larger adjustments to the original signal. We also find that the node extraction
step is not effective for FuSAGNet, as the validity score is the same as for the model without the node
extraction step. Reconstruction-based methods perform poorly on WADI, with low validity and sparsity
fixed at 1.0000, indicating difficulty in generating meaningful and interpretable counterfactuals.</p>
          <p>On the SWaT dataset, a similar trend emerges. The proposed approach achieves a high validity score
alongside low sparsity and proximity, indicating effective and interpretable counterfactuals. Although
baseline models and reconstruction-based methods reach perfect validity, they suffer from high sparsity
and proximity, reducing explainability. When the node extraction step is removed from the proposed
approach, validity drops and sparsity increases significantly, which highlights the step’s effectiveness.
We attribute this to the gradient-based method distributing perturbations across all sensors, leading to
less valid and less sparse counterfactuals. Interestingly, FuSAGNet performs worse with node extraction
on SWaT, dropping to a validity score of 0.1380. This may stem from its architectural constraints
enforcing sparsity in the latent space [17], which limits its adaptability in counterfactual generation.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Ablation Studies</title>
        <sec id="sec-5-3-1">
          <title>Effect of Node Extraction Hyperparameters for Counterfactual Explanations</title>
          <p>We investigate the impact of the hyperparameters k₁ and k₂, which control the number of selected sensors and their
neighbors, on the quality of counterfactual explanations using GDN on the SWaT and WADI datasets.
Results are shown in Table 4.</p>
          <p>As k₁ and k₂ increase, the validity score generally improves, indicating that more valid counterfactuals
can be generated when more features are available to perturb. However, this trend plateaus once the sum k₁ + k₂
exceeds 2, suggesting that only a small number of informative sensors and their immediate neighborhood
are sufficient for effective explanation. Meanwhile, both sparsity and proximity scores increase with k₁
and k₂, reflecting reduced explainability due to more widespread perturbations.</p>
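<p>For concreteness, the node extraction step can be read as the following sketch (one plausible interpretation: select by per-sensor anomaly score, then add the top-scoring graph neighbors; the function and variable names are ours):</p>

```python
import numpy as np

def extract_nodes(scores, adj, k1=2, k2=1):
    """Pick the k1 highest-scoring sensors, then up to k2 neighbors of each.

    scores : (n_sensors,) per-sensor anomaly scores at detection time
    adj    : (n_sensors, n_sensors) boolean adjacency of the learned graph
    """
    top = np.argsort(scores)[::-1][:k1]  # k1 most anomalous sensors
    selected = set(int(i) for i in top)
    for i in top:
        nbrs = np.where(adj[i])[0]
        # keep the k2 highest-scoring neighbors of each selected sensor
        for j in nbrs[np.argsort(scores[nbrs])[::-1][:k2]]:
            selected.add(int(j))
    return sorted(selected)
```

<p>With k₂ &gt; 0, sensors that are not themselves the most anomalous but are graph-adjacent to anomalous ones enter the perturbation set, which is what lets the counterfactual exploit inter-sensor structure.</p>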
          <p>On the SWaT dataset, the best configuration is k₁ = 2, k₂ = 1, achieving the highest validity of 0.9740
while maintaining relatively low sparsity and proximity. Notably, this configuration outperforms
the one with k₁ = 3 and k₂ = 0, despite both involving three total nodes. This indicates that leveraging
the graph structure to incorporate neighbors provides more targeted and efficient perturbations than
selecting more sensors independently, which highlights the benefit of graph-based relational modeling
in counterfactual generation. In contrast, too few sensors (e.g., k₁ = 1, k₂ = 0) result in poor validity
(0.4160), while too many (k₁ = 5) can dilute the perturbation effect, lowering validity to 0.9430. A
similar pattern is observed on WADI, where the best validity (0.5900) occurs at k₁ = 3, k₂ = 0, though
the overall scores are lower, likely due to WADI’s higher dimensionality and complexity.</p>
          <p>Overall, the number of selected sensors should be large enough to ensure the generation of valid
counterfactuals, but small enough to maintain explainability. Validity gains plateau after a certain point,
suggesting a trade-off between completeness and sparsity.</p>
        </sec>
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Visual Analysis Experiments</title>
        <p>We show one illustrative example of a generated counterfactual explanation for a detected
anomaly. The example is taken from the SWaT dataset and corresponds to the detected anomaly labeled
9. The original instance and the generated counterfactual instance are shown in Figure 3.</p>
        <p>This anomaly is caused by an attack on sensor FIT-401, a flow transmitter. The attack manually
sets the sensor value to 0, which causes actuator P-501 to turn off (its value changing from 2 to 1).
The original instance is shown in solid lines, with both sensors in the off state.
Our node extraction module selects the most important sensors, i.e., FIT-401 and P-501. The generated
counterfactual instance is shown in dashed lines, where both sensor FIT-401 and actuator P-501 are set
to higher values. The correlation between the two signals indicates that they are related and can
influence each other’s behavior. This aligns with
the physical setting of the system: FIT-401 is the upstream sensor of P-501, and its value
has a direct influence on the value of P-501. The generated counterfactual instance remains close
to the original instance and can be interpreted as a valid counterfactual explanation.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Discussion</title>
      <p>Our experimental results confirm that the proposed two-stage counterfactual framework provides
concise, actionable explanations that improve trust and troubleshooting efficiency for system operators.
In this section we discuss two main insights.</p>
      <p>Effectiveness of graph-aware counterfactuals: Across both datasets, validity increases sharply
once the explanation can perturb at most three sensors, i.e. the k₁ + k₂ = 3 setting with k₁ &gt; 0 and
k₂ &gt; 0, where neighbors of the selected features are utilised. This shows that usually
only a few, closely linked variables drive each anomaly. When we choose some of those sensors using
the graph of how they connect (i.e. increasing k₂), the resulting counterfactuals are more valid than if
we just picked the sensors with the highest anomaly scores. This supports our claim that knowing the
system’s structure is crucial for clear counterfactual reasoning in highly coupled systems.</p>
      <p>Trade-off between validity, sparsity and proximity: Letting the algorithm perturb more sensors
(higher k₁ or k₂) makes its explanations more often valid, but it also means bigger changes to the data,
making the results harder to read and trust. Looking at Table 4, the sweet spot seems to be
k₁ = 2 and k₂ = 1: we still get over 97% validity on the SWaT dataset while the typical change stays
under 0.015 (in normalized units). In practice, engineers can pick these two knobs to suit their goals:
smaller values if they want to pinpoint the root cause with minimal edits, larger values if ensuring
validity matters more than keeping the edits tiny.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>In this work, we introduced a novel framework to generate counterfactual explanations tailored to
graph neural network-based models. Our approach leverages the representational power of GNNs to
model complex inter-sensor relationships in a two-stage explanation mechanism that enables
interpretable counterfactual reasoning. Extensive experiments on the SWaT and WADI benchmarks
show that our two-stage framework cuts the number of perturbed sensors to less than 6% on average,
while generating highly valid counterfactual explanations. This superior sparsity–proximity trade-off
means the counterfactuals are both concise and easier for practitioners to act upon.</p>
      <p>Our framework contributes to more transparent and trustworthy machine learning solutions for
safety-critical domains by bridging the gap between black-box anomaly detection using GNNs and
explainable AI. Future work may explore weighted similarity-based relationships in graphs, the integration of
domain constraints, real-time explanation generation, and multi-criteria optimization.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The work was carried out with support from Vinnova (Sweden’s innovation agency) through the
Advanced Digitalisation Program as part of the future AI-based maintenance project (project number:
2023-01917).</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT-4 to improve the writing style and
to check grammar and spelling. After using this service, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Anomaly detection: A survey</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>41</volume>
          (
          <year>2009</year>
          )
          <fpage>1</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zamanzadeh Darban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Webb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Salehi</surname>
          </string-name>
          ,
          <article-title>Deep learning for time series anomaly detection: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>57</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Y.</given-names>
            <surname>Koh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zambon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Alippi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Webb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <article-title>Graph neural network-based anomaly detection in multivariate time series</article-title>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mirhoseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <article-title>Representing long-range context for graph neural networks with global attention</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>34</volume>
          (
          <year>2021</year>
          )
          <fpage>13266</fpage>
          -
          <lpage>13279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <source>Interpretable Machine Learning</source>
          ,
          <volume>3</volume>
          ed.,
          <year>2025</year>
          . URL: https://christophm.github.io/interpretable-ml-book.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Karlsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rebane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papapetrou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gionis</surname>
          </string-name>
          ,
          <article-title>Locally and globally explainable time series tweaking</article-title>
          ,
          <source>Knowledge and Information Systems</source>
          <volume>62</volume>
          (
          <year>2020</year>
          )
          <fpage>1671</fpage>
          -
          <lpage>1700</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <article-title>Learning time series counterfactuals via latent space representations</article-title>
          , Springer,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <article-title>Instance-based counterfactual explanations for time series classification</article-title>
          , Springer,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>Counterfactual explanations for multivariate time series</article-title>
          , IEEE,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aoki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanations for multivariate time-series without training datasets</article-title>
          ,
          <source>arXiv preprint arXiv:2405.18563</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <article-title>Cf-gnnexplainer: Counterfactual explanations for graph neural networks</article-title>
          ,
          <source>PMLR</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Clear: Generative counterfactual explanations on graphs</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>25895</fpage>
          -
          <lpage>25907</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bajaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. Y.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.-H.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Robust counterfactual explanations on graph neural networks</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>34</volume>
          (
          <year>2021</year>
          )
          <fpage>5644</fpage>
          -
          <lpage>5655</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <article-title>SWaT: A water treatment testbed for research and training on ICS security</article-title>
          , IEEE,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <article-title>WADI: a water distribution testbed for research in the design of secure cyber physical systems</article-title>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <source>Learning Sparse Latent Graph Representations for Anomaly Detection in Multivariate Time Series</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>