<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Neural Networks for Network Intrusion Detection with Structural Attacks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dimitri Galli</string-name>
          <email>dimitri.galli@unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Venturi</string-name>
          <email>andrea.venturi@unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabella Marasco</string-name>
          <email>isabella.marasco4@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirco Marchetti</string-name>
          <email>mirco.marchetti@unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Explainable Artificial Intelligence, Graph Neural Network, Network Intrusion Detection</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bologna, Department of Computer Science and Engineering</institution>
          ,
          <addr-line>40126 Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Modena and Reggio Emilia, Department of Engineering “Enzo Ferrari”</institution>
          ,
          <addr-line>41125 Modena</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>0</volume>
      <fpage>3</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>Among Machine Learning (ML) models, Graph Neural Networks (GNN) have been shown to improve the performance of modern Network Intrusion Detection Systems (NIDS). However, their black-box nature poses a significant challenge to their practical deployment in the real world. In this context, researchers have developed eXplainable Artificial Intelligence (XAI) methods that reveal the inner workings of GNN models. Despite this, determining the most effective explainer is complex because different methods yield different explanations, and there are no standardized strategies. In this paper, we present an innovative approach for evaluating XAI methods in GNN-based NIDS. We evaluate explainers based on their capability to identify key graph components that an attacker can exploit to bypass detection. More accurate XAI algorithms can identify topological vulnerabilities, resulting in more effective attacks. We assess the effectiveness of different explainers by measuring the severity of structural attacks guided by the corresponding explanations. Our case study compares five XAI techniques on two publicly available datasets containing real-world network traffic. Results show that the explainer based on Integrated Gradients (IG) generates the most accurate explanations, allowing attackers to refine their strategies.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable Artificial Intelligence</kwd>
        <kwd>Graph Neural Network</kwd>
        <kwd>Network Intrusion Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Machine Learning (ML) and Deep Learning (DL) algorithms can enhance the capabilities of modern
Network Intrusion Detection Systems (NIDS) [1]. Recent research shows that ML methods improve
the classification of cyber attacks, reducing the reliance on manual rule creation. However, traditional
ML-based NIDS treat data features as independent variables and data points as individual samples,
limiting their effectiveness in capturing the complex dependencies of modern multi-flow attacks in
real-world scenarios [2, 3]. To overcome this limitation, Graph Neural Networks (GNN) analyze both
individual features and topological structures of network traffic during the training process [4]. Indeed,
GNN are a family of neural networks capable of processing network hosts and their communications
as nodes and edges within a graph. Since graphs represent the inherent inter-dependencies within
network traffic, GNN models can detect malicious patterns exhibited at the topological level [5].</p>
      <p>Despite their effectiveness, GNN models operate as black boxes. This lack of transparency hinders
their use in practical contexts [6], where security analysts need to understand why a cyber detector
flags some flows as malicious [7]. Explainable Artificial Intelligence (XAI) techniques attempt to bridge
this gap by defining explanations that identify which components in the network graph influence
the decision-making of GNN models [8]. In this context, different XAI methods could be used to
explain GNN predictions. However, each explainer exploits its own mechanisms to identify the most
relevant structures. Therefore, different explainers may return different explanations, making it unclear
which method should be considered correct. There are no automated frameworks to evaluate the
quality of explanations in GNN-based NIDS. Furthermore, existing methodologies do not consider the
dynamic nature of network traffic. Existing evaluation approaches rely on expensive ground truth
explanations [9]. Therefore, these frameworks require importance labels to be available a priori, which
limits their use in constantly evolving scenarios. Traditional metrics for evaluating XAI methods
examine how model predictions change when relevant structures are isolated or removed from the
complete graph. These strategies do not consider any constraints on the network structure and may
disrupt the distributed topology of modern cyber attacks [10].</p>
      <p>We aim to develop an evaluation framework that satisfies several key properties, such as being (i)
agnostic, i.e., independent of the specific type of explainers; (ii) flexible, i.e., usable without the need
for ground truth explanations; (iii) practical, i.e., useful in real-world scenarios. In this paper, we
present an innovative methodology for comparing and evaluating explainers for GNN-based NIDS that
fulfills these requirements. Our approach is based on structural attacks, where attackers manipulate
the graph topology to evade detection [11, 12]. Indeed, these perturbations have proven to be effective
against GNN-based NIDS [13]. In our proposed strategy, we adversarially change the network topology
by injecting the components considered relevant by each explainer into the graph and measuring
the severity of structural attacks. We assume that the most effective XAI techniques highlight the
explanations that lead to the most impactful attacks.</p>
      <p>We apply our approach to an experimental case study to demonstrate how our proposal identifies
the most appropriate XAI methods for GNN-based NIDS, even without ground truth explanations. We
compare five explainers tailored for graph-based models, considering two public datasets extensively
used in network intrusion detection. We design a GNN-based NIDS that achieves good detection
performance, allowing practical evaluation of the explainers. Our results reveal an overall increase in
attack severity when attackers employ explanations, especially when they exploit Integrated Gradients
(IG) to locate the topological vulnerabilities of the graph model.</p>
      <p>The paper is structured as follows. Section 2 provides background knowledge on GNN and XAI for
NIDS. Section 3 describes the evaluation methodology. Section 4 details the experiments. Section 5
presents the results. Section 6 concludes the paper with final remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>ML-based NIDS use supervised algorithms to classify network traffic and detect malicious patterns [1].
To build such systems, ML algorithms are trained on labeled netflows, where each data entry reports a set
of metrics and statistics—also referred to as features—that summarize the communication between two
hosts in the monitored network [14]. Traditional ML-based NIDS analyze data features independently
and consider each flow in isolation, enabling near real-time responses [15]. However, these models
often struggle to detect modern attacks that rely on complex multi-flow techniques. Graph Neural
Networks (GNN) [16] overcome this limitation by operating on graph representations of network
traffic. A graph is formally defined as G = (V, E), where V is a set of vertices or nodes, and E is a set
of edges or links; an example of this conversion is shown in Figure 1. The flow graph representation
is the most common transformation for a computer network because each endpoint in the monitored
network represents a node, and each flow corresponds to a link in the graph [17]. However, the majority
of GNN perform node classification using a line graph representation L(G), where flows are mapped
directly to graph vertices, which are connected if they share a host in the respective flows [18]. If a
flow graph G has n nodes and m edges, then its line graph transformation L(G) has m nodes and
(1/2) ∑_{i=1}^{n} d_i² − m edges, where d_i is the degree of node i in the flow graph.</p>
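      <p>As a quick sanity check of this counting argument, the following sketch (a hypothetical, networkx-based illustration, not taken from the paper's code) builds a toy flow graph and verifies that the line graph produced by nx.line_graph matches the closed-form size.</p>
      <preformat>
# Hypothetical sketch: checking |V(L(G))| = m and |E(L(G))| = 1/2 * sum(d_i^2) - m.
import networkx as nx

G = nx.Graph()  # flow graph: hosts as nodes, flows as edges
G.add_edges_from([("h1", "h2"), ("h1", "h3"), ("h2", "h3"), ("h3", "h4")])

L = nx.line_graph(G)  # flows become nodes, linked when they share a host

m = G.number_of_edges()
expected_edges = sum(d * d for _, d in G.degree()) // 2 - m
assert L.number_of_nodes() == m               # 4 flows -> 4 line-graph nodes
assert L.number_of_edges() == expected_edges  # (18 / 2) - 4 = 5 links
      </preformat>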
      <p>Next, we train the GNN model to transform graph nodes into embeddings. These embeddings
encapsulate both flow features and structural similarities in high-dimensional vectors. In transductive
settings, the GNN learns on a single graph [19]. To improve the generalization of the GNN, inductive
strategies use different input graphs for training and testing the model [10]. Once generated, the
embeddings with their respective labels can be used to train a classifier. Indeed, any ML model can
distinguish between benign and malicious samples.</p>
      <p>Despite their high detection performance, GNN-based NIDS are still underutilized in practical
contexts [20]. The lack of transparency makes these models opaque to cybersecurity practitioners, who
should understand the rationale behind the detector’s predictions to validate alerts [8]. XAI algorithms
applied to GNN use different approaches to provide insight into which components influence the inner
workings of the models. These explainers extend beyond feature importance by defining a mask that
assigns weights to the components in the graph based on their contribution to predictions [21]. We
can have a hard mask if the explanation scores take binary values or a soft mask if the importance
scores take continuous values. This mask may refer to a subgraph whose elements are related to their
importance in interpreting model predictions.</p>
      <p>Many explainability methods have been proposed for graph learning models. The paper referenced
in [22] systemizes explainability methods for GNN based on the explanation target. Explainers working
at the instance-level extract features relevant to the GNN output. In contrast, model-level explainers
provide a general understanding of the model. Explanation methods can be further classified based on
their integration with the ML model, as reported in [23]. Post-hoc methods act as external components
dealing with pre-trained models with fixed weights. Self-interpretable methods are directly integrated
into the neural network. Our analysis considers instance-level and post-hoc explainers, as they are
used in real-world scenarios. Due to code availability, we focus on gradient-based methods [24, 25],
which estimate importance scores by computing the gradient of the GNN, and perturbation-based
methods [26, 27], which perturb the input graph by removing nodes or rewiring edges to obtain the
explanation subgraph.</p>
      <p>Compared to evaluating traditional ML models, the evaluation of the quality of different explanations
is a complex task. On the one hand, supervised approaches [28, 29] compare the explanation with a
ground truth importance. These approaches assume that the elements critical to a particular prediction
are known, enabling an objective and quantitative evaluation of explanation methods. However,
these strategies require human supervision to decide what is important for the model. Consequently,
generating ground truth labels is time-consuming, especially when working with real-world datasets [30].
Security analysts must investigate complex dependencies that require significant effort and domain
expertise. Our evaluation framework does not rely on ground truth, making it adaptable and flexible to
any dataset. On the other hand, unsupervised approaches [31, 32] evaluate how explanations extracted
by XAI methods influence model predictions, either by isolating or removing the significant components.
These approaches are flexible as they can rely on existing metrics and do not require ground truth
explanations. However, explainability methods aim to identify the most relevant factors, generating
explanations that should be small and sparse. Therefore, the extracted explanatory subgraphs may be
out-of-distribution, leading to an incorrect evaluation [33]. Explanatory subgraphs can extract components
that do not represent the distributed topology of modern attacks. Instead of removing explanations from the
original data distribution, our methodology identifies and injects key input components into the dataset,
perturbing the graph topology while preserving the extracted structures.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>As discussed in Section 2, evaluating explainability methods in GNN-based NIDS is a challenging task, as
it often depends on expensive ground truth labels or unrealistic explanatory subgraphs. We present an
innovative unsupervised evaluation framework that leverages structural adversarial attacks to compare
different explainers. More specifically, our strategy is based on the detection accuracy of GNN-based
NIDS under realistic structural adversarial attacks [13]. The proposed methodology evaluates the overall
quality of the explainability methods by analyzing their contribution to attacking the GNN model.</p>
      <p>By design, different explainability methods highlight the graph components that are important for
the model predictions. The general idea of our methodology is to evaluate explainers by observing how
well structural adversarial attacks driven by explanations change the graph structure and thwart the
GNN-based NIDS. To evaluate the quality of each method, we calculate the attack severity (AS) [34],
which measures the degradation of the detector performance when key graph components perturb the
input topology. Our hypothesis is straightforward: accurate XAI algorithms identify the graph elements
most critical for evasion, leading to higher AS when exploited. Our framework identifies the most accurate
XAI method and shows how attackers can exploit these insights to refine their strategies. Our strategy
can further help security practitioners strengthen their defenses against such threats.</p>
      <p>An example of the proposed methodology is shown in Figure 2. To improve readability, we model
network traffic using a flow graph. In this model, nodes and edges represent hosts and communications.
The compromised host c1 is controlled by the adversary and coordinates the botnet with nodes c2, c3, c4
to flood the victim node u3 with a massive amount of packets. A GNN-based NIDS detects DDoS attacks
where different hosts send packets to a single target node. In particular, the GNN is trained to detect
malicious patterns by flagging the edges associated with them as suspect. The explainer extracts a mask
containing the most important flow records for the model. These samples correspond to edges within
the flow graph critical to detecting cyber attacks. These elements enhance structural attacks to evade
the GNN-based NIDS. The XAI method that identifies the samples leading to more misclassifications is
considered the most precise.</p>
      <p>Below we define the deployment scenario and describe the two phases of the proposed evaluation
strategy.</p>
      <sec id="sec-3-1">
        <title>3.1. Deployment Scenario</title>
        <p>We consider the same deployment scenario as previously proposed in [18]. The corporate network
includes multiple devices and a single border router that facilitates communication for all the hosts. We
also assume that a remote attacker has built his own C&amp;C infrastructure by compromising one or more
devices to perform malicious operations. We suppose that a GNN-based NIDS monitors the internal
network by analyzing the graph representation of network trafic.</p>
        <p>Network packets passing through the border router are captured by a flow exporter, which extracts
the corresponding flow records. These samples, collected during a specific time window, are processed
by a graph generator to produce the graph G. In this evaluation methodology, we consider a flow graph
G, where the hosts in the internal network are associated with nodes, while the communications are
associated with edges. In the experiments, we translate this graph representation into a line graph
L(G) by considering the edges in G as nodes in L(G) and linking together two nodes in L(G) if the
corresponding edges in G share a vertex.</p>
        <p>[Figure 2: Overview of the proposed framework. The input graph (hosts c1–c4 and u1–u4) is fed to the detector, which produces predictions; the explainer turns these predictions into an explanatory graph; the key components of the explanatory graph are injected back to build a perturbed graph, which is submitted to the detector again; the results are summarized by F1, precision, recall, and severity scores.]</p>
        <p>Once generated, the graph is processed by the GNN-based NIDS to produce the embeddings Z ∈ ℝ^{m×d},
where m is the number of flows (i.e., edges in the flow graph or, equivalently, nodes in the line graph),
while d is the dimension of the latent space [35]. The embeddings allow the GNN model to perform
classification as they encapsulate the f-dimensional features X ∈ ℝ^{m×f} and binary labels y ∈ {0, 1}
of the corresponding flow records, but also represent the most significant structures in the network
topology [36]. The final layer of the detector consists of a binary classifier that predicts whether
the embeddings are legitimate or malicious. However, while the GNN-based NIDS performs the
classification, the rationale behind the predictions remains unclear to security analysts, highlighting
the need for XAI techniques.</p>
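        <p>The following minimal sketch shows how such a detector can be wired together, assuming PyTorch Geometric; the two-layer depth and hidden sizes are illustrative placeholders, not the paper's exact hyperparameters.</p>
        <preformat>
# Sketch of a GraphSAGE encoder plus a binary classification head for line graphs.
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class LineGraphSAGE(torch.nn.Module):
    def __init__(self, f, d):
        super().__init__()
        self.conv1 = SAGEConv(f, d)               # aggregate neighbor flow features
        self.conv2 = SAGEConv(d, d)               # second-hop aggregation
        self.classifier = torch.nn.Linear(d, 2)   # benign vs. malicious

    def forward(self, x, edge_index):
        z = F.relu(self.conv1(x, edge_index))
        z = self.conv2(z, edge_index)     # Z in R^{m x d}: one embedding per flow
        return z, self.classifier(z)      # embeddings and per-node logits
        </preformat>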
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Explaining</title>
        <p>The first phase of our proposed framework uses explainability methods to identify the influential
components in the input graph and provide insights into the GNN-based NIDS. In this phase, we aim to
compute explanations that an attacker can exploit to manipulate the input graph structure.</p>
        <p>As discussed in Section 2, we focus on instance-level explainers that are capable of highlighting
specific graph components, omitting model-level explainers due to their heterogeneity. In this context,
we follow a post-hoc approach, examining pre-trained models and excluding self-explainable strategies
due to their infeasibility in the real world.</p>
        <p>Given the flow graph G and the GNN predictions ŷ, an explainer defines an explanation mask, i.e.,
a graph G⋆ = (V⋆, E⋆) where each node v⋆ ∈ V⋆ and edge e⋆ ∈ E⋆ has an importance value h ∈ [0, 1]
representing its contribution to the intrusion detection. Since we want to compute a relevance value
for each record in the dataset using a flow graph representation, we consider the edge mask of the
explanatory graph G⋆, leaving out the node weights.</p>
        <p>When the explanation is extracted, we select the most significant benign flows ℬ⋆. We focus on
benign netflows ℬ rather than malicious ones ℳ because the structural adversarial attacks considered
in our evaluation exploit legitimate records to manipulate the graph structure. In addition, benign
communications dominate the network traffic, making them more accessible for analysis. In contrast,
malicious transmissions are often rare in the real world, making it difficult to establish a consistent
baseline for evaluation. Adversaries who use benign flows to hide their malicious operations also
pose a significant challenge to detection systems. Indeed, attackers can inject benign edges into the
neighborhood of compromised nodes within the graph to effectively evade detection of malicious links.</p>
        <p>To prioritize which benign flows have a major impact on the detector performance, we rank and
select the top K important legitimate netflows ℬ⋆ based on the computed relevance values h. In the
next step, we evaluate which explainer really captures the influential links, as different explainers may
return different explanations.</p>
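        <p>A small helper like the following (hypothetical names; not from the paper's repository) captures this ranking step: it restricts the mask to benign flows and keeps the K highest-scoring ones.</p>
        <preformat>
# Sketch: select the top-K benign flows B* from the explainer's relevance scores h.
import numpy as np

def top_k_benign(h, labels, k):
    """h: relevance score per flow; labels: 0 = benign, 1 = malicious."""
    benign = np.where(labels == 0)[0]             # restrict to benign flows B
    ranked = benign[np.argsort(h[benign])[::-1]]  # descending relevance order
    return ranked[:k]                             # indices of B*
        </preformat>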
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Evaluation</title>
        <p>The actual evaluation of the explainability algorithm happens in this phase. Our approach leverages
structural attacks, where attackers manipulate the graph structure to evade the GNN-based NIDS.
As explainability methods highlight important components within the graph, attacks based on these
explanations reflect which method most correctly identifies the flows critical to the detection. In our
assessment, the most effective explainer is the one that identifies netflows leading to the highest number
of misclassified samples when submitted to the GNN-based NIDS.</p>
        <p>At this stage, we consider a realistic gray-box attack scenario [37], where the attacker has partial
knowledge of the defensive system. The attacker knows that a GNN-based NIDS monitors the network,
but he has no insight into its parameters. We also assume that the attacker has a limited quantity of
flow records available to define explanations for the input graph components. Hence, the attacker can
compute an explanatory mask to identify the important benign netflows and perturb the input graph
structure.</p>
        <p>To evaluate the accuracy of different explainability algorithms, we focus on C2xℬ attacks [13]. Here,
attackers initiate new benign communications ℬ from compromised hosts C to random targets to
manipulate the graph and thwart the GNN. In other words, attackers change the structural patterns
of their attacks by perturbing the graph topology. Indeed, these perturbations generate embeddings
that evade detection by tricking the GNN into making misclassifications. This approach is feasible
in the real world because the attacker can execute any strategy on the compromised hosts once he
has complete control over them. Consequently, evaluating GNN explanations using C2xℬ attacks
constitutes a practical strategy. Although these attacks succeed in evading the GNN, the contribution
of explanations is not considered in the threat model. Indeed, the attacks proposed in the original
work [13] randomly select legitimate netflows without considering which ones might be the most
relevant for the detection task.</p>
        <p>In our framework, we do not randomly collect the benign flows to manipulate the graph. Instead,
we rank and select the top K legitimate netflows based on their relevance scores h. We then use these
important samples to evade the detector and evaluate the explainer. More specifically (a sketch of this
loop follows the list):
1. From the explanatory graph G⋆, we identify the most important benign flows ℬ⋆ and inject them
into the original dataset D, resulting in an augmented dataset D̂.
2. Given the perturbed dataset D̂, we build the corresponding flow graph Ĝ, where important
legitimate samples ℬ⋆ are associated with new edges.
3. The manipulated graph Ĝ leads to misclassifications when fed to the GNN-based NIDS, since the
neighborhood of compromised nodes includes the benign edges ℬ⋆.
4. The effectiveness of the XAI technique is evaluated by measuring the impact on the GNN-based
NIDS predictions of the legitimate transmissions ℬ⋆ injected into the graph.</p>
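        <p>A compact driver for these four steps could look as follows; inject_flows, build_flow_graph, and the nids object are hypothetical placeholders rather than names from the paper's code.</p>
        <preformat>
# Sketch of the evaluation loop: inject B*, rebuild the graph, re-score the NIDS.
def evaluate_explainer(D, B_star, nids):
    D_hat = inject_flows(D, B_star)   # 1. augment the dataset D with B*
    G_hat = build_flow_graph(D_hat)   # 2. perturbed flow graph with new edges
    y_pred = nids.predict(G_hat)      # 3. predictions now degraded by B*
    return y_pred                     # 4. impact quantified via AS (Section 5)
        </preformat>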
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Case Study</title>
      <p>In Section 3, we described our evaluation framework to compare XAI methods in GNN-based NIDS. We
now present an experimental case study to validate our proposed methodology. In particular, we present
the datasets used, the GNN-based NIDS targeted, the different explainers tested, and the framework
implementation, which leverages the DGL (https://docs.dgl.ai) and PyTorch Geometric
(https://pytorch-geometric.readthedocs.io/) libraries to provide reliable graph processing and model
training. To ensure the reproducibility of our results, the source code of the
following experiments is freely available at https://github.com/dimgalli/evaluating-xai.git.</p>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>We base the case study on two publicly available datasets: CTU-13 [38] and ToN-IoT [39]. These two
datasets consist of labeled netflows and are widely referenced in the literature, making them reliable
benchmarks for this study [35]. CTU-13 contains network traces that combine benign and malicious
traffic from several real-world botnet variants. These botnets exhibit specific structural behaviors that
align well with the capabilities of GNN-based NIDS. ToN-IoT contains heterogeneous types of IoT data.
We consider the benign flow records and the malicious netflows of the cyber threats. The attacks in the
two datasets exhibit complex malicious patterns, relying on different multi-flow strategies. Therefore,
they are excellent candidates for evaluating GNN explainability methods.</p>
        <p>We apply the same preprocessing steps described in [40], which include discarding non-TCP traffic and
filtering out outliers. We also eliminate explicit IP addresses and port numbers to avoid separating flows
by spurious correlations [41]. The final feature set includes more than 30 attributes and is consistent with
previous work [18]. In particular, we build separate collections for each cyber threat in the two datasets,
excluding the sets with insufficient malicious netflows that would yield underperforming detectors. We
combine the benign data and the malicious samples for each specific threat in a benign:malicious ratio
of 10:1. This distribution ratio reflects real-world scenarios [41]. Each collection is then divided into
training and test subsets using an 80:20 ratio. We report the resulting number of benign and malicious
flows of the two datasets used to train and test the models in Table 1, where each row indicates the
botnet variant and attack type considered in the experiments.</p>
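        <p>The collection-building step can be sketched as follows (pandas-based; the column names "label" and "threat" are assumptions about the preprocessed datasets, not their actual schema).</p>
        <preformat>
# Sketch: build a per-threat collection with a 10:1 benign:malicious ratio
# and an 80:20 train/test split.
import pandas as pd

def build_collection(df, threat, ratio=10, train_frac=0.8, seed=0):
    mal = df[df["threat"] == threat]  # malicious flows of one cyber threat
    ben = df[df["label"] == 0].sample(n=ratio * len(mal), random_state=seed)
    coll = pd.concat([ben, mal]).sample(frac=1.0, random_state=seed)  # shuffle
    split = int(train_frac * len(coll))
    return coll.iloc[:split], coll.iloc[split:]  # train, test
        </preformat>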
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Detectors</title>
        <p>Our evaluation leverages GNN’s unique ability to capture relational dependencies between netflows that
classical ML techniques fail to achieve. This feature makes GNN models strong candidates for evaluating
explanations since the perturbation of key graph components affects detection performance [42]. To
guarantee a fair and meaningful comparison between different XAI methods, we focus on GNN that
perform node-level classification. Indeed, these models generate more informative embeddings and
exhibit greater robustness to adversarial perturbations [43].</p>
        <p>Our GNN-based NIDS is built on the GraphSAGE model [44], which aggregates features to generate
node embeddings. In the context of this paper, it is referred to as LineGraphSAGE [45] because it allows
node classification directly on line graphs. As discussed in Section 2, flow graphs can be converted to line
graphs using a linearization procedure. Attributes of the original netflows are transferred to the nodes
of the line graph, so GraphSAGE can be used to learn node embeddings without losing information from
the netflow features. In addition, the inductive nature of this algorithm allows effective embedding
generation for nodes in unseen test graphs. Therefore, this approach makes GraphSAGE more suitable
for network intrusion detection in the real world.</p>
        <p>For each attack in the two datasets, we train a specific instance of the binary classifier to detect the
malicious netflows (i.e., malicious nodes in the line graph), as suggested by prior research [46, 47]. We
implement the detectors using the same hyperparameters outlined in [13].</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Explainers</title>
        <p>For our case study, we evaluate five different XAI algorithms tailored for GNN models. As discussed in
Section 2, GNN-based NIDS learn attack patterns simultaneously from flow features and topological
structures. However, it is unclear whether GNN rely specifically on feature-level information or graph
structure when making their predictions. For this reason, we consider in the experiments two methods
that highlight flow-level features [24, 25] and two strategies that provide explanations representative
of the underlying topology [26, 27]. We recall that we use post-hoc XAI strategies to identify the
structural vulnerabilities of the cyber detector, focusing on improving the overall security rather than
the interpretability.</p>
        <p>The XAI algorithms evaluated in the experiments are: Dummy Explainer (DE), Integrated Gradients
(IG) [24], Saliency (SA) [25], GNNExplainer (GE) [26], and GraphMask (GM) [27]. DE assigns random
explanation scores as a baseline for comparison. IG calculates feature importance by integrating
the gradients of the model output relative to the input along a baseline-to-input path. SA computes node
importance by measuring gradients of the model output with respect to the input features. GE defines
critical graph substructures by estimating the mutual information. GM generates masked versions of the
graph by dropping edges and observing their effect on model predictions.</p>
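        <p>As an indication of how such explainers can be instantiated, the sketch below uses the torch_geometric.explain API (assuming PyTorch Geometric 2.3 or later with Captum installed); the actual wiring in our implementation may differ.</p>
        <preformat>
# Sketch: wrapping a pre-trained model with an Integrated Gradients explainer.
from torch_geometric.explain import CaptumExplainer, Explainer

explainer = Explainer(
    model=model,                                       # pre-trained GNN detector
    algorithm=CaptumExplainer('IntegratedGradients'),  # or 'Saliency' for SA
    explanation_type='model',
    node_mask_type='attributes',                       # soft mask over features
    model_config=dict(mode='binary_classification',
                      task_level='node', return_type='raw'),
)
explanation = explainer(data.x, data.edge_index)  # explanation.node_mask holds h
        </preformat>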
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Framework Implementation</title>
        <p>We now describe the implementation of our evaluation methodology, which is detailed in Section 3.</p>
        <p>As mentioned in Section 4.1, each collection D_t containing benign and malicious samples of a
particular cyber threat t is divided into training and test subsets, preserving the 10:1 benign:malicious
ratio typical of real-world settings [41]. For each collection obtained, we generate the flow graph G_t
and then the line graph L(G_t) representation of the network traffic. Then, line graphs obtained from
training sets are used to train an ensemble of LineGraphSAGE classifiers to detect specific malicious
variants, as outlined in Section 4.2. Instead, line graphs obtained from test sets are used to evaluate
detectors and explainers.</p>
        <p>Each of the explainers presented in Section 4.3 is then applied to each instance of LineGraphSAGE to
compute the corresponding explanations. Each explainer takes the clean line graph L(G_t) containing the
network traffic of a particular threat t and the corresponding model predictions ŷ to define its explanatory
graph L(G)⋆, where each node v⋆ has an importance score h indicating its influence in the detection
process. On the one hand, gradient-based algorithms (i.e., IG and SA) assign a score to each attribute,
so we calculate the average over the features to derive a single value for each node in the graph. On the
other hand, perturbation-based methods (i.e., GE and GM) directly return a mask containing a score for
each vertex. Since our strategy relies on C2xℬ structural attacks performed in the problem space, we
tag each vertex v⋆ of the graph L(G)⋆ with an identifier to retrieve the original netflows in the dataset
D_t. We then rank all flow records based on their importance score h. Next, we consider only the benign
flows ℬ and discard the malicious samples ℳ. To ensure a fair comparison between different attacks,
we select the top 10% of legitimate examples ℬ previously sorted by relevance h.</p>
        <p>To evaluate the explainer’s performance, we test the robustness of LineGraphSAGE against C2xℬ
attacks [13]. In this scenario, attackers set up new benign communications from their compromised
nodes to random destinations in the network. In other words, we inject new data samples into the test
set, introducing new components into the resulting graph—edges in the flow graph and nodes in the
line graph. This perturbation is practically feasible because the generation of the graph representation
occurs once a sufficient number of flows have been collected [20]. The original attacks involve sending
k benign netflows ℬ from each compromised host to random targets. Consequently, this process
injects k × |C| new benign nodes into the resulting line graph, where |C| is the number of controlled hosts.
However, instead of randomly sampling benign flows from the available pool, we select the relevant
legitimate netflows to perturb the graph structure. This strategy enables us to evaluate the contribution
of different explainers in refining the structural violations. To ensure that new communications originate
from compromised nodes in the graph, we replace the source IP addresses and port numbers included in
the benign flows with those of the controlled hosts. We increase the number of legitimate samples
injected into the test set by considering the following step values for k: 1, 2, 5, 10, and 20.</p>
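        <p>The injection itself can be sketched as follows (pandas-based; "src_ip" and "src_port" are placeholder column names for the datasets' flow attributes, not their actual schema).</p>
        <preformat>
# Sketch of the C2xB-style perturbation: re-source k top-ranked benign flows
# from each compromised host, yielding k * |C| new nodes in the line graph.
import pandas as pd

def inject_benign(test_df, top_benign, compromised, k, seed=0):
    injected = []
    for host_ip, host_port in compromised:           # the controlled hosts C
        picks = top_benign.sample(n=k, random_state=seed).copy()
        picks["src_ip"] = host_ip                    # new flows now originate
        picks["src_port"] = host_port                # from the compromised host
        injected.append(picks)
    return pd.concat([test_df] + injected, ignore_index=True)
        </preformat>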
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We now present the results of our case study, where we apply the proposed evaluation strategy to a
state-of-the-art GNN-based NIDS.</p>
      <p>First, we evaluate LineGraphSAGE classifiers on the clean line graphs generated from the respective
unperturbed validation sets to confirm the suitability of the considered GNN-based NIDS as a target for
structural attacks. For this evaluation, we leverage measures commonly used in computer security [34],
namely F1-score, precision, and recall. Considering a malicious network flow as a positive sample, these
three evaluation metrics can be summarized as follows:
F1 = 2 × (Precision × Recall) / (Precision + Recall), Precision = TP / (TP + FP), Recall = TP / (TP + FN),
where TP, FP, and FN are the number of true positives, false positives, and false negatives, respectively.
These metrics range between 0 and 1, with higher scores indicating better performance. We recall
that we do not rely on accuracy scores since the datasets employed in network intrusion detection are
typically highly unbalanced.</p>
      <p>We then evaluate explainers by observing how the benign flows they identify as important influence
LineGraphSAGE performance once explanations are injected into the graph to perturb its overall
structure. For this purpose, we rely on the attack severity (AS) measure [34]. This metric is defined as
follows:
AS = 1 − Recall(after the attack) / Recall(before the attack),
where the numerator indicates the recall score on the perturbed graph, while the denominator is the
recall value on the clean graph. This metric ranges between 0 and 1, with highly accurate attacks
getting AS values closer to 1. Therefore, the most effective explainers are those achieving AS values
close to 1, resulting in the most severe attacks.</p>
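      <p>In code, the metric reduces to a one-line ratio of recall scores; the sketch below uses scikit-learn, which is an assumption of this illustration rather than a dependency stated in the paper.</p>
      <preformat>
# Sketch: attack severity from recall before and after the structural attack.
from sklearn.metrics import recall_score

def attack_severity(y_true, y_pred_clean, y_true_pert, y_pred_pert):
    r_before = recall_score(y_true, y_pred_clean)     # recall on clean graph
    r_after = recall_score(y_true_pert, y_pred_pert)  # recall on perturbed graph
    return 1.0 - r_after / r_before  # AS in [0, 1]; higher = stronger attack
      </preformat>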
      <sec id="sec-5-1">
        <title>5.1. Detectors Performance</title>
        <p>Our evaluation campaign begins by testing the GNN-based NIDS presented in Section 4.2 on clean
network traffic. In particular, we create a line graph representation for each test set and evaluate the
specific instance of the GNN-based NIDS on it. The performance of the LineGraphSAGE classifiers on
the two considered datasets is shown in Table 2. Each row refers to a threat and reports the detection
results of the specific GNN classifier on it, with the last row summarizing the mean μ and the standard
deviation σ values across the different instances.</p>
        <p>The GNN-based NIDS achieves high performance across all evaluation metrics, with an average
F1-score of 0.935 and 0.995 on the CTU-13 and ToN-IoT datasets, respectively. From Table 2, we observe
that our LineGraphSAGE models obtain scores that are in line with those of the state-of-the-art [13].
All classifiers exceed 0.9 in performance on the CTU-13 dataset, except on Neris traffic, where the
GNN-based detector struggles due to the heterogeneity of malicious netflows, with an F1-score of 0.846
and a recall of 0.767. By contrast, each instance of LineGraphSAGE obtains near-perfect performance
scores on the ToN-IoT dataset, reflecting its reliability in detecting multi-flow cyber attacks with
different structural topologies. The performance results prove that our GNN-based NIDS is effective
in deployment scenarios that do not involve manipulated network traffic. Therefore, the considered
detection model represents a solid target for the structural attacks through which we evaluate the
different explanations.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Explainers Performance</title>
        <p>We now evaluate the quality of the explanations extracted by the five XAI methods presented in
Section 4.3. For each considered explainer, we generate an explanatory graph and inject its benign
flow records into the corresponding test set, manipulating the resulting line graph with new nodes. As
discussed in Section 4.4, we gradually increase the number k of key benign samples ℬ⋆ to estimate
the severity of structural attacks at distinct perturbation levels. To assess the effectiveness of the
explanation-based attacks, we feed the perturbed graphs into the GNN-based NIDS and measure how
much the overall detection capability drops.</p>
        <p>We compare the effectiveness of explainability methods in identifying important flow records and
perturbing the resulting graph structure in Table 3, where each cell corresponds to the mean AS value
obtained by the explainer in the column on the network traffic in the row. Each AS value is averaged
over the perturbation steps k. For each row, we mark the best AS value in bold. The rows at the bottom
in light gray summarize the AS values obtained by each explainer with the mean value μ computed
over the different cyber attacks.</p>
        <p>We immediately observe that AS values are not zero. The results are consistent with those obtained
in the original work [13] and validate the vulnerability of GNN-based NIDS to structural attacks. The
IG method achieves the highest average AS values, with 0.422 and 0.195 on the CTU-13 and ToN-IoT
datasets, respectively. This result shows that the features of flows are important for
the detection model and support more effective perturbations. We observe that SA slightly outperforms
IG over Rbot traffic, with a marginal difference of 0.004 in the mean values (0.237 for SA and 0.233 for
IG). Similarly, GraphMask outperforms IG over DDoS and DoS attacks with average values of 0.273
and 0.023, compared to 0.187 and 0.018 of IG. The AS of IG measured on Menti and Murlo are the most
significant, with average values equal to 0.728 and 0.900, respectively. This result demonstrates the
effectiveness of explanations in perturbing graphs formed by a limited number of components. By
contrast, the detector trained on DoS traffic is the most robust, with AS values not exceeding 0.1. Indeed,
the limited number of controlled hosts reduces the attacker’s potential threat. Our findings demonstrate
that only attacks based on IG are consistently more effective than those that leverage random benign
records. Hence, IG-generated explanations allow attackers to determine the critical features of network
traffic and the structural vulnerabilities of the GNN-based NIDS.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>To address the black-box nature of GNN, XAI algorithms have been developed by practitioners and
researchers. However, evaluating different XAI methods is challenging because traditional approaches
often rely on expensive ground truth explanations or oversimplified metrics. In this paper, we present a
novel framework for evaluating GNN explainers based on their ability to identify the most relevant graph
components and improve the severity of structural attacks against GNN-based NIDS. Our experimental
campaign, conducted on two public datasets, shows that the IG explainer consistently outperforms other
methods in determining the most accurate explanations and guiding powerful attacks. More specifically,
IG achieves the highest AS values in most cases, while GraphMask excels in DDoS and DoS scenarios
due to the characteristic topological patterns associated with these threats. These results confirm
the effectiveness of adversarial perturbations against GNN-based NIDS, especially when explanations
guide structural attacks. Our proposal represents a significant contribution to the evaluation of
XAI methods in GNN-based NIDS. Future research could expand the proposed work to other areas
within cybersecurity, exploring new attack methods and defense strategies to improve the resilience of
GNN-based NIDS.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery
and Resilience Plan funded by the European Union - NextGenerationEU.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-9">
      <title>References</title>
      <p>[1] R. Sommer, V. Paxson, Outside the closed world: On using machine learning for network intrusion detection, in: 2010 IEEE Symposium on Security and Privacy, IEEE, 2010, pp. 305–316.
[2] F. Manganiello, M. Marchetti, M. Colajanni, Multistep attack detection and alert correlation in intrusion detection systems, in: Information Security and Assurance: International Conference, ISA 2011, Brno, Czech Republic, August 15-17, 2011. Proceedings, Springer, 2011, pp. 101–110.
[3] F. Pierazzi, S. Casolari, M. Colajanni, M. Marchetti, Exploratory security analytics for anomaly detection, Computers &amp; Security 56 (2016) 28–49.
[4] W. Jiang, Graph-based deep learning for communication networks: A survey, Computer Communications (2022).
[5] D. Pujol-Perich, J. Suárez-Varela, A. Cabellos-Aparicio, P. Barlet-Ros, Unveiling the potential of graph neural networks for robust intrusion detection, ACM SIGMETRICS Performance Evaluation Review 49 (2022) 111–117.
[6] A. Warnecke, D. Arp, C. Wressnegger, K. Rieck, Evaluating explanation methods for deep learning in security, in: 2020 IEEE European Symposium on Security and Privacy (EuroS&amp;P), IEEE, 2020, pp. 158–174.
[7] N. Moustafa, N. Koroniotis, M. Keshk, A. Y. Zomaya, Z. Tari, Explainable intrusion detection for cyber defences in the internet of things: Opportunities and solutions, IEEE Communications Surveys &amp; Tutorials 25 (2023) 1775–1807.
[8] A. Nadeem, D. Vos, C. Cao, L. Pajola, S. Dieck, R. Baumgartner, S. Verwer, SoK: Explainable machine learning for computer security applications, in: 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&amp;P), IEEE, 2023, pp. 221–240.
[9] G. Apruzzese, P. Laskov, A. Tastemirova, SoK: The impact of unlabelled data in cyberthreat detection, in: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&amp;P), IEEE, 2022, pp. 20–42.
[10] H. Zhu, J. Lu, Graph-based intrusion detection system using general behavior learning, in: GLOBECOM 2022-2022 IEEE Global Communications Conference, IEEE, 2022, pp. 2621–2626.
[11] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, F. Roli, Evasion attacks against machine learning at test time, in: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III 13, Springer, 2013, pp. 387–402.
[12] L. Sun, Y. Dou, C. Yang, K. Zhang, J. Wang, S. Y. Philip, L. He, B. Li, Adversarial attack and defense on graph data: A survey, IEEE Transactions on Knowledge and Data Engineering 35 (2022) 7693–7711.
[13] A. Venturi, D. Stabili, M. Marchetti, Problem space structural adversarial attacks for network intrusion detection systems based on graph neural networks, arXiv preprint arXiv:2403.11830 (2024).
[14] G. Vormayr, J. Fabini, T. Zseby, Why are my flows different? A tutorial on flow exporters, IEEE Communications Surveys &amp; Tutorials 22 (2020) 2064–2103.
[15] Z. Ahmad, A. Shahid Khan, C. Wai Shiang, J. Abdullah, F. Ahmad, Network intrusion detection system: A systematic study of machine learning and deep learning approaches, Transactions on Emerging Telecommunications Technologies 32 (2021) e4150.
[16] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20 (2008) 61–80.
[17] W. W. Lo, S. Layeghy, M. Sarhan, M. Gallagher, M. Portmann, E-GraphSAGE: A graph neural network based intrusion detection system for IoT, in: NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, IEEE, 2022, pp. 1–9.
[18] A. Venturi, M. Ferrari, M. Marchetti, M. Colajanni, ARGANIDS: a novel network intrusion detection system based on adversarially regularized graph autoencoder, in: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, 2023, pp. 1540–1548.
[19] J. Zhou, Z. Xu, A. M. Rush, M. Yu, Automating botnet detection with graph neural networks, arXiv preprint arXiv:2003.06344 (2020).
[20] A. Venturi, D. Pellegrini, M. Andreolini, L. Ferretti, M. Marchetti, M. Colajanni, et al., Practical evaluation of graph neural networks in network intrusion detection, in: CEUR Workshop Proceedings, volume 3488, CEUR-WS, 2023.
[21] A. Longa, S. Azzolin, G. Santin, G. Cencetti, P. Liò, B. Lepri, A. Passerini, Explaining the explainers in graph neural networks: a comparative study, ACM Computing Surveys (2024).
[22] H. Yuan, H. Yu, S. Gui, S. Ji, Explainability in graph neural networks: A taxonomic survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2022) 5782–5799.
[23] J. Kakkad, J. Jannu, K. Sharma, C. Aggarwal, S. Medya, A survey on explainability of graph neural networks, arXiv preprint arXiv:2306.01958 (2023).
[24] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 3319–3328.
[25] P. E. Pope, S. Kolouri, M. Rostami, C. E. Martin, H. Hoffmann, Explainability methods for graph convolutional neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 10772–10781.
[26] Z. Ying, D. Bourgeois, J. You, M. Zitnik, J. Leskovec, GNNExplainer: Generating explanations for graph neural networks, Advances in Neural Information Processing Systems 32 (2019).
[27] M. S. Schlichtkrull, N. De Cao, I. Titov, Interpreting graph neural networks for NLP with differentiable edge masking, arXiv preprint arXiv:2010.00577 (2020).
[28] B. Sanchez-Lengeling, J. Wei, B. Lee, E. Reif, P. Wang, W. Qian, K. McCloskey, L. Colwell, A. Wiltschko, Evaluating attribution for graph neural networks, Advances in Neural Information Processing Systems 33 (2020) 5898–5910.
[29] T. Funke, M. Khosla, M. Rathee, A. Anand, Zorro: Valid, sparse, and stable explanations in graph neural networks, IEEE Transactions on Knowledge and Data Engineering 35 (2022) 8687–8698.
[30] P. Li, Y. Yang, M. Pagnucco, Y. Song, Explainability in graph neural networks: An experimental survey, arXiv preprint arXiv:2203.09258 (2022).
[31] C. Agarwal, M. Zitnik, H. Lakkaraju, Probing GNN explainers: A rigorous theoretical and empirical analysis of GNN explanation methods, in: International Conference on Artificial Intelligence and Statistics, PMLR, 2022, pp. 8969–8996.
[32] K. Amara, R. Ying, Z. Zhang, Z. Han, Y. Shan, U. Brandes, S. Schemm, C. Zhang, GraphFramEx: Towards systematic evaluation of explainability methods for graph neural networks, arXiv preprint arXiv:2206.09677 (2022).
[33] X. Zheng, F. Shirani, T. Wang, W. Cheng, Z. Chen, H. Chen, H. Wei, D. Luo, Towards robust fidelity for evaluating explainability of graph neural networks, arXiv preprint arXiv:2310.01820 (2023).
[34] G. Apruzzese, M. Colajanni, L. Ferretti, M. Marchetti, Addressing adversarial attacks against security systems based on machine learning, in: 2019 11th International Conference on Cyber Conflict (CyCon), volume 900, IEEE, 2019, pp. 1–18.
[35] T. Bilot, N. El Madhoun, K. Al Agha, A. Zouaoui, Graph neural networks for intrusion detection: A survey, IEEE Access 11 (2023) 49114–49139.
[36] A. Venturi, D. Galli, D. Stabili, M. Marchetti, et al., Hardening machine learning based network intrusion detection systems with synthetic netflows, in: CEUR Workshop Proceedings, volume 3731, CEUR-WS, 2024.
[37] A. Kuppa, N.-A. Le-Khac, Adversarial XAI methods in cybersecurity, IEEE Transactions on Information Forensics and Security 16 (2021) 4924–4938.
[38] S. Garcia, M. Grill, J. Stiborek, A. Zunino, An empirical comparison of botnet detection methods, Computers &amp; Security 45 (2014) 100–123.
[39] A. Alsaedi, N. Moustafa, Z. Tari, A. Mahmood, A. Anwar, TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems, IEEE Access 8 (2020) 165130–165150.
[40] A. Venturi, G. Apruzzese, M. Andreolini, M. Colajanni, M. Marchetti, DReLAB - Deep REinforcement Learning Adversarial Botnet: A benchmark dataset for adversarial attacks against botnet intrusion detection systems, Data in Brief 34 (2021) 106631.
[41] D. Arp, E. Quiring, F. Pendlebury, A. Warnecke, F. Pierazzi, C. Wressnegger, L. Cavallaro, K. Rieck, Dos and don'ts of machine learning in computer security, in: 31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 3971–3988.
[42] D. Zügner, A. Akbarnejad, S. Günnemann, Adversarial attacks on neural networks for graph data, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, 2018, pp. 2847–2856.
[43] S. Zhang, H. Tong, J. Xu, R. Maciejewski, Graph convolutional networks: a comprehensive review, Computational Social Networks 6 (2019) 1–23.
[44] W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, Advances in Neural Information Processing Systems 30 (2017).
[45] X. Hu, W. Gao, G. Cheng, R. Li, Y. Zhou, H. Wu, Towards early and accurate network intrusion detection using graph embedding, IEEE Transactions on Information Forensics and Security (2023).
[46] M. Stevanovic, J. M. Pedersen, An analysis of network traffic classification for botnet detection, in: 2015 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), IEEE, 2015, pp. 1–8.
[47] G. Apruzzese, M. Colajanni, L. Ferretti, A. Guido, M. Marchetti, On the effectiveness of machine and deep learning for cyber security, in: 2018 10th International Conference on Cyber Conflict (CyCon), IEEE, 2018, pp. 371–390.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>