<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Neural Networks for Network Intrusion Detection with Structural Attacks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dimitri Galli</string-name>
          <email>dimitri.galli@unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Venturi</string-name>
          <email>andrea.venturi@unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabella Marasco</string-name>
          <email>isabella.marasco4@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirco Marchetti</string-name>
          <email>mirco.marchetti@unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Explainable Artificial Intelligence, Graph Neural Network, Network Intrusion Detection</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bologna, Department of Computer Science and Engineering</institution>
          ,
          <addr-line>40126 Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Modena and Reggio Emilia, Department of Engineering “Enzo Ferrari”</institution>
          ,
          <addr-line>41125 Modena</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>0</volume>
      <fpage>3</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>Among Machine Learning (ML) models, Graph Neural Networks (GNN) have been shown to improve the performance of modern Network Intrusion Detection Systems (NIDS). However, their black-box nature poses a significant challenge to their practical deployment in the real world. In this context, researchers have developed eXplainable Artificial Intelligence (XAI) methods that reveal the inner workings of GNN models. Despite this, determining the most effective explainer is complex because different methods yield different explanations, and there are no standardized strategies. In this paper, we present an innovative approach for evaluating XAI methods in GNN-based NIDS. We evaluate explainers based on their capability to identify key graph components that an attacker can exploit to bypass detection. More accurate XAI algorithms can identify topological vulnerabilities, resulting in more effective attacks. We assess the effectiveness of different explainers by measuring the severity of structural attacks guided by the corresponding explanations. Our case study compares five XAI techniques on two publicly available datasets containing real-world network traffic. Results show that the explainer based on Integrated Gradients (IG) generates the most accurate explanations, allowing attackers to refine their strategies.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable Artificial Intelligence</kwd>
        <kwd>Graph Neural Network</kwd>
        <kwd>Network Intrusion Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Machine Learning (ML) and Deep Learning (DL) algorithms can enhance the capabilities of modern
Network Intrusion Detection Systems (NIDS) [1]. Recent research shows that ML methods improve
the classification of cyber attacks, reducing the reliance on manual rule creation. However, traditional
ML-based NIDS treat data features as independent variables and data points as individual samples,
limiting their effectiveness in capturing the complex dependencies of modern multi-flow attacks in
real-world scenarios [2, 3]. To overcome this limitation, Graph Neural Networks (GNN) analyze both
individual features and topological structures of network traffic during the training process [4]. Indeed,
GNN are a family of neural networks capable of processing network hosts and their communications
as nodes and edges within a graph. Since graphs represent the inherent inter-dependencies within
network traffic, GNN models can detect malicious patterns exhibited at the topological level [5].</p>
      <p>Despite their effectiveness, GNN models operate as black boxes. This lack of transparency hinders
their use in practical contexts [6], where security analysts need to understand why a cyber detector
flags some flows as malicious [7]. Explainable Artificial Intelligence (XAI) techniques attempt to bridge
this gap by defining explanations that identify which components in the network graph influence
the decision-making of GNN models [8]. In this context, different XAI methods could be used to
explain GNN predictions. However, each explainer exploits its own mechanisms to identify the most
relevant structures. Therefore, different explainers may return different explanations, making it unclear
which method should be considered correct. There are no automated frameworks to evaluate the
quality of explanations in GNN-based NIDS. Furthermore, existing methodologies do not consider the
dynamic nature of network traffic. Existing evaluation approaches rely on expensive ground truth
explanations [9]. Therefore, these frameworks require importance labels to be available a priori, which
limits their use in constantly evolving scenarios. Traditional metrics for evaluating XAI methods
examine how model predictions change when relevant structures are isolated or removed from the
complete graph. These strategies do not consider any constraints on the network structure and may
disrupt the distributed topology of modern cyber attacks [10].</p>
      <p>We aim to develop an evaluation framework that satisfies several key properties, such as being (i)
agnostic, i.e., independent of the specific type of explainers; (ii) flexible, i.e., usable without the need
for ground truth explanations; (iii) practical, i.e., useful in real-world scenarios. In this paper, we
present an innovative methodology for comparing and evaluating explainers for GNN-based NIDS that
fulfills these requirements. Our approach is based on structural attacks, where attackers manipulate
the graph topology to evade detection [11, 12]. Indeed, these perturbations have proven to be effective
against GNN-based NIDS [13]. In our proposed strategy, we adversarially change the network topology
by injecting the components considered relevant by each explainer into the graph and measuring
the severity of structural attacks. We assume that the most effective XAI techniques highlight the
explanations that lead to the most impactful attacks.</p>
      <p>We apply our approach to an experimental case study to demonstrate how our proposal identifies
the most appropriate XAI methods for GNN-based NIDS, even without ground truth explanations. We
compare five explainers tailored for graph-based models, considering two public datasets extensively
used in network intrusion detection. We design a GNN-based NIDS that achieves good detection
performance, allowing practical evaluation of the explainers. Our results reveal an overall increase in
attack severity when attackers employ explanations, especially when they exploit Integrated Gradients
(IG) to locate the topological vulnerabilities of the graph model.</p>
      <p>The paper is structured as follows. Section 2 provides background knowledge on GNN and XAI for
NIDS. Section 3 describes the evaluation methodology. Section 4 details the experiments. Section 5
presents the results. Section 6 concludes the paper with final remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>ML-based NIDS use supervised algorithms to classify network traffic and detect malicious patterns [1].
To build such systems, ML algorithms are trained on labeled netflows, where each data entry reports a set
of metrics and statistics—also referred to as features—that summarize the communication between two
hosts in the monitored network [14]. Traditional ML-based NIDS analyze data features independently
and consider each flow in isolation, enabling near real-time responses [15]. However, these models
often struggle to detect modern attacks that rely on complex multi-flow techniques. Graph Neural
Networks (GNN) [16] overcome this limitation by operating on graph representations of network
traffic. A graph is formally defined as G = (V, E), where V is a set of vertices or nodes, and E is a set
of edges or links; an example of this conversion is shown in Figure 1. The flow graph representation
is the most common transformation for a computer network because each endpoint in the monitored
network represents a node, and each flow corresponds to a link in the graph [17]. However, the majority
of GNN perform node classification using a line graph representation L(G), where flows are mapped
directly to graph vertices, which are connected if they share a host in the respective flows [18]. If a
flow graph G has n nodes and m edges, then its line graph transformation L(G) has m nodes and
(1/2) ∑_{i=1}^{n} d_i² − m edges, where d_i is the degree of node i in the flow graph.</p>
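      <p>As a quick sanity check of this counting argument, the following sketch (a hypothetical, networkx-based illustration, not taken from the paper's code) builds a toy flow graph and verifies that the line graph produced by nx.line_graph matches the closed-form size.</p>
      <preformat>
# Hypothetical sketch: checking |V(L(G))| = m and |E(L(G))| = 1/2 * sum(d_i^2) - m.
import networkx as nx

G = nx.Graph()  # flow graph: hosts as nodes, flows as edges
G.add_edges_from([("h1", "h2"), ("h1", "h3"), ("h2", "h3"), ("h3", "h4")])

L = nx.line_graph(G)  # flows become nodes, linked when they share a host

m = G.number_of_edges()
expected_edges = sum(d * d for _, d in G.degree()) // 2 - m
assert L.number_of_nodes() == m               # 4 flows -> 4 line-graph nodes
assert L.number_of_edges() == expected_edges  # (18 / 2) - 4 = 5 links
      </preformat>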
      <p>Next, we train the GNN model to transform graph nodes into embeddings. These embeddings
encapsulate both flow features and structural similarities in high-dimensional vectors. In transductive
settings, the GNN learns on a single graph [19]. To improve the generalization of the GNN, inductive
strategies use different input graphs for training and testing the model [10]. Once generated, the
embeddings with their respective labels can be used to train a classifier. Indeed, any ML model can
distinguish between benign and malicious samples.</p>
      <p>Despite their high detection performance, GNN-based NIDS are still underutilized in practical
contexts [20]. The lack of transparency makes these models opaque to cybersecurity practitioners, who
should understand the rationale behind the detector’s predictions to validate alerts [8]. XAI algorithms
applied to GNN use different approaches to provide insight into which components influence the inner
workings of the models. These explainers extend beyond feature importance by defining a mask that
assigns weights to the components in the graph based on their contribution to predictions [21]. We
can have a hard mask if the explanation scores take binary values or a soft mask if the importance
scores take continuous values. This mask may refer to a subgraph whose elements are related to their
importance in interpreting model predictions.</p>
      <p>Many explainability methods have been proposed for graph learning models. The paper referenced
in [22] systemizes explainability methods for GNN based on the explanation target. Explainers working
at the instance-level extract features relevant to the GNN output. In contrast, model-level explainers
provide a general understanding of the model. Explanation methods can be further classified based on
their integration with the ML model, as reported in [23]. Post-hoc methods act as external components
dealing with pre-trained models with fixed weights. Self-interpretable methods are directly integrated
into the neural network. Our analysis considers instance-level and post-hoc explainers, as they are
used in real-world scenarios. Due to code availability, we focus on gradient-based methods [24, 25],
which estimate importance scores by computing the gradient of the GNN, and perturbation-based
methods [26, 27], which perturb the input graph by removing nodes or rewiring edges to obtain the
explanation subgraph.</p>
      <p>Compared to evaluating traditional ML models, the evaluation of the quality of different explanations
is a complex task. On the one hand, supervised approaches [28, 29] compare the explanation with a
ground truth importance. These approaches assume that the elements critical to a particular prediction
are known, enabling an objective and quantitative evaluation of explanation methods. However,
these strategies require human supervision to decide what is important for the model. Consequently,
generating ground truth labels is time-consuming, especially when working with real-world datasets [30].
Security analysts must investigate complex dependencies that require significant effort and domain
expertise. Our evaluation framework does not rely on ground truth, making it adaptable and flexible to
any dataset. On the other hand, unsupervised approaches [31, 32] evaluate how explanations extracted
by XAI methods influence model predictions, either by isolating or removing the significant components.
These approaches are flexible as they can rely on existing metrics and do not require ground truth
explanations. However, explainability methods aim to identify the most relevant factors, generating
explanations that should be small and sparse. Therefore, the extracted explanatory subgraphs may be
out-of-distribution, leading to an incorrect evaluation [33]. Explanatory subgraphs can extract components
that do not represent the distributed topology of modern attacks. Instead of removing explanations from the
original data distribution, our methodology identifies and injects key input components into the dataset,
perturbing the graph topology while preserving the extracted structures.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>As discussed in Section 2, evaluating explainability methods in GNN-based NIDS is a challenging task, as
it often depends on expensive ground truth labels or unrealistic explanatory subgraphs. We present an
innovative unsupervised evaluation framework that leverages structural adversarial attacks to compare
different explainers. More specifically, our strategy is based on the detection accuracy of GNN-based
NIDS under realistic structural adversarial attacks [13]. The proposed methodology evaluates the overall
quality of the explainability methods by analyzing their contribution to attacking the GNN model.</p>
      <p>By design, different explainability methods highlight the graph components that are important for
the model predictions. The general idea of our methodology is to evaluate explainers by observing how
well structural adversarial attacks driven by explanations change the graph structure and thwart the
GNN-based NIDS. To evaluate the quality of each method, we calculate the attack severity (AS) [34],
which measures the degradation of the detector performance when key graph components perturb the
input topology. Our hypothesis is straightforward: accurate XAI algorithms identify the graph elements
most critical for evasion, leading to higher AS when exploited. Our framework identifies the most accurate
XAI method and shows how attackers can exploit these insights to refine their strategies. Our strategy
can further help security practitioners strengthen their defenses against such threats.</p>
      <p>An example of the proposed methodology is shown in Figure 2. To improve readability, we model
network traffic using a flow graph. In this model, nodes and edges represent hosts and communications.
The compromised host c1 is controlled by the adversary and coordinates the botnet with nodes c2, c3, c4
to flood the victim node u3 with a massive amount of packets. A GNN-based NIDS detects DDoS attacks
where different hosts send packets to a single target node. In particular, the GNN is trained to detect
malicious patterns by flagging the edges associated with them as suspect. The explainer extracts a mask
containing the most important flow records for the model. These samples correspond to edges within
the flow graph critical to detecting cyber attacks. These elements enhance structural attacks to evade
the GNN-based NIDS. The XAI method that identifies the samples leading to more misclassifications is
considered the most precise.</p>
      <p>Below we define the deployment scenario and describe the two phases of the proposed evaluation
strategy.</p>
      <sec id="sec-3-1">
        <title>3.1. Deployment Scenario</title>
        <p>We consider the same deployment scenario as previously proposed in [18]. The corporate network
includes multiple devices and a single border router that facilitates communication for all the hosts. We
also assume that a remote attacker has built his own C&amp;C infrastructure by compromising one or more
devices to perform malicious operations. We suppose that a GNN-based NIDS monitors the internal
network by analyzing the graph representation of network trafic.</p>
        <p>Network packets passing through the border router are captured by a flow exporter, which extracts
the corresponding flow records. These samples, collected during a specific time window, are processed
by a graph generator to produce the graph G. In this evaluation methodology, we consider a flow graph
G, where the hosts in the internal network are associated with nodes, while the communications are
associated with edges. In the experiments, we translate this graph representation into a line graph
L(G) by considering the edges in G as nodes in L(G) and linking together two nodes in L(G) if the
corresponding edges in G share a vertex.</p>
        <p>[Figure 2: Overview of the proposed framework. The input graph (hosts c1–c4 and u1–u4) is fed to the detector, which produces predictions; the explainer turns these predictions into an explanatory graph; the key components of the explanatory graph are injected back to build a perturbed graph, which is submitted to the detector again; the results are summarized by F1, precision, recall, and severity scores.]</p>
        <p>Once generated, the graph is processed by the GNN-based NIDS to produce the embeddings Z ∈ ℝ^{m×d},
where m is the number of flows (i.e., edges in the flow graph or, equivalently, nodes in the line graph),
while d is the dimension of the latent space [35]. The embeddings allow the GNN model to perform
classification as they encapsulate the f-dimensional features X ∈ ℝ^{m×f} and binary labels y ∈ {0, 1}
of the corresponding flow records, but also represent the most significant structures in the network
topology [36]. The final layer of the detector consists of a binary classifier that predicts whether
the embeddings are legitimate or malicious. However, while the GNN-based NIDS performs the
classification, the rationale behind the predictions remains unclear to security analysts, highlighting
the need for XAI techniques.</p>
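        <p>The following minimal sketch shows how such a detector can be wired together, assuming PyTorch Geometric; the two-layer depth and hidden sizes are illustrative placeholders, not the paper's exact hyperparameters.</p>
        <preformat>
# Sketch of a GraphSAGE encoder plus a binary classification head for line graphs.
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class LineGraphSAGE(torch.nn.Module):
    def __init__(self, f, d):
        super().__init__()
        self.conv1 = SAGEConv(f, d)               # aggregate neighbor flow features
        self.conv2 = SAGEConv(d, d)               # second-hop aggregation
        self.classifier = torch.nn.Linear(d, 2)   # benign vs. malicious

    def forward(self, x, edge_index):
        z = F.relu(self.conv1(x, edge_index))
        z = self.conv2(z, edge_index)     # Z in R^{m x d}: one embedding per flow
        return z, self.classifier(z)      # embeddings and per-node logits
        </preformat>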
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Explaining</title>
        <p>The first phase of our proposed framework uses explainability methods to identify the influential
components in the input graph and provide insights into the GNN-based NIDS. In this phase, we aim to
compute explanations that an attacker can exploit to manipulate the input graph structure.</p>
        <p>As discussed in Section 2, we focus on instance-level explainers that are capable of highlighting
specific graph components, omitting model-level explainers due to their heterogeneity. In this context,
we follow a post-hoc approach, examining pre-trained models and excluding self-explainable strategies
due to their infeasibility in the real world.</p>
        <p>Given the flow graph G and the GNN predictions ŷ, an explainer defines an explanation mask, i.e.,
a graph G⋆ = (V⋆, E⋆) where each node v⋆ ∈ V⋆ and edge e⋆ ∈ E⋆ has an importance value h ∈ [0, 1]
representing its contribution to the intrusion detection. Since we want to compute a relevance value
for each record in the dataset using a flow graph representation, we consider the edge mask of the
explanatory graph G⋆, leaving out the node weights.</p>
        <p>When the explanation is extracted, we select the most significant benign flows ℬ⋆. We focus on
benign netflows ℬ rather than malicious ones ℳ because the structural adversarial attacks considered
in our evaluation exploit legitimate records to manipulate the graph structure. In addition, benign
communications dominate the network traffic, making them more accessible for analysis. In contrast,
malicious transmissions are often rare in the real world, making it difficult to establish a consistent
baseline for evaluation. Adversaries who use benign flows to hide their malicious operations also
pose a significant challenge to detection systems. Indeed, attackers can inject benign edges into the
neighborhood of compromised nodes within the graph to effectively evade detection of malicious links.</p>
        <p>To prioritize which benign flows have a major impact on the detector performance, we rank and
select the top K important legitimate netflows ℬ⋆ based on the computed relevance values h. In the
next step, we evaluate which explainer really captures the influential links, as different explainers may
return different explanations.</p>
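        <p>A small helper like the following (hypothetical names; not from the paper's repository) captures this ranking step: it restricts the mask to benign flows and keeps the K highest-scoring ones.</p>
        <preformat>
# Sketch: select the top-K benign flows B* from the explainer's relevance scores h.
import numpy as np

def top_k_benign(h, labels, k):
    """h: relevance score per flow; labels: 0 = benign, 1 = malicious."""
    benign = np.where(labels == 0)[0]             # restrict to benign flows B
    ranked = benign[np.argsort(h[benign])[::-1]]  # descending relevance order
    return ranked[:k]                             # indices of B*
        </preformat>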
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Evaluation</title>
        <p>The actual evaluation of the explainability algorithm happens in this phase. Our approach leverages
structural attacks, where attackers manipulate the graph structure to evade the GNN-based NIDS.
As explainability methods highlight important components within the graph, attacks based on these
explanations reflect which method most correctly identifies the flows critical to the detection. In our
assessment, the most effective explainer is the one that identifies netflows leading to the highest number
of misclassified samples when submitted to the GNN-based NIDS.</p>
        <p>At this stage, we consider a realistic gray-box attack scenario [37], where the attacker has partial
knowledge of the defensive system. The attacker knows that a GNN-based NIDS monitors the network,
but he has no insight into its parameters. We also assume that the attacker has a limited quantity of
flow records available to define explanations for the input graph components. Hence, the attacker can
compute an explanatory mask to identify the important benign netflows and perturb the input graph
structure.</p>
        <p>To evaluate the accuracy of different explainability algorithms, we focus on C2xℬ attacks [13]. Here,
attackers initiate new benign communications ℬ from compromised hosts C to random targets to
manipulate the graph and thwart the GNN. In other words, attackers change the structural patterns
of their attacks by perturbing the graph topology. Indeed, these perturbations generate embeddings
that evade detection by tricking the GNN into making misclassifications. This approach is feasible
in the real world because the attacker can execute any strategy on the compromised hosts once he
has complete control over them. Consequently, evaluating GNN explanations using C2xℬ attacks
constitutes a practical strategy. Although these attacks succeed in evading the GNN, the contribution
of explanations is not considered in the threat model. Indeed, the attacks proposed in the original
work [13] randomly select legitimate netflows without considering which ones might be the most
relevant for the detection task.</p>
        <p>In our framework, we do not randomly collect the benign flows to manipulate the graph. Instead,
we rank and select the top K legitimate netflows based on their relevance scores h. We then use these
important samples to evade the detector and evaluate the explainer. More specifically (a sketch of this
loop follows the list):
1. From the explanatory graph G⋆, we identify the most important benign flows ℬ⋆ and inject them
into the original dataset D, resulting in an augmented dataset D̂.
2. Given the perturbed dataset D̂, we build the corresponding flow graph Ĝ, where important
legitimate samples ℬ⋆ are associated with new edges.
3. The manipulated graph Ĝ leads to misclassifications when fed to the GNN-based NIDS, since the
neighborhood of compromised nodes includes the benign edges ℬ⋆.
4. The effectiveness of the XAI technique is evaluated by measuring the impact on the GNN-based
NIDS predictions of the legitimate transmissions ℬ⋆ injected into the graph.</p>
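        <p>A compact driver for these four steps could look as follows; inject_flows, build_flow_graph, and the nids object are hypothetical placeholders rather than names from the paper's code.</p>
        <preformat>
# Sketch of the evaluation loop: inject B*, rebuild the graph, re-score the NIDS.
def evaluate_explainer(D, B_star, nids):
    D_hat = inject_flows(D, B_star)   # 1. augment the dataset D with B*
    G_hat = build_flow_graph(D_hat)   # 2. perturbed flow graph with new edges
    y_pred = nids.predict(G_hat)      # 3. predictions now degraded by B*
    return y_pred                     # 4. impact quantified via AS (Section 5)
        </preformat>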
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Case Study</title>
      <p>In Section 3, we described our evaluation framework to compare XAI methods in GNN-based NIDS. We
now present an experimental case study to validate our proposed methodology. In particular, we present
the datasets used, the GNN-based NIDS targeted, the different explainers tested, and the framework
implementation, which leverages the DGL (https://docs.dgl.ai) and PyTorch Geometric
(https://pytorch-geometric.readthedocs.io/) libraries to provide reliable graph processing and model
training. To ensure the reproducibility of our results, the source code of the
following experiments is freely available at https://github.com/dimgalli/evaluating-xai.git.</p>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>We base the case study on two publicly available datasets: CTU-13 [38] and ToN-IoT [39]. These two
datasets consist of labeled netflows and are widely referenced in the literature, making them reliable
benchmarks for this study [35]. CTU-13 contains network traces that combine benign and malicious
traffic from several real-world botnet variants. These botnets exhibit specific structural behaviors that
align well with the capabilities of GNN-based NIDS. ToN-IoT contains heterogeneous types of IoT data.
We consider the benign flow records and the malicious netflows of the cyber threats. The attacks in the
two datasets exhibit complex malicious patterns, relying on different multi-flow strategies. Therefore,
they are excellent candidates for evaluating GNN explainability methods.</p>
        <p>We apply the same preprocessing steps described in [40], which include discarding non-TCP traffic and
filtering out outliers. We also eliminate explicit IP addresses and port numbers to avoid separating flows
by spurious correlations [41]. The final feature set includes more than 30 attributes and is consistent with
previous work [18]. In particular, we build separate collections for each cyber threat in the two datasets,
excluding the sets with insufficient malicious netflows that would yield underperforming detectors. We
combine the benign data and the malicious samples for each specific threat in a benign:malicious ratio
of 10:1. This distribution ratio reflects real-world scenarios [41]. Each collection is then divided into
training and test subsets using an 80:20 ratio. We report the resulting number of benign and malicious
flows of the two datasets used to train and test the models in Table 1, where each row indicates the
botnet variant and attack type considered in the experiments.</p>
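        <p>The collection-building step can be sketched as follows (pandas-based; the column names "label" and "threat" are assumptions about the preprocessed datasets, not their actual schema).</p>
        <preformat>
# Sketch: build a per-threat collection with a 10:1 benign:malicious ratio
# and an 80:20 train/test split.
import pandas as pd

def build_collection(df, threat, ratio=10, train_frac=0.8, seed=0):
    mal = df[df["threat"] == threat]  # malicious flows of one cyber threat
    ben = df[df["label"] == 0].sample(n=ratio * len(mal), random_state=seed)
    coll = pd.concat([ben, mal]).sample(frac=1.0, random_state=seed)  # shuffle
    split = int(train_frac * len(coll))
    return coll.iloc[:split], coll.iloc[split:]  # train, test
        </preformat>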
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Detectors</title>
        <p>Our evaluation leverages GNN’s unique ability to capture relational dependencies between netflows that
classical ML techniques fail to achieve. This feature makes GNN models strong candidates for evaluating
explanations since the perturbation of key graph components affects detection performance [42]. To
guarantee a fair and meaningful comparison between different XAI methods, we focus on GNN that
perform node-level classification. Indeed, these models generate more informative embeddings and
exhibit greater robustness to adversarial perturbations [43].</p>
        <p>Our GNN-based NIDS is built on the GraphSAGE model [44], which aggregates features to generate
node embeddings. In the context of this paper, it is referred to as LineGraphSAGE [45] because it allows
node classification directly on line graphs. As discussed in Section 2, flow graphs can be converted to line
graphs using a linearization procedure. Attributes of the original netflows are transferred to the nodes
of the line graph, so GraphSAGE can be used to learn node embeddings without losing information from
the netflow features. In addition, the inductive nature of this algorithm allows effective embedding
generation for nodes in unseen test graphs. Therefore, this approach makes GraphSAGE more suitable
for network intrusion detection in the real world.</p>
        <p>For each attack in the two datasets, we train a specific instance of the binary classifier to detect the
malicious netflows (i.e., malicious nodes in the line graph), as suggested by prior research [46, 47]. We
implement the detectors using the same hyperparameters outlined in [13].</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Explainers</title>
        <p>For our case study, we evaluate five different XAI algorithms tailored for GNN models. As discussed in
Section 2, GNN-based NIDS learn attack patterns simultaneously from flow features and topological
structures. However, it is unclear whether GNN rely specifically on feature-level information or graph
structure when making their predictions. For this reason, we consider in the experiments two methods
that highlight flow-level features [24, 25] and two strategies that provide explanations representative
of the underlying topology [26, 27]. We recall that we use post-hoc XAI strategies to identify the
structural vulnerabilities of the cyber detector, focusing on improving the overall security rather than
the interpretability.</p>
        <p>The XAI algorithms evaluated in the experiments are: Dummy Explainer (DE), Integrated Gradients
(IG) [24], Saliency (SA) [25], GNNExplainer (GE) [26], and GraphMask (GM) [27]. DE assigns random
explanation scores as a baseline for comparison. IG calculates feature importance by integrating
the gradients of the model output relative to the input along a baseline-to-input path. SA computes node
importance by measuring gradients of the model output with respect to the input features. GE defines
critical graph substructures by estimating the mutual information. GM generates masked versions of the
graph by dropping edges and observing their effect on model predictions.</p>
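        <p>As an indication of how such explainers can be instantiated, the sketch below uses the torch_geometric.explain API (assuming PyTorch Geometric 2.3 or later with Captum installed); the actual wiring in our implementation may differ.</p>
        <preformat>
# Sketch: wrapping a pre-trained model with an Integrated Gradients explainer.
from torch_geometric.explain import CaptumExplainer, Explainer

explainer = Explainer(
    model=model,                                       # pre-trained GNN detector
    algorithm=CaptumExplainer('IntegratedGradients'),  # or 'Saliency' for SA
    explanation_type='model',
    node_mask_type='attributes',                       # soft mask over features
    model_config=dict(mode='binary_classification',
                      task_level='node', return_type='raw'),
)
explanation = explainer(data.x, data.edge_index)  # explanation.node_mask holds h
        </preformat>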
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Framework Implementation</title>
        <p>We now describe the implementation of our evaluation methodology, which is detailed in Section 3.</p>
        <p>As mentioned in Section 4.1, each collection D_t containing benign and malicious samples of a
particular cyber threat t is divided into training and test subsets, preserving the 10:1 benign:malicious
ratio typical of real-world settings [41]. For each collection obtained, we generate the flow graph G_t
and then the line graph L(G_t) representation of the network traffic. Then, line graphs obtained from
training sets are used to train an ensemble of LineGraphSAGE classifiers to detect specific malicious
variants, as outlined in Section 4.2. Instead, line graphs obtained from test sets are used to evaluate
detectors and explainers.</p>
        <p>Each of the explainers presented in Section 4.3 is then applied to each instance of LineGraphSAGE to
compute the corresponding explanations. Each explainer takes the clean line graph L(G_t) containing the
network traffic of a particular threat t and the corresponding model predictions ŷ to define its explanatory
graph L(G)⋆, where each node v⋆ has an importance score h indicating its influence in the detection
process. On the one hand, gradient-based algorithms (i.e., IG and SA) assign a score to each attribute,
so we calculate the average over the features to derive a single value for each node in the graph. On the
other hand, perturbation-based methods (i.e., GE and GM) directly return a mask containing a score for
each vertex. Since our strategy relies on C2xℬ structural attacks performed in the problem space, we
tag each vertex v⋆ of the graph L(G)⋆ with an identifier to retrieve the original netflows in the dataset
D_t. We then rank all flow records based on their importance score h. Next, we consider only the benign
flows ℬ and discard the malicious samples ℳ. To ensure a fair comparison between different attacks,
we select the top 10% of legitimate examples ℬ previously sorted by relevance h.</p>
        <p>To evaluate the explainer’s performance, we test the robustness of LineGraphSAGE against C2xℬ
attacks [13]. In this scenario, attackers set up new benign communications from their compromised
nodes to random destinations in the network. In other words, we inject new data samples into the test
set, introducing new components into the resulting graph—edges in the flow graph and nodes in the
line graph. This perturbation is practically feasible because the generation of the graph representation
occurs once a sufficient number of flows have been collected [20]. The original attacks involve sending
k benign netflows ℬ from each compromised host to random targets. Consequently, this process
injects k × |C| new benign nodes into the resulting line graph, where |C| is the number of controlled hosts.
However, instead of randomly sampling benign flows from the available pool, we select the relevant
legitimate netflows to perturb the graph structure. This strategy enables us to evaluate the contribution
of different explainers in refining the structural violations. To ensure that new communications originate
from compromised nodes in the graph, we replace the source IP addresses and port numbers included in
the benign flows with those of the controlled hosts. We increase the number of legitimate samples
injected into the test set by considering the following step values for k: 1, 2, 5, 10, and 20.</p>
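        <p>The injection itself can be sketched as follows (pandas-based; "src_ip" and "src_port" are placeholder column names for the datasets' flow attributes, not their actual schema).</p>
        <preformat>
# Sketch of the C2xB-style perturbation: re-source k top-ranked benign flows
# from each compromised host, yielding k * |C| new nodes in the line graph.
import pandas as pd

def inject_benign(test_df, top_benign, compromised, k, seed=0):
    injected = []
    for host_ip, host_port in compromised:           # the controlled hosts C
        picks = top_benign.sample(n=k, random_state=seed).copy()
        picks["src_ip"] = host_ip                    # new flows now originate
        picks["src_port"] = host_port                # from the compromised host
        injected.append(picks)
    return pd.concat([test_df] + injected, ignore_index=True)
        </preformat>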
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We now present the results of our case study, where we apply the proposed evaluation strategy to a
state-of-the-art GNN-based NIDS.</p>
      <p>First, we evaluate LineGraphSAGE classifiers on the clean line graphs generated from the respective
unperturbed validation sets to confirm the suitability of the considered GNN-based NIDS as a target for
structural attacks. For this evaluation, we leverage measures commonly used in computer security [34],
namely F1-score, precision, and recall. Considering a malicious network flow as a positive sample, these
three evaluation metrics can be summarized as follows:
F1 = 2 × (Precision × Recall) / (Precision + Recall), Precision = TP / (TP + FP), Recall = TP / (TP + FN),
where TP, FP, and FN are the number of true positives, false positives, and false negatives, respectively.
These metrics range between 0 and 1, with higher scores indicating better performance. We recall
that we do not rely on accuracy scores since the datasets employed in network intrusion detection are
typically highly unbalanced.</p>
      <p>We then evaluate explainers by observing how the benign flows they identify as important influence
LineGraphSAGE performance once explanations are injected into the graph to perturb its overall
structure. For this purpose, we rely on the attack severity (AS) measure [34]. This metric is defined as
follows:
AS = 1 − Recall(after the attack) / Recall(before the attack),
where the numerator indicates the recall score on the perturbed graph, while the denominator is the
recall value on the clean graph. This metric ranges between 0 and 1, with highly accurate attacks
getting AS values closer to 1. Therefore, the most effective explainers are those achieving AS values
close to 1, resulting in the most severe attacks.</p>
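      <p>In code, the metric reduces to a one-line ratio of recall scores; the sketch below uses scikit-learn, which is an assumption of this illustration rather than a dependency stated in the paper.</p>
      <preformat>
# Sketch: attack severity from recall before and after the structural attack.
from sklearn.metrics import recall_score

def attack_severity(y_true, y_pred_clean, y_true_pert, y_pred_pert):
    r_before = recall_score(y_true, y_pred_clean)     # recall on clean graph
    r_after = recall_score(y_true_pert, y_pred_pert)  # recall on perturbed graph
    return 1.0 - r_after / r_before  # AS in [0, 1]; higher = stronger attack
      </preformat>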
      <sec id="sec-5-1">
        <title>5.1. Detectors Performance</title>
        <p>Our evaluation campaign begins by testing the GNN-based NIDS presented in Section 4.2 on clean
network traffic. In particular, we create a line graph representation for each test set and evaluate the
specific instance of the GNN-based NIDS on it. The performance of the LineGraphSAGE classifiers on
the two considered datasets is shown in Table 2. Each row refers to a threat and reports the detection
results of the specific GNN classifier on it, with the last row summarizing the mean μ and the standard
deviation σ values across the different instances.</p>
        <p>The GNN-based NIDS achieves high performance across all evaluation metrics, with an average
F1-score of 0.935 and 0.995 on the CTU-13 and ToN-IoT datasets, respectively. From Table 2, we observe
that our LineGraphSAGE models obtain scores that are in line with those of the state-of-the-art [13].
All classifiers exceed 0.9 in performance on the CTU-13 dataset, except on Neris traffic, where the
GNN-based detector struggles due to the heterogeneity of malicious netflows, with an F1-score of 0.846
and a recall of 0.767. By contrast, each instance of LineGraphSAGE obtains near-perfect performance
scores on the ToN-IoT dataset, reflecting its reliability in detecting multi-flow cyber attacks with
different structural topologies. The performance results prove that our GNN-based NIDS is effective
in deployment scenarios that do not involve manipulated network traffic. Therefore, the considered
detection model represents a solid target for the structural attacks through which we evaluate the
different explanations.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Explainers Performance</title>
        <p>We now evaluate the quality of the explanations extracted by the five XAI methods presented in
Section 4.3. For each considered explainer, we generate an explanatory graph and inject its benign
flow records into the corresponding test set, manipulating the resulting line graph with new nodes. As
discussed in Section 4.4, we gradually increase the number k of key benign samples ℬ⋆ to estimate
the severity of structural attacks at distinct perturbation levels. To assess the effectiveness of the
explanation-based attacks, we feed the perturbed graphs into the GNN-based NIDS and measure how
much the overall detection capability drops.</p>
        <p>We compare the effectiveness of explainability methods in identifying important flow records and
perturbing the resulting graph structure in Table 3, where each cell corresponds to the mean AS value
obtained by the explainer in the column on the network traffic in the row. Each AS value is averaged
over the perturbation steps k. For each row, we mark the best AS value in bold. The rows at the bottom
in light gray summarize the AS values obtained by each explainer with the mean value μ computed
over the different cyber attacks.</p>
        <p>We immediately observe that AS values are not zero. The results are consistent with those obtained
in the original work [13] and validate the vulnerability of GNN-based NIDS to structural attacks. The
IG method achieves the highest average AS values, with 0.422 and 0.195 on the CTU-13 and ToN-IoT
datasets, respectively. This result shows that the features of flows are important for
the detection model and support more effective perturbations. We observe that SA slightly outperforms
IG over Rbot traffic, with a marginal difference of 0.004 in the mean values (0.237 for SA and 0.233 for
IG). Similarly, GraphMask outperforms IG over DDoS and DoS attacks with average values of 0.273
and 0.023, compared to 0.187 and 0.018 of IG. The AS of IG measured on Menti and Murlo are the most
significant, with average values equal to 0.728 and 0.900, respectively. This result demonstrates the
effectiveness of explanations in perturbing graphs formed by a limited number of components. By
contrast, the detector trained on DoS traffic is the most robust, with AS values not exceeding 0.1. Indeed,
the limited number of controlled hosts reduces the attacker’s potential threat. Our findings demonstrate
that only attacks based on IG are consistently more effective than those that leverage random benign
records. Hence, IG-generated explanations allow attackers to determine the critical features of network
traffic and the structural vulnerabilities of the GNN-based NIDS.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>To address the black-box nature of GNN, XAI algorithms have been developed by practitioners and
researchers. However, evaluating different XAI methods is challenging because traditional approaches
often rely on expensive ground truth explanations or oversimplified metrics. In this paper, we present a
novel framework for evaluating GNN explainers based on their ability to identify the most relevant graph
components and improve the severity of structural attacks against GNN-based NIDS. Our experimental
campaign, conducted on two public datasets, shows that the IG explainer consistently outperforms other
methods in determining the most accurate explanations and guiding powerful attacks. More specifically,
IG achieves the highest AS values in most cases, while GraphMask excels in DDoS and DoS scenarios
due to the characteristic topological patterns associated with these threats. These results confirm
the effectiveness of adversarial perturbations against GNN-based NIDS, especially when explanations
guide structural attacks. Our proposal represents a significant contribution to the evaluation of
XAI methods in GNN-based NIDS. Future research could expand the proposed work to other areas
within cybersecurity, exploring new attack methods and defense strategies to improve the resilience of
GNN-based NIDS.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery
and Resilience Plan funded by the European Union - NextGenerationEU.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-9">
      <title>References</title>
      <p>[1] R. Sommer, V. Paxson, Outside the closed world: On using machine learning for network intrusion detection, in: 2010 IEEE Symposium on Security and Privacy, IEEE, 2010, pp. 305–316.
[2] F. Manganiello, M. Marchetti, M. Colajanni, Multistep attack detection and alert correlation in intrusion detection systems, in: Information Security and Assurance: International Conference, ISA 2011, Brno, Czech Republic, August 15-17, 2011. Proceedings, Springer, 2011, pp. 101–110.
[3] F. Pierazzi, S. Casolari, M. Colajanni, M. Marchetti, Exploratory security analytics for anomaly detection, Computers &amp; Security 56 (2016) 28–49.
[4] W. Jiang, Graph-based deep learning for communication networks: A survey, Computer Communications (2022).
[5] D. Pujol-Perich, J. Suárez-Varela, A. Cabellos-Aparicio, P. Barlet-Ros, Unveiling the potential of graph neural networks for robust intrusion detection, ACM SIGMETRICS Performance Evaluation Review 49 (2022) 111–117.
[6] A. Warnecke, D. Arp, C. Wressnegger, K. Rieck, Evaluating explanation methods for deep learning in security, in: 2020 IEEE European Symposium on Security and Privacy (EuroS&amp;P), IEEE, 2020, pp. 158–174.
[7] N. Moustafa, N. Koroniotis, M. Keshk, A. Y. Zomaya, Z. Tari, Explainable intrusion detection for cyber defences in the internet of things: Opportunities and solutions, IEEE Communications Surveys &amp; Tutorials 25 (2023) 1775–1807.
[8] A. Nadeem, D. Vos, C. Cao, L. Pajola, S. Dieck, R. Baumgartner, S. Verwer, SoK: Explainable machine learning for computer security applications, in: 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&amp;P), IEEE, 2023, pp. 221–240.
[9] G. Apruzzese, P. Laskov, A. Tastemirova, SoK: The impact of unlabelled data in cyberthreat detection, in: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&amp;P), IEEE, 2022, pp. 20–42.
[10] H. Zhu, J. Lu, Graph-based intrusion detection system using general behavior learning, in: GLOBECOM 2022-2022 IEEE Global Communications Conference, IEEE, 2022, pp. 2621–2626.
[11] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, F. Roli, Evasion attacks against machine learning at test time, in: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III 13, Springer, 2013, pp. 387–402.
[12] L. Sun, Y. Dou, C. Yang, K. Zhang, J. Wang, S. Y. Philip, L. He, B. Li, Adversarial attack and defense on graph data: A survey, IEEE Transactions on Knowledge and Data Engineering 35 (2022) 7693–7711.
[13] A. Venturi, D. Stabili, M. Marchetti, Problem space structural adversarial attacks for network intrusion detection systems based on graph neural networks, arXiv preprint arXiv:2403.11830 (2024).
[14] G. Vormayr, J. Fabini, T. Zseby, Why are my flows different? A tutorial on flow exporters, IEEE Communications Surveys &amp; Tutorials 22 (2020) 2064–2103.
[15] Z. Ahmad, A. Shahid Khan, C. Wai Shiang, J. Abdullah, F. Ahmad, Network intrusion detection system: A systematic study of machine learning and deep learning approaches, Transactions on Emerging Telecommunications Technologies 32 (2021) e4150.
[16] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20 (2008) 61–80.
[17] W. W. Lo, S. Layeghy, M. Sarhan, M. Gallagher, M. Portmann, E-GraphSAGE: A graph neural network based intrusion detection system for IoT, in: NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, IEEE, 2022, pp. 1–9.
[18] A. Venturi, M. Ferrari, M. Marchetti, M. Colajanni, ARGANIDS: a novel network intrusion detection system based on adversarially regularized graph autoencoder, in: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, 2023, pp. 1540–1548.
[19] J. Zhou, Z. Xu, A. M. Rush, M. Yu, Automating botnet detection with graph neural networks, arXiv preprint arXiv:2003.06344 (2020).
[20] A. Venturi, D. Pellegrini, M. Andreolini, L. Ferretti, M. Marchetti, M. Colajanni, et al., Practical evaluation of graph neural networks in network intrusion detection, in: CEUR Workshop Proceedings, volume 3488, CEUR-WS, 2023.
[21] A. Longa, S. Azzolin, G. Santin, G. Cencetti, P. Liò, B. Lepri, A. Passerini, Explaining the explainers in graph neural networks: a comparative study, ACM Computing Surveys (2024).
[22] H. Yuan, H. Yu, S. Gui, S. Ji, Explainability in graph neural networks: A taxonomic survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2022) 5782–5799.
[23] J. Kakkad, J. Jannu, K. Sharma, C. Aggarwal, S. Medya, A survey on explainability of graph neural networks, arXiv preprint arXiv:2306.01958 (2023).
[24] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 3319–3328.
[25] P. E. Pope, S. Kolouri, M. Rostami, C. E. Martin, H. Hoffmann, Explainability methods for graph convolutional neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 10772–10781.
[26] Z. Ying, D. Bourgeois, J. You, M. Zitnik, J. Leskovec, GNNExplainer: Generating explanations for graph neural networks, Advances in Neural Information Processing Systems 32 (2019).
[27] M. S. Schlichtkrull, N. De Cao, I. Titov, Interpreting graph neural networks for NLP with differentiable edge masking, arXiv preprint arXiv:2010.00577 (2020).
[28] B. Sanchez-Lengeling, J. Wei, B. Lee, E. Reif, P. Wang, W. Qian, K. McCloskey, L. Colwell, A. Wiltschko, Evaluating attribution for graph neural networks, Advances in Neural Information Processing Systems 33 (2020) 5898–5910.
[29] T. Funke, M. Khosla, M. Rathee, A. Anand, Zorro: Valid, sparse, and stable explanations in graph neural networks, IEEE Transactions on Knowledge and Data Engineering 35 (2022) 8687–8698.
[30] P. Li, Y. Yang, M. Pagnucco, Y. Song, Explainability in graph neural networks: An experimental survey, arXiv preprint arXiv:2203.09258 (2022).
[31] C. Agarwal, M. Zitnik, H. Lakkaraju, Probing GNN explainers: A rigorous theoretical and empirical analysis of GNN explanation methods, in: International Conference on Artificial Intelligence and Statistics, PMLR, 2022, pp. 8969–8996.
[32] K. Amara, R. Ying, Z. Zhang, Z. Han, Y. Shan, U. Brandes, S. Schemm, C. Zhang, GraphFramEx: Towards systematic evaluation of explainability methods for graph neural networks, arXiv preprint arXiv:2206.09677 (2022).
[33] X. Zheng, F. Shirani, T. Wang, W. Cheng, Z. Chen, H. Chen, H. Wei, D. Luo, Towards robust fidelity for evaluating explainability of graph neural networks, arXiv preprint arXiv:2310.01820 (2023).
[34] G. Apruzzese, M. Colajanni, L. Ferretti, M. Marchetti, Addressing adversarial attacks against security systems based on machine learning, in: 2019 11th International Conference on Cyber Conflict (CyCon), volume 900, IEEE, 2019, pp. 1–18.
[35] T. Bilot, N. El Madhoun, K. Al Agha, A. Zouaoui, Graph neural networks for intrusion detection: A survey, IEEE Access 11 (2023) 49114–49139.
[36] A. Venturi, D. Galli, D. Stabili, M. Marchetti, et al., Hardening machine learning based network intrusion detection systems with synthetic netflows, in: CEUR Workshop Proceedings, volume 3731, CEUR-WS, 2024.
[37] A. Kuppa, N.-A. Le-Khac, Adversarial XAI methods in cybersecurity, IEEE Transactions on Information Forensics and Security 16 (2021) 4924–4938.
[38] S. Garcia, M. Grill, J. Stiborek, A. Zunino, An empirical comparison of botnet detection methods, Computers &amp; Security 45 (2014) 100–123.
[39] A. Alsaedi, N. Moustafa, Z. Tari, A. Mahmood, A. Anwar, TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems, IEEE Access 8 (2020) 165130–165150.
[40] A. Venturi, G. Apruzzese, M. Andreolini, M. Colajanni, M. Marchetti, DReLAB - Deep REinforcement Learning Adversarial Botnet: A benchmark dataset for adversarial attacks against botnet intrusion detection systems, Data in Brief 34 (2021) 106631.
[41] D. Arp, E. Quiring, F. Pendlebury, A. Warnecke, F. Pierazzi, C. Wressnegger, L. Cavallaro, K. Rieck, Dos and don'ts of machine learning in computer security, in: 31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 3971–3988.
[42] D. Zügner, A. Akbarnejad, S. Günnemann, Adversarial attacks on neural networks for graph data, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, 2018, pp. 2847–2856.
[43] S. Zhang, H. Tong, J. Xu, R. Maciejewski, Graph convolutional networks: a comprehensive review, Computational Social Networks 6 (2019) 1–23.
[44] W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, Advances in Neural Information Processing Systems 30 (2017).
[45] X. Hu, W. Gao, G. Cheng, R. Li, Y. Zhou, H. Wu, Towards early and accurate network intrusion detection using graph embedding, IEEE Transactions on Information Forensics and Security (2023).
[46] M. Stevanovic, J. M. Pedersen, An analysis of network traffic classification for botnet detection, in: 2015 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), IEEE, 2015, pp. 1–8.
[47] G. Apruzzese, M. Colajanni, L. Ferretti, A. Guido, M. Marchetti, On the effectiveness of machine and deep learning for cyber security, in: 2018 10th International Conference on Cyber Conflict (CyCon), IEEE, 2018, pp. 371–390.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>