Reasoning with Counterfactual Explanations for Code Vulnerability Detection and Correction

Anjana Wijekoon¹, Nirmalie Wiratunga¹
¹ Robert Gordon University, Aberdeen, Scotland
AI-CyberSec 2021: Workshop on Artificial Intelligence and Cyber Security, December 14, 2021, Cambridge, UK
a.wijekoon1@rgu.ac.uk (A. Wijekoon); n.wiratunga@rgu.ac.uk (N. Wiratunga)

Abstract
Counterfactual explanations highlight "actionable knowledge" which helps end-users understand how a machine learning outcome could be changed to a more desirable one. In code vulnerability detection, understanding these "actionable" corrections can be critical to proactively mitigating security attacks caused by known vulnerabilities. In this paper, we present the case-based explainer DisCERN for counterfactual discovery with code data. DisCERN finds counterfactuals to explain the outcomes of black-box vulnerability detection models and highlights actionable corrections to guide the user. DisCERN uses feature relevance explainer knowledge as a proxy to discover potentially vulnerable code statements, and then uses a novel substitution algorithm based on pattern matching to find corrections from the nearest unlike neighbour. The overall aim of DisCERN is to identify vulnerabilities and correct them with the minimal changes necessary. We evaluate DisCERN using the NIST Java SARD dataset and find that it discovers counterfactuals for 96% of the test instances, with 13∼14 statement changes per test instance. Additionally, we present example counterfactuals found using DisCERN to qualitatively evaluate the algorithm.

Keywords
Counterfactual Explanations, Vulnerability Detection, Explainable AI

1. Introduction
Security attacks that exploit hidden software code flaws pose serious risks that compromise system performance and services. The ability to detect these vulnerabilities, as well as potential flaws, in a timely manner is therefore a desirable capability that can help avoid disastrous financial and societal consequences. The application of AI for data-driven vulnerability detection has increased significantly in recent years [1, 2], mainly due to the availability of the large amounts of open-source code needed for training vulnerability detection models. Traditional classifiers such as SVM and Naive Bayes [3], as well as neural architectures for sequence modelling (e.g. LSTMs), have been successfully used for code vulnerability classification [4]. Given the textual nature of the data, these classifiers make use of text representation methods from information retrieval [3] as well as deep embedding techniques to represent software code and nodes in the abstract syntax tree [5]. Once vulnerabilities are detected or classified into flaw categories, the software needs to be fixed. Explainable AI (XAI) techniques are used for explaining AI model outcomes, but to the best of our knowledge very little work has been done to assist with this code revision phase. The authors of [1] took a factual explanation approach, using convolutional feature activations to highlight the parts of the code that contributed most to the AI model decision.
In this paper, we demonstrate how research in counterfactual explanations can be adapted to generate code correction operators that guide the fixing of vulnerable software segments detected by a classifier. Counterfactual explanations for AI (CAI) draw on counterfactual thinking research in psychology and on GDPR guidelines for AI [6]. Unlike other forms of post-hoc XAI methods, CAI does not require exposure to the underlying AI model parameters. It instead reasons with the inputs, the outputs and the relationships between them to formulate a situationally (locally) relevant explanation that conveys how a "better" or "more desirable" output (outcome) could have been achieved by minimally changing the inputs (situation). Which inputs to change, and by how much, are interesting research questions that we address in this paper in relation to fixing code vulnerabilities. Here the inputs relate to software code, and the proposed change relates to the code correction operation. We adapt the CAI algorithm DisCERN [7] to locate the specific area of vulnerability in a code segment, and to generate a correction at the statement level using substitution operations. These substitutions are extracted from a code segment that is considered similar to the vulnerable segment. This similar segment needs to be threat-free, and is therefore referred to as the vulnerable segment's nearest unlike neighbour (we say "unlike" because it differs in class label, i.e. it is not vulnerable). The idea is to exploit similarities between pairs of code segments, where one of the pair is vulnerable and the other is not, and to make use of the non-vulnerable segment to fix the vulnerability. Since we work with similar pairs (the proximal property of CAI), we expect to identify minimal code corrections (the sparsity property of CAI) that fix the detected vulnerability. Accordingly, this paper makes the following contributions:
• re-purposes feature relevance explainers like LIME [8] to find vulnerabilities in code;
• introduces a substitution algorithm based on pattern matching to correct vulnerable code statements; and
• presents the DisCERN counterfactual explainer for code vulnerability correction, which brings together knowledge from feature relevance explainers and the substitution algorithm.
The rest of the paper is organised as follows. The NIST dataset and the detection of code vulnerabilities using machine learning methods are introduced in Section 2. Section 3 presents the DisCERN algorithm, which discovers counterfactuals for vulnerable code segments and thereby guides the user to correct the vulnerabilities. The evaluation methodology and performance metrics, with quantitative and qualitative results, are presented in Section 4, followed by conclusions and future work in Section 5.

2. Vulnerability Detection with the NIST Java Dataset
The NIST Software Assurance Reference Dataset (SARD) project promotes the detection and correction of known security flaws in programming code. It provides a publicly available repository of datasets for different programming languages that are labelled for flaws and possible corrections. The flaws are standardised by the community-maintained Common Weakness Enumeration (CWE) list, which consists of software and hardware weaknesses. In this work, we selected the Java dataset from SARD (https://samate.nist.gov/SARD/testsuite.php). Here Java files are grouped under their CWE code and each file contains a class object with one or more methods. For supervised learning we make use of two methods in the code: one with the code vulnerability and the other demonstrating the fix for the code vulnerability.

2.1. Pre-processing Code Segment Data
The supervised task involves a binary classification of 'Vulnerable' or 'Not Vulnerable' given some code segments. Forming the dataset involved extracting code segments, labelling them into one of the two classes, and carrying out a masking step to ensure there is no target leak that can influence model training. The steps taken to organise this dataset for classification are as follows (a sketch of these steps appears after the list):
1. Split methods that contain a vulnerability and those that do not into two separate files. Java methods were labelled 'Vulnerable' if they contain a comment that starts with either 'FLAW' or 'POTENTIAL FLAW'; the 'Not Vulnerable' label is assigned when a comment that starts with 'FIX' is found.
2. Entity obfuscation steps are applied to each file:
a) replace any method name with 'method';
b) replace any parameter type with 'Parameter';
c) replace comments with '/*comment*/';
d) change all method signatures to 'public void method()'.
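As an illustration, a minimal sketch of the labelling and masking steps in Python, assuming each method body is available as a string (the regular expressions and helper names are illustrative; the exact patterns used in our pipeline may differ):

import re

FLAW_RE = re.compile(r"/\*\s*(?:POTENTIAL\s+)?FLAW")
FIX_RE = re.compile(r"/\*\s*FIX")

def label_method(code):
    # Step 1: label from the SARD marker comments; unmarked methods are discarded
    if FLAW_RE.search(code):
        return "Vulnerable"
    if FIX_RE.search(code):
        return "Not Vulnerable"
    return None

def obfuscate(code, method_names, param_types):
    # Steps 2a/2b: mask method names and parameter types (both lists are
    # assumed to have been collected while parsing the file)
    for name in method_names:
        code = re.sub(r"\b%s\b" % re.escape(name), "method", code)
    for ptype in param_types:
        code = re.sub(r"\b%s\b" % re.escape(ptype), "Parameter", code)
    # Step 2c: mask block and line comments so that the FLAW/FIX markers
    # cannot leak the label into the features
    code = re.sub(r"/\*.*?\*/", "/*comment*/", code, flags=re.DOTALL)
    code = re.sub(r"//[^\n]*", "/*comment*/", code)
    # Step 2d: normalise the signature on the first line of the method
    return re.sub(r"^[^\n{]*\([^)]*\)", "public void method()", code, count=1)

Note that labelling must happen before comments are masked, since the FLAW/FIX marker text would otherwise be destroyed by step 2c.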
Figure 1: Pre-processed code examples. (a) Code segment labelled 'Vulnerable'; (b) code segment labelled 'Not Vulnerable'.

An example of pre-processed code segments appears in Figures 1a and 1b. Here the "if" statement that checks the size of the data is the fix applied to the vulnerable code segment. Using the Bag-of-Words (BoW) vectorisation method we analyse the distribution of the pre-processed dataset over its 1074 unique language tokens (see Figures 2a and 2b).

Figure 2: BoW distributions. (a) Vulnerable dataset; (b) Not Vulnerable dataset.
Figure 3: Class distribution.

The class distribution of the dataset (see Figure 3) shows a class imbalance between the 'Vulnerable' and 'Not Vulnerable' data, which indicates that not all files have a 'corrected' version of the code.

2.2. Code Vulnerability Classification
The most common machine learning (ML) pipeline for classification with text data is to use a tokenizer followed by a vectoriser to transform the text into a vector representation, and then to apply a classification algorithm to learn from the labelled data. Code data can be seen as text that follows the grammar rules defined by the Java compiler. In this work, the tf-idf feature vector representation is used for classification, where each feature value is a real-valued number that denotes the importance of a token for a code-segment instance given the full corpus of code segments. A summary of the final dataset appears in Table 1.

Table 1: Dataset summary

    Feature                                          Value
    Number of data instances                         104905
    Classes                                          'Vulnerable' or 'Not Vulnerable'
    Number of train and test instances (75/25)       78677 / 26226
    Number of test instances labelled 'Vulnerable'   10019
    Mean number of statements                        40.56
    Tf-idf features                                  500

Results from a comparative study of classifiers on the dataset are presented in Table 2.

Table 2: Classification algorithms and performance

    Tokenizer   Classifier            Accuracy   Macro-F1   Precision   Recall
    Tf-idf      Naive Bayes           0.7206     0.6926     0.6625      0.5478
                k-Nearest Neighbour   0.9387     0.9340     0.9572      0.8788
                SVM                   0.9574     0.9544     0.9706      0.9164
                Random Forest         0.9722     0.9704     0.9730      0.9536

Overall we can see that Random Forest performs extremely well on this dataset, and it therefore forms the AI model for our explainability evaluation studies. It is worth noting that the CAI method used here is model-agnostic and could be used with any classifier; however, it makes sense to work with one that already has high accuracy, so that explanations are underpinned by accurate predictions.
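For illustration, a minimal sketch of this classification pipeline in scikit-learn, assuming the pre-processed code segments and their labels are available as the lists segments and labels (the hyperparameters shown are illustrative, not the exact settings used in our experiments):

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# 75/25 train/test split, as in Table 1
X_train, X_test, y_train, y_test = train_test_split(
    segments, labels, test_size=0.25, random_state=42)

# Tokenizer/vectoriser t (500 tf-idf features, as in Table 1) followed by
# the black-box classifier f
pipeline = make_pipeline(
    TfidfVectorizer(max_features=500),
    RandomForestClassifier(random_state=42))
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))

This composed pipeline corresponds to the f(t(·)) notation used throughout Section 3.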
3. Counterfactual XAI for Vulnerability Detection
The use case of code vulnerability detection can benefit from different types of explanations. For example, given a code segment that is labelled "Vulnerable", a factual explanation will point to the part of the code segment that led the AI model to label it as "Vulnerable". An example factual explanation is shown in Figure 4, where text highlights indicate "Vulnerable" and "Not Vulnerable" code on a red-to-green heat map scale. For an expert user this type of explanation should be sufficient, given that they have the knowledge to correct the vulnerability. In contrast, a counterfactual explanation will compare the query with a similar yet 'Not Vulnerable' code segment and identify how to correct the vulnerable code segment. A counterfactual example is presented in Figure 5, where the algorithm has highlighted the substitution changes guided by the nearest unlike neighbour. This type of explanation is informative for both expert and non-expert users. In this paper, we focus on generating counterfactual explanations and introduce the DisCERN counterfactual algorithm, specifically designed for code vulnerability correction. DisCERN was originally proposed in [7] for finding counterfactuals in tabular data.

Figure 4: Factual explanation example.
Figure 5: Counterfactual explanation example.

3.1. DisCERN Counterfactual Explanations
Consider a query code segment x with m statements, where the i-th statement is denoted by s_i. If the vulnerability detection pipeline used to predict code vulnerability consists of a tokeniser, t, and a classification model, f, the outcome predicted for x is y:

    x = [s_1, s_2, ..., s_m],    y = f(t(x))    (1)

For a given query x, there are five steps to discovering counterfactuals with DisCERN:
1. find the nearest unlike neighbour (NUN), x̂;
2. find the feature relevance weights for the query, x, using the feature relevance explainer LIME [8];
3. given a token, z, in x, find a statement pair, i.e. a list of statements in x and a list of candidate statements in x̂, as a potential vulnerability correction;
4. create a perturbed code segment, x′, by adapting a vulnerability correction, and check x′ for outcome change using the vulnerability detection pipeline; and
5. repeat steps 3 and 4 until the desired outcome is achieved.
Once the perturbed code segment achieves the desired outcome (i.e. not vulnerable), it is identified as the counterfactual of the query. Next, we explore each of these steps in detail.

3.1.1. Finding the Nearest Unlike Neighbour
Given a query x, the NUN, x̂, is the nearest instance found in the train data with a different outcome. In theory, by selecting the NUN as the starting point of counterfactual discovery, we expect to minimise the actionable changes needed to flip the classifier decision. As in Equation 2, x̂ has n statements and its predicted outcome is ŷ. Importantly, x̂ and x may differ in their number of statements (i.e. n ≠ m) but must differ in their prediction outcomes (i.e. ŷ ≠ y):

    x̂ = [ŝ_1, ŝ_2, ..., ŝ_n],    ŷ = f(t(x̂)) | ŷ ≠ y    (2)

Any encoder that transforms text to a vector representation can be used to model the feature space in which the NUN is found. We use the Sentence Transformers library (https://www.sbert.net/) to encode code segments. Sentence Transformers is a publicly available collection of state-of-the-art sentence encoder models based on the Sentence-BERT architecture [9], which uses a BERT encoder in a Siamese configuration trained for similarity comparison. Sentence-BERT can be trained using data from different domains to suit different tasks; however, there is no encoder trained specifically for Java code. Accordingly, we use the generic pre-trained all-MiniLM-L6-v2 encoder, which is trained on a large and diverse set of training pairs to support multiple domains.

Given a code segment, x, the encoder E generates a vector representation, v, of size l. From the train dataset X we filter the data instances for which y_i ≠ y to create the subset X′. Each data instance in X′ is encoded using the same encoder E to obtain the set of vectors V′. The cosine similarity between the query x and any other instance x_i is calculated as in Equation 3:

    cosine(x, x_i) = ( Σ_{j=1}^{l} v_j v_ij ) / ( √(Σ_{j=1}^{l} v_j²) · √(Σ_{j=1}^{l} v_ij²) )    (3)

Once pair-wise similarity is computed (between x and each x_i in X′), we select the train instance x_i with the highest similarity as the NUN of x. In the rest of this paper this function is referred to as nn, which, given x, X′ and the similarity metric, returns x̂.
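A minimal sketch of the nn function under these definitions, using the sentence-transformers package (the function and variable names are our own):

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # the encoder E

def nn(x, X_prime):
    """Return the NUN of code segment x from X_prime, the list of training
    instances whose predicted label differs from that of x."""
    v = encoder.encode(x, convert_to_tensor=True)              # query vector v
    V_prime = encoder.encode(X_prime, convert_to_tensor=True)  # candidate vectors V'
    scores = util.cos_sim(v, V_prime)[0]                       # Equation 3 against every candidate
    return X_prime[int(scores.argmax())]                       # highest cosine similarity wins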
3.1.2. Finding Feature Relevance Weights
In DisCERN we hypothesise that if the parts of the code that contributed most to the current outcome are substituted, we can find the minimal changes needed to flip the class outcome. Feature relevance explainers can provide the knowledge needed to identify these important parts of the code. Accordingly, in this section we use the LIME feature relevance explainer to find the feature relevance weights of the query, identifying which parts of the code contributed to the current outcome.

LIME is a model-agnostic feature relevance explainer that creates an interpretable model around a data instance to estimate how each feature contributed to the black-box model outcome [8]. LIME creates a set of perturbations within the instance's neighbourhood and labels them using the black-box model. This newly labelled dataset is used to create a linear interpretable model (e.g. a weighted linear regression model). The resulting surrogate model is interpretable and only locally faithful to the black-box model (i.e. it correctly classifies the input instance, but not all data instances outside its immediate neighbourhood). The interpretable model is used to predict and explain the classification outcome of the data instance. The explanation of the current outcome is formed from the weights, which indicate how each feature contributed to the outcome.

In the context of code segment data, the tokenizer, t, used in the vulnerability detection pipeline considers language tokens in the code as "features". Accordingly, LIME assigns each token a weight which indicates how much the token contributes to the classification outcome. If the vocabulary of code segments is Z, LIME assigns a weight w to each token z in Z as in Equation 4:

    LIME(x, t, f) → {w(z) | w(z) ∈ ℝ, z ∈ Z}    (4)

A positive weight (w ≥ 0) indicates that the corresponding token contributes positively towards the prediction, and a negative weight (w < 0) that it contributes negatively. The weights are sorted using the partial order condition R in Equation 5 to obtain Z′, the list of tokens ordered by their contribution towards the current outcome:

    z_i ⪯_R z_j  ⟺  w(z_i) ≥ w(z_j)    (5)
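For illustration, a minimal sketch of obtaining the ordered token list Z′ with the lime package, assuming the scikit-learn pipeline from Section 2.2 (the helper name ordered_tokens and the parameter values are our own):

from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["Not Vulnerable", "Vulnerable"])

def ordered_tokens(x, pipeline, num_features=50):
    """Return the tokens of code segment x sorted by their contribution
    to the predicted outcome, i.e. the list Z' of Equation 5."""
    exp = explainer.explain_instance(
        x, pipeline.predict_proba, num_features=num_features)
    # exp.as_list() yields (token, weight) pairs; sorting by descending
    # weight applies the partial order of Equation 5
    pairs = sorted(exp.as_list(), key=lambda p: p[1], reverse=True)
    return [z for z, w in pairs]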
3.1.3. Segment Corrections with the Substitution Operation
Given a language token in the query, the goal of the substitution algorithm is to find a matching set of statements in the query and in the NUN, and to adapt the query so that it contributes towards the desired outcome change (i.e. from vulnerable to not vulnerable). A pattern matching (pm) algorithm is used to find matching statement blocks for substitution, as presented in Algorithm 1.

Algorithm 1 substitute
Require: x′ = [s′_1, s′_2, ..., s′_m]: perturbed query
Require: x̂ = [ŝ_1, ŝ_2, ..., ŝ_n]: NUN as a list of statements
Require: z: token in the query
1: S′ ← [s ∈ x′ | z ∈ s]                          ◁ find the list of statements in x′ that include z
2: for s_j ∈ S′ do
3:     s′_ik, ŝ_gh ← pm(s_j, [s′_1, ..., s′_m], [ŝ_1, ..., ŝ_n])    ◁ find the suggested change
4:     c_j = cosine(s′_ik, ŝ_gh)                  ◁ similarity between current and suggested code
5: end for
6: (s′, ŝ′) ← arg max_(s′_ik, ŝ_gh) c_j           ◁ select the maximum similarity pair, i.e. similar yet corrected
7: x′′ ← adapt(x′, s′, ŝ′)                        ◁ replace s′ in x′ with ŝ′
8: return x′′                                     ◁ return the newly perturbed query

Since the feature relevance explainers used with DisCERN identify important tokens, the first step involves finding the list of statements S′ in the query that contain the important token z. We use a simple lookup to identify all code statements in the perturbed query x′ that contain the token z (Line 1). The next steps, finding the vulnerable statements and their replacements from the NUN, are based on the hypothesis that if a statement s_j in S′ is vulnerable, it must have been corrected in the NUN. Accordingly, for the statements in S′, we use the edit changes proposed by a pattern matching (pm) algorithm and select one change (based on statement-level similarity) as the most likely substitution to correct the code vulnerability. A pm algorithm, such as Gestalt Pattern Matching or Levenshtein edit distance, finds the minimum changes required to transform one string into another; the changes can be "replace", "insert" and "delete". (In this paper we use the Gestalt Pattern Matching algorithm implemented by the cdifflib Python package, https://github.com/mduggan/cdifflib.) We apply pm at the statement granularity to find which changes are proposed for a given statement, s_j. The pm algorithm returns pairs consisting of a list of statements in the query (s′_ik) and a list of statements in the NUN (ŝ_gh), where i, k and g, h are start and end statement indices and s_j falls within s′_ik (Line 3). Next, we calculate the similarity between the two statement lists using cosine similarity; as in Section 3.1.1, we use the all-MiniLM-L6-v2 encoder to transform the statements into vector representations. Once we have all the pairs for S′ and their similarities, we select the pair (s′, ŝ′) with the maximum similarity. We assume a vulnerable code segment and its corrected counterpart are different yet carry some similarity; accordingly, by selecting the highest-similarity pair we expect to discard any changes found by pm that are not vulnerability corrections. Finally, in Line 7, we replace the list of statements s′ in the perturbed query x′ with the list of statements ŝ′ to return the new perturbed query x′′.
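A minimal sketch of Algorithm 1 using Python's difflib, whose SequenceMatcher implements Gestalt pattern matching (cdifflib's CSequenceMatcher is a drop-in replacement); the helper structure is our own, and implementation details such as tie-breaking may differ in DisCERN:

from difflib import SequenceMatcher
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def pm(s_j, x_prime, x_hat):
    """Return the (i, k), (g, h) spans of the edit that covers statement s_j."""
    for tag, i, k, g, h in SequenceMatcher(None, x_prime, x_hat).get_opcodes():
        # 'replace', 'delete' and 'insert' opcodes are candidate corrections
        if tag != "equal" and s_j in x_prime[i:k]:
            return (i, k), (g, h)
    return None

def substitute(x_prime, x_hat, z):
    """Algorithm 1: apply the most similar matched substitution for token z."""
    candidates = []
    for s_j in [s for s in x_prime if z in s]:          # S' (Line 1)
        spans = pm(s_j, x_prime, x_hat)                 # Line 3
        if spans is None:
            continue
        (i, k), (g, h) = spans
        a = encoder.encode(" ".join(x_prime[i:k]), convert_to_tensor=True)
        b = encoder.encode(" ".join(x_hat[g:h]), convert_to_tensor=True)
        candidates.append((float(util.cos_sim(a, b)), i, k, g, h))  # Line 4
    if not candidates:
        return x_prime
    _, i, k, g, h = max(candidates)                     # Line 6
    return x_prime[:i] + x_hat[g:h] + x_prime[k:]       # adapt (Line 7)

Both x_prime and x_hat are lists of statement strings, so SequenceMatcher aligns whole statements rather than characters.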
3.1.4. DisCERN Counterfactual Discovery

Algorithm 2 DisCERN
Require: x = [s_1, s_2, ..., s_m]: query as a list of statements
Require: f(t(.)): vulnerability detection pipeline
Require: X: train dataset
Require: y = f(t(x)): predicted outcome of the query
1: X′ ← {x_i ∈ X | y_i ≠ y}         ◁ filter the train dataset
2: x̂ ← nn(x, X′, sim)              ◁ find the NUN using function nn
3: {w(z)} ← LIME(x, t, f)           ◁ the set of weighted tokens from LIME in Equation 4
4: Z′ ← R({w(z)})                   ◁ list of tokens ordered using R in Equation 5
5: Initialise x′ = x and y′ = y     ◁ initialise the counterfactual as the query
6: for z ∈ Z′ do
7:     x′ ← substitute(x′, x̂, z)   ◁ from Algorithm 1
8:     y′ = f(t(x′))                ◁ predict the outcome of the perturbed query x′
9:     if y′ ≠ y then               ◁ check if the outcome is changed
10:        break                    ◁ stop perturbing if the outcome is changed
11:    end if
12: end for
13: return x′                       ◁ return the perturbed query as the counterfactual

DisCERN (Algorithm 2) brings together the methods from Sections 3.1.1 to 3.1.3 to discover counterfactuals. Given the query x and the train dataset X, in Lines 1 and 2 we find the NUN as discussed in Section 3.1.1. Next, we find the LIME feature weights for the query and use these to sort the list of tokens, indicating which parts of the code contributed to the current outcome (Lines 3 and 4). We then iterate over the list of tokens, considering a substitution correction for each token until the prediction outcome changes. In each iteration, for a given token z, Algorithm 1 finds the best matching adaptation and creates a new perturbed query x′. A prediction for the adapted x′ is then obtained using the original classification pipeline f(t(.)). The iteration terminates when a change in the outcome is observed, and the algorithm returns the perturbed query x′ as the counterfactual.
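Putting the pieces together, a minimal end-to-end sketch of Algorithm 2, reusing the nn, ordered_tokens and substitute sketches above (these helper names are our own; here a code segment is a list of statement strings, and pipeline is the scikit-learn pipeline from Section 2.2):

def discern(x, pipeline, X_train, y_train):
    """Algorithm 2: return a counterfactual for query x, or None if none is found."""
    y = pipeline.predict(["\n".join(x)])[0]
    # Line 1: keep only training instances with a different predicted outcome
    X_prime = [x_i for x_i, y_i in zip(X_train, y_train) if y_i != y]
    # Line 2: NUN search over the joined code segments
    x_hat = nn("\n".join(x), ["\n".join(x_i) for x_i in X_prime]).split("\n")
    x_prime = list(x)                                       # Line 5
    for z in ordered_tokens("\n".join(x), pipeline):        # Lines 3, 4 and 6
        x_prime = substitute(x_prime, x_hat, z)             # Line 7
        if pipeline.predict(["\n".join(x_prime)])[0] != y:  # Lines 8 and 9
            return x_prime                                  # outcome flipped
    return None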
4. Evaluation
This section presents a preliminary evaluation of the DisCERN counterfactual algorithm for vulnerable code correction. To the best of our knowledge, there are no other algorithms in the literature for counterfactual discovery in this application. Accordingly, we are not able to compare performance metrics with other algorithms in this work. Instead, we compare DisCERN with an ablated version, as listed below.

DisCERN-rand: considers a random ordering of tokens in Algorithm 2, instead of the LIME feature relevance ordering of Lines 3 and 4. This ablation evaluates the impact of using feature relevance knowledge to guide efficient counterfactual discovery.
DisCERN: the full algorithm as presented in Algorithm 2.

4.1. Quantitative Evaluation
The DisCERN algorithm is evaluated using the NIST Java dataset. We use only the 10019 test instances that were classified as 'Vulnerable' by the vulnerability detection pipeline. The following metrics are used to measure the performance of the algorithm:

• Validity: measures the percentage of data for which DisCERN successfully finds a counterfactual. At this stage, the only requirement for a counterfactual discovered by DisCERN is to achieve a "positive" change of outcome. (A more stringent metric would evaluate whether the change conforms to the grammar rules of the Java compiler, which we will explore in future work.) Given the set of test instances predicted 'Vulnerable', X_v, and the subset for which DisCERN found a counterfactual, X_vc, validity is calculated as in Equation 6. A higher validity percentage is desirable; unlike DisCERN for tabular data, with unstructured data (like code) we cannot yet guarantee the validity of the generated code.

    Validity = ( |X_vc| / |X_v| ) × 100    (6)

• Sparsity: measures the mean number of statements changed to achieve the change in outcome. Given the number of statements changed in each test instance in X_vc, [r_1, r_2, ..., r_N], where N = |X_vc|, sparsity is calculated as in Equation 7. In Algorithm 1, the number of statement changes for the operations "replace", "delete" and "insert" is calculated as max(k−i, h−g), k−i and h−g respectively. A lower sparsity is desirable.

    Sparsity = (1/N) Σ_{j=1}^{N} r_j    (7)

In addition to validity and sparsity, we report the mean number of statements in the counterfactuals discovered by each algorithm, to contrast with the mean number of statements in the test instances, which is 44.62.

4.1.1. Results
Table 3 presents the quantitative evaluation results of DisCERN on the NIST Java dataset.

Table 3: Quantitative results

    Algorithm      Validity (%)   Sparsity   Mean Number of Statements in Counterfactual
    DisCERN-rand   83.48          12.44      49.28
    DisCERN        96.49          13.58      50.62

DisCERN finds counterfactuals for 96.49% of the test instances, while the random version finds counterfactuals for only 83.48% (13 percentage points fewer). Sparsity is similar for both algorithms: each needs a comparable number of statement changes (12∼14) to flip the class of the query. DisCERN-rand records a slightly lower sparsity, which could be explained by the 13% of counterfactuals additionally found by DisCERN having longer code segments (thereby increasing the number of statement changes). Both DisCERN and DisCERN-rand record comparable values for the mean number of statements in the counterfactuals; compared with the mean number of statements in the query, both algorithms find counterfactuals with 4∼6 more statements. Interestingly, the mean number of statements in the NUNs was 51.81; however, this does not mean that DisCERN completely changed each test instance into its NUN to achieve the outcome change. We explore this further in the qualitative evaluation (Section 4.2).

4.2. Qualitative Evaluation
In the qualitative evaluation we present a few examples of counterfactuals found by the DisCERN algorithm. In this preliminary work, we do not check whether the counterfactual-based code corrections follow the grammar rules defined by the Java compiler. However, we inspected a sample of the generated code to examine whether the changes indeed correct code vulnerabilities and are sensible modifications.

Figure 6: DisCERN counterfactual examples. (a) Example 1: successful adaptation; (b) Example 2: unsuccessful adaptation.

Consider Figures 6a and 6b, which present two counterfactual examples. In each, the query is on the left, the NUN in the middle, and the counterfactual (CF) on the right; the adaptations are highlighted with coloured boxes and arrows. The example in Figure 6a shows how feature relevance weights have guided the algorithm to select correct adaptations that perturb the query into a counterfactual similar to the NUN while achieving the class outcome change. In contrast, the example in Figure 6b shows that in some cases the adaptation does not result in a grammatically correct counterfactual. In such cases, the vulnerability detection pipeline may recognise the perturbed query as "Not Vulnerable", but the adaptations applied to the code have lost some of the original functionality. Such examples suggest that DisCERN is a promising approach to discovering counterfactuals; however, it requires further adaptation heuristics and code-correction knowledge to ensure accurate code corrections.
5. Conclusion
In this paper, we presented a novel approach to finding counterfactual explanations for correcting vulnerabilities in code. We used feature relevance explainer knowledge as a proxy to discover potentially vulnerable code statements, and then used a novel substitution algorithm based on pattern matching to find corrections from the nearest unlike neighbour. The overall aim of our algorithm was to identify vulnerabilities and correct them with the minimal changes necessary. We evaluated our algorithm using the NIST Java SARD dataset and found that DisCERN discovers counterfactuals 96% of the time, with 13∼14 statement changes to the query. Additionally, we presented example counterfactuals found using DisCERN to qualitatively evaluate the algorithm. These suggest that further correction operations and heuristics are needed to ensure plausible code changes. Future work will also expand our evaluation to include additional SARD datasets in different programming languages and the use of qualitative evaluation through crowd-sourcing techniques.

Acknowledgments
This research is funded by the iSee project (https://isee4xai.com), which received funding from EPSRC under grant number EP/V061755/1. iSee is part of the CHIST-ERA pathfinder programme for European coordinated research on future and emerging information and communication technologies. The authors would like to thank Dr. Harsha Kalutarage and Ashish Upadhyay at Robert Gordon University for their enlightening discussions.

References
[1] R. Russell, L. Kim, L. Hamilton, T. Lazovich, J. Harer, O. Ozdemir, P. Ellingwood, M. McConley, Automated vulnerability detection in source code using deep representation learning, in: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2018, pp. 757–762.
[2] Z. Bilgin, M. A. Ersoy, E. U. Soykan, E. Tomur, P. Çomak, L. Karaçay, Vulnerability prediction from source code using machine learning, IEEE Access 8 (2020) 150672–150684.
[3] B. Chernis, R. Verma, Machine learning methods for software vulnerability detection, in: Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics, 2018, pp. 31–39.
[4] H. K. Dam, T. Tran, T. Pham, S. W. Ng, J. Grundy, A. Ghose, Automatic feature learning for vulnerability prediction, arXiv preprint arXiv:1708.02368 (2017).
[5] Z. Li, D. Zou, J. Tang, Z. Zhang, M. Sun, H. Jin, A comparative study of deep learning-based vulnerability detection system, IEEE Access 7 (2019) 103184–103197.
[6] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the black box: Automated decisions and the GDPR, Harv. JL & Tech. 31 (2017) 841.
[7] N. Wiratunga, A. Wijekoon, I. Nkisi-Orji, K. Martin, C. Palihawadana, D. Corsar, DisCERN: Discovering counterfactual explanations using relevance features from neighbourhoods, arXiv preprint arXiv:2109.05800 (2021).
[8] M. T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?": Explaining the predictions of any classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.
[9] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, arXiv preprint arXiv:1908.10084 (2019).