<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Analysis of C/C++ Datasets for Machine Learning-Assisted Software Vulnerability Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Grahn</string-name>
          <email>dan.grahn@wright.edu</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junjie Zhang</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>As machine learning-assisted vulnerability detection research matures, it is critical to understand the datasets being used by existing papers. In this paper, we explore 7 C/C++ datasets and evaluate their suitability for machine learning-assisted vulnerability detection. We also present a new dataset, named Wild C, containing over 10.3 million individual open-source C/C++ files, a sufficiently large sample to be reasonably considered representative of typical C/C++ code. To facilitate comparison, we tokenize all of the datasets and perform the analysis at this level. We make three primary contributions. First, while all the datasets differ from our Wild C dataset, some do so to a greater degree. This includes divergence in file lengths and token usage frequency. Additionally, none of the datasets contain the entirety of the C/C++ vocabulary. These missing tokens account for up to 11% of all token usage. Second, we find all the datasets contain duplication with some containing a significant amount. In the Juliet dataset, we describe augmentations of test cases making the dataset susceptible to data leakage. This augmentation occurs with such frequency that a random 80/20 split has roughly 58% overlap of the test with the training data. Finally, we collect and process a large dataset of C code, named Wild C. This dataset is designed to serve as a representative sample of all C/C++ code and is the basis for our analyses.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Proceedings of the Conference on Applied Machine Learning for Information Security, 2021</p>
      <p>Vulnerability detection datasets are quite different from other types of machine learning
datasets because they require cybersecurity experts to provide labels. Thus, datasets cannot be
easily crowd-sourced using tools such as Mechanical Turk and are far more expensive to produce.
Many dataset producers have found ways to avoid this problem, but their methods run the risk
of introducing biases into the data.</p>
      <p>These biases may result in a model that fails to generalize. If the datasets portray a limited
view of how C/C++ code is written, models trained on them may not capture the full diversity of the language.
For example, a natural-language model trained only on the collected works of Dr. Seuss would
not be expected to perform well on Shakespeare, Twitter, or any number of other sources. It is
these biases and any additional shortcomings that we seek to uncover.</p>
      <p>In this paper, we explore 7 vulnerability datasets in the C/C++ language family. These
datasets were selected based on their usage and to provide a variety of perspectives on machine
learning-assisted vulnerability detection. The datasets can be categorized along two dimensions.
The first is granularity, the level at which the information is sampled: functions, files, scripts,
and projects. Function-level datasets contain only the signatures and contents of functions.
File-level datasets contain the contents of a single file. Unless the file happens to be independent, they are
typically not compilable. Scripts are single- or multi-file programs with a single purpose, such
as demonstrating a vulnerability. Projects contain the entirety of an application derived from
a publicly accessible repository. The second dimension is whether the contents are compilable.
Functions and files are typically not compilable while scripts and projects are.</p>
      <p>Our paper makes three contributions. First, we analyze the representativeness of each of the
datasets. We find that datasets drawn from existing code-bases are more representative than
hand-crafted datasets. Second, we analyze the duplicativeness of the datasets. We find that
all of them contain duplication with some containing a significant amount. Finally, we collect
and process a large dataset of C code, named Wild C. This dataset is designed to serve as a
representative sample of all C/C++ code and is the basis for our analyses.</p>
    </sec>
    <sec id="sec-2">
      <title>Datasets</title>
      <sec>
        <title>Big-Vul</title>
        <p>
          Big-Vul, published in Fan et al. [
          <xref ref-type="bibr" rid="ref17">19</xref>
          ], is available as a repository of scripts and CSV files [
          <xref ref-type="bibr" rid="ref17">19</xref>
          ]. The
dataset was collected by crawling the Common Vulnerabilities and Exposures (CVE) database
[
          <xref ref-type="bibr" rid="ref41">46</xref>
          ] and linking the CVEs with open-source GitHub projects. Using commit information, the
authors extracted code changes related to the CVE. The resulting CSV files contain extracted
functions before and after the commit that fixed the vulnerability. The scripts are included for
reproducibility of this process, but we were unable to get them to execute properly. Thankfully,
a 10 GB CSV containing all of the processed data is available for download.
        </p>
      </sec>
      <sec id="sec-2-1">
        <title>SonarCloud Vulnerable Code Prospector for C (SVCP4C)</title>
        <p>
          Raducu et al. [
          <xref ref-type="bibr" rid="ref46">51</xref>
          ] take a different approach to collecting vulnerable code. Instead of relying on
the existing datasets provided by NIST or the CVE database, they draw from open-source projects
whose code is processed using the SonarCloud vulnerability scanner [
          <xref ref-type="bibr" rid="ref54">59</xref>
          ]. This is performed
directly through the SonarCloud API which allows public access to scrape-friendly vulnerability
data. SVCP4C is technically a tool for collecting data. However, the authors do provide a dataset
in the paper. This is the data that we review. All files in the dataset contain vulnerabilities and
comments detailing the vulnerable lines.
        </p>
        <p>
          Juliet is the largest hand-created<sup>1</sup> C/C++ vulnerability dataset with entire programs [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The
dataset is available in C/C++, C#, and Java variants. Each has a large number of test cases,
but C/C++ is the largest with 64,099. The test cases are divided by CWE, although some cases
contain multiple CWEs. Each test case can be compiled into a separate program or combined
into a monolithic binary. Compilation options allow the test cases to be compiled into safe or
vulnerable versions with minimal code changes. Some test cases are only compilable on Windows
machines, but the majority are cross-platform.
        </p>
        <p>
          In a brief survey, we found at least 23 papers that used the Juliet dataset directly.
Additionally, Juliet is a major component of the National Institute of Standards and Technology
(NIST) Software Assurance Reference Dataset (SARD) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. When large datasets are drawn from
the SARD, they are likely relying upon Juliet in some way. Because of this prevalence, Juliet
deserves an extra level of scrutiny.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>ManyBugs &amp; IntroClass</title>
        <p>
          ManyBugs and IntroClass are a pair of datasets presented by Le Goues et al. [
          <xref ref-type="bibr" rid="ref29">33</xref>
          ]. These datasets
are designed to be a benchmark for automated repair methods. ManyBugs contains 185 defects
across 9 open-source programs. These defects were collected from version control. In total, it
has 5.9 million lines of code and 10,000+ test cases. IntroClass consists of 998 defects from
student submissions of six programming assignments. It includes input/output test cases for
each programming assignment.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>A Taxonomy of Buffer Overflows</title>
        <p>A Taxonomy of Buffer Overflows is unique because it attempts to create a structured taxonomy
of buffer overflows based on 22 attributes. The result is 291 different buffer overflows. For each
type, three flawed examples (overflow just outside, somewhat outside, and far outside) and a
non-vulnerable version are included. This results in a total of 873 vulnerabilities. Due to the
diversity of vulnerabilities in this dataset, it provides a distinctive opportunity for testing a
vulnerability detection method against a full range of possibilities. Taxonomy is included as
part of the NIST SARD.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Draper Vulnerability Detection in Source Code (VDISC)</title>
        <p>
          The Draper VDISC dataset was produced as part of the Defense Advanced Research Projects
Agency’s (DARPA) Mining and Understanding Software Enclaves (MUSE) project [
          <xref ref-type="bibr" rid="ref49">54</xref>
          ]. To
build the dataset, the authors collected code from the Debian Linux distribution and public
Git repositories from GitHub. They split the code into functions and then removed duplicate
functions using a custom minimal lexer. The strict process used by the authors for removing
duplicates resulted in only 10.8% of the collected functions being included in the dataset.
        </p>
        <p>The authors labeled the remaining functions by using three open-source static source-code
analyzers: Clang, Cppcheck, and Flawfinder. Because each of these tools has disparate outputs,
the authors mapped the results into their corresponding CWEs. Despite including code from
the Juliet dataset in their internal dataset, the authors do not include it in the publicly released
version.</p>
        <p><sup>1</sup> Juliet is generated using custom software, but the test cases have been created by hand. The software is not
publicly available.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Wild C</title>
      <p>
        For this paper, we want to compare the datasets to realistic C/C++ source code. It is beyond
the scope of this (or any) paper to collect all C/C++ source code. Instead, we created a dataset
named Wild C from GitHub repositories [
        <xref ref-type="bibr" rid="ref21">23</xref>
        ].
      </p>
      <p>To collect these repositories, we made use of GitHub’s public search API using a simple
scraping algorithm. At the time of writing, GitHub had limitations on its API that made
collection challenging. First, the search endpoint we used is rate-limited to 5,000 requests per
hour. This limits the queries to one every 0.72 seconds on average. Because cloning repositories
takes some time, we did not encounter this limit in practice. However, a simple solution
would be to perform rate-limiting on the client side.</p>
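      <p>The client-side rate limiting mentioned above could be sketched as follows. This is an illustrative sketch, not the collection code used for Wild C: the class name MinIntervalLimiter and its parameters are hypothetical, and only the figure of 5,000 requests per hour comes from the text.</p>
      <preformat>
```python
import time


class MinIntervalLimiter:
    """Blocks so that consecutive calls are at least `interval` seconds apart."""

    def __init__(self, requests_per_hour, clock=time.monotonic, sleep=time.sleep):
        # 5,000 requests per hour gives 3600 / 5000 = 0.72 seconds per request.
        self.interval = 3600.0 / requests_per_hour
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Call before each request; sleeps only if the previous call was too recent."""
        now = self._clock()
        if self._next_allowed is not None and self._next_allowed > now:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


limiter = MinIntervalLimiter(requests_per_hour=5000)
print(round(limiter.interval, 2))  # 0.72
```
      </preformat>
      <p>Calling limiter.wait() before every search request keeps the client under the limit even when cloning is fast. The clock and sleep parameters are injectable only so that the behaviour can be checked without real delays.</p>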
      <p>Second, GitHub will only return 1,000 results per search query. This means that our search
queries must be limited to under 1,000 results. We accomplished this by searching for repositories
with at most a certain number of stars and sorting the results by the number of
stars (descending). We then iterated over the search results until we encountered a page ending
with a repository starred fewer times than our current search maximum. Instead of requesting
another page, we changed the search to lower the maximum number of stars.</p>
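      <p>A minimal sketch of this star-bounded paging loop is given below. It runs against an in-memory stand-in rather than the real GitHub API: fake_search, collect, the page size, and the result cap are all hypothetical, and the handling of ties (stepping the maximum down by one when every result has the same star count) is an assumption not described in the text.</p>
      <preformat>
```python
def fake_search(repos, max_stars, page, per_page=3, result_cap=6):
    """Stand-in for a search endpoint: repositories with at most max_stars
    stars, sorted by stars descending, capped at result_cap results per query."""
    matches = sorted((r for r in repos if max_stars >= r[1]), key=lambda r: -r[1])
    matches = matches[:result_cap]
    start = (page - 1) * per_page
    return matches[start:start + per_page]


def collect(repos, min_stars, search=fake_search):
    """Collects every repository with at least min_stars stars."""
    seen = {}
    max_stars = max(stars for _, stars in repos)
    while max_stars >= min_stars:
        page = 1
        lowered = False
        while True:
            results = search(repos, max_stars, page)
            if not results:
                break
            for name, stars in results:
                if stars >= min_stars:
                    seen[name] = stars  # dict keys de-duplicate repeated results
            last_stars = results[-1][1]
            if max_stars > last_stars:
                # The page ended below the current maximum: lower the maximum
                # instead of requesting another page.
                max_stars = last_stars
                lowered = True
                break
            page += 1
        if not lowered:
            # Results ran out while every result still had max_stars stars
            # (assumed tie handling): step the maximum down by one.
            max_stars -= 1
    return seen


repos = [("a", 50), ("b", 40), ("c", 40), ("d", 30), ("e", 20),
         ("f", 12), ("g", 10), ("h", 9), ("i", 3)]
print(sorted(collect(repos, min_stars=10)))
# ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```
      </preformat>
      <p>Because the maximum is lowered to the star count of the last repository seen, repositories with exactly that count are returned again by the next query, which is why results are stored in a dictionary keyed by name.</p>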
      <p>Using this method, we were able to collect 36,568 repositories with at least 10 stars each.
While there are many repositories with fewer than 10 stars, we found that they contained far less
code and were likely to have a "spike" of commits followed by little-to-no activity. This indicates
that most of these repositories are likely to be one-off projects, programming assignments, and
similar.</p>
      <p>The collected repositories contain 9,068,351 C and 3,098,624 C++ files for a total of
12,166,975 source code files. In addition to using 10 stars as a cutoff to prevent
diminishing returns, we also use it as a soft metric to assess approval by outside reviewers. The code
collected is effectively a sample of C/C++ that is present in public repositories with some degree
of community acceptance. There are a few areas where Wild C may not be entirely
representative. First, it may favor code that complies with community standards, which are strongly
encouraged on GitHub. Second, it may favor less buggy code as many of the projects may have
active communities. Finally, code in private repositories may differ from public repositories due
to the code’s functionality being necessarily private or due to the intrinsic privacy of the code.
No one sees the hidden bad practices. Despite these potential areas of divergence, we believe
that the collection methods and the size of the dataset indicate it is sufficiently close to a truly
representative sample of C/C++ for our purposes.</p>
      <p>We next extracted tokens from each file. For ease of use, the C/C++ source and tokens were
packaged into a collection of Parquet files. While the dataset is licensed as CC-BY 4.0, the
individual source files remain under the licenses of their original repositories. We have released this dataset
for public consumption. To the best of our knowledge, it is the first public dataset of C/C++
code and paired tokens of this size.</p>
    </sec>
    <sec id="sec-4">
      <title>Preprocessing</title>
      <p>Comparing the datasets requires that they be in a consistent format. This is a difficult task
since they are not available in a standard format. Some datasets contain whole software projects,
others single files, others individual functions. Ideally, we would be able to compile all of the
files. With compiled files, we could compare their source, assembly, and binary format. However,
only a few of the datasets are compilable. Thus, we will limit our comparison to source code.</p>
      <p>As the first step, we downloaded the datasets and extracted all code into C-files. This
worked best when the datasets already contained whole projects or whole files. When the
datasets contained functions, we extracted each function into a separate file. While this results
in invalid C-files, it allows us to trace later steps directly to the function.</p>
      <p>[Table 4 (columns of the token CSV files), surviving entry: character on which the token starts, relative to line start.]</p>
      <p>
        With all the code in C-files, we tokenize the source using ANother Tool for Language
Recognition, also known as ANTLR [
        <xref ref-type="bibr" rid="ref45">50</xref>
        ]. ANTLR is a generic parser generator that has an existing
context-free grammar for C. Each of the C-files was converted to tokens in a CSV format.
The CSV files contain columns listed in Table 4. These CSV files are the basis for all of the
comparisons.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <sec id="sec-5-1">
        <title>Number of Tokens Per File</title>
        <p>
          For a machine learning model to generalize, the distribution of data should remain consistent
from training to inference. The further the distance between these distributions, the less likely
the model is to generalize. Our first comparison is the number of tokens per file aggregated for
each dataset. In other words, this allows us to compare the file lengths across different datasets.
Figure 1 plots the kernel density estimate for each dataset. The x-axis is the number of tokens
in a given file and the y-axis is the estimated density of files that contain the specific number of
tokens. As is evident, the vulnerability datasets are quite different from Wild C. We quantified
this using energy distance [
          <xref ref-type="bibr" rid="ref58">64</xref>
          ] between each histogram and the histogram for Wild C and present
the results in Table 3 (Hist. Dist. column). While one could hope for better agreement, the
results are expected. ManyBugs is collected from several large open-source projects, similar to
Wild C, and has the closest agreement. Conversely, Taxonomy of Buffer Overflows contains
minimalist examples of buffer overflows, which causes a spike in the KDE around 100 tokens per
file and the maximum energy distance. While Juliet has a better than average distance, it lacks
some of the longer files found in Wild C.
        </p>
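        <p>For readers unfamiliar with energy distance, a small pure-Python sketch for one-dimensional samples follows. It is an assumption-laden illustration: it works on raw per-file token counts rather than on the histograms used in the paper, the sample values are invented, and the quadratic pairwise loop is only practical for small samples.</p>
        <preformat>
```python
from itertools import product
from math import sqrt


def mean_abs_diff(xs, ys):
    """Mean of |x - y| over all pairs (x, y)."""
    return sum(abs(x - y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Energy distance between two one-dimensional samples:
    sqrt(2 E|X - Y| - E|X - X'| - E|Y - Y'|)."""
    squared = 2 * mean_abs_diff(xs, ys) - mean_abs_diff(xs, xs) - mean_abs_diff(ys, ys)
    return sqrt(max(squared, 0.0))  # guard against tiny negative rounding error


# Invented tokens-per-file samples, for illustration only.
wild = [120, 450, 900, 2300, 5200]
similar = [150, 400, 1000, 2000, 5000]
minimal = [95, 100, 105, 110, 98]

print(energy_distance(wild, similar))
print(energy_distance(wild, minimal))
```
        </preformat>
        <p>A sample clustered around 100 tokens per file ends up much further from the broad reference sample than one with a similar spread, which mirrors the qualitative pattern described above.</p>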
      </sec>
      <sec id="sec-5-2">
        <title>Token Usage by Dataset</title>
        <p>Next, we compare the total usage of each token by dataset to its usage in Wild C. This measures
how frequently each dataset uses the tokens. One of the most important observations is that each
dataset is missing some of the token types. While Draper VDISC misses only two uncommon
tokens (alignas, noexcept), Taxonomy of Buffer Overflows misses tokens such as Not, /, and
this. Juliet misses tokens such as +=, continue, and enum. As shown in Table 3, datasets
such as Juliet and SVCP4C have a significant number of missing tokens. However, these tokens are
not frequently used in Wild C and thus do not cause a large increase in the Use %. To account
for the disparate lengths of the files, the token frequencies were normalized by the most frequent
token. This was Identifier for all of the datasets.</p>
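        <p>The two quantities discussed here, usage normalized by the most frequent token and the share of reference usage covered by missing token types, can be sketched as below. The function names and the toy token streams are hypothetical, and the exact definition of the Use % column in Table 3 is assumed rather than taken from the paper.</p>
        <preformat>
```python
from collections import Counter


def normalized_usage(tokens):
    """Token counts divided by the count of the most frequent token."""
    counts = Counter(tokens)
    top = max(counts.values())
    return {tok: n / top for tok, n in counts.items()}


def missing_usage_percent(reference_tokens, dataset_tokens):
    """Token types absent from the dataset, and the percent of reference
    token usage that those types account for."""
    ref = Counter(reference_tokens)
    present = set(dataset_tokens)
    missing = sorted(tok for tok in ref if tok not in present)
    share = 100.0 * sum(ref[tok] for tok in missing) / sum(ref.values())
    return missing, share


# Toy token-type streams, for illustration only.
wild = ["Identifier"] * 6 + ["+="] * 2 + ["enum", "continue"]
dataset = ["Identifier"] * 4 + ["+="]

print(normalized_usage(dataset))              # {'Identifier': 1.0, '+=': 0.25}
print(missing_usage_percent(wild, dataset))   # (['continue', 'enum'], 20.0)
```
        </preformat>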
        <p>Usage of tokens is subject to extreme outliers as shown in Table 5. IntroClass uses the
% token 3,200% more than Wild C, ManyBugs uses extern 2,010% more, and Taxonomy of
Buffer Overflows uses do 5,711% more. However, the furthest outlier belongs to Juliet, which
uses wchar_t an astounding 34,435% more than Wild C. The wchar_t data type is found in
29,264 test cases. A review of Juliet indicates that the dramatic increase is likely due to how
Juliet creates test cases. Juliet has many test cases that are near-identical with slight tweaks to
their relevant data types. This is explored further in Section 5.4.</p>
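        <p>The text does not spell out how the "% more" figures are computed. One plausible reading, assumed here, is the relative increase of a token's normalized usage over its normalized usage in Wild C. The function name and the input values below are hypothetical.</p>
        <preformat>
```python
def percent_more(dataset_usage, reference_usage):
    """Relative increase of a normalized token usage over the reference, in percent."""
    return 100.0 * (dataset_usage - reference_usage) / reference_usage


# A token used at 0.5 of the top token's rate in a dataset and at 0.125 in the
# reference is used 300% more (four times as often).
print(percent_more(0.5, 0.125))  # 300.0
```
        </preformat>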
      </sec>
      <sec id="sec-5-3">
        <title>Bigram Usage by Dataset</title>
        <p>Extending the analysis of token types, we next compare the frequency of usage for bigrams of
tokens. Bigrams are commonly used in natural language processing to provide context that
individual tokens lack. We continue to normalize by the most frequent bigram per dataset. An
upper bound on the total number of bigrams, derived from the 130 tokens, is 16,900. Because
many of those bigrams would be invalid in C/C++, we do not have an exact total for the number
of possible bigrams. In our datasets, we observe 8,195 unique bigrams. These results are shown
in Table 3.</p>
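        <p>Bigram counting over token types, with the same normalization by the most frequent entry, can be sketched as follows. The function name and the toy token stream are hypothetical; in practice bigrams would be counted within each file so that pairs do not span file boundaries.</p>
        <preformat>
```python
from collections import Counter


def bigram_usage(token_types):
    """Counts adjacent pairs of token types in one file, normalized by the
    most frequent pair."""
    counts = Counter(zip(token_types, token_types[1:]))
    top = max(counts.values())
    return {pair: n / top for pair, n in counts.items()}


tokens = ["Identifier", "=", "Identifier", ";",
          "Identifier", "=", "Identifier", ";"]
usage = bigram_usage(tokens)
print(usage[("Identifier", "=")])   # 1.0
print(usage[(";", "Identifier")])   # 0.5

# With 130 token types there are at most 130 * 130 ordered pairs.
print(130 ** 2)  # 16900
```
        </preformat>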
        <p>The number of bigrams present in Wild C that are missing in the datasets is far larger than
the number of missing tokens. Draper VDISC, the dataset with the most bigrams, contains only 42.6%
of the bigrams in Wild C. However, the missing bigrams still represent only 0.054% of bigram usage. Two
datasets stand out when compared to each other. SVCP4C is missing 81.2% of the bigrams, but
these are only used 0.320% of the time. Juliet slightly increases the number of missing bigrams
to 93.2% but drastically increases the missing usage percentage to 4.651%.</p>
        <p>Figure 2 shows the kernel density estimate for the bigram usage of each dataset. From
this perspective, we can see a strong dividing line between collected and generated datasets.
Juliet, IntroClass, and Taxonomy of Buffer Overflows are all created specifically for vulnerability
detection or bug fixing. Their missing bigrams account for no less than 4.6% of bigram usage, and they are separated into
a cluster of three furthest away from the Wild C distribution. ManyBugs, SVCP4C, Draper
VDISC, and Big-Vul all represent source code drawn from open-source codebases. Each is
missing less than 0.5% of bigram usage and is closer to the overall distribution of Wild C.</p>
        <p>Notably, the proximity to Wild C appears to be correlated with the size of the dataset. Draper
VDISC contains 1,274,466 files and is most similar to Wild C. It is followed by ManyBugs with
223,052 files, Big-Vul with 142,937 files, and SVCP4C with 11,376 files.</p>
      </sec>
      <sec id="sec-5-4">
        <title>Juliet Data Leakage Analysis</title>
        <p>The augmentation of test cases in Juliet has implications for using it or the SARD, of which
Juliet is a significant subset, as a source for training and test data. Randomly splitting the
Juliet dataset, regardless of whether stratified by CWE, will introduce data leakage between
training and test sets. Consider the task of detecting faces in an image. If the dataset was
augmented by changing the hair or eye color of faces, splitting the dataset randomly would
cause near-duplicate images to be placed in the test and train datasets. Data leakage of this
type could lead to significantly inflated test performance, a failure to generalize, and more.</p>
        <p>To evaluate the extent of this potential data leakage, we first identified the test groups.
Figure 3 shows the augmentation which was used to build Juliet. While there are 100,883 files
in Juliet, there are approximately 61,000 unique test groups. On average, each test group has
1.64 augmentations (ranging between 1 and 5). The majority of files exist in test groups that
contain two files.</p>
        <p>With these test groups identified, we performed 500 random splits of the Juliet files using a
standard 80/20 ratio. For each of these splits, we determined how many files from the test set had
augmentations that existed in the training set and vice versa. Figure 4 shows the distribution
of these numbers. Without accounting for the augmentation, splitting the Juliet files results in
a mean of 58.3% overlap of the test split with the train split and 22.1% of the train split with
the test split.</p>
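        <p>The overlap measurement described above can be sketched as follows, assuming each file has already been assigned a test group id. The function name, the synthetic groups, and the seed are hypothetical, so the printed means will not match the figures reported for Juliet.</p>
        <preformat>
```python
import random


def leakage_rates(groups, test_fraction=0.2, rng=random):
    """groups maps file name -> test group id. Performs one random split and
    returns (% of test files with a group mate in train,
             % of train files with a group mate in test)."""
    files = list(groups)
    rng.shuffle(files)
    n_test = max(1, round(len(files) * test_fraction))
    test, train = files[:n_test], files[n_test:]
    train_groups = {groups[f] for f in train}
    test_groups = {groups[f] for f in test}
    test_hits = sum(groups[f] in train_groups for f in test)
    train_hits = sum(groups[f] in test_groups for f in train)
    return 100.0 * test_hits / len(test), 100.0 * train_hits / len(train)


rng = random.Random(0)
# Synthetic data: 600 groups of two augmented files plus 400 singleton files.
groups = {f"pair{i}_{j}.c": f"g{i}" for i in range(600) for j in range(2)}
groups.update({f"solo{i}.c": f"s{i}" for i in range(400)})

rates = [leakage_rates(groups, rng=rng) for _ in range(500)]
print(sum(r[0] for r in rates) / len(rates))  # mean % of test overlapping train
print(sum(r[1] for r in rates) / len(rates))  # mean % of train overlapping test
```
        </preformat>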
      </sec>
      <sec id="sec-5-5">
        <title>Near-Duplicate Files</title>
        <p>
          As a final analysis, we measured the number of near-duplicate files in each dataset. While
the most precise method for finding near-duplicates would be based on the raw contents of the
file, that may miss semantically similar files. To account for this, we again based our
near-duplicate detection on the token types. We used MinHash with Locality-Sensitive Hashing from
the datasketch library to find near-duplicates with a Jaccard similarity threshold of 0.99 [
          <xref ref-type="bibr" rid="ref71">77</xref>
          ].
        </p>
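        <p>To make the similarity criterion concrete, the sketch below groups files by exact pairwise Jaccard similarity over token-type bigrams. This is a deliberate substitution: the paper uses MinHash with Locality-Sensitive Hashing, which approximates the same similarity at scale, whereas the quadratic comparison here only suits small examples. The choice of bigrams as set elements, the function names, and the sample files are assumptions.</p>
        <preformat>
```python
from itertools import combinations


def shingles(token_types, k=2):
    """Set of k-grams of token types for one file."""
    return {tuple(token_types[i:i + k]) for i in range(len(token_types) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 1.0


def near_duplicate_groups(files, threshold=0.99):
    """files maps name -> list of token types. Returns groups (sets of names)
    connected by pairwise similarity at or above the threshold."""
    sets = {name: shingles(toks) for name, toks in files.items()}
    parent = {name: name for name in files}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(files, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in files:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


files = {
    "a.c": ["int", "Identifier", "=", "IntLiteral", ";"],
    "b.c": ["int", "Identifier", "=", "IntLiteral", ";"],  # same token types as a.c
    "c.c": ["return", "Identifier", ";"],
}
print(sorted(sorted(g) for g in near_duplicate_groups(files)))
# [['a.c', 'b.c'], ['c.c']]
```
        </preformat>
        <p>Because only token types are compared, two files that differ in identifier names or literal values but share the same structure fall into the same group, which is the behaviour motivated above.</p>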
        <p>Near-duplicates come in groups. Most of the time it’s not two files that are identical to each
other. Rather, there is a group of files that share similar attributes. Analyzing these groups is
somewhat complicated. A group of 10 duplicates is far more consequential for a dataset with
100 files than a dataset of 1 million files. However, normalizing by the total number of files in
the dataset makes it difficult to determine how many files are in any given group. We attempt
to balance these tensions in Figure 5. This figure shows the cumulative distribution functions (CDF)
for the percent of files and percent of groups as the group size increases. The x-axes contain
the group size presented on a log scale and normalized by the total number of files. Each dataset
has the same axes limits and two vertical bars. The solid vertical bar indicates where a group
with 2 files would be placed on the x-axis. Similarly, the dotted vertical bar indicates where a
group of 100 files would be placed.</p>
        <p>[Table 6 column headers: Unique Groups, % of Dataset, Test Split, % Test w/Match, % Train w/Match.]</p>
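        <p>The two cumulative curves plotted in Figure 5 can be derived from a list of group sizes as sketched below. The function name and the example sizes are hypothetical, and the sketch reports raw group sizes rather than sizes normalized by the total number of files as in the figure.</p>
        <preformat>
```python
from collections import Counter


def group_size_cdfs(group_sizes):
    """Returns rows of (size, cumulative % of groups, cumulative % of files)
    for each distinct group size, in increasing order."""
    total_groups = len(group_sizes)
    total_files = sum(group_sizes)
    rows, groups_so_far, files_so_far = [], 0, 0
    for size, count in sorted(Counter(group_sizes).items()):
        groups_so_far += count
        files_so_far += size * count
        rows.append((size,
                     100.0 * groups_so_far / total_groups,
                     100.0 * files_so_far / total_files))
    return rows


# Two singleton files, one pair, and one group of four near-duplicates.
print(group_size_cdfs([1, 1, 2, 4]))
# [(1, 50.0, 25.0), (2, 75.0, 50.0), (4, 100.0, 100.0)]
```
        </preformat>
        <p>The gap between the two curves shows how a small share of groups can hold a large share of files: here half of the groups are singletons, yet they contain only a quarter of the files.</p>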
        <p>Starting with Wild C, we can see that there is some amount of duplication in wild source
code. This is logical for at least two reasons: (1) programmers frequently share source code
that gets copied and remixed; (2) discrete sections of source code are likely to repeat tasks. The
largest group of near-duplicate files in Wild C has 132,389 files, which are constant-definition
files. The next largest group only has 25,751 files and group sizes reduce quickly from there.</p>
        <p>The plot of the number of files for Draper VDISC is similar to the number of groups for Wild
C. This is a clear indication of the efficacy of their duplicate removal process. The difference
in raw numbers can be explained by their slightly stricter approach to detecting duplicates. Of
note, the plot for Big-Vul exhibits similar evidence of deduplication. This deduplication is not
mentioned in the paper.</p>
        <p>Figure 5 also shows that nearly all of the files from SVCP4C, Juliet, ManyBugs,
Taxonomy, and IntroClass have at least one duplicate. Our analysis revealed the root causes of this
duplication:
• IntroClass draws from a limited number of assignments.
• Taxonomy is hand-generated with intentional duplication to demonstrate good/bad code.
• ManyBugs includes multiple copies of the same applications.
• Juliet 1.3 includes augmentations for many vulnerability examples.</p>
        <p>Because SVCP4C uses SonarCloud vulnerability detection to label its dataset, some
amount of duplication is expected. The algorithms used by SonarCloud are likely to pick up
common patterns within source code whether they are true or false positives. Draper VDISC
likely suffered a similar problem with duplication before their duplicate removal process.</p>
        <p>Table 6 reverses the analysis and provides the number of "near-unique" files and that number
as a percentage of the original dataset. This is analogous to the number of groups of
near-duplicates for each dataset. It also provides the mean percentage of test samples with a
near-duplicate in the training data and the mean percentage of training samples with a near-duplicate
in the test data for 500 random splits of the indicated split size. It’s important to note that while
the method of calculating the metric is the same as for Section 5.4, the means of identifying
duplicates are different. This leads to a different metric for Juliet than previously provided.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Our work makes three significant contributions: (1) analysis of representativeness, (2) analysis of
duplicativeness, and (3) availability of Wild C. Our work shows that there are significant
differences between the selected datasets and wild C/C++ code. As a result, some of the datasets
may have limited usefulness for machine learning-assisted software vulnerability detection. The
IntroClass and Taxonomy of Buffer Overflows datasets are not well suited for this task. They
have over 97% of bigrams missing with a significant portion of those being in common usage.
Because of this, it would be possible to have high performance on these datasets while learning
less than 68% of the C/C++ language. They also exhibit significant differences in length, token
usage, and bigram usage. Based on our analysis, we assess that they do not contain enough
diversity to be a thorough test set nor enough size to be a training set.</p>
      <p>Conversely, Big-Vul, SVCP4C, and ManyBugs proved to be reasonably close to Wild C. They
had among the fewest missing tokens &amp; bigrams and lowest token &amp; bigram usage difference.
However, all three had a high degree of duplication and different drawbacks. Big-Vul contains
only 3,754 vulnerabilities and is not compilable because it only contains functions. While it
appears to have been deduplicated, a significant number of near-duplicates remain. It may be
a suitable test dataset if the method uses file- or function-level information, pending further
analysis of deduplication.</p>
      <p>SVCP4C has only 1,104 unique groups after deduplication, a reduction of 90.29%. The
collection method also means that any model trained on SVCP4C will be learning to emulate
SonarCloud rather than learning the ground truth of vulnerabilities. For these two reasons, we
recommend future work using SVCP4C address duplication and collection biases before usage.</p>
      <p>ManyBugs does have a slight edge over SVCP4C and Big-Vul because it contains entire
projects that are compilable. Despite this, it had the most duplication with unique groups
making up only 3.67% of the original dataset. A model trained on the dataset as provided may
learn to “spot the difference” between projects rather than identifying vulnerabilities. However,
ManyBugs has potential as a test dataset because it contains large, real-world projects. We
recommend that before using ManyBugs, the duplication be addressed.</p>
      <p>Based on our evaluation of the metrics, Draper VDISC appears to be a promising dataset
for training and testing machine learning models. It has a permissive license, contains 87,804
vulnerabilities, and has 1.27 million functions. Unfortunately, it is not compilable and is not
able to be used with methods that require intermediate, assembly, or binary representations. We
have two outstanding concerns regarding the use of this dataset. First, the collection method
is similar to SVCP4C. In this case, the authors used multiple static analysis tools to identify
vulnerabilities and combined the results. Analysis of SVCP4C showed that the code identified by
SonarCloud was very similar. Because the authors of Draper VDISC deduplicated their dataset
before releasing it, we were unable to analyze the similarity of the dataset before
deduplication. It is possible that using the intersection of static analysis tools led to a higher
level of duplication. Additionally, any model trained on Draper VDISC is ultimately learning the
tools rather than the underlying ground truth. Second, our near-duplicate detection identified
26.88% of the dataset as near-duplicates despite the authors performing deduplication. As the
authors detail, their deduplication strategy was strict. A near-duplicate detection strategy may
lead to a more useful dataset. While we assess that Draper VDISC has strong potential, we
recommend future work address the above-mentioned concerns.</p>
      <p>
        A significant contribution in this paper is the discussion of test case augmentation within
Juliet. While others have stated their concerns [
        <xref ref-type="bibr" rid="ref59">65</xref>
        ], we believe this is the first empirical analysis
of the drawbacks of using Juliet as a training and/or test set. Many of the papers making use
of Juliet or the NIST SARD (its parent dataset) do not address this augmentation or describe
steps to remove it [
        <xref ref-type="bibr" rid="ref15 ref18 ref22 ref32 ref34 ref37 ref38 ref4 ref42 ref48 ref5 ref52 ref57 ref60 ref61 ref67 ref69">63, 57, 5, 15, 53, 41, 25, 47, 42, 20, 75, 36, 66, 4, 38, 73, 67</xref>
        ]. Because of
the high potential for data leakage if augmentations are not removed, we believe the evidence
supports using caution when reviewing metrics based on Juliet, as they may not accurately
reflect performance on real-world code. For future work using Juliet as a training and/or
test dataset, we recommend that appropriate measures be taken to mitigate the potential for
data leakage and that those measures be clearly stated to avoid ambiguity.
      </p>
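      <p>One possible mitigation is to split by test case rather than by individual sample, so that all augmented variants of a case land on the same side of the split. The Python sketch below is a minimal illustration under stated assumptions: the group key (here the first element of each tuple) is a hypothetical identifier for the base test case, and the function name and toy data are ours for illustration only. It does not describe a procedure applied in this paper.</p>
      <preformat>
```python
import random


def group_split(samples, group_of, test_fraction=0.2, seed=0):
    """Split samples so that no group appears in both train and test.

    `group_of` maps a sample to its group key (hypothetical: the base
    test case that an augmented variant was derived from).
    """
    groups = sorted({group_of(s) for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # The fraction applies to groups, not to individual samples.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if group_of(s) not in test_groups]
    test = [s for s in samples if group_of(s) in test_groups]
    return train, test


# Hypothetical data: 10 base cases, each with 5 augmented variants.
samples = [(f"case_{c}", v) for c in range(10) for v in range(5)]
train, test = group_split(samples, group_of=lambda s: s[0])
train_groups = {s[0] for s in train}
test_groups = {s[0] for s in test}
print(len(train), len(test), train_groups.isdisjoint(test_groups))
```
      </preformat>
      <p>A random split over individual samples would instead scatter the variants of a single case across both sets, which is the leakage scenario described above.</p>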
      <p>Finally, we are pleased to provide the Wild C dataset to the public. There are a wide
variety of potential uses for this dataset. Due to its size and composition, it is suitable as a
representative sample of the overall distribution of C/C++ source code. This is a critical factor
for our analysis and enables the dataset to be used as a precursor for additional tasks. With
some processing, it is possible to extract any file- or function-level information and build a
task-specific dataset. Potential tasks include, but are not limited to: comment prediction, function
name recommendation, code completion, and variable name recommendation. There is also
potential for automatic bug insertion to provide an expanded vulnerability detection dataset.
Wild C is available at https://github.com/mla-vd/wild-c.
</p>
      <sec id="sec-6-1">
        <title>Future Work</title>
        <p>There are many areas where this work could be expanded. First, we only considered C/C++
datasets. This is the most commonly used language family for machine learning-assisted software
vulnerability detection, but it is not the only one. Datasets from other languages exist and
deserve similar analysis.</p>
        <p>Second, we only compared the datasets in their entirety. Further analysis may compare the
safe and vulnerable subsets of the datasets with each other and with wild
C/C++ code. While this has the potential to elucidate useful differences between safe and
vulnerable code, more likely it will further highlight the problems with the existing datasets.
Additionally, further work is needed to determine how much deduplication of Big-Vul, SVCP4C,
and ManyBugs would reduce the number of vulnerabilities in each.</p>
        <p>Perhaps the most pressing need from future research is the creation of vulnerability-detection
benchmarks. Juliet has been used for this purpose in previous papers, but our analysis brings
that usage into question. Given the diversity of the dataset types among those selected (e.g., files,
functions, programs) it is unlikely that a single dataset could serve as a universal training dataset
similar to those available for computer vision tasks. This does not mean that a benchmark is
infeasible. Such a benchmark should meet at least five requirements. (1) It must be drawn
from real-world code. As illustrated in Section 5.3, there is a distinct and quantifiable difference
between synthetic and natural code. The barriers to labeling real-world code are likely far lower
than the barriers to bringing synthetic code into the real-world distribution of usage. (2) It must be compilable.
This will enable it to support methods that work on assembly, binaries, or otherwise require
compilable code. (3) It should exercise a sufficient diversity of C/C++. This will allow the
dataset to avoid issues with missing tokens/bigrams and ensure that the model understands
the language. Further testing is needed to determine how much diversity is necessary. (4) It
should be difficult enough to act as a viable benchmark. A benchmark that is too easy will
quickly outlive its usefulness. This difficulty should reflect not only the depth and likelihood of
a vulnerability but also the amount of code “noise” surrounding the vulnerabilities. (5) It should be
deduplicated. As shown in Section 5.5, even Wild C is subject to a large degree of duplication.
This should be removed to ensure that the model isn’t biased towards the features present in
duplicates.</p>
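        <p>Requirement (3) could be checked with a simple coverage measure: the share of token usage in a reference corpus that belongs to token types a candidate benchmark never uses. The Python sketch below is a minimal illustration, assuming both corpora are available as flat token sequences. The function name and the toy token streams are hypothetical, and the sketch is not the exact computation used in our analysis.</p>
        <preformat>
```python
from collections import Counter


def missing_usage_share(reference_tokens, dataset_tokens):
    """Fraction of reference token usage made up of token types that are
    absent from the dataset's vocabulary."""
    reference_counts = Counter(reference_tokens)
    dataset_vocab = set(dataset_tokens)
    total = sum(reference_counts.values())
    if total == 0:
        return 0.0
    missing = sum(
        count
        for token, count in reference_counts.items()
        if token not in dataset_vocab
    )
    return missing / total


# Toy token streams (hypothetical): "goto" and "union" never occur in
# the candidate dataset, so their usage counts as missing.
reference = ["int", "ID", "=", "NUM", ";", "goto", "ID", ";", "union", "ID"]
dataset = ["int", "ID", "=", "NUM", ";", "return", "ID", ";"]
share = missing_usage_share(reference, dataset)
print(share)
```
        </preformat>
        <p>The same function can be applied to bigrams by passing sequences of adjacent token pairs instead of single tokens.</p>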
        <p>Given the limited datasets that exist today, much work is needed in the field of machine
learning-assisted vulnerability detection. While the methods being applied to the datasets are
promising, we assess that the limiting factor may be the datasets themselves. However, many
machine learning tasks seemed out of reach just a few years ago. Machine learning researchers
have performed astounding tasks in many areas and we expect to count this as one in the future.</p>
        <p>[16] B. H. Dang. A practical approach for ranking software warnings from multiple static code analysis reports. In 2020 SoutheastCon, volume 2, pages 1–7. IEEE, 2020.</p>
        <p>[17] R. Demidov and A. Pechenkin. Application of siamese neural networks for fast vulnerability detection in mips executable code. In Proceedings of the Future Technologies Conference, pages 454–466. Springer, 2019.</p>
        <p>[24] T. Helmuth. General program synthesis from examples using genetic programming with parent selection based on random lexicographic orderings of test cases. PhD thesis, University of Massachusetts Amherst, 2015.</p>
        <p>[30] J. A. Kupsch, E. Heymann, B. Miller, and V. Basupalli. Bad and good news about using software assurance tools. Software: Practice and Experience, 47(1):143–156, 2017. doi: https://doi.org/10.1002/spe.2401. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/spe.2401.</p>
        <p>[45] M. J. Michl. Analyse sicherheitsrelevanter Designfehler in Software hinsichtlich einer Detektion mittels Künstlicher Intelligenz. PhD thesis, Technische Hochschule, 2021.</p>
        <p>[61] G. Stergiopoulos, P. Petsanas, P. Katsaros, and D. Gritzalis. Automated exploit detection using path profiling: The disposition should matter, not the position. In 2015 12th International Joint Conference on e-Business and Telecommunications (ICETE), volume 04, pages 100–111, 2015.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Alikhashashneh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raje</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Hill</surname>
          </string-name>
          .
          <article-title>Using software engineering metrics to evaluate the quality of static code analysis tools</article-title>
          .
          <source>In 2018 1st International Conference on Data Intelligence and Security (ICDIS)</source>
          , pages
          <fpage>65</fpage>
          -
          <lpage>72</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Amankwah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Amponsah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kudjo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ocran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. O.</given-names>
            <surname>Anang</surname>
          </string-name>
          .
          <article-title>Fast bug detection algorithm for identifying potential vulnerabilities in juliet test cases</article-title>
          .
          <source>In 2020 IEEE 8th International Conference on Smart City and Informatization (iSCI)</source>
          , pages
          <fpage>89</fpage>
          -
          <lpage>94</lpage>
          ,
          <year>2020</year>
          . doi: 10.1109/iSCI50694.2020.00021.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Amorim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Freitas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dantas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. F.</given-names>
            <surname>de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Camilo-Junior</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Martins</surname>
          </string-name>
          .
          <article-title>A new word embedding approach to evaluate potential fixes for automated program repair</article-title>
          .
          <source>In 2018 International Joint Conference on Neural Networks (IJCNN)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Arakelyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kline</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Galstyan</surname>
          </string-name>
          .
          <article-title>Towards learning representations of binary executable files for security tasks</article-title>
          .
          <source>arXiv:2002.03388 [cs, stat]</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Arakelyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arasteh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kline</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Galstyan</surname>
          </string-name>
          .
          <article-title>Bin2vec: learning representations of binary executable programs for security tasks</article-title>
          .
          <source>Cybersecurity</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bilgin</surname>
          </string-name>
          .
          <article-title>Code2image: Intelligent code analysis by computer vision techniques and application to vulnerability prediction</article-title>
          .
          <source>arXiv preprint arXiv:2105.03131</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bilgin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Ersoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. U.</given-names>
            <surname>Soykan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tomur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Çomak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Karaçay</surname>
          </string-name>
          .
          <article-title>Vulnerability prediction from source code using machine learning</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>8</volume>
          :
          <fpage>150672</fpage>
          -
          <lpage>150684</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Black</surname>
          </string-name>
          .
          <article-title>A software assurance reference dataset: Thousands of programs with known bugs</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Black</surname>
          </string-name>
          .
          <article-title>Juliet 1.3 Test Suite: Changes From 1.2</article-title>
          . US Department of Commerce,
          <source>National Institute of Standards and Technology</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Irvine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Saha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Ray</surname>
          </string-name>
          .
          <article-title>Entropy guided spectrum based bug localization using statistical language model</article-title>
          .
          <source>arXiv preprint arXiv:1802.06947</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Mao</surname>
          </string-name>
          .
          <article-title>Bodhi: Detecting buffer overflows with a game</article-title>
          .
          <source>In 2012 IEEE Sixth International Conference on Software Security and Reliability Companion</source>
          , pages
          <fpage>168</fpage>
          -
          <lpage>173</lpage>
          ,
          <year>2012</year>
          . doi: 10.1109/SERE-C.2012.35.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kommrusch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Monperrus</surname>
          </string-name>
          .
          <article-title>Neural transfer learning for repairing security vulnerabilities in c code</article-title>
          .
          <source>arXiv preprint arXiv:2104.08308</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>X.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sui</surname>
          </string-name>
          .
          <article-title>Deepwukong: Statically detecting software vulnerabilities using deep graph neural network</article-title>
          .
          <source>ACM Trans. Softw. Eng. Methodol.</source>
          ,
          <volume>30</volume>
          (
          <issue>3</issue>
          ), Apr.
          <year>2021</year>
          . ISSN 1049-331X. doi: 10.1145/3436877. URL https://doi-org.ezproxy.libraries.wright.edu/10.1145/3436877.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.-j.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jeong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Oh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Choo</surname>
          </string-name>
          .
          <article-title>End-to-end prediction of buffer overruns from raw source code via neural memory networks</article-title>
          .
          <source>arXiv preprint arXiv:1703.02458</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Newlands</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Babar</surname>
          </string-name>
          .
          <article-title>An empirical study of rule-based and learning-based approaches for static application security testing</article-title>
          .
          <source>arXiv preprint arXiv:2107.01921</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dhumbumroong</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Piromsopa</surname>
          </string-name>
          .
          <article-title>Boundwarden: Thread-enforced spatial memory safety through compile-time transformations</article-title>
          .
          <source>Science of Computer Programming</source>
          ,
          <volume>198</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          .
          <article-title>A c/c++ code vulnerability dataset with code changes and cve summaries</article-title>
          .
          <source>In Proceedings of the 17th International Conference on Mining Software Repositories</source>
          , pages
          <fpage>508</fpage>
          -
          <lpage>512</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Efficient vulnerability detection based on abstract syntax tree and deep learning</article-title>
          .
          <source>In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)</source>
          , pages
          <fpage>722</fpage>
          -
          <lpage>727</lpage>
          ,
          <year>2020</year>
          . doi: 10.1109/INFOCOMWKSHPS50562.2020.9163061
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Duck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Roychoudhury</surname>
          </string-name>
          .
          <article-title>Beyond tests: Program vulnerability repair via crash constraint extraction</article-title>
          .
          <source>ACM Transactions on Software Engineering and Methodology (TOSEM)</source>
          ,
          <volume>30</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Shi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>A comprehensive detection of memory corruption vulnerabilities for c/c++ programs</article-title>
          .
          <source>In 2018 IEEE Intl Conf on Parallel &amp; Distributed Processing with Applications, Ubiquitous Computing &amp; Communications, Big Data &amp; Cloud Computing, Social Computing &amp; Networking, Sustainable Computing &amp; Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom)</source>
          , pages
          <fpage>354</fpage>
          -
          <lpage>360</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23] github.
          <source>Github</source>
          ,
          <year>2020</year>
          . URL https://github.com/.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jeon</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>Autovas: An automated vulnerability analysis system with a deep learning approach</article-title>
          .
          <source>Computers &amp; Security</source>
          ,
          <volume>106</volume>
          :
          <fpage>102308</fpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>A secure-coding and vulnerability check system based on smart-fuzzing and exploit</article-title>
          .
          <source>Neurocomputing</source>
          ,
          <volume>256</volume>
          :
          <fpage>23</fpage>
          -
          <lpage>34</lpage>
          ,
          <year>2017</year>
          . ISSN 0925-2312. doi: https://doi.org/10.1016/j.neucom.2015.11.139. URL https://www.sciencedirect.com/science/article/pii/S0925231217304113. Fuzzy Neuro Theory and Technologies for Cloud Computing.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. T.</given-names>
            <surname>Stolee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Le Goues</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Brun</surname>
          </string-name>
          .
          <article-title>Repairing programs with semantic code search (t)</article-title>
          .
          <source>In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)</source>
          , pages
          <fpage>295</fpage>
          -
          <lpage>306</lpage>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Koyuncu</surname>
          </string-name>
          .
          <article-title>Boosting Automated Program Repair for Adoption By Practitioners</article-title>
          .
          <source>PhD thesis</source>
          , University of Luxembourg, Luxembourg,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kratkiewicz</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Lippmann</surname>
          </string-name>
          .
          <article-title>A taxonomy of buffer overflows for evaluating static and dynamic software testing tools</article-title>
          .
          <source>In Proceedings of Workshop on Software Security Assurance Tools, Techniques, and Metrics</source>
          , volume
          <volume>500</volume>
          , pages
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>X.-B. D.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-H.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Le Goues</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Visser</surname>
          </string-name>
          .
          <article-title>S3: syntax- and semantic-guided repair synthesis via programming by examples</article-title>
          .
          <source>In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering</source>
          , pages
          <fpage>593</fpage>
          -
          <lpage>604</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>X. B. D.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Thung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Le Goues</surname>
          </string-name>
          .
          <article-title>Overfitting in semantics-based automated program repair</article-title>
          .
          <source>Empirical Software Engineering</source>
          ,
          <volume>23</volume>
          (
          <issue>5</issue>
          ):
          <fpage>3007</fpage>
          -
          <lpage>3033</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>C.</given-names>
            <surname>Le Goues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Holtschulte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. K.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Brun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Devanbu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Forrest</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Weimer</surname>
          </string-name>
          .
          <article-title>The ManyBugs and IntroClass benchmarks for automated repair of C programs</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          ,
          <volume>41</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1236</fpage>
          -
          <lpage>1256</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Oh</surname>
          </string-name>
          .
          <article-title>MemFix: static analysis-based repair of memory deallocation errors for C</article-title>
          .
          <source>In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering</source>
          , pages
          <fpage>95</fpage>
          -
          <lpage>106</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kwon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Baek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Instruction2vec: efficient preprocessor of assembly code to detect software weakness with CNN</article-title>
          .
          <source>Applied Sciences</source>
          ,
          <volume>9</volume>
          (
          <issue>19</issue>
          ):
          <fpage>4086</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Lim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Learning binary code with deep learning to detect software weakness</article-title>
          .
          <source>In KSII The 9th International Conference on Internet (ICONI) 2017 Symposium</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kwon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kwon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>A scalable approach for vulnerability discovery based on security patches</article-title>
          .
          <source>In International Conference on Applications and Techniques in Information Security</source>
          , pages
          <fpage>109</fpage>
          -
          <lpage>122</lpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Gu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>V-fuzz: Vulnerability-oriented evolutionary fuzzing</article-title>
          .
          <source>arXiv preprint arXiv:1901.01142 [cs]</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          .
          <article-title>Fault localization with code coverage representation learning</article-title>
          .
          <source>In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)</source>
          , pages
          <fpage>661</fpage>
          -
          <lpage>673</lpage>
          . IEEE,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          .
          <article-title>Vulnerability detection with fine-grained interpretations</article-title>
          .
          <source>arXiv preprint arXiv:2106.10478</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Deng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhong</surname>
          </string-name>
          .
          <article-title>VulDeePecker: A deep learning-based system for vulnerability detection</article-title>
          .
          <source>arXiv preprint arXiv:1801.01681</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiang</surname>
          </string-name>
          .
          <article-title>Deep learning-based vulnerable function detection: A benchmark</article-title>
          .
          <source>In International Conference on Information and Communications Security</source>
          , pages
          <fpage>219</fpage>
          -
          <lpage>232</lpage>
          . Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lutellier</surname>
          </string-name>
          .
          <article-title>Machine Learning for Software Dependability</article-title>
          .
          <source>PhD thesis</source>
          , University of Waterloo,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lutellier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Coconut: combining context-aware neural translation models using ensemble for program repair</article-title>
          .
          <source>In Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis</source>
          , pages
          <fpage>101</fpage>
          -
          <lpage>114</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>C.</given-names>
            <surname>Mitre</surname>
          </string-name>
          .
          <article-title>Common vulnerabilities and exposures</article-title>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>H. N.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Teerakanok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Inomata</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Uehara</surname>
          </string-name>
          .
          <article-title>The comparison of word embedding techniques in RNNs for vulnerability detection</article-title>
          .
          <source>ICISSP 2021</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>V. P. L.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Le Goues</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Camilo-Junior</surname>
          </string-name>
          .
          <article-title>Improved crossover operators for genetic programming for program repair</article-title>
          .
          <source>In International Symposium on Search Based Software Engineering</source>
          , pages
          <fpage>112</fpage>
          -
          <lpage>127</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>V. P. L.</given-names>
            <surname>Oliveira</surname>
          </string-name>
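          ,
          <string-name>
            <given-names>E. F.</given-names>
            <surname>de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Le Goues</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Camilo-Junior</surname>
          </string-name>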
          .
          <article-title>Improved representation and genetic operators for linear genetic programming for automated program repair</article-title>
          .
          <source>Empirical Software Engineering</source>
          ,
          <volume>23</volume>
          (
          <issue>5</issue>
          ):
          <fpage>2980</fpage>
          -
          <lpage>3006</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>T.</given-names>
            <surname>Parr</surname>
          </string-name>
          . ANTLR,
          <year>2021</year>
          . URL https://www.antlr.org/.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>R.</given-names>
            <surname>Raducu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Esteban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Rodríguez Lera</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Fernández</surname>
          </string-name>
          .
          <article-title>Collecting vulnerable source code from open-source repositories for dataset generation</article-title>
          .
          <source>Applied Sciences</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1270</fpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>J.</given-names>
            <surname>Renzullo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Weimer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Forrest</surname>
          </string-name>
          .
          <article-title>Multiplicative weights algorithms for parallel automated software repair</article-title>
          .
          <source>In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)</source>
          , pages
          <fpage>984</fpage>
          -
          <lpage>993</lpage>
          . IEEE,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Meirelles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lago</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Kon</surname>
          </string-name>
          .
          <article-title>Ranking warnings from multiple source code static analyzers via ensemble learning</article-title>
          .
          <source>In Proceedings of the 15th International Symposium on Open Collaboration</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>R.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lazovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Harer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ozdemir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ellingwood</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>McConley</surname>
          </string-name>
          .
          <article-title>Automated vulnerability detection in source code using deep representation learning</article-title>
          .
          <source>In 2018 17th IEEE international conference on machine learning and applications (ICMLA)</source>
          , pages
          <fpage>757</fpage>
          -
          <lpage>762</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yoshida</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Prasad</surname>
          </string-name>
          .
          <article-title>Elixir: Effective object-oriented program repair</article-title>
          .
          <source>In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)</source>
          , pages
          <fpage>648</fpage>
          -
          <lpage>659</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>M.</given-names>
            <surname>Saletta</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ferretti</surname>
          </string-name>
          .
          <article-title>A neural embedding for source code: Security analysis and CWE lists</article-title>
          .
          <source>In 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech)</source>
          , pages
          <fpage>523</fpage>
          -
          <lpage>530</lpage>
          ,
          <year>2020</year>
          . doi: 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00095.
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>A.</given-names>
            <surname>Savchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Fokin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chernousov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sinelnikova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Osadchyi</surname>
          </string-name>
          .
          <article-title>Deedp: vulnerability detection and patching based on deep learning</article-title>
          .
          <source>Theoretical and Applied Cybersecurity</source>
          ,
          <volume>2</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Sestili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Snavely</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. M.</given-names>
            <surname>VanHoudnos</surname>
          </string-name>
          .
          <article-title>Towards security defect prediction with ai</article-title>
          .
          <source>arXiv preprint arXiv:1808.09897</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [59]
          <string-name>
            <surname>SonarSource</surname>
          </string-name>
          . SonarCloud,
          <year>2008</year>
          . URL https://sonarcloud.io/.
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>E.</given-names>
            <surname>Soremekun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kirschner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Böhme</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeller</surname>
          </string-name>
          .
          <article-title>Locating faults with program slicing: an empirical analysis</article-title>
          .
          <source>Empirical Software Engineering</source>
          ,
          <volume>26</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>45</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [62]
          <string-name>
            <given-names>G.</given-names>
            <surname>Stergiopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Katsaros</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Gritzalis</surname>
          </string-name>
          .
          <article-title>Execution path classification for vulnerability analysis and detection</article-title>
          .
          <source>E-Business and Telecommunications. ICETE 2015. Communications in Computer and Information Science</source>
          ,
          <volume>585</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [63]
          <string-name>
            <given-names>S.</given-names>
            <surname>Suneja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Laredo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Morari</surname>
          </string-name>
          .
          <article-title>Learning to map source code to software vulnerability using code-as-a-graph</article-title>
          .
          <source>arXiv preprint arXiv:2006.08614</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [64]
          <string-name>
            <given-names>G.</given-names>
            <surname>Szekely</surname>
          </string-name>
          .
          <article-title>E-statistics: The energy of statistical samples</article-title>
          .
          <source>Preprint</source>
          ,
          <month>01</month>
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          [65]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tanwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sundaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ashwath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ganesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Chandrasekaran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravi</surname>
          </string-name>
          .
          <article-title>Predicting vulnerability in large codebases with deep code representation</article-title>
          .
          <source>arXiv preprint arXiv:2004.12783</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          [66]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tanwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Manikandan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sundaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ganesan</surname>
          </string-name>
          , Sathish, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravi</surname>
          </string-name>
          .
          <article-title>Multi-context attention fusion neural network for software vulnerability identification</article-title>
          .
          <source>arXiv preprint arXiv:2104.09225</source>
          ,
          <year>2021</year>
          . URL https://arxiv.org/abs/2104.09225.
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          [67]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>BBregLocator: A vulnerability detection system based on bounding box regression</article-title>
          .
          <source>In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)</source>
          , pages
          <fpage>93</fpage>
          -
          <lpage>100</lpage>
          . IEEE,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          [68]
          <string-name>
            <given-names>L.</given-names>
            <surname>Trujillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. M.</given-names>
            <surname>Villanueva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          .
          <article-title>A novel approach for search-based program repair</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>38</volume>
          (
          <issue>4</issue>
          ):
          <fpage>36</fpage>
          -
          <lpage>42</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          [69]
          <string-name>
            <given-names>J.-A.</given-names>
            <surname>Văduva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Culi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radovici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rughinis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Dascalu</surname>
          </string-name>
          .
          <article-title>Vulnerability analysis pipeline using compiler based source to source translation and deep learning</article-title>
          .
          <source>eLearning &amp; Software for Education</source>
          ,
          <volume>1</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          [70]
          <string-name>
            <given-names>N.</given-names>
            <surname>Visalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Suwaida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Wei</surname>
          </string-name>
          .
          <article-title>Towards automated security vulnerability and software defect localization</article-title>
          .
          <source>In 2019 IEEE 17th International Conference on Software Engineering Research, Management and Applications (SERA)</source>
          , pages
          <fpage>90</fpage>
          -
          <lpage>93</lpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          [71]
          <string-name>
            <given-names>W.</given-names>
            <surname>Weimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Forrest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Le Goues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>Trusted and resilient mission operations</article-title>
          .
          <source>Technical report</source>
          , University of Michigan, Ann Arbor, United States,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          [72]
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Wikman</surname>
          </string-name>
          .
          <article-title>Static analysis tools for detecting stack-based buffer overflows</article-title>
          .
          <source>Master's thesis</source>
          ,
          <source>Naval Postgraduate School</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          [73]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Jin</surname>
          </string-name>
          .
          <article-title>Vulnerability detection in C/C++ source code with graph representation learning</article-title>
          .
          <source>In 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC)</source>
          , pages
          <fpage>1519</fpage>
          -
          <lpage>1524</lpage>
          ,
          <year>2021</year>
          . doi: 10.1109/CCWC51732.2021.9376145.
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          [74]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Kong</surname>
          </string-name>
          .
          <article-title>Predicting effectiveness of generate-and-validate patch generation systems using random forest</article-title>
          .
          <source>Wuhan University Journal of Natural Sciences</source>
          ,
          <volume>23</volume>
          (
          <issue>6</issue>
          ):
          <fpage>525</fpage>
          -
          <lpage>534</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          [75]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>HAN-BSVD: A hierarchical attention network for binary software vulnerability detection</article-title>
          .
          <source>Computers &amp; Security</source>
          , page 102286,
          <year>2021</year>
          . ISSN 0167-4048. doi: 10.1016/j.cose.2021.102286. URL https://dx.doi.org/10.1016/j.cose.2021.102286.
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          [76]
          <string-name>
            <given-names>G.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jeong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-w.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Applying genetic programming with similar bug fix information to automatic fault repair</article-title>
          .
          <source>Symmetry</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ):
          <fpage>92</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          [77]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Markovtsev</surname>
          </string-name>
          , aastafiev,
          <string-name>
            <given-names>W.</given-names>
            <surname>Łukasiewicz</surname>
          </string-name>
          , ae foster,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martin</surname>
          </string-name>
          , Ekevoo,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ortolani</surname>
          </string-name>
          , Titusz,
          <string-name>
            <given-names>V.</given-names>
            <surname>Letal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bentley</surname>
          </string-name>
          , and fpug.
          <article-title>ekzhu/datasketch: Improved performance for MinHash and MinHashLSH</article-title>
          , Dec.
          <year>2020</year>
          . URL https://doi.org/10.5281/zenodo.4323502.
        </mixed-citation>
      </ref>
      <ref id="ref72">
        <mixed-citation>
          [78]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Scalable static detection of use-after-free vulnerabilities in binary code</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>8</volume>
          :
          <fpage>78713</fpage>
          -
          <lpage>78725</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>