<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Top</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>SECBENCH: A Database of Real Security Vulnerabilities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sofia Reis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rui Abreu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Engineering of University of Porto</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IST, University of Lisbon &amp; INESC-ID</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>10</volume>
      <issue>2017</issue>
      <fpage>70</fpage>
      <lpage>85</lpage>
      <abstract>
        <p>Currently, to satisfy the high number of system requirements, complex software is created, which makes its development cost-intensive and more susceptible to security vulnerabilities. In software security testing, empirical studies typically use artificial faulty programs because of the challenges involved in extracting or reproducing real security vulnerabilities. Thus, researchers tend to use hand-seeded faults or mutations to overcome these issues. These might not be suitable for evaluating software testing techniques, since both approaches can create samples that inadvertently differ from real vulnerabilities and thus lead to misleading assessments of the tools' capabilities. Although there are databases targeting security vulnerability test cases, only one of them contains only real vulnerabilities; the others are a mix of real and artificial samples, or contain only artificial samples. Secbench is a database of real security vulnerabilities mined from Github, which hosts millions of open-source projects carrying a considerable number of security vulnerabilities. We mined 248 projects, accounting for almost 2M commits, for 16 different vulnerability patterns, yielding a database with 682 real security vulnerabilities.</p>
      </abstract>
      <kwd-group>
        <kwd>Security</kwd>
        <kwd>Real Vulnerabilities</kwd>
        <kwd>Database</kwd>
        <kwd>Open-Source Software</kwd>
        <kwd>Software Testing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        According to IBM's X-Force Threat Intelligence 2017 Report [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the number of
vulnerabilities per year has been increasing significantly over the past 6 years.
IBM's database counts more than 10K vulnerabilities in 2016 alone. The
most common ones are cross-site scripting and SQL injection vulnerabilities;
these are two of the main classes in the Open Web Application
Security Project (OWASP)'s [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] 2017 Top-10 security risks. The past years have
been flooded by news from the cybersecurity world: exposure of large amounts
of sensitive data (e.g., 17M Zomato accounts stolen in 2015, which were put
up for sale on a dark web marketplace only now in 2017), phishing attacks
(e.g., Google Docs in 2017), denial-of-service attacks such as the one experienced
last year by Twitter, The Guardian, Netflix, CNN and many other companies
around the world; or, the one that possibly marked the year, the ransomware
attack which is still very fresh and held hostage the information of many companies,
industries and hospitals. All of these attacks succeeded due to the
presence of security vulnerabilities in the software that were not tackled before
someone exploited them. Another interesting point reported by IBM is the large
number of unknown vulnerabilities (the so-called zero-day vulnerabilities), i.e.,
vulnerabilities that do not belong to any known attack type/surface or class,
which can be harmful since developers are already struggling with the
known ones.
      </p>
      <p>Copyright © 2017 by the paper's authors. Copying permitted for private and academic
purposes.</p>
      <p>
        Most software development costs are spent on identifying and correcting
defects [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Several static analysis tools (e.g., Infer, Find Security Bugs, Symbolic
PathFinder, WAP, Brakeman, Dawnscanner and more) are able to detect
security vulnerabilities by scanning source code, which may help to reduce the
time spent on those two activities. Unfortunately, their detection capability is
still limited (i.e., the number of false negatives and false positives is still
high) and sometimes even comparable to random guessing [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Testing is one of the most important activities of the software development
lifecycle since it is responsible for ensuring software quality through the detection
of the conditions which may lead to software failures. In order to study and
improve these software testing techniques, empirical studies using real security
vulnerabilities are crucial [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to gain a better understanding of what tools are
able to detect [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Yet, performing empirical studies in software testing research
is challenging due to the lack of widely accepted and easy-to-use databases of
real bugs [
        <xref ref-type="bibr" rid="ref7 ref8">7,8</xref>
        ] as well as the fact that it requires human effort and CPU time
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Consequently, researchers tend to use databases of hand-seeded
vulnerabilities, which differ inadvertently from real vulnerabilities and thus might not work
with the testing techniques under evaluation [
        <xref ref-type="bibr" rid="ref10 ref9">9,10</xref>
        ]. Although there are databases
targeting security vulnerability test cases, only one of them contains only real
vulnerabilities (Safety-db); the others are a mix of real and artificial samples, or
contain only artificial samples.
      </p>
      <p>This paper reflects the results of mining 248 projects from Github for 16
different patterns of security vulnerabilities and attacks, which led to the creation
of Secbench, a database of real security vulnerabilities for several languages that
is being used to study a few static analysis tools. The main idea is to use our
database to test static analysis tools, determine the ones that perform better
and possibly identify points of improvement in them. Developers may then
be able to use the tools in the Continuous Integration and Continuous
Delivery (CI/CD) pipeline, which will help decrease the amount of time and money
spent on identifying and correcting vulnerabilities. With this study, we aim
to provide a methodology to guide the mining of security vulnerabilities and to provide
a database to help study and improve software testing techniques. Our study
answers the following questions:
- RQ1: Is there enough information available on open-source repositories to
create a database of software security vulnerabilities?
- RQ2: What are the most prevalent security patterns on open-source
repositories?</p>
      <p>More information related to Secbench is available at https://tqrg.github.io/secbench/. Our database will be publicly available with the vulnerable
version and the non-vulnerable version of each security vulnerability (i.e., the fix of
the vulnerability).</p>
      <p>The paper is organized as follows: in Section 2, we present the existing related
work; in Section 3, we explain how we extracted and isolated security
vulnerabilities from Github repositories; in Section 4, we provide statistical information
about Secbench; in Section 5, we discuss results and answer the research
questions. Finally, in Section 6, we draw conclusions and briefly discuss
future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>This section reviews the existing related work in the field of databases created
to perform empirical studies in the software testing research area.</p>
      <p>
        The Software-artifact Infrastructure Repository (SIR) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] provides
both real and artificial bugs. SIR provides artefacts in Java, C/C++ and
C#, but most of them are hand-seeded or generated using mutations. It is a
repository meant to support experimentation in the software testing domain.
      </p>
      <p>The Center for Assured Software (CAS) created artificial test cases, the Juliet
Test Suites, to study static analysis tools. These test suites are available through
the National Institute of Standards and Technology (NIST) (https://samate.nist.gov/SRD/testsuite.php). The Java suite has
25,477 test cases for 112 different Common Weakness Enumerations (CWEs) (https://cwe.mitre.org/)
and the C/C++ suite has 61,387 test cases for 118 different CWEs. Each test
case has a non-flawed test, which should not be flagged by the tools, and a flawed
test, which should be detected by the tools.</p>
      <p>CodeChecker (https://github.com/Ericsson/codechecker) is a database of defects which was created by Ericsson with
the goal of studying and improving a static analysis tool to possibly test their
own code in the future. The OWASP Benchmark (https://www.owasp.org/index.php/Benchmark#tab=Main) is a free and open Java test
suite which was created to study the performance of automated vulnerability
detection tools. It has more than 2500 test cases for 11 different CWEs.</p>
      <p>
        Defects4j[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is not only a database but also an extensible framework for
Java programs which provides real bugs to enable studies in the software
testing research area. It started with a small database containing 375 bugs from
5 open-source repositories. Developers can build the
framework on top of a program's version control system, which adds more bugs
to the database. Safety-db is a database of Python security vulnerabilities
collected from Python dependencies. Developers can use continuous integration
to check for security vulnerabilities in the dependencies of their projects. Data can
be analyzed by dependencies and their security vulnerabilities or by Common
Vulnerabilities and Exposures (CVE) descriptions and URLs.
      </p>
      <p>Secbench is a database of only real security vulnerabilities for several different
languages, which will help software testing researchers improve the tools'
capability of detecting security issues. Instead of only mining the dependencies
of a project, we mine security vulnerability patterns through all the commits of
Github repositories. The test cases, the result of the pattern mining, go through
an evaluation process which determines whether each one will be integrated into the final database.</p>
    </sec>
    <sec id="sec-5">
      <title>Extracting And Isolating Vulnerabilities From Github Repositories</title>
      <p>This section describes the methodology used to obtain real security
vulnerabilities, from the mining process to the evaluation and approval of samples. The main
goal of this approach is the identification and extraction of real security
vulnerabilities fixed naturally by developers in their daily work. Research
into new methodologies to retrieve primitive data in this field is really important
due to the lack of databases with a considerable amount of test cases and the lack
of variety across different defects and languages to support studies of static analysis
tools.</p>
      <p>The first step was the identification of a considerable amount of trending
security patterns (Section 3.1). Initially, the main focus was the Top 10 OWASP
2017 and other trending security vulnerabilities such as memory leaks and buffer
overflows, which are not very prevalent among web applications. Thereafter,
more patterns were added, and there is still room for many more. For each
pattern, there is a collection of words and acronyms which characterizes the
security vulnerability. These words were searched for in commit messages (syntactic
analysis) in order to find possible candidates for test cases. Every time the tool
identified a pattern, the sample was saved on the cloud and the attached information
(e.g., sha, url, type of security vulnerability) was stored in the database. The
search for candidates for our database is performed automatically by a crawler
written in Python, responsible for matching our patterns against commit messages.</p>
      <p>As seen in Figure 1, after saving the initial data, a manual diagnosis (Section
3.3) is performed on two different types of information retrieved by our tool:
- Commit message, to validate whether the message actually represents the fix
of a vulnerability or a false positive;
- Source code, to identify whether the pieces of code responsible for the potential
vulnerability and its fix exist or not.</p>
      <p>Referenced resources: Safety-db (https://github.com/pyupio/safety-db) and CVE (https://cve.mitre.org/).</p>
      <p>Both are validated manually before a sample can be integrated into the final database. If a
sample is fully approved, then its information is updated in the database
and, consequently, the test case (Section 3.2) is added to the final database.</p>
      <sec id="sec-5-1">
        <title>Patterns - Extracting/Detecting Vulnerabilities</title>
        <p>
          The goal was to mine for indications of a vulnerability fix or patch committed
by a developer on a Github project. The first step was the identification of a
considerable amount of trending security patterns based on annual
security reports from IBM[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], OWASP[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and ENISA[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]; cybersecurity news; and
sites where common security vulnerabilities are reported (e.g., CVE and CWE).
In order to understand whether the chosen patterns were prevalent on Github, Github
BigQuery and searches through Github's search engine were used, which gave
a good perception of which patterns would be more difficult to collect.
        </p>
        <p>For each pattern, a regular expression was created joining specific words from
its own domain and words highlighting a tackle. To represent the fix itself,
words such as fix, patch, found, prevent and protect were used (Figure
2, Example 1). In certain cases, such as the pattern iap, it was necessary to
adjust this approach due to the nature of the vulnerability. This pattern represents
the lack of automated mechanisms for detecting and protecting applications.
So, instead of the normal set, other words were used: detect, block, answer
and respond (Figure 2, Example 2). It was necessary to adapt the words to each
type of vulnerability. To make the patterns specific and distinguish between them,
more specific words were added. For example, to characterize the cross-site scripting
vulnerability, tokens like cross site scripting, xss, script attack and many others
were used.</p>
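        <p>A minimal Python sketch of how such a pattern could be assembled is given here. The fix words and the cross-site scripting tokens are taken from the text; the way they are combined (two order-independent lookaheads) and the names build_pattern and is_candidate are assumptions made for illustration, not the regular expressions actually used by the mining tool.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

# Word lists quoted in the text; how they are combined below is an assumption.
FIX_WORDS = ["fix", "patch", "found", "prevent", "protect"]
XSS_TOKENS = ["cross site scripting", "xss", "script attack"]


def build_pattern(fix_words, domain_tokens):
    """Compile a regex requiring one fix word and one domain token, in any order."""
    fix = "|".join(re.escape(w) for w in fix_words)
    dom = "|".join(re.escape(t) for t in domain_tokens)
    # Each lookahead scans the whole message; \w* tolerates suffixes such as "fixed".
    return re.compile(
        rf"(?=.*\b(?:{fix})\w*\b)(?=.*\b(?:{dom})\b)",
        re.IGNORECASE | re.DOTALL,
    )


def is_candidate(message, pattern):
    """Return True when a commit message matches the vulnerability pattern."""
    return pattern.search(message) is not None


xss_pattern = build_pattern(FIX_WORDS, XSS_TOKENS)
print(is_candidate("Fixed XSS in the comment form", xss_pattern))
print(is_candidate("Update README", xss_pattern))
```
]]></preformat>
        <p>Because the suffix match is loose (for instance, "found" also matches "foundation"), a sketch like this produces false positives, which is consistent with the need for the manual diagnosis described in Section 3.3.</p>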
        <p>First, we tried to create patterns for the Top 10 OWASP 2017 and then
we extended the tool to others that can be found on our website: https://tqrg.github.io/secbench/patterns.html. Besides words related to each
pattern, we added to the miscellaneous pattern (misc) the identification of
dictionaries of common vulnerabilities or weaknesses (using regular expressions able
to detect the IDs: CVE, NVD or CWE) and any cases where the message contains
indications of a generic security vulnerability fix.</p>
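        <p>The identifier detection for the misc pattern might look like the following sketch. The identifier shapes are assumptions: CVE-YYYY-NNNN with four or more trailing digits and CWE-N. The text does not give a shape for NVD identifiers, so only the keyword is matched. The name find_vulnerability_ids is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

# Assumed shapes: CVE-YYYY-NNNN (four or more digits) and CWE-N.
# NVD is matched as a bare keyword because no ID shape is given in the text.
ID_PATTERN = re.compile(r"\b(CVE-\d{4}-\d{4,}|CWE-\d+|NVD)\b", re.IGNORECASE)


def find_vulnerability_ids(message):
    """Return the dictionary identifiers mentioned in a commit message, upper-cased."""
    return [match.upper() for match in ID_PATTERN.findall(message)]


print(find_vulnerability_ids("Fix for CVE-2013-0155 (see also cwe-79)"))
```
]]></preformat>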
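        <table-wrap>
          <table>
            <thead>
              <tr><th>ID</th><th>Pattern</th></tr>
            </thead>
            <tbody>
              <tr><td>ml</td><td>Memory Leaks</td></tr>
              <tr><td>over</td><td>Overflow</td></tr>
              <tr><td>rl</td><td>Resource Leaks</td></tr>
              <tr><td>dos</td><td>Denial-of-Service</td></tr>
              <tr><td>pathtrav</td><td>Path Traversal</td></tr>
              <tr><td>misc</td><td>Miscellaneous</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-5-2">
        <title>Test Case Structure</title>
        <p>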
Every time a pattern is found in a commit by the mining tool, a test case is
created. The test case has 3 folders: Vfix, with the non-vulnerable source code
from the commit where the pattern was caught (child); Vvul, with the
vulnerable source code from the previous commit (parent), which we consider the real
vulnerability; and Vdiff, with two folders, added and deleted, which store the lines added
to fix the vulnerability and the deleted lines that represent the security
vulnerability (Figure 3).</p>
        <p>After obtaining the sample and its information, a manual diagnosis was
performed on two different kinds of information retrieved from Github (commit
message and source code). For each candidate, we evaluated whether the message
really reflected indications of a vulnerability fix, because some of the combinations
represented by the regular expressions can lead to false positives, i.e., messages
that do not represent an actual vulnerability fix. The example presented in
Figure 4 shows not only how the mining tool finds two candidates matching the
over pattern (red boxes) but also how those two samples were finally diagnosed.
The first reflects a real security vulnerability (buffer overflow) but the second
one represents a CSS issue (i.e., not a security vulnerability). Thus, the second
example is marked as non-viable and automatically not considered for the
final database.</p>
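        <p>The split of a commit into the added and deleted folders of Vdiff can be sketched as follows, assuming the change is available as a unified diff. The function name split_diff and the sample diff (file name and contents) are hypothetical and are not taken from the mining tool.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def split_diff(diff_text):
    """Split a unified diff into the lines it adds and the lines it deletes."""
    added, deleted = [], []
    for line in diff_text.splitlines():
        # Skip file headers ("--- a/file", "+++ b/file"). Simplification: a content
        # line that itself starts with "--" or "++" would also be skipped here.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            deleted.append(line[1:])
    return added, deleted


# Hypothetical diff between a parent (Vvul) and a child (Vfix) commit.
SAMPLE_DIFF = """\
--- a/net.c
+++ b/net.c
@@ -1,3 +1,4 @@
 if (err) {
-    return -1;
+    close(sock);
+    return -1;
 }
"""

added_lines, deleted_lines = split_diff(SAMPLE_DIFF)
print(added_lines)
print(deleted_lines)
```
]]></preformat>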
        <p>If the analysis succeeds (first message, Figure 4), then the code evaluation is
performed through the analysis of the source code diff. Ideally, the researcher is
able to manually isolate the functions or problems in the code responsible for
the fix and the vulnerability. During the study, several cases were inconclusive,
mainly due to difficulties in understanding the code structure or because the
source code did not reflect the message. Normally, these last cases were marked
as non-viable, except when there was something that could be the fix but
the researcher could not confirm it. In that case, they were put on hold as a doubt,
which means that the case needs more research.</p>
        <p>To validate the source code, much research was done in books, online security
cheat sheets, vulnerability dictionary websites and many other sources of
knowledge. Normally, the process consisted of taking a first look at the code, trying
to highlight a few functions or problems that could represent the vulnerability,
and then searching the internet based on the language, frameworks and
information obtained from the diff. The example presented below is easy to
identify because the socket initialized at the beginning needs to be released before
the function returns under the two different conditions (lines 282 and 292); otherwise
we have two resource leaks. It was not always like this: sometimes it was really
difficult to understand where the issues were due to the complexity of the source code.</p>
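        <p>The shape of such a resource leak can be illustrated with a small Python analogue. This is not the code from the paper's example, which is not reproduced in this text; the function names and conditions are hypothetical. The vulnerable version returns early on two conditions without releasing the socket, and the fixed version releases it on every return path.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import socket


def probe_leaky(host, port):
    """Vulnerable shape: both early returns skip sock.close()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if not host:
        return False  # leak: the socket is never released on this path
    if port <= 0:
        return False  # leak: the socket is never released on this path
    sock.close()
    return True


def probe_fixed(host, port):
    """Fixed shape: the context manager closes the socket on every return path."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM):
        if not host:
            return False
        if port <= 0:
            return False
        return True


print(probe_fixed("", 80))
print(probe_fixed("localhost", 80))
```
]]></preformat>
        <p>In a diff between the two versions, the added lines are the ones that release the resource, which is the kind of evidence looked for during the manual source code validation.</p>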
        <p>Besides the validation, the set of requirements presented below needs to be
fulfilled in order to approve a test case as viable for the final test suite:
- The vulnerability belongs to the class where it is being evaluated.
If it does not belong to the class under evaluation, the vulnerability is put on
hold for later study, unless the class under evaluation is the Miscellaneous
class, which was created to mine vulnerabilities that might not belong to the
other patterns or to catch vulnerabilities that may be missed by other patterns
due to the limitations of using regular expressions.</p>
        <p>- The vulnerability is isolated.
We accepted vulnerabilities whose fix commits additionally include the implementation
of other features, refactoring or even the fixing of several security vulnerabilities.
However, the majority of the security vulnerabilities found are identified by the file
names and lines where they are located. We assume all of Vfix is necessary
to fix the security vulnerability.</p>
        <p>- The vulnerability needs to really exist.
Each sample was evaluated to see whether it is a real vulnerability or not. During
the analysis of several samples, commits that were not related to security
vulnerabilities or their fixes (i.e., not real fixes) were caught.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Challenges</title>
        <p>These requirements were all evaluated manually, which is a threat to validity as
it can lead to human errors (e.g., bad evaluations of the security vulnerabilities
and the addition of replicated samples). However, we attempted to be really meticulous
during the evaluation: when we were not sure about the nature of a security vulnerability,
we marked it with a D (Doubt), and with an R (Replica) when we detected a
replication of another commit (e.g., merges or the same commit in another class).
Sometimes it was hard to reassign the commits due to the similarity between
patterns (e.g., ucwkv and upapi). Another challenge was the noise (i.e., commits
that did not represent vulnerabilities) that came with the mining process due to
the use of regular expressions.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Empirical Evaluation</title>
      <p>In this section, we report the results that we obtained through our study and
answer the research questions.</p>
      <sec id="sec-6-1">
        <title>Database of Real Security Vulnerabilities</title>
        <p>This section provides several statistics about Secbench that were
obtained during our study. Our database contains 682 real security vulnerabilities,
mined from 248 projects (the equivalent of 1,978,482 commits) and covering 16
different vulnerability patterns (Tables 1 and 2).</p>
        <p>In order to obtain a sample that is a good representative of the
population under study, we ensured that the top 5 most popular programming
languages on Github and different sizes of repositories would be covered. Due
to the large number of Github repositories (61M) and their constant modification,
it is very complicated to have an overview of the exact characteristics that the
sample under study should have in order to approximately represent the
domain under study. According to a few statistics collected from the Github blog (https://github.com/blog/2047-language-trends-on-github)
and GitHut (http://githut.info/), some of the most popular programming languages on Github are
JavaScript, Java, Python, Ruby, PHP, CSS, C, C++, C# and Objective-C. We
tried mainly to cover the top 5 most popular programming languages on
Github (i.e., those with the highest number of repositories): JavaScript (979M), Java (790M),
Python (510M), Ruby (498M) and PHP (458M). Besides covering the top
5, we also tried to have a good variety of repository sizes, since Github has
repositories of different dimensions. Our database has repositories with sizes
between 2 commits and 700M commits. It would be expected that
mining larger repositories would easily lead to more primitive data. But since
the goal is to have a good representation of the whole of Github, it is necessary to
also include smaller repositories in order to reach balanced conclusions and
predictions. Github has a wide variety of developers whose programming skills can
be good or bad. This can be a threat to test case quality, but we are not yet able to
assess repository quality in an automated way. Also, due to the structure
of Github, the main limitation we faced was dealing with
samples with more than one parent. Sometimes in our manual diagnosis, we
detected samples with 4 or 5 parents (e.g., merges). We handled the issue by analyzing
each single parent to detect the one containing the potential vulnerability.</p>
        <p>Throughout our diagnosis process, we were able to identify several CVE
identifiers. In total, 105 out of the 682 security vulnerabilities are identified using
the CVE identification system. These 105 vulnerabilities belong to 98 different
CVE classes across 12 different years (e.g., CVE-2013-0155 and CVE-2017-7620).
The identifier for weaknesses (CWE) was never found during our manual
diagnosis, which reflects the information retrieved by Github's search engine: only
12K commit messages contain CWE, versus 2M for CVE.</p>
        <p>
          Secbench includes security vulnerabilities from 1999 to 2017, with the group
of years between 2012 and 2016 having the highest number of accepted
vulnerabilities (especially 2014, with 14.37%). This supports
IBM's X-Force Threat Intelligence 2017 Report [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], where it was concluded that
in the last 5 years the number of vulnerabilities per year increased significantly
compared with the other years. The decrease in security vulnerabilities in the
last 2 years definitely does not reflect the news and security reports. However,
these reports cover all kinds of software, whereas this study is performed only on
open-source software.
        </p>
        <p>The decrease may reflect the concerns of core developers about making their
code public, since the number of attacks is increasing and one of the potential
causes can be code availability. Except for 2000, we were able to collect
test cases for every year from 1999 to 2017. The last 5 years (excluding 2017) were the years
with the highest percentage of vulnerabilities. The sample covers more than 12
different languages, with PHP (42.38%), C (23.75%) and Ruby (12.9%) being the
languages with the highest number of test cases (Figure 7). This is consistent with the
higher percentage of security vulnerabilities caught for injec (16.1%), xss (23.4%)
and ml (12.8%) (Figure 6), since C is a language where memory leaks are
predominant and Ruby and PHP are scripting languages where Injection and
Cross-Site Scripting are popular vulnerabilities. Although the database contains
94 different languages, it was only possible to collect viable information for 12
different languages.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Research Questions</title>
        <p>As mentioned before, there are several automated tools that can scan source code for security
vulnerabilities. Yet, their performance is still far from an
acceptable level of maturity. To find points of improvement, it is necessary to study
them using real security vulnerabilities. The primary data for this kind of study
is scarce, as we discussed in Section 2, so we decided to first evaluate whether there is
enough information on Github repositories to create a database of real security
vulnerabilities (RQ1) and, if so, which security patterns we can most
easily find on open-source repositories (RQ2).</p>
        <p>- RQ1: Is there enough information available on open-source
repositories to create a database of software security vulnerabilities?</p>
        <p>To answer this question, it was necessary to analyze the distribution of real
security vulnerabilities across the 248 mined Github repositories. As a result of our
mining process for the 16 different patterns (Table 4), 62.5% of the repositories
contained vulnerabilities (VRepositories) and 37.5% contained 0 vulnerabilities.</p>
        <p>After mining the repositories, the manual evaluation was performed, in which
each candidate had to fulfil a group of requirements (Section 3.3). As we can see
in Table 5, the percentage of success, i.e., repositories containing vulnerabilities,
decreases to 54.19%. The approximate difference of 8% is due to the cleaning
done during the evaluation process, where a human tries to understand
whether the actual code fixes and represents a security vulnerability or not. Despite
the decrease from one process to the other, we still obtain a considerable
percentage (&gt; 50%) of VRepositories containing real vulnerabilities.</p>
        <p>In the end, we were able to extract vulnerabilities with an existence ratio
of 2.75 (682/248) vulnerabilities per repository. The current number of repositories on Github is 61M, so
based on the previous ratio we could possibly obtain a database of about 168 million
(167,750K) real security vulnerabilities, which is about 246 thousand (245,968)
times larger than the current database. In 2 to 3 months, we were able
to collect 682 real security vulnerabilities for 16 different patterns, with a resulting
success rate of 54.19% of vulnerabilities accepted. Thus, we can conclude that it is
possible to extract a considerable amount of vulnerabilities from open-source
software to create a database of real security vulnerabilities that will contribute
significantly to the software security testing research area. Due to the constant
change and dimension of Github, the lack of information about the domain and
the small size of the sample under study, this conclusion may not generalize.
However, based on the results obtained, we believe the answer to this
question is indeed positive.</p>
        <p>There are enough vulnerabilities available on open-source repositories to
create a database of real security vulnerabilities.</p>
        <p>- RQ2: What are the most prevalent security patterns on open-source
repositories?</p>
        <p>This research question attempts to identify the most prevalent security patterns
on open-source repositories.</p>
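        <p>The projection given in the answer to RQ1 follows from three numbers quoted in the text, and the arithmetic can be checked with a short sketch. It assumes that the ratio observed in the 248 sampled projects holds for all Github repositories, which is a strong assumption given the sampling caveats stated in the text; the variable names are illustrative.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Figures quoted in the text.
vulnerabilities = 682
projects = 248
github_repositories = 61_000_000

# Existence ratio: accepted vulnerabilities per mined project.
ratio = vulnerabilities / projects

# Linear extrapolation to all repositories (assumes the sample ratio generalizes).
projected = ratio * github_repositories

# How many times larger the projected database would be than the current one.
growth = projected / vulnerabilities

print(f"ratio = {ratio:.2f}")
print(f"projected = {projected:,.0f}")
print(f"growth = {growth:,.0f}x")
```
]]></preformat>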
        <p>
          After mining and evaluating the samples, the results for 16 different patterns
were obtained, divided into the two main groups presented in Figure 8: Top
10 OWASP and others. xss (20.67%), injec (14.81%) and ml (12.46%) are the
most frequent patterns in OSS, which is curious since injec takes first place in the Top
10 OWASP 2017 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and xss the second. ml is not part of the top ten
because it is not a vulnerability normally found in web applications. Injection
and Cross-Site Scripting are easy vulnerabilities to catch since their exploits are
similar and exist mainly due to the lack of data sanitization, which is often
forgotten by developers. The only difference between the two is the side
from which the exploit is done (server or client). Memory leaks exist because
developers do not manage memory allocations and deallocations correctly. These
kinds of issues are one of the main causes of dos attacks and appeared regularly
in the manual evaluations, even in the misc class. Although these three patterns
are easy to fix, protection against them is also typically forgotten. Another
prevalent pattern that is not considered is misc, because it contains all the other
vulnerabilities and attacks found that do not belong to any of the chosen patterns
or whose place was not yet well defined. One example of vulnerabilities that can
be found in misc (14.37%) are vulnerabilities that can lead to timing attacks,
where an attacker can retrieve information about the system by analyzing
the time taken to execute cryptographic algorithms. The analysis of the misc class
has already produced material that can possibly result in new patterns.
        </p>
        <p>Although auth (6.6%) takes second place in the Top 10 OWASP 2017,
it was not easy to find samples that resemble this pattern, maybe because
highlighting these issues on Github can reveal other weaknesses in the projects'
session management mechanisms and, consequently, lead to session hijacking
attacks. The csrf (4.99%) and dos (6.16%) patterns are seen frequently among
Github repositories: the corresponding commits add protection through unpredictable tokens and fix
several issues that lead to denial-of-service attacks. The most difficult patterns
to extract are definitely bac (0.29%), which covers unapproved access to sensitive
data without enforcement; upapi (1.03%), which covers the addition of
mechanisms to handle and respond to automated attacks; and smis (1.32%), which involves
default information in unprotected files or unused pages that can give
attackers unauthorized access. rl (1.76%) is another pattern whose extraction was
hard. Although memory leaks are resource leaks, here only the vulnerabilities
related to the need to close files, sockets, connections, etc., were considered.
The other patterns (e.g., sde, iap, ucwkv, over and pathtrav) were pretty
common during our evaluation process and also in our Github searches. The over
pattern contains vulnerabilities for several types of overflow: heap, stack,
integer and buffer. Another interesting point is the considerable percentage of
iap (2.49%), which normally is the addition of methods to detect attacks. This
is the first time that iap is part of the Top 10 OWASP (2017) and still we
were able to detect more vulnerabilities for that pattern than for bac, which was
already present in 2003 and 2004. From 248 projects, the methodology was able
to collect 682 vulnerabilities distributed across 16 different patterns.</p>
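        <p>The percentages quoted above can be converted back into approximate absolute counts. The following Python sketch is a reading aid under stated assumptions, not part of the Secbench tooling: it assumes every quoted share is a fraction of the 682 collected vulnerabilities, and the function and variable names are hypothetical. Because the published percentages are rounded, the resulting counts are estimates.</p>
        <preformat>
```python
TOTAL_VULNERABILITIES = 682  # total reported in the text

# Shares (in percent) quoted in the text, keyed by pattern label.
REPORTED_SHARES = {
    "xss": 20.67,
    "injec": 14.81,
    "misc": 14.37,
    "ml": 12.46,
    "auth": 6.6,
    "dos": 6.16,
    "csrf": 4.99,
    "iap": 2.49,
    "rl": 1.76,
    "smis": 1.32,
    "upapi": 1.03,
    "bac": 0.29,
}


def approximate_count(share_percent: float, total: int = TOTAL_VULNERABILITIES) -> int:
    """Estimate an absolute count from a rounded percentage (assumed share of total)."""
    return round(share_percent * total / 100)


def approximate_counts(shares: dict) -> dict:
    """Apply approximate_count to every quoted share."""
    return {name: approximate_count(percent) for name, percent in shares.items()}


if __name__ == "__main__":
    counts = approximate_counts(REPORTED_SHARES)
    for name, count in sorted(counts.items(), key=lambda item: -item[1]):
        print(f"{name:6s} {count:4d}")
    print("covered:", sum(counts.values()), "of", TOTAL_VULNERABILITIES)
```
        </preformat>
        <p>Under these assumptions the twelve quoted shares account for about 593 of the 682 vulnerabilities (roughly 141 for xss and 2 for bac, for example), which would leave about 89 for the four patterns without a quoted percentage (sde, ucwkv, over and pathtrav).</p>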
        <p>The most prevalent security patterns are Injection, Cross-Site Scripting
and Memory Leaks.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusions &amp; Future Work</title>
      <p>This paper proposes a database, coined Secbench, containing real security
vulnerabilities. In particular, Secbench is composed of 682 real security vulnerabilities,
the outcome of mining 248 projects, accounting for almost 2M
commits, for 16 different vulnerability patterns.</p>
      <p>The importance of this database lies in its potential to help researchers and
practitioners alike improve and evaluate software security testing techniques. We
have demonstrated that there is enough information in open-source repositories
to create a database of real security vulnerabilities for different languages and
patterns. Thus, we can contribute to considerably reducing the lack of
databases of real security vulnerabilities. The methodology has proven
very valuable, since we collected a considerable number of security vulnerabilities
from a small group of repositories (248 repositories out of 61M). However, there
are several points of possible improvement, not only in the mining tool but
also in the evaluation and identification process, which can be costly and
time-consuming.</p>
      <p>As future work, we plan to increase the number of security vulnerabilities,
patterns and supported languages. We will continue studying and collecting
patterns from Github repositories and possibly extend the study to other source
code hosting websites (e.g., bitbucket, svn, etc.). We will also explore natural
language processing in order to introduce semantics and, hopefully, decrease
the percentage of garbage associated with the mining process.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. IBM:
          <article-title>IBM X-Force threat intelligence index 2017</article-title>
          .
          Technical report
          , IBM Security Department USA (March
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. OWASP:
          <article-title>OWASP top 10 - 2017: The ten most critical web application security risks</article-title>
          . Technical report, The OWASP Foundation (February
          <year>2017</year>
          ) Release Candidate.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Tassey</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The economic impacts of inadequate infrastructure for software testing</article-title>
          .
          Technical report, National Institute of Standards and Technology (May
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Goseva-Popstojanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perhinschi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>On the capability of static code analysis to detect security vulnerabilities</article-title>
          .
          <source>Inf. Softw. Technol.</source>
          <volume>68</volume>
          (C) (December
          <year>2015</year>
          )
          <fpage>18</fpage>
          –
          <lpage>33</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Briand</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          :
          <article-title>A critical analysis of empirical research in software testing</article-title>
          .
          <source>In: First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)</source>
          . (Sept
          <year>2007</year>
          )
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Briand</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labiche</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Empirical studies of software testing techniques: Challenges, practical strategies, and future research</article-title>
          .
          <source>SIGSOFT Softw. Eng. Notes</source>
          <volume>29</volume>
          (
          <issue>5</issue>
          ) (
          September
          <year>2004</year>
          )
          <fpage>1</fpage>
          –
          <lpage>3</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Just</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jalali</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ernst</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>Defects4j: A database of existing faults to enable controlled testing studies for java programs</article-title>
          .
          <source>In: Proceedings of the 2014 International Symposium on Software Testing and Analysis. ISSTA 2014</source>
          , New York, NY, USA, ACM (
          <year>2014</year>
          )
          <fpage>437</fpage>
          –
          <lpage>440</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Do</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elbaum</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rothermel</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact</article-title>
          .
          <source>Empirical Softw. Engg</source>
          .
          <volume>10</volume>
          (
          <issue>4</issue>
          ) (October
          <year>2005</year>
          )
          <fpage>405</fpage>
          –
          <lpage>435</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Just</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jalali</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inozemtseva</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ernst</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fraser</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Are mutants a valid substitute for real faults in software testing?</article-title>
          <source>In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. FSE 2014</source>
          , New York, NY, USA, ACM (
          <year>2014</year>
          )
          <fpage>654</fpage>
          –
          <lpage>665</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pearson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Just</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fraser</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abreu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ernst</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Evaluating and improving fault localization</article-title>
          .
          <source>In: ICSE 2017, Proceedings of the 39th International Conference on Software Engineering</source>
          , Buenos Aires, Argentina (May
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          ENISA:
          <article-title>ENISA threat landscape report 2016</article-title>
          . Technical report, European Union Agency for Network and Information Security (January
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>