<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Computer Security Training Recommender for Developers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muhammad Nadeem</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edward B. Allen</string-name>
          <email>edward.allen@msstate.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Byron J. Williams</string-name>
          <email>byron.williams@msstate.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Security, Experimentation</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Mississippi State University</institution>
          ,
          <addr-line>Mississippi State, MS</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <abstract>
        <p>Vulnerable code may cause security breaches in software systems, resulting in loss of confidential data and financial losses for organizations. Software developers must be given proper training to write secure code. Conventional training methods do not take into account the code written by developers over time, which makes these training sessions less effective. We propose a Computer Security Training Recommender to help identify the focused, narrow areas in which developers need training. The proposed recommender system leverages static analysis techniques to suggest the most appropriate training topics for different software developers; moreover, it utilizes public vulnerability repositories to suggest community-accepted solutions to different security problems. This paper presents an architecture of the proposed recommender system and a proof-of-concept case study. We found that vulnerabilities flagged in source code by static analysis tools can be mapped to relevant articles in a vulnerability repository. Hence, the mitigation strategies given in such articles may be used as a resource to train individual software developers. Preliminary empirical evaluation shows that the proposed system is promising.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender system</kwd>
        <kwd>security vulnerabilities</kwd>
        <kwd>training</kwd>
        <kwd>CWE</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>D.4.6 [Security and Protection]: Information flow</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>The proposed recommender system is based on the static analysis of
code written by software developers over time. Static analysis tools,
e.g., FindBugs, report suspected security and other vulnerabilities
present in source code. These results may be used as the basis for
recommending the most appropriate training topics to the software
developers who contributed to writing those modules, thereby
improving their software security skills.</p>
      <p>
        We utilize the knowledge in vulnerability repositories such
as the Common Weakness Enumeration (CWE) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which is contributed
by software security experts across the globe and is freely available
for public use.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. PROPOSED ARCHITECTURE</title>
      <p>
        The proposed architecture of the recommender system, which is an
extension to our previous work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], is shown in Figure 1 below.
      </p>
      <sec id="sec-3-10">
        <p>[Figure 1: Proposed architecture of the recommender system. Labels recoverable from the figure: Software Code Repository, Metadata, Static Code Analysis Module, Developers’ Performance Assessment Module, Recommendation Module, Training Delivery Module, Public vulnerability repositories, Custom training database, IDE plugin, Email, Video streaming, and others, Developer.]</p>
        <p>The Software Code Repository contains the code and metadata that are
analyzed by the recommender system.</p>
        <p>The Static Code Analysis Module contains one or more static analysis
tools, which scan the given code repository to find vulnerabilities.</p>
        <p>The Developers’ Performance Assessment Module creates a profile
for each developer containing descriptions of the vulnerabilities
that the developer introduced into the code over time.</p>
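        <p>The profile-building step can be illustrated with a minimal sketch. It assumes that each static analysis warning has already been attributed to a developer, for example by joining it with authorship metadata from the code repository; the record layout, the developer names, and the function name build_profiles are hypothetical and are not taken from the implemented system.</p>
        <preformat>
```python
from collections import Counter, defaultdict


def build_profiles(findings):
    """Count flagged vulnerability types per developer.

    `findings` is an iterable of (developer, vulnerability_type) pairs.
    This record layout is an assumption made for illustration only.
    """
    profiles = defaultdict(Counter)
    for developer, vulnerability_type in findings:
        profiles[developer][vulnerability_type] += 1
    return profiles


# Hypothetical findings; the developer names are invented.
findings = [
    ("alice", "Cross site scripting vulnerability"),
    ("alice", "Cross site scripting vulnerability"),
    ("alice", "HTTP Response splitting vulnerability"),
    ("bob", "Hardcoded constant database password"),
]

profiles = build_profiles(findings)
# The most frequent vulnerability type is a natural first training topic.
top_type, top_count = profiles["alice"].most_common(1)[0]
```
        </preformat>
        <p>In this sketch, the most frequent vulnerability type in a developer's profile would be the first candidate for a training recommendation; other prioritization schemes are equally possible.</p>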
        <p>
          A public vulnerability repository contains information on
vulnerabilities, their potential mitigations, consequences, and so on.
Examples are NVD [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and CWE [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>A custom training database is a database of custom-designed
training modules on various topics.</p>
        <p>The Recommendation Module calculates the similarity scores
between vulnerability descriptions and the articles from
vulnerability repositories.</p>
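        <p>The similarity computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implemented module: it assumes a smoothed inverse document frequency and cosine similarity, and it uses CWE article titles as stand-ins for full article texts. Because the exact weighting and scoring used in the study are not specified here, the scores it produces are not comparable with those reported in Table 1. The function names are hypothetical.</p>
        <preformat>
```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (term -> weight) per input text."""
    counts = [Counter(tokenize(t)) for t in texts]
    n = len(texts)
    doc_freq = Counter(term for c in counts for term in c)
    # Smoothed IDF is an assumption; it keeps every weight positive.
    idf = {
        term: math.log((1 + n) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_articles(description, articles):
    """Rank article ids by similarity to a vulnerability description."""
    ids = list(articles)
    vectors = tfidf_vectors([description] + [articles[i] for i in ids])
    query, docs = vectors[0], vectors[1:]
    scored = [(cosine(query, d), i) for i, d in zip(ids, docs)]
    return sorted(scored, reverse=True)


# Article titles stand in for full CWE article texts (an assumption).
articles = {
    "CWE 089": "Improper Neutralization of Special Elements used in SQL stmt.",
    "CWE 079": "Improper Neutralization of Input During Web Page Generation",
    "CWE 259": "Use of Hard-coded Password",
}
ranking = rank_articles(
    "Non-constant SQL string passed to execute method", articles
)
best_score, best_id = ranking[0]
```
        </preformat>
        <p>With these three stand-in texts, only the SQL-related article shares a term with the description, so it ranks first. In practice, longer article texts and a stop-word list would likely be needed for meaningful scores.</p>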
        <p>The Training Delivery Module delivers the training modules to the
developers.</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption><p>CWE articles recommended for each vulnerability type in the security category, with TFIDF score and relevance</p></caption>
          <table>
            <thead>
              <tr><th>Vulnerability type in Security Category</th><th>Recommended CWE article</th><th>TFIDF score</th><th>Relevance</th></tr>
            </thead>
            <tbody>
              <tr><td rowspan="2">Cross site scripting vulnerability</td><td>CWE 079 Improper Neutralization of Input During Web Page Generation</td><td>1.0470</td><td>High</td></tr>
              <tr><td>CWE 644 Improper Neutralization of HTTP Headers for Scripting Syntax</td><td>0.4749</td><td>Medium</td></tr>
              <tr><td rowspan="12">Hardcoded constant database password</td><td>CWE 522 Insufficiently Protected Credentials</td><td>0.7739</td><td>High</td></tr>
              <tr><td>CWE 259 Use of Hard-coded Password</td><td>0.7359</td><td>High</td></tr>
              <tr><td>CWE 256 Plaintext Storage of a Password</td><td>0.6858</td><td>High</td></tr>
              <tr><td>CWE 640 Weak Password Recovery Mechanism for Forgotten Password</td><td>0.6342</td><td>Medium</td></tr>
              <tr><td>CWE 798 Use of Hard-coded Credentials</td><td>0.5803</td><td>High</td></tr>
              <tr><td>CWE 261 Weak Cryptography for Passwords</td><td>0.5737</td><td>Medium</td></tr>
              <tr><td>CWE 620 Unverified Password Change</td><td>0.5231</td><td>Low</td></tr>
              <tr><td>CWE 309 Use of Password System for Primary Authentication</td><td>0.5142</td><td>Low</td></tr>
              <tr><td>CWE 187 Partial Comparison</td><td>0.5003</td><td>Low</td></tr>
              <tr><td>CWE 260 Password in Configuration File</td><td>0.4536</td><td>Medium</td></tr>
              <tr><td>CWE 258 Empty Password in Configuration File</td><td>0.3816</td><td>Medium</td></tr>
              <tr><td>CWE 263 Password Aging with Long Expiration</td><td>0.3794</td><td>Low</td></tr>
              <tr><td rowspan="3">HTTP Response splitting vulnerability</td><td>CWE 113 Improper Neutralization of CRLF Sequences in HTTP Headers</td><td>1.5627</td><td>High</td></tr>
              <tr><td>CWE 650 Trusting HTTP Permission Methods on the Server Side</td><td>0.6569</td><td>High</td></tr>
              <tr><td>CWE 644 Improper Neutralization of HTTP Headers for Scripting Syntax</td><td>0.5360</td><td>High</td></tr>
              <tr><td rowspan="2">Non-constant SQL string passed to execute method</td><td>CWE 089 Improper Neutralization of Special Elements used in SQL stmt.</td><td>0.4768</td><td>High</td></tr>
              <tr><td>CWE 484 Omitted Break Statement in Switch</td><td>0.3839</td><td>Low</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. EVALUATION</title>
      <p>We conducted a proof-of-concept case study to evaluate the
feasibility of mapping vulnerabilities to articles in the CWE repository.</p>
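      <table-wrap id="tab-2">
        <label>Table 2</label>
        <caption><p>Flagged vulnerability types and total occurrences by category</p></caption>
        <table>
          <thead>
            <tr><th>Category</th><th>Flagged vulnerability types</th><th>Total occurrences</th></tr>
          </thead>
          <tbody>
            <tr><td>Bad practice</td><td>25</td><td>195</td></tr>
            <tr><td>Correctness</td><td>19</td><td>64</td></tr>
            <tr><td>Dodgy code</td><td>20</td><td>226</td></tr>
            <tr><td>Experimental</td><td>2</td><td>31</td></tr>
            <tr><td>Internationalization</td><td>1</td><td>105</td></tr>
            <tr><td>Malicious code vulnerability</td><td>8</td><td>323</td></tr>
            <tr><td>Multithreaded correctness</td><td>1</td><td>1</td></tr>
            <tr><td>Performance</td><td>15</td><td>220</td></tr>
            <tr><td>Security</td><td>4</td><td>14</td></tr>
            <tr><td>Total</td><td>95</td><td>1179</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The target source code was Tolven 2.0, an open source system with
418K lines of code in 2957 Java modules, which has also been used in
other studies [5]. For static code analysis, we used the open
source tool FindBugs 2.0.3. Each category listed in Table 2 contains a
number of flagged vulnerability types; for example, the security
category has 4 vulnerability types with 14 occurrences in total.
Table 3 lists each vulnerability type in the security category.</p>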
      <table-wrap id="tab-3">
        <label>Table 3</label>
        <caption><p>Vulnerability types in the security category</p></caption>
        <table>
          <thead>
            <tr><th>Vulnerability type in Security Category</th><th>Occurrences</th></tr>
          </thead>
          <tbody>
            <tr><td>Cross site scripting vulnerability</td><td>11</td></tr>
            <tr><td>Hardcoded constant database password</td><td>1</td></tr>
            <tr><td>HTTP Response splitting vulnerability</td><td>1</td></tr>
            <tr><td>Non-constant SQL string passed to execute method</td><td>1</td></tr>
            <tr><td>Total</td><td>14</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We used the Vector Space Model (VSM) with term frequency-inverse
document frequency (TFIDF) [4] weights to calculate the similarity
between the vulnerability descriptions and CWE articles. Of the 95
flagged vulnerability types listed in Table 2, 81 were successfully
mapped to CWE articles. Due to space limitations, Table 1 lists only
the CWE articles mapped to each vulnerability type in the security
category. For each CWE article, Table 1 also shows the TFIDF score and
the relevance as judged manually by a subject expert.</p>
    </sec>
    <sec id="sec-5">
      <title>4. CONCLUSION AND FUTURE WORK</title>
      <p>The proof-of-concept case study demonstrated the practical
feasibility of automatically finding relevant security articles based
on source code vulnerabilities. Although the current implementation
uses only CWE articles, other vulnerability databases, e.g., the
National Vulnerability Database (NVD), host useful data related
to security checklists, security-related software flaws,
misconfigurations, and impact metrics. Utilizing the NVD
database along with CWE articles may expand the scope of our
recommender system. Another short-term goal is to use more than
one static code analysis tool so that false positive detections may
be reduced by comparing the outputs of multiple tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Muneer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nadeem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>E. B.</given-names>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Recommending Training Topics for Developers Based on Static Analysis of Security Vulnerabilities</article-title>
          ,
          <source>In Proceedings of the 52nd ACM Southeast Conference</source>
          , (Kennesaw, GA, Mar.
          <fpage>28</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>2014</year>
          ).
          <source>ACMSE'14</source>
          . ACM New York, NY.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] CWE, Common Weakness Enumeration,
          <article-title>A Community-Developed Dictionary of Software Weakness Types</article-title>
          , http://cwe.mitre.org, accessed Apr. 30,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] NVD, National Vulnerability Database, https://nvd.nist.gov, accessed Apr. 25,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>