Toward Semantic Assessment of Vulnerability Severity: A Text Mining Approach

Yongjae Lee and Seungwon Shin
Korea Advanced Institute of Science and Technology
Daejeon, Republic of Korea
{ylee.cs, claude}@kaist.ac.kr

Abstract

A security vulnerability is a flaw in software or hardware systems that an adversary could exploit to compromise resources. Despite the never-ending effort to reduce and prevent vulnerabilities, their number has been constantly increasing. To deal with the vulnerabilities that are increasingly found in diverse systems, various methods to prioritize and manage them have been proposed. The de facto standard method used to assess and prioritize vulnerabilities by severity is CVSS (Common Vulnerability Scoring System), and many organizations have been using this system for vulnerability management. However, CVSS is limited in that it takes only some properties (e.g., ease of exploit, impact, etc.) of a vulnerability into account when measuring severity, and hence CVSS scores are often considered inaccurate or impractical. In this paper, we present a semantic approach to assess the severity of vulnerabilities by ranking them. Our ranking method uses relational information about how strongly two vulnerabilities are related or similar to each other. With this ranking method, we try to find which vulnerability has more common characteristics than others, since we believe that if a vulnerability has more common and popularly used characteristics, then it is likely to attract more attack trials. Based on this insight, we evaluate our ranking method with real vulnerability data and show that it can sift out the more critical vulnerabilities effectively.

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

A vulnerability is a flaw which exists in either hardware or software systems and can be used to threaten those systems [ALRL04]. A vulnerability itself is not a problem unless an adversary exploits it to make the systems fail in terms of security. In other words, a vulnerability can be used by malicious people to violate the systems' important security properties, namely Confidentiality, Integrity, and Availability (CIA) [ALRL04]. Therefore, swiftly finding and patching vulnerabilities is one of the most significant concerns for hardware and software manufacturers, security software vendors, and researchers.

Unfortunately, fixing the ever-increasing number of vulnerabilities is labor-intensive and time-consuming, so people want to prioritize vulnerabilities in order to identify the more critical ones. For example, we can decide to fix remotely exploitable vulnerabilities before locally exploitable ones, because the former can easily be exploited by most attackers. To this end, vulnerabilities are managed in a systematic manner: they are given a unique ID and stored in a central database, the NVD (National Vulnerability Database) [NIS17], and their severity is assessed by a severity assessment system.

Common Vulnerabilities and Exposures (CVE) [MSR06, MIT18] is the most popular vulnerability management scheme, which is operated by NVD. If someone finds and asks to register a newly discovered vulnerability, NVD issues a unique ID (CVE identifier) to the vulnerability. Once the vulnerability is registered in NVD, one can look up its related information by the issued CVE identifier, which includes a short description, references, a list of affected products, and a severity score. In particular, the severity score is evaluated by the Common Vulnerability Scoring System (CVSS) [FIR17], the de facto standard for quantifying a vulnerability's severity. With CVSS scores, one can sort vulnerabilities from highest to lowest, which helps prioritize the vulnerabilities to fix.

However, many researchers have argued that CVSS does not account for what security experts perceive in the wild [AM12, AM14]. For example, vulnerabilities with low CVSS scores are ranked in higher positions in bug bounty programs [MM16], and the CVSS scores of randomly selected vulnerabilities do not correlate well with severity scores manually evaluated by experts in the security field [HA15]. More specifically, take Heartbleed as an example. Heartbleed (CVE-2014-0160) is one of the most well-known vulnerabilities and received worldwide attention. It is an implementation flaw in OpenSSL, the most widely used open source encryption library and TLS implementation [Ope18]. This vulnerability can make servers leak confidential data, including the servers' encryption keys, which makes the problem much worse; yet its CVSS score is just 5.0 out of 10.0, a Medium severity level.

In this paper, we present a semantic approach to assess the severity of vulnerabilities (specifically CVEs) by analyzing the descriptions of a CVE. There are many text descriptions for a CVE, such as NVD entries, security blog posts, and manufacturers' web bulletins, and such descriptions explain how to exploit the CVE and what kind of damage can be caused if the CVE is exploited by attackers. Since those descriptions commonly present various characteristics of the CVE in human-readable natural language, we can glean insightful information from them with the help of Natural Language Processing (NLP) techniques.

To this end, we first collect text descriptions illustrating the characteristics of CVEs from various sources: NVD, blogs, and web bulletins. Next, we extract information from the text descriptions, including the type of product where a CVE is found, which versions of the product have the CVE, whether an easy-to-use exploit exists for the CVE, and so forth. Once such information is extracted, we apply a ranking method to understand the severity of CVEs clearly. Based on the extracted information, our ranking method first tracks how strongly CVEs are related or similar to one another. This relation can reveal whether the characteristics of a CVE are also shared by other CVEs. Finally, our ranking method sorts the CVEs in order, i.e., a CVE with more common characteristics is ranked higher. The intuition behind our ranking method is that if the characteristics of a CVE are more general, meaning that they could be commonly and widely adopted by attackers, then the CVE is more serious. To get the rank of each CVE, we employ the TextRank algorithm [MT04], an unsupervised ranking algorithm that can summarize a text and extract the important sentences or words within it.

To initially evaluate our proposed ranking approach, we have collected real CVE data and applied our method to the data. In addition, we compare our ranking results with CVSS scores to understand whether our approach can clearly reflect real-world opinions. Our initial results show that our approach provides much more realistic (and reasonable) ranking results than CVSS.

2 Method

Before we give a detailed explanation of our vulnerability ranking method, we illustrate our system overview in Figure 1. Our method operates in three phases: (1) corpus building, (2) graph building, and (3) vulnerability ranking. Our method is a text-oriented vulnerability ranking, and thus we need a large number of text descriptions about CVEs. Fortunately, NVD compiles related information in a database and allows free access to the information, so we glean CVE descriptions from NVD to build the CVE description corpus. After building the corpus, our method generates a vulnerability ranking graph in which a vertex represents a CVE. In the graph, vertices are linked to one another when there is a certain relation between two CVEs. We will discuss the relations that CVEs can have in the following sections. Once the graph building is complete, we run the TextRank algorithm on the graph to obtain importance scores for the CVEs, by which the CVEs are sorted.

2.1 Vulnerability ranking

Ranking model. Our vulnerability ranking method is based on TextRank [MT04], a graph-based and unsupervised ranking model. TextRank summarizes a text by ranking the sentences in the text according to their importance and singling out a set of higher-ranked sentences. In the model, a sentence is represented as a vertex, and two sentences are linked to each other if they share similar contents, or words. Although TextRank is an application of Google's PageRank [BP98] to text summarization, the two ranking methods differ in that TextRank operates on an undirected graph. This is because, unlike web pages, sentences do not have explicit reference relations. Therefore, TextRank cannot use the graph structure information which denotes that one node votes for another. Instead, TextRank assigns a similarity weight to each link between two nodes, and the nodes exchange the weight when calculating the importance score.
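The weight exchange just described can be sketched in Python as follows (a minimal illustration of ours, not the authors' implementation, whose formal definition is Equation 1); the damping factor d = 0.85 and the fixed iteration count are conventional choices rather than values taken from the paper.

```python
def textrank(weights, d=0.85, iters=50):
    """Iteratively compute importance scores on an undirected weighted graph.

    weights: dict mapping a node to {neighbor: similarity_weight}.
    Returns a dict mapping each node to its importance score.
    """
    # Total outgoing weight of each node (the graph is undirected,
    # so a node's "in" and "out" neighbors coincide).
    out_sum = {v: sum(nbrs.values()) for v, nbrs in weights.items()}
    score = {v: 1.0 for v in weights}
    for _ in range(iters):
        # Each node collects score from its neighbors, proportionally
        # to the similarity weight on the shared link.
        score = {
            v: (1 - d) + d * sum(
                w / out_sum[u] * score[u]
                for u, w in nbrs.items() if out_sum[u] > 0
            )
            for v, nbrs in weights.items()
        }
    return score

# Toy graph: A and B are strongly similar; C is only weakly tied to A.
g = {"A": {"B": 0.5, "C": 0.1}, "B": {"A": 0.5}, "C": {"A": 0.1}}
ranks = textrank(g)
```

In this toy graph, A accumulates the highest score because it receives weight from both B and C, mirroring the intuition that a vertex similar to many others ranks higher.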
[Figure 1: Vulnerability ranking system overview. In the corpus building phase, NVD descriptions pass through a Preprocessor (sentence boundary detection, PoS tagging); in the graph building and ranking phase, a Graph Builder links CVEs (matching identifiers with the regex (CVE-\d{4}-\d{4,7})) and evaluates link weights, producing the ranked list of CVEs.]

In the ranking graph G = (V, E), where V is the set of vertices and E is the set of edges, let us assume that there is a vertex V_i. For V_i, let In(V_i) and Out(V_i) be the set of predecessors of V_i and the set of successors of V_i, respectively. In addition, if there is a vertex V_j that belongs to In(V_i), the similarity weight between V_i and V_j is defined as w_{ji}. Then, the importance score of the vertex V_i can be computed as below:

IS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} IS(V_j)    (1)

where d is a damping factor, which denotes the probability (1 - d) for a random surfer on the graph to jump from one vertex to another randomly [BP98]. In this model, the importance score of a vertex is distributed to its successors proportionally to the similarity weights. Therefore, a vertex that is similar to a majority of the other vertices within the graph tends to have a higher importance score.

In our vulnerability ranking problem, we believe that a vulnerability that has similar properties to all kinds of vulnerabilities is important and thus needs to be fixed first. This is because, if such a vulnerability is found in a hardware or software product, it gives the product a broad attack surface. In other words, the product can be attacked in various ways. Here, for two vulnerabilities to have similar properties means that they can be used by similar types of attacks or violate the same part of the CIA triad.

How to represent a vulnerability in a graph? In our ranking graph, a vertex represents the short description of a CVE, which is less than 10 sentences long and recorded for every CVE in NVD. For example, a vertex labeled with CVE-2014-0160 represents the text description presented in Figure 2, which is excerpted from NVD (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0160). From the content words in the description, we can grasp how the vulnerability can be exploited by attackers and what kind of damage can be caused after the vulnerability is successfully exploited. In other words, in the CVE description, the characteristics of the CVE are expressed in natural language, and the presence of common characteristics between two vertices determines whether they are linked together or not. Notice that the text description cannot be used directly as a vertex; it needs to pass through predefined preprocessing steps, such as part-of-speech tagging and lemmatization, to be converted to a bag-of-words.

"The (1) TLS and (2) DTLS implementations in OpenSSL 1.0.1 before 1.0.1g do not properly handle Heartbeat Extension packets, which allows remote attackers to obtain sensitive information from process memory via crafted packets that trigger a buffer over-read, as demonstrated by reading private keys, related to d1_both.c and t1_lib.c, aka the Heartbleed bug."

Figure 2: Description of CVE-2014-0160 (Heartbleed)

How to define similarity between two vulnerabilities? For two CVEs to share similar characteristics can be defined as both of their bags-of-words containing similar words. If the two bags-of-words describing the two CVEs have similar words, we compute the similarity between them by employing text similarity measures such as the Jaccard index or TF-IDF cosine similarity [BYRN99]. In this work, we employ the Jaccard index [Pau12] as presented in Equation 2, where X and Y are the sets of unique words that constitute the bags-of-words of the two vulnerability descriptions.

JaccIndex(X, Y) = |X ∩ Y| / |X ∪ Y| = |X ∩ Y| / (|X| + |Y| - |X ∩ Y|)    (2)

Using this metric, we can measure how similar two CVEs are. For instance, we present three CVEs and their descriptions in Figure 3 and summarize their similarities in Table 1. Since both CVE-2015-0311 and CVE-2015-7645 are vulnerabilities of Adobe Flash Player and affect the same operating systems (i.e., MS Windows and Apple OS X), they are the most similar pair among the three vulnerability pairs. Next, CVE-2015-0311 and CVE-2015-2567 are similar to each other because they share the property that remote attackers can exploit the vulnerabilities via unknown vectors. In consequence, CVE-2015-0311 has various factors that attackers can exploit, such as Adobe Flash Player, the OS, and unknown attack vectors, and we can conclude that it should be handled earlier than the others.

(a) CVE-2015-0311: "Unspecified vulnerability in Adobe Flash Player through 13.0.0.262 and 14.x, 15.x, and 16.x through 16.0.0.287 on Windows and OS X and through 11.2.202.438 on Linux allows remote attackers to execute arbitrary code via unknown vectors, as exploited in the wild in January 2015."

(b) CVE-2015-7645: "Adobe Flash Player 18.x through 18.0.0.252 and 19.x through 19.0.0.207 on Windows and OS X and 11.x through 11.2.202.535 on Linux allows remote attackers to execute arbitrary code via a crafted SWF file, as exploited in the wild in October 2015."

(c) CVE-2015-2567: "Unspecified vulnerability in Oracle MySQL Server 5.6.23 and earlier allows remote authenticated users to affect availability via unknown vectors related to Server : Security : Privileges."

Figure 3: Three random CVEs registered in 2015 and their NVD descriptions.

CVE ID        | CVSS | Keywords
CVE-2014-5694 | 5.4  | Android, X.509 certificate, SSL, MITM
CVE-2014-3169 | 7.5  | Use-after-free, DOM, Google Chrome, DoS, remote attack
CVE-2014-6707 | 5.4  | 7Sage LSAT Prep - Proctor, Android, X.509 certificates, SSL, MITM, spoof
CVE-2014-1741 | 7.5  | Integer overflows, Blink, Google Chrome, remote attackers, DoS
CVE-2014-1316 | 5.0  | Heimdal, Apple OS X, remote attackers, DoS, Kerberos 5
CVE-2014-2536 | 4.3  | Multiple directory traversal, McAfee, remote authenticated users
CVE-2014-2279 | 6.4  | Multiple directory traversal, SeedDMS, remote authenticated users, read arbitrary files, .. (dot dot) in the logname parameter
CVE-2014-5836 | 5.4  | GittiGidiyor, Android, X.509 certificates, SSL, MITM, spoof
CVE-2014-0885 | 6.8  | CSRF, Admin Web UI, IBM Lotus Protector for Mail Security, remote authenticated users, unknown vectors
CVE-2014-5780 | 5.4  | Bouncy Bill, Android, X.509 certificates, SSL, MITM, spoof

Table 2: Top 10 CVEs generated by our ranking method, with their CVSS scores and keywords. CVSS scores range from 0.0 (not severe) to 10.0 (the most severe) in increments of 0.1.
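The preprocessing and similarity computation described in this section can be sketched as follows. This is our minimal, dependency-free illustration, not the authors' code: the tiny stop word list and regex tokenizer stand in for the full pipeline of sentence boundary detection, stop word removal, and lemmatization, and the CVE identifier regex is the one shown in Figure 1.

```python
import re

CVE_ID = re.compile(r"CVE-\d{4}-\d{4,7}")  # identifier pattern from Figure 1

# A deliberately tiny stop word list; a real pipeline would use a full
# list plus PoS tagging and lemmatization from an NLP library.
STOP = {"the", "a", "an", "and", "in", "on", "to", "via", "as", "of",
        "allows", "which", "do", "not", "by", "from", "that"}

def bag_of_words(description):
    """Reduce an NVD description to a set of lowercase content words."""
    tokens = re.findall(r"[a-z0-9][a-z0-9.\-]*", description.lower())
    return {t for t in tokens if t not in STOP and len(t) > 1}

def jaccard(x, y):
    """Equation 2: |X intersect Y| / |X union Y| over two word sets."""
    if not (x or y):
        return 0.0
    inter = len(x & y)
    return inter / (len(x) + len(y) - inter)

heartbleed = bag_of_words(
    "The TLS and DTLS implementations in OpenSSL 1.0.1 before 1.0.1g "
    "do not properly handle Heartbeat Extension packets"
)
```

Applying jaccard to the bags-of-words of the descriptions in Figure 3 reproduces pairwise similarities in the spirit of Table 1, up to differences in the exact preprocessing.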
3 Evaluation

To evaluate our ranking method, we randomly selected 100 CVEs from NVD that were registered in 2014 and constructed a small corpus consisting of the 100 CVE descriptions. The descriptions are converted to bags-of-words, a conversion that requires preprocessing normally consisting of sentence boundary detection, stop word removal, and lemmatization. After that, we run the TextRank algorithm on the CVE description corpus and obtain the ranks of the 100 CVEs. In Table 2, we present the CVEs ranked in the top 10 out of 100. Due to the page limitation, we could not give the whole description of each CVE but present some keywords in the table.

Taking a look at Table 2, we see that most of the CVEs are related to an X.509 public key certificate problem and can enable remote attacks such as Man-In-The-Middle (MITM) attacks and Denial of Service (DoS) attacks. In addition, the CVEs are reported to be found in widely used software products, including Android, Google Chrome, and Apple OS X. In summary, our ranking method ranks CVEs higher which (1) are related to a security hole (e.g., certificate verification bypass), (2) are found in popularly used products (e.g., Android), and (3) can cause well-known types of attacks (e.g., MITM and DoS).

Note that, while the CVSS score for CVE-2014-5694 is lower than that for CVE-2014-3169, CVE-2014-5694 is placed at a higher position than CVE-2014-3169 by our ranking method. This is because the keywords contained in the description of CVE-2014-5694 (i.e., Android, X.509, SSL, etc.) are found in other CVEs' descriptions more frequently than those of CVE-2014-3169 (i.e., DOM, Google Chrome, DoS), which affects the weights of the links to which the CVE is connected.

CVE pair                      | Jaccard index
CVE-2015-0311 & CVE-2015-7645 | 0.6182
CVE-2015-0311 & CVE-2015-2567 | 0.2223
CVE-2015-7645 & CVE-2015-2567 | 0.1132

Table 1: Pairs of the three CVEs registered in 2015 and their Jaccard indices

4 Discussion

Although our ranking method gives a new ranking of the vulnerabilities, reflecting facets that have not previously been used to assess vulnerability severity, it has issues to be addressed further. First, we build the nodes of the vulnerability ranking graph from NVD descriptions. However, the descriptions are short texts and provide limited information. On the other hand, there are useful sources from which we can glean detailed information about vulnerabilities. For example, one can search Microsoft Security Bulletins [Mic18] for web documents explaining newly discovered or patched vulnerabilities in the vendor's own words, and many researchers and practitioners post security-related information on their own blogs [Fee18]. In addition, exploits, which are sets of commands that infringe on a system using a vulnerability, are archived with related information in a database [Sec18], so we can collect another type of information that explains how to use a vulnerability from the viewpoint of practitioners.

Furthermore, to measure the similarity of two given vulnerabilities, we use the Jaccard index over their CVE descriptions. However, we can extend our similarity measure to consider not only such topological semantics but also distributional semantics such as word embeddings. In addition, it is not sufficient to measure two vulnerabilities' similarity by considering only the textual descriptions, because there are other factors that determine whether two given vulnerabilities are similar to each other, and these may not be expressed in the descriptions. For instance, bug bounty programs operated by many organizations, such as Google, Mozilla, and Facebook, give a bounty for a bug or a vulnerability to its discoverer. We can then assume that two vulnerabilities are similar if their bounties are set at a similar level.

5 Conclusion

In this work, we present a semantic way to assess vulnerabilities by examining their textual descriptions, from which we can grasp the characteristics of the vulnerabilities. We then build the vulnerability ranking graph by representing each vulnerability's characteristics, which are expressed in natural language, as a node, and run the TextRank algorithm on the graph to obtain the rank of the vulnerabilities. As future work, we will address the issues discussed in Section 4 to improve the performance of our ranking method to the degree to which security experts and practitioners can agree with our ranking results. To this end, we are going to carry out an expert-based performance evaluation of our ranking method, inspired by existing research work [HA15].

Acknowledgment

This research is (in part) based on the work supported by Samsung Research, Samsung Electronics.

References

[ALRL04] A. Avizienis, J. C. Laprie, B. Randell, and C. Landwehr. Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1):11-33, Jan 2004.

[AM12] Luca Allodi and Fabio Massacci. A preliminary analysis of vulnerability scores for attacks in wild: The ekits and sym datasets. In Proceedings of the 2012 ACM Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, BADGERS '12, pages 17-24, New York, NY, USA, 2012. ACM.

[AM14] Luca Allodi and Fabio Massacci. Comparing vulnerability severity and exploits using case-control studies. ACM Trans. Inf. Syst. Secur., 17(1):1:1-1:20, August 2014.

[BP98] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International Conference on World Wide Web 7, WWW7, pages 107-117, Amsterdam, The Netherlands, 1998. Elsevier Science Publishers B.V.

[BYRN99] Ricardo A. Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[Fee18] Feedspot. Top 100 information security blogs for data security professionals. http://blog.feedspot.com/information_security_blogs, 2018.

[FIR17] FIRST. Common vulnerability scoring system. https://www.first.org/cvss, 2017.

[HA15] Hannes Holm and Khalid Khan Afridi. An expert-based investigation of the common vulnerability scoring system. Computers & Security, 53:18-30, 2015.

[Mic18] Microsoft. Microsoft security bulletins. https://technet.microsoft.com/en-us/security/bulletins.aspx, 2018.

[MIT18] MITRE. Common vulnerabilities and exposures. https://cve.mitre.org/, 2018.

[MM16] Nuthan Munaiah and Andrew Meneely. Vulnerability severity scoring and bounties: Why the disconnect? In Proceedings of the 2nd International Workshop on Software Analytics, SWAN 2016, pages 8-14, New York, NY, USA, 2016. ACM.

[MSR06] Peter Mell, Karen Scarfone, and Sasha Romanosky. Common vulnerability scoring system. IEEE Security and Privacy, 4(6):85-89, November 2006.

[MT04] Rada Mihalcea and Paul Tarau. TextRank: Bringing order into texts. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP '04, 2004.

[NIS17] NIST. National vulnerability database. https://nvd.nist.gov/, 2017.

[Ope18] OpenSSL Software Foundation. OpenSSL. https://www.openssl.org/, 2018.

[Pau12] Paul Jaccard. The distribution of the flora in the alpine zone. New Phytologist, 11(2):37-50, 1912.

[Sec18] Offensive Security. The exploit database. https://www.exploit-db.org/, 2018.