<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Conference on Applied Machine Learning in Information Security (CAMLIS), October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Threat Class Predictor: An explainable framework for predicting vulnerability threat using topic and trend modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>François Labrèche</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serge-Olivier Paquette</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>2</volume>
      <fpage>0</fpage>
      <lpage>21</lpage>
      <abstract>
<p>Every day, an increasing number of software products are found to be vulnerable to exploitation. Such vulnerabilities are disclosed through publicly available databases, such as the National Vulnerability Database (NVD). However, the rate of disclosures now far outpaces the ability of any single research team or remediation team to handle them all. In this paper, we present a framework that not only predicts the vulnerabilities that will be exploited by malicious actors or malware, but also which vulnerabilities can go under the radar, escaping the trending discussions of online cybersecurity communities. This is achieved by leveraging topic modeling in a novel way, combining a threat score and a trend score. The interpretable nature of such topic models enables security teams to dig deeper into the predictions of our model, making it a valuable tool for their remediation and investigative work.</p>
      </abstract>
      <kwd-group>
<kwd>Attack prediction</kwd>
        <kwd>Exploit prediction</kwd>
        <kwd>Vulnerability prioritization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>We present an explainable machine learning framework to predict threats associated with
disclosed vulnerabilities and better inform security professionals on potentially overlooked
critical vulnerabilities. We first apply topic modeling to vulnerability descriptions to build a
semantic representation of vulnerabilities. Using this representation, we train a multi-label
threat prediction classifier for recently disclosed vulnerabilities. The model provides two
independent threat predictions: the probability of having proof-of-concept or weaponized
exploit code published, and the probability of being included in malware. We combine these
to obtain a threat score for each vulnerability. This score can be used to prioritize the
remediation or investigation of vulnerabilities.</p>
      <p>We also use the same topic model to create a novel trend score from online infosec discussions.
This trend score, used in conjunction with the threat score, can inform security researchers
on where to focus their attention, i.e., on the most interesting and potentially overlooked
vulnerabilities. We do this by joining the two independent scores, the threat score and the
trend score, visually, in a two-dimensional plane. Given the interpretable nature of topic models
and our novel visual representation, we believe that our framework brings new value to the
cybersecurity community by offering a method of prioritizing investigative work.</p>
      <p>Our contributions are the following:
• We build a semantic representation of vulnerabilities based on the underlying concepts of
all descriptions, which represents them in a more holistic way than what was previously
done.
• Using this new representation, we compute an explainable threat score and trend score.
• We provide a threat dashboard which helps visualize vulnerability trends in relation to
the likelihood of an attack leveraging them.</p>
      <sec id="sec-1-1">
<title>The rest of the paper is organized as follows:</title>
        <p>1. Section 2 presents prior work done in predicting threats and exploit publication using
machine learning.
2. Section 3 describes the methodology used for this approach. The corresponding results
are presented as the methodology is discussed. We first present the topic model, used
both by the threat model and the trend model, before presenting each model respectively.</p>
        <p>This section closes with the combination of the two scores in a visual dashboard.
3. Section 4 explores the trained models, their features and their explainable nature.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        A number of previous studies have built exploit prediction models using vulnerability features [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1,
2, 3</xref>
        ], such as the CVSS score and its sub-components, the Common Weakness Enumeration
(CWE), the references, the description and the vulnerable products. While feature encodings vary,
they all use supervised machine learning trained on NVD data to predict vulnerabilities labeled
with exploits. Suciu et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] employ a similar approach, but with the goal to predict over time
the likelihood that a functional exploit will be developed. Other approaches [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] explore using
social network data and dark web discussions as additional features to predict the likelihood of
an exploit targeting a vulnerability. Huang et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] use Latent Dirichlet Allocation (LDA) to
identify important words through six topics built on vulnerability descriptions, combined with
a classifier that labels tweets as cybersecurity-related or not. Xiao et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] employ community
detection over botnet IP activity to identify whether a vulnerability is being exploited. However,
Bullough et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] identify key methodological errors in some of these previous
works, most notably incorrect metrics used for evaluating an imbalanced dataset. Finally, others
model the vulnerability description to predict the publication of an exploit or an attack, such as
using tf-idf [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], neural networks [11], deep learning with CNNs [12] or a BERT pre-trained
model [13, 14]. In this work, we build a vulnerability representation using topic modeling, which
we then use to predict multiple threat classes and identify trending vulnerabilities. Contrary to
previous approaches employing deep learning on vulnerability descriptions, our use of topic
modeling provides an explainable framework which can provide insights into how different
types of threats are linked to vulnerabilities.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology and Results</title>
      <sec id="sec-3-1">
        <title>3.1. Topic Model</title>
        <p>We obtain a topic model by training LDA [15] on the textual descriptions of 152,585 published
vulnerabilities from the 1st of January 2008 to the 1st of August 2022. We prepare the corpus by
removing all stop words, common words, and URLs. We lemmatize and tokenize the documents
to obtain a bag-of-words representation to feed to the model. The number of topics is selected
using a coherence score [16], a measure to compute the strength of the similarity of words
inside a topic. A coherence score provides a robust way to evaluate topic models with regard to
interpretability by humans [17]. We obtain an optimal model with 30 topics, 50 iterations and
10 passes.</p>
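The topic-modeling step above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: it uses scikit-learn's LDA on a toy corpus, and, since scikit-learn has no built-in coherence score [16], it selects the topic count by held-out perplexity as a simple stand-in. All corpus text and parameter values are illustrative.

```python
# Sketch of Section 3.1: bag-of-words LDA with a simple model-selection loop.
# Stand-in: perplexity replaces the coherence score [16] used in the paper.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

descriptions = [
    "sql injection in wordpress plugin parameter allows remote attacker",
    "cross site scripting xss in admin page via crafted script",
    "heap buffer overflow in parser causes denial of service crash",
    "use after free in browser engine allows remote code execution",
] * 10  # toy corpus; the paper uses 152,585 NVD descriptions

# Bag-of-words representation (stop-word removal; lemmatization omitted here)
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(descriptions)

best_model, best_perplexity = None, float("inf")
for n_topics in (2, 3, 4):  # the paper's optimum is 30 topics
    lda = LatentDirichletAllocation(
        n_components=n_topics, max_iter=10, random_state=0
    ).fit(X)
    p = lda.perplexity(X)
    if p < best_perplexity:
        best_model, best_perplexity = lda, p

# Per-document topic probabilities: each row sums to 1, as in the paper
theta = best_model.transform(X)
print(theta.shape)
```

Each row of `theta` is the topic-probability vector assigned to one vulnerability description.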
        <p>With this trained topic model, we now have a list of 30 topic probabilities θ_v = (θ_1, ..., θ_30),
with real numbers θ_i ∈ [0, 1] and Σ_{i=1}^{30} θ_i = 1, representing each topic probability θ_i for every
vulnerability v in our dataset. Examples of six extracted topics, visualized as word clouds with
weighted words, are presented in Figure 2. Each topic corresponds to a set of words, where
larger (higher probability) words are more salient inside the topic.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Threat Class Prediction Model</title>
        <p>3.2.1. Feature Selection
We build a threat class predictive model1, using the topics from the descriptions above, and
details from vulnerability disclosures on the National Vulnerability Database (NVD)2 as features.
Categorical features are encoded as dummy variables.</p>
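The feature encoding can be sketched as below, using pandas dummy variables. The column names are hypothetical, since the paper does not specify its exact schema; they stand in for the NVD-derived features listed in Section 3.2.1.

```python
# Sketch of the feature table (Section 3.2.1): numeric NVD-derived features
# plus categorical CVSSv2 metrics encoded as dummy variables.
# Column names are illustrative, not the paper's actual schema.
import pandas as pd

vulns = pd.DataFrame({
    "description_length": [120, 87, 203],
    "n_references": [4, 1, 9],
    "n_configurations": [12, 3, 40],
    "cvss_v2_score": [7.5, 4.3, 9.8],
    "access_vector": ["NETWORK", "LOCAL", "NETWORK"],   # categorical CVSSv2 metric
    "access_complexity": ["LOW", "MEDIUM", "LOW"],
})

# One dummy column per category value; numeric columns pass through unchanged
features = pd.get_dummies(vulns, columns=["access_vector", "access_complexity"])
print(sorted(features.columns))
```

The topic probabilities from Section 3.1 would be appended to this table as 30 additional numeric columns.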
        <p>
          The list of additional features used is the following, and follows previous works [
          <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref7">1, 5, 2, 3, 4, 7</xref>
          ]:
• The length of the description,
• The number of references available for the vulnerability at the time of publication,
• The number of software configurations affected by this vulnerability,
• The CVSSv2 score,3
• The CVSSv2 metrics.
3.2.2. Dataset
As previously mentioned, the dataset used to build our features is the NVD. The two threat
classes that we predict are exploit publication and malware inclusion. These two classes have
been chosen because they represent key cybersecurity threats and labels for them can be found
openly. Although they do overlap, they do not do so completely. Each threat class uses its own
datasets for labels. Exploit publications are labeled from ExploitDB, Packet Storm and a GitHub
repository listing PoCs4, all of which are publicly available.
1 Patent pending
2 https://nvd.nist.gov/
3 There is a larger body of vulnerabilities published with a CVSSv2 score; however, the CVSSv3 score can also be used.
4 https://github.com/nomi-sec/PoC-in-GitHub/blob/master/README.md
        </p>
        <sec id="sec-3-2-1">
          <p>ClamAV [18] signatures are used for malware labels, which are also publicly available. We
join these signatures to a database of malware threat intelligence reports from the Counter
Threat Unit™ (CTU)5.</p>
          <p>The datasets are summarized in Table 1. There are 835 vulnerabilities that overlap between the
exploits and malware labels. The classes are highly imbalanced and are not mutually exclusive,
hence we train two independent classifiers, which both output a probability between 0 and 1.
To obtain a single threat score, we add their outputs to obtain a value between 0 and 2.
3.2.3. Model Selection
We train a classifier for each of the two threat classes. We tested a Logistic Regression model, a
Support Vector Machine and a Random Forest classifier, using 10-fold cross-validation. Table 2
shows the results for each of the three classifiers.</p>
          <p>The random forest model exhibits by far the best performance. To compensate for the class
imbalance, we used class-weight optimization and threshold-moving based on the F2-score, a
performance metric that optimizes for recall on the minority class, which is suitable for our
need. Threshold-moving lets us choose the threshold on which to assign the model output class
using the class probability. A high recall measures the ability of the model to predict the positive
class and to avoid false negatives, but at the price of potentially having more false positives,
which is an acceptable cost for identifying true attacks.</p>
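The training recipe above can be sketched as follows: class-weighted random forests, threshold-moving driven by the F2-score, and the sum of the two independent probabilities as the threat score in [0, 2]. This is a minimal sketch on synthetic imbalanced data; for brevity it picks thresholds on the training set, whereas the paper uses 10-fold cross-validation.

```python
# Sketch of Sections 3.2.2-3.2.3: two independent class-weighted classifiers,
# F2-based threshold-moving, and the combined threat score. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
# Imbalanced stand-ins for the exploit-publication and malware-inclusion labels
y_exploit = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 1.5).astype(int)
y_malware = (X[:, 1] + rng.normal(scale=0.5, size=1000) > 1.8).astype(int)

def fit_with_threshold(X, y):
    """Fit a class-weighted forest, then pick the F2-optimal decision threshold."""
    clf = RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ).fit(X, y)
    proba = clf.predict_proba(X)[:, 1]
    thresholds = np.linspace(0.05, 0.95, 19)
    f2 = [fbeta_score(y, proba >= t, beta=2, zero_division=0) for t in thresholds]
    return clf, float(thresholds[int(np.argmax(f2))])

exploit_clf, t_exploit = fit_with_threshold(X, y_exploit)
malware_clf, t_malware = fit_with_threshold(X, y_malware)

# Threat score: sum of the two independent probabilities, a value in [0, 2]
threat_score = (
    exploit_clf.predict_proba(X)[:, 1] + malware_clf.predict_proba(X)[:, 1]
)
print(float(threat_score.min()), float(threat_score.max()))
```

Weighting by class and maximizing F2 both bias the classifier toward recall on the rare positive class, matching the stated preference for false positives over missed attacks.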
          <p>An important note is that our label datasets, while sufficient for training models that can
correctly identify a majority of our samples, do not include all exploits and malware samples in
the wild; hence, the true precision of the model is likely higher, given that many false positives
are in fact true positives.
5 Although we obtained better results using the CTU™ database, one can get similar but slightly lower results with the ClamAV database alone, or potentially with other public sources.</p>
          <p>Using grid search over the model parameters and 10-fold cross validation, we obtained a final
model for which the performance and parameters are presented in Table 3.
3.2.4. Features Must be Chosen Carefully
Our initial prediction model, which was discarded, included a number of time-sensitive features
inspired by the literature:
• The published date of the vulnerability,
• The date of its last modification,
• The number of online discussions related to the vulnerability.</p>
          <p>Although this version of the model performed better in our training phase, with a higher
accuracy and recall than our current model, it performed poorly in a real implementation by
not predicting any instances of the positive classes. After investigation, three of the top five
most impactful features were time-sensitive features, which skewed our model to better predict
older vulnerabilities (i.e., newly published vulnerabilities rarely have a modification date and
the publication date is always recent). In the end, time-sensitive features were found to be
irrelevant to our task of predicting threats for new vulnerabilities and were discarded, even
though the model performed better when evaluated on historical data.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Trend Model</title>
        <p>We then compute a trend score s(v, d) ∈ [0, 1] for each published vulnerability v on day
d, based on how closely its computed description topics match those in online discussions
from that day. This numerical value is obtained by generating trending topics using the same
LDA model previously trained for the threat class predictor.</p>
        <p>Each day, we apply the topic model to a set of relevant online discussions, social media
posts and dark web forum posts related to hacking and cybersecurity, in order to obtain an
average for each topic value over all posts in a 30-day time window.
3.3.1. Dataset
We obtain these discussions through the Twitter API, the Reddit API and the Flare6 API, a
data provider that specializes in crawling dark web forums7. In the first 6 months of 2022, we
searched for the following keywords on Twitter, Reddit and 90 dark web forums: CVE-2013 to
CVE-2022, #infosec #vulnerability, #infosec #exploit. We searched the hashtag keywords only
on Twitter, and in pairs, in order to avoid noise and unrelated comments. Out of this, we
obtained 512,347 tweets, 13,114 dark web forum posts and 36,598 Reddit posts mentioning
CVE ids or hashtag pairs.
3.3.2. Obtaining a Stable Trend Score
Every day, we apply the LDA model to each sample, obtaining a topic weight vector
w = (w_1, ..., w_30). To obtain a raw trend value t̂_d for a day d, we average over all N topic
vectors w_i for that day:</p>
        <p>t̂_d = (1/N) (Σ_{i=1}^{N} w_{i,1}, ..., Σ_{i=1}^{N} w_{i,30})    (1)</p>
        <p>This process gives a trend vector of dimension 30 for each day, indicating the relevance of
each topic to infosec discussions for that day. In order to dampen the variability between each
day and to encode the momentum of evolving trends, we instead use a 30-day rolling average
of the trend vector for each day:</p>
        <p>t_d = (1/30) Σ_{j=d-30}^{d} t̂_j, with t_d = (t_1, ..., t_30)    (2)</p>
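The daily averaging and 30-day smoothing can be sketched with NumPy as below. The post topic vectors here are random Dirichlet draws standing in for LDA outputs; day counts and window size are the only values taken from the text.

```python
# Sketch of equations (1) and (2): average each day's post topic vectors,
# then smooth with a 30-day rolling mean. Inputs are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_days, n_topics, window = 90, 30, 30

# Raw daily trend vector t_hat_d: mean over that day's post topic vectors (eq. 1)
daily_trend = np.stack([
    rng.dirichlet(np.ones(n_topics), size=int(rng.integers(5, 50))).mean(axis=0)
    for _ in range(n_days)
])

# 30-day rolling average (eq. 2): one smoothed trend vector t_d per day
smoothed = np.stack([
    daily_trend[max(0, d - window + 1) : d + 1].mean(axis=0)
    for d in range(n_days)
])
print(smoothed.shape)
```

Because each post vector is a probability distribution over topics, every smoothed trend vector still sums to 1.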
        <p>The daily trend score of a single vulnerability s(v, d) ∈ [0, 1] is obtained by computing
the dot product of the 30-day averaged trend vector t_d = (t_1, ..., t_30) with the vulnerability
topic weight vector θ_v = (θ_1, ..., θ_30), which is a real number between 0 (not matching online
discussions) and 1 (perfectly matching online discussions):</p>
        <p>s(v, d) = Σ_{k=1}^{30} t_k · θ_{v,k}    (3)</p>
        <p>6 https://flare.systems/
7 The most important ones are the exploit.in forum, xss.is, pediy, nulled.to and RaidForums.</p>
        <p>A simplified version of this process is graphically presented in Figure 5.</p>
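The dot-product score can be illustrated in a few lines. The vectors below are hypothetical stand-ins: a uniform trend vector and a vulnerability concentrated on two topics.

```python
# Sketch of equation (3): the trend score is the dot product of the smoothed
# trend vector with a vulnerability's topic vector. Vectors are illustrative.
import numpy as np

trend = np.full(30, 1 / 30)   # smoothed trend vector t_d for the day (uniform)
theta_v = np.zeros(30)        # topic vector theta_v of one vulnerability
theta_v[21] = 0.7             # e.g. mostly a "Windows handles"-like topic
theta_v[3] = 0.3              # plus some "heap/buffer overflow"-like topic

score = float(np.dot(trend, theta_v))
print(round(score, 4))  # → 0.0333
```

Since both vectors sum to 1, the score is maximal only when discussions and the vulnerability concentrate on the same topics; against a uniform trend every vulnerability scores 1/30.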
        <p>3.3.3. Combining the Trend Scores and Threat Scores
Vulnerabilities in the lower half of the graph are those that do not match the trending online
discussions. We believe those are the most interesting vulnerabilities for a researcher.</p>
        <p>The following vulnerabilities have been correctly identified as having exploits published:
CVE-2022-34265⁸, CVE-2022-34918⁹, CVE-2022-31795¹⁰. These vulnerabilities had exploits
available outside of our datasets, and were identified by our prediction model. Additionally, the
following vulnerability was identified in malware after its prediction: CVE-2022-22047¹¹. An
example of a vulnerability closely matching currently trending topics of remote code execution
vulnerabilities is also shown: CVE-2022-35872. While some of the vulnerabilities identified are
false positives, the total number of vulnerabilities to investigate has been considerably lowered
and true attacks were successfully identified.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <sec id="sec-4-1">
        <title>4.1. An Explainable Framework</title>
        <p>In this section, we show how a human can understand the decisions made by the models
described in this approach. Our threat class predictor uses our generated topics as input features
when predicting specific types of threats. For this reason, we can explore which concepts drive
the fitted model, through the topics that come out as top features. More importantly, these
topics vary per fitted model: they change depending on the type of threat we wish to predict.
Below are shown, in Figure 9 and Figure 10, the top features for each model.</p>
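The kind of inspection described above can be sketched with random-forest feature importances. The toy model and feature names below are hypothetical, not the paper's fitted model; the label is constructed so that one topic and one metadata feature drive it.

```python
# Sketch of Section 4.1: ranking topic and metadata features by importance
# in a fitted threat classifier. Data and feature names are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = [f"topic_{i}" for i in range(30)] + [
    "n_references", "n_configurations", "description_length",
]
X = rng.random(size=(500, len(feature_names)))
# Synthetic label driven by topic_22 and n_references (index 30)
y = (X[:, 22] + X[:, 30] > 1.2).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(
    zip(feature_names, clf.feature_importances_), key=lambda kv: -kv[1]
)
print(ranked[:5])
```

The top of the ranking recovers the features that actually drive the label, which is the property that lets a security team read a fitted model in terms of topics.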
        <sec id="sec-4-1-1">
          <p>8 https://github.com/aeyesec/CVE-2022-34265
9 https://www.openwall.com/lists/oss-security/2022/07/05/1
10 https://research.nccgroup.com/2022/05/27/technical-advisory-fujitsu-centricstor-control-center-v8-1unauthenticated-command-injection/
11 https://www.forbes.com/sites/daveywinder/2022/07/15/new-0day-hack-attack-alert-issued-for-all-windowsusers</p>
          <p>As can be observed, when predicting exploit publications, the number of references, the
number of vulnerable configurations and the length of the description impact the model. The
six most impactful topics, along with their most salient words, are:
1. Topic 22 - Parameter, Plugins and SQL injections (plugin, wordpress, injection, php,
parameter, sql, admin)
2. Topic 29 - Google and OAuth Vulnerabilities (prior, google, extension, convince, vector,
agent, unknown, storage)
3. Topic 26 - Cross-Site Scripting (XSS) vulnerabilities (page, cross, html, site, script, xss,
store, javascript)
4. Topic 23 - Denial of Service (DOS) vulnerabilities (service, cause, denial, null, pointer,
dereference, crash, craft)
5. Topic 17 - Web vulnerabilities (request, http, web, perform, forgery, unauthenticated, csrf,
craft)
6. Topic 8 - Vulnerabilities centered around network attacks (series, interface, network,
device, management, dos)</p>
          <p>The top topic, understandably, refers to injection attacks, which are a common way of
exploiting a vulnerability. The other top topics correspond to common techniques used in
public exploits.</p>
          <p>The top features used to predict malware are different, with Topic 21 about Windows handles
appearing as most impactful. The following topics are the most influential in the prediction of
vulnerabilities included in malware:
1. Topic 21 - Vulnerabilities including the use of Windows handles (object, window, engine,
exists, handle, git, dll)
2. Topic 6 - PDF vulnerabilities (module, update, upgrade, pdf, reader, zone)
3. Topic 4 - Heap and buffer overflow vulnerabilities (heap, corruption, function, overflow,
buffer, stack)</p>
          <p>Vulnerabilities centered around the exploitation of processes are more impactful when
predicting malware, which contrasts with the prediction of exploits where specific types of
attacks influence the prediction model.</p>
          <p>The explainability of the framework goes even further, as one can obtain the trend score
of a given vulnerability for each topic. A security researcher can thus explore the topics of a
vulnerability or set of vulnerabilities and identify if it appears overlooked in online infosec
discussions. A vulnerability can also be identified as part of a hype wave with respect to certain
semantic characteristics.</p>
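Because the trend score is a dot product, it decomposes exactly into per-topic contributions, which is what makes the per-topic exploration described above possible. The vectors below are random stand-ins.

```python
# Sketch of per-topic trend explainability: each topic's contribution
# t_k * theta_k to the score can be ranked to show which semantic
# characteristics a vulnerability shares with current discussions.
import numpy as np

rng = np.random.default_rng(0)
trend = rng.dirichlet(np.ones(30))     # today's smoothed trend vector
theta_v = rng.dirichlet(np.ones(30))   # one vulnerability's topic vector

contrib = trend * theta_v              # per-topic contribution to the score
top_topics = np.argsort(contrib)[::-1][:3]
print(top_topics, float(contrib.sum()))  # contrib.sum() equals the trend score
```

A low total with no dominant contribution suggests an overlooked vulnerability; a large contribution from one topic places it inside a hype wave for that topic.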
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this research, we presented a coherent and explainable framework to predict the threat
associated with a vulnerability, both from an exploitability perspective and from a semantic
tendency perspective. Our results showcase vulnerabilities with a high likelihood of being
included in real attacks that may appear overlooked by the cybersecurity community. The
results of this paper show that we can easily achieve this using mainly open source data, with
well-known and interpretable techniques.</p>
      <p>Proactive identification of exploits in the wild through vulnerability mentions online, in:
2017 International Conference on Cyber Conflict (CyCon US), IEEE, 2017, pp. 82–88.
[11] Y. Fang, Y. Liu, C. Huang, L. Liu, FastEmbed: Predicting vulnerability exploitation possibility
based on ensemble machine learning algorithm, PLoS ONE 15 (2020) e0228439.
[12] A. Okutan, M. Mirakhorli, Predicting the severity and exploitability of vulnerability
reports using convolutional neural nets, in: 2022 IEEE/ACM 3rd International Workshop
on Engineering and Cybersecurity of Critical Systems (EnCyCriS), IEEE, 2022, pp. 1–8.
[13] J. Yin, M. Tang, J. Cao, H. Wang, Apply transfer learning to cybersecurity: Predicting
exploitability of vulnerabilities by description, Knowledge-Based Systems 210 (2020)
106529.
[14] J. Yin, M. Tang, J. Cao, H. Wang, M. You, Y. Lin, Vulnerability exploitation time prediction:
an integrated framework for dynamic imbalanced learning, World Wide Web 25 (2022)
401–423.
[15] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning
Research 3 (2003) 993–1022.
[16] M. Röder, A. Both, A. Hinneburg, Exploring the space of topic coherence measures, in:
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining,
2015, pp. 399–408.
[17] J. Chang, S. Gerrish, C. Wang, J. Boyd-Graber, D. Blei, Reading tea leaves: How humans
interpret topic models, Advances in Neural Information Processing Systems 22 (2009).
[18] T. Kojm, ClamAV, 2004.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bozorgi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Saul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Savage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Voelker</surname>
          </string-name>
          ,
          <article-title>Beyond heuristics: learning to classify vulnerabilities and predict exploits</article-title>
          ,
          <source>in: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Edkrantz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Said</surname>
          </string-name>
          ,
          <article-title>Predicting cyber vulnerability exploits with machine learning</article-title>
          .,
          <source>in: SCAI</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>48</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Romanosky</surname>
          </string-name>
          , I. Adjerid,
          <string-name>
            <given-names>W.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <article-title>Improving vulnerability remediation through better exploit prediction</article-title>
          ,
          <source>Journal of Cybersecurity</source>
          <volume>6</volume>
          (
          <year>2020</year>
          )
          <article-title>tyaa015</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Suciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bao</surname>
          </string-name>
          , T. Dumitras,
          <article-title>Expected exploitability: Predicting the development of functional vulnerability exploits</article-title>
          ,
          <source>arXiv preprint arXiv:2102.07869</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sabottke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Suciu</surname>
          </string-name>
          , T. Dumitras,
          <article-title>Vulnerability disclosure in the age of social media: Exploiting twitter for predicting {Real-World} exploits</article-title>
          ,
          <source>in: 24th USENIX Security Symposium (USENIX Security 15)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1041</fpage>
          -
          <lpage>1056</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tavabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Almukaynizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shakarian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lerman</surname>
          </string-name>
          , Darkembed:
          <article-title>Exploit prediction with neural language models</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.-Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ban</surname>
          </string-name>
          ,
          <article-title>Monitoring social media for vulnerability-threat prediction and topic analysis</article-title>
          ,
          <source>in: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1771</fpage>
          -
          <lpage>1776</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dumitras</surname>
          </string-name>
          ,
          <article-title>From patching delays to infection symptoms: Using risk profiles for an early discovery of vulnerabilities exploited in the wild</article-title>
          ,
          <source>in: 27th USENIX Security Symposium (USENIX Security 18)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>903</fpage>
          -
          <lpage>918</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Bullough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Yanchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Zipkin</surname>
          </string-name>
          ,
          <article-title>Predicting exploitation of disclosed software vulnerabilities using open-source data</article-title>
          ,
          <source>in: Proceedings of the 3rd ACM on International Workshop on Security and Privacy Analytics</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Almukaynizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dharaiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Senguttuvan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shakarian</surname>
          </string-name>
          , P. Shakarian,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>