<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Hacker Threats: Performance of Word and Sentence Embedding Models in Identifying Hacker Communications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrei Lima Queiroz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Susan Mckeever</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian Keegan</string-name>
          <email>brian.x.keegang@tudublin.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Applied Intelligence Research Centre, Technological University Dublin</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Cyber security initiatives are finding new approaches to mitigating threats against the computational infrastructure of companies. One of these approaches is the use of text mining techniques and classification models to detect potentially malicious messages or posts in hacker communications. This is a difficult task due to the ambiguity and the heavy use of technical vocabulary inherent in such posts. This paper evaluates the use of robust language models for the feature representation of input to downstream classification tasks on hacker communication posts. We perform the experiment on five hacker forum datasets using a variety of language models: two Word Embeddings (Word2vec and Glove) and three Sentence Embeddings (Sent2vec, InferSent and SentEncoder). We conclude that, for this task, only Sentence Embeddings enhance the performance of SVM classification models compared to traditional language models (Bag-of-words, word/char n-grams). Additionally, we found that CNN models improve upon SVM models, achieving 93% positive recall and 96% average class accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>Cybersecurity</kwd>
        <kwd>Threat intelligence</kwd>
        <kwd>Text Mining</kwd>
        <kwd>Word Embeddings</kwd>
        <kwd>Sentence Embeddings</kwd>
        <kwd>SVM</kwd>
        <kwd>CNN</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the age of information, hackers are taking advantage of communication
channels on social media to share security information about computational assets.
They participate in these channels either by offering tools that might be used to
promote attacks against computer systems or simply by exchanging information
related to current hacking techniques and software vulnerabilities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], hacker forums operate a parallel economy for selling
and buying malicious tools, generating an estimated revenue of at least $600
billion per year. As a result, cyber security researchers have recently focused
on creating classification models that detect whether such forum posts
present a potential threat or not [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ][
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. With this information, a Chief
Security Officer (CSO) can then strategically mitigate risks and potential threats
against the computational infrastructure of companies.
      </p>
      <p>
        The accuracy of these models depends on finding an
appropriate combination of algorithms and input features. Determining which
combination performs best for each task is a matter of
experimentation and evaluation [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        For this reason, this paper analyses different combinations of
classification algorithms and pre-trained language models. For classification
algorithms, we use Support Vector Machine (SVM) and Convolutional Neural
Network (CNN) due to their good performance in text classification tasks [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ][
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. For language models, we use pre-trained Word Embeddings (WEMB), namely
Word2vec and Glove, as well as Sentence Embeddings (SEMB), with Sent2vec,
InferSent and SentEncoder as our chosen models.
      </p>
      <p>
        Embedding models are considered an evolution of classical language
models (such as Bag-of-Words (BoW) and Word/Char N-grams). A principal
characteristic of embedding models is their ability to capture semantic information
about words (or sentences) by learning from contextual word usage during
embedding training. As a result, they have outperformed classical models
in a range of downstream text analysis tasks, such as spam detection [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], abusive
content detection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and news categorisation [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>Our goal in this paper is to investigate the performance of embedding models
in detecting hacker threats in online forums. We compare the resulting models
with our previous experimental work, which was performed on classical language
models. We also publish the five hacker forum datasets that we have labelled
and used in our work. This paper is organised as follows: In Section 2, we review
works that performed downstream tasks with WEMB and SEMB in a variety
of domains. In Section 3, we describe our approach, including the datasets,
methodology, algorithms and feature representations used in this experiment.
In Section 4, we present the results of the baseline models. In Section 5, we
present the results for all model configurations. In Section 6, we discuss the
experiment results, comparing them with the baseline results. Finally, in Section
7, we summarise the contribution of this work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        WEMB and SEMB models have been widely used in many different
downstream tasks. For example, in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], the authors of Sent2vec have shown that
their model can outperform traditional feature representations in
sentiment analysis tasks.
      </p>
      <p>
        Also, in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the models achieved better results using Word2vec
WEMB models than traditional language models in sentiment analysis and spam
detection tasks, respectively. The authors in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] also highlight that SVM
+ Word2vec achieved slightly better results than the CNN models.
Similarly, in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], the authors report results of a news categorisation model
that also used SVM + Word2vec, with this combination outperforming
neural networks when trained on small datasets.
      </p>
      <p>
        However, embeddings have not improved the performance of models in all
domains. As seen in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the authors used Sent2vec in their model to
classify adverse drug reaction mentions; however, it did not outperform their
baseline model of SVM with BoW.
      </p>
      <p>
        Within the security domain, the authors in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] compared traditional SVM
models with CNN for a task similar to the one set out in this paper. They conclude
that traditional classifiers and feature representations achieve results highly
comparable to those of neural-network-based classifiers with WEMB models. In their work,
dataset labels are assigned using keyword matching. In our work, we present five
hacker forum datasets labelled with a robust approach based on multiple expert labellers. We view
dataset labelling as a critical component for producing robust real-life models.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>In this section, we describe the datasets, methodology, learning algorithms and
feature representations used in our analysis. The experiments were performed
in Python 3.6.0, using scikit-learn 0.20.2 and the Keras API for
TensorFlow 1.15.0.</p>
      <sec id="sec-3-1">
        <title>Datasets</title>
        <p>We refer to the five datasets as D1 to D5. They are publicly available
at http://tiny.cc/8ws67y. They contain social media posts from the surface web,
deep web and dark web, including forums, microblogs and hacker marketplaces.
These posts relate to technical and personal references to computing,
security, internet services and technology. A minority refer to malicious
activities involving software products or mention security problems in software
(flaws, vulnerabilities).</p>
        <p>In the preparation of our data, we performed a labelling task in which
computer science domain experts determined whether or not posts are
software-vulnerability-related communication. Analysing the dataset, we noted
that posts can be ambiguous and difficult to assign as a binary task. To address
this, each message was labelled by three domain experts. The labelling task
results in the following classes:
          <list list-type="bullet">
            <list-item><p>Yes, for posts that appear to be malicious posts about vulnerabilities in software assets.</p></list-item>
            <list-item><p>No, for posts that are not related to hacker activity or are out of the scope of our research (data breaches, cracked copyrighted software, stolen accounts and credit card accounts).</p></list-item>
            <list-item><p>Undecided, for posts that the labeller does not have enough information or confidence to mark as Yes or No.</p></list-item>
          </list>
        </p>
        <p>The final label was determined by the majority-of-votes rule, reducing the
risk of individual human subjectivity. In Table 1 we present examples of these
posts and their final labels.</p>
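        <p>The majority-of-votes rule above, combined with the decision (stated later in this section) to treat Undecided posts as positive instances, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and because the text does not say how a three-way split (one vote per class) was resolved, the sketch assumes such posts become Undecided.</p>
        <preformat preformat-type="code">
```python
from collections import Counter

# Class names used by the labellers.
YES, NO, UNDECIDED = "Yes", "No", "Undecided"


def final_label(votes):
    """Apply the majority-of-votes rule to the labels given to one post.

    Assumption (not stated in the text): if no label has a strict majority,
    e.g. one vote for each class, the post is treated as Undecided.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else UNDECIDED


def binary_target(label):
    """Binary training target: Yes and Undecided are positive (1), No is 0."""
    return 1 if label in (YES, UNDECIDED) else 0


if __name__ == "__main__":
    # Hypothetical votes from three labellers, for illustration only.
    votes_per_post = {
        "post-a": [YES, YES, UNDECIDED],
        "post-b": [NO, NO, YES],
        "post-c": [UNDECIDED, UNDECIDED, NO],
        "post-d": [YES, NO, UNDECIDED],
    }
    for post, votes in votes_per_post.items():
        label = final_label(votes)
        print(post, label, binary_target(label))
```
        </preformat>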
        <p>MSG-1, marked as Yes, is related to a type of vulnerability (Stack Buffer
Overflow) affecting a software product. MSG-2, also marked as Yes, is
related to the release of a Proof of Concept (PoC) of a vulnerability called dirtycow.
MSG-3 and MSG-4 express personal opinions and have no direct
relation to real vulnerabilities in software. Although MSG-3 and MSG-4 contain
the keywords hack and hacker, they are not considered malicious communication
and are thus marked as No. In MSG-5, there is not enough information to decide
whether the ssh scan tool is vulnerable or can be used against vulnerable
software. Likewise, in MSG-6, we cannot confirm that the error mentioned leads
to a vulnerability in the sneaker software product; MSG-5 and MSG-6 are therefore marked as
Undecided. We acknowledge that the model will only be as good at detecting
hacker posts as the knowledge of the labellers. For this reason, labellers who
understand the ambiguity and subtlety of the posts are a critical component of
our work.</p>
        <p>Finally, for this experiment, we include the Undecided posts as positive
instances, since they represent a risk category of posts needing further inspection
in a real-life application of the resulting model. The details of each dataset
are given in Table 2 and described as follows:</p>
        <p>D1 - Cracking Arena Forum - This was one of the largest hacker forums
existing in 2018, with 11,977 active users. It contains communication related to
security issues in computing, which makes the data suitable for cyber security
research on the interaction patterns among cyber criminals. The posts range
from April 2013 to February 2018.</p>
        <p>D2 - Twitter Security Experts - The data contains posts from 12
security-expert users on Twitter, 6 of whom are well-known security experts with an
average of 18,800 followers, while the other 6 are lesser-known
security experts, with an average of 1,100 followers. Their tweets are
mostly related to security aspects of technology, including software
vulnerabilities and hacking. They cover one year, from March 2016 to March 2017.</p>
        <p>D3 - Dream Market - One of the largest marketplaces, with 91,463 posted
products from 2,092 sellers in 2016, this is a well-known place for selling illegal
products such as illicit drugs, fake IDs, stolen credit card numbers and
copyrighted software. It also advertises hacker products used in malicious hacker
activities. This marketplace can be accessed only via the Tor network. The
posts range from April 2013 to April 2017.</p>
        <p>D4 - Garage4Hackers Forum - This is a medium-sized forum in terms of
content and number of users. It contains material related to
exploitation, botnets and reverse engineering, and also provides information on
specialised hacking tools. The posts range from July 2017 to September 2017.</p>
        <p>D5 - Cracking Fire Forum - This forum has approximately 14,511 active
users. Some of the posts contain source code in a variety of languages
aimed at performing malicious operations, such as compromising online social
media accounts. The posts range from April 2011 to February 2018.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Methodology</title>
        <p>In the following, we describe the evaluation, metrics and sampling techniques
applied to all models in this work.</p>
        <p>
          Evaluation: We use 10-fold cross-validation to divide each dataset into training and test data,
as done in [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. In addition, for the random partitioning of the
datasets into the 10 folds, we use stratified folds, such that the ratio of positive
to negative instances in each fold matches the ratio of the full dataset.
        </p>
        <p>
Metrics: Our approach is to produce classification models that assess user
posts as potential threat communication or not. In this context, the impact of
a false negative (FN), or threat communication going undetected, is higher than
the impact of a false positive (FP), or normal communication being detected as
threat communication. Our models therefore prioritise the
classification of the positive class (threat communication) over the negative
class (regular communication). As a result, we define the Recall (1) of the positive
class as the principal metric.
        </p>
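        <p>The stratified 10-fold split described under Evaluation can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names are hypothetical, and the per-class round-robin dealing is one simple way to keep each fold's class ratio equal to the dataset's. In scikit-learn, <monospace>sklearn.model_selection.StratifiedKFold(n_splits=10, shuffle=True)</monospace> serves the same purpose; the paper does not say which implementation was used.</p>
        <preformat preformat-type="code">
```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split instance indices into k folds that preserve the class ratio.

    The indices of each class are shuffled and dealt round-robin across
    the folds, so every fold receives (almost) the same share of each class.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)  # continue dealing where the previous class stopped
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set once."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    labels = [1] * 20 + [0] * 80  # toy data: 20% positive instances
    for train, test in train_test_splits(labels):
        positives = sum(labels[i] for i in test)
        print(len(train), len(test), positives / len(test))  # 90 10 0.2 on every fold
```
        </preformat>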
        <p>We acknowledge that a model with a high rate of FP (also known as false alarms)
is not desirable either, as it implies that the model is wrongly detecting a threat
where there is none. When this occurs, either a time-consuming expert
investigation is needed or unnecessary security actions are taken. For
this reason, we also evaluate the average class accuracy (2). Additionally,
this metric is suitable for imbalanced datasets, as it prevents the majority class
from dominating the results.</p>
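        <p>Positive-class recall (1) and average class accuracy (2) are simple to compute from true and predicted labels. The minimal Python sketch below (function names are illustrative) shows why average class accuracy suits imbalanced data: a model that never flags a threat still reaches 80% plain accuracy on the toy data, yet scores 0.0 positive recall and only 0.5 average class accuracy. For binary labels, <monospace>sklearn.metrics.recall_score</monospace> and <monospace>sklearn.metrics.balanced_accuracy_score</monospace> give the same two values.</p>
        <preformat preformat-type="code">
```python
def recall(y_true, y_pred, cls):
    """Recall of class `cls`: TruePositive / (TruePositive + FalseNegative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp / (tp + fn) if tp + fn else 0.0


def avg_class_accuracy(y_true, y_pred):
    """Mean of the per-class recalls over all classes present in y_true."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # toy imbalanced labels (1 = threat)
    y_pred = [0] * 10                        # a model that never flags a threat
    print(recall(y_true, y_pred, 1))         # 0.0
    print(avg_class_accuracy(y_true, y_pred))  # 0.5
```
        </preformat>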
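        <p>
          <disp-formula id="eq1">
            <label>(1)</label>
            <tex-math>\mathrm{Recall} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}}</tex-math>
          </disp-formula>
          <disp-formula id="eq2">
            <label>(2)</label>
            <tex-math>\mathrm{Avg.ClassAcc} = \frac{\mathrm{Recall}(\mathrm{pos.Class}) + \mathrm{Recall}(\mathrm{neg.Class})}{\mathrm{No.Classes}}</tex-math>
          </disp-formula>
        </p>
        <sec id="sec-3-2-2">
          <p>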
Random Oversampling the Positive Instances: As seen in Table 2, all
datasets have imbalanced class representations, with the positive class
underrepresented relative to the negative class. To address this, we
artificially increase the number of positive instances (in the training data only), as done
in [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ] and [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. This method has been shown to improve recall. Using
random oversampling, we increase the number of positive instances
to the optimal proportions previously identified for these datasets [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. The class ratios after resampling are shown in Table 3.
          </p>
          <p>
We apply two different types of classification algorithms in this experiment:
a traditional one (SVM) and a neural-network-based one (CNN). Both
algorithms are commonly used for text classification and have produced high-accuracy
models [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ][
            <xref ref-type="bibr" rid="ref12">12</xref>
            ].
          </p>
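        <p>The random oversampling of positive instances described above can be sketched as duplicating randomly chosen positive training examples (sampling with replacement) until a target proportion is reached. This is a minimal sketch, not the authors' implementation: the function name is hypothetical, and <monospace>target_ratio</monospace> is an assumed parameter standing in for the per-dataset proportions reported in Table 3, which are not reproduced here. It is meant to be applied to the training portion of each fold only, so the test fold keeps its original class distribution.</p>
        <preformat preformat-type="code">
```python
import random


def oversample_positives(X_train, y_train, target_ratio, seed=0):
    """Randomly duplicate positive training instances (label 1) until
    positives make up roughly `target_ratio` of the training set.

    Apply to the training part of a fold only; never to the test fold.
    """
    pos = [i for i, y in enumerate(y_train) if y == 1]
    neg_count = len(y_train) - len(pos)
    # Solve pos_total / (pos_total + neg_count) = target_ratio for pos_total.
    pos_total = round(target_ratio * neg_count / (1 - target_ratio))
    extra = max(0, pos_total - len(pos)) if pos else 0  # nothing to copy if no positives
    rng = random.Random(seed)
    picked = [rng.choice(pos) for _ in range(extra)]  # sampling with replacement
    X_new = list(X_train) + [X_train[i] for i in picked]
    y_new = list(y_train) + [1] * extra
    return X_new, y_new


if __name__ == "__main__":
    X = [f"post {i}" for i in range(100)]
    y = [1] * 10 + [0] * 90  # toy training fold: 10% positive
    # target_ratio=0.5 is a hypothetical value chosen for illustration.
    X_res, y_res = oversample_positives(X, y, target_ratio=0.5)
    print(len(y_res), sum(y_res))  # 180 90
```
        </preformat>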
          <p>
            Support Vector Machine (SVM): This is a supervised learning algorithm
used for classification tasks and is based on the maximal margin principle. It
is also known for achieving favourable performance with high-dimensional data
and in text classification [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ][
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. SVM is no longer considered the first choice for text
classification in several tasks, such as spam detection [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], sentiment analysis
[
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], online hate speech detection [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] and, in our case, the detection of software
vulnerability communications in hacker forums [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. In this paper, we use
an SVM with a linear kernel.
          </p>
          <p>
Convolutional Neural Network (CNN): The CNN architecture used in
this experiment is based on [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], with the optimal settings provided by [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]. For the
input representation, we use Glove and Word2vec. As this model requires
a fixed input size, we set it to the length of the longest post in each dataset,
excluding outlier lengths. As seen in Figure 1, this strategy gives
a slight improvement over maintaining the outliers. For the CNN + Glove model,
excluding the outliers improved results in 3 of 5 datasets (D2, D3 and D5), and for
CNN + Word2vec, in 2 of 5 datasets (D2 and D5). For D3 and
D4, no visible improvement is noted.
          </p>
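        <p>The fixed CNN input length (the longest post once outlier lengths are excluded) and the zero-padding of short posts can be sketched as follows. Two assumptions are labelled here and in the code: the text does not say how outliers were identified, so this sketch uses Tukey's rule (lengths above Q3 + 1.5 × IQR are outliers), and it pads and truncates at the end of the sequence, whereas Keras' <monospace>pad_sequences</monospace> pads at the start unless <monospace>padding="post"</monospace> is passed. <monospace>statistics.quantiles</monospace> requires Python 3.8 or later.</p>
        <preformat preformat-type="code">
```python
import statistics


def input_length(token_counts):
    """Longest post length once outlier lengths are discarded.

    Assumption: outliers are found with Tukey's rule, i.e. lengths above
    Q3 + 1.5 * IQR; the paper does not state which rule it used.
    """
    q1, _, q3 = statistics.quantiles(token_counts, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)
    return max(n for n in token_counts if upper_fence >= n)


def pad_or_truncate(token_ids, length, pad_id=0):
    """Zero-pad a short post at the end (assumed), or cut a long one to `length`."""
    return (list(token_ids) + [pad_id] * length)[:length]


if __name__ == "__main__":
    # Hypothetical token counts for ten posts; the last one is an outlier.
    lengths = [12, 15, 9, 20, 18, 14, 16, 11, 13, 400]
    max_len = input_length(lengths)
    print(max_len)                                # 20
    print(pad_or_truncate([5, 8, 2], 6))          # [5, 8, 2, 0, 0, 0]
    print(pad_or_truncate(list(range(1, 9)), 6))  # [1, 2, 3, 4, 5, 6]
```
        </preformat>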
          <fig id="fig1">
            <label>Fig. 1.</label>
            <caption>
              <p>Maintaining versus excluding outliers when fixing the CNN input length. Panels: Model = CNN+Glove and Model = CNN+Word2vec; x-axis: Datasets (D1 to D5).</p>
            </caption>
          </fig>
          <p>
Furthermore, we apply zero-padding so that short posts reach the same input
length. In addition, we use the Adam optimiser, the categorical cross-entropy
loss function and a softmax output layer. The remaining parameters
can be seen in Table 4.
          </p>
          <p>
Classical Language models: Bag-of-words (BoW), Word n-grams and Char
n-grams are commonly used as text representations in many text mining and
classification tasks. With BoW, the sentence (in our case, the post) is split into
a set of tokens (words), and the occurrences of each token are counted to produce
a vector that represents the entire sentence. Word n-grams and Char n-grams are also
token-based; however, the number of words/characters forming a token is defined
by n, where n ≥ 1. For Word and Char n-grams, we use n ranging from 1 to 4,
i.e. n = (1, 4). The main difference between BoW and n-gram models is that the latter
encode a degree of word sequence information when n &gt; 1, while char
n-grams are better at representing rare words and morphological variants.
          </p>
          <p>
Word Embeddings (WEMB) models: The WEMB models used in
this experiment are Word2vec [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ] and Glove [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]. Word2vec is a model based
on a three-layer neural network, while Glove is derived from the co-occurrence
statistics of words in a corpus. Unlike the classical bag-of-words and n-gram
models, these models map words close together according to their semantic
or syntactic similarity. However, they suffer from the same problem as
BoW: when each word in a post is translated to a vector, the word sequence of the
post is lost.
          </p>
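        <p>The classical representations described above (BoW, and word and char n-grams with n from 1 to 4) all reduce a post to token counts. The standard-library sketch below is illustrative only: its regex tokeniser is an assumption, since the paper does not state how posts were tokenised. With scikit-learn, <monospace>CountVectorizer(ngram_range=(1, 4))</monospace> and <monospace>CountVectorizer(analyzer="char", ngram_range=(1, 4))</monospace> build comparable count vectors, although the default <monospace>CountVectorizer</monospace> token pattern ignores single-character words, so counts can differ slightly.</p>
        <preformat preformat-type="code">
```python
import re
from collections import Counter


def word_tokens(text):
    """Lower-cased word tokens (a simple regex tokeniser, assumed here)."""
    return re.findall(r"\w+", text.lower())


def bag_of_words(text):
    """BoW: count each word; word order is discarded."""
    return Counter(word_tokens(text))


def word_ngrams(text, n_min=1, n_max=4):
    """Counts of word n-grams for every n from n_min to n_max (inclusive)."""
    toks = word_tokens(text)
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(toks) - n + 1):
            grams[" ".join(toks[i:i + n])] += 1
    return grams


def char_ngrams(text, n_min=1, n_max=4):
    """Counts of character n-grams for every n from n_min to n_max (inclusive)."""
    text = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


if __name__ == "__main__":
    post = "new exploit for the exploit kit"  # hypothetical post
    print(bag_of_words(post)["exploit"])         # 2
    print(word_ngrams(post)["the exploit kit"])  # 1
    print(char_ngrams("dirtycow")["cow"])        # 1
```
        </preformat>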
          <p>
            To represent each message in the dataset as a fixed-length
WEMB vector, we use a simple technique called averaging, which has shown
positive performance compared to more complex approaches [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]. This technique
consists of averaging the pre-trained WEMB vectors of each word in
the sentence (post message). The description of the pre-trained models used in
this experiment can be seen in Table 5.
          </p>
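        <p>Averaging, as described above, turns a variable-length post into one fixed-length vector by taking the element-wise mean of its word vectors. In the minimal sketch below, a toy dictionary of 3-dimensional vectors stands in for the pre-trained models of Table 5, and skipping out-of-vocabulary words is an assumption, since the text does not say how they were handled. The last line of the demo illustrates the loss of word order noted for WEMB models: reordering the words yields the same vector.</p>
        <preformat preformat-type="code">
```python
def average_embedding(tokens, embeddings, dim):
    """Represent a post as the element-wise mean of its word vectors.

    Assumption: words missing from the vocabulary are skipped; a post with
    no known word is mapped to the zero vector.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for a pre-trained WEMB model.
    emb = {"exploit": [1.0, 0.0, 2.0], "kernel": [3.0, 2.0, 0.0]}
    print(average_embedding(["new", "kernel", "exploit"], emb, 3))  # [2.0, 1.0, 1.0]
    # Word order is lost: both orderings give the same post vector.
    print(average_embedding(["exploit", "kernel"], emb, 3)
          == average_embedding(["kernel", "exploit"], emb, 3))      # True
```
        </preformat>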
          <p>
            Sentence Embeddings (SEMB) models: SEMB models can be categorised
by their training method into three categories: Unsupervised,
Supervised and Multi-task, the latter being a combination of
supervised and unsupervised learning. In this work, we use one pre-trained SEMB model
from each category: Sent2vec [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] (unsupervised), InferSent [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] (supervised)
and SentEncoder [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] (multi-task).
          </p>
          <p>Similar to WEMB models, SEMB models map entire sentences (not only words)
close together according to their similarity. However, their main
improvement over WEMB is that they consider the order in which words occur
in a sentence. The description of the pre-trained models used in this experiment
can be seen in Table 5.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Baseline - Experiment using classical language models</title>
      <p>
        The baseline results are reported in Table 6. These results build on previous
experimental work carried out in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], which we extend here with two
more datasets (D4 and D5). To create the models, the SVM algorithm was used
with three different feature representations: BoW, Word n-grams (1,4) and Char
n-grams (1,4).
      </p>
      <p>On average, BoW and Char n-grams record the same scores: 0.75 average
class accuracy and 0.55 positive recall. Word n-grams achieve the poorest
performance, with 0.68 average class accuracy and 0.38 positive recall.</p>
    </sec>
    <sec id="sec-5">
      <title>Experiment using WEMB and SEMB language models</title>
      <p>The results for each model configuration are reported in Table 7. They are
grouped into three categories: SVM+Word Embeddings (MDL-1
and MDL-2); SVM+Sentence Embeddings (MDL-3, MDL-4 and MDL-5); and
CNN+WEMB (MDL-6 and MDL-7). A description of each configuration can
be seen in Table 5.</p>
      <p>The best model is MDL-7, with the CNN and Word2vec configuration, which
achieved 0.96 average class accuracy and 0.93 positive recall. This
performance is close to that of the second-best model, MDL-6 with CNN and Glove, with 0.95
average class accuracy and 0.92 positive recall. Considering only the
SVM+Word Embeddings models, MDL-1 (Word2vec) and MDL-2 (Glove), we see
comparable results, with 0.58 and 0.62 average class accuracy and 0.23 and 0.33
positive recall, respectively. Both performed poorly on positive
recall, although MDL-2 with Glove achieved marginally better performance
than MDL-1 on both metrics.</p>
      <p>
        Moreover, considering only the SVM+Sentence Embeddings models,
MDL-3 (Sent2vec), MDL-4 (InferSent) and MDL-5 (SentEncoder), we see that
MDL-5 achieved the best result, with 0.82 average class accuracy
and 0.74 positive recall. The second best in
this category is MDL-3, with 0.75 average class accuracy and 0.60 positive
recall.
      </p>
      <p>
        Comparing the results of this experiment (Section 5) with the baseline (Section 4),
we note that replacing classical language models with the semantically richer
WEMB in the SVM models does not result in better classification performance. However, when
they are replaced with SEMB models, performance improves, which is consistent with the
results presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
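        <p>The comparison with the baseline can be checked directly against the average positive recall values reported in this paper: the baseline figures come from Section 4, MDL-1 to MDL-3 and MDL-5 to MDL-7 from Section 5, and MDL-4 (55%) from the Conclusion. The short sketch below only re-tabulates those reported averages; it does not reproduce any experiment, and the helper name is hypothetical.</p>
        <preformat preformat-type="code">
```python
# Average positive recall over D1 to D5, as reported in this paper.
BASELINE_RECALL = {
    "SVM+BoW": 0.55,
    "SVM+Word n-grams": 0.38,
    "SVM+Char n-grams": 0.55,
}
MODEL_RECALL = {
    "MDL-1 SVM+Word2vec": 0.23,
    "MDL-2 SVM+Glove": 0.33,
    "MDL-3 SVM+Sent2vec": 0.60,
    "MDL-4 SVM+InferSent": 0.55,
    "MDL-5 SVM+SentEncoder": 0.74,
    "MDL-6 CNN+Glove": 0.92,
    "MDL-7 CNN+Word2vec": 0.93,
}


def beats_baseline(models, baseline):
    """Names of models whose recall is strictly above the best baseline."""
    best = max(baseline.values())
    return [name for name, rec in models.items() if rec > best]


if __name__ == "__main__":
    best = max(BASELINE_RECALL.values())
    for name, rec in MODEL_RECALL.items():
        print(f"{name:22s} recall={rec:.2f} ({rec - best:+.2f} vs best baseline)")
    print(beats_baseline(MODEL_RECALL, BASELINE_RECALL))
```
        </preformat>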
      <p>
        Similarly, we see in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] that WEMB did not improve
performance compared to the authors' baseline model (traditional feature representation
+ linear algorithm). However, for sentiment analysis [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] as well as spam
detection in SMS messages [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], WEMB has been shown to outperform traditional
language models.
      </p>
      <p>WEMB models place semantically and syntactically related words close
together in vector space, but do not consider the order in which words
occur in a sentence. SEMB models, in contrast, have both of these properties. We
believe that it is this sensitivity to word order that allows SEMB models to outperform WEMB
models in our classification task.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Detecting potential cyber threat communication in hacker forums and social
media is a hard task due to the technical vocabulary and ambiguity of certain
posts. In this paper, we performed an experiment with different configurations
of classification algorithms and language models for detecting malicious messages
related to software vulnerabilities in forums and social media. We evaluated
these models on five different labelled datasets (D1 to D5). We draw the
following conclusions:</p>
      <p>(1) SEMB feature representations are the best embeddings for improving the
classification performance of the linear SVM model, compared to WEMB and
traditional representations such as bag of words, word n-grams and char n-grams. In
this work, 2 out of 3 models with SEMB outperformed the best baseline
models: MDL-3 (Sent2vec) and MDL-5 (SentEncoder) achieved 60% and 74%
recall, respectively, while MDL-4 (InferSent) recorded the same performance as the
baseline, 55% recall.</p>
      <p>
        (2) In terms of classification performance across all models, we found that
MDL-7 (CNN + WEMB (Word2vec)) achieves 93% positive recall
and 96% average class accuracy. This result is consistent with the experiment
in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], where the authors applied a CNN model to detect hacker security communication,
similar to this work. This configuration shows promise as an optimal
approach for detecting posts related to software vulnerabilities. In practice, these
models can be used by companies to prioritise security updates (patching) of
vulnerable systems among their assets.
      </p>
      <p>Acknowledgement. Andrei Lima Queiroz gratefully acknowledges the scholarship
granted by the Brazilian Federal Programme Science without Borders, supported
by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), No.
201898/2015-2.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Algarni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malaiya</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Software vulnerability markets: Discoverers and buyers</article-title>
          .
          <source>International Journal of Computer, Information Science and Engineering</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>71</fpage>
          –
          <lpage>81</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hua</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Limtiaco</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>John</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constant</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guajardo-Cespedes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sung</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strope</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurzweil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Universal sentence encoder</article-title>
          . CoRR abs/1803.11175 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mckeever</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delany</surname>
            ,
            <given-names>S.J.:</given-names>
          </string-name>
          <article-title>Harnessing the power of text mining for the detection of abusive content in social media</article-title>
          .
          <source>In: Advances in Computational Intelligence Systems</source>
          . pp.
          <volume>187</volume>
          –
          <fpage>205</fpage>
          . Springer International Publishing, Cham
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McKeever</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delany</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>A comparison of classical versus deep learning techniques for abusive content detection on social media sites</article-title>
          .
          <source>In: Social Informatics</source>
          . pp.
          <volume>117</volume>
          –
          <fpage>133</fpage>
          . Springer International Publishing, Cham
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrault</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <volume>670</volume>
          –
          <fpage>680</fpage>
          . Association for Computational Linguistics (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine Learning Journal</source>
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <volume>273</volume>
          –297 (Sep
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Deliu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leichter</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franke</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Extracting cyber threat intelligence from hacker forums: Support vector machines versus convolutional neural networks</article-title>
          .
          <source>In: 2017 IEEE International Conference on Big Data (Big Data)</source>
          . pp.
          <volume>3648</volume>
          –
          <issue>3656</issue>
          (Dec
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. InfoSecurity:
          <article-title>Evolution of the Cybercrime-as-a-Service Epidemic</article-title>
          , URL: https://www.infosecurity-magazine.com/magazine-features/evolution-cybercrime-service/ [Accessed: Oct,
          <year>2019</year>
          ]
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Text categorization with support vector machines: Learning with many relevant features</article-title>
          . In: Nedellec, C., Rouveirol, C. (eds.)
          <source>Machine Learning: ECML-98</source>
          . pp.
          <volume>137</volume>
          –
          <fpage>142</fpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <volume>1746</volume>
          –
          <fpage>1751</fpage>
          . Association for Computational Linguistics (Oct
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>In: Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32</source>
          . pp.
          <volume>1188</volume>
          –
          <fpage>1196</fpage>
          . ICML'14, JMLR.org (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Word embedding method of SMS messages for spam message filtering</article-title>
          .
          <source>In: 2019 IEEE International Conference on Big Data and Smart Computing (BigComp)</source>
          . pp.
          <volume>1</volume>
          –
          <issue>4</issue>
          (Feb
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Liu</surname>
          </string-name>
          , H.:
          <article-title>Sentiment analysis of citations using word2vec</article-title>
          .
          <source>CoRR</source>
          abs/1704.00177 (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Miftahutdinov</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alimova</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tutubalina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>KFU NLP team at SMM4H 2019 tasks: Want to extract adverse drugs reactions from tweets? BERT to the rescue</article-title>
          .
          <source>In: Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop &amp; Shared Task</source>
          . pp.
          <volume>52</volume>
          –
          <issue>57</issue>
          (Aug
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>In: 1st International Conference on Learning Representations, ICLR</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          . In: EMNLP. pp.
          <volume>1532</volume>
          –
          <issue>1543</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Perone</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silveira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paula</surname>
            ,
            <given-names>T.S.</given-names>
          </string-name>
          :
          <article-title>Evaluation of sentence embeddings in downstream and linguistic probing tasks</article-title>
          . CoRR abs/1806.06259 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Queiroz</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mckeever</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keegan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Eavesdropping hackers: Detecting software vulnerability communication on social media using text mining</article-title>
          .
          <source>In: The Fourth International Conference on Cyber-Technologies and Cyber-Systems</source>
          . pp.
          <volume>41</volume>
          –
          <issue>48</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wieting</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimpel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Livescu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Towards universal paraphrastic sentence embeddings</article-title>
          .
          <source>In: 4th International Conference on Learning Representations (ICLR)</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dahlmeier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Character-based text classification using top down semantic model for sentence representation</article-title>
          .
          <source>CoRR</source>
          abs/1705.10586 (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Character-level convolutional networks for text classification</article-title>
          .
          <source>CoRR</source>
          abs/1509.01626 (
          <year>2015</year>
          ), http://arxiv.org/abs/1509.01626
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification</article-title>
          .
          <source>CoRR</source>
          abs/1510.03820 (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>