<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Abusive Text Detection Using Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hao Chen</string-name>
          <email>hao.chen@mydit.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Susan McKeever</string-name>
          <email>susan.mckeever@dit.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sarah Jane Delany</string-name>
          <email>sarahjane.delany@dit.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, Dublin Institute of Technology</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Neural network models have become increasingly popular for text classification in recent years. In particular, the emergence of word embeddings within deep learning architectures has attracted a high level of attention amongst researchers. In this paper, we first review how neural network models have been applied to text classification. Secondly, we extend our previous work [4, 3] using a neural network strategy for the task of abusive text detection. We compare word embedding features to traditional feature representations such as n-grams and handcrafted features. In addition, we use an off-the-shelf neural network classifier, FastText[16]. Based on our results, the conclusions are: (1) extracting selected manual features can improve abusive content detection over using basic n-grams; (2) although averaging pre-trained word embeddings is a naive method, this distributed feature representation outperforms n-grams on most of our datasets; (3) while the FastText classifier runs efficiently with fast execution times, its results are not remarkable, as it is a shallow neural network with only one hidden layer; (4) using pre-trained word embeddings does not guarantee better performance with the FastText classifier.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Text classification is an essential component of many applications, such as
sentiment analysis[27, 29], news categorization[16], and our research domain of
interest, abusive text detection[
        <xref ref-type="bibr" rid="ref3 ref4">4, 3</xref>
        ]. One of the fundamental tasks in text
classification is feature representation: finding appropriate ways to represent
text content. The traditional approach is based on an occurrence model that
counts the frequency of words (e.g. Bag-of-Words) in the text. This largely ignores
word order, so the problem of capturing the semantics between words remains.
Adding extra features identified by experts for the specific task can alleviate
this drawback. However, this takes time and human effort, and introduces
domain-specific dependencies into the model. One way to extract features without
hand-crafting is to use deep learning methods. In particular, this trend was
sparked by the emergence of word embedding techniques such as word2vec[22] and
GloVe[26]. A word embedding is a distributed representation at the word level
which has been shown to capture word semantics. To generate a distributed feature
representation at the sentence level, one straightforward approach is to average
the pre-trained word embeddings. However, this discards context information such
as word order, which limits the semantic knowledge captured. To address this
issue, combining word embeddings with deep neural networks is a promising
approach, and it has attracted increasing attention in recent research.
      </p>
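As a concrete illustration of the averaging approach mentioned above, the following sketch builds a sentence vector from a toy embedding table; the 4-dimensional vectors and the vocabulary are illustrative stand-ins for real pre-trained embeddings such as GloVe:

```python
import numpy as np

# Toy pre-trained embeddings (in practice these would be loaded from a
# pre-trained file such as a GloVe text dump); 4 dimensions for illustration.
embeddings = {
    "you":   np.array([0.1, 0.3, -0.2, 0.5]),
    "are":   np.array([0.0, 0.2, 0.1, -0.1]),
    "awful": np.array([-0.4, 0.6, 0.2, 0.0]),
}

def sentence_vector(tokens, embeddings, dim=4):
    """Average the embeddings of known tokens; zero vector if none match."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

vec = sentence_vector(["you", "are", "awful"], embeddings)
```

Note that the averaged vector has the same dimensionality as the word vectors, regardless of comment length, which is what makes it usable as a fixed-size feature representation.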
      <p>Originating from neural network structures, deep neural networks aim to
automatically abstract feature representations from data through hierarchical layers.
They produce state-of-the-art results in many text classification tasks[27, 25,
14]. In this paper, we first present a review of the recent deep neural networks
that are widely used in text classification. Afterwards, we carry out preliminary
experiments on abusive text detection using fundamental neural network techniques.
We compare word embeddings to more traditional feature representations using SVM
classifiers. In addition, we investigate an off-the-shelf neural network based
classifier.</p>
      <p>The structure of the rest of the paper is as follows: Section 2 discusses two modes
of using neural networks for text classification and how these have been used in
abusive text detection. In Section 3, we present experimental results. Finally,
conclusions and future work are presented in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>State-of-the-Art Neural Networks</title>
      <p>In this section, we review the current deep neural networks that have been
used in general text classification tasks. We consider the use of deep neural
networks in two modes: unsupervised for feature representation, and supervised
for text classification. Furthermore, we review some deep neural networks that
have been used in the abusive text detection domain.</p>
      <sec id="sec-2-1">
        <title>Unsupervised Mode for Feature Representation</title>
        <p>The cornerstone of using neural networks in unsupervised mode is word2vec[22],
an approach that generates a distributed representation for words in a
lower-dimensional vector space. These word vectors, also called word embeddings,
are learned from the idea that words with similar meanings should have similar
surrounding words. Mikolov et al.[22] introduced two models, skip-gram and
continuous bag of words (CBOW). The framework of skip-gram is shown in Figure 1.
The architecture is very straightforward. In the training process, the model
generates a representation of the current word Wt by predicting the nearby words
(Wt-2, Wt-1, Wt+1, Wt+2) in a window. After training, the vector of weights from
the hidden layer is the representation of the word: the word embedding. CBOW is
a similar model to skip-gram except that it swaps the input and output, using
the nearby words to predict the current word.</p>
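The skip-gram training setup described above can be illustrated by generating (center word, context word) pairs from a window sliding over a sentence. This minimal sketch covers only pair extraction, not the embedding training itself:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in the skip-gram model:
    each word predicts its neighbours within the given window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "cat", "sat", "on", "mat"], window=2)
```

Each pair becomes one prediction task for the network; CBOW would instead group all the context words of a window into a single input predicting the center word.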
        <p>
          There are also approaches that use neural networks to generate a
distributed representation for blocks of text (a sentence, paragraph or document).
Here, we introduce three typical unsupervised models: paragraph2vec[20],
Skip-Thought[18] and the autoencoder[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Adapting the word2vec architecture, Le and Mikolov[20] proposed a
method called paragraph2vec which can learn a comment representation from
variable-length text. Figure 2 shows the framework. Looking at the base of the
diagram, the input layer has two elements: a unique vector D representing the
paragraph id, and a set of vectors W representing the words in a window which
slides over the text. The output layer is the prediction of the next word in
the context. During the training process, the weights of each word vector W are
updated over each window, but the weights of the paragraph vector are updated
only when the window is within that paragraph. At the end of the training
process, the paragraph vectors D can be used as the text feature representation.
In [20], the classification error when using paragraph vectors decreased by
approximately 39% compared to the traditional Bag-of-Words feature representation.
Skip-Thought[18] is another unsupervised approach, also inspired by word2vec,
that generates feature representations at the sentence level. Given a tuple of
sentences (Si-1, Si, Si+1), the words in Si are mapped as input and converted
into a vector by the neural network model, and the vector is then decoded back
into the words that appear in the nearby sentences (Si-1, the previous sentence,
and Si+1, the next sentence). After the training process, the resulting weights
vector can be used as the representation. Instead of reconstructing context
information such as surrounding words (paragraph2vec) or surrounding sentences
(Skip-Thought), reconstructing the text content itself is also a popular way to
generate a feature representation. This approach is the autoencoder, where the
model starts by encoding the sentence into a lower-dimensional embedding, and
then converts it back to the original input. After training, the
lower-dimensional embeddings can be used as the feature representation [
          <xref ref-type="bibr" rid="ref6">6, 32</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Supervised Mode for Text Classification</title>
        <p>Neural network architectures have also been used directly for text classification
tasks. For example, the simple model named FastText, proposed by Joulin et
al.[16], is an efficient classifier. As shown in Figure 3, the N n-gram features
of a sentence (x1, x2, ..., xN-1, xN) are embedded and then averaged into the
middle layer; the output layer is a softmax function that computes the
probability distribution over the pre-defined classes. The advantage of the
FastText model is its fast execution time. However, the performance of using a
shallow neural network in supervised classification is rudimentary (see the
results of the experiments in Section 3).</p>
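The FastText forward pass described above (embed the n-grams, average them, apply a softmax output layer) can be sketched as follows. The randomly initialised parameters stand in for weights that would be learned during training:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, n_classes = 100, 8, 2

# Randomly initialised parameters; during training these would be learned.
E = rng.normal(size=(vocab_size, dim))   # n-gram embedding table
W = rng.normal(size=(dim, n_classes))    # output layer weights

def fasttext_forward(ngram_ids):
    """FastText-style forward pass: look up n-gram embeddings, average them
    into a single hidden vector, then apply a softmax output layer."""
    hidden = E[ngram_ids].mean(axis=0)
    logits = hidden @ W
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

probs = fasttext_forward([3, 17, 42])    # ids of the sentence's n-grams
```

The single averaging step is the model's only "hidden layer", which is why FastText is so fast and also why its representational capacity is limited.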
        <p>A range of more complex neural network architectures have been used in
recent years for text classification. Two architectures that are widely used
are convolutional neural networks and recurrent neural networks. The following
subsections expand on these two models respectively.
Convolutional Neural Network The CNN model uses multiple layers with
convolving filters that aim to capture `local' features. It was originally used in
computer vision[19]. Subsequently, the CNN was adopted in natural language
processing and produced impressive results for many text classification tasks. The
basic framework of a CNN is shown in Figure 4. The sentence is represented by
a set of word embeddings which are then mapped through a variety of
convolutional filters of different sizes. Afterwards, the structure applies max-pooling
to reduce the dimensionality of the features, in order to reduce the complexity
of the model and prevent overfitting. The final layer is the probability
distribution over classes.</p>
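A minimal sketch of the convolution and max-pooling steps described above, using plain NumPy with random embeddings and filters in place of learned ones:

```python
import numpy as np

rng = np.random.default_rng(1)
sent_len, dim, n_filters, width = 7, 5, 3, 2

X = rng.normal(size=(sent_len, dim))          # one embedded sentence
F = rng.normal(size=(n_filters, width, dim))  # convolution filters (width = 2 words)

def conv_maxpool(X, F):
    """Slide each filter over every window of `width` consecutive word
    embeddings, then max-pool over positions: one feature per filter."""
    n_filters, width, _ = F.shape
    positions = X.shape[0] - width + 1
    feats = np.empty((n_filters, positions))
    for k in range(n_filters):
        for p in range(positions):
            feats[k, p] = np.sum(X[p:p + width] * F[k])
    return feats.max(axis=1)                  # max over window positions

features = conv_maxpool(X, F)
```

The max-pooling step is what collapses a variable number of window positions into a fixed-size feature vector, at the cost of discarding where in the sentence each feature fired.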
        <p>
          Several researchers have adapted the CNN architecture to perform text
classification. Ren et al.[27] proposed a context-based CNN for sentiment analysis
on a Twitter dataset, incorporating context information from relevant tweets
into the model in the form of word embedding vectors; Wang[30] designed a
hybrid CNN to integrate metadata with text content for fake news classification.
For sarcasm detection in social media, Amir et al.[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] introduced a CUE-CNN model which learns embeddings that represent both
text content and user information.
        </p>
        <p>The strength of the CNN model is its ability to mine the relations within
contextual windows and capture `local' information such as semantic clues
through the convolutional filters. However, given multiple filters with a large
number of trainable parameters, the CNN is a data-hungry model which usually
requires a large amount of training data. In addition, a critical issue of CNNs
is their inability to handle sentences of variable length as input, due to the
restriction of a fixed input size. To address this particular issue, research
has focused on the recurrent neural network.</p>
        <p>Recurrent Neural Network The RNN is an extension of the deep neural model
that has the ability to handle variable-length sequence input. Instead of learning
features through the traditional feedforward structure, the RNN involves recurrent
units which can use information from previous states. This architecture thereby
addresses the fixed-size input restriction on text content. Figure 5 illustrates
the basic RNN framework, unfolded into a graph over timesteps. The input xt is
fed to the model at timestep t; St is the hidden state that captures the input
information xt and the previous state St-1 from timestep t-1; ot is the output
of the model.</p>
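The recurrence St = f(xt, St-1) sketched above can be unrolled over an input sequence of any length; the tanh non-linearity and random weights below are illustrative choices for a vanilla RNN:

```python
import numpy as np

rng = np.random.default_rng(2)
in_dim, hid_dim = 4, 6

Wx = rng.normal(size=(in_dim, hid_dim), scale=0.1)   # input-to-hidden weights
Ws = rng.normal(size=(hid_dim, hid_dim), scale=0.1)  # hidden-to-hidden weights
b = np.zeros(hid_dim)

def rnn(inputs):
    """Unroll a vanilla RNN: each state S_t combines the current input x_t
    with the previous state S_{t-1} through a tanh non-linearity."""
    S = np.zeros(hid_dim)
    for x in inputs:
        S = np.tanh(x @ Wx + S @ Ws + b)
    return S

final_state = rnn(rng.normal(size=(10, in_dim)))  # any sequence length works
```

Because the same weights are reused at every timestep, the loop accepts sequences of any length, which is exactly the property CNNs lack.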
        <p>Although the RNN model can capture information from previous states, its
key drawback is the vanishing gradients problem, which makes it difficult to
learn and tune parameters from earlier states in the network. This limitation
is addressed by two advanced models: gated recurrent units (GRU) and long
short-term memory (LSTM). As shown in Figure 6, the normal recurrent unit is
replaced by these two variant units with multiple gates. In the GRU unit, the
gates r and z are designed to control long-term and short-term dependencies,
which mitigates the vanishing gradients problem. The LSTM unit adds a memory
cell c for carrying the previous state, with input and forget gates controlling
how the cell is updated, and an output gate o controlling how much information
the cell outputs.</p>
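A single GRU step with the update gate z and reset gate r can be sketched as follows; the weight shapes and the shared input/state dimension are simplifications for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
d = 4  # same size for input and hidden state, to keep the sketch small

# One weight matrix per gate, acting on the concatenated [input, state].
Wz, Wr, Wh = (rng.normal(size=(2 * d, d), scale=0.1) for _ in range(3))

def gru_step(x, s_prev):
    """One GRU step: the update gate z decides how much of the previous
    state to overwrite; the reset gate r decides how much of it feeds the
    candidate state."""
    xs = np.concatenate([x, s_prev])
    z = sigmoid(xs @ Wz)                                   # update gate
    r = sigmoid(xs @ Wr)                                   # reset gate
    candidate = np.tanh(np.concatenate([x, r * s_prev]) @ Wh)
    return (1 - z) * s_prev + z * candidate

s = gru_step(rng.normal(size=d), np.zeros(d))
```

The `(1 - z) * s_prev` term is the key: it lets the state pass through unchanged when z is near 0, which is what mitigates vanishing gradients over long sequences.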
        <p>
          To date, RNN based models have been widely applied in text classification.
Tang et al.[29] employed a gated RNN architecture for sentiment analysis, which
showed superior performance over a standard RNN model. Wang et al.[31] applied
an LSTM to predict the polarities of tweets and gained 1% better accuracy
compared to the standard RNN model. Typically, the standard LSTM is a
single-direction structure that can only capture textual information from one
directional sequence. A bidirectional LSTM, consisting of two LSTMs that run
in parallel, has proved to be useful in text classification[34]. In addition,
Tai et al.[28] developed a variant LSTM model based on a tree topology. Rather
than the traditional LSTM unit, composed from the current timestep input and
the previous state, the tree-LSTM unit composes the current timestep with
previous tree-based states. This model outperforms the standard LSTM for
sentiment classification.
Early research on classifying abusive social media comments focused on
exploring useful information such as lexical features[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], the users' profile[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and historical activities[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Djuric et al.[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] were the forerunners in implementing a neural network architecture to
generate a distributed feature representation for hate speech detection. They
used paragraph2vec[20] to model comments. With a logistic regression classifier,
classification accuracy increased from 0.78 with a BoW representation to 0.80.
Subsequently, Nobata et al.[23] conducted a set of comprehensive experiments to
evaluate the performance of a variety of representations for abusive comments.
They compared paragraph2vec[20] to a number of feature representations,
including n-grams and linguistic and syntactic features. Using an SVM classifier,
their results indicated that comment embeddings generated with paragraph2vec
outperformed the linguistic and syntactic handcrafted features. In addition,
they showed that simply averaging the pre-trained word embeddings performs
better than the n-grams feature representation on most of the datasets.
        </p>
        <p>
          Furthermore, an increasing number of researchers have started to work on
complex deep neural networks to tackle the problem of abusive text detection.
Badjatiya et al.[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] investigated CNNs for hate speech detection in tweets, which
significantly outperformed traditional methods such as logistic regression and
SVM. Gamback et al.[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] also applied a CNN architecture to classify tweets into four categories
(racism, sexism, both, and neither), modifying the traditional CNN word
embedding input by concatenating character n-grams. Park et al.[24] likewise
proposed an improved CNN model that combines word embeddings and character
embeddings. Mehdad et al.[21] implemented an RNN using characters as input
instead of words, which achieved an increase of approximately 8% in average
class accuracy. An advanced RNN model, a bi-directional LSTM with an attention
mechanism that weights the importance of each input, was proposed by Gao et al.[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and Del Vigna et al.[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Both achieved better performance compared to the one-directional LSTM.
In addition, Pavlopoulos et al.[25] showed that the attention mechanism improves
the performance of the RNN model when dealing with abusive comments in the
Greek language.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments - Abusive UGC Detection</title>
      <p>
        In our previous work[
        <xref ref-type="bibr" rid="ref3 ref4">4, 3</xref>
        ], we implemented traditional text classification techniques to tackle
abusive content detection. In this section, we apply fundamental neural network
strategies to this problem. We first compare word embeddings to a variety of
traditional text feature representations, including n-grams at the word level,
n-grams at the character level, and handcrafted features. In addition, we
investigate the performance of a recent neural network classifier. The structure
of this section is as follows: we describe the datasets used in our experiments;
we then detail the methodology of our experiments; finally, the experimental
results are presented and discussed.
      </p>
      <sec id="sec-3-1">
        <title>Datasets</title>
        <p>
          All datasets used for abusive content detection in this paper were extracted
from social media websites. Although these websites cover a variety of user
content sources, including forums, micro-blogs, media-sharing, news article
discussion, chat and Q&amp;A, they share the common characteristic that they
allow online users to freely post comments. We identified 8 published datasets[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] that have been gathered from different social media platforms: Twitter,
MySpace, Formspring, YouTube and SlashDot. Given that these are published
datasets in the research domain, we have assumed that their labelling strategies
are correct and that the labels are reliable. We also used our own abusive
content dataset, collected from a news site and labelled using crowd-sourced
labelling[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Table 1 gives an overview of each dataset, showing basic information
including the source type, the number of instances, the average number of words
per instance, and the proportion of positive (abusive) and negative instances.
        </p>
        <p>For each dataset, we carried out pre-processing as follows: all letters
were converted to lowercase; user mentions starting with the `@' symbol were
replaced by the anonymous text "@username"; links starting with "http://" or
"https://" were replaced by a generic token. Considering that the comments are
typically short, we did not remove stop-words or apply word stemming.</p>
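The pre-processing steps above can be sketched with regular expressions. The exact patterns and the "url" placeholder for links are our illustrative choices, not the precise implementation used in the experiments:

```python
import re

def preprocess(comment):
    """Pre-processing sketch: lowercase, replace @mentions with
    "@username", and replace http(s) links with a generic token."""
    comment = comment.lower()
    comment = re.sub(r"@\w+", "@username", comment)
    comment = re.sub(r"https?://\S+", "url", comment)
    return comment

cleaned = preprocess("Check THIS @SomeUser https://example.com/page now")
```

Note that lowercasing runs first, so both the mention and the link patterns only need to match lowercase text.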
      </sec>
      <sec id="sec-3-2">
        <title>Methodology</title>
        <p>
          We employed Support Vector Machines (SVMs) with a linear kernel, one of
the most efficient classifiers for text[15], as our classification algorithm.
We established baseline results using n-grams feature representations,
implemented in two ways: word level (sizes 1-4) and character level (sizes 2-4).
Based on our previous work[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], we normalized the feature values and used document frequency (with a
1% threshold) to reduce the features, excluding the most and least frequent 1%
of terms.
        </p>
        <p>To validate our results, we applied stratified 10-fold cross validation on
each dataset. In addition, Table 1 shows that most of the datasets are
imbalanced, with far fewer positive (abusive) instances than negative
(non-abusive) instances. We used resampling to randomly oversample the minority
class in our training data and averaged results over three iterations.</p>
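Random oversampling of the minority class, as used above, can be sketched as follows; this is a simplified single-pass version (the experiments average over three such resampling iterations):

```python
import random

def oversample(instances, labels, minority, seed=0):
    """Randomly duplicate minority-class instances until the two classes
    in the training data are balanced."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    minority_set = set(minority_idx)
    majority_idx = [i for i in range(len(labels)) if i not in minority_set]
    # Sample (with replacement) enough minority instances to match.
    extra = [rng.choice(minority_idx)
             for _ in range(len(majority_idx) - len(minority_idx))]
    keep = list(range(len(labels))) + extra
    return [instances[i] for i in keep], [labels[i] for i in keep]

X, y = oversample(["a", "b", "c", "d"], [0, 0, 0, 1], minority=1)
```

Oversampling must be applied only to the training folds, never to the test fold, or the evaluation would see duplicated instances.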
        <p>The results of our experiments are reported using recall, a standard
classification measure of the ability to find all instances of a specific class.
Our working assumption is that failing to detect abusive content has more
serious consequences than predicting non-abusive content as abusive. Therefore,
we focus on abusive recall rather than non-abusive recall. Equation 1 shows the
calculation, where TruePositives is the number of abusive comments correctly
classified as abusive, and FalseNegatives is the number of abusive comments
wrongly classified as non-abusive. Average recall is also reported.</p>
        <p>Recall = TruePositives / (TruePositives + FalseNegatives) (1)</p>
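Equation 1 translates directly into code:

```python
def recall(y_true, y_pred, positive=1):
    """Recall for the positive (abusive) class, per Equation 1:
    true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# 3 abusive comments, 2 of which were caught: recall = 2/3.
r = recall([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Note that false positives (non-abusive comments flagged as abusive) do not affect this measure; that is exactly why average recall over both classes is also reported.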
      </sec>
      <sec id="sec-3-3">
        <title>Experiments &amp; Results</title>
        <p>
          We present the results in Table 2. User-generated comments are usually
free-format in style, containing misspellings and abbreviations that word-level
n-grams cannot capture. As a result, word-level n-grams normally perform worse
than character-level n-grams. In addition, compared with character-level n-grams
alone, extracting additional features that capture syntactic and semantic
information[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] achieves better results in abusive content detection across the 9
datasets. We applied a paired t-test (p=0.0353) to confirm the difference.
        
        <p>Next, we analyzed the embedding representation, obtained by averaging the
word embeddings that appear in a comment. The pre-trained word embeddings used
in this paper are GloVe vectors in 4 different dimensions (50, 100, 200, 300)1,
trained on a Wikipedia corpus. Although averaging is one of the most naive
approaches to using word vectors, it performs better than n-grams for most of
the datasets. In general, using higher-dimensional word embeddings achieves
better performance, since higher-dimensional vectors in theory contain more
information than lower-dimensional ones. However, in our experiments, some
datasets show the opposite result. For example, on D5, the performance of the
300-dimension vectors dropped 10% compared to the 50-dimension vectors; on D6,
the abusive recall rate likewise decreased as higher-dimension vectors were
used. The selection of appropriate word embeddings appears to play an important
role in a specific classification task. One reason for this counter-intuitive
result is the difference between the language of the embedding training corpus
and the language of our posts. The GloVe word embeddings used here are trained
on Wikipedia text, which generally uses a formal language style. However, social
media datasets typically contain casual expressions, meaning that the word
context information captured in Wikipedia-trained word embeddings may differ
from that in the abusive datasets. In future, we will attempt to use word
embeddings pre-trained on a source corpus similar to the experimental dataset
to boost classifier accuracy.</p>
        <p>FastText[16] is a one-hidden-layer neural network text classifier. We
analyzed this model in two ways: with and without pre-trained word embeddings.
As shown in Table 2, the performance of FastText in abusive content detection
is in fact worse than the other feature representations with an SVM classifier.
However, FastText is an efficient classifier with a much faster execution time
than the SVMs. We attribute the classification results to the straightforward
structure of FastText. In addition, using pre-trained word embeddings in
FastText does not guarantee better results, which may also be due to the data
source used for the pre-trained word embeddings.
At this stage, we have not found a general model that performs best across
all 9 datasets. The n-grams with syntactic &amp; semantic information achieve
significantly better results than the standard n-grams on 7 of our 9 datasets.
Averaging word embeddings is the most straightforward approach to generating
sentence vectors and achieves good results in our experiments. In addition, the
FastText shallow neural network shows poor performance on the abusive content
detection task. We suggest that a more advanced structure needs to be designed
for abusive content detection in future work.
1 https://nlp.stanford.edu/projects/glove/</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions &amp; Future Work</title>
      <p>The purposes of this paper were twofold: (1) to investigate how off-the-shelf
deep neural networks have been used across the two tasks within text
classification, feature representation and classification itself, and (2) to run
preliminary experiments on abusive content detection across 9 different social
media datasets. We highlight the following aspects of our work. Firstly, we
systematically summarized current neural models and categorized them into two
modes: unsupervised approaches for generating distributed feature
representations, and supervised approaches for classification. Secondly, we
compared the classification performance of traditional feature representations
to word embedding features. Simply averaging pre-trained word embeddings gives
better results than the n-grams feature representation on most of the datasets.
In addition, we employed a recent neural network classifier, FastText. Due to
its shallow architecture, its performance is unremarkable. Using pre-trained
word embeddings does not guarantee better results, probably because the
characteristics of the corpus used for pre-training the embeddings differ from
our datasets. We will validate this assumption in subsequent experiments.</p>
      <p>The ultimate goal of our research is to develop a powerful classification
model that can assist social media moderators in detecting abusive comments
efficiently and effectively. The path to this goal can be divided into two
directions: designing appropriate features for abusive comments, and designing
an outstanding model for detection. In future work, we will pursue both
directions by exploring deep neural networks in unsupervised and supervised
modes.</p>
      <p>14. Hill, F., Cho, K., Korhonen, A.: Learning distributed representations of sentences from unlabelled data. arXiv preprint arXiv:1602.03483 (2016)
15. Joachims, T.: Text categorization with support vector machines: Learning with many relevant features. In: Machine Learning: ECML-98. pp. 137–142 (1998)
16. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
17. Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)
18. Kiros, R., Zhu, Y., Salakhutdinov, R.R., Zemel, R., Urtasun, R., Torralba, A., Fidler, S.: Skip-thought vectors. In: Advances in Neural Information Processing Systems. pp. 3294–3302 (2015)
19. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. pp. 1097–1105 (2012)
20. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st ICML. pp. 1188–1196 (2014)
21. Mehdad, Y., Tetreault, J.R.: Do characters abuse more than words? In: SIGDIAL Conference. pp. 299–303 (2016)
22. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
23. Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: Proceedings of the 25th International Conference on World Wide Web. pp. 145–153 (2016)
24. Park, J.H., Fung, P.: One-step and two-step classification for abusive language detection on Twitter. arXiv preprint arXiv:1706.01206 (2017)
25. Pavlopoulos, J., Malakasiotis, P., Androutsopoulos, I.: Deep learning for user comment moderation. arXiv preprint arXiv:1705.09993 (2017)
26. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of EMNLP 2014. pp. 1532–1543 (2014)
27. Ren, Y., Zhang, Y., Zhang, M., Ji, D.: Context-sensitive Twitter sentiment classification using neural network. In: AAAI. pp. 215–221 (2016)
28. Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)
29. Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for sentiment classification. In: EMNLP. pp. 1422–1432 (2015)
30. Wang, W.Y.: "Liar, liar pants on fire": A new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648 (2017)
31. Wang, X., Liu, Y., Sun, C., Wang, B., Wang, X.: Predicting polarities of tweets by composing word embeddings with long short-term memory. In: ACL (1). pp. 1343–1353 (2015)
32. Xu, W., Sun, H., Deng, C., Tan, Y.: Variational autoencoder for semi-supervised text classification. In: AAAI. pp. 3358–3364 (2017)
33. Young, T., Hazarika, D., Poria, S., Cambria, E.: Recent trends in deep learning based natural language processing. arXiv preprint arXiv:1708.02709 (2017)
34. Zhou, P., Qi, Z., Zheng, S., Xu, J., Bao, H., Xu, B.: Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. arXiv preprint arXiv:1611.06639 (2016)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amir</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carvalho</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Modelling context with user embeddings for sarcasm detection in social media</article-title>
          .
          <source>arXiv preprint arXiv:1607.00976</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Badjatiya</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varma</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Deep learning for hate speech detection in tweets</article-title>
          .
          <source>In: Proceedings of the 26th International Conference on World Wide Web Companion</source>
          . pp.
          <fpage>759</fpage>
          -
          <lpage>760</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mckeever</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delany</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>Harnessing the power of text mining for the detection of abusive content in social media</article-title>
          .
          <source>In: Advances in Computational Intelligence Systems</source>
          , pp.
          <fpage>187</fpage>
          -
          <lpage>205</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mckeever</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delany</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>Presenting a labelled dataset for real-time detection of abusive user posts</article-title>
          .
          <source>In: Proceedings of the International Conference on Web Intelligence</source>
          . pp.
          <fpage>884</fpage>
          -
          <lpage>890</lpage>
          .
          ACM
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Detecting offensive language in social media to protect adolescent online safety</article-title>
          .
          <source>In: Privacy, Security, Risk and Trust (PASSAT)</source>
          ,
          <source>2012 International Conference on and 2012 International Conference on Social Computing (SocialCom)</source>
          . pp.
          <fpage>71</fpage>
          -
          <lpage>80</lpage>
          .
          IEEE
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Merrienboer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1406.1078</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chung</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          .
          <source>arXiv preprint arXiv:1412.3555</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dadvar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Jong</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ordelman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trieschnigg</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Improved cyberbullying detection using gender information</article-title>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dadvar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trieschnigg</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ordelman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Jong</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Improving cyberbullying detection with user context</article-title>
          .
          <source>In: ECIR</source>
          . pp.
          <fpage>693</fpage>
          -
          <lpage>696</lpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Del Vigna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrocchi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tesconi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Hate me, hate me not: Hate speech detection on Facebook</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Djuric</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grbovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radosavljevic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhamidipati</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Hate speech detection with comment embeddings</article-title>
          .
          <source>In: Proceedings of the 24th International Conference on World Wide Web</source>
          . pp.
          <fpage>29</fpage>
          -
          <lpage>30</lpage>
          .
          ACM
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Gamback,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Sikdar</surname>
          </string-name>
          , U.K.:
          <article-title>Using convolutional neural networks to classify hate-speech</article-title>
          .
          <source>In: Proceedings of the First Workshop on Abusive Language Online</source>
          . pp.
          <fpage>85</fpage>
          -
          <lpage>90</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Detecting online hate speech using context aware models</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>