<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Abusive Text Detection Using Neural Networks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Hao</forename><surname>Chen</surname></persName>
							<email>hao.chen@mydit.ie</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">Dublin Institute of Technology</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Susan</forename><surname>Mckeever</surname></persName>
							<email>susan.mckeever@dit.ie</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">Dublin Institute of Technology</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sarah</forename><forename type="middle">Jane</forename><surname>Delany</surname></persName>
							<email>sarahjane.delany@dit.ie</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">Dublin Institute of Technology</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Abusive Text Detection Using Neural Networks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">5EEBA14C2BD78EF19F84C1F133EC637A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Neural network models have become increasingly popular for text classification in recent years. In particular, the emergence of word embeddings within deep learning architectures has attracted a high level of attention amongst researchers. In this paper, we first review how neural network models have been applied to text classification. We then extend our previous work [4, 3] by using a neural network strategy for the task of abusive text detection. We compare word embedding features to traditional feature representations such as n-grams and handcrafted features. In addition, we use an off-the-shelf neural network classifier, FastText <ref type="bibr" target="#b15">[16]</ref>. Based on our results, the conclusions are: (1) extracting selected manual features can improve abusive content detection over using basic n-grams; (2) although averaging pre-trained word embeddings is a naive method, this distributed feature representation performs better than n-grams on most of our datasets; (3) while the FastText classifier is fast and efficient, its results are unremarkable, as it is a shallow neural network with only one hidden layer; (4) using pre-trained word embeddings does not guarantee better performance with the FastText classifier.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Text classification is an essential component in many applications, such as sentiment analysis <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b28">29]</ref>, news categorization <ref type="bibr" target="#b15">[16]</ref>, and our research domain of interest, abusive text detection <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b2">3]</ref>. One of the fundamental tasks in text classification is feature representation: finding appropriate ways to represent text content. The traditional approach is based on an occurrence model, counting the frequency of words (e.g. BoW) in the text. This largely ignores word order, so the problem of capturing the semantics between words remains. Adding extra features identified by experts for a specific task can alleviate this drawback, but doing so takes time and human effort and introduces domain-specific dependencies into the model. One solution for feature extraction without hand-crafting is to use deep learning methods. This trend was sparked in particular by the emergence of word embedding techniques such as word2vec <ref type="bibr" target="#b21">[22]</ref> and GloVe <ref type="bibr" target="#b25">[26]</ref>. A word embedding is a distributed representation at the word level that has been shown to be capable of learning word semantics. One straightforward way to generate a distributed feature representation at the sentence level is to average the pre-trained word embeddings. However, this discards context information such as word order, which limits the semantic knowledge captured. 
To address this issue, combining word embeddings with deep neural networks is a promising approach, and it has attracted increasing attention in recent research.</p><p>Originating from neural network structures, deep neural networks aim to automatically abstract feature representations for data through hierarchical layers. They produce state-of-the-art results in many text classification tasks <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b13">14]</ref>. In this paper, we first present a review of the recent deep neural networks that are widely used in text classification. We then carry out preliminary experiments on abusive text detection using fundamental neural network techniques. We compare word embeddings to more traditional feature representations using SVM classifiers. In addition, we investigate an off-the-shelf neural-network-based classifier.</p><p>The rest of the paper is structured as follows: Section 2 discusses two modes of using neural networks for text classification and how these have been used in abusive text detection. In Section 3, we present experimental results. Finally, conclusions and future work are presented in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">State-of-the-Art Neural Networks</head><p>In this section, we investigate the current deep neural networks that have been used in general text classification tasks. We consider the use of deep neural networks in two modes: unsupervised for feature representation, and supervised for text classification. Furthermore, we review some deep neural networks that have been used in the abusive text detection domain.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Unsupervised Mode for Feature Representation</head><p>The cornerstone of using neural networks in unsupervised mode is word2vec <ref type="bibr" target="#b21">[22]</ref>, an approach that generates a distributed representation for words in a lower-dimensional vector space. These word vectors, also called word embeddings, are learned based on the idea that words with similar meanings should have similar surrounding words. Mikolov et al. <ref type="bibr" target="#b21">[22]</ref> introduced two models, skip-gram and continuous bag of words (CBOW). The framework of skip-gram is shown in Figure <ref type="figure">1</ref>. The architecture is very straightforward. In the training process, the model generates a representation of the current word W t by predicting the nearby words (W t−2 ,W t−1 ,W t+1 ,W t+2 ) within a window. After training, the vector of weights from the hidden layer is the representation of the word: the word embedding. CBOW is a similar model to skip-gram except that it swaps the input and output, using nearby words to predict the current word.</p><p>There are also approaches that use neural networks to generate distributed representations for blocks of text (sentences, paragraphs or documents). Here, we introduce three typical unsupervised models: paragraph2vec <ref type="bibr" target="#b19">[20]</ref>, Skip-Thought <ref type="bibr" target="#b17">[18]</ref> and the autoencoder <ref type="bibr" target="#b5">[6]</ref>. Adapting the word2vec architecture, Mikolov et al. <ref type="bibr" target="#b19">[20]</ref> proposed a method called paragraph2vec which can learn comment representations from variable-length text. Figure <ref type="figure" target="#fig_0">2</ref> shows the framework. Looking at the base of the diagram, the input layer has two elements: a unique vector D representing the paragraph id, and a set of vectors W s representing the words in a window which slides over the text. 
The output layer is the prediction of the next word in the context. During the training process, the weights of each word vector W s are updated for each window. However, the weights of the paragraph vector are updated only when the window is within that paragraph. At the end of the training process, the paragraph vectors D can be used as the text feature representation.</p><p>In <ref type="bibr" target="#b19">[20]</ref>, the classification error when using paragraph vectors decreased by approximately 39% compared to the traditional Bag-of-Words feature representation.</p><p>Fig. <ref type="figure">1</ref>: The skip-gram architecture of word2vec proposed by Mikolov et al. <ref type="bibr" target="#b21">[22]</ref> Skip-Thought <ref type="bibr" target="#b17">[18]</ref> is another unsupervised approach, also inspired by word2vec, that generates feature representations at the sentence level. Given a tuple of sentences (S i−1 , S i , S i+1 ), the words in S i are mapped as input; through the neural network model, S i is converted into a vector, and the vector is then converted back into the set of words that appear in the nearby sentences (S i−1 , the previous sentence, and S i+1 , the next sentence). After the training process, the resulting weight vector can be used as the representation. Instead of reconstructing context information such as surrounding words (paragraph2vec) or surrounding sentences (Skip-Thought), reconstructing the text content itself is also a popular way to generate a feature representation. This approach is called the autoencoder: the model starts by encoding the sentence into a lower-dimensional embedding, and then converts it back to the original input.</p><p>After training, the lower-dimensional embeddings can be used as the feature representation <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b31">32]</ref>.</p></div>
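As an illustration of the skip-gram idea described above, a minimal NumPy sketch might look like the following. The toy corpus, embedding dimension, learning rate and epoch count are all illustrative assumptions, not settings from word2vec itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and vocabulary (hypothetical example).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8               # vocabulary size, embedding dimension

W_in = rng.normal(0, 0.1, (V, D))  # hidden-layer weights = word embeddings
W_out = rng.normal(0, 0.1, (D, V))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Skip-gram with a window of 2: predict nearby words from the centre
# word, updating both weight matrices by SGD on the cross-entropy loss.
lr = 0.1
for _ in range(50):
    for t, w in enumerate(corpus):
        for c in range(max(0, t - 2), min(len(corpus), t + 3)):
            if c == t:
                continue
            h = W_in[idx[w]]            # hidden activation = embedding
            p = softmax(h @ W_out)      # predicted context distribution
            err = p.copy()
            err[idx[corpus[c]]] -= 1.0  # gradient of cross-entropy
            W_out -= lr * np.outer(h, err)
            W_in[idx[w]] -= lr * (W_out @ err)

# After training, the rows of W_in are the word embeddings.
embedding = {w: W_in[idx[w]] for w in vocab}
```

Swapping which side of the prediction the centre word sits on turns this into CBOW.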
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Supervised Mode for Text Classification</head><p>Neural network architectures have also been used directly for text classification tasks. For example, the simple model named FastText, proposed by Joulin et al. <ref type="bibr" target="#b15">[16]</ref>, is an efficient classifier. As shown in Figure <ref type="figure" target="#fig_1">3</ref>, a sentence with a set of N n-gram features (X 1 , X 2 , ..., X N −1 , X N ) is embedded and then averaged into the middle layer; the output layer is a softmax function that computes the probability distribution over the pre-defined classes. The advantage of the FastText model is its fast execution time. However, the performance of this shallow neural network in supervised classification is limited (see the experimental results in Section 3).</p><p>A range of more complex neural network architectures have been used in recent years for text classification. Two architectures that are widely used are convolutional neural networks and recurrent neural networks. The following subsections expand on these two models respectively. Convolutional Neural Network The CNN model uses multiple layers of convolving filters that aim to capture 'local' features. It was originally used in computer vision <ref type="bibr" target="#b18">[19]</ref>. Subsequently, the CNN was adopted in natural language processing and produced impressive results for many text classification tasks. The basic framework of a CNN is shown in Figure <ref type="figure" target="#fig_2">4</ref>. The sentence is represented by a set of word embeddings which are then mapped through a variety of convolutional filters of different sizes. Afterwards, the structure applies max-pooling to reduce the dimensionality of the features, in order to reduce the complexity of the model and prevent overfitting. 
The final layer is the probability distribution over classes.</p><p>Several researchers have adapted the CNN architecture to perform text classification. Ren et al. <ref type="bibr" target="#b26">[27]</ref> proposed a context-based CNN for sentiment analysis on a Twitter dataset, incorporating context information from relevant tweets into the model in the form of word embedding vectors; Yang Wang et al. <ref type="bibr" target="#b29">[30]</ref> designed a hybrid CNN to integrate metadata with text content for fake news classification. For sarcasm detection in social media, Amir et al. <ref type="bibr" target="#b0">[1]</ref> introduced a CUE-CNN model which learns embeddings that represent both text content and user information. The strength of the CNN model is its ability to mine the relations within contextual windows and capture 'local' information, such as semantic clues, through the convolutional filters. However, given multiple filters with a large number of trainable parameters, the CNN is a data-hungry model which usually requires a large amount of training data. In addition, a critical issue of CNNs is their inability to handle variable-length sentences as input, due to the restriction of a fixed input size. To address this particular issue, research has focused on the recurrent neural network.</p>
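The convolution-plus-max-pooling step described above can be sketched with NumPy alone. The sentence length, embedding dimension, filter sizes and filter counts below are illustrative assumptions; a real model would learn the filters rather than sample them randomly.

```python
import numpy as np

rng = np.random.default_rng(1)

# A sentence as a matrix of word embeddings: 7 words, 50-dim vectors.
sentence = rng.normal(size=(7, 50))

def conv_maxpool(x, filter_size, n_filters, rng):
    """Slide n_filters of height filter_size over the word axis,
    then max-pool each feature map down to a single value."""
    filters = rng.normal(size=(n_filters, filter_size, x.shape[1]))
    n_windows = x.shape[0] - filter_size + 1
    feature_maps = np.array([
        [np.tanh((x[i:i + filter_size] * f).sum()) for i in range(n_windows)]
        for f in filters
    ])
    return feature_maps.max(axis=1)     # max-over-time pooling

# Filters of different sizes, as in the CNN text model; the concatenated
# pooled features would feed a final softmax layer over the classes.
features = np.concatenate(
    [conv_maxpool(sentence, h, 4, rng) for h in (3, 4, 5)]
)
print(features.shape)   # (12,) = 3 filter sizes x 4 filters each
```

Max-over-time pooling is also what makes the pooled feature vector a fixed size regardless of how many windows a longer sentence would produce, although the input matrix itself must still be padded or truncated in practice.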
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Recurrent Neural Network</head><p>The RNN is an extension of the deep neural model that has the ability to handle variable-length sequence input. Instead of learning features through a traditional feedforward structure, the RNN involves recurrent units which can use information from previous states. This architecture thus effectively addresses the issue of fixed-size text input. Figure <ref type="figure" target="#fig_3">5</ref> illustrates the basic RNN framework, unfolded into a graph of timesteps at time t. The input x t is fed to the model at timestep t; S t is the hidden layer that captures the input information x t and the previous state S t−1 at timestep t − 1; o t is the output of the model.</p><p>Although the RNN model can capture information from previous states, its key drawback is the vanishing gradients problem, which makes it difficult to learn and tune parameters from earlier states in the network. This limitation is addressed by two advanced models: gated recurrent units (GRU) and long short-term memory (LSTM). As shown in Fig <ref type="figure" target="#fig_4">6</ref>, the normal recurrent unit is replaced by these two variant units with multiple gates. For the GRU unit, the gates r and z are designed to control long-term and short-term dependencies, which mitigates the vanishing gradients problem. For the LSTM unit, in addition to the update and reset gates, it adds a cell c as memory for the previous state, and the gate o controls how much information the cell outputs. To date, RNN-based models have been widely applied in text classification. Tang et al. <ref type="bibr" target="#b28">[29]</ref> employed a gated RNN architecture for sentiment analysis, which showed superior performance over a standard RNN model. Wang et al. 
<ref type="bibr" target="#b30">[31]</ref> applied an LSTM to predict the polarities of tweets and achieved 1% better accuracy compared to the standard RNN model. Typically, the standard LSTM is a single-direction structure that can only capture textual information from one directional sequence. A bidirectional LSTM, consisting of two LSTMs run in parallel, has proved to be useful in text classification <ref type="bibr" target="#b33">[34]</ref>. In addition, Tai et al. <ref type="bibr" target="#b27">[28]</ref> developed a variant LSTM model based on a tree topology. Whereas the traditional LSTM unit is composed from the current timestep input and the previous state, the tree-LSTM unit is composed from the current timestep and the previous tree-based states. This model shows superior performance for sentiment classification over the standard LSTM. </p></div>
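The recurrence that gives the RNN its variable-length ability amounts to reusing one set of weights at every timestep, carrying the hidden state forward. A minimal sketch of the vanilla recurrence S_t = tanh(U x_t + W S_{t-1}), o_t = softmax(V S_t) follows; the layer sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

d_in, d_hid, d_out = 10, 16, 2     # input, hidden, output sizes (illustrative)
U = rng.normal(0, 0.1, (d_hid, d_in))
W = rng.normal(0, 0.1, (d_hid, d_hid))
V = rng.normal(0, 0.1, (d_out, d_hid))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs):
    """Process a variable-length sequence: each step reuses the same
    weights and feeds the previous hidden state back in."""
    s = np.zeros(d_hid)
    outputs = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)     # hidden state carries the history
        outputs.append(softmax(V @ s)) # per-step class distribution
    return outputs, s

# The same weights handle any sequence length -- the key advantage
# over a fixed-input-size CNN.
short_out, _ = rnn_forward(rng.normal(size=(3, d_in)))
long_out, _ = rnn_forward(rng.normal(size=(12, d_in)))
```

The repeated multiplication by W during backpropagation through these steps is exactly where the vanishing-gradient problem arises, which the GRU and LSTM gates are designed to counter.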
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Application in Abusive Detection</head><p>Early research in text classification addressing abusive social media comments focused on exploring useful information such as lexical features <ref type="bibr" target="#b4">[5]</ref>, users' profiles <ref type="bibr" target="#b7">[8]</ref> and historical activities <ref type="bibr" target="#b8">[9]</ref>. Djuric et al. <ref type="bibr" target="#b10">[11]</ref> were forerunners in implementing a neural network architecture to generate a distributed feature representation for hate speech detection. They used paragraph2vec <ref type="bibr" target="#b19">[20]</ref> to model comments. With a logistic regression classifier, classification accuracy increased from 0.78 to 0.80 compared to the BoW representation. Subsequently, Nobata et al. <ref type="bibr" target="#b22">[23]</ref> conducted a set of comprehensive experiments to evaluate the performance of a variety of representations for abusive comments. They compared paragraph2vec <ref type="bibr" target="#b19">[20]</ref> to a number of feature representations including n-grams and linguistic and syntactic features. Using an SVM classifier, the results indicated that using paragraph2vec to generate comment embeddings outperformed the linguistic and syntactic handcrafted features. In addition, they also showed that simply averaging pre-trained word embeddings performs better than the n-grams feature representation on most of the datasets.</p><p>Furthermore, an increasing number of researchers have started to work on complex deep neural networks for tackling the problem of abusive text detection. Badjatiya et al. <ref type="bibr" target="#b1">[2]</ref> investigated CNNs for hate speech detection in tweets, which significantly outperformed traditional methods such as Logistic Regression and SVM; Gamback et al. 
<ref type="bibr" target="#b11">[12]</ref> also used a CNN architecture to classify tweets into four categories: racism, sexism, both (racism and sexism), and neither; they modified the traditional CNN word-embedding input by concatenating character n-grams. Park et al. <ref type="bibr" target="#b23">[24]</ref> likewise proposed an improved CNN model that combined word embeddings and character embeddings. Mehdad et al. <ref type="bibr" target="#b20">[21]</ref> implemented an RNN using characters as input instead of words, which achieved an increase of approximately 8% in average class accuracy. An advanced RNN model, a bi-directional LSTM with an attention mechanism that adds weights reflecting the importance of each input, was proposed by Gao et al. <ref type="bibr" target="#b12">[13]</ref> and Del Vigna et al. <ref type="bibr" target="#b9">[10]</ref>. Both achieved better performance compared to the one-directional LSTM. In addition, Pavlopoulos et al. <ref type="bibr" target="#b24">[25]</ref> also showed that the attention mechanism improves the performance of the RNN model when dealing with abusive comments in the Greek language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments -Abusive UGC Detection</head><p>In our previous work <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b2">3]</ref>, we implemented traditional text classification techniques to tackle abusive detection. In this section, we apply fundamental neural network strategies to this task. We first compare word embeddings to a variety of traditional text feature representations, including n-grams at the word level, n-grams at the character level, and handcrafted features. In addition, we investigate the performance of a recent neural network classifier. The structure of this section is as follows: we describe the datasets used in our experiments; we then detail the methodology of our experiments; finally, the experimental results are presented and discussed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Datasets</head><p>All datasets used for abusive detection in this paper were extracted from social media websites. Although these websites cover a variety of user content sources, including forums, micro-blogs, media-sharing, news article discussion, chat and Q&amp;A, they share the common characteristic of allowing online users to freely post their comments. We identified 8 published datasets <ref type="bibr" target="#b2">[3]</ref> gathered from different social media platforms: Twitter, MySpace, Formspring, YouTube and SlashDot. Given that these are published datasets in the research domain, we have assumed that their labelling strategies are correct and that the labels are reliable. We also used our own abusive content dataset, collected from a news site and labelled using crowd-sourced labelling <ref type="bibr" target="#b3">[4]</ref>. Table <ref type="table" target="#tab_0">1</ref> gives an overview of each dataset, showing basic information including the source type, the number of instances, the average number of words across instances, and the proportion of positive (abusive) and negative instances. For each dataset, we carried out pre-processing as follows: all letters were lowercased; mentioned names starting with the '@' symbol were replaced by the anonymous text "@username"; links starting with "http://" or "https://" were replaced by a generic term. Since the comments are typically short, we did not remove stop-words or apply word stemming.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Methodology</head><p>We employed Support Vector Machines (SVMs) with a linear kernel, one of the most efficient classifiers <ref type="bibr" target="#b14">[15]</ref>, as our classification algorithm. We established baseline results using n-grams feature representations, implemented in two ways: 1-4 grams at the word level and 2-4 grams at the character level. Based on our previous work <ref type="bibr" target="#b2">[3]</ref>, we normalized feature values and used document frequency (with a 1% threshold) to reduce the feature set, excluding the most and least frequent 1% of terms.</p><p>To validate our results, we applied stratified 10-fold cross validation on each dataset. In addition, Table <ref type="table" target="#tab_0">1</ref> shows that most of the datasets are imbalanced, with the positive instances (abusive) far fewer in number than the negative instances (non-abusive). We used resampling to randomly oversample the minority class in our training data and averaged results over three iterations.</p><p>The results of our experiments are reported using a standard classification measure, recall, which measures the ability to find all instances of a specific class. Our working assumption is that the consequence of failing to detect abusive content is more serious than that of non-abusive content being predicted as abusive. Therefore, we focus on abusive recall rather than non-abusive recall. Equation <ref type="formula">1</ref> shows the calculation, where TruePositives is the proportion of abusive comments correctly classified as abusive, and FalseNegatives is the proportion of abusive comments wrongly classified as non-abusive. Average recall is also reported.</p></div>
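A scikit-learn sketch of the character-level baseline described above follows. The toy texts and labels are invented, and the min_df/max_df values stand in for the paper's 1% document-frequency cut; exact normalization, oversampling and cross-validation settings are the paper's own.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Character 2-4 n-grams feeding a linear-kernel SVM, roughly mirroring
# the baseline in the text; min_df/max_df approximate the 1% cut.
pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4),
                    min_df=0.01, max_df=0.99),
    LinearSVC(),
)

# Toy training data (not the paper's datasets); 1 = abusive.
texts = ["you are wonderful", "have a nice day",
         "you are an idiot", "shut up idiot"]
labels = [0, 0, 1, 1]
pipeline.fit(texts, labels)
```

In the real experiments this pipeline would sit inside stratified 10-fold cross-validation with random oversampling of the abusive class applied to the training folds only.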
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Recall =</head><p>TruePositives / (TruePositives + FalseNegatives)</p><p>(1)</p></div>
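Equation 1 computed directly, with toy confusion counts that are not the paper's results:

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall as in Equation 1: the fraction of actual positives found."""
    return true_positives / (true_positives + false_negatives)

# e.g. 70 abusive comments detected out of 100 actually abusive:
print(recall(70, 30))   # 0.7
```

Abusive recall uses the abusive class as the positive class; average recall averages this quantity over both classes.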
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Experiments &amp; Results</head><p>We present the results in Table <ref type="table" target="#tab_1">2</ref>. User-generated comments are usually free-format in style, containing misspellings and abbreviations that word-level n-grams cannot capture well. As a result, word-level n-grams normally perform worse than character-level n-grams. In addition, building on character-level n-grams, extracting additional features that capture syntactic and semantic information <ref type="bibr" target="#b3">[4]</ref> achieves better results in abusive detection across the 9 datasets. We applied a paired t-test (p=0.0353) to confirm the difference. Next, we analyzed the embedding representation, obtained by averaging the word embeddings of the words that appear in a comment. The pre-trained word embeddings used in this paper are GloVe vectors in 4 different dimensions (50, 100, 200, 300)<ref type="foot" target="#foot_0">1</ref>, pre-trained on the Wikipedia data corpus. Although averaging is one of the most naive ways to use word vectors, it performs better than n-grams on most of the datasets. In general, higher-dimensional word embeddings achieve better performance, since higher-dimensional vectors in theory contain more information than lower-dimensional ones. However, in our experiments, some datasets show the opposite result. For example, on D5, the performance of the 300-dimension vectors slumped by 10% compared to the 50-dimension vectors; on D6, too, the abusive recall rate decreased as higher-dimension vectors were used. The selection of appropriate word embeddings appears to play an important role in a specific classification task. One reason for this counter-intuitive result is the difference between the language of the training corpus and the language of our posts. 
The GloVe word embeddings used in this paper are trained on the Wikipedia text corpus, which generally uses a formal language style. However, social media datasets typically contain casual expressions, meaning that the word context information captured in Wikipedia-trained word embeddings may differ from that in the abusive datasets. In future, we will attempt to use word embeddings pre-trained on a source corpus similar to the experimental dataset, to boost classifier accuracy.</p><p>FastText <ref type="bibr" target="#b15">[16]</ref> is a one-hidden-layer neural network text classifier. We analyzed this model in two ways: with and without pre-trained word embeddings. As shown in Table <ref type="table" target="#tab_1">2</ref>, the performance of FastText in abusive content detection is in fact worse than that of the other feature representations with an SVM classifier. However, FastText is an efficient classifier with a much faster execution time than the SVM. We attribute the classification results to the shallow structure of FastText. In addition, using pre-trained word embeddings in FastText does not guarantee better results, which again may be due to the data source used for the pre-trained word embeddings. At this stage, we have not found a single model that performs best across all 9 datasets. N-grams with syntactic &amp; semantic information achieve significantly better results than standard n-grams on 7 of our 9 datasets. Averaging word embeddings is the most straightforward approach to generating sentence vectors and achieves good results in our experiments. The FastText shallow neural network, by contrast, shows poor performance on the abusive detection task. We suggest that a more advanced structure needs to be designed for abusive detection in future work.</p></div>
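The averaged-embedding representation analyzed above can be sketched in a few lines. The tiny embedding table below is purely illustrative; real GloVe vectors would be loaded from the files referenced in the footnote.

```python
import numpy as np

# Toy 3-dimensional "embedding table" (illustrative values only).
embeddings = {
    "you":   np.array([0.1, 0.3, -0.2]),
    "are":   np.array([0.0, 0.2,  0.1]),
    "great": np.array([0.4, -0.1, 0.3]),
}
DIM = 3

def comment_vector(comment: str) -> np.ndarray:
    """Average the embeddings of the in-vocabulary words in a comment.
    Out-of-vocabulary words are dropped; an all-OOV comment falls back
    to the zero vector."""
    vecs = [embeddings[w] for w in comment.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

v = comment_vector("You are great")
```

Dropping out-of-vocabulary tokens is one concrete way the mismatch between a formal pre-training corpus and casual social media text hurts this representation: slang and misspellings simply vanish from the comment vector.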
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions &amp; Future Work</head><p>The purposes of this paper were twofold: (1) to investigate how off-the-shelf deep neural networks have been used across the two tasks within text classification, feature representation and the classification task itself, and (2) to run preliminary experiments in abusive detection across 9 different social media datasets. We highlight the following aspects of our work. Firstly, we systematically summarized current neural models and categorized them into two modes: unsupervised approaches for generating distributed feature representations, and supervised approaches for classification. Secondly, we compared the classification performance of traditional feature representations to word embedding features. Simply averaging pre-trained word embeddings yields better results than the n-grams feature representation on most datasets. In addition, we employed a recent neural network classifier, FastText. Due to its shallow architecture, its performance is unexceptional. Using pre-trained word embeddings cannot guarantee better results, possibly because the characteristics of the corpus used for pre-training the embeddings differ from those of our datasets. We will validate this assumption in subsequent experiments.</p><p>The ultimate goal of our research is to develop a powerful classification model that can assist social media moderators in detecting abusive comments efficiently and effectively. The path to this goal can be divided into two directions: designing appropriate features for abusive comments, and designing an outstanding model for detection. In future work, we will pursue both directions by exploring deep neural networks in unsupervised mode and supervised mode.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 
2: The architecture of Paragraph2vec proposed by Mikolov et al.<ref type="bibr" target="#b19">[20]</ref> </figDesc><graphic coords="4,207.48,115.83,200.40,107.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 3 :</head><label>3</label><figDesc>Fig.3: The architecture of FastText proposed by Joulin et al.<ref type="bibr" target="#b15">[16]</ref> </figDesc><graphic coords="4,228.30,371.80,158.76,81.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 4 :</head><label>4</label><figDesc>Fig.4: The CNN model proposed by Kim<ref type="bibr" target="#b16">[17]</ref> </figDesc><graphic coords="5,164.22,185.20,286.92,117.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 5 :</head><label>5</label><figDesc>Fig.5: The RNN structure by Young et al.<ref type="bibr" target="#b32">[33]</ref> </figDesc><graphic coords="6,206.88,175.03,201.60,79.44" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 6 :</head><label>6</label><figDesc>Fig. 6: The structures of GRU(Left) and LSTM(Right) by Chung et al.[7]</figDesc><graphic coords="6,191.33,476.57,232.70,75.89" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Datasets Summary Statistics, D1-D8<ref type="bibr" target="#b2">[3]</ref>, D9<ref type="bibr" target="#b3">[4]</ref> </figDesc><table><row><cell>Dataset with</cell><cell>Dataset</cell><cell>#of</cell><cell>Avg Len</cell><cell>Class Dist.</cell></row><row><cell>Reference</cell><cell>Style</cell><cell>Instances</cell><cell>(words)</cell><cell>(Pos./Neg.)%</cell></row><row><cell>D1</cell><cell>Micro-Blog</cell><cell>3110</cell><cell>15</cell><cell>42/58</cell></row><row><cell>D2</cell><cell>Video-Sharing</cell><cell>3466</cell><cell>211</cell><cell>12/88</cell></row><row><cell>D3</cell><cell>Forum</cell><cell>1710</cell><cell>337</cell><cell>23/77</cell></row><row><cell>D4</cell><cell>Q&amp;A</cell><cell>13153</cell><cell>26</cell><cell>6/94</cell></row><row><cell>D5</cell><cell>Chat</cell><cell>4802</cell><cell>5</cell><cell>1/99</cell></row><row><cell>D6</cell><cell>Forum</cell><cell>4303</cell><cell>94</cell><cell>1/99</cell></row><row><cell>D7</cell><cell>Forum</cell><cell>1946</cell><cell>56</cell><cell>3/97</cell></row><row><cell>D8</cell><cell>Micro-Blog</cell><cell>1340</cell><cell>13</cell><cell>13/87</cell></row><row><cell>D9</cell><cell>News Discussion</cell><cell>2000</cell><cell>59</cell><cell>21/79</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Results of Abusive Detection across 9 Datasets. Results are shown as percentage recall: abusive recall is outside the brackets, average recall is inside the brackets. The highest abusive recall is displayed in bold.</figDesc><table><row><cell></cell><cell>D1</cell><cell>D2</cell><cell>D3</cell><cell>D4</cell><cell>D5</cell><cell>D6</cell><cell>D7</cell><cell>D8</cell><cell>D9</cell></row><row><cell cols="10">Ngrams (1-4 Word) 70(75) 35(62) 91(93) 62(77) 58(78) 12(56) 18(58) 65(78) 33(60)</cell></row><row><cell cols="10">Ngrams (2-4 Char) 75(77) 33(62) 89(93) 66(80) 57(78) 18(59) 18(58) 70(83) 35(62)</cell></row><row><cell>Ngrams+Feat.</cell><cell cols="9">75(77) 40(64) 89(93) 67(80) 57(78) 18(59) 22(60) 73(85) 40(64)</cell></row><row><cell cols="10">Avg. of Glove50 62(68) 13(53) 56(70) 49(68) 60(76) 61(74) 50(68) 75(82) 36(61)</cell></row><row><cell cols="10">Avg. of Glove100 65(71) 30(59) 66(76) 59(74) 58(76) 51(71) 48(68) 77(85) 48(66)</cell></row><row><cell cols="10">Avg. of Glove200 66(72) 31(60) 78(82) 62(76) 60(78) 44(69) 48(69) 78(85) 48(66)</cell></row><row><cell cols="10">Avg. of Glove300 68(72) 33(60) 81(85) 65(77) 54(76) 43(69) 50(70) 74(83) 50(68)</cell></row><row><cell>fastText</cell><cell cols="9">72(75) 42(61) 70(62) 65(76) 58(79) 31(63) 23(58) 47(71) 30(56)</cell></row><row><cell cols="10">fastText+Glove100 73(76) 37(62) 60(76) 46(71) 58(79) 21(60) 21(60) 62(79) 41(65)</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://nlp.stanford.edu/projects/glove/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Modelling context with user embeddings for sarcasm detection in social media</title>
		<author>
			<persName><forename type="first">S</forename><surname>Amir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lyu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Carvalho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Silva</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.00976</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Deep learning for hate speech detection in tweets</title>
		<author>
			<persName><forename type="first">P</forename><surname>Badjatiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Varma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th International Conference on World Wide Web Companion</title>
				<meeting>the 26th International Conference on World Wide Web Companion</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="759" to="760" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Harnessing the power of text mining for the detection of abusive content in social media</title>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mckeever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Delany</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Computational Intelligence Systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="187" to="205" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Presenting a labelled dataset for real-time detection of abusive user posts</title>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mckeever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Delany</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Web Intelligence</title>
				<meeting>the International Conference on Web Intelligence</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="884" to="890" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Detecting offensive language in social media to protect adolescent online safety</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="71" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Learning phrase representations using rnn encoder-decoder for statistical machine translation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Merriënboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gulcehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bougares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1406.1078</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Empirical evaluation of gated recurrent neural networks on sequence modeling</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gulcehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.3555</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Improved cyberbullying detection using gender information</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dadvar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>De Jong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ordelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Trieschnigg</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Improving cyberbullying detection with user context</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dadvar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Trieschnigg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ordelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>De Jong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECIR</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="693" to="696" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Hate me, hate me not: Hate speech detection on facebook</title>
		<author>
			<persName><forename type="first">F</forename><surname>Del Vigna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cimino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'Orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Petrocchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tesconi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Hate speech detection with comment embeddings</title>
		<author>
			<persName><forename type="first">N</forename><surname>Djuric</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Morris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Grbovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Radosavljevic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bhamidipati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th International Conference on World Wide Web</title>
				<meeting>the 24th International Conference on World Wide Web</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="29" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Using convolutional neural networks to classify hatespeech</title>
		<author>
			<persName><forename type="first">B</forename><surname>Gambäck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><forename type="middle">K</forename><surname>Sikdar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Abusive Language Online</title>
				<meeting>the First Workshop on Abusive Language Online</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="85" to="90" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Detecting online hate speech using context aware models</title>
		<author>
			<persName><forename type="first">L</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Huang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Learning distributed representations of sentences from unlabelled data</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korhonen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1602.03483</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Text categorization with support vector machines: Learning with many relevant features</title>
		<author>
			<persName><forename type="first">T</forename><surname>Joachims</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine learning</title>
				<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">98</biblScope>
			<biblScope unit="page" from="137" to="142" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Bag of tricks for efficient text classification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.01759</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Convolutional neural networks for sentence classification</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kim</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1408.5882</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Skip-thought vectors</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kiros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zemel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Urtasun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torralba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Fidler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="3294" to="3302" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Imagenet classification with deep convolutional neural networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="1097" to="1105" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Distributed representations of sentences and documents</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st International Conference on Machine Learning (ICML-14)</title>
				<meeting>the 31st International Conference on Machine Learning (ICML-14)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1188" to="1196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Do characters abuse more than words?</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Mehdad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Tetreault</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGDIAL Conference</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="299" to="303" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Abusive language detection in online user content</title>
		<author>
			<persName><forename type="first">C</forename><surname>Nobata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mehdad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on World Wide Web</title>
				<meeting>the 25th International Conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="145" to="153" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">One-step and two-step classification for abusive language detection on twitter</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fung</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.01206</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Deep learning for user comment moderation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pavlopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Malakasiotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Androutsopoulos</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1705.09993</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Glove: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Context-sensitive twitter sentiment classification using neural network</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ji</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="215" to="221" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Improved semantic representations from tree-structured long short-term memory networks</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Tai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1503.00075</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Document modeling with gated recurrent neural network for sentiment classification</title>
		<author>
			<persName><forename type="first">D</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EMNLP</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1422" to="1432" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">&quot;Liar, liar pants on fire&quot;: A new benchmark dataset for fake news detection</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1705.00648</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Predicting polarities of tweets by composing word embeddings with long short-term memory</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACL</title>
		<imprint>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1343" to="1353" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Variational autoencoder for semi-supervised text classification</title>
		<author>
			<persName><forename type="first">W</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="3358" to="3364" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">Recent trends in deep learning based natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Young</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hazarika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Poria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cambria</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1708.02709</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">Text classification improved by integrating bidirectional lstm with two-dimensional max pooling</title>
		<author>
			<persName><forename type="first">P</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1611.06639</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
