In Proceedings of the First Italian Conference on Cybersecurity (ITASEC17), Venice, Italy.
      Copyright © 2017 for this paper by its authors. Copying permitted for private and academic purposes.


                            Hate me, hate me not:
                      Hate speech detection on Facebook
               Fabio Del Vigna¹,², Andrea Cimino²,³, Felice Dell’Orletta³,
                     Marinella Petrocchi¹, and Maurizio Tesconi¹
                  ¹ Istituto di Informatica e Telematica, CNR, Pisa, Italy.
                         {f.delvigna, m.petrocchi, m.tesconi}@iit.cnr.it
                              ² University of Pisa, Pisa, Italy
                 ³ Istituto di Linguistica Computazionale, CNR, Pisa, Italy
                          {andrea.cimino, felice.dellorletta}@ilc.cnr.it

                                                  Abstract
    While favouring communication and easing information sharing, Social Network Sites are also
used to launch harmful campaigns against specific groups and individuals. Cyberbullying, incitement
to self-harm practices, and sexual predation are just some of the severe effects of massive online
offensives. Moreover, attacks can be carried out against groups of victims and can degenerate into
physical violence. In this work, we aim at containing and preventing the alarming diffusion of such
hate campaigns. Using Facebook as a benchmark, we consider the textual content of comments that
appeared on a set of public Italian pages. We first propose a variety of hate categories to
distinguish the kind of hate. Crawled comments are then annotated by up to five distinct human
annotators, according to the defined taxonomy. Leveraging morpho-syntactic features, sentiment
polarity and word embedding lexicons, we design and implement two classifiers for the Italian
language, based on different learning algorithms: the first based on Support Vector Machines (SVM)
and the second on a particular Recurrent Neural Network named Long Short-Term Memory (LSTM). We
test these two learning algorithms in order to verify their classification performance on the task
of hate speech recognition. The results show the effectiveness of the two classification approaches
tested over the first manually annotated Italian Hate Speech Corpus of social media text.


1     Introduction
Social Network Sites (SNSs) are an ideal place for Internet users to keep in touch, share
information about their daily activities and interests, and publish and access documents, photos
and videos. SNSs like Facebook, Twitter, Ask.fm and Google+ let users create profiles, keep a list
of peers to interact with, and post and read what others have posted. It comes as no surprise
that, overall, SNSs - together with search engines - are among the most visited websites¹.
     Unfortunately, SNSs are also an ideal place for the proliferation of harmful information.
Cyberbullying, sexual predation [25], and incitement to self-harm practices [6] are some of the
actual results of the dissemination of malicious information on SNSs. Many of these attacks are
carried out by a single individual, but they can also be managed by groups. The targets of the
trolls are often selected victims but, in some circumstances, the hate can be directed towards wide
groups of individuals, discriminated against for some feature, like race or gender. Such campaigns
may involve a very large number of haters who get excited by hateful discussions, and such hate
might end up in physical violence or violent actions.

    1 http://www.alexa.com/topsites - All websites were last accessed on October 23, 2016.


    Work in [21] characterises the attacker and provides a definition of trolls, i.e., online users
who pretend to sincerely strive to be part of an online community, but whose real intentions are
to cause disruption and exacerbate conflict, for the purposes of their own amusement. Thus,
sexists, religious fanatics, and political extremists massively use SNSs to foster hate against
specific individuals/organizations, causing a sounding board effect, which may critically damage
the targets of the hate campaign through both psychological and physical violence. Although
more experienced users may be able to face threats and trolls, the great majority of users
cannot easily bear the attacks, especially minors and those who are exposed through the media
to public judgment. Media frequently report evidence of the consequences (unfortunately extreme
in some cases) that naïve and emotional users have faced².
    This work aims at containing and preventing the alarming diffusion of massive online hate
campaigns on SNSs, and it focuses on Italian texts. The issue has been tackled in the past
with different approaches, lying somewhere between pre-emption and mending. One approach aims
at moderating chat conversations through ad hoc filters, as in [38], by semantically detecting
offensive content and removing it. A second approach operates on published content and tries to
remove the offending material, often basing the analysis on multiple messages, as in [3, 7, 36].
    Contributions. Our aim is not to censor online content: we mainly address its classification
for the Italian language, in order to pinpoint anomalous waves of hate and disgust. Using Facebook
as a benchmark, we classify the content of comments that appeared on a set of public pages. We
contribute along the following dimensions:
    • we design and develop the first hate speech classifier for the Italian language and we
      compare two different approaches based on state-of-the-art learning algorithms for sentiment
      analysis tasks;
    • starting from the classification of single comments on a Facebook page, the results presented
      in this paper are a prelude to the detection of violent discussions as a whole. The ultimate
      goal is to promptly detect waves of hate, in which several users may take part, as
      unfortunately happened recently on Facebook pages³,⁴;
    • we introduce a taxonomy of hate categories, expanding the classes proposed in [19] and
      specifically considering the subject of the hate, e.g., hate for religious, racial, or
      socio-economic reasons; while not directly employed here in the classification, the
      definition of such a taxonomy is a step towards more refined classification tasks.
    The next section introduces the corpus for hate speech detection. Section 3 presents our
classification techniques and reports their performance results. In Section 4, we discuss related
work on detecting textual aggressions on social media. Finally, Section 5 concludes the paper.


2     Hate Speech Corpus
This section reports on the retrieval and annotation phase of our hate speech Italian corpus.


    2 http://osservatorio-cyberbullismo.blogautore.repubblica.it/ (La Repubblica - Italian newspaper online edition)
    3 https://www.facebook.com/Black-block-430370993643807/ (Facebook page)
    4 https://goo.gl/jYJPoZ (Il Tempo - Italian Newspaper online edition)


2.1    Data crawling
Aiming at monitoring the “hate level” across Facebook, we have built a corpus of comments
retrieved from the Facebook public pages of Italian newspapers, politicians, artists, and groups.
These pages typically host discussions spanning a variety of topics.
     We have developed a versatile Facebook crawler, which exploits the Graph API⁵ to retrieve
the content of the comments to Facebook posts. The crawler is built on the Laravel framework,
which supports a wide set of features, such as code reuse, different storage strategies and
parallel processing. Implemented as a Web service, it can be controlled through a Web interface
or using a cURL command⁶. The tool requires a set of registered application keys and some target
pages to crawl. It can store data in the filesystem as JSON⁷ files, in Kafka⁸ queues, or in
Elasticsearch⁹ indexes. Depending on the number of application keys provided, it can crawl
multiple pages in parallel. Starting from the most recent post, the crawler collects all the
information related to the posts, down to comments on comments. For the sake of simplicity,
however, in this work we have limited our analysis to direct comments to the posts.

2.2    Data annotation
The crawler was used to collect comments from a series of web pages and groups, chosen
because we suspected they might contain hate content.

       Title of Facebook page          Annotated posts      Comments        Annotations
       salviniofficial                              19          5404              15298
       matteorenziufficiale                          2           158                584
       lazanzarar24                                 10           307               1253
       jenusdinazareth                               2           132                460
       sinistracazzateliberta2                       7            79                234
       ilfattoquotidiano                            11           126                135
       emosocazzi                                    4            73                 75
       noiconsalviniufficiale                       14           223                270

                            Table 1: Dataset description and annotations

    Overall, we collected 17,567 Facebook comments from 99 posts crawled from the selected
pages: 6,502 of them (spanning 66 posts) have been annotated at least once, and each received at
most 5 annotations from distinct human annotators. We asked 5 bachelor students to annotate
comments, and the majority of comments received more than one annotation. The students annotated
5,742, 3,870, 4,587, 2,104 and 2,006 comments, respectively. In particular, among the annotated
comments, 3,685 received at least 3 annotations. On average, each annotator annotated about
3,662 comments.
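The figures above can be cross-checked against Table 1 with a few lines of arithmetic. The sketch below only re-adds numbers already reported in the text and in the table; the variable names are ours.

```python
# Comments annotated by each of the five students, as reported in the text.
annotations_per_student = [5742, 3870, 4587, 2104, 2006]

# (annotated comments, annotations) per Facebook page, copied from Table 1.
table1 = {
    "salviniofficial": (5404, 15298),
    "matteorenziufficiale": (158, 584),
    "lazanzarar24": (307, 1253),
    "jenusdinazareth": (132, 460),
    "sinistracazzateliberta2": (79, 234),
    "ilfattoquotidiano": (126, 135),
    "emosocazzi": (73, 75),
    "noiconsalviniufficiale": (223, 270),
}

total_annotations = sum(annotations_per_student)
annotated_comments = sum(comments for comments, _ in table1.values())
table_annotations = sum(annotations for _, annotations in table1.values())

mean_per_student = total_annotations / len(annotations_per_student)
mean_per_comment = total_annotations / annotated_comments

print(total_annotations, table_annotations)  # 18309 18309
print(annotated_comments)                    # 6502
print(round(mean_per_student))               # 3662
print(round(mean_per_comment, 2))            # 2.82
```

The per-student counts and the Annotations column of Table 1 give the same total, and the Comments column adds up to the 6,502 annotated comments mentioned above.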
    The annotators were asked to assign one class to each comment, where classes span the
following levels of hate: No hate, Weak hate, and Strong hate.
    We then divided hate messages into distinct categories: Religion, Physical and/or mental
handicap, Socio-economic status, Politics, Race, Sex and Gender issues, and Other.
   5 https://developers.facebook.com/docs/graph-api
   6 https://curl.haxx.se
   7 http://json.org
   8 http://kafka.apache.org
   9 https://www.elastic.co/products/elasticsearch


   Given that the majority of comments have been annotated by more than one annotator, we
have also computed Fleiss’ kappa (κ), an inter-annotator agreement metric [20] that measures
the level of agreement of different annotators on a task. The level of agreement among
annotators conveys the difficulty of a task. In our case, considering the 1,687 comments
that received annotations from all 5 annotators, we obtain κ = 0.19 when discriminating
among three hate classes, and κ = 0.26 over two classes (where Strong hate and Weak hate have
been merged). Such low κ values indicate that the annotation task was really hard for
our students.
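For readers unfamiliar with the metric, the following is a minimal sketch of how Fleiss' kappa can be computed from per-comment class counts, and how the three-class counts can be merged into two classes. It follows the standard textbook definition; the toy data, the function names and the column order are illustrative assumptions and are not taken from the corpus.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """counts[i][j] = number of annotators who assigned item i to class j.

    Every item must have received the same number of annotations.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of annotations")
    n_classes = len(counts[0])

    # Share of all annotations that fall into each class.
    p_class = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_classes)
    ]
    # Observed agreement on each item (pairs of annotators that agree).
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_class)
    if p_expected == 1.0:
        return float("nan")  # all annotations in one class: kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


def merge_hate(counts: List[List[int]]) -> List[List[int]]:
    """Assumed column order [No hate, Weak hate, Strong hate] -> [No hate, Hate]."""
    return [[row[0], row[1] + row[2]] for row in counts]


# Toy data: six comments, five annotators each (made up for illustration).
toy = [
    [5, 0, 0],
    [3, 2, 0],
    [0, 3, 2],
    [1, 2, 2],
    [4, 1, 0],
    [0, 1, 4],
]
print("three classes:", fleiss_kappa(toy))
print("two classes:  ", fleiss_kappa(merge_hate(toy)))
```

Merging Weak hate and Strong hate removes the disagreements between those two classes, which is why the two-class κ reported above is higher than the three-class one.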


3       Text Classification
This section describes the classification approaches and reports their results. On the annotated
dataset, we compute a series of features, described in detail in the following. A series of lexicons
used to derive part of the features is described in Section 3.1.1. Each comment in our dataset is
then represented as a vector of features, given as input to the classifier along with the result
of the annotation. In the training phase, the classifier learns to classify a comment according
to the values of its features and the annotation result. In the test phase, the classifier
tags comments as expressing hatred or not, according to the learned model.

3.1      The classifier
We tested two different classifiers based on different learning algorithms: the first one based on
Support Vector Machines (SVM) and the second one on a particular Recurrent Neural Network
named Long Short-Term Memory (LSTM). While SVM is an extremely strong performer that is hard
to surpass, this type of algorithm captures “sparse” and “discrete” features in document
classification tasks. This makes the detection of relations within sentences really hard,
although such relations are often the key factor in detecting the overall sentiment polarity of a
document [33]. LSTM networks, on the contrary, are a specialization of Recurrent Neural Networks
(RNN) and are able to capture long-term dependencies in a sentence. This type of neural
network has recently been tested on sentiment analysis tasks [33, 37], outperforming other
approaches [29], with an improvement of as much as 3-4 points with respect to commonly
used learning algorithms. In this work, we use the Keras [8] deep learning framework and
LIBSVM [5] to generate, respectively, the LSTM and the SVM statistical models. Since our
approach relies on morpho-syntactically tagged texts, the hate speech corpus was automatically
tagged by the Part-Of-Speech tagger described in [14].

3.1.1     Lexical resources
To improve the overall accuracy of our system, we used both sentiment polarity and word
embedding lexicons. Sentiment polarity lexicons¹⁰ have already been successfully tested for the
classification of positive, negative and neutral sentiment of Italian social media posts [2]. We
used the ones described in [9], which include a manually created lexicon for Italian [35], two
automatically translated sentiment polarity lexicons originally created for English [23, 35], an
automatically created Twitter sentiment polarity lexicon and two word similarity lexicons
automatically created using word2vec¹¹ [28], starting from two Italian corpora: (i) PAISÀ [26], a


    10 We downloaded the lexicons from www.italianlp.it
    11 http://code.google.com/p/word2vec/


large corpus of authentic contemporary Italian texts; and (ii) a lemmatized corpus of 1,200,000
tweets, automatically collected.
    In addition to these resources, we created two Word Embedding lexicons, to overcome the
issue that lexical information in a short text can be very sparse. For this purpose, we trained
two predict models using word2vec. These models learn lower-dimensional word embeddings.
Embeddings are represented by a set of latent (hidden) variables, and each word is a
multidimensional vector that represents a specific instantiation of these variables. The first
lexicon was built using a tokenized version of the itWaC corpus¹². The itWaC corpus is a 2 billion
word corpus constructed from the Web, limiting the crawl to the .it domain and using
medium-frequency words from the La Repubblica corpus and basic Italian vocabulary lists as seeds.
The second lexicon was built from a tokenized corpus of tweets. This corpus was collected using
the Twitter APIs and is made up of 10,700,781 Italian tweets.

3.1.2   The SVM classifier
The SVM classifier exploits a wide set of features, ranging across different levels of linguistic
description. With the exception of the word embedding combination, these features have already
been used in sentiment polarity classification tasks [9], where they proved effective. The
features are organised into three main categories: raw and lexical text features, morpho-syntactic
and syntactic features, and lexicon features.

Raw and Lexical Text Features. Number of tokens: number of tokens occurring in the
analyzed text. Character n-grams: presence or absence of contiguous sequences of characters
in the analyzed text. Word n-grams: presence or absence of contiguous sequences of tokens
in the analyzed text. Lemma n-grams: presence or absence of contiguous sequences of lemmas
occurring in the analyzed text. Repetition of character n-grams: presence or absence of contiguous
repetitions of characters in the analyzed text. Punctuation: checks whether the analyzed text
ends with one of the following punctuation characters: “?”, “!”.
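As an illustration of the raw and lexical features, here is a minimal sketch that turns a comment into a sparse presence/absence feature dictionary. It rests on several assumptions of ours: a whitespace tokenizer instead of the tokenizer used in the paper, character trigrams only, word n-grams up to length 2, and "repetition" read as the same character occurring three or more times in a row. Lemma n-grams are omitted because they require a lemmatizer. The example sentence is made up.

```python
import re
from typing import Dict, List, Set


def char_ngrams(text: str, n: int) -> Set[str]:
    """All contiguous character sequences of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def token_ngrams(tokens: List[str], n: int) -> Set[str]:
    """All contiguous token sequences of length n, joined by a space."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def raw_lexical_features(text: str, max_n: int = 2) -> Dict[str, int]:
    tokens = text.split()  # simplification: whitespace tokenization
    lowered = [t.lower() for t in tokens]
    feats: Dict[str, int] = {"n_tokens": len(tokens)}

    # Word n-grams: 1 marks presence, absent n-grams are simply not stored.
    for n in range(1, max_n + 1):
        for gram in token_ngrams(lowered, n):
            feats[f"word_{n}gram={gram}"] = 1

    # Character trigrams, spaces included.
    for gram in char_ngrams(text.lower(), 3):
        feats[f"char_3gram={gram}"] = 1

    # Character repetition: the same character three or more times in a row.
    feats["char_repetition"] = int(re.search(r"(.)\1{2,}", text) is not None)

    # Punctuation: does the text end with "?" or "!"?
    feats["ends_with_?_or_!"] = int(text.rstrip().endswith(("?", "!")))
    return feats


feats = raw_lexical_features("Ma bastaaa con queste bugie!")
print(feats["n_tokens"], feats["char_repetition"], feats["ends_with_?_or_!"])  # 5 1 1
```

A sparse dictionary of this kind can then be mapped to the index:value format expected by SVM toolkits.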

Morpho-syntactic and Syntactic Features. Coarse-grained Part-Of-Speech n-grams:
presence or absence of contiguous sequences of coarse-grained PoS, corresponding to the main
grammatical categories (noun, verb, adjective). Coarse-grained Part-Of-Speech distribution: the
distribution of nouns, adjectives, adverbs, numbers in the text. Fine-grained Part-Of-Speech
n-grams: presence or absence of contiguous sequences of fine-grained PoS, which represent
subdivisions of the coarse-grained tags (e.g., the class of nouns is subdivided into proper
vs common nouns, verbs into main verbs, gerund forms, past participles). Dependency types
n-grams: presence or absence of sequences of dependency types in the analyzed text. The
dependencies are calculated with respect to i) the hierarchical parse tree structure and ii) the
surface linear ordering of words. Lexical Dependency n-grams: presence or absence of sequences
of lemmas calculated with respect to the hierarchical parse tree. Lexical Dependency Triplet
n-grams: distribution of lexical dependency triplets, where a triplet represents a dependency
relation as (ld, lh, t), where ld is the lemma of the dependent, lh is the lemma of the syntactic
head and t is the relation type linking the two. Coarse-Grained Part-Of-Speech Dependency
n-grams: presence or absence of sequences of coarse-grained Part-of-Speech, calculated with
respect to the hierarchical parse tree. Coarse-Grained Part-Of-Speech Dependency Triplet
n-grams: distribution of coarse-grained Part-of-Speech dependency triplets, where a triplet
represents a dependency relation as (cd, ch, t), where cd is the coarse-grained Part-of-Speech of the
  12 http://wacky.sslmit.unibo.it/doku.php?id=corpora


dependent, ch is the coarse-grained Part-of-Speech of the syntactic head and t is the relation
type linking the two.
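The two triplet features can be sketched as follows, starting from an already parsed sentence. The `Token` structure, the PoS tags and the relation labels are hypothetical placeholders; in the paper the parse comes from the linguistic annotation pipeline, which is not reproduced here.

```python
from collections import Counter
from typing import List, NamedTuple


class Token(NamedTuple):
    lemma: str
    cpos: str    # coarse-grained Part-of-Speech
    head: int    # index of the syntactic head in the sentence, -1 for the root
    deprel: str  # dependency relation linking the token to its head


def lexical_triplets(sentence: List[Token]) -> Counter:
    """Counts of (lemma of dependent, lemma of head, relation type)."""
    return Counter(
        (tok.lemma, sentence[tok.head].lemma, tok.deprel)
        for tok in sentence
        if tok.head >= 0
    )


def pos_triplets(sentence: List[Token]) -> Counter:
    """Counts of (coarse PoS of dependent, coarse PoS of head, relation type)."""
    return Counter(
        (tok.cpos, sentence[tok.head].cpos, tok.deprel)
        for tok in sentence
        if tok.head >= 0
    )


# Hand-written toy parse of "il gatto dorme" (the cat sleeps).
sentence = [
    Token("il", "DET", 1, "det"),
    Token("gatto", "NOUN", 2, "subj"),
    Token("dormire", "VERB", -1, "root"),
]
print(lexical_triplets(sentence))
print(pos_triplets(sentence))
```

The root token has no head and therefore contributes no triplet.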

Lexicon features. Lemma sentiment polarity n-grams: for each n-gram of lemmas extracted
from the analyzed text, the feature checks the polarity of each component lemma in the existing
sentiment polarity lexicons. Lemmas that are not present are marked with the ABSENT tag.
This is for example the case of the trigram “tutto molto bello” (all very nice), which is marked
as “ABSENT-POS-POS” because molto and bello are marked as positive in the considered
polarity lexicon and tutto is absent. The feature is computed for each existing sentiment polarity
lexicon. Emoticons: presence or absence of positive or negative emoticons in the analyzed text.
The lexicon of emoticons was extracted from http://it.wikipedia.org/wiki/Emoticon and
manually classified. Polarity modifier: for each lemma in the text occurring in the sentiment
polarity lexicons, the feature checks the presence of adjectives or adverbs in a left context
window of size two. If this is the case, the polarity of the lemma is assigned to the modifier. This
is for example the case of the bigram “non interessante” (not interesting), where “interessante”
is a positive word, and “non” is an adverb. Accordingly, the feature “non POS” is created. The
feature is computed 3 times, checking all the existing sentiment polarity lexicons. PMI score: for
each set of unigrams, bigrams, trigrams, four-grams and five-grams that occur in the analyzed
text, the feature computes the score given by
$\sum_{i\text{-gram} \in \text{text}} \mathrm{score}(i\text{-gram})$ and it returns the
minimum and the maximum values of the five values (approximated to the nearest integer).
Distribution of sentiment polarity: this feature computes the percentage of positive, negative
and neutral lemmas that occur in the text. To overcome the sparsity problem, the percentages
are rounded to the nearest multiple of 5. The feature is computed for each existing lexicon.
Most frequent sentiment polarity: the feature returns the most frequent sentiment polarity of
the lemmas in the analyzed text. The feature is computed for each existing lexicon. Sentiment
polarity in text sections: the feature first splits the text into three equal sections. For each
section, the most frequent polarity is computed using the available sentiment polarity lexicons.
The purpose of this feature is to identify changes of polarity within the same text. Word
embeddings combination: the feature returns the vectors obtained by computing separately the
average of the word embeddings of the nouns, adjectives and verbs of the text. It has been
computed once for each word embedding lexicon.
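Three of the lexicon features can be sketched as follows. The lexicon below is a toy stand-in for the resources of Section 3.1.1, and two choices are assumptions of ours: percentages are computed over the lemmas found in the lexicon, and ties at the midpoint are rounded up.

```python
from collections import Counter
from typing import Dict, List, Optional

# Toy polarity lexicon (hypothetical entries, for illustration only).
LEXICON: Dict[str, str] = {
    "molto": "POS",
    "bello": "POS",
    "interessante": "POS",
    "brutto": "NEG",
}


def polarity_ngrams(lemmas: List[str], n: int) -> List[str]:
    """Replace each lemma with its polarity tag (ABSENT if unknown), then take n-grams."""
    tags = [LEXICON.get(lemma, "ABSENT") for lemma in lemmas]
    return ["-".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def round_to_multiple_of_5(percentage: float) -> int:
    """Nearest multiple of 5, with midpoints rounded up."""
    return 5 * int(percentage / 5 + 0.5)


def polarity_distribution(lemmas: List[str]) -> Dict[str, int]:
    """Rounded percentage of positive, negative and neutral lemmas."""
    found = [LEXICON[lemma] for lemma in lemmas if lemma in LEXICON]
    counts = Counter(found)
    total = len(found)
    return {
        tag: round_to_multiple_of_5(100 * counts[tag] / total) if total else 0
        for tag in ("POS", "NEG", "NEU")
    }


def most_frequent_polarity(lemmas: List[str]) -> Optional[str]:
    """Most frequent polarity among the lemmas found in the lexicon, if any."""
    found = [LEXICON[lemma] for lemma in lemmas if lemma in LEXICON]
    return Counter(found).most_common(1)[0][0] if found else None


print(polarity_ngrams(["tutto", "molto", "bello"], 3))  # ['ABSENT-POS-POS']
print(polarity_distribution(["tutto", "molto", "bello", "brutto"]))
print(most_frequent_polarity(["tutto", "molto", "bello", "brutto"]))  # POS
```

Rounding to multiples of 5 maps nearby percentages (for instance 66.7% and 64%) to the same feature value, which reduces sparsity.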

3.1.3   The LSTM classifier
The LSTM unit was initially proposed by Hochreiter and Schmidhuber [22]. LSTM units are
able to propagate an important feature that came early in the input sequence over a long
distance, thus capturing potential long-distance dependencies. LSTM is a state-of-the-art
performer for semantic composition and makes it possible to compute the representation of a
document from the representation of its words, with multiple abstraction levels. Each word is
represented by a low-dimensional, continuous and real-valued vector.
    We employed a bidirectional LSTM architecture, since it captures long-range dependencies
from both directions of a document by constructing bidirectional links in the network [31].
In addition, we applied a dropout factor to both the input gates and the recurrent connections
in order to prevent overfitting, which is a typical issue of neural networks [17].
As suggested in [17], we chose a dropout factor in the optimum range [0.3, 0.5], more
specifically 0.45 for this work. Concerning the optimization process, categorical cross-entropy
is used as the loss function and optimization is performed by the rmsprop optimizer [34].
    To train the LSTM architecture, each input word in the text is represented by a
262-dimensional vector, which is composed of:


    • Word embeddings: the concatenation of the two word embeddings extracted from the two
      available Word Embedding lexicons (128 components for each word embedding, thus resulting
      in a total of 256 components); for each word embedding, an extra component was added in
      order to handle the “unknown word” case (2 further components).
    • Word polarity: the corresponding word polarity obtained by exploiting the Sentiment
      Polarity lexicons. This feature adds 3 extra components to the resulting vector, one for
      each possible outcome in the lexicons (negative, neutral, positive). We assumed that a word
      not found in the lexicons has a neutral polarity.
    • End of Sentence: a component indicating whether or not the sentence has been completely
      read.
The components thus add up to 256 + 2 + 3 + 1 = 262.
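The layout of the 262-dimensional input vector can be sketched in plain Python. The ordering of the components, the use of an all-zero embedding for unknown words and the toy lexicons are assumptions made for illustration; the text only fixes the number of components of each kind.

```python
from typing import Dict, List

EMB_DIM = 128  # components of each word embedding, as stated in the text


def embed(word: str, lexicon: Dict[str, List[float]]) -> List[float]:
    """128 embedding values plus one flag set to 1.0 for unknown words."""
    vector = lexicon.get(word)
    if vector is None:
        return [0.0] * EMB_DIM + [1.0]  # assumption: zeros for unknown words
    return list(vector) + [0.0]


def polarity_one_hot(word: str, polarity: Dict[str, str]) -> List[float]:
    """Three components: negative, neutral, positive."""
    label = polarity.get(word, "neutral")  # words not in the lexicons are neutral
    return [1.0 if label == name else 0.0 for name in ("negative", "neutral", "positive")]


def word_vector(
    word: str,
    is_last: bool,
    itwac: Dict[str, List[float]],
    tweets: Dict[str, List[float]],
    polarity: Dict[str, str],
) -> List[float]:
    return (
        embed(word, itwac)                   # 128 + 1
        + embed(word, tweets)                # 128 + 1
        + polarity_one_hot(word, polarity)   # 3
        + [1.0 if is_last else 0.0]          # end-of-sentence flag
    )


# Toy lexicons: "bello" is known to the first embedding lexicon only.
itwac = {"bello": [0.1] * EMB_DIM}
tweets: Dict[str, List[float]] = {}
polarity = {"bello": "positive"}

v = word_vector("bello", True, itwac, tweets, polarity)
print(len(v))  # 262
```

A comment is then fed to the network as the sequence of these vectors, one per word.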


3.2    Experiments and Results
We conducted two different classification experiments: the first considering the three different
categories of hate (Strong hate, Weak hate and No hate), the second considering only two
categories, No hate and Hate, where the latter was obtained by merging the Strong hate and
Weak hate classes.
    For the experiments, we used only documents that were annotated by at least three different
annotators and for which a most-voted class exists. This process resulted in two datasets:
the three-class dataset, composed of 3,356 documents (2,816 No hate, 410 Weak hate and 130
Strong hate), and the two-class dataset, composed of 3,575 documents (2,789 No hate and 786
Hate). To balance the datasets, we selected a subset of the No hate texts, limited to twice the
number of documents in the Weak hate class in the three-class experiment and to twice the
number of documents in the Hate class in the two-class one. To evaluate the accuracy of the two
hate speech classifiers in the two experiments, we followed a 10-fold cross validation process:
each dataset was randomly split into ten different non-overlapping training and test sets. The
overall Accuracy, Precision, Recall and F-score for each class were calculated as the average of
these values over the ten test sets. Accuracy, Precision, Recall and F-score are evaluation
metrics employed in standard classification tasks. In our scenario: Accuracy measures how many
comments are correctly assigned to their classes; Precision measures how many comments, among
those classified as expressing hate, have been correctly identified; Recall expresses how many
of the comments that actually express hate have been correctly recognized: a low recall means
that many relevant comments are left unidentified; and F-score is the harmonic mean of Precision
and Recall.
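The evaluation protocol described above can be sketched as follows: a helper that produces ten non-overlapping test folds, and the per-class Precision, Recall and F-score computed from gold and predicted labels. Function names, the random seed and the toy labels are illustrative assumptions; the classifiers themselves are not reproduced.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; the k test sets do not overlap."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def precision_recall_f(gold: Sequence[str], pred: Sequence[str], label: str) -> Tuple[float, float, float]:
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score


# Toy test fold: 3 Hate and 5 No hate comments, with two classification errors.
gold = ["Hate", "Hate", "Hate", "No hate", "No hate", "No hate", "No hate", "No hate"]
pred = ["Hate", "Hate", "No hate", "Hate", "No hate", "No hate", "No hate", "No hate"]

print(accuracy(gold, pred))                    # 0.75
print(precision_recall_f(gold, pred, "Hate"))  # precision, recall and F-score all equal 2/3
```

In a full run, the metrics would be computed on each of the ten test folds and then averaged, as described in the text.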
    Table 2 reports the results for the three-class experiment. Neither SVM nor LSTM is
able to discriminate among the three classes, and this is particularly true for the Strong hate
one. These results may be due to the small number of Strong hate documents (the class
with the fewest documents) and to the low level of annotator agreement. These results
led us to conduct the two-class experiment, whose results are in Table 3. As we expected,
the results are much higher than those in the previous experiment. This is probably due
to the higher number of Hate documents with respect to the Strong and Weak classes and to the
higher annotator agreement with respect to the three-class experiment.
    To evaluate the impact of annotator agreement on classification performance, we
performed a last experiment, where we selected the documents for which at least 70% of the
annotators were in agreement (321 Hate and 642 No hate documents). As Table 3 shows,
higher agreement yields higher accuracy for both classification algorithms. This
improvement is particularly significant for the classification of the Hate class, with an F-score
of about 72%. These results pave the way to the employment of our system in a real-use context.
In addition, the outcome shows that this Hate Speech corpus, filtered with respect to
annotator agreement, makes it possible to build automatic hate speech classifiers able to
achieve accuracy


in line with that obtained in the most investigated sentiment analysis tasks for Italian, such
as subjectivity and polarity classification [2].
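The selection steps used in the experiments (at least three annotations, a most-voted class, and at least 70% agreement) can be sketched as follows. How ties are handled and whether the 70% threshold is computed on the merged two-class labels are our assumptions; the comment identifiers and annotations are made up.

```python
from collections import Counter
from typing import Dict, List, Optional


def to_binary(label: str) -> str:
    """Merge Weak hate and Strong hate into a single Hate class."""
    return "No hate" if label == "No hate" else "Hate"


def majority_label(labels: List[str], min_annotations: int = 3) -> Optional[str]:
    """Most-voted class, or None with too few annotations or a tie at the top."""
    if len(labels) < min_annotations:
        return None
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def agreement_ratio(labels: List[str]) -> float:
    """Share of annotators who chose the most-voted class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def select_by_agreement(annotations: Dict[str, List[str]], threshold: float = 0.7) -> Dict[str, str]:
    selected = {}
    for comment_id, labels in annotations.items():
        binary = [to_binary(label) for label in labels]
        label = majority_label(binary)
        if label is not None and agreement_ratio(binary) >= threshold:
            selected[comment_id] = label
    return selected


annotations = {
    "c1": ["No hate"] * 5,
    "c2": ["Weak hate", "Strong hate", "Weak hate", "No hate"],
    "c3": ["Weak hate", "No hate", "No hate"],
    "c4": ["Strong hate", "Strong hate"],
}
print(select_by_agreement(annotations))  # {'c1': 'No hate', 'c2': 'Hate'}
```

Under these assumptions, a comment annotated as Weak hate, Strong hate and No hate has no most-voted class with three classes but is labelled Hate once the two hate levels are merged. This may be one reason why the two-class dataset is larger than the three-class one.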

                                    Strong hate                Weak hate                   No hate
 Classifier    Accuracy (%)     Prec. Rec. F-score     Prec.    Rec. F-score       Prec.    Rec. F-score
 SVM              64.61         .452   .189   .256     .523     .525    .519       .724     .794   .757
 LSTM             60.50         .501   .054   .097     .434     .159    .221       .618     .950   .747

    Table 2: Ten-fold cross validation results on Strong hate, Weak hate and No hate classes.


                                                  Hate                     No hate
              Classifier   Accuracy (%)   Prec.   Rec.  F-score    Prec.    Rec. F-score
              SVM             72.95       .625    .568   .594      .778     .817   .797
              LSTM            75.23       .640   .6832   .657      .824     .791   .805
                                          ≥ 70% of Agreement
              SVM             80.60       .757    .689   .718       .833    .872     .851
              LSTM            79.81       .706    .758   .728       .859    .822     .838

              Table 3: Ten-fold cross validation results on Hate and No hate classes.


4     Related Work
Here, we briefly review academic work on trolls and hate speech detection. Interestingly, the
connections between user profiles on SNSs are often closely related to the users’ connections in
real life [16]. Using machine learning, it was possible to recognize users who adopt
troll profiles in cyberbullying practices [4, 13, 18]. Similarly, text analysis approaches have been
used to link together the contents of anonymous users across different opinion websites [1].
Relationships based on profile connections and behaviors have been exploited to effectively
identify fake Facebook profiles [10], while lightweight profile features succeeded in recognising
fake Twitter followers [11, 12]. Regarding text classification for automatic hate speech detection,
a seminal work is [32]. In [27], the authors propose a rule-based classifier to distinguish
between legitimate and abusive information in texts. PALADIN [24] is a tool that mines
patterns of language to detect anti-social behaviors of users. The authors of [36] focus on
Twitter and propose a semi-supervised approach and statistical topic modeling for the detection
of offensive content, while the work in [3] presents a supervised machine learning text classifier,
trained and tested to distinguish between hateful and antagonistic responses, with a focus on
race, ethnicity and religion. The work in [15] adopts neural language models to learn distributed
low-dimensional representations of comments. The approach generates text embeddings that
can be used to feed a classifier. The authors of [19] describe the distinction between flame
and hate speech (the latter being directed more at groups than at individuals). The same
work proposes the three-level hate classification adopted in this paper (which also suffered, in
part, from low inter-annotator agreement). Some studies concentrate on users’ behaviour. The
authors of [30] propose a reputation system, which tracks the reputation of users using positive
and negative opinions. A behavioral analysis of banned users is in [7], showing a certain degree
of similarity in their texts, which often contain irrelevant content too.


5     Conclusions
This paper introduced the first hate speech classifier for Italian texts. Considering a binary
classification, the classifier achieved results comparable with those obtained in the most
investigated sentiment analysis tasks for Italian. Encouraged by such a promising outcome, we
leave for future work the refinement of the classifier results when considering distinctions (i)
among hate levels (where the current classifier fails to achieve satisfactory results) and (ii)
among different types of hate (which we defined in the paper and worked with at the annotation
level). We will carry out a thorough analysis of the results of the two classifiers to investigate
whether they can be combined in order to increase performance. In addition, we are extending the
annotation process, both to increase the corpus size and to collect more annotations for each
comment. We are testing new annotation methods, evaluating the inter-annotator agreement
to validate the annotation of the different degrees of hate. Besides, we will investigate the
effect of sarcasm on classifier performance. From the classification of single comments, the
hate classifier may evolve to detect bursts of hate, thus preventing virtual discussions from
giving rise to severe injuries to people and damage to assets. Given that human moderators cannot
monitor the huge amount of user-generated text on social networks, we believe this work represents
a basis for tracking divergent states of Italian texts in online conversations.
Acknowledgements. The authors would like to thank Salvatore Bellomo and Serena Tardelli for
their actionable support.


References
 [1] Mishari Almishari and Gene Tsudik. Exploring linkability of user reviews. In ESORICS, pages
     307–324. Springer Berlin Heidelberg, 2012.
 [2] Valerio Basile, Andrea Bolioli, Malvina Nissim, Viviana Patti, and Paolo Rosso. Overview of the
     Evalita 2014 sentiment polarity classification task. In EVALITA, 2014.
 [3] Peter Burnap and Matthew Leighton Williams. Hate speech, machine classification and statistical
     modelling of information flows on Twitter. In Internet, Policy and Politics, 2014.
 [4] Erik Cambria et al. Do not feel the trolls. In Semantic Web, 2010.
 [5] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM
     Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.
 [6] S. Chattopadhyay et al. Suicidal risk evaluation using a similarity-based classifier. In Advanced
     Data Mining and Applications, pages 51–61. Springer Berlin Heidelberg, 2008.
 [7] Justin Cheng, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. Antisocial behavior in online
     discussion communities. arXiv preprint arXiv:1504.00680, 2015.
 [8] François Chollet. Keras. https://github.com/fchollet/keras, 2015.
 [9] Andrea Cimino, Stefano Cresci, Felice Dell’Orletta, and Maurizio Tesconi. Linguistically-motivated
     and lexicon features for sentiment analysis of Italian tweets. In EVALITA, 2014.
[10] Mauro Conti, Radha Poovendran, and Marco Secchiero. FakeBook: Detecting fake profiles in
     on-line social networks. In Social Networks Analysis and Mining, pages 1071–1078. IEEE, 2012.
[11] Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, and Maurizio Tesconi.
     A criticism to society (as seen by Twitter analytics). In ICDCS Workshops, pages 194–200, 2014.
[12] Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, and Maurizio Tesconi.
     Fame for sale: Efficient detection of fake Twitter followers. Decision Support Sys., 80:56–71, 2015.
[13] Maral Dadvar et al. Improving cyberbullying detection with user context. In Advances in
     Information Retrieval, pages 693–696. Springer, 2013.
[14] Felice Dell’Orletta. Ensemble system for part-of-speech tagging. Proceedings of EVALITA, 2009.


[15] Nemanja Djuric et al. Hate speech detection with comment embeddings. In 24th International
     Conference on World Wide Web, pages 29–30. ACM, 2015.
[16] Nicole B. Ellison and Danah M. Boyd. Sociality through social network sites. In The Oxford
     Handbook of Internet Studies. Oxford Handbooks Online, 2013.
[17] Yarin Gal. A theoretically grounded application of dropout in recurrent neural networks. arXiv
     preprint arXiv:1512.05287, 2015.
[18] Patxi Galán-García et al. Supervised machine learning for the detection of troll profiles in Twitter
     social network. In Joint Conf. Soco-Cisis-Iceute, pages 419–428. Springer, 2014.
[19] Njagi Dennis Gitari et al. A lexicon-based approach for hate speech detection. Multimedia and
     Ubiquitous Engineering, 10(4):215–230, 2015.
[20] Kilem L Gwet. Handbook of inter-rater reliability: The definitive guide to measuring the extent of
     agreement among raters. Advanced Analytics, LLC, 2014.
[21] Claire Hardaker. Trolling in asynchronous computer-mediated communication: From user
     discussions to academic definitions. Politeness Research, 6(2):215–242, 2010.
[22] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation,
     9(8):1735–1780, 1997.
[23] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In Knowledge discovery
     and data mining, pages 168–177. ACM, 2004.
[24] Ralf Klamma, Marc Spaniol, and Dimitar Denev. Paladin: A pattern based approach to knowledge
     discovery in digital social networks. In I-KNOW, volume 6, pages 6–8, 2006.
[25] April Kontostathis. Chatcoder: Toward the tracking and categorization of internet predators. In
     Text Mining Workshop of Siam Data Mining (SDM), 2009.
[26] Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice
     Dell’Orletta, Henrik Dittmann, Alessandro Lenci, and Vito Pirrelli. The PAISA corpus of Italian
     web texts. In Web as Corpus Workshop (WaC-9), 2014.
[27] Altaf Mahmud, Kazi Zubair Ahmed, and Mumit Khan. Detecting flames and insults in text. In
     Natural Language Processing, 2008.
[28] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word
     representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[29] Preslav Nakov et al. Semeval-2016 task 4: Sentiment analysis in Twitter. In SemEval@NAACL-
     HLT 2016, pages 1–18, 2016.
[30] F Javier Ortega. Detection of dishonest behaviors in on-line networks using graph-based ranking
     techniques. AI Communications, 26(3):327–329, 2013.
[31] Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions
     on Signal Processing, 45(11):2673–2681, 1997.
[32] Ellen Spertus. Smokey: Automatic recognition of hostile messages. In AAAI/IAAI, 1997.
[33] Duyu Tang et al. Document modeling with gated recurrent neural network for sentiment
     classification. In Empirical Methods in Natural Language Processing, pages 1422–1432, 2015.
[34] T. Tieleman and G. Hinton. Lecture 6.5—RmsProp: Divide the gradient by a running average of
     its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
[35] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing contextual polarity in phrase-
     level sentiment analysis. In HLT/EMNLP, pages 347–354. ACL, 2005.
[36] Guang Xiang et al. Detecting offensive tweets via topical feature discovery over a large scale
     Twitter corpus. In Information and Knowledge Management, pages 1980–1984. ACM, 2012.
[37] XingYi Xu et al. UNIMELB at SemEval-2016 tasks 4a and 4b: An ensemble of neural networks
     and a Word2Vec based model for sentiment classification. In SemEval, 2016.
[38] Zhi Xu and Sencun Zhu. Filtering offensive language in online communities using grammatical
     relations. In Collaboration, Electronic Messaging, Anti-Abuse and Spam, pages 1–10, 2010.

