<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UO @ HaSpeeDe2: Ensemble Model for Italian Hate Speech Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariano Jason Rodriguez Cisnero</string-name>
          <email>mjasoncuba@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reynier Ortega Bueno</string-name>
          <email>reynier@uo.edu.cu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad de Oriente</institution>
          ,
          <addr-line>Santiago de Cuba</addr-line>
          ,
          <country country="CU">Cuba</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>English. This document describes our participation in the Hate Speech Detection task at Evalita 2020. Our system is based on deep learning techniques, specifically RNNs and an attention mechanism, combined with transformer representations and linguistic features. Multi-task learning was used during training to increase the system's effectiveness. The results show that some of the selected features were not a good fit within the model. Nevertheless, the level of generalization achieved yields encouraging results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Modern societies have found easy and engaging ways to share information via social media. Users discover freedom to express themselves through online communication. Even though the ability to express oneself freely is a human right, some users take this opportunity to spread hateful content. This kind of information carries a dangerous and hurtful potential. Automatically recognizing such content is therefore an interesting topic for researchers.</p>
      <p>
        Creative methods have been proposed to tackle the fascinating task of recognizing hate in texts
        <xref ref-type="bibr" rid="ref10 ref19 ref7 ref8">(De la Peña Sarracén et al., 2018; Gambäck and Sikdar, 2017)</xref>
        . Some of those works approach the problem through feature extraction
        <xref ref-type="bibr" rid="ref10 ref19">(Schmidt and Wiegand, 2017)</xref>
        and classification algorithms such as SVM
        <xref ref-type="bibr" rid="ref18">(Santucci et al., 2018)</xref>
        . In recent years, deep learning has become one of the most successful research areas in Natural Language Processing (NLP). There are exciting investigations on this topic, such as
        <xref ref-type="bibr" rid="ref7">(Cimino et al., 2018)</xref>
        , involving LSTMs
        <xref ref-type="bibr" rid="ref15">(Liu and Guo, 2019)</xref>
        and transformers
        <xref ref-type="bibr" rid="ref21">(Vaswani et al., 2017)</xref>
        , which have gained attention in the NLP community due to their results.
      </p>
      <p>We propose a model based on multiple representations learned by means of deep learning techniques and linguistic knowledge. In particular, we combine a Long Short-Term Memory architecture with linguistic features and language-model representations given by a special kind of transformer model, BERT.</p>
      <p>The paper is organized as follows. Section 2 gives a brief description of the HaSpeeDe task. Our hate detection system is presented in Section 3. The experiments and results are discussed in Section 4. Finally, Section 5 gives the conclusions and future directions. The code of this work is available on GitHub: https://github.com/mjason98/evalita20_hate</p>
    </sec>
    <sec id="sec-2">
      <title>2 HaSpeeDe2 Task</title>
      <p>
        Hate speech and stereotype recognition on social media has become an attractive research area from the computational point of view. In the second edition of HaSpeeDe
        <xref ref-type="bibr" rid="ref17">(Sanguinetti et al., 2020)</xref>
        at Evalita 2020
        <xref ref-type="bibr" rid="ref2">(Basile et al., 2020)</xref>
        , the organizers proposed three subtasks. The main one is subtask A, which aims at determining the presence or absence of hateful content in a text. The dataset is composed of 6839 short texts, 2766 labeled as hate speech and 4076 as not hate speech. In this work we focused only on subtask A. Subtask B is a binary classification problem oriented to stereotype detection. The last subtask, C, is a sequence labeling task aimed at recognizing Nominal Utterances in hateful tweets.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Our Proposal</title>
      <p>
        We dealt with the hate detection task as a text classification problem, assigning texts to the “hateful” or “not hateful” category. We trained a deep learning model based on an attention mechanism and Recurrent Neural Networks, specifically a Bidirectional Long Short-Term Memory (Bi-LSTM)
        <xref ref-type="bibr" rid="ref11">(Hochreiter and Schmidhuber, 1997)</xref>
        , combined with linguistic features and transformer representations by means of an interpretable multi-source fusion component
        <xref ref-type="bibr" rid="ref12">(Karimi et al., 2018)</xref>
        .
      </p>
      <p>Sections 3.1 and 3.2 describe the linguistic features and the transformer representation used in this work. Section 3.3 presents the preprocessing phase. Finally, the neural network model and the feature ensemble are described in Section 3.4.</p>
      <sec id="sec-3-1">
        <title>3.1 Linguistic Features</title>
        <p>To build the hate detection model, we start by
extracting several sets of linguistic features:</p>
        <p>WordNet features: We count the number of verbs, adverbs, nouns, and adjectives. Also, for every word, we calculate the average of its similarity with respect to the other words using the path similarity function provided by the WordNet2 corpus. Furthermore, we consider the degree of lexical ambiguity by counting the number of synsets of each word within the text.</p>
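        <p>As an illustration, the following is a minimal sketch of the synset-count and path-similarity features using NLTK, the source of the WordNet corpus (see footnote 2). Comparing only the first synset of each word pair is our simplifying assumption; an Italian lookup would additionally require NLTK's Open Multilingual WordNet.</p>
        <preformat>
from nltk.corpus import wordnet as wn

def wordnet_features(words):
    synsets = [wn.synsets(w) for w in words]
    # lexical ambiguity: number of synsets of each word
    ambiguity = [len(s) for s in synsets]
    # average path similarity of each word w.r.t. the other words,
    # comparing the first synset of each word pair
    avg_sims = []
    for i, si in enumerate(synsets):
        sims = []
        for j, sj in enumerate(synsets):
            if i != j and si and sj:
                s = si[0].path_similarity(sj[0])
                if s is not None:
                    sims.append(s)
        avg_sims.append(sum(sims) / len(sims) if sims else 0.0)
    return ambiguity, avg_sims
        </preformat>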
        <p>
          Hurt and sentiment content: HurtLex
          <xref ref-type="bibr" rid="ref3">(Bassignana et al., 2018)</xref>
          is a lexicon of offensive, aggressive, and hateful words in over 50 languages. The words falling into the 17 categories offered by the lexicon are counted and added as linguistic features, jointly with polarity and semantic values obtained from the SenticNet
          <xref ref-type="bibr" rid="ref5">(Cambria et al., 2018)</xref>
          corpus.
        </p>
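        <p>A possible sketch of the category counting, assuming the HurtLex release is read as a TSV file with lemma and category columns (the exact column names are an assumption, not the lexicon's documented layout):</p>
        <preformat>
import csv
from collections import Counter

def hurtlex_counts(lemmas, path="hurtlex_IT.tsv"):
    # map lemma to its HurtLex category (one of the 17 categories)
    lexicon = {}
    with open(path, encoding="utf-8") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            lexicon[row["lemma"]] = row["category"]
    # count how many words of the text fall into each category
    return Counter(lexicon[l] for l in lemmas if l in lexicon)
        </preformat>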
        <p>
          Information gain: Information gain
          <xref ref-type="bibr" rid="ref14">(Lewis, 1992)</xref>
          has been a good feature selection measure for text categorization. It takes into account both the presence and the absence of a term in a category, and can be defined by:
        </p>
        <p>IG(t_k, C_i) = Σ_{C ∈ {C_i, C̄_i}} Σ_{t ∈ {t_k, t̄_k}} p(t, C) · log2( p(t, C) / (p(t) · p(C)) )</p>
        <p>where C ranges over {C_i, C̄_i} and t over {t_k, t̄_k}. In this formula, probabilities are interpreted over an event space of documents; for instance, p(t̄_k, C_i) is the probability that, for a random document d, term t_k does not occur in d and d belongs to category C_i. In our case there were two categories, hateful and not hateful, and the terms are word lemmas.</p>
        <p>2 The WordNet corpus comes from the Python library NLTK.</p>
        <p>To create the information gain feature (IgF), we calculated the IG for every word and chose the highest-scoring ones3. Then, the occurrences of those selected words in the text are counted.</p>
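        <p>A minimal sketch of this computation over a document event space, following the formula above (documents as sets of lemmas, labels as 0/1; names are illustrative):</p>
        <preformat>
import math

def information_gain(docs, labels, term):
    """IG(t_k, C_i) for one term over the hateful / not hateful split."""
    n = len(docs)
    ig = 0.0
    for c in (0, 1):                    # C_i and its complement
        p_c = sum(1 for y in labels if y == c) / n
        for present in (True, False):   # t_k occurs / does not occur
            p_t = sum(1 for d in docs if (term in d) == present) / n
            joint = sum(1 for d, y in zip(docs, labels)
                        if (term in d) == present and y == c) / n
            if joint > 0:               # 0 * log(0) treated as 0
                ig += joint * math.log2(joint / (p_t * p_c))
    return ig
        </preformat>
        <p>The IgF is then built by scoring every lemma this way, keeping the top 50 (footnote 3), and counting their occurrences in each text.</p>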
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Italian BERT</title>
        <p>
          Finally, we use a pre-trained BERT4 to compute a deep representation of the text. One of the most widely used autoencoding pre-trained Language Models (PLMs) is BERT
          <xref ref-type="bibr" rid="ref9">(Devlin et al., 2018)</xref>
          . BERT is trained with the masked language modeling task, which randomly masks some tokens in a text sequence and then independently recovers the masked tokens by conditioning on the encoding vectors obtained by a bidirectional Transformer.
        </p>
        <p>Inside BERT, the information is passed forward across transformer layers. In this work, we used the output of one specific layer; this operation can be expressed by:

h_0 = H_{l_0}(text_tok)
h_i = H_{l_i}(h_{i−1})
h_n = H_{l_n}(h_{n−1})

where text_tok is the text after its tokenization5, h_i is the output of the i-th transformer layer H_{l_i}, called its hidden state, and n is the total number of transformer layers in BERT. Then, for a specific i, from the order-2 tensor h_i we compute the vector f_bert as a deep representation of the initial text, which acts as the PLM feature:

v = Σ_k h_i[k, :],  f_bert = v / ‖v‖</p>
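        <p>A sketch of this computation with the HuggingFace transformers library and the Italian BERT of footnote 4; the layer index i (here the penultimate layer) and the use of AutoModel are our assumptions:</p>
        <preformat>
import torch
from transformers import AutoModel, AutoTokenizer

name = "dbmdz/bert-base-italian-cased"
tok = AutoTokenizer.from_pretrained(name)
bert = AutoModel.from_pretrained(name, output_hidden_states=True)

def f_bert(text, layer=-2):
    with torch.no_grad():
        out = bert(**tok(text, return_tensors="pt"))
    h_i = out.hidden_states[layer][0]   # hidden state h_i: (tokens, 768)
    v = h_i.sum(dim=0)                  # v = sum_k h_i[k, :]
    return v / v.norm()                 # f_bert = v / ||v||
        </preformat>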
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Preprocessing</title>
        <p>In the preprocessing step, stopwords are removed first. Then, hashtags composed of several words are split (e.g., #NessunDorma becomes # nessun dorma). We use a regular-expression6 algorithm to achieve this step.</p>
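        <p>For illustration, a simplified splitter for camel-cased hashtags; the paper's automaton additionally uses a dictionary of Italian words, which this sketch omits:</p>
        <preformat>
import re

def split_hashtag(tag):
    # break a camel-cased hashtag into lowercase words
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tag.lstrip("#"))
    return "# " + " ".join(w.lower() for w in words)

print(split_hashtag("#NessunDorma"))  # -> "# nessun dorma"
        </preformat>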
        <p>
          Secondly, using the FreeLing7 tool we obtain the lemma of each word, and non-alphanumeric characters are removed. Finally, the remaining words are represented as vectors using pre-trained word embeddings generated by the Word2Vec model
          <xref ref-type="bibr" rid="ref16">(Mikolov et al., 2013)</xref>
          .
        </p>
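        <p>A minimal sketch of the embedding lookup, assuming the gensim library and a binary Word2Vec file; the file name is a placeholder, as the paper does not name the Italian embeddings used:</p>
        <preformat>
import numpy as np
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format("it_word2vec_300.bin", binary=True)

def embed(lemmas, dim=300):
    # one 300-d vector per lemma; zeros for out-of-vocabulary lemmas
    return np.stack([w2v[l] if l in w2v else np.zeros(dim) for l in lemmas])
        </preformat>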
        <p>3 We selected the top 50 words with the highest IG value.
4 https://huggingface.co/dbmdz/bert-base-italian-cased
5 The text is represented as a vector of integers using the tokenizer function of the BERT model.</p>
        <p>6 The automaton was created using the re library from Python and the words from an Italian corpus.</p>
        <p>7 http://nlp.lsi.upc.edu/freeling/index.php</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4 Neural Network Model and Feature Ensemble</title>
        <p>The standard LSTM receives a vector x_t sequentially at each time step and produces a hidden state h_t. Each hidden state h_t is calculated as follows:

i_t = σ(W^(i) x_t + U^(i) h_{t−1} + b^(i))
f_t = σ(W^(f) x_t + U^(f) h_{t−1} + b^(f))
o_t = σ(W^(o) x_t + U^(o) h_{t−1} + b^(o))
u_t = tanh(W^(u) x_t + U^(u) h_{t−1} + b^(u))
c_t = i_t ⊙ u_t + f_t ⊙ c_{t−1}
h_t = o_t ⊙ tanh(c_t)    (1)</p>
        <p>where all W^(·), U^(·), and b^(·) are parameters to be learned during training, σ is the sigmoid function, and ⊙ stands for element-wise multiplication.</p>
        <p>A Bidirectional LSTM, on the other hand, performs the same operations as the standard LSTM but processes the incoming text in left-to-right and right-to-left order in parallel. Thus, its output becomes ĥ_t = [→h_t ; ←h_t] for the two directions.</p>
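        <p>For concreteness, a minimal sketch of this block, assuming a PyTorch implementation (the paper does not name its framework); the sizes 300 and 128 are those reported in Section 4:</p>
        <preformat>
import torch
import torch.nn as nn

# each output step is the concatenation [h_t(forward); h_t(backward)]
bilstm = nn.LSTM(input_size=300, hidden_size=128,
                 bidirectional=True, batch_first=True)
E = torch.randn(1, 40, 300)  # one tweet: 40 word vectors of size 300
out, _ = bilstm(E)           # out: (1, 40, 256), i.e. 2 x 128 directions
        </preformat>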
        <p>By adding an attention mechanism, we allow the model to decide which part of the sequence to “attend to”. First, let us define the softmax function σ(v) for a vector v = [v_0, …, v_{n−1}] as:

σ(v)_j = e^{v_j} / Σ_{i=0}^{n−1} e^{v_i}</p>
        <p>Then, let I ∈ R^{N×L} be the matrix of input vectors, where L is the size of each input vector and N the length of the given sequence. We define the attention layer (AttLSTM) as a regular LSTM layer like (1) with extra operations described as follows:

a_{k,t} = σ(W_k h_{t−1} + b_k)
α_{k,t} = a_{k,t}^T I
ᾱ_t = [α_{0,t}; …; α_{S−1,t}]
x_t = W_a ᾱ_t + b_a    (2)

Here k ∈ [0, S−1] indexes the attention heads, W_k ∈ R^{N×M} where M is the size of the hidden state vector h_t, W_a ∈ R^{M×SM}, and b_a and b_k are learnable parameters. (·)^T is the transpose operation, and the output of the layer is O = [h_0, …, h_t, …, h_N], the concatenation of the hidden states produced by the AttLSTM at each time step.</p>
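        <p>A simplified sketch of the AttLSTM of Equation (2), under the same PyTorch assumption; the single-sequence batch handling and the choice of giving x_t the size of the input vectors are our simplifications:</p>
        <preformat>
import torch
import torch.nn as nn

class AttLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, seq_len, n_heads):
        super().__init__()
        self.seq_len = seq_len
        self.hidden_size = hidden_size
        # one scoring map W_k per head: h_{t-1} -> N scores a_{k,t}
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, seq_len) for _ in range(n_heads))
        # W_a projects the concatenated attended vectors to x_t
        self.proj = nn.Linear(n_heads * input_size, input_size)
        self.cell = nn.LSTMCell(input_size, hidden_size)

    def forward(self, I):
        # I: (N, L), N input vectors of size L for a single sequence
        h = I.new_zeros(1, self.hidden_size)
        c = I.new_zeros(1, self.hidden_size)
        outputs = []
        for _ in range(self.seq_len):
            # alpha_{k,t} = sigmoid(W_k h_{t-1} + b_k)^T I, per head
            attended = [torch.sigmoid(head(h)) @ I for head in self.heads]
            x = self.proj(torch.cat(attended, dim=1))  # x_t
            h, c = self.cell(x, (h, c))
            outputs.append(h)
        return torch.cat(outputs, dim=0)  # O = [h_0; ...; h_{N-1}]
        </preformat>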
        <p>As mentioned before, we propose a feature ensemble using an interpretable multi-source fusion component (IMF). The IMF aims to combine features from different sources. A naive way of doing this is to concatenate the vector representations into a single vector; that scheme considers all sources equally, but one source may contribute more than others. With the IMF we instead weigh the contribution of every feature source via an attention mechanism. The IMF can be expressed by:</p>
        <p>r_i = tanh(W_{p_i} f_i + b_{p_i})
a_i = W_a r_i + b_a
α_i = σ(a_i)
z = Σ_i α_i r_i    (3)

where r_i is a projection of f_i, the i-th feature vector passed to the IMF, ensuring that every r_i has the same size. In this step, all the W_{p_i}, b_{p_i}, W_a, and b_a are parameters to be learned during training, and α_i represents the importance of r_i to the final calculation of z, the IMF outcome.</p>
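        <p>A minimal sketch of the IMF under the same PyTorch assumption; the shared projection size and the module names are illustrative:</p>
        <preformat>
import torch
import torch.nn as nn

class IMF(nn.Module):
    def __init__(self, source_sizes, proj_size):
        super().__init__()
        # one projection per source: r_i = tanh(W_pi f_i + b_pi)
        self.projections = nn.ModuleList(
            nn.Linear(s, proj_size) for s in source_sizes)
        # shared scorer: a_i = W_a r_i + b_a
        self.scorer = nn.Linear(proj_size, 1)

    def forward(self, features):
        # features: list of 1-D tensors, one per source
        r = torch.stack([torch.tanh(p(f))
                         for p, f in zip(self.projections, features)])
        alpha = torch.softmax(self.scorer(r), dim=0)  # source importances
        return (alpha * r).sum(dim=0)                 # z = sum_i alpha_i r_i
        </preformat>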
        <p>
          To increase the learning power of our system, we used multi-task learning
          <xref ref-type="bibr" rid="ref6">(Caruana, 1997)</xref>
          , predicting the polarity of tweets in parallel with the classes of the hate speech detection subtask. This approach had been developed before
          <xref ref-type="bibr" rid="ref7">(Cimino et al., 2018)</xref>
          in HaSpeeDe at Evalita 2018
          <xref ref-type="bibr" rid="ref4">(Bosco et al., 2018)</xref>
          . The tweets used for the multi-task learning are taken from the Sentipolc-2016
          <xref ref-type="bibr" rid="ref1">(Barbieri et al., 2016)</xref>
          challenge.
        </p>
        <p>Finally, we present the composition of the previous layers and features that creates our deep ensemble model:

E = [w_0; w_1; …; w_{N−1}]
o_b1 = BiLSTM(E)    (4)

where E is the vector representation of the text (see Section 3.3). Equation (4) is the first block of our model, and the second block can be described as follows:

A = AttLSTM(o_b1)
m_i = max_{j=0,…,N−1} A_{j,i}
o_b2 = [m_0; …; m_{M−1}]    (5)

The vector o_b2 is the output of a MaxPool layer over the sequence A; then:

F = [o_b2; f_bert; f_wn; f_hs; f_ig]
o_b3 = IMF(F)
ŷ = σ(W_h o_b3 + b_h)
ŷ_f = σ(W_f o_b3 + b_f)    (6)</p>
        <p>The third block is described in (6), where W_h, W_f, b_f, and b_h are learnable parameters and ŷ, ŷ_f ∈ R. The vectors f_bert, f_wn, f_hs, and f_ig correspond to the BERT, WordNet, Hurt-Sentiment, and Information Gain features, respectively. The polarity of a tweet is predicted through the ŷ_f value and its hate value through ŷ.</p>
        <p>The overall weighted loss of the model is computed by cross-entropy, with a higher importance given to the hate speech predictions than to the polarity predictions. The overall loss is calculated according to the following formula:

L_1 = −Σ_i y_i log(ŷ_i)
L_2 = −Σ_i y_{f_i} log(ŷ_{f_i})
loss = γ L_1 + (1 − γ) L_2,  γ ∈ (0, 1)    (7)</p>
        <p>Here L_1 and L_2 are the cross-entropy losses of the hate predictions and the sentiment polarity predictions, respectively. The value γ is the main-task importance weight, and y_i and y_{f_i} represent the ground-truth hate and polarity labels. The final loss is thus obtained as a convex combination of L_1 and L_2.</p>
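        <p>A sketch of the weighted loss of Equation (7), assuming PyTorch and binary targets; γ = 0.75 is the value reported in Section 4:</p>
        <preformat>
import torch.nn.functional as F

def multitask_loss(y_hat, y, yf_hat, yf, gamma=0.75):
    l1 = F.binary_cross_entropy(y_hat, y)    # hate speech (main task)
    l2 = F.binary_cross_entropy(yf_hat, yf)  # polarity (auxiliary task)
    return gamma * l1 + (1.0 - gamma) * l2   # convex combination
        </preformat>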
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Experiments and Results</title>
      <p>In this section we show the results of our proposed method in subtask A and discuss them. The organizers allowed a maximum of two submissions for every subtask in the challenge. We named our team UO.</p>
      <p>Experiments were conducted in two main directions: firstly, to investigate the impact of the IMF fusion strategy, and secondly, to evaluate the impact of each proposed single-modal representation on our proposal. The results of our experiments are presented in Table 1 and Table 2.</p>
      <p>In those tables, the column named heads gives the number of attention heads in the Att-LSTM layer; if this cell is empty, the layer was not used. The columns bert and ig indicate the presence or absence of the BERT and IG representations, and the column wn-hs the presence of the HurtLex-Sentiment and WordNet based representations. If a cell has a cross, the representation associated with the column was not used in the corresponding run. We used 10% of the training dataset for validation, and the tables report the accuracy (acc) computed on this validation data.</p>
      <p>Both tables show that the presence of BERT increases performance, and almost all runs score higher with the IMF than without it. Increasing the number of attention heads improves the results without the IMF, but the opposite occurs in its presence.</p>
      <p>
        The pre-trained embeddings have a size of 300, and the number of neurons in the Bi-LSTM and in the AttLSTM was 128. The γ value was set to 0.75, and the dropout
        <xref ref-type="bibr" rid="ref20">(Srivastava et al., 2014)</xref>
        after the embedding layer was 0.3. The whole model was trained with the Adam optimizer
        <xref ref-type="bibr" rid="ref13">(Kingma and Ba, 2015)</xref>
        , with a learning rate of 0.01.
      </p>
      <p>The models in bold in Table 2 were chosen as the final submissions for the subtask. run1 uses the attention layer proposed in Section 3.4 and considers all proposed representations. run2 uses neither the attention mechanism nor the handcrafted features, relying only on the BERT text representation and the rest of the architecture.</p>
      <p>Table 3 shows the official results of our system, for both runs, together with the best-rated system. The evaluation was performed on two distinct corpora: one composed of tweets and the other of news headlines.</p>
      <sec id="sec-4-1">
        <title>Runs</title>
        <p>UO:tweets run1
UO:tweets run2
BEST RATED:tweets
UO:news run1
UO:news run2
BEST RATED:news</p>
      <p>These results show that, of our two models, the simpler one obtained better results, showing that model complexity is not a prerequisite for better performance in deep learning. They also indicate that some linguistic features decrease the effectiveness of the model, while the similarity between the results on the tweets and news evaluation sets suggests that the system is able to generalize well.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusions and Future Work</title>
      <p>In this paper we presented an ensemble model for subtask A of the Hate Speech Detection task (HaSpeeDe2) at Evalita 2020. Our proposal combines linguistic features and RNNs with transformer representations using an IMF. In the training phase, we used a multi-task learning approach to recognize hate speech and polarity simultaneously.</p>
      <p>The achieved results show the ability of this ensemble to generalize the detection of hateful content across different text genres. Nevertheless, some handcrafted features decrease its results. Motivated by this, we plan to explore better feature selection, other attention mechanisms, and multi-task learning techniques to improve performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the evalita 2016 sentiment polarity classification task</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian</article-title>
          .
          <source>In Valerio Basile</source>
          , Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          <article-title>Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Elisa</given-names>
            <surname>Bassignana</surname>
          </string-name>
          , Valerio Basile, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Hurtlex: A multilingual lexicon of words to hurt</article-title>
          .
          <source>In 5th Italian Conference on Computational Linguistics</source>
          , CLiC-it
          <year>2018</year>
          , volume
          <volume>2253</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . CEUR-WS.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Felice</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          , Fabio Poletto, Manuela Sanguinetti, and
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Tesconi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the evalita 2018 hate speech detection task</article-title>
          .
          <source>In EVALITA 2018-Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</source>
          , volume
          <volume>2263</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . CEUR.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Erik</given-names>
            <surname>Cambria</surname>
          </string-name>
          , Soujanya Poria, Devamanyu Hazarika, and
          <string-name>
            <given-names>Kenneth</given-names>
            <surname>Kwok</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Senticnet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Rich</given-names>
            <surname>Caruana</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Multitask learning</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>28</volume>
          (
          <issue>1</issue>
          ):
          <fpage>41</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Cimino</surname>
          </string-name>
          , Lorenzo De Mattei, and Felice Dell'Orletta.
          <year>2018</year>
          .
          <article-title>Multi-task learning in deep neural networks at evalita 2018</article-title>
          .
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , pages
          <fpage>86</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Gretel Liz</given-names>
            <surname>De la Peña Sarracén</surname>
          </string-name>
          , Reynaldo Gil Pons, Carlos Enrique Muñiz Cuza, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Hate speech detection using attention-based lstm</article-title>
          .
          <source>EVALITA Evaluation of NLP and Speech Tools for Italian</source>
          ,
          <volume>12</volume>
          :
          <fpage>235</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Björn</given-names>
            <surname>Gambäck</surname>
          </string-name>
          and Utpal Kumar Sikdar
          .
          <year>2017</year>
          .
          <article-title>Using convolutional neural networks to classify hatespeech</article-title>
          .
          <source>In Proceedings of the first workshop on abusive language online</source>
          , pages
          <fpage>85</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and Jürgen Schmidhuber.
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Hamid</given-names>
            <surname>Karimi</surname>
          </string-name>
          , Proteek Roy, Sari Saba-Sadiya, and
          <string-name>
            <given-names>Jiliang</given-names>
            <surname>Tang</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Multi-source multi-class fake news detection</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>1546</fpage>
          -
          <lpage>1557</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Diederik P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>In Yoshua Bengio and Yann LeCun</source>
          , editors,
          <source>3rd International Conference on Learning Representations, ICLR</source>
          <year>2015</year>
          , San Diego, CA, USA, May 7-
          <issue>9</issue>
          ,
          <year>2015</year>
          , Conference Track Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>David D</given-names>
            <surname>Lewis</surname>
          </string-name>
          .
          <year>1992</year>
          .
          <article-title>An evaluation of phrasal and clustered representations on a text categorization task</article-title>
          .
          <source>In Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>37</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Gang</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jiabao</given-names>
            <surname>Guo</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Bidirectional lstm with attention mechanism and convolutional layer for text classification</article-title>
          .
          <source>Neurocomputing</source>
          ,
          <volume>337</volume>
          :
          <fpage>325</fpage>
          -
          <lpage>338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Gregory S. Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          . In Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger, editors,
          <source>Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8</source>
          ,
          <year>2013</year>
          , Lake Tahoe, Nevada, United States, pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , Gloria Comandini, Elisa Di Nuovo, Simona Frenda, Marco Stranisci, Cristina Bosco, Tommaso Caselli, Viviana Patti, and
          <string-name>
            <given-names>Irene</given-names>
            <surname>Russo</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Overview of the evalita 2020 second hate speech detection task (haspeede 2)</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing</source>
          and
          <article-title>Speech tools for Italian (EVALITA 2020), Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Valentino</given-names>
            <surname>Santucci</surname>
          </string-name>
          , Stefania Spina, Alfredo Milani, Giulio Biondi, and Gabriele Di Bari.
          <year>2018</year>
          .
          <article-title>Detecting hate speech for italian language in social media</article-title>
          .
          <source>In EVALITA</source>
          <year>2018</year>
          ,
          <article-title>co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it</article-title>
          <year>2018</year>
          ), volume
          <volume>2263</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Anna</given-names>
            <surname>Schmidt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wiegand</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A survey on hate speech detection using natural language processing</article-title>
          .
          <source>In Proceedings of the Fifth International workshop on natural language processing for social media</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Nitish</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Geoffrey E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , Alex Krizhevsky, Ilya Sutskever, and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention is all you need</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>