<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Domain Adaptation for Text Classification with Weird Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <email>valerio.basile@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Turin</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Pre-trained word embeddings are often used to initialize deep learning models for text classification, as a way to inject precomputed lexical knowledge and boost the learning process. However, such embeddings are usually trained on generic corpora, while text classification tasks are often domain-specific. We propose a fully automated method to adapt pre-trained word embeddings to any given classification task, which requires no resource other than the original training set. The method is based on the concept of word weirdness, extended to score the words in the training set according to how characteristic they are with respect to the labels of a text classification dataset. The polarized weirdness scores are then used to update the word embeddings to reflect task-specific semantic shifts. Our experiments show that this method is beneficial to the performance of several text classification tasks in different languages.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In recent years, the Natural Language Processing
community has directed a great deal of effort
towards text classification, in its many variants.
The list of shared tasks proposed at the recent
editions (2016–2019) of the International Workshop
on Semantic Evaluation (SemEval) shows an
increasing number of tasks that can be cast as text
classification problems: given a text and a set of
labels, choose the correct label to associate with
the text. If the cardinality of the set of labels is
two, we speak of binary classification, as opposed
to multiclass classification. Furthermore, not all
binary classification tasks are the same. When the
labels indicate the presence or absence of a given
phenomenon, we speak of a detection task.</p>
      <p>Classification tasks are mainly approached in
a supervised fashion, where a labeled dataset is
employed to train a classifier to map certain
features of the input text to the probability of a
certain label. Arguably, the most useful features in a
NLP problem are the words that compose the text.
However, in order to be processed by a machine
learning algorithm, words need to be represented
in a dense and machine-readable format. Word
embeddings solve this issue by providing
vectorial representations of words, where vectors that
are close in the geometric space represent words
that often occur in the same contexts. Among their
applications, pre-trained word embeddings are a
powerful source of knowledge to boost the
performance of supervised models that aim at learning
from textual instances.</p>
      <p>Several deep learning models compute word
embeddings at training time. However, they can
be initialized with pre-trained word embeddings,
typically computed on the basis of concordances
in large corpora. This kind of initialization not
only boosts the training of the model, but it also
represents a way of injecting precomputed world
knowledge into a model otherwise trained on a
(sometimes very specific) data set.</p>
      <p>An issue with word embedding models,
including recent contextual embeddings such as those
of Peters et al. (2018), is that they are typically trained
on general-purpose corpora. Therefore, they may fail
to capture semantic shifts that occur in specific
domains. For instance, in a dataset of online hate
speech, negatively charged words such as insults
often co-occur with words that would normally
be considered neutral, but carry instead a negative
signal in that particular context. More concretely,
in a dataset of hate speech towards immigrants in
the post-Trump U.S., a word that would otherwise
be considered neutral, such as wall, carries a
definite negative connotation.</p>
      <p>In this work, we try to capture this intuition
computationally, and model this phenomenon in
a word embedding space. We employ an
automatic measure to score words in a labeled corpus
according to their association with a given label
(Section 3.1) and use this score in a fully
automated method to adapt generic pre-trained word
embeddings (Section 3.2). We test our method
on existing benchmarks of hate speech detection
(Section 4.1) and gender prediction (Section 4.2),
reporting improvements in precision and recall.
</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Kameswara Sarma et al. (2018) propose a method
to adapt generic word embeddings by computing
domain-specific word embeddings on a corpus of
text from the target domain and aligning the two
vector spaces, obtaining a performance boost on
sentiment classification. Another recent approach
is based on projecting the vector representations
from two domain-specific spaces into a joint word
embedding model
        <xref ref-type="bibr" rid="ref3 ref4">(Barnes et al., 2018b)</xref>
        , building
on a similar method applied to cross-lingual word
embedding projection
        <xref ref-type="bibr" rid="ref3 ref4">(Barnes et al., 2018a)</xref>
        . With
respect to these works, the approach proposed in
this paper is significantly more lightweight, acting
directly on a generic word embedding model
without the need to train a domain-specific one.
      </p>
      <p>
        The word-level measure introduced in the next
section is reminiscent of similar metrics from
Information Theory, e.g., Information Content
        <xref ref-type="bibr" rid="ref14">(Pedersen, 2010)</xref>
        , and measures of frequency
distribution similarity such as Kullback-Leibler
divergence
        <xref ref-type="bibr" rid="ref13">(Kullback and Leibler, 1951)</xref>
        . However,
in this paper we aimed at keeping the complexity
of such computation low, in order to manually
explore its effect on the word embeddings.
      </p>
      <p>
        In the domain of hate speech, several
approaches mix word embeddings and supervised
learning with domain-specific lexicons (e.g.,
dictionaries of hateful terms), as highlighted by the
description of participant systems to recent
evaluation campaigns
        <xref ref-type="bibr" rid="ref6 ref9">(Fersini et al., 2018; Bosco et al.,
2018)</xref>
        . These methods are computationally
inexpensive, but require curated resources that are not
always available for less represented languages.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3 Weirdness-based Embedding Adaptation</title>
      <p>In this section, we present our method for
automatic domain adaptation of pre-trained word
embeddings. The input of the procedure is a set of
pre-trained word embeddings and a corpus of texts
paired with labels.</p>
      <sec id="sec-4-1">
        <title>3.1 Polarized Weirdness</title>
        <p>The Weirdness index was introduced by Ahmad et
al. (1999) as an automatic metric to retrieve words
characteristic of a special language with respect
to their typical usage. According to this metric,
a word is highly weird in a specific collection of
documents if it occurs significantly more often in
that context than in a general corpus. In practice,
given a specialist text corpus and a general text
corpus, the weirdness index of a word is the ratio
of its relative frequencies in the respective corpora.
Calling ws the frequency of the word w in the
specialist language corpus, wg the frequency of the
word w in the general language corpus, and ts and
tg the total counts of words in the specialist and
general language corpora respectively, the weirdness
index of w is computed as:</p>
        <p>W eirdness(w) =
ws=ts
wg=tg</p>
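        <p>In code, the index above can be sketched as follows (a minimal illustration on made-up token lists; function and variable names are ours):
```python
from collections import Counter

def weirdness(word, specialist_tokens, general_tokens):
    """Ratio of the word's relative frequency in the specialist
    corpus to its relative frequency in the general corpus."""
    ws = Counter(specialist_tokens)[word]
    wg = Counter(general_tokens)[word]
    ts = len(specialist_tokens)
    tg = len(general_tokens)
    if wg == 0:
        return float("inf")  # word unseen in the general corpus
    return (ws / ts) / (wg / tg)

# toy corpora: "market" is four times as frequent, relatively,
# in the specialist text as in the general text
spec = ["market", "rates", "market", "bond"]           # 2/4 = 0.5
gen = ["the", "market", "is", "big", "today", "no",
       "news", "here"]                                 # 1/8 = 0.125
print(weirdness("market", spec, gen))  # 4.0
```
        </p>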
        <p>The weirdness index is used to retrieve words
that are highly typical of a particular domain. For
instance, in Ahmad et al. (1999), the words
dollar, government and market are extracted from the
TREC-8 corpus, a collection of governmental and
financial documents, by comparing their frequencies
to those in the general-domain British National Corpus.</p>
        <p>In this work, we propose a new application of
the weirdness index to the task of text
classification. Rather than comparing the frequencies of
words from corpora of different domains, we
compute the weirdness index based on the frequency
of words occurring in labeled datasets. The
mechanism is straightforward: instead of comparing the
relative frequencies of a word in a special
language corpus against a general language corpus,
we compare the relative frequencies of a word
as it occurs in the subset of a labeled dataset
identified by one value of the label against its
complement. Consider a labeled corpus C =
{(e1, l1), (e2, l2), ...} where ei = {w1, w2, ...} is
an instance of text (e.g., an online comment), and
li is the label associated with ei, belonging to a
fixed set L (e.g., {positive, negative}).</p>
        <p>
          The polarized weirdness
          <xref ref-type="bibr" rid="ref10">(Florio et al., 2020)</xref>
          of
w with respect to a specific label l ∈ L is the
ratio of the relative frequency of w in the subset
{ei ∈ C : li = l} over the relative frequency of
w in the subset {ei ∈ C : li ≠ l}.
        </p>
        <p>Here is an example of how polarized weirdness
is computed. Consider a corpus of 100 instances,
50 of which labeled positive and 50 labeled
negative. The total number of words in instances
labeled positive is 3,000, while the total number of
words in instances labeled negative is 2,000. The
word good occurs 50 times in positive instances
and 5 times in negative instances. Therefore its
polarized weirdness with respect to the positive label
is:</p>
        <p>P Wpositive(good) =
However, the polarized weirdness of good with
respect to the negative label is:</p>
        <p>P Wnegative(good) =
indicating that good is much more indicative of
positiveness than negativeness.</p>
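        <p>The worked example can be checked with a few lines of code (an illustrative sketch; the function name is ours):
```python
def polarized_weirdness(count_in, total_in, count_out, total_out):
    """Relative frequency of a word among tokens with a given label,
    divided by its relative frequency in the complement subset."""
    return (count_in / total_in) / (count_out / total_out)

# "good": 50 of 3,000 tokens in positive instances,
#          5 of 2,000 tokens in negative instances
pw_pos = polarized_weirdness(50, 3000, 5, 2000)
pw_neg = polarized_weirdness(5, 2000, 50, 3000)
print(round(pw_pos, 2), round(pw_neg, 2))  # 6.67 0.15
```
        </p>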
        <p>Polarized weirdness can be computed at a low
computational cost on any dataset labeled with
categorical values, with just tokenization for
preprocessing. The outcome of the calculation of the
polarized weirdness index is a set of rankings, one
for each label, over the vocabulary, where the top
words in the ranking relative to a given label l are
the most characteristic for that label.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2 Word Embedding Adaptation</title>
        <p>In Section 3.1, we introduced an automatic
metric that allows us to compute how much a word is
characteristic of a certain label. We use this
information to move the vectors representing words
highly typical of a label closer to each other in
the vector space. Formally, once a label has been
decided and the polarized weirdness is computed
with respect to it, for each pair of vectors ~v1; ~v2 in
a word embedding model, representing words with
polarized weirdness pw1 and pw2 respectively, we
compute new representations:
~v1 = ((1 − α pw1) ~v1) + ((α pw2) ~v2)
~v2 = ((1 − α pw2) ~v2) + ((α pw1) ~v1)
where α is a parameter controlling the extent of
the adaptation. The result of the application of
this algorithm is a new word embedding model
over the same vocabulary as the original model,
where pairs of word vectors are closer in the space
to an extent proportional to their respective
polarized weirdness score.</p>
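        <p>The pairwise update rule can be sketched with NumPy (an illustrative sketch, assuming polarized weirdness scores rescaled to the range [0, 1]):
```python
import numpy as np

def adapt_pair(v1, v2, pw1, pw2, alpha=0.5):
    """Move two word vectors toward each other in proportion
    to their polarized weirdness scores; alpha controls the
    extent of the adaptation."""
    new_v1 = (1 - alpha * pw1) * v1 + (alpha * pw2) * v2
    new_v2 = (1 - alpha * pw2) * v2 + (alpha * pw1) * v1
    return new_v1, new_v2

v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])
# with equally high scores, the two vectors are pulled together
a, b = adapt_pair(v1, v2, pw1=0.8, pw2=0.8)
print(a, b)  # [0.6 0.4] [0.4 0.6]
```
        </p>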
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4 Experimental Evaluation</title>
      <p>We test the word embedding adaptation introduced
in Section 3 by adapting pre-trained multilingual
word embeddings to three different tasks. For
each task, the polarized weirdness index is
computed on the labeled training sets as described in
Section 3.1, and the generic word embeddings are
adapted to the particular task domain applying the
algorithm described in Section 3.2.</p>
      <p>
        Our baseline model is a convolutional neural
network (CNN) with a 64x8 hidden layer and
Rectified Linear Units activation (ReLU), followed by
a 4-size max pooling layer. We use the
implementation from the Keras Python library
(https://keras.io/), with
ADAM optimization
        <xref ref-type="bibr" rid="ref12">(Kingma and Ba, 2014)</xref>
        ,
leaving the hyperparameters at their default values,
except for the learning rate (set
between 10^-2 and 10^-3 depending on the dataset)
and the number of epochs (between 10 and 25).
      </p>
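      <p>The baseline might be sketched as follows (a hedged reconstruction: the vocabulary size, the sequence length, and the reading of the 64x8 hidden layer as a 1D convolution with 64 filters of width 8 are our assumptions, not taken from the paper):
```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len, emb_dim = 5000, 50, 64   # hypothetical sizes

model = keras.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, emb_dim),    # 64-dimensional word vectors
    layers.Conv1D(64, 8, activation="relu"),  # 64 filters of width 8, ReLU
    layers.MaxPooling1D(4),                   # 4-size max pooling
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),    # binary detection output
])

# initialize the embedding layer with pre-trained (or adapted) vectors;
# here a random matrix stands in for the Polyglot embeddings
pretrained = np.random.rand(vocab_size, emb_dim)
model.layers[0].set_weights([pretrained])

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
```
      </p>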
      <p>
        We use the multilingual word embeddings
provided by Polyglot
        <xref ref-type="bibr" rid="ref2">(Al-Rfou et al., 2013)</xref>
        . These
are distributed word representations for over 100
languages trained on Wikipedia. The vector
representations of words in Polyglot are
64-dimensional. The choice of this model is
motivated by the need to have word embedding models
for different languages that were created with the
same method, so as to measure improvements
introduced merely by our adaptation method. In
these experiments, we set α = 0.5.
      </p>
      <sec id="sec-5-1">
        <title>4.1 Experiment 1: Multilingual Hate Speech Detection</title>
        <p>
          In the first experiment, the generic word
embeddings are adapted to provide a better
representation for words used in online messages containing
hate speech towards women and immigrants. We
use the dataset provided by the SemEval Task 5
(HatEval: Multilingual Detection of Hate Speech
Against Immigrants and Women in Twitter), a
public challenge where participants are invited to
submit the predictions of systems for hate speech
detection
          <xref ref-type="bibr" rid="ref5">(Basile et al., 2019)</xref>
          . In particular, we
employ the data of subtask A, where the
prediction is binary (hateful vs. not hateful). The
shared task website (https://competitions.codalab.org/competitions/19935)
provides datasets in Spanish
and English, already divided into training,
development and test sets. The topics of the messages
are mainly two, namely women and immigrants,
in a fairly balanced proportion. In fact, the dataset
has been created by querying the Twitter API with
a set of keywords crafted to capture these two
topics. The English dataset comprises 13,000 tweets
(10,000 for training and 3,000 for testing), with
about 42% of the messages labeled as hateful. The
Spanish dataset is smaller (6,600 tweets in total,
5,000 for training and 1,600 for testing), and it
follows a similar distribution of topics and labels as
the English set. Following are two examples of
tweets from the English HatEval data, with their
Hate Speech label:
        </p>
        <p>I’d say electrify the water but that would kill
wildlife. #SendThemBack
label: yes</p>
        <p>Polish Prime Minister Mateusz Morawiecki
insisted that Poland would push against any
discussion on refugee relocations as part of
the EU’s migration politics.
label: no</p>
        <p>Similarly, two examples of tweets from the
Spanish HatEval data, with translation and label:</p>
        <p>@rubenssambueza eres una basura de
persona, lo cual no me sorprende porque eres
SUDACA, y asi son los tercermundistas
@rubenssambueza you are garbage, which does
not surprise me because you are a SUDACA, and
so are third-worlders
label: yes</p>
        <p>Yo creía que ese jueguito solo existía para
los árabes, jajaja.
I thought that this little game was only for Arabs,
ahahah.
label: no</p>
        <p>
The polarized weirdness of the words in the
HatEval datasets (English and Spanish) is computed
on the respective training sets as the ratio of their
relative frequency in hateful messages over their
relative frequency in non-hateful messages. A
modified version of the Polyglot embeddings is then
computed (to speed up the computation without major
loss of information, we consider only the top 2,000
items from the weirdness ranking), and the
performance of the CNN using
the adapted embeddings for initialization is
compared with the performance obtained by
initializing the CNN with the generic embeddings.</p>
        <p>The results on the English dataset, presented
in Table 1, show a clear improvement in the
detection of hateful messages, leading to a +1.2%
performance gain in macro-average F1-score.
Recall is particularly impacted by the adapted
embeddings, indicating that the modified model
successfully helps in correcting false negatives.</p>
        <p>The results on the Spanish HatEval task dataset,
presented in Table 1, are even better than on
English, with improvements in precision and recall
for both the positive and the negative class, and a
total gain of almost 2% macro-averaged F1-score.
Similarly to English, the largest improvement is
measured on the recall.</p>
        <p>One of the advantages of the proposed method
is that it is transparent with respect to the
semantic shift computed on the pre-trained
embeddings. Firstly, the words with the highest
polarized weirdness index can be extracted, to
gain insights into the specificity of the datasets.
The top twenty weird words in the hateful
English HatEval set are the following: nodaca,
enddaca, kag, womensuck, @hillaryclinton,
americafirst, trump2020, taxpayers, buildthewallnow,
illegals, @senatemajldr, dreamer, buildthewall,
they, @potus, walkawayfromdemocrat,
votedemsout, wethepeople, illegalalien, backtheblue. The
top twenty weird words in the hateful Spanish
HatEval set with English translations are the
following: mantero (street vendor), turista (tourist),
negratas (nigger), caloría (calorie), sanidad
(healthcare), drogar (to drug), paises (countries),
emigrante (emigrant), Hija (daughter), ZORRA
(bitch), impuesto (tax), zorro (bitch (masculine)),
totalmente (totally), lleno (full), invasor (invader),
costumbre (custom), barrio (neighborhood), PAIS
(country), Oye (hey), Españoles (Spaniards).</p>
        <p>Secondly, one can extract the word embeddings
after the polarized weirdness adaptation is applied,
and qualitatively inspect their respective position
in the vector space. Table 2 shows how certain
pairs of words become more related in the adapted
space, while others are untouched by the process.
The example in Spanish is particularly interesting
(and worrying), where a misogynistic derogatory
word (puta) becomes closer to the feminine
inflection of “director” but not to the masculine
inflection.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.2 Experiment 2: Gender Prediction</title>
        <p>In the second experiment, we test our word
embedding adaptation method in a different scenario,
that is, the prediction of the gender of the author
of messages. The assumption is that the most
typical words used by each gender will cluster in the
vector representation, thus helping the model to
discriminate between them.</p>
        <p>
          We use the dataset distributed for the
CrossGenre Gender Prediction in Italian (GxG) shared
task of the 2018 edition of EVALITA, the
evaluation campaign of language technologies for
Italian
          <xref ref-type="bibr" rid="ref11 ref3 ref4 ref6 ref8 ref9">(Dell’Orletta and Nissim, 2018)</xref>
          . The
participants in the shared task are invited to submit
the predictions of their system on the gender of the
author of a set of short and medium-length texts in
Italian from different sources, including social
media, news articles and personal diaries.
The task is therefore a binary classification,
evaluated by means of accuracy. We downloaded the
data from the task website
(https://sites.google.com/view/gxg2018/), comprising 22,874
instances divided into training set (11,000) and test
set (10,874). The labels of the GxG dataset are perfectly
balanced between M (male) and F (female).
        </p>
        <p>Following are two examples of instances from
the GxG dataset with their label and translation:</p>
        <p>@ElfoBruno no la barba la devo tenere
lunga per sembrare folta perché in realtà è
rada...
@ElfoBruno no I have to keep the beard long to
make it look thick because it really is patchy...
label: M</p>
        <p>Sabato prossimo sono davvero curiosa di
scoprire cosa farà @Valerio Scanu a
#BallandoConLeStelle
Next Saturday I am very curious to find out what
@Valerio Scanu will do at
#DancingWithTheStars
label: F</p>
        <p>
Since this is a classification rather than a
detection task, the process is slightly different from the
previous experiment, to account for the symmetry
between the labels. First, the polarized weirdness
is computed on the training set twice, once on the
texts written by males (against the women’s texts)
and once on the texts written by females (against
the men’s texts). Then the general Polyglot
embeddings are adapted by applying the algorithm
in Section 3.2 twice, in both directions, using the
respective weirdness rankings. The adapted
embeddings are used to initialize the CNN, resulting
in the classification performance presented in
Table 3. The overall performance improves when
the adapted embeddings are included in the model.
However, the classification of the male label
improves while the classification of female does not,
due to the difference in recall.</p>
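        <p>The symmetric procedure described above can be sketched as follows (a minimal illustration with made-up scores; adapt_embeddings is our own naive pairwise wrapper over the update of Section 3.2, assuming weirdness scores rescaled to [0, 1]):
```python
import numpy as np

def adapt_embeddings(emb, pw, alpha=0.5):
    """One adaptation pass: pull each pair of ranked words together
    in proportion to their polarized weirdness scores."""
    words = list(pw)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            v1, v2 = emb[w1].copy(), emb[w2].copy()
            emb[w1] = (1 - alpha * pw[w1]) * v1 + alpha * pw[w2] * v2
            emb[w2] = (1 - alpha * pw[w2]) * v2 + alpha * pw[w1] * v1
    return emb

emb = {"samp": np.array([1.0, 0.0]), "allenatore": np.array([0.0, 1.0]),
       "leonessa": np.array([1.0, 1.0]), "coreografia": np.array([-1.0, 1.0])}
# adapt once per label ranking: toward M, then toward F
emb = adapt_embeddings(emb, {"samp": 0.9, "allenatore": 0.8})
emb = adapt_embeddings(emb, {"leonessa": 0.7, "coreografia": 0.6})
```
        </p>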
        <p>Qualitative analysis reveals interesting patterns,
confirming that strong bias is present in some
pre-trained word embedding models. The top
twenty weird words in the Male GxG set are:
costituzionale (constitutional), socialisto (socialist),
Lecce (name of a city and a football club),
DALLA (name of a singer), utente (user), Samp
(name of a football team), Sampdoria (name of a
football team), Nera (black), allenatore (coach),
Orlando (proper name), Bp (acronym), ni (yes and
no), maresciallo (marshal), garanzia (guarantee),
cerare (to wax), voluto (willing), pilotare (to pilot),
disco (disco), caserma (barracks), From (proper
name).</p>
        <p>The top twenty weird words in the Female
GxG set are instead the following: qualcuna
(someone (feminine)), HEART EMOJI,
Qualcuna (someone (feminine)), KISS EMOJI, 83
(number), essi (them), leonessa (lioness), Sarah
(proper name), 06 (number), HEART-EYED
EMOJI, nervoso (nervous), James (proper name),
Dante (proper name), coreografia (choreography),
Strada (street), Fra (proper name), Chiama (call),
en (en), bravissimi (very good (plural)), Moratti
(proper name). Arguably, a stronger topic bias
(football) is present in the male subset, possibly
explaining the better performance induced by the
adaptation.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Conclusion and Future Work</title>
      <p>In this work, we adapted an extension of the
weirdness index to score the words in a labeled corpus
according to how much they are typical of a given
label. The polarized weirdness score is used to
automatically adapt an existing word embedding
space to better reflect target-specific semantic
associations of words. We measured a performance
boost on tasks of hate speech detection in English
and Spanish, and gender prediction in Italian.</p>
      <p>On detection tasks, the improvement from our
method is remarkable in terms of recall,
indicating the potential of weirdness-adapted word
embeddings to correct false negatives. This result
is in line with the original motivation for this
approach, i.e., to account for semantic shift
occurring in domain-specific corpora of opinionated
content. For instance, in the hate speech domain,
the adapted embeddings are able to capture that
certain neutral words (e.g., “wall”) assume a
polarized connotation (e.g., negatively charged).</p>
      <p>The results from this study are promising, and
encourage us to extend the method to richer
representations (e.g., “weird” ngrams), languages other
than European ones, and to integrate it into more
sophisticated deep neural models. Recent
Transformer models, in particular, compute
contextualized embeddings, thereby applying
transformations similar to the one proposed here. Although
such models are less transparent with respect to
this transformation, an experimental comparison
is among the next steps planned in this research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Khurshid</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lee</given-names>
            <surname>Gillam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lena</given-names>
            <surname>Tostevin</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>University of surrey participation in trec8: Weirdness indexing for logical document extrapolation and retrieval (wilder)</article-title>
          .
          <source>In The Eighth Text REtrieval Conference (TREC-8)</source>
          , Gaithersburg, Maryland, November.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Rami</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bryan</given-names>
            <surname>Perozzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Steven</given-names>
            <surname>Skiena</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Polyglot: Distributed word representations for multilingual nlp</article-title>
          .
          <source>In Proceedings of the Seventeenth Conference on Computational Natural Language Learning</source>
          , pages
          <fpage>183</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Barnes</surname>
          </string-name>
          , Roman Klinger, and
          <article-title>Sabine Schulte im Walde. 2018a. Bilingual sentiment embeddings: Joint projection of sentiment across languages</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>2483</fpage>
          -
          <lpage>2493</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Barnes</surname>
          </string-name>
          , Roman Klinger, and
          <article-title>Sabine Schulte im Walde. 2018b. Projecting embeddings for domain adaption: Joint modeling of sentiment analysis in diverse domains</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>818</fpage>
          -
          <lpage>830</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Rangel, Paolo Rosso, and
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Semeval2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter</article-title>
          .
          <source>In Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-</source>
          <year>2019</year>
          ).
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Felice Dell'Orletta, Fabio Poletto, Manuela Sanguinetti, and
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Tesconi</surname>
          </string-name>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>Overview of the EVALITA 2018 hate speech detection task</article-title>
          .
          <source>In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          )
          <article-title>co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it</article-title>
          <year>2018</year>
          ), Turin, Italy,
          <source>December 12-13</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Felice</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 cross-genre gender prediction (gxg) task</article-title>
          .
          <source>In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          )
          <article-title>co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it</article-title>
          <year>2018</year>
          ), Turin, Italy,
          <source>December 12-13</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Paolo Rosso, and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Anzovino</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the task on automatic misogyny identification at ibereval 2018</article-title>
          . In IberEval@SEPLN, volume
          <volume>2150</volume>
          <source>of CEUR Workshop Proceedings</source>
          , pages
          <fpage>214</fpage>
          -
          <lpage>228</lpage>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Komal</given-names>
            <surname>Florio</surname>
          </string-name>
          , Valerio Basile, Marco Polignano, Pierpaolo Basile, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Time of your hate: The challenge of time in hate speech detection on social media</article-title>
          .
          <source>Applied Sciences</source>
          ,
          <volume>10</volume>
          (
          <issue>12</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Prathusha</given-names>
            <surname>Kameswara Sarma</surname>
          </string-name>
          , Yingyu Liang, and
          <string-name>
            <given-names>Bill</given-names>
            <surname>Sethares</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Domain adapted word embeddings for improved sentiment classification</article-title>
          .
          <source>In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP</source>
          , pages
          <fpage>51</fpage>
          -
          <lpage>59</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Diederik P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>CoRR</source>
          , abs/1412.6980.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Kullback</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Leibler</surname>
          </string-name>
          .
          <year>1951</year>
          .
          <article-title>On information and sufficiency</article-title>
          .
          <source>The Annals of Mathematical Statistics</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Ted</given-names>
            <surname>Pedersen</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Information content measures of semantic similarity perform better without sense-tagged text</article-title>
          .
          <source>In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          , pages
          <fpage>329</fpage>
          -
          <lpage>332</lpage>
          , Los Angeles, California, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Peters</surname>
          </string-name>
          , Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers)
          , pages
          <fpage>2227</fpage>
          -
          <lpage>2237</lpage>
          , New Orleans, Louisiana, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>