<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Hierarchical Neural Network Approach for Bots and Gender Profiling</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Istituto di Linguistica Computazionale “Antonio Zampolli”</institution>
          ,
          <addr-line>ILC-CNR</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>In this paper we describe our participation in the Bots and Gender Profiling shared task of PAN@CLEF 2019 for the English language. We tested three approaches based on three different document classification algorithms. The first is an SVM classifier with handcrafted features drawing on a wide set of linguistic information. The second and third exploit recent advances in Natural Language Processing: a hierarchical GRU-LSTM neural network using word embeddings trained on Twitter data, and an adaptation of the BERT system. After an in-house evaluation, we submitted the final run with the hierarchical neural network model, which achieved an accuracy of 0.9083 on the Bots Profiling task and of 0.7898 on the Gender Profiling task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Nowadays, the growing importance of social media platforms in everyday life has made
their users highly susceptible to messages and posts written by companies, political
parties or even social media influencers. In recent years it has been shown that such
platforms were exploited to spread fake news or to support commercial activities
through sophisticated techniques, such as very convincing bots, for large-scale opinion
manipulation. For this reason, the biggest social media platforms, such as Facebook
or Twitter, started using algorithms to automatically detect and delete such bot
accounts, but the latest advances in Natural Language Generation, such as GPT-2 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
make automatic bot detection a still challenging problem. One of the approaches
commonly used by these platforms to detect a bot is the classification of a set of
documents (e.g. tweets) rather than a single document, since the set of documents
written by a bot usually follows a common lexical and stylistic pattern. With
respect to the previous PAN shared tasks, the PAN 2019: Bots and Gender Profiling task
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] introduces the novelty of asking participants to identify, given a set of tweets, the
type of user (bot or human) and, in case the user is classified as human, to
predict the gender (male or female). We addressed the Bots and Gender Profiling task as
a 3-class classification problem and developed three different classifier models: one
that uses a classic approach based on the extraction of linguistic features and lexicon
lookups with the linear SVM algorithm, and two more recent neural network based
solutions. The first is based on a hierarchical GRU-LSTM deep neural network, and the
second on a language model based neural network (BERT), which we adapted in order
to handle long documents.
      </p>
      <p>This paper makes the following contributions:
1. We propose a comparison between a more classical classification approach, with
heavily engineered handcrafted features learned by a linear SVM algorithm, and a
low-engineered approach based on neural network models.
2. We show that the hierarchical GRU-LSTM deep neural network performs better
than BERT, a widely used pretrained neural language model.</p>
      <p>In the following sections of this paper, we first review related work in Section
2. The preprocessing step and the external resources used are described in Section 3.
Our models are described in Section 4. The details of the experiments used to
assess the model performances are reported in Section 5. Finally, we conclude the
paper and outline future work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        This year's edition of the PAN Bots and Gender Profiling shared task focuses on
automatically identifying, on the Twitter platform, the type of user (bot or human) and, in
the case of a human, detecting the gender. This is a slightly different version of the
previous year's shared task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], where participants were asked to identify the gender based
not only on textual content, but also exploiting images posted on the Twitter platform.
As for the classification task using only the textual component, the linear
SVM learning algorithm with heavily engineered features has been shown to be a very
effective method for gender identification. Daneshvar and Inkpen [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] used word
n-grams and character n-grams with dimensionality reduction techniques and achieved
the best score for the English and Spanish languages. A similar solution was used by
the second best participant (Tellez et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), who achieved the second best score for the
English language and the best score for the Arabic language. Surprisingly, deep
learning based sequential models did not achieve very good results when considering only
the textual components. The best model for the English language using this kind of
architecture was presented by Takahashi et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The authors used a textual component
composed of word embedding, recurrent neural network (GRU), pooling, and fully
connected layers. When tested on the English language, they achieved an accuracy of
0.7864, 4 points below the state of the art. On the other hand, their model, when combined
with visual information, achieved the best average scores across the Arabic, Spanish
and English languages, showing that deep learning architectures yield strong results
especially when combining multi-modal information.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Preprocessing and resources</title>
      <p>
        The training dataset provided by the task organizers consists of 4,120 examples, each
containing a set of tweets written on the Twitter platform by a bot, a male or a
female. In our approach we concatenated the tweets contained in each example to
produce the document, which is our classification unit. We used the "SEP" token as
tweet separator in order to preserve the tweet boundary information, which is
used by our models. Since the SVM model relies on morpho-syntactically tagged texts,
both training and test data were automatically morpho-syntactically tagged by our POS
tagger described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In addition, in order to improve the overall accuracy of our
models, we used an existing sentiment polarity lexicon and developed a word embedding
lexicon for English tweets.
      </p>
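      <p>As a minimal illustration of this step, the following sketch (in Python, assuming an author's tweets are already loaded as a list of strings) shows how a set of tweets can be concatenated into a single document with the reserved "SEP" token marking tweet boundaries:</p>
      <preformat>
# Minimal sketch of the preprocessing step: an author's tweets are
# concatenated into one classification unit, with the reserved "SEP"
# token between tweets so that tweet boundaries stay visible.
def build_document(tweets):
    """Join an author's tweets into a single document string."""
    return " SEP ".join(t.strip() for t in tweets)

# Toy usage:
doc = build_document(["first tweet!", "second tweet :)"])
# doc == "first tweet! SEP second tweet :)"
      </preformat>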
      <sec id="sec-3-1">
        <title>Sentiment Polarity Lexicon</title>
        <p>
          We used the SentiWordnet 3.0 sentiment polarity lexicon [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. This is a freely available
lexicon for the English language (https://github.com/aesuli/sentiwordnet) and includes
more than 117,000 English word entries. It was automatically created using a
semi-supervised step followed by a final random-walk step that refines the final positive
and negative polarity scores.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Word embedding Lexicon</title>
        <p>
          In order to extract semantic information from words we created a word embedding
lexicon using the word2vec toolkit (http://code.google.com/p/word2vec/) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. As recommended in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], we used the CBOW
model, which learns to predict the word in the middle of a symmetric window based on
the sum of the vector representations of the words in the window; this yields
low-dimensional word embeddings. For our experiments, we considered a context
window of 5 words. The word embedding lexicon was built using a set of 19,700,117
English tweets downloaded from the Twitter platform. In order to test the contribution
of the embeddings to classification w.r.t. the vector size, we generated vectors of size
16, 32, 64 and 128.
        </p>
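        <p>A sketch of how such a lexicon could be trained is shown below, using the gensim implementation of word2vec rather than the original toolkit; the gensim 4 parameter names and the frequency cut-off are assumptions of this illustration, and tokenized_tweets stands for an iterable of token lists, one per downloaded tweet:</p>
        <preformat>
# Illustrative sketch: training CBOW embeddings of the four tested
# sizes with gensim's word2vec implementation.
from gensim.models import Word2Vec

for dim in (16, 32, 64, 128):          # the four vector sizes tested
    model = Word2Vec(sentences=tokenized_tweets,
                     vector_size=dim,  # embedding dimensionality
                     window=5,         # symmetric 5-word context window
                     sg=0,             # sg=0 selects the CBOW model
                     min_count=5,      # assumed frequency cut-off
                     workers=4)
    model.wv.save("twitter_cbow_%d.kv" % dim)
        </preformat>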
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The proposed models</title>
      <p>In this section we describe the three models devised for our participation in
the Bots and Gender Profiling shared task.</p>
      <sec id="sec-4-1">
        <title>The SVM Model</title>
        <p>
          The SVM classifier exploits a wide set of features ranging across different levels of
linguistic description. All these features had already been tested in our previous
participation in EVALITA 2018 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], the periodic evaluation campaign of Natural Language
Processing (NLP) and speech tools for the Italian language.
        </p>
        <sec id="sec-4-1-1">
          <title>1 https://github.com/aesuli/sentiwordnet 2 http://code.google.com/p/word2vec/</title>
          <p>
            The features are organised into three main categories: raw and lexical text features,
morpho-syntactic features and lexicon features. All the computed features are the input
of the linear SVM algorithm implemented in the liblinear [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] library, which
generates the final statistical model then used to classify unseen documents.
Raw and Lexical Text Features
Number of tokens: the average number of tokens of an analyzed tweet.
Character n-grams: presence or absence of contiguous sequences of characters in the
analyzed tweets.
          <p>Word n-grams: presence or absence of contiguous sequences of tokens in the analyzed
tweets.</p>
          <p>Lemma n-grams: presence or absence of contiguous sequences of lemmas occurring in
the analyzed tweets.</p>
          <p>Repetition of character n-grams: presence or absence of contiguous repetitions of
characters in the analyzed tweets.</p>
          <p>Number of mentions: number of mentions (@) occurring in the analyzed tweets.
Number of hashtags: number of hashtags occurring in the analyzed tweets.
Punctuation: the number of tweets that end with one of the following punctuation
characters: “?”, “!”.</p>
          <p>Morpho-syntactic Features
Coarse grained Part-Of-Speech n-grams: presence or absence of contiguous sequences
of coarse–grained PoS, corresponding to the main grammatical categories (noun, verb,
adjective).</p>
          <p>Fine grained Part-Of-Speech n-grams: presence or absence of contiguous sequences
of fine-grained PoS, which represent subdivisions of the coarse-grained tags (e.g. the
class of nouns is subdivided into proper vs common nouns, verbs into main verbs,
gerund forms, past participles).</p>
          <p>Coarse grained Part-Of-Speech distribution: the distribution of nouns, adjectives,
adverbs and numbers in the tweets.</p>
          <p>Lexicon Features
Emoticons: presence or absence of positive or negative emoticons in the analyzed
tweets. The lexicon of emoticons was extracted from the site http://it.wikipedia.org/wiki/Emoticon
and manually classified.</p>
          <p>Lemma sentiment polarity n-grams: for each n-gram of lemmas extracted from the
analyzed tweet, the feature checks the polarity of each component lemma in the existing
sentiment polarity lexicons. Lemmas that are not present are marked with the ABSENT
tag. This is for example the case of the trigram all very nice, which is marked as
“ABSENT-POS-POS” since very and nice are marked as positive in the considered polarity
lexicon and all is absent. The feature is computed exploiting the SentiWordnet 3.0
lexicon resource.</p>
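          <p>A minimal sketch of this feature follows; the dictionary-style lexicon interface is a simplifying assumption, since the real feature works on lemmatized text and the SentiWordnet scores:</p>
          <preformat>
# Illustrative sketch of the lemma sentiment polarity n-gram feature:
# each lemma in an n-gram is mapped to its polarity tag from the
# lexicon, or to ABSENT when the lemma is not listed.
def polarity_ngram(lemmas, lexicon):
    """lexicon maps lemma to 'POS' or 'NEG'; lemmas is one n-gram."""
    return "-".join(lexicon.get(lemma, "ABSENT") for lemma in lemmas)

# Toy lexicon reproducing the example from the text:
lex = {"very": "POS", "nice": "POS"}
print(polarity_ngram(["all", "very", "nice"], lex))  # ABSENT-POS-POS
          </preformat>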
          <p>Polarity modifier: for each lemma in the tweets occurring in the existing sentiment
polarity lexicons, the feature checks the presence of adjectives or adverbs in a left
context window of size 2. If this is the case, the polarity of the lemma is assigned to the
modifier. This is for example the case of the bigram not interesting, where “interesting” is
a positive word and “not” is an adverb; accordingly, the feature “not_POS” is created.
The feature is computed exploiting the SentiWordnet 3.0 lexicon resource.</p>
        <p>Distribution of sentiment polarity: this feature computes the percentage of positive,
negative and neutral lemmas that occur in the tweets. To overcome the sparsity
problem, the percentages are rounded to the nearest multiple of 5. The feature is computed
exploiting the SentiWordnet 3.0 lexicon resource.</p>
        <p>Most frequent sentiment polarity: the feature returns the most frequent sentiment
polarity of the lemmas in the analyzed tweets. The feature is computed exploiting the
SentiWordnet 3.0 lexicon resource.</p>
          <p>Word embeddings combination: the feature returns the vectors obtained by separately
computing the average of the word embeddings of the nouns, adjectives and verbs of
the tweet, obtaining a total of 3 vectors for each tweet. If a specific morpho-syntactic
category is not present, a feature indicating the absence of that category is added.</p>
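          <p>As a simplified sketch of how such a classifier could be assembled, the snippet below combines only the character and word n-gram features with a linear SVM; the full model additionally stacks the morpho-syntactic and lexicon features described above, and the n-gram ranges shown here are assumptions of the illustration. LinearSVC is scikit-learn's wrapper around the same liblinear library used in our system.</p>
          <preformat>
# Simplified sketch of the SVM pipeline (char/word n-grams only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

features = FeatureUnion([
    ("char_ngrams", CountVectorizer(analyzer="char", ngram_range=(2, 4),
                                    binary=True)),  # presence/absence
    ("word_ngrams", CountVectorizer(analyzer="word", ngram_range=(1, 2),
                                    binary=True)),
])
clf = Pipeline([("features", features), ("svm", LinearSVC())])

# Toy data standing in for the concatenated-tweet documents and labels:
docs = ["tweet one SEP tweet two", "another doc SEP more text"]
labels = ["bot", "female"]
clf.fit(docs, labels)
          </preformat>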
      </sec>
      <sec id="sec-4-3">
        <title>The BERT Model</title>
        <p>
          Following the latest advances in NLP, we wanted to test how well pretrained language
model representations behave on the Bots and Gender Profiling shared task.
Context-free models such as word2vec generate a single vector for each word, which is
independent of the context in which the word is found. For example, the word "bank"
can have two different meanings depending on the context in which the word
appears. Language models like BERT [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] or ELMo [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] make it possible to obtain a distinct
vector for each word based also on the context, which makes such models very
suitable for many NLP downstream tasks. In order to test the performance of these models,
we chose BERT, since Google provides pretrained models (https://github.com/google-research/bert)
that only need to be fine-tuned with an inexpensive procedure. Among the models available
on the github repository, we chose the recommended one: BERT-Base Multilingual Cased,
which is trained on 104 languages with 110M parameters. One of the limitations of this
pretrained model is that it was trained on sequences not longer than 512 tokens,
which made the standard fine-tuning procedure unsuitable for our case, since the
training documents (the concatenation of the tweets) were much longer than 512 tokens. For
this reason, we generated 5 different fine-tuned downstream task models by
considering 5 chunks of 500 tokens each. In the testing phase, each document was likewise
divided into 5 chunks; each chunk was then classified by the 5 previously fine-tuned
models. We then chose as the winning class, among bot, male and female, the majority
class over all the predictions of the 5 models on the 5 chunks.
        </p>
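          <p>The following sketch illustrates this chunk-and-vote strategy; the models variable stands for the 5 fine-tuned classifiers, with a hypothetical predict(text) interface returning one of the three labels:</p>
          <preformat>
# Sketch of the chunking strategy used to work around BERT's 512-token
# limit: the document is cut into 5 chunks of 500 tokens, every chunk
# is classified by every fine-tuned model, and the majority label wins.
from collections import Counter

def classify_document(tokens, models, chunk_len=500, n_chunks=5):
    chunks = [tokens[i * chunk_len:(i + 1) * chunk_len]
              for i in range(n_chunks)]
    votes = [model.predict(" ".join(chunk))   # hypothetical interface
             for model in models
             for chunk in chunks]
    return Counter(votes).most_common(1)[0][0]
          </preformat>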
      </sec>
      <sec id="sec-4-4">
        <title>The Hierarchical GRU/LSTM Model</title>
        <p>
          GRU units are able to propagate important features appearing early in the input
sequence over a long distance, thus capturing potential long-distance dependencies.
Unfortunately, it has been shown that such dependencies are lost in the case of very long
sequences. For this reason, since we treat the set of tweets to be classified as a single
document, we resorted to a two-layer hierarchical GRU/LSTM architecture. In
addition, each document containing the set of tweets to be analyzed is first truncated to the
first 2,500 tokens. This is done since in-house experiments did not show
a significant drop in performance w.r.t. analyzing all the tweets contained in a single
example. Moreover, the truncation allows faster training, considering
the number of tweets in the training set. Each document is then split into 5
chunks of 500 tokens, which are the input of five different GRU units (48 dimensions),
producing 5 "chunk" embeddings. Finally, all the chunk embeddings are the input
of a final LSTM layer (48 dimensions). Figure 1 shows a graphical representation of the
hierarchical GRU-LSTM architecture. We applied a dropout factor to both the input gates
and the recurrent connections in order to prevent overfitting, a typical issue
in neural networks [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. We chose a dropout factor of 0.55. As for
the optimization process, the categorical cross entropy function is used as loss
function and optimization is performed by the rmsprop optimizer [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
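          <p>A sketch of this architecture in Keras follows, using the stated hyper-parameters (GRU/LSTM size 48, dropout 0.55, rmsprop, categorical cross entropy). This is an illustration rather than our exact implementation; FEAT_DIM stands for the size of the per-token feature vector described below, which sums to 42 dimensions.</p>
          <preformat>
# Illustrative Keras sketch of the hierarchical GRU-LSTM architecture.
import tensorflow as tf
from tensorflow.keras import layers, models

CHUNKS, CHUNK_LEN, FEAT_DIM = 5, 500, 42

doc_input = layers.Input(shape=(CHUNKS, CHUNK_LEN, FEAT_DIM))
# One GRU (48 units) per 500-token chunk, yielding 5 chunk embeddings;
# dropout 0.55 on both input and recurrent connections.
chunk_vecs = []
for i in range(CHUNKS):
    chunk = layers.Lambda(lambda x, i=i: x[:, i])(doc_input)
    chunk_vecs.append(layers.GRU(48, dropout=0.55,
                                 recurrent_dropout=0.55)(chunk))
stacked = layers.Lambda(lambda xs: tf.stack(xs, axis=1))(chunk_vecs)
# A final LSTM (48 units) combines the 5 chunk embeddings.
doc_vec = layers.LSTM(48, dropout=0.55, recurrent_dropout=0.55)(stacked)
output = layers.Dense(3, activation="softmax")(doc_vec)  # bot/male/female

model = models.Model(doc_input, output)
model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
              metrics=["accuracy"])
          </preformat>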
          <p>Furthermore, we adopted a 5-fold training approach. More precisely, we built
5 different models using different training and validation sets. These models are then
exploited in the classification phase: the assigned label is the one that obtains the
majority among all the models. The 5-fold strategy was chosen in order to
obtain a global model that should be less prone to overfitting or underfitting w.r.t.
a single learned model.</p>
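          <p>The classification phase of this ensemble reduces to a majority vote, as in the minimal sketch below (fold_models stands for the 5 trained fold models; the input shape follows the sketch above):</p>
          <preformat>
# Sketch of the 5-fold ensemble vote: the label assigned to a document
# is the class predicted by the majority of the 5 fold models.
from collections import Counter
import numpy as np

def ensemble_predict(doc_batch, fold_models):
    """doc_batch: array of shape (1, CHUNKS, CHUNK_LEN, FEAT_DIM)."""
    votes = [int(np.argmax(m.predict(doc_batch), axis=-1)[0])
             for m in fold_models]
    return Counter(votes).most_common(1)[0][0]  # majority class index
          </preformat>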
        <sec id="sec-4-4-1">
          <title>Each input word is represented by a vector which is composed by:</title>
          <p>Word embeddings: the word embedding extracted from the available word embedding
lexicon (32 dimensions), plus an extra component added to handle unknown
words (1 dimension).</p>
          <p>Word polarity: the word sentiment polarities obtained from the SentiWordnet 3.0
resource. This results in 3 components: 2 for the positive and negative values found
in the resource, and one binary component set to 1 in case the word is not found in
the lexicon.</p>
          <p>Is capitalized word: a component (1 dimension) indicating whether the word is
capitalized.</p>
          <p>Is uppercased word: a component (1 dimension) indicating whether the word is
uppercased.</p>
          <p>Is URL: a component (1 dimension) indicating whether the word is a URL.
Is hashtag: a component (1 dimension) indicating whether the word is a hashtag.
Is mention: a component (1 dimension) indicating whether the word contains a
mention.</p>
          <p>Is separator: a component (1 dimension) indicating whether the word is the "SEP"
reserved token, which we use to divide the tweets.</p>
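          <p>Summing the listed components gives a 42-dimensional vector per token (32+1 for the embedding and the unknown-word flag, 3 for polarity, 6 binary surface flags). A sketch of how such a vector could be assembled is shown below; the lexicon interfaces and the heuristics for the surface flags are simplifying assumptions:</p>
          <preformat>
# Illustrative sketch of the per-token feature vector (42 dimensions).
import numpy as np

def token_vector(word, emb_lexicon, senti_lexicon):
    emb = emb_lexicon.get(word.lower())      # 32-d embedding or None
    unknown = float(emb is None)             # unknown-word component
    if emb is None:
        emb = np.zeros(32)
    pos, neg = senti_lexicon.get(word.lower(), (0.0, 0.0))
    missing = float(word.lower() not in senti_lexicon)
    flags = [float(word.istitle()),          # is capitalized
             float(word.isupper()),          # is uppercased
             float(word.startswith("http")), # is URL (heuristic)
             float(word.startswith("#")),    # is hashtag
             float(word.startswith("@")),    # is mention
             float(word == "SEP")]           # is tweet separator
    return np.concatenate([emb, [unknown, pos, neg, missing], flags])
          </preformat>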
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments and Results</title>
      <p>In order to choose the model to submit for the final run of the Bots and Gender
Profiling shared task, we tested all three devised models on the official development set
distributed by the organizers. The development set is composed of 1,240 examples:
620 composed of bot messages, 310 of males and 310 of females. The training data
(including the development set) comprises 4,120 examples.</p>
      <p>[Table 1: Bot, Male, Female and average F-scores on the development set for the linear SVM, Hierarchical GRU/LSTM 5 Fold and BERT Multi configurations.]</p>
      <p>
        The results obtained on the development set led us to choose the Hierarchical
GRU/LSTM model for the final run, since this model behaves better when
considering the average F-score on the 3 tasks (0.806). Table 2 reports the results obtained
on the official test set. The result was obtained using the official scorer provided by
the organizers on the TIRA [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] evaluation platform. In addition, the table reports the
baselines provided by the shared task organizers, which are based on char n-grams,
word n-grams, word2vec and LDSE [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and which are fully described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>[Table 2: Accuracy on the official test set for the Bot vs Human and Male vs Female tasks, for our GRU-LSTM model and the char n-grams, word n-grams, W2V and LDSE baselines.]</p>
      <p>As for the Bot vs Human task, we can notice that our proposed model
outperformed both the W2V and LDSE baselines, but the char n-grams and word n-grams
baselines performed better than our model (+3% in accuracy). This suggests that these
features are very important for this classification task. Such behaviour was also observed
in our internal tests, but the gain in terms of accuracy was smaller than on
the test set (+2% in accuracy).</p>
      <p>As for the Male vs Female task, our GRU-LSTM model also
performed well, being in line with all the proposed baselines. Unfortunately, all the
classification errors made in the Bot vs Human task were propagated to the Male vs
Female task. So, most probably, a combination of an SVM based model for the Bot vs
Human task and the GRU-LSTM model for the Male vs Female task would be the
solution achieving the best scores.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We presented three systems for the Bots and Gender Profiling shared task: the first
based on an SVM classifier with handcrafted features using a wide set of linguistic
information, the second on a hierarchical GRU-LSTM neural network, and the third on
the BERT system. After internal experiments, we participated with the hierarchical
GRU-LSTM model, which showed promising results, outperforming both the W2V and
LDSE baselines. It would be interesting to incorporate character-level features into our
GRU-LSTM model in order to evaluate the difference in performance w.r.t. our current
model, which at the moment is only token based.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. Language Models are Unsupervised Multitask Learners. OpenAI Blog. 2019.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Francisco Rangel and Paolo Rosso. Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling. In: Cappellato L., Ferro N., Müller H. and Losada D. (Eds.) CLEF 2019 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org. 2019.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Francisco M. Rangel Pardo, Paolo Rosso, Manuel Montes-y-Gómez, Martin Potthast and Benno Stein. Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018. CEUR Workshop Proceedings. CEUR-WS.org.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Saman Daneshvar and Diana Inkpen. Gender Identification in Twitter using N-grams and LSA: Notebook for PAN at CLEF 2018. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018. CEUR Workshop Proceedings. CEUR-WS.org.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Eric Sadit Tellez, Sabino Miranda-Jiménez, Daniela Moctezuma, Mario Graff, Vladimir Salgado and José Ortiz-Bejar. Gender Identification through Multi-modal Tweet Analysis using MicroTC and Bag of Visual Words: Notebook for PAN at CLEF 2018. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018. CEUR Workshop Proceedings. CEUR-WS.org.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Takumi Takahashi, Takuji Tahara, Koki Nagatani, Yasuhide Miura, Tomoki Taniguchi and Tomoko Ohkuma. Text and Image Synergy with Feature Cross Technique for Gender Identification: Notebook for PAN at CLEF 2018. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018. CEUR Workshop Proceedings. CEUR-WS.org.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Andrea Cimino and Felice Dell'Orletta. Building the state-of-the-art in POS tagging of Italian Tweets. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), Napoli, Italy, December 5-7, 2016.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Stefano Baccianella, Andrea Esuli and Fabrizio Sebastiani. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010, Valletta, Malta.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Andrea Cimino, Lorenzo De Mattei and Felice Dell'Orletta. Multi-task Learning in Deep Neural Networks at EVALITA 2018. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy, December 12-13, 2018.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang and Chih-Jen Lin. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, Volume 9, 1871-1874, 2008.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805. http://arxiv.org/abs/1810.04805.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers).</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Yarin Gal and Zoubin Ghahramani. A theoretically grounded application of dropout in recurrent neural networks. arXiv preprint arXiv:1512.05287.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5 - RmsProp: Divide the gradient by a running average of its recent magnitude. In COURSERA: Neural Networks for Machine Learning. 2012.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. Martin Potthast, Tim Gollub, Matti Wiegmann and Benno Stein. TIRA Integrated Research Architecture. In Information Retrieval Evaluation in a Changing World - Lessons Learned from 20 Years of CLEF. Springer. 2019.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. Francisco M. Rangel Pardo, Marc Franco-Salvador and Paolo Rosso. A Low Dimensionality Representation for Language Variety Identification. In Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Konya, Turkey, April 3-9, 2016, Revised Selected Papers, Part II.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>