<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UOBIT @ TAG-it: Exploring a Multi-faceted Representation for Profiling Age, Topic and Gender in Italian Texts.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roberto Labadie Tamayo</string-name>
          <email>roberto.labadie@estudiantes.uo.edu.cu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Castro Castro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reynier Ortega Bueno</string-name>
          <email>reynierg@uo.edu.cu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Oriente, Santiago de Cuba</institution>
          ,
          <country country="CU">Cuba</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our system for participating in the TAG-it Author Profiling task at EVALITA 2020. The task aims to predict the age and gender of blog users from their posts, as well as the topic they wrote about. Our proposal combines representations learned by RNNs at word and sentence levels, Transformer neural networks and hand-crafted stylistic features. All these representations are concatenated and fed into a fully connected layer of a feed-forward neural network in order to make predictions for the addressed subtasks. Experimental results show that our model achieves encouraging performance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The growing integration of social media into
people’s daily lives has made this medium a
common environment for deploying technologies
that retrieve information useful for business
activities, social outreach processes, forensic tasks, etc.
This is because people frequently upload and share
content on these media for various purposes, such
as sharing points of view on some topic or
promoting a personal business. The analysis of
textual information from such data is one of the
main reasons why this research has become a trend
in the Natural Language Processing (NLP) field.</p>
      <p>However, this information varies greatly in
format, even when it comes from the same person,
and textual sequences are unstructured, which makes
analyzing it automatically a challenging process.
The Author Profiling (AP) task aims at discovering
marks or patterns (linguistic or not) in texts
that allow a user to be characterized in terms of
their age, gender, personality or any other
demographic attribute.</p>
      <p>Copyright © 2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>
        Because of the applicability of AP, many forums
organize shared tasks aimed at mining features that
predict this valuable information. These tasks
commonly focus on popular languages such as English
and Spanish. Nevertheless, other languages are
explored in important forums too. This is the case
of EVALITA<sup>1</sup>, which promotes the analysis
of NLP tasks in the Italian language. Among the
challenges of its previous campaign, EVALITA 2018,
was the AP task GxG
        <xref ref-type="bibr" rid="ref1 ref3 ref6">(Dell’Orletta and
Nissim, 2018)</xref>
        , which explored gender prediction.
      </p>
      <p>
        The analysis of age, gender and the topic a text
relates to are well-explored tasks. Most approaches
employ data representations based on stylistic
features, n-grams and/or word embeddings, combined
with Machine Learning (ML) methods such as Support
Vector Machines (SVM) and Random Forests
        <xref ref-type="bibr" rid="ref12">(Pizarro,
2019)</xref>
        . Other authors have used Deep Learning
(DL) models such as Convolutional Neural Networks
(CNN) and Long Short-Term Memory (LSTM) networks
combined with stylistic features
        <xref ref-type="bibr" rid="ref1 ref3 ref6">(Aragón and
López-Monroy, 2018)</xref>
        <xref ref-type="bibr" rid="ref1 ref3 ref6">(Bayot and Gonçalves,
2018)</xref>
        , yielding encouraging performance.
      </p>
      <p>
        In this work we address the automatic
detection of the gender and age of authors, as well
as the identification of the prevailing topic in
textual information from blogs. We also describe
the model we developed to participate in TAG-it:
Topic, Age and Gender prediction for Italian<sup>2</sup>
        <xref ref-type="bibr" rid="ref5">(Cimino et al., 2020)</xref>
        , a task at EVALITA 2020
        <xref ref-type="bibr" rid="ref2">(Basile
et al., 2020)</xref>
        .
      </p>
      <p>
        Taking into account the proven ability of DL
models to learn abstract representations that are
missed by hand-crafted feature engineering
methods, our approach is mainly based on them,
particularly on Bi-LSTM and Transformer networks
        <xref ref-type="bibr" rid="ref17">(Vaswani et al., 2017)</xref>
        . We combine the feature
representations learned by DL models with
hand-crafted ones based on Term Frequency-Inverse
Document Frequency (tf-idf) and stylistic features.
      </p>
      <p>1 http://www.evalita.it/</p>
      <p>2 https://sites.google.com/view/tag-it-2020</p>
      <p>This paper is organized as follows. The next
section briefly describes the subtasks of the TAG-it
task. Next, we present our proposal: we describe the
data preprocessing as well as the DL methods and
features used to represent the data. Finally, we
present the experimental setting, the experiments
conducted and the results achieved.</p>
    </sec>
    <sec id="sec-2">
      <title>TAG-it Tasks</title>
      <p>Three subtasks were proposed in the TAG-it
task.</p>
      <p>• subtask 1: Predict the gender, the
age (as an age range, e.g. 20-29) and the topic
addressed by the author, given a collection of
texts written by him/her on a blog, all
three dimensions at once.</p>
      <p>• subtask 2a: Predict gender.</p>
      <p>• subtask 2b: Predict age.</p>
      <p>For these tasks, a training corpus of texts written by
blog users, with possibly multiple posts per user,
was provided. The information for each user (i.e. the posts
per user) varies in length and quantity,
and the data for each subtask is unbalanced, mainly
for the gender and topic prediction tasks, which adds
complexity to the training stage of
the models for these classification tasks.</p>
    </sec>
    <sec id="sec-3">
      <title>Our Proposal</title>
      <p>
        Deep Learning methods are capable of learning and
projecting relationships between elements of
textual information that are beyond abstract human
comprehension. Therefore, using only
hand-crafted representations may miss some
important patterns in the analysis of textual information.
However, stylistic and linguistic features have
proved to be good markers for determining some author
characteristics. Among the DL models used in the AP
field are LSTMs
        <xref ref-type="bibr" rid="ref10">(Labadie-Tamayo et al., 2020)</xref>
        and Transformer neural networks, which rely on
two different paradigms. The former
analyze the information sequentially, token by token,
whereas the latter analyze all tokens
at once, relating each one to every
other. These opposite behaviors imply
learning different patterns, each of which
has individually proved to be an accurate way to
synthesize the information.
      </p>
      <p>We hypothesize that an ensemble of these
deep representations, fused with hand-crafted
ones as shown in Figure 1, could yield
encouraging results on the proposed tasks.</p>
      <p>
        The first representation (Transformer Block) is
based on the Bidirectional Encoder Representations from
Transformers (BERT) architecture
        <xref ref-type="bibr" rid="ref7">(Devlin et al., 2018)</xref>
        .
The second is based on LSTM
        <xref ref-type="bibr" rid="ref13 ref9">(Hochreiter and
Schmidhuber, 1997)</xref>
        neural networks with a self-attention
mechanism (Att-LSTM) applied over word
embeddings (Recurrent Word-Level Block). The third
one is a condensed representation based on the
combination of stylistic features and a vector with the
tf-idf values of some key tokens from the
text (Stylistic Block). Finally, the fourth (Recurrent
Sentence-Level Block) is another representation based on
Att-LSTM, this time analyzing the sequence
information at sentence level.
      </p>
      <p>All these representations are concatenated and fed
into a dense layer with a Leaky Rectified Linear
Unit (Leaky ReLU) activation function, which
synthesizes the information extracted by each block.
Its output vector goes to a softmax dense layer
with as many neurons as there are classes
in the task under analysis, which makes the
predictions.</p>
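      <p>The fusion step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the Leaky ReLU slope of 0.01 and the toy weights are assumptions made only for illustration, since the paper does not report them.</p>
      <preformat>
```python
import math
from typing import List, Sequence


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: identity for positive inputs, slope alpha otherwise (alpha is assumed)."""
    return x if x > 0 else alpha * x


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Affine layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def fuse_and_classify(blocks, w_hidden, b_hidden, w_out, b_out) -> List[float]:
    """Concatenate block encodings, apply a Leaky ReLU dense layer, then softmax."""
    # Concatenation of the encodings produced by the individual blocks.
    fused = [value for block in blocks for value in block]
    hidden = [leaky_relu(h) for h in dense(fused, w_hidden, b_hidden)]
    # One output unit per class of the task under analysis.
    return softmax(dense(hidden, w_out, b_out))
```
      </preformat>
      <p>In this sketch each element of blocks stands for the encoding of one block (Transformer, Recurrent Word-Level, Stylistic, Recurrent Sentence-Level), and the returned list holds one probability per class.</p>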
      <p>We used the same architecture for the three
classification tasks, but trained it separately
for each of them, with targets that depend on
the task.</p>
      <sec id="sec-3-1">
        <title>Preprocessing</title>
        <p>
          In the preprocessing stage we concatenate the
posts of the same user in order to
treat them as a single super-document,
placing a tag (i.e. &lt;post&gt;) between consecutive posts
to mark where one ends and the next begins. Afterwards,
numbers and dates are recognized and replaced
by corresponding wildcards that encode the
meaning of these special tokens. Then, the text is
tokenized and morphologically analyzed by means
of FreeLing
          <xref ref-type="bibr" rid="ref11">(Padró and Stanilovsky, 2012)</xref>
          .
For computing the stylistic and tf-idf vectors, as well as
for feeding the deep models in the prevailing topic
detection task, we removed the stop words from
the document and lemmatized the tokens to their
canonical form.
        </p>
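        <p>The first preprocessing steps (post concatenation and wildcard replacement) might look like the sketch below. The wildcard strings, the bracketed stand-in for the post tag and the two regular expressions are assumptions, since the paper does not specify them, and the FreeLing tokenization and lemmatization steps are not reproduced.</p>
        <preformat>
```python
import re
from typing import List

# Hypothetical marker strings; the paper does not give the exact wildcards.
POST_TAG = "[post]"      # stand-in for the post-boundary tag
DATE_TOKEN = "[date]"
NUMBER_TOKEN = "[number]"

# Assumed date formats such as 12/05/2020 or 12-05-20; applied before plain numbers.
DATE_RE = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")
# Integers and decimals written with a dot or a comma.
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)?\b")


def build_super_document(posts: List[str]) -> str:
    """Replace dates and numbers by wildcards and join a user's posts with a tag."""
    cleaned = []
    for post in posts:
        text = DATE_RE.sub(DATE_TOKEN, post)
        text = NUMBER_RE.sub(NUMBER_TOKEN, text)
        cleaned.append(text.strip())
    return f" {POST_TAG} ".join(cleaned)
```
        </preformat>
        <p>Dates are replaced before numbers so that the digits inside a date are not turned into separate number wildcards.</p>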
      </sec>
      <sec id="sec-3-2">
        <title>Transformer Block. BERT</title>
        <p>BERT (Bidirectional Encoder Representations
from Transformers) is an architecture that results from
applying bidirectional training to the Transformer
attention model, designed for language
modeling. The Transformer model has two mechanisms.
The first one, known as the encoder, is fed
with the text and produces an encoded
representation of the sequence. The second one, the
decoder, produces the predicted tokens
one at a time, taking into account the
encoder’s output and the previously predicted tokens
at each time step.</p>
        <p>
          The main advantage of Transformer
models w.r.t. traditional sequential architectures like
the Gated Recurrent Unit (GRU)
          <xref ref-type="bibr" rid="ref4">(Cho et al., 2014)</xref>
          is
that, instead of analyzing the textual information in
one direction or the other (e.g. right to left or left to
right), they take the entire information into account at
once by using an attention mechanism, which
relates each word in the text to its surrounding
context.
        </p>
        <p>Since the goal of BERT is to generate a language
representation, only the encoder mechanism is
necessary. It is structured as transformer blocks
connected sequentially, and each transformer block
is composed of attention heads working in
parallel. Each transformer block passes to the
next layer one representation per element of
the input text, and each of these representations takes
the entire input context into account.</p>
        <p>The original BERT model is trained on two
subtasks: one consists of predicting some
masked words in a sentence, and the other
consists of predicting whether two sentences are
consecutive in the given corpus.</p>
        <p>
          For the TAG-it tasks we employed a BERT model
pre-trained on a multilingual corpus
(multilingual L-12 H-768 A-12)<sup>3</sup>
          <xref ref-type="bibr" rid="ref16">(Turc et al., 2019)</xref>
          ,
which is fed with the super-document sequence.
From this model we used only the first two
transformer blocks, and as its output we keep the first
and last vectors of the encoded input
sequence, which are concatenated.
        </p>
        <p>We also fine-tuned BERT, adding
an intermediate dense layer of 64 units with a
Leaky ReLU activation function and training with
a multitask objective that makes
predictions for the age, topic and gender tasks at once.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Recurrent Word-Level Block</title>
        <p>
          The second representation block of our system is
based on LSTM networks. This block takes as input
the sequence of preprocessed text,
which is fed into an embedding layer set up with
fixed weights from FastText
          <xref ref-type="bibr" rid="ref8">(Grave et al., 2018)</xref>
          pretrained word embeddings<sup>4</sup>, obtaining a vector
representation for each word of the sequence.
The textual sequence contains information that may or may
not be relevant to the task under
analysis. In order to highlight the most important
elements for encoding the message, instead of making
the network pay equal attention to all elements,
the tokens output by the embedding layer are scored by
their relative importance with respect to the other elements
in their context with the Scaled Dot-Product Attention
mechanism
          <xref ref-type="bibr" rid="ref17">(Vaswani et al., 2017)</xref>
          . Then, the
newly scored sequence is fed into a Bidirectional
LSTM (Bi-LSTM)
          <xref ref-type="bibr" rid="ref13 ref9">(Schuster and Paliwal, 1997)</xref>
          layer with 64 neurons, which performs two
passes over this sequence, in the forward and backward
directions, to detect not only the relations of an
element with the previous ones, but also with the
elements that appear after it. Afterwards, the hidden
states of the Bi-LSTM layer are treated as
a new sequence, which is fed into another LSTM
with 64 neurons. From its output we take only
the last hidden state, which represents the
Recurrent Word-Level Block encoding.
        </p>
        <p>3 https://github.com/google-research/bert</p>
        <p>4 https://fasttext.cc/docs/en/crawl-vectors.html</p>
        <p>
          For training this block we applied dropout
          <xref ref-type="bibr" rid="ref15">(Srivastava et al., 2014)</xref>
          to the neurons of the attention and
LSTM layers in order to improve the generalization
capability of the model.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>2.3.1 Scaled Dot-Product Attention</title>
        <p>This attention function first maps each
sequence token to three representations (a query and
a key-value pair) used to compute a compatibility
index between every pair of elements. Then,
for each token t<sub>i</sub>, its compatibility w.r.t.
every other sequence token t<sub>j</sub> is evaluated by relating its query
vector q<sub>i</sub> to all the keys k<sub>j</sub>. These
compatibilities c<sub>ij</sub> are normalized with a softmax function
and used to weight the value vectors v<sub>j</sub>
for that specific query. Finally, the attention-based
representation of t<sub>i</sub> is computed as the
sum of these weighted value vectors. This
computation is defined as follows:</p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V</tex-math>
        </disp-formula>
        <p>where Q, K ∈ ℝ<sup>n×d<sub>k</sub></sup> and V ∈ ℝ<sup>n×d<sub>v</sub></sup> are
matrices whose rows contain the query, key
and value mappings of the
sequence tokens, respectively, n is the length of the
sequence, and d<sub>k</sub>, d<sub>v</sub> are the dimensions of the mapping
vectors for keys and values, respectively.</p>
        <p>2.3.2 LSTM</p>
        <p>LSTM networks are a special kind of RNN
specialized in analyzing sequential
data. They have a main cell unit (the recurrent
unit) that processes the data sequence one
element per time step (in left-to-right order). The
network shares the information captured in
previous steps when computing the new hidden state
at the current time step. The main cell
contains a gate structure that tells the
network which information to preserve or forget from
the hidden states of previous time steps for the
current computation.</p>
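        <p>As an illustration of the scaled dot-product attention in Equation (1), the following sketch computes it with plain Python lists. The function and variable names are assumptions, and the learned projections that produce Q, K and V from the token embeddings are left out.</p>
        <preformat>
```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of compatibility scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V, with one row of Q, K and V per token."""
    d_k = len(K[0])
    d_v = len(V[0])
    output = []
    for q in Q:
        # Compatibility of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Attention-based representation: weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return output
```
        </preformat>
        <p>Because the weights of each row sum to one, every output row is a convex combination of the value vectors.</p>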
      </sec>
      <sec id="sec-3-5">
        <title>Stylistic Block. Stylistic Features</title>
        <p>The representation based on stylistic features is
twofold. On the one hand, to characterize
a user for a given classification task, we consider a
vector containing the tf-idf values of a set of key tokens
from the text. On the other hand, we construct a
vector of statistical style features that captures
information from distinct lexical and syntactic
linguistic layers.</p>
        <p>To construct the first one we used a feature
selection approach that scores every term
employed by the users of each category
within a classification task and then selects
the most relevant ones.</p>
        <p>
          To score the tokens we use Information Gain (IG)
          <xref ref-type="bibr" rid="ref14">(Sebastiani,
2002)</xref>
          , which takes
into account the presence of a term in a category
as well as its absence. The information gain of a
term t for a class C is defined as:
        </p>
        <disp-formula id="eq-2">
          <label>(2)</label>
          <tex-math>IG(t, C) = \sum_{c \in \{C, \bar{C}\}} \sum_{x \in \{t, \bar{t}\}} P(x, c) \log_2 \frac{P(x, c)}{P(x) P(c)}</tex-math>
        </disp-formula>
        <p>In this formula, probabilities are interpreted on an
event space of documents (e.g. P(<overline>t</overline>, C) indicates
the probability that, for a random document d,
term t does not occur in d and d belongs to
category C).</p>
        <p>Once the IG has been computed for every term that
occurs in documents of class c<sub>i</sub>, the 500/l<sub>c</sub> tokens
with the highest IG are chosen to characterize
this class, where l<sub>c</sub> is the number of classes
in the task. Finally, a 500-dimensional vector is
constructed whose components are
the tf-idf values of the representative terms of every
class.</p>
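        <p>The Information Gain of Equation (2) and the per-class term selection could be sketched as follows, estimating the probabilities as document frequencies. The data layout (each document as a set of tokens paired with its label) and the function names are assumptions made for this example.</p>
        <preformat>
```python
import math
from typing import List, Set, Tuple

Document = Tuple[Set[str], str]  # (set of tokens, class label); assumed layout


def information_gain(term: str, category: str, docs: List[Document]) -> float:
    """IG(t, C) summed over term presence/absence and class membership."""
    n = len(docs)
    total = 0.0
    for term_present in (True, False):
        for in_category in (True, False):
            joint = sum(1 for tokens, label in docs
                        if (term in tokens) == term_present
                        and (label == category) == in_category) / n
            p_x = sum(1 for tokens, _ in docs
                      if (term in tokens) == term_present) / n
            p_c = sum(1 for _, label in docs
                      if (label == category) == in_category) / n
            if joint > 0:  # a zero joint probability contributes nothing
                total += joint * math.log2(joint / (p_x * p_c))
    return total


def top_terms(category: str, docs: List[Document], k: int) -> List[str]:
    """The k terms of a class with the highest IG (ties broken alphabetically)."""
    vocabulary = sorted({t for tokens, label in docs if label == category for t in tokens})
    ranked = sorted(vocabulary,
                    key=lambda t: information_gain(t, category, docs),
                    reverse=True)
    return ranked[:k]
```
        </preformat>
        <p>Under the selection rule described above, k would be 500 divided by the number of classes of the task, and the chosen terms of all classes together would index the tf-idf vector.</p>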
        <p>The second representation is computed
independently of the addressed task as a
12-dimensional vector whose components are
real numbers corresponding to statistics
from lexical and syntactic linguistic layers (e.g.
sentence, paragraph and syntactic layers), such as:</p>
        <p>• Paragraph layer: standard deviation of the
length of the sentences written by the user.</p>
        <p>• Text layer: number of stop words used.</p>
        <p>• Sentence layer: average word length.</p>
        <p>• Syntactic layer: proportion of nouns to
adjectives.</p>
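        <p>Four of the listed statistics can be computed as in the sketch below. It assumes that each sentence is already tokenized and tagged as (token, part-of-speech) pairs with the hypothetical tags NOUN and ADJ, and that a stop-word set is supplied. The remaining features of the 12-dimensional vector are not specified in the paper and are not reproduced here.</p>
        <preformat>
```python
import statistics
from typing import List, Set, Tuple

Sentence = List[Tuple[str, str]]  # (token, coarse POS tag) pairs; assumed layout


def style_features(sentences: List[Sentence], stop_words: Set[str]) -> List[float]:
    """Return [sentence-length std, stop-word count, mean word length, noun/adjective ratio]."""
    tokens = [tok for sent in sentences for tok, _ in sent]
    tags = [tag for sent in sentences for _, tag in sent]

    # Population standard deviation of sentence lengths, measured in tokens.
    sentence_length_std = statistics.pstdev([len(sent) for sent in sentences])
    stop_word_count = sum(1 for tok in tokens if tok.lower() in stop_words)
    mean_word_length = float(statistics.mean(len(tok) for tok in tokens))

    nouns = tags.count("NOUN")
    adjectives = tags.count("ADJ")
    # Fall back to the raw noun count when the text has no adjectives.
    noun_adj_ratio = nouns / adjectives if adjectives else float(nouns)

    return [sentence_length_std, float(stop_word_count), mean_word_length, noun_adj_ratio]
```
        </preformat>
        <p>The function expects at least one non-empty sentence. How the authors handle empty input or a missing adjective count is not stated, so the fallback used here is only one possible choice.</p>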
        <p>These two representations are combined and fed
into a 64-neuron dense layer that synthesizes the
information before it is fused with the
representations of the other blocks.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Recurrent Sentence-Level Block</title>
        <p>This block has the same structure as the
Recurrent Word-Level Block, but instead of being
fed with a sequence of word
representations provided by a word embedding layer,
it is fed with a sequence obtained by encoding
each sentence of the super-document with
an encoder whose structure is similar to that of the
Transformer Block described above.</p>
        <p>For this Recurrent Sentence-Level Block, we
trained the sentence encoder with the same
multitask objective as in the Transformer Block, but aiming
to predict, for each sentence of a document, the
annotated characteristics (i.e. age and gender) of
the user it belongs to and the topic of its
surrounding text. Then we encode all the sentences
of the super-document composed of the user’s
posts and treat them as the tokens of a
sentence-level sequence. Afterwards, that
sequence is fed into a model with the same
Att-Bi-LSTM structure as the Recurrent Word-Level Block,
and the last hidden state of the second LSTM layer is taken
as the encoding of the user’s profile, as in
the Word-Level Block.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>The dataset used in this work was the one provided
by the task organizers. This dataset is unbalanced,
mainly for the gender classification task, where the
male class represents 82.6% of the examples.
To prevent biased training of the model
we applied a class-weighting method that scales
the loss computed for every example
according to the class it belongs to (i.e.
for examples of the male class the
loss is weighted by 0.3, whereas for female
examples it is weighted by 0.7). As a result,
when the parameters are updated by means of
the gradients, the model pays more attention
to the class with the larger weight, that is, the
under-represented class.</p>
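      <p>The class-weighting scheme can be illustrated with a weighted cross-entropy loss. The weights 0.3 and 0.7 come from the text above; the function name, the dictionary layout and the plain averaging over the batch are assumptions of this sketch rather than details reported by the authors.</p>
      <preformat>
```python
import math
from typing import Dict, List

# Per-class loss weights reported for the gender task.
CLASS_WEIGHTS: Dict[str, float] = {"male": 0.3, "female": 0.7}


def weighted_cross_entropy(probabilities: List[Dict[str, float]],
                           labels: List[str],
                           weights: Dict[str, float] = CLASS_WEIGHTS) -> float:
    """Average of -w[y] * log p(y) over the batch (averaging scheme is assumed)."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probabilities, labels)]
    return sum(losses) / len(losses)
```
      </preformat>
      <p>With equal predicted probabilities for the true class, a female example contributes 0.7/0.3 times the loss of a male example, so its gradient has a proportionally larger effect on the parameter update.</p>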
      <p>We pretrained the Transformer models of the
Transformer Block and the sentence encoder of
the Recurrent Sentence-Level Block independently
of the entire model and then froze the learned
weights.</p>
      <p>To fine-tune these BERT models we employed the
Adam optimizer with a categorical cross-entropy
loss function for every output layer, since we
applied multi-task learning, for two epochs. The
learning rate for this training was set to a
low value (lr=1e-5) because we wanted to preserve,
as much as possible, the parameters learned during the
original training on an enormous amount of data while
making the model focus on our addressed tasks.
We also set decay = 2e-3 in the learning rate
scheduler.</p>
      <p>We evaluated and selected the hyper-parameters, as well as
the representations and features used in
our model, with cross-validation in order to
obtain a more realistic and unbiased performance
estimate, making 5 splits for validation. At each
cross-validation step, the dataset was split into 20%
for validation and 80% for training, keeping the
distribution of examples proportional to the split size.
The performance of the model in the training stage
was evaluated independently for each subtask
using different combinations of the representations
from the Recurrent Word-Level Block (RNN-W), the
Recurrent Sentence-Level Block (RNN-S), the
Transformer Block (T) and the Stylistic Block (STY).
For age and gender prediction we employed the
Micro-F1 metric, whereas for topic prediction we
used accuracy. Table 1 summarizes the results in terms
of the average of these metrics over the cross-validation
folds.</p>
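      <p>A 5-fold split that preserves the class distribution in every fold could be produced as in the sketch below. It assumes that keeping the distribution of examples means stratifying by class label, and it assigns examples to folds round-robin without shuffling, which is a simplification and not necessarily the procedure used by the authors.</p>
      <preformat>
```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str],
                     n_splits: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (training indices, validation indices) pairs, stratified by label."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal the examples of each class round-robin into the folds.
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    splits = []
    for k in range(n_splits):
        validation = sorted(folds[k])
        training = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((training, validation))
    return splits
```
      </preformat>
      <p>With five splits each validation fold holds about 20% of the examples of every class, and the remaining 80% are used for training.</p>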
      <p>As we can see, combining the three deep
representations with the stylistic one yields good
performance in all cases throughout the cross-validation
process. However, the stylistic representation had
a slight negative influence on the gender prediction task.</p>
      <p>Regarding the official results, we submitted
3 runs as the UOBIT team. In each of them we
employed the representations learned by the
Transformer and Stylistic Blocks and varied the
use of the Recurrent Blocks’ encodings, as shown in
Table 2.</p>
      <p>After the evaluation phase we tried removing
the representation based on stylistic features and
found that this representation, possibly
because it introduces some noise, worsens the
model’s performance, at least on
the tasks related to author attributes (i.e.
gender and age), corresponding to subtasks 2a and 2b.
We think the noise introduced by these features
mainly comes from the fact that they are computed
from key tokens of the text; these tokens
may suggest to the model that texts with the same
topic belong to the same class in the gender or
age classification task.</p>
      <p>[Table residue; scores not recoverable. Runs: run-1, run-2, run-3. Representation combinations: RNN-W T STY; RNN-S T STY; RNN-W RNN-S T STY; RNN-W RNN-S T; RNN-W RNN-S; T.]</p>
      <p>Using only the
deep representations of the Recurrent and
Transformer Blocks, our system reaches 0.4606
F1 on subtask 2b, which improves on the
0.409 reached by the best team, and
this same combination also improves on our best official
run on subtask 2a. These results are shown in
Table 2 in the row named Unofficial.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper we described our system for
participating in the TAG-it Author Profiling task at
EVALITA 2020. Our proposal is based on an
ensemble of RNNs, Transformer neural networks and
hand-crafted stylistic features. The system
receives as input the textual information of a user’s profile
as a single super-document (sequence). This
information is encoded in four different ways:
first, by a Transformer Block, specifically
a fine-tuned and reduced BERT model;
second, by a Recurrent Block based on an
Attention-Bi-LSTM model that analyzes the
information at word level; third, by a feature
representation based on the combination of tf-idf
information and stylistic features extracted from
the text; and fourth, by the same
recurrent structure as in the Recurrent Word-Level
Block, but analyzing the information at sentence
level.</p>
      <p>These four representations are concatenated and fed into
a dense layer that synthesizes them, and its output
is received by another dense layer that classifies
the profile according to the classes of the
addressed subtask.</p>
      <p>The results showed that considering both the
stylistic representation and the deep
representations learned by the Recurrent and Transformer
models gives the best effectiveness, in terms of
accuracy, on the topic
classification task, but this behavior changed for age
and gender classification, due to the relationship
between the syntactic structures of the text and the topic
of the user’s posts. We think that
excluding the stylistic features, or at least those
related to the frequency of tokens in the text,
could increase the effectiveness of
the ensemble, mainly on the age detection subtask.
In addition, analyzing the content of the posts at
character level, given the informal origin of the text, could
help solve the misidentification of some
key words within the text. We would like to explore
these ideas in future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Mario Ezra</given-names>
            <surname>Aragón</surname>
          </string-name>
          and
          <string-name>
            <given-names>A-Pastor</given-names>
            <surname>López-Monroy</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A straightforward multimodal approach for author profiling</article-title>
          .
          <source>In Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          .
          In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Roy Khristopher</given-names>
            <surname>Bayot</surname>
          </string-name>
          and
          <string-name>
            <given-names>Teresa</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Multilingual author profiling using LSTMs: Notebook for PAN at CLEF 2018</article-title>
          . In CLEF (Working Notes).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Learning phrase representations using rnn encoder-decoder for statistical machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1406.1078</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Cimino</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Orletta</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            <given-names>M.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>TAG-it@EVALITA2020: Overview of the topic, age, and gender prediction task for Italian.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Felice</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the evalita 2018 cross-genre gender prediction (gxg) task</article-title>
          .
          <source>EVALITA Evaluation of NLP and Speech Tools for Italian</source>
          ,
          <volume>12</volume>
          :
          <fpage>35</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Edouard</given-names>
            <surname>Grave</surname>
          </string-name>
          , Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Learning word vectors for 157 languages</article-title>
          .
          In
          <source>Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
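          <string-name>
            <given-names>Roberto</given-names>
            <surname>Labadie-Tamayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Castro-Castro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Reynier</given-names>
            <surname>Ortega-Bueno</surname>
          </string-name>
          .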
          <year>2020</year>
          .
          <article-title>Fusing Stylistic Features with Deep-learning Methods for Profiling Fake News Spreader-Notebook for PAN at CLEF 2020</article-title>
          . In Linda Cappellato, Carsten Eickhoff, Nicola Ferro, and Aurélie Névéol, editors,
          <source>CLEF 2020 Labs and Workshops</source>
          , Notebook Papers.
          <publisher-name>CEUR-WS.org</publisher-name>
          , September.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Lluís</given-names>
            <surname>Padró</surname>
          </string-name>
          and
          <string-name>
            <given-names>Evgeny</given-names>
            <surname>Stanilovsky</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>FreeLing 3.0: Towards wider multilinguality</article-title>
          .
          In
          <source>Proceedings of the Language Resources and Evaluation Conference (LREC 2012)</source>
          , Istanbul, Turkey, May. ELRA.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Juan</given-names>
            <surname>Pizarro</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Using n-grams to detect bots on Twitter</article-title>
          .
          In
          <source>CLEF (Working Notes)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Mike</given-names>
            <surname>Schuster</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kuldip K.</given-names>
            <surname>Paliwal</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Bidirectional recurrent neural networks</article-title>
          .
          <source>IEEE Trans. Signal Process.</source>
          ,
          <volume>45</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2673</fpage>
          -
          <lpage>2681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Machine learning in automated text categorization</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Nitish</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Geoffrey E.</given-names>
            <surname>Hinton</surname>
          </string-name>
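          ,
          <string-name>
            <given-names>Alex</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , and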
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Iulia</given-names>
            <surname>Turc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Well-read students learn better: On the importance of pre-training compact models</article-title>
          . arXiv preprint arXiv:1908.08962v2.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
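          ,
          <string-name>
            <given-names>Noam</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Niki</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jakob</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Llion</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aidan N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Łukasz</given-names>
            <surname>Kaiser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>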
          .
          <year>2017</year>
          .
          <article-title>Attention is all you need</article-title>
          .
          In
          <source>Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>