<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Author Profiling based on Text and Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luka Stout</string-name>
          <email>l.stout@anchormen.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Musters</string-name>
          <email>r.musters@anchormen.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris Pool</string-name>
          <email>c.pool@anchormen.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Anchormen</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>In this paper we describe our participation in the PAN 2018 shared task on Author Profiling. In this task we identify the gender of authors based on written text and shared images. We describe our approaches to the text-based, the image-based and the combined task. The presence of three different languages raises the question whether a single model architecture can be built that works well on all three languages. We also propose a way to combine multiple predictions on shared content into a single prediction at the user level. Our final system for text is an ensemble of a Naive Bayes model and an RNN with attention. The image classification is done by finding selfies and predicting the gender of the person in those images using CNNs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        With the growing influence and importance of social media, it becomes more and more
relevant to gain insight into the authors of its content, which consists mostly of images and text.
Because social media networks allow people to create anonymous accounts, it is
of growing interest to the research community to get to know users on social media.
Inferring specific details about a user, such as gender, age, native language or emotional
state, is an interesting challenge for the marketing, forensic and security sectors. Author
Profiling[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is the task of determining an author’s features, such as gender, age and language
variety, by analyzing their online persona. In addition to tweets, the shared task of
2018[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] includes images that were shared by the authors. The goal is to infer the
gender of an author given one hundred of their tweets and ten of their images, in three
different languages: English, Spanish and Arabic. The presence of three different languages
raises the question whether a single model architecture can be built that works well on
all three languages. The shared task is divided into three subtasks: inferring the gender based
on tweets, based on shared images, and based on a combination of the two. We have focused
on the text-based task; however, we have also developed an image-based approach in order to
participate in the combined task. We experimented with traditional
techniques, such as tf-idf and Naive Bayes[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], as well as deep learning techniques, such as
Recurrent Neural Networks (RNNs)[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Convolutional Neural Networks (CNNs)[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
In this paper we describe our final systems and results.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Dataset Description and Preprocessing</title>
      <p>
        The PAN 2018 Author Profiling[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] training set consists of text in three different
languages and images grouped by authors, who are labeled by gender and language. The
number of authors per gender is balanced in every language. This training set was used
for feature engineering, parameter tuning and training of the classification model. For
the languages English, Spanish and Arabic we received a dataset containing 100 tweets
and 10 images per author. For English and Spanish there are 3,000 authors and for
Arabic 1,500 authors. This gives a total of 750,000 tweets and 75,000 images. The goal of
the task is to predict the gender of a user given these 100 tweets and 10 images. We
have chosen to create models at the tweet and image level and to combine their predictions into
a single prediction for every author.
      </p>
      <p>The following preprocessing steps were performed; the two additional
preprocessing steps for Arabic can be found online:</p>
      <list list-type="bullet">
        <list-item><p>Replaced numbers, URLs, hashtags, mentions, emojis and smileys with their own
unique tokens.</p></list-item>
        <list-item><p>Used a tokenizer to filter out punctuation and tokenize sentences into a list of
lowercase words.</p></list-item>
        <list-item><p>Expanded contractions. (For English)</p></list-item>
        <list-item><p>Normalization of tokens, namely unifying the orthography of alifs, hamzas, and
yas/alif maqsuras. (For Arabic)</p></list-item>
        <list-item><p>Noise removal, i.e. removing short vowels and other symbols (harakat). (For
Arabic)</p></list-item>
      </list>
      <p>After preprocessing and tokenization, the maximum number of words in a tweet is
39 for English. For the other languages there are fewer than 200 tweets longer than 39
words. As this only accounts for 0.02% of all the tweets in the data set, and to keep the
models consistent across languages, we have decided to cap the number of words per
tweet at 39.</p>
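      <p>The token replacement, tokenization and length cap described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the placeholder token names and the regular expressions are assumptions, and emoji handling, contraction expansion and the Arabic-specific steps are left out.</p>
      <preformat>
```python
import re

# Placeholder tokens are hypothetical; the paper does not name its tokens.
URL_TOKEN = "xxurl"
MENTION_TOKEN = "xxmention"
HASHTAG_TOKEN = "xxhashtag"
NUMBER_TOKEN = "xxnumber"
MAX_WORDS = 39  # length cap reported in the paper

# Simple illustrative patterns (assumptions, not the authors' rules).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
WORD_RE = re.compile(r"\w+")


def preprocess(tweet):
    """Replace special items by tokens, lowercase, drop punctuation, cap length."""
    text = URL_RE.sub(" " + URL_TOKEN + " ", tweet)
    text = MENTION_RE.sub(" " + MENTION_TOKEN + " ", text)
    text = HASHTAG_RE.sub(" " + HASHTAG_TOKEN + " ", text)
    text = NUMBER_RE.sub(" " + NUMBER_TOKEN + " ", text)
    # Keeping only word characters removes punctuation in the same pass.
    tokens = WORD_RE.findall(text.lower())
    return tokens[:MAX_WORDS]
```
      </preformat>
      <p>URLs are replaced first so that digits or at-signs inside a link are not picked up by the later patterns.</p>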
      <p>
        Basile et al.[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] note that augmenting the tweet dataset with the data of previous
Author Profiling tasks[
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] does not improve the performance of the resulting
classifiers. They emphasize that this is due to temporal differences in the data. We have seen
that topics reflecting events from 2017 are clearly present in the data, whereas the data
from previous years covers events from 2016 and before. As such we have
decided not to include additional datasets, to limit the effects of these differences.
      </p>
      <p>
        For the image classification task we have used additional data to create our
classifier: a selfie dataset[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and the MIRFLICKR dataset[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Their use is explained in
Section 5.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Prediction Strategies</title>
      <p>There are two ways to predict gender based on an author’s social media content. The
first is to treat all of the content as a single item and create a single prediction based
on the entirety of the data. This is analogous to a bag-of-words approach in the case of
text. However, concatenating or summing up the images at pixel level is not
straightforward and does not make intuitive sense. As such we have chosen the second approach:
we make predictions at the item level and combine these predictions
into a single author-level prediction.</p>
      <p>There are multiple ways of constructing an author-level prediction based on
tweet-level predictions, whether it is text, images or a combination of both. We have used three
different strategies. (1) The first strategy is to use the majority class of all predictions.
(2) The second strategy is to use the mean probability of all predictions. (3) The last
strategy is to use only the predictions where the model is very sure that an input indicates a
certain gender. This last strategy uses a weighted average of the predictions, where the
weights are zero for predictions that fall within a certain range:</p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math><![CDATA[w_i = \begin{cases} 0 & \text{if } \alpha < P_i(\mathit{female}) < \beta \\ 1 & \text{otherwise} \end{cases}]]></tex-math>
      </disp-formula>
      <disp-formula id="eq2">
        <label>(2)</label>
        <tex-math><![CDATA[P(\mathit{female}) = \frac{1}{N} \sum_{i=1}^{N} w_i P_i(\mathit{female})]]></tex-math>
      </disp-formula>
      <p>where P is the prediction by a single model for a single author, P_i is the prediction
for the i-th tweet or image of the author and N is the number of tweets or images for
the author. If no prediction falls outside this range, we fall back to the second strategy, the mean
strategy. We have found α = 0.25 and β = 0.75 to be good default values. Our usage
of the third prediction strategy improved our accuracy on a validation set, as illustrated
in Table 1. The rationale is that only the predictions where the model is sure that a certain
input points towards a specific gender should influence the
author-level prediction.</p>
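      <p>The three strategies can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, ties in the majority vote are resolved towards female, and by default the third strategy divides by the number of confident predictions, which is what the description as a weighted average suggests. Setting the hypothetical flag divide_by_n follows the printed formula and divides by N instead.</p>
      <preformat>
```python
def majority(probs):
    """Strategy 1: majority class of the item-level predictions.

    Returns 1.0 (female) or 0.0 (male); ties go to female (an assumption).
    """
    votes = sum(1 for p in probs if p >= 0.5)
    return 1.0 if 2 * votes >= len(probs) else 0.0


def mean(probs):
    """Strategy 2: mean of the item-level female probabilities."""
    return sum(probs) / len(probs)


def sure(probs, alpha=0.25, beta=0.75, divide_by_n=False):
    """Strategy 3: average only the predictions outside the open range (alpha, beta)."""
    # A prediction is confident when its weight is 1, i.e. it is not strictly
    # between alpha and beta.
    confident = [p for p in probs if p >= beta or alpha >= p]
    if not confident:
        return mean(probs)  # fall back to the mean strategy
    denominator = len(probs) if divide_by_n else len(confident)
    return sum(confident) / denominator
```
      </preformat>
      <p>For the item-level predictions 0.9, 0.8, 0.6, 0.4 and 0.1, the mean strategy gives 0.56, whereas the third strategy with the default normalization keeps only 0.9, 0.8 and 0.1 and gives 0.6.</p>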
      <p>[Table 1: accuracy (acc) per prediction strategy: (1) Majority, (2) Mean, (3) Sure]</p>
      <p>
For author profiling, it has been shown that tf-idf weighted n-gram features, both in
terms of characters and words, are very successful in inferring gender[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. As such we
have decided to use character 2- to 7-grams and word 1- to 3-grams with tf-idf weighting
with sublinear term frequency scaling[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
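      <p>The feature extraction described above can be illustrated with a small pure-Python sketch. The n-gram ranges and the sublinear term frequency scaling (1 + log tf) come from the text; the function names and the exact idf variant (log(N / df) + 1) are assumptions, since the paper does not state which formula or library was used.</p>
      <preformat>
```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=7):
    """All character n-grams of length n_min..n_max."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n_min=1, n_max=3):
    """All word n-grams of length n_min..n_max, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf(documents):
    """Sublinear tf-idf weights for a list of documents (lists of features).

    Returns one {feature: weight} dictionary per document.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each feature occurs.
    df = Counter(f for doc in documents for f in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            f: (1.0 + math.log(c)) * (math.log(n_docs / df[f]) + 1.0)
            for f, c in counts.items()
        })
    return vectors
```
      </preformat>
      <p>In practice the character and word n-gram lists of a tweet would be combined into one feature list before weighting.</p>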
      <p>
        Word embeddings are a distributed representation for text that is perhaps one of the
key breakthroughs for the impressive performance of deep learning methods on
challenging natural language processing problems[
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. They work in such a way that
words with similar meaning get a similar representation in a lower-dimensional space.
These embeddings are trained on very large text corpora in order to capture as much contextual
information as possible. For English and Arabic we used the pretrained fastText embeddings[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
For Spanish we used pretrained embeddings that were trained on the Spanish Billion
Word Corpus[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>4.2 Recurrent Neural Network</p>
      <p>
        RNNs[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are used to model sequences where the order is important. They have an
internal memory that keeps track of the examples they have seen so far in the current
sequence. Text is one of the clear use cases for RNNs[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] because of its sequential
nature.
      </p>
      <p>
        A challenge with using recurrent neural networks is the vanishing gradient
problem, in which long-range dependencies get lost over time. The problem was explored in
depth in [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ] who found some fundamental reasons why it might be difficult to retain
these dependencies. One solution to this problem is to use multiple gates as the atomic
units within a recurrent neural network. Multiple versions of these gates exist, such as
Long Short Term Memory-units (LSTM)[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and Gated Recurrent Units (GRU)[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
Chung et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] note that both LSTM and GRU are superior to recurrent neural
networks with traditional tanh units. LSTMs are, in theory, better able to remember longer
sequences than GRUs and outperform them in tasks that require modeling long-distance
relations. An advantage of GRUs over LSTMs is that they are computationally more
efficient because they have fewer gates within the units. As noted in Section 2, the texts are
short, and as such we do not need the additional power of LSTMs and have
decided to use GRUs.
      </p>
      <p>
        Another way to solve the long-term dependency problem is to use an attention
mechanism. Attention mechanisms were recently shown to be successful in a wide range of tasks[
        <xref ref-type="bibr" rid="ref23 ref24 ref25 ref26">23,
24, 25, 26</xref>
        ]. We use a modification of the mechanism proposed by Zhou et al. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], in
which we have not used the weighted sum but instead have taken the global maximum
and the global average over the attention matrix and have concatenated the two.
      </p>
      <p>
        Bidirectional RNNs[
        <xref ref-type="bibr" rid="ref28 ref29">28, 29</xref>
        ] are a combination of two separate RNNs. The input
sequence is fed in the normal order to one network, and in reverse order to the other. The
outputs of the two networks are usually concatenated at each time step. This structure
allows the networks to have both backward and forward information about the sequence
at every time step. Human understanding of text works in the same way: we use the
context of words to determine their meaning. In our work the two separate RNNs have the same
configuration.
      </p>
      <p>
        Recurrent neural networks can require millions of parameters to sufficiently model
tasks. This high dimensional parameter space translates to a high chance of overfitting
on the training data set. Because large networks are slow to use, creating an ensemble
of many large networks is infeasible. One technique to reduce the overfitting is to add
dropout[
        <xref ref-type="bibr" rid="ref30 ref31">30, 31</xref>
        ] to the network. We used different amounts of dropout in different places
in the network. Between the embeddings and the recurrent layer of our network we
use spatial dropout[
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] instead of normal dropout. The benefit of this is that entire
embedding channels can be dropped with a certain probability, which is better than
removing random points in the embedding matrix.
      </p>
      <p>[Figure 1: dense block: input, Dense layer, PReLU, tanh, Concatenate, output]</p>
      <p>We used 300-dimensional word embeddings as the input for our network. Spatial
dropout with a rate of 0.4 is applied to the word embeddings. We used a bidirectional
GRU with 256 units for each direction. The GRU had a tanh activation function, an
output dropout rate of 0.35 and an internal dropout rate of 0.1. After the recurrent layer the
attention mechanism was applied. Global average and max pooling were applied to this
layer to get a single vector for every input text. The outputs of the two pooling operations are concatenated
and used as input to a dense network with a dropout rate of 0.5 between every dense
block. A dense block consists of a fully connected layer with a PReLU[33] activation
function. We also applied the tanh activation function to the output of the PReLU and
concatenated the result with the PReLU output, as seen in Figure 1. Three such dense blocks
were used, with 256, 128 and 64 neurons respectively. Because of the concatenation the
output size of each block is twice its number of neurons. The final output was a single
neuron with the sigmoid function. We optimized the model with the Adam[34] optimizer
and chose binary cross entropy as the loss function. The network architecture can be
seen in Figure 2.</p>
      <p>To make our final predictions on the texts we use an ensemble of two models:
a traditional model and a deep-learning model. The
deep-learning model (GRU) is described in the previous section.</p>
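      <p>The dense block used in the network (Figure 1) can be illustrated with plain Python lists. This is only a sketch of the data flow: the function names are hypothetical, the weights are passed in by hand, and the PReLU slope is fixed at an illustrative value, whereas in a trained network it is a learned parameter.</p>
      <preformat>
```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output per row of the weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def prelu(x, slope=0.25):
    """PReLU activation; the slope is learned in practice, fixed here."""
    return [v if v > 0 else slope * v for v in x]


def dense_block(x, weights, bias, slope=0.25):
    """Dense layer, PReLU, then concatenate the PReLU output with its tanh."""
    h = prelu(dense(x, weights, bias), slope)
    return h + [math.tanh(v) for v in h]
```
      </preformat>
      <p>A block with k neurons therefore returns 2k values, which matches the statement that the output size is twice the number of neurons.</p>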
      <p>
        The traditional model is a multinomial Naive Bayes[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] classifier (NB) using the
character and word n-grams with tf-idf weighting on tweet level. Naive Bayes is a
family of classification algorithms based on the assumption that every feature being
classified is independent of any other feature given the class. The Naive Bayes classifier
considers each word in a piece of text to contribute independently to the probability
that the author is female (or male), regardless of any correlations between features.
Although it is based on independence assumptions that often do not hold in the real world,
Naive Bayes can often obtain surprisingly good results[35].
      </p>
      <p>The ensemble uses a weighted average to combine the output of the different models.
The weights and the model architectures within the ensemble are the same for every
language.</p>
    </sec>
    <sec id="sec-4">
      <title>5 Image classification</title>
      <p>After inspecting the images in the dataset we found that a lot of users post selfies. Our
hypothesis is that if we can identify selfies and detect the gender of the person in a
selfie, we can predict the gender of the author who posted it. If we do not find selfies
for a user, this pipeline gives a random prediction for that user.</p>
      <p>Our image model is not our main approach to this shared task; we mainly hope
to use it to improve our score in the combined task. Because of this, it does not really
affect our results if a user does not post selfies.</p>
      <p>
        For this approach we need a dataset consisting of selfies, and a dataset without
selfies. For the selfie class we used the selfie dataset provided by Kalayeh et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In
that research 46,836 selfies were collected and annotated with 36 different attributes.
The focus of that research was to predict the popularity of a selfie. For the no-selfie
class we used the MIRFLICKR dataset[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This dataset consists of 25,000 images
from Flickr. The images are annotated with tags. We removed images containing the
tags 'person', 'portrait' or 'selfie', resulting in 23,500 images.
      </p>
      <p>In 2012 Krizhevsky et al. won the ImageNet competition with a CNN[36]. Since
then CNNs have been the default architecture to tackle computer vision problems. We
have used a CNN to detect selfies; if an image is classified as a selfie, we predict the gender of the person in it with a
different CNN with the same architecture. The architecture we used is shown in Figure
3. There are 64 filters in every convolutional layer. The kernel size is 3 × 3 and
the max-pooling size is 2 × 2. In every layer except the last we used the ReLU[37]
activation function. In the last dense layer the activation is a sigmoid. The selfie detection model was
trained for 20 epochs using the Adam[34] optimizer on 150px by 150px versions of the
input images with a batch size of 256. We augmented the dataset by rescaling, zooming,
shearing and horizontally flipping the images. The selfie detector reached an accuracy of 96%
on a validation set of our created dataset. We found that in
a small sample over 80% of the users post images that get classified as selfies. For the gender
model we got an accuracy of 86% on selfies only. The model does not perform well on
images that are not selfies.</p>
      <p>One caveat of this approach is that not every picture with a face is a picture of the
author themselves. However, we hypothesize that more often than not women will post
pictures of themselves or other women, and likewise for men.</p>
      <p>To combine the text models and the image models we use a weighted average. Overall,
the text models vastly outperformed the image models; however, the addition of
the image models did improve the overall performance of the system.</p>
      <p>We chose to keep a single configuration for all languages. The weighted mean
between the text models uses a 1:4 ratio in favor of the RNN model. This is also the case for
the combination of the image model and the text models, where the ratio is in favor of
the text models.</p>
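      <p>The weighted averaging can be sketched as follows. The 1:4 ratio between Naive Bayes and the RNN comes from the text; that the image and text models are also combined at exactly 1:4 is an assumption based on the wording above, and the function names are hypothetical.</p>
      <preformat>
```python
def weighted_average(predictions, weights):
    """Weighted mean of female probabilities."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total


def combine(nb_prob, gru_prob, image_prob=None):
    """Combine author-level probabilities of the individual models."""
    # Text ensemble: Naive Bayes and GRU in a 1:4 ratio in favor of the GRU.
    text_prob = weighted_average([nb_prob, gru_prob], [1.0, 4.0])
    if image_prob is None:
        return text_prob
    # Combined task: image and text in an assumed 1:4 ratio in favor of text.
    return weighted_average([image_prob, text_prob], [1.0, 4.0])
```
      </preformat>
      <p>With these weights a confident text ensemble can only be overturned by the image model when the text prediction is already close to 0.5.</p>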
    </sec>
    <sec id="sec-5">
      <title>7 Results</title>
      <p>As in previous years of this shared task, the models are compared on the accuracy of
the predicted gender of an author. The accuracy is calculated for every language,
and the accuracies are then averaged to obtain a final score for our submission. The
results in this section are evaluated on the PAN 2018 Author Profiling evaluation set.</p>
      <p>Table 2 shows the accuracy of our text models and ensemble. Using the ensemble, we achieve an
accuracy of 76% for Arabic, 78.5% for English and 74.1% for Spanish on the evaluation
set. There is a big difference in performance between Arabic and
English on the one hand and Spanish on the other. This might be because we have done additional preprocessing
for Arabic and English. The GRU model outperforms the Naive Bayes model. For Spanish, the
ensemble has a higher accuracy than the separate models. For Arabic and
English this is not the case; here the GRU model has the highest performance. To
prevent overfitting on the very small test set we used for tweaking, we did not alter our
ensemble based on these results.</p>
      <p>[Table 2: accuracy per language: Arabic, English, Spanish, Average]</p>
      <p>Using only the selfie model we get accuracies of 62% and higher for the different
languages, as can be seen in Table 2. This low accuracy might be because not all users
post selfies, so our model does not know what to predict. Another reason might be that
the selfies in the shared task dataset are different from the ones in the selfie dataset. The
MIRFLICKR dataset might also not be sufficiently diverse:
the images in this dataset are all high-quality photos, which is not necessarily the
case for the images shared in the PAN ’18 dataset. We note that the accuracy on the
images shared by Spanish users is a lot higher than for the Arabic and English users.
We postulate that Spanish users might post more selfies or images that are indicative of
gender. For this reason we could have chosen to make the weight of the image model
higher in the combined model. However, to prevent overfitting, we have not done
this.</p>
      <p>The addition of the image models to the text models gave a very small
improvement in the accuracy of our models (0.3%). This is because there is a big difference
between the performance of the two approaches. If the performance of our image models
were on the same level as that of our text models, we would expect to see a significant improvement
from an ensemble of the two.</p>
    </sec>
    <sec id="sec-6">
      <title>8 Conclusion</title>
      <p>In this paper we have used a combination of text models and image models to
create gender predictions for three different languages. We have done the predictions on
individual tweets and images and then used multiple strategies to combine these
predictions to create a single prediction on user level. We have also chosen to keep a single
configuration of the system across the languages.</p>
      <p>As such our performance on the individual languages is not as high as it could
have been, had we optimized every combination of models for the different languages.</p>
      <p>† The results of the NB and GRU models (Table 2) were obtained by evaluating the models on a small test
set of 100 users, as it was not possible to run the models on the evaluation set. As such they
might not be entirely representative of the performance of our models. We show these results for
completeness.</p>
      <p>An ensemble of an RNN and a bag-of-words model did improve performance on the
English language, with respect to just using the RNN, but it does not improve on the
other languages.</p>
      <p>On the evaluation set, we got accuracy scores between 62.3% and 78.8% depending
on the language and on whether we used models that classify based on text or on images. On
our small test set our non-ensemble models showed an improved performance; however,
the test set only contained 100 users and as such may not be representative of the
distribution in the evaluation set.</p>
      <p>To conclude: we successfully defined an ensemble of deep-learning and traditional
models capable of good performance.</p>
      <p>33. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level
performance on imagenet classification. In: Proceedings of the IEEE international conference
on computer vision. (2015) 1026–1034</p>
      <p>34. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization</p>
      <p>35. Zhang, H.: The optimality of naive bayes. (2004)</p>
      <p>36. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional
neural networks. In: Advances in neural information processing systems. (2012) 1097–1105</p>
      <p>37. Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in
convolutional network. arXiv preprint arXiv:1505.00853 (2015)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gómez-Adorno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Posadas-Durán</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanchez-Perez</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chanona-Hernandez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Improving feature representation based on a neural network for author profiling in social media texts</article-title>
          .
          <source>Computational intelligence and neuroscience</source>
          <year>2016</year>
          (
          <year>2016</year>
          )
          <fpage>2</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of PAN-2018: Author Identification, Author Profiling, and Author Obfuscation</article-title>
          . In Bellot, P.,
          <string-name>
            <surname>Trabelsi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murtagh</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanjuan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
          </string-name>
          , N., eds.:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 9th International Conference of the CLEF Initiative (CLEF 18)</source>
          , Berlin Heidelberg New York, Springer (
          <year>September 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hand</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Idiot's bayes - not so stupid after all?</article-title>
          <source>International Statistical Review</source>
          <volume>69</volume>
          (
          <issue>3</issue>
          )
          <fpage>385</fpage>
          -
          <lpage>398</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lipton</surname>
            ,
            <given-names>Z.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berkowitz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elkan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A critical review of recurrent neural networks for sequence learning</article-title>
          . (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>LeCun</surname>
          </string-name>
          , Y.,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.:
          <article-title>Deep learning</article-title>
          .
          <source>Nature</source>
          <volume>521</volume>
          (
          <issue>7553</issue>
          ) (
          <year>2015</year>
          )
          <fpage>436</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-</surname>
            y-Gómez,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter</article-title>
          . In
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , eds.:
          <source>Working Notes Papers of the CLEF 2018 Evaluation Labs. CEUR Workshop Proceedings</source>
          , CLEF and CEUR-WS.org (September
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwyer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medvedeva</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rawee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haagsma</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>N-GrAM: New Groningen Author-profiling Model</article-title>
          . (July
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 4th Author Profiling Task at PAN 2016: Cross-genre Evaluations</article-title>
          .
          <source>In: Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings</source>
          , Évora, Portugal, CLEF and CEUR-WS.org (September
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter</article-title>
          .
          <source>Working Notes Papers of the CLEF</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kalayeh</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seifu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LaLanne</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>How to take a good selfie?</article-title>
          <source>In: Proceedings of the 23rd ACM International Conference on Multimedia. MM '15</source>
          , New York, NY, USA, ACM (
          <year>2015</year>
          )
          <fpage>923</fpage>
          -
          <lpage>926</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Huiskes</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lew</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          :
          <article-title>The MIR Flickr retrieval evaluation</article-title>
          .
          <source>In: MIR '08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval</source>
          , New York, NY, USA, ACM (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <source>Introduction to Information Retrieval</source>
          . Cambridge University Press, Cambridge, UK (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Empirical Methods in Natural Language Processing (EMNLP)</source>
          . (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Learning word vectors for 157 languages</article-title>
          . CoRR abs/1802.06893 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Cardellino</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <source>Spanish Billion Words Corpus and Embeddings</source>
          (March
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karafiát</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burget</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Černocký</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khudanpur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Recurrent neural network based language model</article-title>
          .
          <source>In: Eleventh Annual Conference of the International Speech Communication Association</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Untersuchungen zu dynamischen neuronalen Netzen</article-title>
          .
          <source>Diploma, Technische Universität München</source>
          <volume>91</volume>
          (
          <year>1991</year>
          )
          <fpage>1</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frasconi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>Gradient flow in recurrent nets: the difficulty of learning long-term dependencies</article-title>
          . (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          ) (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Merrienboer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</article-title>
          .
          <source>arXiv:1406.1078 [cs, stat]</source>
          (June
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Chung</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          .
          <source>arXiv preprint arXiv:1412.3555</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Hermann</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kocisky</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grefenstette</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Espeholt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kay</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suleyman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blunsom</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Teaching machines to read and comprehend</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . (
          <year>2015</year>
          )
          <fpage>1693</fpage>
          -
          <lpage>1701</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Chorowski</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serdyuk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Attention-based models for speech recognition</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . (
          <year>2015</year>
          )
          <fpage>577</fpage>
          -
          <lpage>585</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Classifying relations via long short term memory networks along shortest dependency paths</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          . (
          <year>2015</year>
          )
          <fpage>1785</fpage>
          -
          <lpage>1794</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Attention-based bidirectional long short-term memory networks for relation classification</article-title>
          .
          <source>In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          . Volume
          <volume>2</volume>
          . (
          <year>2016</year>
          )
          <fpage>207</fpage>
          -
          <lpage>212</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paliwal</surname>
            ,
            <given-names>K.K.</given-names>
          </string-name>
          :
          <article-title>Bidirectional recurrent neural networks</article-title>
          .
          <source>IEEE Transactions on Signal Processing</source>
          <volume>45</volume>
          (
          <issue>11</issue>
          ) (
          <year>1997</year>
          )
          <fpage>2673</fpage>
          -
          <lpage>2681</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Bidirectional long short-term memory networks for relation classification</article-title>
          .
          <source>In: Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation</source>
          . (
          <year>2015</year>
          )
          <fpage>73</fpage>
          -
          <lpage>78</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Dropout: A simple way to prevent neural networks from overfitting</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>15</volume>
          (
          <issue>1</issue>
          ) (
          <year>2014</year>
          )
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Zaremba</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Recurrent neural network regularization</article-title>
          .
          <source>arXiv preprint arXiv:1409.2329</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Tompson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goroshin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bregler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Efficient Object Localization Using Convolutional Networks</article-title>
          .
          <source>arXiv:1411.4280 [cs]</source>
          (November
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>