<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Author Profiling with Bidirectional RNNs using Attention with GRUs</article-title>
      </title-group>
      <contrib-group>
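        <contrib contrib-type="author">
          <name><surname>Kodiyan</surname><given-names>Don</given-names></name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Hardegger</surname><given-names>Florin</given-names></name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Neuhaus</surname><given-names>Stephan</given-names></name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Cieliebak</surname><given-names>Mark</given-names></name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>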
        <aff id="aff1">
          <label>1</label>
          <institution>Zurich University of Applied Sciences</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>This paper describes our approach to the Author Profiling Shared Task at PAN 2017. The goal was to classify the gender and language variety of a Twitter user solely from their tweets. Author profiling can be applied in fields such as marketing, security and forensics. Twitter already uses similar techniques to deliver personalized advertising to its users. PAN 2017 provided a corpus for this purpose in four languages: English, Spanish, Portuguese and Arabic. To solve the problem we used a deep learning approach, a family of methods that has recently been successful in Natural Language Processing. Our submitted model consists of a bidirectional Recurrent Neural Network implemented with Gated Recurrent Units (GRUs) combined with an attention mechanism. We achieved an average accuracy over all languages of 75.31% in gender classification and 85.22% in language variety classification.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Social media has become an important platform for communication and the exchange of
information. In contrast to classical letters and emails, the language on social media is
much more personal. This raises the question of whether the style and content of a text allow
conclusions to be drawn about demographic traits of its author, such as age, gender, or
language variety. Such insights can be used in applications such as forensics,
security, or marketing. For instance, such profiles would make it possible
to determine which users could be interested in a new product or campaign, how urgent
a complaint is, or whether a profile in an online forum might be fake.</p>
      <p>
        The Author Profiling Shared Task at PAN aims to answer this
question by extracting information about authors based on their linguistic style of
writing [
        <xref ref-type="bibr" rid="ref13 ref14">14,13</xref>
        ]. The goal of the 2017 shared task at PAN is to detect an author’s gender
and dialect from their Twitter texts. Both training and test data are provided in four
different languages: English, Spanish, Portuguese and Arabic.
      </p>
      <p>We have implemented a solution that is based on a bidirectional recurrent neural
network (bi-RNN) using gated recurrent units (GRUs) in combination with an attention
mechanism.</p>
      <p>The paper is structured as follows. In Section 3, we give a short overview of related
work. Then, in Section 4, we describe our model, and Section 5 compares the different
attempts and their results on test data. Conclusions are drawn in the last section.</p>
      <p>
        PAN. PAN is a series of shared task evaluations on digital text forensics.
Shared tasks are computer science events that focus on a specific problem of interest. This
paper is the result of our participation in the Author Profiling Shared Task of 2017.
Author profiling includes predicting the gender and language variety of the author of a given
Twitter document. To solve these problems, training and test datasets are available [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        PAN 2017 Training Data. The PAN 2017 training data consists of Twitter profiles in
four different languages: English, Spanish, Portuguese and Arabic. The corpus was
annotated with gender and language variety information about the authors.
      </p>
      <p>For each of the language varieties, there are 600 Twitter profiles. In each language,
the number of male and female profiles is the same. The dataset includes exactly
100 tweets for each author.</p>
      <p>
        Language Variety. A language variety is defined as a specific variation of an author’s
native language. For instance, for an English-speaking author one has to identify whether the
variety is from Australia, Canada, Great Britain, Ireland, New Zealand or the
United States.
      </p>
      <p>
        TIRA. TIRA is an evaluation-as-a-service platform [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Submissions to the PAN
shared task were made with this tool. The submitted models were self-evaluated on a
virtual machine hosted by the organizers. The test data was only available
on this virtual machine and was not visible to the participants.
      </p>
      <p>Evaluation. Submissions at PAN 2017 are measured by
accuracy. The individual accuracy for gender and variety identification was calculated
for each language as follows:
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>\text{accuracy} = \frac{\text{correctly predicted}}{\text{total}}</tex-math>
        </disp-formula>
      </p>
      <p>The joint accuracy counts a prediction as correct only when both gender and variety are
predicted correctly for the same author. The final ranking is calculated with the averaged accuracy over all four
languages.</p>
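      <p>As a minimal sketch of these measures (not the organizers' evaluation code), the following Python snippet computes the per-task accuracy and the joint accuracy. The function names and the example labels are illustrative assumptions and not part of the task definition.</p>
      <preformat>
```python
def accuracy(predicted, gold):
    """Fraction of items whose predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


def joint_accuracy(pred_gender, gold_gender, pred_variety, gold_variety):
    """An author counts as correct only if gender and variety are both right."""
    predicted = list(zip(pred_gender, pred_variety))
    gold = list(zip(gold_gender, gold_variety))
    return accuracy(predicted, gold)


# Hypothetical predictions for four authors (illustrative labels only).
gold_gender = ["female", "male", "male", "female"]
pred_gender = ["female", "male", "female", "female"]
gold_variety = ["ireland", "canada", "australia", "ireland"]
pred_variety = ["ireland", "canada", "australia", "canada"]

gender_acc = accuracy(pred_gender, gold_gender)      # 3 of 4 authors correct
variety_acc = accuracy(pred_variety, gold_variety)   # 3 of 4 authors correct
joint_acc = joint_accuracy(pred_gender, gold_gender,
                           pred_variety, gold_variety)  # 2 of 4 authors correct
```
      </preformat>
      <p>In this made-up example the joint accuracy is lower than either individual accuracy, because the two errors fall on different authors.</p>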
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>In this section we provide an overview of the most relevant work on the Author
Profiling Task with neural networks.</p>
      <p>
        Neural Networks. Neural networks have achieved strong results in natural language
processing in the past few years. In many tasks, such as machine translation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and
sentiment analysis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], they have proven to be very successful. The two
state-of-the-art architectures in use today are recurrent neural networks (RNNs) and
convolutional neural networks (CNNs). The main challenge in most NLP tasks is to simplify the
input sequence while keeping the most important information. Research on neural machine
translation (NMT) already focuses heavily on this challenge. For that reason we applied
techniques from NMT to the Author Profiling Task.
      </p>
      <p>
        RNNs and CNNs. The recent success of RNNs has been achieved through long short-term
memory networks (LSTMs) and gated recurrent unit networks (GRUs) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. With their
ability to capture long-term dependencies, LSTMs and GRUs have achieved
state-of-the-art results in various NLP tasks. Bahdanau et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] proposed an attention
mechanism to simplify a sequence. In combination with a bidirectional RNN (bi-RNN),
this approach learns to automatically weigh the most relevant information in the input
sequence. This leads to substantial improvements in machine translation and in other fields
such as automatic summarization [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Recent research by Gehring et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] has shown
that CNNs are capable of achieving state-of-the-art results in NMT. Those results were
achieved by applying the attention mechanism to CNNs. CNNs are computationally
less expensive than LSTMs and GRUs, which makes them preferable for large
datasets.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>In this section we describe the technical solution. The main focus is on the system
architecture of the neural networks.</p>
      <sec id="sec-3-1">
        <title>Preprocessing</title>
        <p>
          Every tweet was preprocessed by converting it to lower case. We replaced
URLs and usernames with a standardized token. We converted hashtags to regular
words and used the TweetTokenizer from NLTK [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] to tokenize the tweets. We use
a vocabulary to map each token to a token ID. Each ID points to a vector representation
of the token, which is used later. After the preprocessing step we have, for each author, a list of tweets
in which each tweet is a list of token IDs.
        </p>
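        <p>The preprocessing steps can be sketched as follows. This is a minimal, standard-library-only approximation: a simple regular expression stands in for the NLTK TweetTokenizer, and the placeholder strings, the reserved IDs and all function names are assumptions made for illustration. The vocabulary is built from the example input, whereas in our system it is tied to the pretrained embeddings.</p>
        <preformat>
```python
import re

URL_TOKEN = "URLTOK"    # hypothetical placeholder string for URLs
USER_TOKEN = "USRTOK"   # hypothetical placeholder string for usernames
PADDING_ID = 0          # assumed ID reserved for padding
UNKNOWN_ID = 1          # assumed ID for out-of-vocabulary tokens

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Simplified stand-in for a tweet tokenizer: words or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def preprocess(tweet):
    """Lower-case, replace URLs and usernames, unwrap hashtags, tokenize."""
    text = tweet.lower()
    # Placeholders are upper-case, so they cannot collide with tweet words.
    text = URL_RE.sub(URL_TOKEN, text)
    text = USER_RE.sub(USER_TOKEN, text)
    text = HASHTAG_RE.sub(r"\1", text)   # "#word" becomes "word"
    return TOKEN_RE.findall(text)


def build_vocabulary(tokenized_tweets):
    """Assign IDs in order of first appearance; 0 and 1 stay reserved."""
    vocab = {}
    for tokens in tokenized_tweets:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab) + 2
    return vocab


def to_ids(tokens, vocab):
    """Map tokens to IDs, using UNKNOWN_ID for tokens not in the vocabulary."""
    return [vocab.get(token, UNKNOWN_ID) for token in tokens]


tweets = ["Check THIS out @Bob: https://t.co/abc #DeepLearning!!", "this is new"]
tokenized = [preprocess(t) for t in tweets]
vocab = build_vocabulary(tokenized[:1])   # vocabulary from the first tweet only
ids = [to_ids(tokens, vocab) for tokens in tokenized]
```
        </preformat>
        <p>The second tweet contains two words that are missing from the toy vocabulary, so they map to the assumed unknown ID.</p>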
      </sec>
      <sec id="sec-3-2">
        <title>Embeddings</title>
        <p>
          Each token in a tweet is represented by pretrained word embeddings [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. For English
and Spanish we used embeddings created with word2vec [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. For both languages, a
corpus of 200 million unlabelled tweets was used. The skip-gram algorithm was used
for training with a window size of 5, a sample value of 1e-05, a minimum frequency of 15 and
200 dimensions.
        </p>
        <p>
          For Portuguese and Arabic we used pretrained embeddings from [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which were
trained on a Wikipedia corpus<sup>1</sup>. They have an output dimension of 300.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Architecture</title>
        <p>
          In this section we describe our model, which consists of a bi-RNN with GRUs followed
by an attention mechanism.
        </p>
        <p>
          Embedding Layer. The embedding layer maps the token IDs to their
vector representations. Each token ID is used to look up the word vector in the embeddings.
These vectors are concatenated and passed to the next layer. This results in an
output matrix <inline-formula><tex-math>S \in \mathbb{R}^{d \times n}</tex-math></inline-formula>, where <italic>d</italic> stands for the dimension of the word vectors and <italic>n</italic> for the
size of the input. To determine <italic>n</italic>, we took the tweet with the largest number of tokens
in our training dataset and rounded that number up to the next multiple of 10. This resulted in a
maximum input size of <italic>n</italic> = 60. Shorter inputs were padded with zeros to match that
size. To reduce the effect of unknown and padded words we used masking [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. This way
our model only uses known words and skips zero values.
        </p>
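        <p>A small sketch of how the input size and the zero padding could be computed is given below. It is written in plain Python for illustration and does not reproduce the Keras code of our system; right-padding, truncation of longer inputs and the function names are assumptions.</p>
        <preformat>
```python
import math

PADDING_ID = 0   # zero is reserved for padding, as in the text


def round_up_to_ten(length):
    """Round a token count up to the next multiple of 10."""
    return int(math.ceil(length / 10.0)) * 10


def max_input_size(tokenized_tweets):
    """Input size n: longest tweet in the training data, rounded up."""
    longest = max(len(tokens) for tokens in tokenized_tweets)
    return round_up_to_ten(longest)


def pad(token_ids, n):
    """Right-pad with zeros to length n (longer inputs are cut off at n)."""
    clipped = token_ids[:n]
    return clipped + [PADDING_ID] * (n - len(clipped))


def mask(padded_ids):
    """True for real tokens, False for padded positions to be skipped."""
    return [token_id != PADDING_ID for token_id in padded_ids]


n_example = round_up_to_ten(53)      # a longest tweet of 53 tokens gives n = 60
padded = pad([2, 3, 4], 5)
padding_mask = mask(padded)
```
        </preformat>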
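        <p>GRU Layer. This layer consists of two GRUs with <italic>u</italic> units each. We used one GRU
for each direction, which resulted in two matrices <inline-formula><tex-math>R_F \in \mathbb{R}^{u \times n}</tex-math></inline-formula> and <inline-formula><tex-math>R_B \in \mathbb{R}^{u \times n}</tex-math></inline-formula>. Finally,
both matrices were concatenated, resulting in a matrix <inline-formula><tex-math>R \in \mathbb{R}^{2u \times n}</tex-math></inline-formula>. For our model we
used <italic>u</italic> = 50.</p>
        <p><sup>1</sup> http://wikipedia.org</p>
        <p>Attention Layer. This layer is used to weight the most important parts of the GRU-encoded
input and to deliver a simplified representation of the input. The output matrix <italic>R</italic>
of the previous layer, the weight matrix <inline-formula><tex-math>W_a \in \mathbb{R}^{2u \times 2u}</tex-math></inline-formula> and the bias <inline-formula><tex-math>b \in \mathbb{R}^{2u}</tex-math></inline-formula> are used to
calculate a hidden state <inline-formula><tex-math>h_t</tex-math></inline-formula>:
          <disp-formula id="eq-2">
            <label>(2)</label>
            <tex-math>h_t = \tanh(W_a R + b)</tex-math>
          </disp-formula>
The hidden state <inline-formula><tex-math>h_t</tex-math></inline-formula> and the weight vector <inline-formula><tex-math>W_u \in \mathbb{R}^{2u}</tex-math></inline-formula> are used to calculate the final attention
<italic>a</italic> for each word by
          <disp-formula id="eq-3">
            <label>(3)</label>
            <tex-math>a = \mathrm{softmax}(h_t W_u)</tex-math>
          </disp-formula>
The attention vector <italic>a</italic> is then multiplied with <italic>R</italic> and the result is summed. This
yields a summarized representation of the sentence as a vector <inline-formula><tex-math>s_a \in \mathbb{R}^{2u}</tex-math></inline-formula>.</p>
        <p>Softmax Layer. As the final layer we used a fully connected layer with softmax as
the activation function. The number of output nodes depended on the number of
classes. Gender prediction required 2 nodes; language
variety prediction required between 2 and 7 nodes, depending on the language.</p>
        <p>
          Dropout. Dropout drops individual nodes during training with a probability of <italic>p</italic> and
is therefore used to reduce overfitting [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. We used dropout on our softmax layer with
<italic>p</italic> = 0.2.
        </p>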
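        <p>To make the attention layer more concrete, the following sketch applies Equations 2 and 3 to a tiny example in plain Python. It is only an illustration under stated assumptions: the GRU output is given as a list of per-token vectors, the toy weights are made up, and masking of padded positions is left out.</p>
        <preformat>
```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(R, W_a, b, W_u):
    """R holds one encoded vector of size 2u per token (the columns of R)."""
    scores = []
    for r_t in R:
        # Equation 2: hidden state for token t.
        h_t = [math.tanh(z + b_i) for z, b_i in zip(matvec(W_a, r_t), b)]
        # Score of token t: dot product of h_t with the weight vector W_u.
        scores.append(sum(h * w for h, w in zip(h_t, W_u)))
    a = softmax(scores)   # Equation 3: one attention weight per token
    size = len(R[0])
    # Weighted sum of the token vectors gives the sentence vector s_a.
    s_a = [sum(a_t * r_t[i] for a_t, r_t in zip(a, R)) for i in range(size)]
    return a, s_a


# Toy example with 2u = 2 and n = 3 tokens; all values are made up.
R = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_a = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
W_u = [1.0, 1.0]
a, s_a = attention(R, W_a, b, W_u)
```
        </preformat>
        <p>In this toy setting the third token receives the largest attention weight, because both of its components contribute to the score.</p>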
        <p>
          Optimization. Our model is trained using the AdaDelta optimizer [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. We used a value of
<inline-formula><tex-math>10^{-5}</tex-math></inline-formula> for one hyper-parameter and default values for the other hyper-parameters.
        </p>
        <p>Author Prediction. Our model is trained to classify single tweets. To obtain the
classification of an author, their tweets are classified separately. The outputs of our model,
that is, the outputs of the softmax layer, are then summed, and the class with the
highest value is the final prediction. For example, if we want to predict the gender of a
user <italic>u</italic> who has three tweets <italic>t</italic><sub>1</sub>, <italic>t</italic><sub>2</sub>, <italic>t</italic><sub>3</sub>, we first classify the tweets separately. This could
result in the following predictions: <italic>t</italic><sub>1</sub> = [0.4, 0.6], <italic>t</italic><sub>2</sub> = [0.3, 0.7], <italic>t</italic><sub>3</sub> = [1.0, 0.0]. The first
number of each output indicates the probability that the tweet was written by a female author,
and the second number indicates the probability that the author is male. The outputs
of the tweets <italic>t</italic><sub>1</sub>, <italic>t</italic><sub>2</sub>, <italic>t</italic><sub>3</sub> are summed, which results in [1.7, 1.3]. In this example, user
<italic>u</italic> would be predicted to be female.</p>
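        <p>The aggregation from tweet level to author level can be sketched in a few lines of Python. The function name and the label order are illustrative assumptions; the probabilities are those of the example above.</p>
        <preformat>
```python
def predict_author(tweet_probabilities, labels):
    """Sum the per-tweet softmax outputs and return the best class."""
    totals = [0.0] * len(labels)
    for probabilities in tweet_probabilities:
        for i, p in enumerate(probabilities):
            totals[i] += p
    best = max(range(len(labels)), key=lambda i: totals[i])
    return labels[best], totals


# Softmax outputs for the three tweets, ordered as [female, male].
outputs = [[0.4, 0.6], [0.3, 0.7], [1.0, 0.0]]
label, totals = predict_author(outputs, ["female", "male"])
# totals is approximately [1.7, 1.3], so label is "female"
```
        </preformat>
        <p>Summing the probabilities lets a single very confident tweet outweigh several weakly confident ones, which a simple majority vote over tweets would not allow.</p>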
      </sec>
      <sec id="sec-3-4">
        <title>Training</title>
        <p>To train our models for submission we used 90% of the training data; the
remaining 10% was used as a validation set. The validation set was used to select a model
checkpoint during training. For more details on model checkpoints, see Section 5.1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>We distinguish between the evaluation during development, and the benchmark
measured on actual test data on TIRA. The results during the development phase were
achieved on the provided training corpus with cross validation.</p>
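      <p>The cross-validation used during development has 10 folds, and each fold uses 80% of the data for training, 10% for validation and 10% for testing. One possible way to produce such splits is sketched below. How the validation part is chosen in each fold is an assumption (here, the part following the test part), as are the function name and the interleaved partitioning.</p>
      <preformat>
```python
def cross_validation_splits(items, folds=10):
    """Return (train, validation, test) triples, one per fold."""
    # Partition the data into `folds` interleaved parts of similar size.
    parts = [items[i::folds] for i in range(folds)]
    splits = []
    for k in range(folds):
        val_k = (k + 1) % folds   # assumed choice of the validation part
        train = []
        for j, part in enumerate(parts):
            if j != k and j != val_k:
                train.extend(part)
        splits.append((train, parts[val_k], parts[k]))
    return splits


authors = list(range(100))   # stand-in for 100 author profiles
splits = cross_validation_splits(authors)
train, validation, test = splits[0]
```
      </preformat>
      <p>With 100 items, every fold contains 80 training, 10 validation and 10 test items, and each item is used for testing exactly once across the folds.</p>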
      <p>Cross Validation. Our models were trained with 10-fold cross validation. We used
cross validation to calculate a representative score for the model. The data in each fold
was used as follows: 80% training data, 10% validation data and 10% test data. The
evaluation on the test data does not influence the training and is only used to evaluate
the model. We used a validation set in combination with model checkpoints to prevent
overfitting. Model checkpoints are explained in Section 5.1.</p>
      <p>F1 Score. During the training phase we used the F1 score to find the best model. The F1
score considers both precision and recall. We used the F1 score
because it penalizes one-sided predictions of a model. In the following calculations, the abbreviations <italic>tp</italic>, <italic>fp</italic> and <italic>fn</italic>
denote true positives, false positives and false negatives.
Precision is the ratio of correctly predicted items (<italic>tp</italic>) to all items classified as this class
(<italic>tp</italic> + <italic>fp</italic>):
        <disp-formula id="eq-4">
          <label>(4)</label>
          <tex-math>\text{precision} = \frac{tp}{tp + fp}</tex-math>
        </disp-formula>
      </p>
      <p>Recall is the ratio of correctly classified items (<italic>tp</italic>) to the total number of items in
the corresponding class (<italic>tp</italic> + <italic>fn</italic>):
        <disp-formula id="eq-5">
          <label>(5)</label>
          <tex-math>\text{recall} = \frac{tp}{tp + fn}</tex-math>
        </disp-formula>
      </p>
      <p>The harmonic mean of these two scores is called the F1 score. The F1 score is calculated
as follows:
        <disp-formula id="eq-6">
          <label>(6)</label>
          <tex-math>F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}</tex-math>
        </disp-formula>
      </p>
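      <p>The following sketch computes precision, recall and the F1 score for one class and illustrates why the F1 score penalizes one-sided predictions. The function name and the example data are illustrative assumptions; returning 0.0 when a denominator is zero is a convention chosen for this sketch.</p>
      <preformat>
```python
def precision_recall_f1(predicted, gold, target):
    """Precision, recall and F1 score for the class `target`."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == target and g == target)
    fp = sum(1 for p, g in pairs if p == target and g != target)
    fn = sum(1 for p, g in pairs if p != target and g == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# A one-sided model that labels every author as female (made-up data).
gold = ["female"] * 5 + ["male"] * 5
predicted = ["female"] * 10

p_female, r_female, f1_female = precision_recall_f1(predicted, gold, "female")
p_male, r_male, f1_male = precision_recall_f1(predicted, gold, "male")
```
      </preformat>
      <p>In this example the model reaches full recall but only 50% precision on the female class, and its F1 score on the male class is zero, which exposes the one-sided behaviour.</p>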
      <sec id="sec-4-1">
        <title>Model Checkpoints</title>
        <p>The accuracy and F1 score of the model were measured during training. The scores
were evaluated on a validation and a test dataset. If the model achieved a higher F1
score on the validation data than a previous one, the model (and its weights) was saved.
An example of the measured scores is shown in Figure 2.</p>
        <p>The goal is to select the best weights for a model during the training phase. Figure
2 shows that our model performs very similarly on validation and test data. This means that,
by choosing the best weights on the validation set, the chances are high that the model
performs equally well on the test set. This makes our model stable and predictable.</p>
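        <p>The checkpoint rule can be illustrated with a short sketch: a checkpoint is saved whenever the validation F1 score exceeds the best score seen so far. The function name and the scores are made up for illustration, and a list of saved epochs stands in for writing the model weights to disk.</p>
        <preformat>
```python
def select_checkpoint(validation_f1_per_epoch):
    """Return the best epoch, its F1 score, and all epochs that were saved."""
    best_epoch = None
    best_f1 = float("-inf")
    saved = []
    for epoch, f1 in enumerate(validation_f1_per_epoch, start=1):
        if f1 > best_f1:
            # A new best validation score: this is where weights would be saved.
            best_epoch, best_f1 = epoch, f1
            saved.append(epoch)
    return best_epoch, best_f1, saved


history = [0.61, 0.68, 0.66, 0.72, 0.71]   # made-up validation F1 scores
best_epoch, best_f1, saved = select_checkpoint(history)
```
        </preformat>
        <p>With these made-up scores, checkpoints would be written after epochs 1, 2 and 4, and the weights from epoch 4 would be used.</p>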
      </sec>
      <sec id="sec-4-2">
        <title>Analysis of the Attention</title>
        <p>While working with the attention mechanism we developed a tool that visualizes how the
different words in a tweet are weighted. This tool helped us to understand which words
are more important for our model. An example for language variety is shown in Figure 3,
where multiple tweets by British and American authors are compared.</p>
        <p>
          In Figure 3 the attention weights of the words are highlighted. As we can see, some typical
American English and British English words are marked. For example, the word
"color" in the first tweet and "Walmart" in the third tweet are marked as very important; both
are common in American English. In the second and fourth tweets, the words
"bloody" and "cheeky", which are common in British English, are marked as significant.
        </p>
        <p>
          During our preparation for the PAN shared task several models were tested and
compared. Our baseline was a CNN model [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] which already participated in PAN 2016. The
model has a 2-layer CNN architecture with a fully connected softmax layer at the end.
        </p>
        <p>The experiments have shown that the bi-GRU+Attention model has the best
performance on both classification tasks (gender, variety). The measured scores of both
models are shown in Table 2 and Table 3.</p>
        <p>We trained two distinct models for each language: one for gender and one for variety.
These models were uploaded to the virtual machine and evaluated on the actual
test dataset. Table 4 shows the results obtained on the PAN 2017 Author Profiling test dataset.</p>
        <p>The highest score on gender prediction was achieved in English. Portuguese gender
prediction follows with 0.075% less accuracy. The gender predictions in Spanish and
Arabic are lower than the others. We assume that this is related to lower
vocabulary coverage: for both Spanish and Arabic, the vocabulary coverage is
below 80%, in contrast to around 90% coverage for English and
Portuguese.</p>
        <p>In general, good scores are achieved for variety prediction. The variety
accuracy of 91.43% for Spanish, which comprises seven language
varieties, stands out. Only for English and Arabic did the score drop below 80%. The lowest score,
76.88%, is achieved for variety prediction on Arabic, due to low vocabulary coverage.
The exact vocabulary coverage of the embeddings used is shown in Table 5.</p>
        <p>The results in Table 5 seem to imply that the accuracy of gender prediction
correlates with vocabulary coverage.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we presented deep learning models to predict the gender and language
variety of Twitter profiles. We described a bidirectional RNN with GRUs and an attention
mechanism. We compared the average accuracy of our models over all languages with
that of a previously developed CNN model. The RNN exceeds the CNN in gender prediction
by 1.45% and in variety prediction by 2.69% on average over four languages on PAN
2017 training data.</p>
      <p>
        For future work, we would like to see if a combination of several high-quality
solutions for Author Profiling with a random forest could even outperform each of the
subsystems. This has been done successfully for sentiment analysis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and it would be
interesting to see if it works for Author Profiling as well.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.:</given-names>
          </string-name>
          <article-title>Neural Machine Translation by Jointly Learning to Align and Translate</article-title>
          .
          <source>CoRR abs/1409.0473</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>Natural Language Processing with Python</source>
          .
          <publisher-name>O'Reilly Media</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          .
          <source>CoRR abs/1607.04606</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          <volume>17</volume>
          (
          <issue>11</issue>
          ),
          <fpage>1875</fpage>
          -
          <lpage>1886</lpage>
          (
          <year>Nov 2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chollet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.: Keras. https://github.com/fchollet/keras (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chung</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Gülçehre, Ç.,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          .
          <source>CoRR abs/1412.3555</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Deriu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cieliebak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Sentiment Analysis using Convolutional Neural Networks with Multi-Task Training and Distant Supervision on Italian Tweets</article-title>
          . In:
          <source>Evaluation of NLP and Speech Tools for Italian (EVALITA)</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Deriu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luca</surname>
            ,
            <given-names>V.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Severyn</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cieliebak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaggi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification</article-title>
          .
          In:
          <source>Proceedings of the 26th International Conference on World Wide Web</source>
          . pp.
          <fpage>1045</fpage>
          -
          <lpage>1052</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dürr</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uzdilli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cieliebak</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>JOINT_FORCES: Unite Competing Sentiment Classifiers with Random Forest</article-title>
          .
          <source>SemEval 2014-Proceedings of the 8th International Workshop on Semantic Evaluation</source>
          pp.
          <fpage>366</fpage>
          -
          <lpage>369</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gehring</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grangier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yarats</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dauphin</surname>
            ,
            <given-names>Y.N.</given-names>
          </string-name>
          :
          <article-title>Convolutional Sequence to Sequence Learning</article-title>
          . ArXiv e-prints (May
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>CoRR abs/1310.4546</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</article-title>
          . In: Kanoulas, E., Lupu, M., Clough, P., Sanderson, M., Hall, M., Hanbury, A., Toms, E. (eds.)
          <article-title>Information Access Evaluation meets Multilinguality, Multimodality, and Visualization</article-title>
          .
          <source>5th International Conference of the CLEF Initiative (CLEF 14)</source>
          . pp.
          <fpage>268</fpage>
          -
          <lpage>299</lpage>
          . Springer, Berlin Heidelberg New York (Sep
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of PAN'17: Author Identification, Author Profiling, and Author Obfuscation</article-title>
          . In:
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawless</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (eds.)
          <article-title>Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>8th International Conference of the CLEF Initiative (CLEF 17)</source>
          . Springer, Berlin Heidelberg New York (Sep
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          : In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (eds.)
          <source>Working Notes Papers of the CLEF 2017 Evaluation Labs</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Dropout: A Simple Way to Prevent Neural Networks from Overfitting</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>15</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          (Jan
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ljubešić</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tiedemann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection</article-title>
          . In:
          <source>Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC)</source>
          . pp.
          <fpage>11</fpage>
          -
          <lpage>15</lpage>
          . Reykjavik, Iceland (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zeiler</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>ADADELTA: An Adaptive Learning Rate Method</article-title>
          .
          <source>CoRR abs/1212.5701</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>