<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pooled LSTM for Dutch cross-genre gender classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matej Martinc</string-name>
          <email>matej.martinc@ijs.si</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Senja Pollak</string-name>
          <email>senja.pollak@ijs.si</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jožef Stefan Institute</institution>
          ,
          <addr-line>Jamova 39, 1000 Ljubljana</addr-line>
          ,
          <country country="SI">Slovenia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Usher Institute, Medical school, University of Edinburgh</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present the results of cross-genre and in-genre gender classification performed on the data sets of Dutch tweets, YouTube comments and news prepared for the CLIN 2019 shared task. We propose a recurrent neural network architecture for gender classification, in which the input word and part-of-speech sequences are fed to the LSTM layer, which is followed by average and max pooling layers. The best cross-genre accuracy of 55.2% was achieved by the model trained on YouTube comments and tweets, and tested on the balanced news corpus, while the best in-genre accuracy of 61.33% was achieved on YouTube comments. Overall, the proposed approach ranked 2nd in the global cross-genre ranking and 6th in the global in-genre ranking of the CLIN 2019 shared task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Author profiling (AP) is a well-established subfield of natural language
processing with a thriving community gathering data, organizing shared tasks and
publishing about this topic. AP entails the prediction of an author's profile, i.e.
demographic and/or psychological characteristics of the author, based on the
text that he/she has written. The single most prominent author profiling task
is gender classification; other tasks include the prediction of age, personality,
region of origin and mental health of an author.</p>
      <p>
        Gender prediction became a mainstream research topic with the influential
work by Koppel et al. (2002). Based on the experiments on a subset of the
British National Corpus, they found that women have a more relational writing
style (e.g., using more pronouns) and men have a more informational writing
style (e.g., using more determiners). Later gender prediction research remained
focused on English, but in the last few years, more languages have received
attention in the context of author profiling
        <xref ref-type="bibr" rid="ref14 ref19">(Rangel et al., 2015, 2016)</xref>
        , with
the publication of the TwiSty corpus containing gender information on Twitter
authors for six languages
        <xref ref-type="bibr" rid="ref19 ref21">(Verhoeven et al., 2016)</xref>
        as a highlight so far.
      </p>
      <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        A recent study by van der Goot et al. (2018) calls the cross-genre
transferability of machine learning approaches to gender prediction into question by noticing
that most of these approaches have typically focused on lexical and specialized
social network features, which boosted the performance of the approaches, but
on the other hand also made the approaches highly genre and topic dependent.
To solve this problem, a fairly new development in the field of AP is the search
for data set independent features and approaches, capable of capturing the most
generic differences between male and female writing, which transfer well across
different genres and languages
        <xref ref-type="bibr" rid="ref4 ref9">(Dell'Orletta and Nissim, 2018)</xref>
        . This is also the
main focus of the present research, in which we primarily deal with the
development and testing of a system for Dutch cross-genre gender classification.
In contrast to the majority of the best performing systems in the field of AP,
which use hand-crafted features and traditional classifiers such as support vector
machines (SVM) and logistic regression
        <xref ref-type="bibr" rid="ref18">(Rangel et al., 2017)</xref>
        , we opted for a
neural classifier and automated feature engineering.
      </p>
      <p>This paper is structured as follows. The findings from the related work are
presented in Section 2. The data sets and the methodology used are presented
in Section 3. Results are presented in Section 4, while in Section 5 we conclude
the paper and present plans for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        The lively AP community is centered around a series of scientific events and
shared tasks on digital text forensics, such as PAN (Uncovering Plagiarism,
Authorship, and Social Software Misuse)<sup>1</sup> and VarDial (Varieties and Dialects)<sup>2</sup>
        <xref ref-type="bibr" rid="ref22">(Zampieri et al., 2014)</xref>
        . While VarDial is more focused on the identification of
language varieties and dialects, most past PAN AP shared tasks were centered
around gender classification.
      </p>
      <p>
        The first PAN event took place in 2011 and the first AP shared task was
organized in 2013
        <xref ref-type="bibr" rid="ref17">(Rangel et al., 2013)</xref>
        . From the beginning, the PAN shared
task was multilingual
        <xref ref-type="bibr" rid="ref14 ref15 ref16 ref17 ref18 ref19">(Rangel et al., 2013, 2014, 2015, 2016, 2017, 2018)</xref>
        and
two of the past competitions also had a cross-genre setting
        <xref ref-type="bibr" rid="ref15 ref19">(Rangel et al., 2014,
2016)</xref>
        . Another shared task dedicated to cross-genre gender classification on
Italian documents was the EVALITA 2018 cross-genre gender prediction (GxG)
task
        <xref ref-type="bibr" rid="ref4 ref9">(Dell'Orletta and Nissim, 2018)</xref>
        .
      </p>
      <p>
        The most popular approach to gender classification usually relies on
bag-of-words features and SVM classifiers. For instance, winners of the PAN 2017
competition
        <xref ref-type="bibr" rid="ref1">(Basile et al., 2017)</xref>
        used an SVM-based system with very simple
features (just word unigrams, bigrams and character three- to five-grams).
      </p>
      <p>
        Some quite successful attempts at tackling gender classification with
neural networks have also been reported. A system consisting of a recurrent neural
network (RNN) layer, a convolutional neural network (CNN) layer, and an
attention mechanism, proposed by Miura et al. (2017), ranked fourth in the PAN
2017 shared task.
        <fn><label>1</label><p><uri>http://pan.webis.de/</uri></p></fn>
        <fn><label>2</label><p><uri>http://corporavm.uni-koeln.de/vardial/sharedtask.html</uri></p></fn>
        In the PAN 2018 multimodal gender classification task
        <xref ref-type="bibr" rid="ref16">(Rangel
et al., 2018)</xref>
        , where the task was to predict the gender of a Twitter user from
their tweets and published images, deep learning approaches prevailed and
the overall winners used an RNN for texts and a CNN for images
        <xref ref-type="bibr" rid="ref20">(Takahashi et al.,
2018)</xref>
        .
      </p>
      <p>
        Another line of related research we examined was the use of part-of-speech (POS)
tags in existing gender classification approaches, since we hypothesized that
POS-based features would be less topic- and genre-dependent, and therefore
appropriate for the cross-genre task at hand. Mukherjee and Liu (2010) showed
that sequences of POS tags can be successfully used for gender prediction as
a standalone feature or in combination with other features. POS tag sequences
were also successfully used in combination with other features in the PAN 2017
AP shared task by Martinc et al. (2017), who overall ranked second in the
competition and also tested their model in a cross-genre setting
        <xref ref-type="bibr" rid="ref4 ref9">(Martinc and
Pollak, 2018)</xref>
        .
      </p>
    </sec>
    <sec id="sec-3">
      <title>Experimental setup</title>
      <p>This section describes the data sets, the methodology and the conducted
experiments.</p>
      <sec>
        <title>Data sets</title>
        <p>The CLIN 2019 shared task organizers provided six data sets from three different
genres. Altogether, they provided 30,000 tweets, 19,658 YouTube comments and
2,832 news articles, with each genre split into a gender-labeled train set and an unlabeled
test set. All data sets are balanced in terms of the number of documents written
by male and female authors. A more detailed description of all the data sets in
terms of the number of documents and words is given in Table 1.</p>
      </sec>
      <sec id="sec-3-1">
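        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Dataset</th><th>Documents</th><th>Words</th></tr>
            </thead>
            <tbody>
              <tr><td>Twitter train</td><td>20,000</td><td>380,074</td></tr>
              <tr><td>Twitter test</td><td>10,000</td><td>192,306</td></tr>
              <tr><td>YouTube train</td><td>14,744</td><td>280,498</td></tr>
              <tr><td>YouTube test</td><td>4,914</td><td>87,038</td></tr>
              <tr><td>News train</td><td>1,832</td><td>336,602</td></tr>
              <tr><td>News test</td><td>1,000</td><td>401,235</td></tr>
            </tbody>
          </table>
        </table-wrap>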
        <p>[Figure: architecture of the proposed system. Word branch: word sequence, word embedding (200), bidirectional LSTM (256), max pooling (256) and average pooling (256). POS branch: POS sequence, POS embedding (200), bidirectional LSTM (256), max pooling (256) and average pooling (256). The pooled outputs enter a concatenation layer (1024), followed by ReLU, dropout (0.4) and a dense layer (2).]</p>
        <p>
Altogether, six classification models, three in-genre and three cross-genre, were
trained and later used for prediction in our experiments. For the in-genre
experiments, the train set for a specific genre was randomly split into a train set
containing 90% of the documents and a validation set containing 10% of the
documents. For the cross-genre experiments, the Twitter cross-genre
model was trained on a concatenation of the YouTube and news train sets (the Twitter train set
was used as a validation set during training), the YouTube cross-genre model was
trained on a concatenation of the Twitter and news train sets (the YouTube train set was
used as a validation set during training), and the news cross-genre model was trained
on a concatenation of the Twitter and YouTube train sets (the news train set was used as a validation set
during training).</p>
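        <p>The two data-splitting schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed random seed and the representation of a data set as a plain list of documents are hypothetical and are not taken from the original implementation.</p>
        <preformat><![CDATA[
```python
import random


def split_train_validation(documents, validation_fraction=0.1, seed=0):
    """Randomly split one genre's train set into train and validation parts.

    The 90/10 ratio follows the text; the seed is an assumption added
    only to make the sketch reproducible.
    """
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def cross_genre_setup(train_sets, target_genre):
    """Train on the two other genres, validate on the target genre's train set.

    train_sets maps a genre name to its list of labeled documents.
    """
    train = [
        doc
        for genre, docs in train_sets.items()
        if genre != target_genre
        for doc in docs
    ]
    validation = list(train_sets[target_genre])
    return train, validation
```
]]></preformat>
        <p>For example, calling the hypothetical cross_genre_setup with "news" as the target genre returns the concatenated Twitter and YouTube train sets for training and the news train set for validation.</p>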
        <p>Text preprocessing is light: we only replace hashtags in some of the data sets
with #HASHTAG tokens, URLs with HTTPURL tokens and mentions with
@MENTION tokens. We also limit the text vocabulary to the 30,000 most frequent
words and replace the rest with the &lt;unk&gt; token.</p>
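        <p>A minimal sketch of this preprocessing is given below. The regular expressions, the whitespace tokenization and the function names are assumptions made for illustration; the paper does not specify them, and it applies hashtag replacement only to some of the data sets, whereas the sketch applies it to every input.</p>
        <preformat><![CDATA[
```python
import re
from collections import Counter

# Assumed patterns; the paper does not state how tokens were detected.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def normalize(text):
    """Replace URLs, mentions and hashtags with placeholder tokens."""
    text = URL_RE.sub("HTTPURL", text)      # URLs first, so '#' or '@' inside them are not touched
    text = MENTION_RE.sub("@MENTION", text)
    text = HASHTAG_RE.sub("#HASHTAG", text)
    return text


def build_vocab(token_lists, max_size=30000):
    """Keep the max_size most frequent tokens of the training data."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return {tok for tok, _ in counts.most_common(max_size)}


def apply_vocab(tokens, vocab, unk="<unk>"):
    """Map every out-of-vocabulary token to the unknown token."""
    return [tok if tok in vocab else unk for tok in tokens]
```
]]></preformat>
        <p>In this sketch a document would first be passed through normalize, then split on whitespace, and finally mapped through apply_vocab with a vocabulary built from the train set only.</p>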
        <p>
          We decided on a neural approach to the task at hand, mostly because of the
relatively large sizes of the available train and test sets (described in Section
3.1). Taking into consideration some of the findings from the related work,
we opted for a bidirectional recurrent architecture, which was successfully
employed for gender prediction in the past
          <xref ref-type="bibr" rid="ref11 ref20">(Miura et al., 2017; Takahashi et al.,
2018)</xref>
          . Initial experiments and previous research
          <xref ref-type="bibr" rid="ref10 ref4 ref9">(Martinc et al., 2017; Martinc
and Pollak, 2018)</xref>
          also suggested that adding POS tag information improves the
performance of the model (especially in the cross-genre setting), therefore POS
sequences are fed to the network together with the preprocessed texts.
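        </p>
      </sec>
      <sec id="sec-3-2">
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th></th><th>Twitter</th><th>YouTube</th><th>News</th><th>Average</th></tr>
            </thead>
            <tbody>
              <tr><td>Validation set in-genre</td><td>0.6245</td><td>0.6270</td><td>0.6477</td><td>0.6331</td></tr>
              <tr><td>Validation set cross-genre</td><td>0.5473</td><td>0.5580</td><td>0.5573</td><td>0.5542</td></tr>
              <tr><td>Official test set in-genre</td><td>0.6099</td><td>0.6133</td><td>0.5990</td><td>0.6074</td></tr>
              <tr><td>Official test set cross-genre</td><td>0.5427</td><td>0.5507</td><td>0.5520</td><td>0.5485</td></tr>
            </tbody>
          </table>
        </table-wrap>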
        <p>
          Embedding vectors of size 200 are produced for the input word and POS tag
sequences by two randomly initialized embedding layers, and are then
fed to two distinct bidirectional long short-term memory networks (BiLSTM)
with 256 neurons, each of which produces a two-dimensional matrix (with a
time-step dimension and a feature vector dimension) containing a representation of every token
in the sequence. In order to find the words/POS tags with the highest predictive
power, we use an approach similar to the one proposed by Lai et al. (2015), and
employ one-dimensional max-pooling and average-pooling operations
          <xref ref-type="bibr" rid="ref3">(Collobert
et al., 2011)</xref>
          on the time-step dimension to obtain two fixed-length vectors for
each of the inputs.
        </p>
        <p>The four resulting vectors are concatenated and fed into the rectified linear
unit (ReLU) activation function, on the output of which we apply dropout,
in which 40% of the input units are dropped in order to reduce overfitting.
The resulting vector is passed on to a fully connected layer (Dense) responsible
for producing the final binary gender prediction.</p>
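        <p>To make the pooling and concatenation steps concrete, the following sketch reproduces them on plain Python lists rather than PyTorch tensors, so that the arithmetic is visible. It assumes that the BiLSTM outputs are already available as one feature vector per time step; the function names and the order in which the four pooled vectors are concatenated are assumptions, and dropout and the dense layer are omitted.</p>
        <preformat><![CDATA[
```python
def max_pool_over_time(states):
    """states: list of T equally long feature vectors, one per time step."""
    return [max(column) for column in zip(*states)]


def avg_pool_over_time(states):
    """Average each feature over the time-step dimension."""
    return [sum(column) / len(column) for column in zip(*states)]


def relu(vector):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in vector]


def pooled_representation(word_states, pos_states):
    """Concatenate max- and average-pooled word and POS features, then apply ReLU.

    With 256 features per BiLSTM output, the result has 4 * 256 = 1024 values.
    The concatenation order is an assumption of this sketch.
    """
    features = (
        max_pool_over_time(word_states)
        + avg_pool_over_time(word_states)
        + max_pool_over_time(pos_states)
        + avg_pool_over_time(pos_states)
    )
    return relu(features)
```
]]></preformat>
        <p>Because both pooling operations collapse the time-step dimension, the resulting vector has the same length for every document regardless of how many tokens it contains.</p>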
        <p>
          We use the Python PyTorch library
          <xref ref-type="bibr" rid="ref13">(Paszke et al., 2017)</xref>
          for the
implementation of the system. For optimization, we use an Adam optimizer
          <xref ref-type="bibr" rid="ref22 ref6">(Kingma and
Ba, 2014)</xref>
          with a learning rate of 0.0001. Each of the models is trained on the
train set for one hundred epochs and tested on the validation set after every
epoch. The model with the best performance on the validation set is chosen for
the test set predictions. For POS tagging, a Perceptron tagger from NLTK
          <xref ref-type="bibr" rid="ref2">(Bird
and Loper, 2004)</xref>
          is used, and the performance of the classifier is measured by
accuracy.
        </p>
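        <p>The model selection rule described above (evaluate after every epoch, keep the epoch with the best validation accuracy) can be sketched as follows. The function names are hypothetical, and the tie-breaking rule (the earliest best epoch wins) is an assumption, since the paper does not say how ties were resolved.</p>
        <preformat><![CDATA[
```python
def accuracy(predicted, gold):
    """Fraction of documents whose predicted label equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equally long")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def best_epoch(validation_accuracies):
    """Return (epoch index, accuracy) of the best epoch.

    Assumption: when several epochs tie, the earliest one is kept.
    """
    best = max(
        range(len(validation_accuracies)),
        key=lambda i: validation_accuracies[i],
    )
    return best, validation_accuracies[best]
```
]]></preformat>
        <p>In a training loop, the model weights saved at the returned epoch index would be the ones used to label the official test set.</p>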
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>Classification results are presented in Table 2. On the official test sets, the highest
cross-genre accuracy (55.20%) was achieved on news. Slightly worse was the
accuracy on the data set of YouTube comments (55.07%), while the accuracy on
the tweet test set (54.27%) was almost 1 percentage point lower. When it comes to the official in-genre
results, the highest accuracy was achieved on the test set of YouTube comments
(61.33%) and the lowest on news (59.90%).</p>
      <p>Results on the validation sets are in all cases better than the results on the
official test sets when the same genre and the same type of classification are
compared. This suggests some overfitting, which is generally
more pronounced in the in-genre setting, where the training sets were smaller.
Overfitting is worst in the news in-genre setting, where the difference in
performance between the validation set and the official test set is almost 5 percentage points.</p>
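        <p>The size of the validation-to-test gap can be checked directly from the accuracies reported in Table 2. The short script below only redoes that arithmetic; the dictionary layout and the variable names are illustrative choices, while the numbers are copied from the table.</p>
        <preformat><![CDATA[
```python
# Accuracies copied from Table 2.
validation = {
    "in-genre": {"Twitter": 0.6245, "YouTube": 0.6270, "News": 0.6477},
    "cross-genre": {"Twitter": 0.5473, "YouTube": 0.5580, "News": 0.5573},
}
official_test = {
    "in-genre": {"Twitter": 0.6099, "YouTube": 0.6133, "News": 0.5990},
    "cross-genre": {"Twitter": 0.5427, "YouTube": 0.5507, "News": 0.5520},
}


def gaps(validation_scores, test_scores):
    """Validation accuracy minus test accuracy, per setting and genre."""
    return {
        setting: {
            genre: round(validation_scores[setting][genre] - test_scores[setting][genre], 4)
            for genre in validation_scores[setting]
        }
        for setting in validation_scores
    }


gap_table = gaps(validation, official_test)
for setting, per_genre in gap_table.items():
    for genre, gap in per_genre.items():
        print(f"{setting:12s} {genre:8s} {gap:.4f}")
```
]]></preformat>
        <p>The largest gap is the one for news in the in-genre setting (0.6477 versus 0.5990, a difference of 0.0487), and every in-genre gap is larger than every cross-genre gap.</p>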
      <p>When we compare these results to the results of other teams in the CLIN
shared task, our approach yields good performance in the cross-genre part of the
competition, where we ranked second as a team, although it should be mentioned
that the first ranked team submitted two runs which both performed better than
our submitted run. On the other hand, our approach yields worse results in the
in-genre setting, where we ranked sixth out of eight teams with the ninth best
run.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper we presented the results of the CLIN 2019 cross-genre and in-genre
gender classification shared task performed on the data sets of Dutch tweets,
YouTube comments and news. A neural network architecture, which takes word
and POS sequences as input, is capable of detecting relatively good features by
performing max and average pooling on the output matrix of the LSTM layer.
On the official CLIN 2019 test sets, our team ranked second in the cross-genre
setting and sixth in the in-genre setting.</p>
      <p>Not surprisingly, the models trained and tested on the same genre achieve
much better performance than the models with train and test sets from different
genres, even though the train sets in the cross-genre setting are much larger in all
the cases. The performance of our classifier is quite consistent across all genres,
which is against our expectations: we expected better performance on the
news data set because its documents are on average much longer and therefore
offer more per-instance information to the classifier.</p>
      <p>Dutch gender classification is still a tough problem, which becomes clear if
we compare the low performance of all the approaches in the shared task with
the performance usually achieved on the English data sets in PAN shared tasks.
In order to narrow this gap, in the short term we plan to test our
approach on other languages, to get a better picture of the difficulty of
cross-genre and in-genre gender classification across different languages. We will
also conduct a comprehensive error analysis, which will help us identify
language- and genre-independent features that work well across different genres
and languages. In the long term, we will try to improve our approach by
testing numerous state-of-the-art neural architectures and employing transfer learning
techniques.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The work presented in this paper has been supported by the European Union's
Horizon 2020 research and innovation programme under grant agreement No.
825153, project EMBEDDIA (Cross-Lingual Embeddings for Less-Represented
Languages in European News Media). The authors also acknowledge the financial
support from the Slovenian Research Agency core research programme
Knowledge Technologies (P2-0103). The Titan Xp used for this research was donated
by the NVIDIA Corporation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Angelo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Gareth Dwyer, Maria Medvedeva, Josine Rawee, Hessel Haagsma, and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>N-gram: New Groningen author-profiling model</article-title>
          .
          <source>arXiv preprint arXiv:1707.03764</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Steven</given-names>
            <surname>Bird</surname>
          </string-name>
          and
          <string-name>
            <given-names>Edward</given-names>
            <surname>Loper</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Nltk: the natural language toolkit</article-title>
          .
          <source>In Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, page 31</source>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Ronan</given-names>
            <surname>Collobert</surname>
          </string-name>
          , Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Kuksa</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          (Aug):
          <fpage>2493</fpage>
          –
          <lpage>2537</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Felice</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 cross-genre gender prediction (GxG) task</article-title>
          .
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Rob</given-names>
            <surname>van der Goot</surname>
          </string-name>
          , Nikola Ljubešić, Ian Matroos, Malvina Nissim, and
          <string-name>
            <given-names>Barbara</given-names>
            <surname>Plank</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Bleaching text: Abstract features for cross-lingual gender prediction</article-title>
          .
          <source>arXiv preprint arXiv:1805.03122</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
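          <string-name>
            <given-names>Diederik P</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>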
          .
          <year>2014</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Moshe</given-names>
            <surname>Koppel</surname>
          </string-name>
          , Shlomo Argamon, and Anat Rachel Shimoni.
          <year>2002</year>
          .
          <article-title>Automatically categorizing written texts by author gender</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          ,
          <volume>17</volume>
          (
          <issue>4</issue>
          ):
          <fpage>401</fpage>
          –
          <lpage>412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Siwei</given-names>
            <surname>Lai</surname>
          </string-name>
          , Liheng Xu, Kang Liu, and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Recurrent convolutional neural networks for text classification</article-title>
          .
          <source>In AAAI</source>
          , volume
          <volume>333</volume>
          , pages
          <fpage>2267</fpage>
          –
          <lpage>2273</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Matej</given-names>
            <surname>Martinc</surname>
          </string-name>
          and
          <string-name>
            <given-names>Senja</given-names>
            <surname>Pollak</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Reusable workflows for gender prediction</article-title>
          .
          <source>In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Matej</given-names>
            <surname>Martinc</surname>
          </string-name>
          , Iza Škrjanec, Katja Zupan, and
          <string-name>
            <given-names>Senja</given-names>
            <surname>Pollak</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>PAN 2017: Author profiling - gender and language variety prediction</article-title>
          .
          <source>Cappellato</source>
          et al.[
          <volume>13</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Yasuhide</given-names>
            <surname>Miura</surname>
          </string-name>
          , Tomoki Taniguchi, Motoki Taniguchi, and
          <string-name>
            <given-names>Tomoko</given-names>
            <surname>Ohkuma</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Author profiling with word+character neural attention network</article-title>
          .
          <source>In CLEF (Working Notes).</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Arjun</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bing</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Improving gender classification of blog authors</article-title>
          .
          <source>In Proceedings of the 2010 conference on Empirical Methods in natural Language Processing</source>
          , pages
          <fpage>207</fpage>
          –
          <lpage>217</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Adam</given-names>
            <surname>Paszke</surname>
          </string-name>
          , Sam Gross, Soumith Chintala, and
          <string-name>
            <given-names>Gregory</given-names>
            <surname>Chanan</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Pytorch: Tensors and dynamic neural networks in python with strong gpu acceleration</article-title>
          . Available at https://pytorch.org/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Francisco</given-names>
            <surname>Rangel</surname>
          </string-name>
          , Fabio Celli, Paolo Rosso, Martin Potthast, Benno Stein, and
          <string-name>
            <given-names>Walter</given-names>
            <surname>Daelemans</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Overview of the 3rd author profiling task at PAN 2015</article-title>
          .
          <source>In CLEF 2015 Working Notes. CEUR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Francisco</given-names>
            <surname>Rangel</surname>
          </string-name>
          , Paolo Rosso, Irina Chugur, Martin Potthast, Martin Trenkmann, Benno Stein, Ben Verhoeven, and
          <string-name>
            <given-names>Walter</given-names>
            <surname>Daelemans</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Overview of the author profiling task at PAN 2014</article-title>
          .
          <source>In CLEF 2014 Evaluation Labs and Workshop Working Notes Papers</source>
          . CEUR.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Francisco</given-names>
            <surname>Rangel</surname>
          </string-name>
          , Paolo Rosso, Manuel Montes-y-Gómez, Martin Potthast, and Benno Stein
          .
          <year>2018</year>
          .
          <article-title>Overview of the 6th author profiling task at PAN 2018: multimodal gender identification in Twitter</article-title>
          .
          <source>Working Notes Papers of the CLEF.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Francisco</given-names>
            <surname>Rangel</surname>
          </string-name>
          , Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, and
          <string-name>
            <given-names>Giacomo</given-names>
            <surname>Inches</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Overview of the author profiling task at PAN 2013</article-title>
          .
          <source>In CLEF 2013 Evaluation Labs and Workshop Working Notes Papers</source>
          . CEUR.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Francisco</given-names>
            <surname>Rangel</surname>
          </string-name>
          , Paolo Rosso,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Potthast</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Benno</given-names>
            <surname>Stein</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter</article-title>
          .
          <source>Working Notes Papers of the CLEF.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Francisco</given-names>
            <surname>Rangel</surname>
          </string-name>
          , Paolo Rosso, Ben Verhoeven, Walter Daelemans,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Potthast</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Benno</given-names>
            <surname>Stein</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the 4th author profiling task at PAN 2016: cross-genre evaluations</article-title>
          .
          <source>In CLEF 2016 Working Notes. CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Takumi</given-names>
            <surname>Takahashi</surname>
          </string-name>
          , Takuji Tahara, Koki Nagatani, Yasuhide Miura, Tomoki Taniguchi, and
          <string-name>
            <given-names>Tomoko</given-names>
            <surname>Ohkuma</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Text and image synergy with feature cross technique for gender identification</article-title>
          .
          <source>Working Notes Papers of the CLEF.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Ben</given-names>
            <surname>Verhoeven</surname>
          </string-name>
          , Walter Daelemans, and
          <string-name>
            <given-names>Barbara</given-names>
            <surname>Plank</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>TwiSty: a multilingual Twitter stylometry corpus for gender and personality profiling</article-title>
          .
          <source>In Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016)</source>
          . ELRA, Portorož, Slovenia.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Marcos</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Liling Tan, Nikola Ljubešić, and Jörg Tiedemann
          .
          <year>2014</year>
          .
          <article-title>A report on the DSL shared task 2014</article-title>
          .
          <source>In Proceedings of the first workshop on applying NLP tools to similar languages, varieties and dialects</source>
          , pages
          <fpage>58</fpage>
          –
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>