<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Author Profiling using Word Embeddings with Subword Information</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Arts, Sciences and Humanities (EACH), University of São Paulo (USP), São Paulo</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>We present a simple experiment on multilingual author profiling as proposed by the PAN-CLEF 2018 shared task, focusing on the issue of gender identification from Twitter text in English, Spanish and Arabic. Our proposal makes use of word embeddings enriched with character n-gram information, and outperforms a majority class baseline.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Author profiling (AP) is the computational task of determining an author’s demographics
from the text they produce. Systems of this kind make use of document classification
methods to predict a wide range of traits, including the author’s gender [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], age [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
personality [
        <xref ref-type="bibr" rid="ref16 ref7">7,16</xref>
        ], religiosity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and many others. AP is a popular research topic in NLP, and
has been the focus of a number of shared tasks in the PAN-CLEF series [
        <xref ref-type="bibr" rid="ref10 ref17">10,17</xref>
        ].
      </p>
      <p>
        At PAN-CLEF 2018 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], a gender identification task based on text
and/or images was proposed. The languages addressed in the task are English,
Arabic and Spanish, all in the Twitter domain.
      </p>
      <p>
        This paper describes our own entry to the AP gender identification task. This
consists of a simple experiment involving word embedding models enriched with subword
information as proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to predict gender from Twitter text, hence disregarding
the image information also made available for the task. Preliminary results suggest that
the model outperforms a majority class baseline in the three target languages.
      </p>
      <p>
        The rest of this paper is structured as follows. Section 2 discusses related work on
AP. Section 3 describes our main AP approach, and Section 4 describes its evaluation
over the PAN-CLEF 2018 AP dataset. Finally, Section 5 suggests future work.
      </p>
    </sec>
    <sec>
      <title>Related work</title>
      <p>
        Some existing studies on gender identification, including the top-performing participants of the previous
PAN-CLEF AP gender detection task [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and other recent initiatives, are briefly discussed
as follows.
      </p>
      <p>
        The work in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] presents a model called N-GrAM for predicting gender in English,
Spanish, Portuguese and Arabic. The model makes use of word and character n-grams
evaluated using decision tree, MLP, Naive Bayes and SVM classifiers. The SVM-based
model was the overall best performing system among participants in the PAN-CLEF
2017 AP gender prediction task.
      </p>
      <p>
        Also in the context of PAN-CLEF 2017, the work in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] presents a method comprising
preprocessing of Twitter posts, feature extraction, feature weighting and a number
of classifier models for gender prediction and other tasks. The method obtained the
second best result for gender prediction in that shared task.
      </p>
      <p>
        The work in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] proposes a document representation model for gender prediction
in English using document and term weighting with a combination of POS n-grams
and term frequencies (TF-IDF). The model outperforms a BoW baseline on a corpus of
hotel reviews.
      </p>
      <p>
        The growing interest in neural methods for NLP is also evident in the case of gender
recognition from text. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a recurrent convolutional neural network model with
contextual window (WRCNN) is applied to the task of gender prediction from blogs and
from Project Gutenberg’s books using extensions of previous RCNN models. Reported
results are, on average, 4% higher than those obtained by a baseline system on both
domains.
      </p>
      <p>
        The work in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] classifies graph vertices using recursive networks to identify gender,
age and Twitter user type in the English language. The method combines network, text
and label information, converts a graph into a series of tree structures, and then uses
individual RNNs on each tree. The proposed approach is shown to outperform four
robust baseline systems, namely, logistic regression, label propagation, text-associated
DeepWalk and Tri-Party Deep Network Representations.
      </p>
      <p>
        Finally, the work in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] compares three neural network architectures to address the
problem of predicting gender in Twitter texts: character-level models with
convolution layers and bidirectional LSTM, word-level models with bidirectional LSTM using
GloVe [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] representations, and document-level feed-forward models using
bag-of-words features. The study also explores an ensemble method that combines the three
architectures by majority vote. The combination of character-level and word-level
information outperforms the individual strategies, whereas the use of document-level
information actually reduces overall accuracy.
      </p>
    </sec>
    <sec>
      <title>Subword models</title>
      <p>
        Popular word vector representations such as the Skip-gram model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] use feed-forward
networks to predict the words in the left and right context of a given word. This method
represents each word of the vocabulary by a unique vector, without shared parameters.
In particular, the internal structure of the word is disregarded, which is a major
limitation for morphologically rich languages. One possible way of overcoming this
limitation is by adding character-level information, as in the case of the subword models
proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>A subword model is an extension of the standard Skip-gram model that takes
subword information into account. In this model, each word is represented by the sum
of the vectors of its character n-grams. The symbols &lt; and &gt; are added at the beginning and end
of each word to distinguish prefixes and suffixes from other sequences, and the word
itself is added to the set of its n-grams in order to learn word representations as well.
For instance, given the word <italic>author</italic> and n = 3, the corresponding character n-grams
are as follows.</p>
      <p>&lt;au, aut, uth, tho, hor, or&gt;</p>
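      <p>As a minimal sketch of this extraction step (not the reference implementation in [<xref ref-type="bibr" rid="ref4">4</xref>]; the function name <monospace>char_ngrams</monospace> and its parameters are illustrative assumptions), the boundary-marked n-grams of a word could be collected as follows.</p>
      <preformat><![CDATA[
```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return boundary-marked character n-grams of `word`, plus the whole word."""
    marked = "<" + word + ">"  # "<" and ">" mark the start and end of the word
    ngrams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(marked) - n + 1):
            ngrams.append(marked[start:start + n])
    # The full marked word is added as a sequence of its own (unless already present).
    if marked not in ngrams:
        ngrams.append(marked)
    return ngrams


print(char_ngrams("author", 3, 3))
# ['<au', 'aut', 'uth', 'tho', 'hor', 'or>', '<author>']
```
]]></preformat>
      <p>Calling the function with its default arguments collects all n-grams for n from 3 to 6, which is the range reported for the experiments discussed here.</p>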
      <p>
        The experiments in [
        <xref ref-type="bibr" rid="ref4">4</xref>
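        ] make use of all n-grams for n between 3 and 6, although
other strategies are possible (e.g., extracting all prefixes and suffixes). Given an
n-gram dictionary of size <italic>G</italic> and a word <italic>w</italic>, the set of n-grams in <italic>w</italic> is denoted as
<inline-formula><tex-math>\mathcal{G}_w \subset \{1, \ldots, G\}</tex-math></inline-formula>. Each n-gram <italic>g</italic> is associated with a vector representation
<inline-formula><tex-math>\mathbf{z}_g</tex-math></inline-formula>, and a word is represented by the sum of the vector representations of its n-grams.
The score of a word <italic>w</italic> and a context word <italic>c</italic> with vector <inline-formula><tex-math>\mathbf{v}_c</tex-math></inline-formula> is then given by
Equation 1, proposed in [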
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
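      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>s(w, c) = \sum_{g \in \mathcal{G}_w} \mathbf{z}_g^{\top} \mathbf{v}_c</tex-math>
        </disp-formula>
      </p>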
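      <p>To make Equation 1 concrete, the sketch below sums the dot products between each n-gram vector of a word and a context vector. It assumes that the second vector in Equation 1 is the vector of the context word <italic>c</italic>, as in [<xref ref-type="bibr" rid="ref4">4</xref>]; the function names and the toy 2-dimensional values are purely illustrative.</p>
      <preformat><![CDATA[
```python
def dot(u, v):
    """Dot product of two equally sized vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def score(ngrams, ngram_vectors, context_vector):
    """Sum, over the n-grams of a word, of the dot product with the context vector."""
    return sum(dot(ngram_vectors[g], context_vector) for g in ngrams)


# Toy n-gram vectors and context vector (made-up values for illustration only).
z = {"<au": [1.0, 0.0], "aut": [0.5, 0.5], "or>": [0.0, 2.0]}
v_c = [2.0, 1.0]

print(score(["<au", "aut", "or>"], z, v_c))  # 2.0 + 1.5 + 2.0 = 5.5
```
]]></preformat>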
      <p>
        Models of this kind allow representations to be shared across words, which
arguably improves the learning of rare forms. Crucially for our own work, this may be
helpful in author profiling tasks such as gender detection, which may often rely on prefix
or suffix information (particularly in the case of morphologically rich languages).
The use of subword models as proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] forms the core of the author profiling
experiment described in the next sections.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <p>
        We developed a simple experiment to assess the use of word embedding models
enriched with character n-gram information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for the task of gender identification from text.
The underlying assumption for our experiment is that these subword models may help
capture morphological cues (including prefixes, suffixes etc.) that convey gender
information in certain languages, and which are otherwise unavailable in standard word
embedding models.
      </p>
      <p>
        To investigate this, we used the Twitter data provided for the PAN-CLEF 2018
Author Profiling task [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and created a group of documents for each of the three target
languages (English and Spanish, with 3000 authors each, and Arabic, with 1500
authors). The groups were evenly balanced for gender (female / male). Each author
was represented by 100 tweets, which were grouped into a single document per
author for each language.
      </p>
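      <p>Because the groups are balanced for gender, a majority class baseline (which always predicts the most frequent training label) would be expected to score about 0.5 on similarly balanced data. The following sketch illustrates this with toy labels mirroring the size of the English group; the helper name is hypothetical, and the baseline computation actually used in the shared task may differ.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


# Toy labels: 3000 authors, evenly split between the two classes.
labels = ["female"] * 1500 + ["male"] * 1500
print(majority_baseline_accuracy(labels, labels))  # 0.5
```
]]></preformat>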
      <p>
        The models made use of pre-trained 300-dimensional word vectors for each target language
in the Wikipedia domain, and the subword Skip-gram extension in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with default
parameters. In these models, each word is represented by the sum of its character n-gram
vectors, where n ranges from 3 to 6, and every tweet is represented as the average
of its individual word vectors. Words that do not appear in the vector vocabulary
are represented by zero vectors of size 300. No text preprocessing was performed. All
models were trained using the scikit-learn liblinear logistic regression implementation
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] with its default parameters and L2 regularization.
      </p>
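      <p>The text representation described above can be sketched as follows. This is an illustrative reading of the description rather than the code used in the experiment: the function name, the whitespace tokenisation and the toy 3-dimensional vectors are assumptions, whereas averaging and zero vectors for out-of-vocabulary words follow the description.</p>
      <preformat><![CDATA[
```python
DIM = 300  # vector size reported in the text


def average_vector(tokens, word_vectors, dim=DIM):
    """Average the vectors of all tokens; unknown words count as zero vectors."""
    if not tokens:
        return [0.0] * dim
    total = [0.0] * dim
    for token in tokens:
        vector = word_vectors.get(token)  # None for out-of-vocabulary words
        if vector is not None:
            for i, value in enumerate(vector):
                total[i] += value
    # Dividing by the number of tokens means unknown words contribute zeros.
    return [value / len(tokens) for value in total]


# Toy 3-dimensional vectors standing in for real pre-trained ones.
toy_vectors = {"good": [1.0, 2.0, 3.0], "day": [3.0, 0.0, 1.0]}
tokens = "good day to everyone".split()  # whitespace split is an assumption
print(average_vector(tokens, toy_vectors, dim=3))  # [1.0, 0.5, 1.0]
```
]]></preformat>
      <p>The resulting vectors could then be passed to a linear classifier, for instance scikit-learn's <monospace>LogisticRegression(solver="liblinear")</monospace>, whose default penalty is L2.</p>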
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>After submission, we decided to further investigate whether learning the embedding
model directly from the training data (and not from Wikipedia, as in the actual
submission) would improve our test results. In this post hoc analysis, we found that mean
accuracy scores for the Spanish data remained essentially the same, whereas a small
increase was observed in the case of English (from 0.66 to 0.70) and Arabic (from 0.68
to 0.71).
</p>
    </sec>
    <sec id="sec-4">
      <title>Final remarks</title>
      <p>This paper presented a simple experiment on multilingual author profiling, focusing on
the issue of gender identification based on text only. Our proposal makes use of word
embeddings enriched with character n-gram information, and outperforms a majority class
baseline. As future work, we intend to evaluate a wider range of alternative models and
make use of more robust baseline systems.</p>
      <p>Acknowledgements. The second author received financial support from FAPESP grant
no. 2016/14223-0.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bartle</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Gender classification with deep learning</article-title>
          .
          <source>Tech. rep.</source>
          ,
          <source>Stanford Technical Report</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwyer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medvedeva</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rawee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haagsma</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>N-GrAM: New Groningen author-profiling model</article-title>
          .
          <source>In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
          . Dublin (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Berg</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gopinathan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A deep learning ensemble approach to gender identification of tweet authors</article-title>
          .
          <source>Master's thesis</source>
          , Norwegian University of Science and Technology (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the ACL 5</source>
          ,
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hsieh</surname>
            ,
            <given-names>F.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dias</surname>
            ,
            <given-names>R.F.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paraboni</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Author profiling from Facebook corpora</article-title>
          .
          <source>In: 11th International Conference on Language Resources and Evaluation (LREC-2018)</source>
          . pp.
          <fpage>2566</fpage>
          -
          <lpage>2570</lpage>
          . ELRA, Miyazaki, Japan (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paris</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Demographic inference on Twitter using recursive neural networks</article-title>
          .
          <source>In: Proceedings of ACL-2017</source>
          . pp.
          <fpage>471</fpage>
          -
          <lpage>477</lpage>
          . Vancouver (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mairesse</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using linguistic cues for the automatic recognition of personality in conversation and text</article-title>
          .
          <source>Journal of Artificial Intelligence Research (JAIR) 30</source>
          ,
          <fpage>457</fpage>
          -
          <lpage>500</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Martinc</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skrjanec</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zupan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pollak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>PAN 2017: Author profiling - gender and language variety prediction</article-title>
          .
          <source>In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
          . Dublin (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
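          ,
          <string-name>
            <surname>Yih</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweig</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :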
          <article-title>Linguistic regularities in continuous space word representations</article-title>
          .
          <source>In: Proc. of NAACL-HLT-2013</source>
          . pp.
          <fpage>746</fpage>
          -
          <lpage>751</lpage>
          . Atlanta, USA (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pardo</surname>
            ,
            <given-names>F.M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter</article-title>
          .
          <source>In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
          . Dublin (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , et al.:
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research 12</source>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>In: Proceedings of EMNLP-2014</source>
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</article-title>
          . In: Kanoulas, E., et al. (eds.)
          <source>Information Access Evaluation meets Multilinguality, Multimodality, and Visualization. 5th International Conference of the CLEF Initiative (CLEF 14)</source>
          . pp.
          <fpage>268</fpage>
          -
          <lpage>299</lpage>
          . Springer, Berlin Heidelberg New York (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter</article-title>
          . In: Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.)
          <source>Working Notes Papers of the CLEF 2018 Evaluation Labs. CEUR Workshop Proceedings, CLEF and CEUR-WS.org</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Reddy</surname>
            ,
            <given-names>T.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vardhan</surname>
            ,
            <given-names>B.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddy</surname>
            ,
            <given-names>P.V.</given-names>
          </string-name>
          :
          <article-title>N-Gram approach for gender prediction</article-title>
          .
          <source>In: Advance Computing Conference (IACC)</source>
          . pp.
          <fpage>860</fpage>
          -
          <lpage>865</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>dos Santos</surname>
            ,
            <given-names>V.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paraboni</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>B.B.C.</given-names>
          </string-name>
          :
          <article-title>Big five personality recognition from multiple text genres</article-title>
          .
          <source>In: Text, Speech and Dialogue (TSD-2017) Lecture Notes in Artificial Intelligence</source>
          vol.
          <volume>10415</volume>
          . pp.
          <fpage>29</fpage>
          -
          <lpage>37</lpage>
          . Springer-Verlag, Prague, Czech Republic (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of PAN-2018: Author Identification, Author Profiling, and Author Obfuscation</article-title>
          . In: Bellot, P., et al. (eds.)
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 9th International Conference of the CLEF Initiative (CLEF 18)</source>
          . Springer, Berlin Heidelberg New York (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>