<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mixing Traditional Methods with Neural Networks for Gender Prediction</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Rick Kosse</institution>
          ,
          <addr-line>Youri Schuur and Guido Cnossen</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Groningen</institution>
          ,
          <addr-line>Groningen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>In this paper we describe our participation in the PAN 2018 shared task on Author Profiling, which consists of identifying an author's gender from Tweets and images in English, Spanish and Arabic. We focused only on the textual data and left the images out of scope. Our submitted model is a small feed-forward neural network. Whereas previous work usually combines neural networks with word embeddings, our best-performing system used only unigrams as features. In an unofficial run, we show that extracting information from DBpedia can improve performance. On the PAN 2018 test set, our model achieved scores of 0.807, 0.792 and 0.792 for English, Arabic and Spanish, respectively. With an average score of 0.797, we conclude that our model is quite robust across all three languages.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In recent years, author profiling has become increasingly important in daily life.
Nowadays, everyone who publishes text or pictures on social media can be considered an
author. Whereas profiling used to be done by hand, today we take advantage of automated
methods. Such methods help, for example, to detect fake news or to identify terrorist
threats on social media.</p>
      <p>
        It is interesting how language use reflects basic social and personality processes,
and how this reflection can help us to identify gender in social media. Related to this
topic, PAN [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] has organized several shared tasks focusing on author profiling
[
        <xref ref-type="bibr" rid="ref14 ref15 ref16">15,16,14</xref>
        ] in social media. In previous editions of this shared task, PAN has focused on traits
like gender, age, personality type and language variety [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This year’s author profiling
task is to identify the gender of Twitter users (https://pan.webis.de/clef18/pan18-web/author-profiling.html).
New this year are images which, in addition to the textual Tweets, can help to identify
gender. The task covers three languages: Arabic, English and Spanish [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>We experimented with basic bag-of-words features combined with neural networks,
an unusual combination. Although the task covers three languages, we used a single model
architecture for all of them, trained separately on the data of each language. We only used
the textual data and left the images out of scope. In addition, we experimented with
automatically extracting DBpedia features, but due to lack of time the submitted system
does not include these features. Therefore, this part of the system and the corresponding
scores remain unofficial.</p>
      <p>In this paper we present a novel approach to the PAN 2018 shared task. We describe
how our final submitted system works and how it was optimized. With our submitted system
we achieved an average score of 0.797 on the official PAN 2018 test set: 0.807 for English
and 0.792 for both Arabic and Spanish.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        In the years that PAN organized the author profiling shared task, many approaches and
models have been submitted. Last year the N-GRAM team won with a straightforward
Support Vector Machine (SVM) trained on combinations of character and tf-idf
n-grams [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A logistic regression with combinations of character, word and POS n-grams
finished in second place [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The third-placed system used a list of words per language variety, learned with an
SVM [21]. Notably, all three used simple classifiers for their classification task,
combined with traditional character and word n-grams.
      </p>
      <p>
        Some deep learning techniques were applied in last year’s competition, though they
did not achieve the best results. Word embeddings were used in combination with a
convolutional neural network (CNN) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], scoring 0.78 on the English gender task. Another CNN
approach, combining traditional tf-idf n-grams with word embeddings, achieved a score of
0.74 on the English gender task [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The approach of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] also used
word embeddings, in combination with character embeddings, but in an architecture
consisting of a CNN, an RNN, an attention mechanism, a max-pooling layer and a
fully-connected layer. It scored best of all the neural network approaches, with an
average score of 0.813. A different approach used Deep Averaging Networks with
character embeddings [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To
summarize, word/character embeddings were widely combined with neural
networks, but the results varied considerably.
      </p>
      <p>
        Yet another approach is the cognitive approach of Rangel and Rosso (2013), which builds
on the neurological studies of Broca and Wernicke to examine the way users express
themselves online [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. They used part-of-speech (POS) tag frequencies to determine gender
differences in all kinds of online data (including Twitter messages [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]).
The results showed that men use more prepositions, while women use more pronouns,
determiners and interjections. We also try an approach that uses POS tags, but we
use them to automatically extract information from DBpedia.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Data</title>
      <p>The PAN 2018 training set consists of Tweets in three languages, grouped by author,
with each author labeled with their gender. Table 1 shows the number of training
instances (authors) released by the organizers, which are equally distributed over male and
female. To develop and optimize our system, we divided this data set into training data
and test data.
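</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>Language</th>
              <th>Authors (n)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>English</td>
              <td />
            </tr>
            <tr>
              <td>Arabic</td>
              <td />
            </tr>
            <tr>
              <td>Spanish</td>
              <td />
            </tr>
          </tbody>
        </table>
      </table-wrap>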
      <p>
        Since the data was extracted from Twitter, it contains typical Twitter
elements, such as mentions (@username), links, hashtags and excessive
punctuation. In several previous studies [
        <xref ref-type="bibr" rid="ref1 ref10 ref17 ref6 ref7">6,1,7,17,10</xref>
        ] these Twitter elements were simply removed.
We found that replacing them with a dummy value achieved better results than
removing them. Furthermore, we tokenized and lowercased the data. We added the
preprocessing steps one at a time: if a step improved the score (using the model described
below), we kept it; otherwise we discarded it. An overview of all preprocessing steps is
given in Table 2.
      </p>
      <p>
        Our submitted system is a feed-forward neural network with a traditional sparse n-hot
encoding as input, built with the open-source library Keras [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. After a parameter search, the
model obtained the best performance with the Adadelta optimizer and a learning rate of
0.22, a batch size of 64, and 15 training epochs. The input layer consists of 100 neurons
with he_uniform weight initialization and a max-norm kernel constraint of 5. It is followed
by a ReLU activation function and a dropout layer. During optimization, we found that a
relatively high dropout rate of 0.4 outperformed smaller dropout rates. Finally, the output
layer is a single neuron followed by a sigmoid activation function. We also tried multiple
intermediate layers with different numbers of neurons, but none came close to the score of
the smaller model, so we kept the model minimal. The features provided to the model are
an n-hot encoding of the unigrams.
      </p>
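      <p>The preprocessing and unigram n-hot encoding described above can be sketched in plain Python. This sketch is illustrative only and is not the code of the submitted system: the regular expressions, the dummy tokens URL, USER and HASHTAG, and whitespace tokenization are hypothetical choices, and the vocabulary is built from the training authors only. Each author is represented by one binary vector over all unigrams in their tweets.</p>

```python
import re

# Hypothetical dummy tokens: the paper does not say which values were used.
# They are inserted after lowercasing, so they cannot collide with typed words.
REPLACEMENTS = [
    (re.compile(r"https?://\S+"), "URL"),  # links first, so '@' or '#' inside a URL is not matched separately
    (re.compile(r"@\w+"), "USER"),  # mentions
    (re.compile(r"#\w+"), "HASHTAG"),  # hashtags
]


def preprocess(tweet):
    """Lowercase a tweet, replace Twitter elements by dummy tokens, and tokenize."""
    tweet = tweet.lower()
    for pattern, dummy in REPLACEMENTS:
        tweet = pattern.sub(dummy, tweet)
    return tweet.split()  # whitespace tokenization (a simplifying assumption)


def build_vocabulary(authors):
    """Map every unigram in the training authors' tweets to a column index."""
    vocab = {}
    for tweets in authors:
        for tweet in tweets:
            for token in preprocess(tweet):
                vocab.setdefault(token, len(vocab))
    return vocab


def n_hot(tweets, vocab):
    """Binary bag-of-words: 1 if the unigram occurs in any of the author's tweets."""
    vector = [0] * len(vocab)
    for tweet in tweets:
        for token in preprocess(tweet):
            index = vocab.get(token)
            if index is not None:  # words unseen in training are ignored
                vector[index] = 1
    return vector


train_authors = [
    ["Loving the new #StarWars trailer @friend https://t.co/x"],
    ["Great game tonight!"],
]
vocab = build_vocabulary(train_authors)
print(n_hot(["Great trailer"], vocab))  # [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
```

      <p>In a Keras model, one such vector per author would form one row of the input matrix; its length equals the vocabulary size.</p>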
      <sec id="sec-3-1">
        <title>4.2 Optimization</title>
        <p>
          For optimization we combined our Keras model with scikit-learn [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ],
wrapping the model in the KerasClassifier class (https://keras.io/scikit-learn-api/). We then
used scikit-learn’s grid search functionality, a hyperparameter optimization technique that
evaluates every combination of values in a given dictionary of parameters, selecting the
combination with the highest accuracy. Since this optimization is rather time-consuming, we
used 3-fold cross-validation to evaluate each individual model. The outcome is the
combination of parameters that achieved the best results. Table 3 lists all tested
parameters together with their best values.
        </p>
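        <p>The grid search with 3-fold cross-validation can be illustrated without Keras or scikit-learn. This sketch is illustrative only and is not the code used for the submission: train_and_score is a hypothetical stand-in for fitting the network on two folds and returning its accuracy on the third, folds are assigned round-robin without shuffling for brevity, and apart from the best values reported above (learning rate 0.22, batch size 64, 15 epochs, dropout 0.4) the grid values are invented for illustration.</p>

```python
import itertools
import statistics

# Hypothetical grid: only 0.22, 64, 15 and 0.4 are values reported in the paper.
PARAM_GRID = {
    "learning_rate": [0.1, 0.22, 1.0],
    "batch_size": [32, 64],
    "epochs": [10, 15],
    "dropout": [0.2, 0.4],
}


def k_fold_indices(n_samples, k=3):
    """Assign samples to k folds round-robin and yield (train, test) index lists."""
    indices = list(range(n_samples))
    for fold in range(k):
        test_idx = indices[fold::k]
        train_idx = [i for i in indices if i % k != fold]
        yield train_idx, test_idx


def grid_search(param_grid, train_and_score, n_samples, k=3):
    """Score every parameter combination with k-fold CV and return the best one."""
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = [
            train_and_score(params, train_idx, test_idx)
            for train_idx, test_idx in k_fold_indices(n_samples, k)
        ]
        results.append((statistics.mean(fold_scores), params))
    # max() keeps the first combination in case of ties
    best_score, best_params = max(results, key=lambda pair: pair[0])
    return best_params, best_score


def toy_score(params, train_idx, test_idx):
    """Hypothetical stand-in for 'train on train_idx, return accuracy on test_idx'."""
    reported_best = {"learning_rate": 0.22, "batch_size": 64, "epochs": 15, "dropout": 0.4}
    return sum(params[name] == value for name, value in reported_best.items()) / 4


best_params, best_score = grid_search(PARAM_GRID, toy_score, n_samples=9)
print(best_params, best_score)
# {'batch_size': 64, 'dropout': 0.4, 'epochs': 15, 'learning_rate': 0.22} 1.0
```

        <p>The number of fits grows multiplicatively: this toy grid already has 3 × 2 × 2 × 2 = 24 combinations, each trained three times, which is why the search is time-consuming.</p>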
        <p>
          Given the popularity of neural networks combined with word/character
embeddings last year [
          <xref ref-type="bibr" rid="ref17 ref6 ref9">17,9,6</xref>
          ], we also experimented with pre-trained word
embeddings as input to our feed-forward network. However, they did not
outperform the model described above, so we kept our
bag-of-words feature approach.
        </p>
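        <p>For reference, the architecture selected above (a first dense layer of 100 ReLU units with he_uniform initialization and a max-norm constraint of 5, dropout of 0.4, and a single sigmoid output unit) can be written out as a plain-Python forward pass. This is an illustrative sketch, not the Keras model itself: we apply the max-norm constraint once at initialization, whereas Keras re-applies kernel constraints after every weight update; the output-layer initialization is our own assumption; and dropout is omitted because it is only active during training.</p>

```python
import math
import random


def he_uniform(fan_in, fan_out, rng):
    """He-uniform initialization: U(-limit, limit) with limit = sqrt(6 / fan_in)."""
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def max_norm(weights, max_value=5.0, eps=1e-7):
    """Rescale each unit's incoming weights so that their L2 norm is at most max_value."""
    for j in range(len(weights[0])):
        norm = math.sqrt(sum(row[j] ** 2 for row in weights))
        scale = min(1.0, max_value / (norm + eps))
        for row in weights:
            row[j] *= scale
    return weights


def init_model(n_features, n_hidden=100, seed=0):
    """Dense(100) with he_uniform and max-norm(5), then Dense(1); biases start at zero."""
    rng = random.Random(seed)
    w1 = max_norm(he_uniform(n_features, n_hidden, rng))
    w2 = [row[0] for row in he_uniform(n_hidden, 1, rng)]  # output init: an assumption
    return w1, [0.0] * n_hidden, w2, 0.0


def forward(x, w1, b1, w2, b2):
    """Inference pass; Dropout(0.4) is a no-op at inference time, so it is left out."""
    hidden = [
        max(0.0, b1[j] + sum(x_i * row[j] for x_i, row in zip(x, w1)))  # ReLU
        for j in range(len(b1))
    ]
    z = b2 + sum(h * w for h, w in zip(hidden, w2))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid: probability of the positive class


model = init_model(n_features=10)
print(forward([0, 0, 0, 0, 1, 0, 0, 1, 0, 0], *model))
```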
      </sec>
      <sec id="sec-3-2">
        <title>4.3 DBpedia and NNPs</title>
        <p>We were also interested in whether DBpedia could improve our feature set.
Twitter users often talk about particular topics, but since tweets are short, they provide
little information about these topics. We implemented an approach that takes these
topics and checks whether a DBpedia page exists for them. If so, we add some
of the information from that page to the tweets themselves. Besides the obvious
advantage of providing more data, this also gives a more general representation of
certain topics, which is especially beneficial for topics that occur rarely in the training set.</p>
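        <p>The augmentation step can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary (the hypothetical TOY_LOOKUP) stands in for the DBpedia Lookup service, its single entry reuses the first sentence of the description and the types shown in Table 4, and the example tweet is invented. Topics without a DBpedia page are skipped, and the flags select which sources end up in the text that is turned into bag-of-words features.</p>

```python
# Hypothetical stand-in for the DBpedia Lookup service (content taken from Table 4).
TOY_LOOKUP = {
    "Carrie Fisher": {
        "description": "Carrie Frances Fisher (born October 21, 1956) is an American actress and writer.",
        "types": ["Person", "Agent", "NaturalPerson", "Actor", "Artist"],
    },
}


def augment(tweets, topics, lookup, use_tweets=True, use_types=False, use_descriptions=False):
    """Build the text for one user from the tweets and/or DBpedia information."""
    parts = list(tweets) if use_tweets else []
    for topic in topics:
        entry = lookup.get(topic)
        if entry is None:  # no DBpedia page for this topic: nothing to add
            continue
        if use_types:
            parts.append(" ".join(entry["types"]))
        if use_descriptions:
            parts.append(entry["description"])
    return " ".join(parts)


text = augment(["RIP Carrie Fisher"], ["Carrie Fisher", "Leia"], TOY_LOOKUP, use_types=True)
print(text)  # RIP Carrie Fisher Person Agent NaturalPerson Actor Artist
```

        <p>With these flags, the variants compared in the results (tweets plus types, tweets plus descriptions, or DBpedia information on its own) differ only in which sources are switched on.</p>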
        <p>Unfortunately, our submitted system does not include this part, because we were not
able to finish it in time, so this method was not used for our official shared task results.
Nevertheless, we describe how we implemented it and report the scores it achieved on our
own test data.</p>
        <sec id="sec-3-2-1">
          <title>2 https://keras.io/scikit-learn-api/</title>
          <p>
            Specifically, we used proper nouns as topics. To extract them,
we used the NLTK POS tagger [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ], which gives us a list of proper nouns for each user. These
proper nouns are then used as input for our DBpedia approach, via the DBpedia
Lookup service (https://wiki.dbpedia.org/lookup).
          </p>
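          <p>Extracting proper nouns from POS-tagged tweets can be sketched as below. The input uses the list of (token, tag) pairs that nltk.pos_tag returns, with Penn Treebank tags; to keep the sketch self-contained, the example is tagged by hand instead of calling NLTK. Grouping consecutive NNP/NNPS tokens into one multiword name (so that a name like Carrie Fisher can be looked up as a whole) is our own assumption: the paper does not say how multiword names were handled.</p>

```python
from collections import Counter

# Penn Treebank proper-noun tags, the tag set nltk.pos_tag uses by default.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def proper_nouns(tagged_tokens):
    """Collect runs of consecutive NNP/NNPS tokens as (possibly multiword) names."""
    names, current = [], []
    for token, tag in tagged_tokens:
        if tag in PROPER_NOUN_TAGS:
            current.append(token)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


def proper_nouns_per_user(tagged_tweets):
    """Count proper nouns over all POS-tagged tweets of a single user."""
    counts = Counter()
    for tagged in tagged_tweets:
        counts.update(proper_nouns(tagged))
    return counts


# Hand-tagged example tweets in the (token, tag) format that nltk.pos_tag returns.
user_tweets = [
    [("So", "RB"), ("sad", "JJ"), ("about", "IN"), ("Carrie", "NNP"), ("Fisher", "NNP"),
     (",", ","), ("our", "PRP$"), ("Princess", "NNP"), ("Leia", "NNP")],
    [("Carrie", "NNP"), ("Fisher", "NNP"), ("was", "VBD"), ("great", "JJ")],
]
print(proper_nouns_per_user(user_tweets).most_common())
# [('Carrie Fisher', 2), ('Princess Leia', 1)]
```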
          <p>
            This online service finds DBpedia URIs related to given keywords and returns labeled
information about the corresponding DBpedia URI. We chose to extract the description and
types labels as additional information, because these are the most valuable parts of the
DBpedia extraction: they display an article’s most relevant facts [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. The descriptions are useful because they contain a lot of additional, general
information about a topic, derived from the articles that form the input of the
DBpedia dataset [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. The types, on the other hand, refer to the conceptual categories
into which DBpedia topics can be classified [20]. In this way, proper nouns that refer to a
DBpedia page can be generalized into broader categories. Table 4 illustrates how we
used the proper nouns in a tweet to access DBpedia information.
          </p>
          <p>The example contains only a short tweet about Carrie Fisher, which on its own gives
little information. By including the description, the model can learn that she played in
Star Wars (a topic males tend to tweet about more often), while the DBpedia types
explicitly state that she was an actor and an artist.</p>
          <table-wrap id="tab4">
            <label>Table 4</label>
            <table>
              <tbody>
                <tr>
                  <th>DBpedia Description</th>
                  <td>Carrie Frances Fisher (born October 21, 1956) is an American actress and writer.
She is best known for her role as Princess Leia in the original Star Wars trilogy
(1977–83) and Star Wars: The Force Awakens (2015). Fisher is also known for
her semi-autobiographical novels, including Postcards from the Edge, and the
screenplay for the film of the same name, as well as her autobiographical
one-woman play, and its nonfiction book, Wishful Drinking, based on the show. Her
other film roles include Shampoo (1975), The Blues Brothers (1980), Hannah
and Her Sisters (1986), The ’Burbs (1989), and When Harry Met Sally... (1989).</td>
                </tr>
                <tr>
                  <th>DBpedia Types</th>
                  <td>Person, Agent, NaturalPerson, Actor, Artist</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-3-2-2">
          <title>3 https://wiki.dbpedia.org/lookup</title>
          <p>This gives us two new feature sources with which to retrain our system. We use
them in a very straightforward way, simply adding them to the training data when we
create our bag-of-words feature set. In the next section, we report the scores of
these new feature sources on their own, as well as the scores of their combination with the
tweets themselves.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Results</title>
      <p>In this section we describe the results on the training data and on the official PAN
test data. The results on the training data (both 10-fold cross-validation and our held-out
test set) are shown in Table 5. For English, we obtain roughly the same results for 10-fold
CV and the test set: 0.792 and 0.799, respectively. For Spanish and Arabic the results are
slightly lower, again with similar scores for the test set and 10-fold CV.</p>
      <p>The scores of the model on the official PAN 2018 test set are presented in Table
6. The model performs best on English (0.807), while Spanish and Arabic both reach
an accuracy of 0.792. Interestingly, for English and Spanish our model scores higher on the
official test set than on our own test set and in cross-validation, which suggests that we
did not overfit the training data. Our average score of 0.797 gave
us 5th place in the official shared task ranking, suggesting that a feed-forward model
combined with bag-of-words features works quite well for this task.</p>
      <p>Unofficial results of the system that includes the DBpedia features are presented
in Table 7. The system performs better when the DBpedia types are added
to the tweets (0.815 and 0.807). When we add the descriptions to the tweets, performance
drops considerably (0.715 and 0.711). Interestingly, when the system uses only the DBpedia
descriptions or only the DBpedia types, the descriptions score much higher than the types.
A possible reason is that the descriptions contain much more data than the types, which
makes classification on this data easier. However, when added to the tweets, the
descriptions tend to overshadow the tweet data, as they are often longer than the tweets
themselves, which makes the system less accurate. The type information, on the other
hand, though scoring lower on its own, is a small but beneficial feature source.</p>
      <p>
        Overall, adding the DBpedia types information improves accuracy by 1.5 and 1.6
percentage points. This increase should not be underestimated, as 13 of the
23 participants scored between 0.785 and 0.815 for (text-only) English in this shared
task. We believe this method could also improve other systems; for example, last year’s
winner also used n-gram features. Our current proof of concept covers only English, but
it can easily be extended to other languages, provided that the DBpedia Lookup service
and a POS tagger that identifies proper nouns (NNPs) are available for them.
      </p>
      <p>
        In this paper we described our approach to the PAN 2018 shared task of identifying
an author’s gender from Tweets. We applied a feed-forward neural network in combination
with a simple bag-of-words model, combining new methods with traditional ones. We
obtained an average score of 0.797 over three languages and 5th place in the official
shared task ranking. It is remarkable that such a small model can achieve this score,
which shows that combining new methods with traditional ones can work surprisingly
well. Interestingly, pre-trained word embeddings did not help our model, though we did
not perform many experiments with them. Our model seems to be quite robust, since it
obtained similar scores on all three languages. In unofficial experiments, we automatically
extracted extra features from DBpedia, which improved accuracy for English by about
1.5 percentage points. Further optimizing this feature source could be an interesting topic
for future work. Also, to further test its robustness, it would be interesting to apply our
model to other languages and domains.
      </p>
      <p>
        20. Suchanek, F., Kasneci, G., Weikum, G.: YAGO: A core of semantic knowledge unifying
WordNet and Wikipedia. https://hal.archives-ouvertes.fr/hal-01472497/documen (2007)
21. Tellez, E.S., Miranda-Jiménez, S., Graff, M., Moctezuma, D.: Gender and language variety
identification with microTC. In: Cappellato et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] (2017)
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adame-Arcia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro-Castro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bueno</surname>
            ,
            <given-names>R.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muñoz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Author profiling, instance-based similarity classification</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwyer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medvedeva</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rawee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haagsma</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>N-GrAM: New Groningen author-profiling model</article-title>
          .
          <source>arXiv preprint arXiv:1707.03764</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Becker</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>DBpedia - A crystallization point for the Web of Data</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          ),
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chollet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.: Keras. https://keras.io (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cooper</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>NLTK: The natural language toolkit</article-title>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Franco-Salvador</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plotnikova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pawar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benajiba</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Subword-based deep averaging networks for author profiling in social media</article-title>
          .
          In: Cappellato et al. [13]
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kheng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laporte</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Granitzer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>INSA Lyon and Uni Passau’s participation at PAN@CLEF’17: Author profiling task</article-title>
          .
          In: Cappellato et al. [13]
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Martinc</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Škrjanec</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zupan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pollak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>PAN 2017: Author profiling - gender and language variety prediction</article-title>
          .
          In: Cappellato et al. [13]
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Miura</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taniguchi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taniguchi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohkuma</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Author profiling with word+character neural attention network</article-title>
          .
          In: Cappellato et al. [13]
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Oliveira Neto</surname>
            ,
            <given-names>R.F.</given-names>
          </string-name>
          :
          <article-title>Using character n-grams and style features for gender and language variety classification</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Use of language and author profiling: Identification of gender and age</article-title>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter</article-title>
          . In: Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.)
          <article-title>Working Notes Papers of the CLEF 2018 Evaluation Labs</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          , CLEF and CEUR-WS.org (Sep
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter</article-title>
          .
          <source>Working Notes Papers of the CLEF</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Overview of the 3rd author profiling task at PAN 2015</article-title>
          . In: CLEF (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations</article-title>
          . In:
          <source>Working Notes Papers of the CLEF 2016 Evaluation Labs</source>
          . CEUR Workshop Proceedings, Balog, K., et al. (eds.), pp.
          <fpage>750</fpage>
          -
          <lpage>784</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Schaetti</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>UniNE at CLEF 2017: TF-IDF and deep-learning for author profiling</article-title>
          . In: Cappellato et al. [13] (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Sierra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solorio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>González</surname>
            ,
            <given-names>F.A.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for author profiling</article-title>
          .
          <source>Working Notes Papers of the CLEF</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of PAN-2018: Author Identification, Author Profiling, and Author Obfuscation</article-title>
          . In:
          <string-name>
            <surname>Bellot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trabelsi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murtagh</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanjuan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (eds.)
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          . 9th International Conference of the CLEF Initiative (CLEF 18). Springer, Berlin Heidelberg New York (Sep
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>