<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLEF</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Bots and Gender Prediction Using Language Independent Stylometry-based Approach</article-title>
      </title-group>
      <contrib-group>
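        <contrib contrib-type="author">
          <string-name>Shaina Ashraf</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Omer Javed</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Adeel</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haider Ali</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rao Muhammad Adeel Nawab</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, COMSATS University Islamabad, Lahore Campus</institution>
          ,
          <country country="PK">Pakistan</country>
        </aff>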
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>9</volume>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>This paper describes our participation in the Bots and Gender Profiling task at PAN 2019. The task is first to classify a profile as either bot or human and then, if the profile belongs to a human, to classify it further as male or female. Our proposed approach is based on language independent stylometry features. A total of 27 language independent stylometry features (18 character-based and 9 emotion-based) are used to build the system for the Bots and Gender Profiling task. On the training dataset, Accuracy scores of 0.97 and 0.80 are obtained for English on the bot/human and male/female classification tasks respectively, and 0.93 and 0.75 for Spanish. On test dataset 1, Accuracy scores of 0.92 and 0.76 are obtained for English on the bot/human and male/female classification tasks respectively, and 0.86 and 0.75 for Spanish. On test dataset 2, the bot/human and male/female classification tasks obtain Accuracy scores of 0.92 and 0.76 respectively for English, and 0.88 and 0.72 respectively for Spanish.</p>
      </abstract>
      <kwd-group>
        <kwd>Bot and Gender Profiling</kwd>
        <kwd>Author Profiling</kwd>
        <kwd>Stylometry-based Features</kwd>
        <kwd>Emotion-based Features</kwd>
        <kwd>Emojis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>As the use of social networking platforms such as Facebook, Twitter, Instagram,
blogs and community forums grows, the way people communicate is changing.
People feel free to talk, discuss and post reviews and comments on such channels
more frequently. Many people rely on social forums, e.g. Reddit, Yelp, Quora and
Amazon message boards, to get information, feedback and recommendations about
different products and services. However, a large number of users on social networking
sites misuse such forums by creating fake profiles, spam and
bots. In recent years, bots have been used to pose as humans on social networking
platforms to influence other social media users for ideological, political or
commercial purposes. Bots can inflate the popularity of products by writing positive
reviews and ratings. They can also damage the reputation of competing products
through negative reviews and ratings. Furthermore, bots are widely used
to spread fake news. Therefore, it is important to develop author profiling systems
that can discriminate bot profiles from human ones.</p>
      <p>This study presents a stylometry-based approach to the problem of Bots and
Gender Profiling. A total of 27 language independent features are used, which can be
broadly categorized into (1) character-based features and (2) emotion-based
features. A range of classifiers have been applied to train and test our proposed system,
including Logistic Regression, Random Forest, Linear SVC, BernoulliNB, MultinomialNB and
SVC (Support Vector Classifier). The developed system was deployed on TIRA
[9] for the final evaluation on the test datasets. A detailed comparison of all the systems
submitted to the PAN 2019 Bot and Gender Profiling task can be found in [10].</p>
      <p>The rest of this paper is organized as follows: Section 2 describes related work on
author profiling, Section 3 presents our proposed approach, Section 4 describes the
experimental setup, Section 5 presents results and their analysis. Finally, Section 6
concludes the paper with future work directions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>In previous studies, researchers have explored different methods, e.g.
stylometry-based, content-based, topic-based, emotion-based and deep learning methods, for predicting
different demographics of an author on social media. In [1], the authors applied a
stylometry-based approach to cross-genre author profiling. Their set of
stylometry-based features included 6 vocabulary richness features, 26 character-based features, 16
syntactic features and 7 lexical features. Promising results were obtained using this
set of stylometry-based features (Accuracy of 0.576 for gender classification,
0.371 for age classification and 0.256 for combined classification of age and gender).</p>
      <p>In [3], the authors classified humans and bots by learning tweeting patterns
and then further categorized bots into classes, i.e. spam, consumption and
broadcast bots. They proposed a new profiling framework that uses entropy-based
features such as the timing of tweets, hashtags, URLs and follower counts. The authors
worked with data from nearly 159 thousand bot and human accounts on Twitter. Their experimental
results show that the framework is effective at profiling malicious and benign bots and at
revealing interesting behavioral traits. In [14], the authors investigated content-based features (word and character
n-grams) and 64 stylometry-based features (11 lexical word-based, 47 lexical
character-based and 6 vocabulary measures) for identifying gender and age on
multilingual corpora.</p>
      <p>In [18], the authors focused on instance-based, prototype-based and
distance-based classification strategies. They extracted different features, e.g. the frequency of
negative and positive emoticons, retweet markers, the number of hashtags and part-of-speech
tags, for gender and language identification.</p>
      <p>In [6], the authors detected bots in Wikidata by extracting comment-based
features of users. The comment-based features help to examine the editing behavior of
registered and non-registered users. The authors used a Random Forest classifier and a
gradient boosting classifier and tuned the hyperparameters of both
models. The models performed well on registered-user information.</p>
      <p>In [19], the authors used combined image-based and text-based features for gender
identification. They represented text using a bag-of-terms (BoT) model and images using a CNN
model.</p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Language Independent Stylometry-based Approach</title>
      <p>The writing style of an author helps to identify various attributes of the author, for
example, age, gender, personality type, occupation and political interest. We expect
that the writing style of a human is significantly different from that of a bot. Therefore,
stylometry features [13] are likely to be very helpful in discriminating bot profiles from
human ones. Another major difference between a human profile and a bot profile is the
use of emotions (e.g. emojis). A profile generated by a bot is likely to be plain text, whereas
a human profile is likely to be a mixture of text and emotions.
Considering these two factors, our proposed approach uses a combination of
character-based stylometry features and emotion-based features to distinguish humans from
bots. Note that our proposed approach uses language independent stylometry features,
i.e. they can be applied to any language for bot and human profiling.</p>
      <p>In our proposed system, a total of 27 stylometry-based features are used (18 features
are character-based and 9 are emotion-based). The set of character-based features
includes: (1) url_count, (2) space_count, (3) capital_count, (4) text_length, (5)
curly_brackets_count, (6) round_brackets_count, (7) underscore_count, (8)
question_mark_count, (9) exclamation_mark_count, (10) dollar_mark_count, (11)
ampersand_mark_count, (12) hash_count, (13) tag_count, (14) slashes_count, (15)
operator_count, (16) punc_count, (17) line_count and (18) word_count. The set of
emotion-based features includes: (1) emoji_count, (2) face_smiling, (3) face_affection, (4)
face_tongue, (5) face_hand, (6) face_neutral_skeptical, (7) face_concerned, (8)
monkey_face and (9) emotions (for details see Table 3.1).</p>
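      <p>The character-based features are simple counts over a profile's raw text. The Python sketch below shows one possible way to compute them. The paper names the features but does not give their exact definitions, so the regular expressions, the operator and punctuation sets, and the readings of hash_count as hashtags and tag_count as @-mentions are assumptions for illustration, not the authors' implementation.</p>
```python
import re

# Illustrative definitions only: the paper names these features but does not
# publish their exact rules, so every pattern and character set is an assumption.
URL_RE = re.compile(r"https?://\S+")   # assumed URL pattern
HASHTAG_RE = re.compile(r"#\w+")       # assumption: hash_count counts hashtags
MENTION_RE = re.compile(r"@\w+")       # assumption: tag_count counts @-mentions
AMPERSAND = "\x26"                     # the ampersand character, written as an escape
OPERATORS = set("+-*=%^|~")            # hypothetical operator set
PUNCTUATION = set(".,;:'\"")           # hypothetical punctuation set


def character_features(text):
    """Return the 18 character-based counts for one profile's raw text."""
    return {
        "url_count": len(URL_RE.findall(text)),
        "space_count": text.count(" "),
        "capital_count": sum(ch.isupper() for ch in text),
        "text_length": len(text),
        "curly_brackets_count": text.count("{") + text.count("}"),
        "round_brackets_count": text.count("(") + text.count(")"),
        "underscore_count": text.count("_"),
        "question_mark_count": text.count("?"),
        "exclamation_mark_count": text.count("!"),
        "dollar_mark_count": text.count("$"),
        "ampersand_mark_count": text.count(AMPERSAND),
        "hash_count": len(HASHTAG_RE.findall(text)),
        "tag_count": len(MENTION_RE.findall(text)),
        "slashes_count": text.count("/") + text.count("\\"),
        "operator_count": sum(ch in OPERATORS for ch in text),
        "punc_count": sum(ch in PUNCTUATION for ch in text),
        "line_count": text.count("\n") + 1,
        "word_count": len(text.split()),
    }


if __name__ == "__main__":
    sample = "Check this out! https://t.co/abc #NLP @pan_clef\nSecond line?"
    feats = character_features(sample)
    print(feats["url_count"], feats["hash_count"], feats["tag_count"], feats["line_count"])
    # prints: 1 1 1 2
```
      <p>Because every feature is a raw character or token count, the same function applies unchanged to English and Spanish text.</p>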
      <table-wrap>
        <label>Table 3.1</label>
        <caption>
          <p>Description of the character-based and emotion-based features (excerpt).</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Feature</th>
              <th>Description</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>url_count</td>
              <td>Count of all kinds of links/URLs</td>
            </tr>
            <tr>
              <td>space_count</td>
              <td>Count of spaces</td>
            </tr>
            <tr>
              <td>capital_count</td>
              <td>Count of capital letters</td>
            </tr>
            <tr>
              <td>text_length</td>
              <td>Total length of the message</td>
            </tr>
            <tr>
              <td>curly_brackets_count</td>
              <td>Count of curly brackets { }</td>
            </tr>
            <tr>
              <td>face_concerned</td>
              <td>Count of concerned-face emojis, e.g. ☹</td>
            </tr>
            <tr>
              <td>emotions</td>
              <td>Count of emotion symbols, e.g. ❣ ❤</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
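      <p>The emotion-based feature names appear to follow subgroup names in the Unicode emoji data (emoji-test.txt), where, for example, ☹ is listed under face-concerned and, in the emoji versions current in 2019, ❣ and ❤ under emotion. The minimal sketch below assumes that each feature counts the emojis of the corresponding subgroup. The emoji sets are small, non-exhaustive samples, and the code-point ranges used for emoji_count are a rough approximation; both are illustrative assumptions, not the authors' lists.</p>
```python
# Small, NON-exhaustive samples of the Unicode emoji subgroups whose names the
# features appear to follow; a real system would load the full emoji list.
EMOJI_GROUPS = {
    "face_smiling": set("😀😃😄😁😆😅🤣😂🙂🙃😉😊😇"),
    "face_affection": set("😍🥰😘😗😚😙"),
    "face_tongue": set("😋😛😜🤪😝🤑"),
    "face_hand": set("🤗🤭🤫🤔"),
    "face_neutral_skeptical": set("🤐🤨😐😑😶😏😒🙄😬🤥"),
    "face_concerned": set("😕😟🙁☹😢😭😞"),
    "monkey_face": set("🙈🙉🙊"),
    "emotions": set("💋💘💖💗💓💞💕💟❣💔❤🧡💛💚💙💜🖤💯💢💥💫"),
}

# Rough approximation of "is an emoji": two common emoji code-point blocks.
EMOJI_RANGES = (range(0x1F300, 0x1FB00), range(0x2600, 0x27C0))


def is_emoji(ch):
    """True if the character's code point falls in one of the assumed blocks."""
    cp = ord(ch)
    return any(cp in block for block in EMOJI_RANGES)


def emotion_features(text):
    """Return the 9 emotion-based counts for one profile's raw text."""
    feats = {"emoji_count": sum(is_emoji(ch) for ch in text)}
    for name, members in EMOJI_GROUPS.items():
        feats[name] = sum(ch in members for ch in text)
    return feats


if __name__ == "__main__":
    print(emotion_features("Great day 😀😀 ❤ but ☹ and 🙈"))
```
      <p>Counting by code point keeps the extractor language independent: a tweet written only in plain text, as the approach expects of bot profiles, yields zero for all nine features.</p>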
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup</title>
      <sec id="sec-4-1">
        <title>4.1 Training Corpus</title>
        <p>This section describes the main statistics of the training corpus, evaluation
methodology and evaluation measures.</p>
        <p>We used the PAN19-author-profiling-training dataset to train our proposed system. We
performed the author profiling task for both languages, i.e. English and Spanish. The
English training corpus contains 4,120 author profiles, each containing 100
tweets in English, whereas the Spanish training corpus contains 3,000 author profiles,
each consisting of 100 tweets in Spanish (see Table 4.1 for detailed statistics of
both corpora). Note that, in our proposed approach, no pre-processing or cleaning
operations were performed on either the training or the test datasets, because URLs and hashtags
were used as features in the classification task.</p>
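      <p>Since no cleaning is applied, each profile can be represented by the raw concatenation of its tweets. The sketch below assumes that gold labels come as one line per author in the form author-id:::type:::gender (the layout we believe the PAN 2019 truth.txt files use; treat it as an assumption) and that tweets are joined with newlines. The helper names are hypothetical. The sketch also keeps only human profiles for the gender task, which is an assumption consistent with the task definition rather than a detail reported in the paper.</p>
```python
def parse_truth(lines):
    """Map author id -> (type, gender) from 'id:::type:::gender' lines."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        author_id, author_type, gender = line.split(":::")
        labels[author_id] = (author_type, gender)
    return labels


def profile_text(tweets):
    """Join a profile's raw tweets without any cleaning, so that URLs,
    hashtags and mentions survive for the feature extractors."""
    return "\n".join(tweets)


def split_tasks(labels):
    """Stage 1 (bot/human) uses every profile; stage 2 (gender) only humans."""
    type_labels = {a: t for a, (t, _) in labels.items()}
    gender_labels = {a: g for a, (t, g) in labels.items() if t == "human"}
    return type_labels, gender_labels


if __name__ == "__main__":
    truth = ["a1:::bot:::bot", "a2:::human:::female", "a3:::human:::male", ""]
    types, genders = split_tasks(parse_truth(truth))
    print(types)    # {'a1': 'bot', 'a2': 'human', 'a3': 'human'}
    print(genders)  # {'a2': 'female', 'a3': 'male'}
```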
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Evaluation Methodology</title>
        <p>The tasks of predicting an author’s type as bot or human and determining the gender
of a human author from his/her text are treated as supervised document classification tasks. We performed
binary classification first to distinguish bots from humans and then to identify the
gender of human profiles. A range of classifiers were explored, including Logistic Regression, Random
Forest, LinearSVC, BernoulliNB, MultinomialNB and SVC, to train and test
our proposed system. The numeric values generated by the 27 stylometry features (see
Section 3) were used as input to these classifiers.</p>
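      <p>The classifier comparison and the two-stage decision (bot or human first, then gender for profiles predicted as human) can be sketched with scikit-learn, whose class names match those listed above. This is a sketch under stated assumptions: the paper does not report hyperparameters or an internal validation protocol, so the 5-fold cross-validation, the parameter values and the synthetic Poisson-distributed count matrix standing in for the 27 features are all illustrative.</p>
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.svm import SVC, LinearSVC


def make_classifiers():
    """The six classifier families named in the paper (parameters assumed)."""
    return {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
        "LinearSVC": LinearSVC(max_iter=5000),
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),  # requires non-negative inputs; counts are
        "SVC": SVC(),
    }


def compare_classifiers(X, y, folds=5):
    """Mean cross-validated Accuracy of each classifier on one binary task."""
    return {
        name: cross_val_score(clf, X, y, cv=folds, scoring="accuracy").mean()
        for name, clf in make_classifiers().items()
    }


def predict_two_stage(type_clf, gender_clf, X):
    """Label each profile 'bot', or 'male'/'female' if predicted human."""
    types = type_clf.predict(X)
    labels = types.astype(object)  # object dtype so gender labels fit
    is_human = types == "human"
    if is_human.any():
        labels[is_human] = gender_clf.predict(X[is_human])
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.poisson(5, size=(200, 27)).astype(float)  # synthetic stand-in for 27 features
    y_type = np.where(X[:, 0] + X[:, 1] > 10, "human", "bot")
    y_gender = np.where(X[:, 2] > 5, "female", "male")

    for name, acc in compare_classifiers(X, y_type).items():
        print(f"{name}: {acc:.3f}")

    humans = y_type == "human"
    type_clf = RandomForestClassifier(random_state=0).fit(X, y_type)
    gender_clf = RandomForestClassifier(random_state=0).fit(X[humans], y_gender[humans])
    print(predict_two_stage(type_clf, gender_clf, X[:5]))
```
      <p>Training the gender classifier only on human profiles mirrors the task definition, in which gender is asked only for profiles written by humans; the choice of Random Forest for both stages here is arbitrary.</p>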
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Evaluation Measure</title>
        <p>Evaluation is carried out using the Accuracy measure. Accuracy is defined as the ratio of
correctly predicted profiles to the total number of profiles.</p>
        <disp-formula>
          <tex-math>\text{Accuracy} = \frac{\text{Number of correctly classified profiles}}{\text{Total number of profiles}}</tex-math>
        </disp-formula>
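      <p>As a worked illustration of the formula, using made-up labels rather than results from the paper: if three of four profiles are classified correctly, Accuracy is 3/4 = 0.75. The same computation in plain Python follows; it should agree with scikit-learn's accuracy_score on the same inputs.</p>
```python
def accuracy(y_true, y_pred):
    """Correctly classified profiles divided by the total number of profiles."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one profile is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


gold = ["bot", "human", "human", "bot"]
pred = ["bot", "human", "bot", "bot"]
print(accuracy(gold, pred))  # prints 0.75
```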
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results and Analysis</title>
      <sec id="sec-5-1">
        <title>5.1 Results on Training Dataset</title>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Results on Test Datasets</title>
        <p>In the PAN 2019 Bot and Gender Profiling task, the final evaluation is carried out on two
test corpora: (1) the PAN19-author-profiling-test-dataset1 corpus and (2) the
PAN19-author-profiling-test-dataset2 corpus. Table 5.2 shows the results obtained using our proposed
language independent stylometry-based approach on both test corpora. On the
PAN19-author-profiling-test-dataset1 corpus, for English, Accuracy scores of 0.9280
and 0.7652 are obtained for the bot/human and male/female classification tasks
respectively, whereas for Spanish, Accuracy scores of 0.8611 and 0.7556 are obtained
for the bot/human and male/female classification tasks respectively. Similarly, on the
PAN19-author-profiling-test-dataset2 corpus, for English, Accuracy scores of 0.9227
and 0.7583 are obtained for the bot/human and male/female classification tasks
respectively, whereas for Spanish, Accuracy scores of 0.8839 and 0.7261 are obtained
for the bot/human and male/female classification tasks respectively.</p>
        <table-wrap>
          <label>Table 5.2</label>
          <caption>
            <p>Accuracy of the proposed language independent stylometry-based approach on the PAN 2019 test corpora.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th rowspan="2">Corpus</th>
                <th colspan="2">English</th>
                <th colspan="2">Spanish</th>
              </tr>
              <tr>
                <th>Bot/Human</th>
                <th>Male/Female</th>
                <th>Bot/Human</th>
                <th>Male/Female</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>PAN19-author-profiling-test-dataset1</td>
                <td>0.9280</td>
                <td>0.7652</td>
                <td>0.8611</td>
                <td>0.7556</td>
              </tr>
              <tr>
                <td>PAN19-author-profiling-test-dataset2</td>
                <td>0.9227</td>
                <td>0.7583</td>
                <td>0.8839</td>
                <td>0.7261</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>It can be noted that Accuracy results for English tweets are higher than for
Spanish tweets, even though the same language independent features are extracted for both
languages. A possible reason is that Spanish profiles in the training and test
datasets of the PAN 2019 Bot and Gender Profiling task may contain text in more than
one language, since the datasets provided by the PAN organizers contain raw tweets and
re-tweets, i.e. no pre-processing or cleaning is performed. Consequently,
performance drops for the Spanish language. These results also show that Accuracy for
type identification (human/bot) is much higher than for gender prediction,
which suggests that our proposed stylistic features are more suitable for discriminating
bots from humans than for discriminating gender. This is likely because bots tend
to generate profiles without emotions, whereas humans generate profiles that combine
emotions and text, which makes it easier for the classifiers to
distinguish humans from bots.</p>
        <p>[Residue of a results table whose row labels were lost in extraction. Bot/Human: 0.871, 0.935, 0.749, 0.822, 0.796, 0.505. Male/Female: 0.678, 0.755, 0.577, 0.603, 0.657, 0.469.]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>This paper presents a language independent stylometry-based approach for the PAN
2019 Bot and Gender Profiling task. A total of 27 stylistic features were used to build
the proposed system (18 are character-based and 9 emotion-based). A range of
classifiers were also applied including Logistic Regression, Random Forest, LinearSVC,
BernoulliNB, MultinomialNB and SVC. Promising results were obtained on both test
datasets in the final evaluation.</p>
      <p>In the future, we plan to apply deep learning methods to the PAN 2019 Bot and Gender
Profiling task.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Ashraf</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iqbal</surname>
            ,
            <given-names>H. R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Nawab</surname>
            ,
            <given-names>R. M. A.</given-names>
          </string-name>
          (
          <year>2016</year>
          , September).
          <article-title>Cross-Genre Author Profile Prediction Using Stylometry-Based Approach</article-title>
          .
          <source>In CLEF (Working Notes)</source>
          (pp.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varol</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menczer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Flammini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2016</year>
          , March).
          <article-title>Detection of promoted social media campaigns</article-title>
          .
          <source>In tenth international AAAI conference on web and social media.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Oentaryo</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murdopo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prasetyo</surname>
            ,
            <given-names>P. K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>E. P.</given-names>
          </string-name>
          (
          <year>2016</year>
          , November).
          <article-title>On profiling bots in social media</article-title>
          .
          <source>In International Conference on Social Informatics</source>
          (pp.
          <fpage>92</fpage>
          -
          <lpage>109</lpage>
          ). Springer, Cham.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Shu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2018</year>
          , April).
          <article-title>Understanding user profiles on social media for fake news detection</article-title>
          .
          <source>In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)</source>
          (pp.
          <fpage>430</fpage>
          -
          <lpage>435</lpage>
          ). IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Overview of the 5th author profiling task at pan 2017: Gender and language variety identification in twitter</article-title>
          .
          <source>Working Notes Papers of the CLEF.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terveen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Halfaker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Bot Detection in Wikidata Using Behavioral and Other Informal Cues</article-title>
          .
          <source>Proceedings of the ACM on Human-Computer Interaction, 2(CSCW)</source>
          ,
          <volume>64</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Overview of the 6th author profiling task at PAN 2018: Multimodal gender identification in Twitter</article-title>
          .
          <source>Working Notes Papers of the CLEF.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manjavacas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zangerle</surname>
          </string-name>
          , E.:
          <article-title>Overview of PAN 2019: Author Profiling, Celebrity Profiling, Cross-domain Authorship Attribution and Style Change Detection</article-title>
          . In:
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braschler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinatz</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the Tenth International Conference of the CLEF Association (CLEF 2019)</source>
          . Springer (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>TIRA Integrated Research Architecture</article-title>
          . In:
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (eds.)
          <source>Information Retrieval Evaluation in a Changing World - Lessons Learned from 20 Years of CLEF</source>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling</article-title>
          . In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (eds.)
          <source>CLEF 2019 Labs and Workshops, Notebook Papers</source>
          . CEUR-WS.org (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Evaluations Concerning Cross-genre Author Profiling</article-title>
          .
          <source>In: Working Notes Papers of the CLEF</source>
          . 12.
          <string-name>
            <surname>Soler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wanner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>A semi-supervised approach for gender identification</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC-2016)</source>
          , Portorož, Slovenia. European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          13.
          <string-name>
            <surname>Flekova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Preotiuc-Pietro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Exploring stylistic variation with age and income on Twitter</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL</source>
          <year>2016</year>
          ), Berlin, Germany.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          14.
          <string-name>
            <surname>Fatima</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anwar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nawab</surname>
            ,
            <given-names>R. M. A.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Multilingual author profiling on Facebook</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>53</volume>
          (
          <issue>4</issue>
          ):
          <fpage>886</fpage>
          -
          <lpage>904</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          15.
          <string-name>
            <surname>Przybyla</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Teisseyre</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>What do your look-alikes say about you? Exploiting strong and weak similarities for author profiling. Notebook for PAN at CLEF 2015</article-title>
          .
          <source>In Evaluation Labs and Workshop - Working Notes Papers (CLEF 2015)</source>
          , Toulouse, France. CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          16.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Franco</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>A Low Dimensionality Representation for Language Variety Identification</article-title>
          .
          <source>In Proceedings of the 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing'16)</source>
          , Springer-Verlag, LNCS
          <volume>9624</volume>
          , pp.
          <fpage>156</fpage>
          -
          <lpage>169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          17.
          <string-name>
            <surname>Shrestha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey-Villamizar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sadeque</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Solorio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Age and gender prediction on health forum data</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)</source>
          . European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          18.
          <string-name>
            <surname>Adame-Arcia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro-Castro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ortega-Bueno</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Muñoz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Author Profiling, instance-based Similarity Classification</article-title>
          .
          <source>Notebook for PAN at CLEF 2017</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          19.
          <string-name>
            <surname>Taniguchi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sakaki</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shigenaka</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ohkuma</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>A Weighted Combination of Text and Image Classifiers for User Gender Inference</article-title>
          . Association for Computational Linguistics, pp.
          <fpage>87</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>