<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CAPS: A Cross-genre Author Profiling System Notebook for PAN at CLEF 2016</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ivan Bilan</string-name>
          <email>ivan.bilan@gmx.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Desislava Zhekova</string-name>
          <email>zhekova@cis.uni-muenchen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Information and Language Processing, LMU Munich</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the Cross-genre Author Profiling System (CAPS) in the PAN16 shared task [15]. The classification system considers parts-of-speech, collocations, connective words and various other stylometric features to differentiate between the writing styles of male and female authors as well as between different age groups. The system achieves the second best score - 74.36% accuracy (with the best performing system (BPS) reaching 75.64%) for gender identification on the official test set (test set 2) for English. Further, for age classification, we report accuracy of 44.87% (BPS: 58.97%). For Spanish, CAPS reaches performance of 62.50% (BPS: 73.21%) for gender and 46.43% (BPS: 51.79) for age, while for Dutch, the accuracy for gender (the task did not target age) is lowest - 55.00% (BPS: 61.80%). For comparison, we also tested CAPS on single genre classification of author gender and age on the PAN14 and PAN15 datasets achieving comparable performance.</p>
      </abstract>
      <kwd-group>
        <kwd>author profiling</kwd>
        <kwd>cross-genre</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Author profiling is the process of analyzing the sociolect aspect of the writer of an
anonymous text sample. The classification tasks in this field range from revealing the
author’s age and gender to determining the native language or even personality traits
of the writer [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. With the advent of social media websites, the Internet became a
treasure trove for classification tasks, similar to author profiling, as the amount of
available data increased immensely. The authors often disclose their
sociodemographic attributes in their online profiles, although sometimes the
information is difficult to trust. For this reason, manually labeling the data is the most
appropriate way of collecting a trustworthy dataset. Moreover, since the data is
usergenerated, it is often freely available in many diverse languages.
      </p>
      <p>Profiling social media authors may be utilized for various purposes. It can, for
instance, be used for user-based analysis to find out the demographic characteristics
of an average consumer or for targeted advertising to place an ad, tailored to the
user’s age and gender, next to their written online text. Apart from online texts,
profiling the author may also be useful in the field of forensics, for example, to
analyze a ransom note.</p>
      <p>
        Previous research into author profiling always concentrated on a single genre. A
new view on this problem was proposed by the “Plagiarism Analysis, Authorship
Identification, and Near-Duplicate Detection” (PAN) 2016 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] (further referred to as
PAN16) shared task1, namely the differentiation among stylistic and structural
differences between author classes in one text genre when these differences are
observed, analyzed and learned from another text genre.
      </p>
      <p>In this paper, we present the participation of the Cross-genre Author Profiling
System (CAPS) in the PAN16 shared task on cross-genre author profiling. Further, in
section 2, we provide an overview of relevant work and in section 3 we describe the
experimental setup we used. In section 4, we present the results CAPS achieved at the
PAN16 task, as well as its evaluation on the PAN14 and PAN15 datasets. Finally, in
section 5, we conclude our findings.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        The work by Koppel et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] may be considered the pioneering work into the area of
author profiling. With the help of various stylometric features, the authors showed
that the gender classification of an author is possible with the aid of machine learning,
achieving about 80% accuracy on a small dataset of fiction and non-fiction text
samples. Schler et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] explored both gender and age profiling on a dataset of over
70.000 blog posts, subsequently reaching approximately 80% and 75% accuracy for
gender and age classification respectively.
      </p>
      <p>
        Since 2013, the yearly PAN shared task includes author profiling as one of its
subtasks. PAN13 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], conducted in 2013, concentrated on a social media dataset
collected from the Netlog2 website consisting of blogs and including gender and age
labels. The task spanned English and Spanish text samples. The best performance for
gender profiling reached 59% approximate accuracy for English, whereas the best
result for age classification achieved around 66% accuracy.
      </p>
      <p>
        The following year PAN14 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] included a wider range of genres, such as tweets
collected from Twitter3, blogs scraped from LinkedIn4, hotel reviews gathered from
the TripAdvisor5 website and also included a refined version of the PAN13 dataset. In
addition to English and Spanish, Dutch and Italian were also targeted by the task. The
best result for gender classification in English was achieved on a Twitter dataset with
about 76% accuracy. Other genres showed much lower results for gender
identification ranging from 53% accuracy for the social media dataset to 73%
accuracy for the classification of hotel reviews for English. Age identification proved
to be an even more complex task with the best accuracies being between 35% and
50% for various genres in English.
      </p>
      <p>
        PAN15 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] concentrated on Twitter samples only and expanded the profiling task
to personality traits. The best performance on the PAN15 shared task for both gender
and age identification was about 84% and 79% for gender and age classification
respectively for the English language. This is significant performance improvement
considering the results reached at PAN13.
1 http://pan.webis.de/clef16/pan16-web/author-profiling.html
2 http://www.netlog.com
3 https://twitter.com
4 https://www.linkedin.com
5 http://www.tripadvisor.com
      </p>
      <p>The PAN16 author profiling task concentrates on cross-genre gender and age
classification, implying the use of one genre for training and an unseen genre to test
the classification model. Such modification of the task can be helpful for
underrepresented genres for which no or only small amount of training data is
available. However, this modification also increases the complexity of the author
profiling task immensely. Not only would a system need to be developed in a general
enough manner, but also, to achieve good results, the training set needs to be as
similar or close to the test dataset genre as possible.</p>
      <p>Our work further demonstrates the discrepancies between the datasets as well as
the need to use similar text genres for both training/development and the testing
phase. In the following section, we describe our experimental setup and the CAPS
system with which we participated in the PAN16 shared task.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Experimental Setup</title>
      <sec id="sec-3-1">
        <title>3.1 Workflow Overview</title>
        <p>CAPS includes the following main pipeline processes: preprocessing, TF-IDF
representation, topic modeling, chi-square term extraction and custom feature
extraction. Figure 1 gives a visual representation of the classification pipeline.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Data Preprocessing and Feature Extraction</title>
        <p>
          Preprocessing: The HTML content and the Bulletin Board Code present in the data
are removed as a first preprocessing step. Furthermore, all links are normalized to the
special token [URL]. Additionally, all user mentions of the form @username are
translated to [USER]. Since the dataset included a considerable number of duplicate
text samples, probably due to its automatic collection with the help of web scraping,
we excluded all duplicate samples. Table 1 shows the training dataset distribution into
each gender and age class as well as a detailed breakdown of the number of authors
and the underlying text samples after all duplicate text samples have been discarded.
Lemmas and part-of-speech (POS) tags are produced using the TreeTagger [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>
          Term Frequency-Inverse Document Frequency (TF-IDF): TF-IDF is a
procedure widely used in information retrieval and data mining to measure the
importance of each word in a corpus which was first formulated in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. For the task
of cross-genre author profiling a TF-IDF implementation of the scikit-learn machine
learning toolkit [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is used to convert the lemmas and POS-tags, as well as,
categorical character n-grams to a matrix of TF-IDF features. The latter is a
subdivision of character n-grams into finer classes first introduced by Sapkota et al.
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and successfully used for the task of author profiling by Maharjan et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>At the early stage of model training, grid search 5-fold cross-validation was
performed for TF-IDF optimization. The TF-IDF vector for the lemma representation
shows best results when considering unigrams, bigrams, and trigrams. The
part-ofspeech TF-IDF vector additionally considers four-gram POS sequences. The
categorical character TF-IDF representation considers only trigram characters.
Fig. 2 shows the most informative lemma, uni-, bi- and trigrams used by male and
female authors. The representation is based on the PAN16 Twitter dataset merged
with the PAN14 training datasets. Fig. 2 indicates that mentioning the opposite gender
provides reliable information about the gender of the author, suggested by the
presence of lemmas/n-grams, such as male, my husband, boyfriend, husband (used by
female authors) and women, wife, my wife, girlfriend (used by male authors) amongst
the most informative lemmas and bigrams.</p>
        <p>
          Topic Modeling: We apply Latent Dirichlet Allocation (LDA) for topic modeling
of the lemmas. LDA is a generative statistical model that assigns probability weights
to words and according to those probabilities, assigns the word to a certain
automatically generated topic [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>Chi-Square Term Selection: Since the use of TF-IDF vector representations for
the lemma, part-of-speech and character n-grams produces exceedingly high
dimensional vectors, for the final classification model only a smaller percentile of the
features is selected. For this purpose, the chi-square test is used which tests the
independence of term and class occurrences. Subsequently, the terms that are
independent of the class are eliminated. This technique ensures the selection of
classdependent features and shrinks the number of the latter, allowing custom features to
influence the classification significantly more.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Custom Features</title>
        <p>Apart from TF-IDF vector representation and topic modeling about 40 additional
custom features have been developed for age and gender cross-genre profiling. Most
of these are stylometric features representing the linguistic style of the author. The
features are grouped into four clusters: dictionary-based, POS-based, text structure
and stylistic.</p>
        <p>Dictionary-Based: The purpose of this set of features is to check whether the raw
text representation includes words from a predefined list of tokens. The feature cluster
Feature Cluster</p>
        <p>Feature Name
Dictionary-based</p>
        <p>Connective Words
Emotion Words
Contractions
Familial Words</p>
        <p>Collocations
Abbreviations and</p>
        <p>Acronyms
Stop Words</p>
        <p>English
furthermore, firstly,
moreover, hence …
sad, bored, angry,
nervous, upset …
I’d, let’s, I’ll, he’d, can’t,</p>
        <p>he’d …
wife, husband, gf, bf, mom</p>
        <p>…
dodgy, telly, awesome,</p>
        <p>freak, troll …
a.m., p.m., Mr., Inc.,</p>
        <p>
          NASA, asap …
did, we, ours, you, who,
these, because …
consists of dictionaries of connective words, emotion words, contractions, family
related words (as proposed by Maharjan et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]), collocations, abbreviations, and
acronyms, as well as stop words. All of the dictionaries are adapted for each of the
three languages: English, Spanish, and Dutch. A more detailed overview of this
cluster with its underlying examples is given in Table 2.
        </p>
        <p>
          POS-Based: This cluster captures the distribution of the various parts-of-speech.
Additionally, this set includes a more complex F-Measure feature, first introduced by
Heylighen et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which indicates how implicit or explicit the text is. The
FMeasure is calculated based on the usage of various POS tags in the text.
        </p>
        <p>Text Structure: This feature cluster attempts to analyze the structure of the text
and consists of such features as type/token ratio, average word length, and the amount
of punctuation signs used in each text sample.</p>
        <p>
          Stylistic: This set of features counts the frequency of use of different adjectival and
adverbial suffixes in the text samples. First introduced by Corney et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for the
classification of emails in English, the stylistic feature set is used to represent
variation in the writing styles of the authors. For example, men use more emotionally
intensive adverbs and adjectives, such as “awful”, “dreadfully” or “terribly” [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ],
which is captured by this feature set.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4 Feature Scaling</title>
        <p>After all custom feature vectors are extracted, they are scaled. There are several
techniques used for feature scaling. Some of the most widely used are normalization
and standardization. In the case of cross-genre author profiling, feature values of the
training set differ greatly from the feature values of the test set. For example, the
lengths of text samples throughout various writing styles range from short, one to two
sentence long tweets, to very long samples, as for instance, blog posts. Using
normalization or similar form of rescaling would be most suitable to cope with these
differences, but the information about outlier values, which can be best captured using
standardization, would be lost.</p>
        <p>To be able to use standardization for this particular task, a form of feature vector
pre-scaling is introduced. Pre-scaling rescales the feature values, which count the
number of occurrences of a certain token or a stylistic characteristic, relative to the
sample length. A simple solution to this problem could be a division of the feature
value by the length of the text in tokens. A more comprehensive approach is presented
in Equation 1. It rescales the sample length relative to the lowest mean length of a text
sample throughout all possible writing styles that could be represented in both
training and test sets and divides the feature value by this rescaled sample length. The
rescaled sample length represents the amount of possible smallest sample entities that
would fit into the text sample under review. Using this technique, the feature value is
always scaled relative to the minimum mean length of all text samples of all
represented writing styles. The average length is used instead of minimum or
maximum length to better represent the sample length distribution of the writing style
that has the shortest text samples on average in the dataset.</p>
        <p>( )
−
=
(
 ( )
 (  )
(  1…    )|   ≔ 
(  1)… 
(   ))
(1)</p>
        <p>Equation 1 gives a mathematical formulation of the feature pre-scaling approach,
where  ( ) represents the current feature value,   is the current text sample,  stands
for mathematical mean,   represents a genre and    is the text sample of the genre
  . len() is a function which, given a text sample, returns its length either in tokens or
in characters, which makes this interpretation suitable for both types of features that
work on the level of tokens and the ones dealing with character representation.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5 Classification</title>
        <p>
          There are various approaches to the author profiling classification. The task implies
classifying the author of a given text sample, but in many cases, there is a whole set of
documents belonging to one author, which raises the question of how to handle the
big number of samples per author. It is possible to concatenate all text samples of
each author into one uniform sample, as demonstrated by Ṣulea et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Another
approach is to build intra-profile relations between text samples and the author
profile, as described by Álvarez-Carmona et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], or to classify each text sample
separately and then classify the author class based on each text sample belonging to
the author. We made use of the latter approach for the development of CAPS.
        </p>
        <p>
          In fact, some works, including [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], suggest that gender and age classification
should be considered as a unified task since these two classes are interrelated. Then,
the classification is more accurate when both the gender and age attributes are used
simultaneously and not as a separate classification task. Others, like [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], consider it a
separate task and build different models for gender and age classification. CAPS also
approaches gender and age classification as separate problems. Although it uses the
same set of features for both classification models, the classifier used to train the
gender profiling model differs from the one used for age identification.
        </p>
        <p>
          Gender classification is performed using LinearSVC [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which is based on the
LIBLINEAR Classifier [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and is an implementation of a Support Vector Machine
with linear kernel, while age identification makes use of a one-vs.-rest classifier based
on Logistic Regression [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Various other classifiers have been tested for the task, as
for instance, Decision trees classifier or Stochastic Gradient Descent. Our choice of
LinearSVC and Logistic Regression was mainly based on their overall computational
efficiency on large datasets.
4
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Results</title>
      <p>
        4.1 PAN16
All of the test datasets used in the PAN shared tasks since 2014 are not openly
available and can only be used through the TIRA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] submission system. TIRA
automatically evaluates the classification in terms of accuracy which makes it
impossible to review the precision and recall results of the system. For this reason,
and since the PAN16 shared task implies testing the dataset on an unseen genre, a
smaller subset of the PAN14 hotel reviews and blogs datasets have been used for
initial system evaluation in addition to the available PAN16 data.
      </p>
      <p>It is also important to note that the PAN16 task included two different test sets,
although no further information is available on them until the final evaluation results
are announced. This section reports system results on both these datasets.</p>
      <p>Table 3 gives a detailed overview of CAPS' official participation in the PAN16
shared task. CAPS reaches highly competitive performance to the best participating
systems in the task, despite the actual complexity of the problem, achieving 74.36%
accuracy (on test set 2, which is the official test set used for system ranking) for
gender classification on English, closely following the best performing system in this
setting that reached 75.64%. However, there is about 20 percent points discrepancy
between the system performance on test set 1 (53.74%) and test set 2 (74.36%). This
big performance gap is also observed across the performance of the majority of the
other participating systems, which shows that the overall performance on this task is
highly biased towards the actual test setup. In fact, test set 1 was assembled in a way,
such that none of the systems managed to beat the official baseline for this setting –
56.41%. Our assumption was that test set 1 and test set 2 differ based on their genre
and that test set 1 is a genre that is more dissimilar to the training set than test set 2.
However, this is not the case, since after the evaluation phase it was revealed that both
datasets partially overlap which can only be the case if they are from the same genre.
Yet, further details on the actual datasets would need to be provided (presumably
contained in the task overview paper presented at the PAN16 workshop during the
CLEF Conference in September 2016) in order to be able to understand this
difference better.</p>
      <p>The results for age classification reached only 44.87% for English which reflects
the complexity of author profiling when five age groups are targeted. Additionally,
the change of genre also poses a difficulty to this task. While the age indicators for
one group should stay consistent within the same genre, a different genre might pose a
change in the style of writing within the same age group. Specifically, within twitter
data, younger age groups tend to make increased use of acronyms, abbreviations,
special symbols, etc. which is learned during training. Different genres (expected
during testing in this particular task), such as blogs or reviews directly pose a
limitation on the use of such an excessive amount of Twitter-specific style as there is
no text length limit and in general, all age groups tend to write closer to the standard
language. Such a change would be particularly hard to capture within a classification
approach. One remedy for this problem could be not to use a single genre for training,
but to look into a range of genres that would represent well the different writing styles
one particular age group could have.</p>
      <p>Apart from the overall lower results reached on the age classification subtask, once
again a significant discrepancy between the performance on the two test sets is
observed with accuracy scores of 29.02% and 44.87% for each set respectively (an
issue co-occurring across all participating systems).</p>
      <p>CAPS' results on Spanish and Dutch are much lower, which is easily explained by
the fact that the main focus during system development and training lied on English.
For Spanish (gender), CAPS achieves 62.50% on test set 2, while for Spanish (age),
similar to English, the result is considerably lower – 46.43%. The latter numbers
show that Spanish follows the general tendencies observed for English with large
discrepancies in performance between test set 1 and 2 and with test set 1 leading to
lower performance. For Dutch (gender (age was not targeted during the task)),
however, the situation is slightly different – the gap between the performance on both
test sets is only 1 percent point, which is considerably lower than the gap observed for
English (20.62%) and Spanish (6.25%) for the same setting. The latter is also
observed among all other systems. These changes seem to correlate with the size of
the datasets available which presumably is also an indicator that once gender and age
indicators are well learned for one genre, the change in genre only leads to higher
error rates based on the discrepancies of the writing styles between the genders and
age groups across the different domains.
4.2 PAN14 and PAN15</p>
      <p>For comparison purposes and in addition to the participation in the PAN16 shared
task, CAPS was also trained and tested on the datasets of the PAN14 and PAN15
shared tasks through the TIRA evaluation system.</p>
      <p>Within the PAN15 setup, CAPS successfully performed on the single genre
datasets and produced results also highly competitive to the state-of-the-art systems
presented in the task. For instance, CAPS achieved 81.69% for gender profiling on
the English dataset, which is only 4.23 percent points lower than the best system
presented in that shared task. Table 4 compares the results of CAPS with the best
results achieved for gender and age classification in the PAN15 shared task across all
three languages (English, Spanish, and Dutch). For CAPS, as for the PAN16 system
participation, we provide detailed numbers for both test sets separately, as well as an
averaged accuracy score.</p>
      <p>A further comparison between CAPS' performance and the best system results in
the PAN14 is given in Table 5. Due to the limited amount of Random Access
Memory on the TIRA Virtual System, not all PAN14 datasets could be evaluated in
time. Subsequently, due to time constraints no additional resources have been
requested from the organizers.</p>
      <p>Altogether, Table 5 demonstrates the system’s effectiveness on various single
genre classifications. The best result of 71.32% in terms of accuracy is achieved on
the English hotel reviews dataset for the gender classification, falling about 1 percent
point short of the best system’s performance (72.59%).</p>
      <p>Overall, the results indicate that the current system may also be used in a single
genre setting, especially evident by the results on the PAN15 datasets where the
model reaches around 70% of average accuracy on both gender and age classification
throughout all three languages.</p>
      <p>Language
(Setting)
English (Gender)
English (Age)
English (Gender)
English (Age)
English (Gender)
English (Age)
Spanish (Gender)
Spanish (Age)
Spanish (Gender)</p>
      <p>Spanish (Age)
5</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>The task of cross-genre author profiling is highly complex in comparison to single
genre profiling. For the task presented at PAN16, the classification model needs to be
very robust and able to adapt the observations learned from short and usually
stylistically and grammatically malformed tweet samples to work on any possible
form of text as well as samples of any length, for example, hotel reviews, blogs or any
other writing style.</p>
      <p>CAPS showed promising results in a cross-genre aspect of author profiling
regardless of the complexity of the task. The performance of the system on gender
identification reached 74.36% for English, 62.50% for Spanish and 55.00% for Dutch
displaying great variation across both datasets and across all three languages. With
respect to age, CAPS reached 44.87% for English and 46.43% for Spanish (Dutch
was not included in the age setting). These low results do outperform the
corresponding baseline, but also show that five age groups are probably a too
finegrained distinction to automatically deal with in addition to the change of genre which
also increases the complexity of the task. For comparison, we also evaluated CAPS on
single genre classification of author gender and age on the PAN14 and PAN15
datasets demonstrating that our system is highly competitive to the state-of-the-art
systems applied across all genres. On the PAN15’s English dataset CAPS achieved
81.69% for gender classification with the best PAN15 participating system reaching a
performance of 85.92%.</p>
      <p>CAPS can be further improved in various ways. Firstly, more attention needs to be
paid to Spanish and Dutch since the included custom features are only tailored for
English and simply adjusted to function on other languages represented in the shared
task. Second, the current model considers each text sample as a separate entity that
does not correlate with the other text samples belonging to the author. Some form of
text sample-author profile interrelation could improve the model performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Álvarez-Carmona</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>López-Monroy</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-</surname>
            y-Gómez,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>VillaseñorPineda</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalante</surname>
          </string-name>
          , H.:
          <article-title>INAOE's participation at PAN'15: Author Profiling task - Notebook for PAN at CLEF 2015</article-title>
          .
          <article-title>In: CLEF 2015 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers, Toulouse, France (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Coelho</surname>
            ,
            <given-names>L. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richert</surname>
            ,
            <given-names>W.:</given-names>
          </string-name>
          <article-title>Building Machine Learning Systems with Python, Second Edition</article-title>
          . Packt Publishing Ltd. (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Corney</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Vel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohay</surname>
          </string-name>
          , G.:
          <article-title>Gender-preferential text mining of e-mail discourse</article-title>
          .
          <source>In: Proceedings - Annual Computer Security Applications Conference</source>
          , ACSAC, vol. 2002-January, pp.
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsieh</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>LIBLINEAR: A Library for Large Linear Classification</article-title>
          .
          <source>The Journal of Machine Learning 9</source>
          ,
          <fpage>1871</fpage>
          -
          <lpage>1874</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burrows</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoppe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>TIRA: Configuring, executing, and disseminating information retrieval experiments</article-title>
          .
          <source>Proceedings - International Workshop on Database and Expert Systems Applications, DEXA</source>
          ,
          <fpage>151</fpage>
          -
          <lpage>155</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>González-Gallardo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sierra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Núñez-Juárez</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>SalinasLópez</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ek</surname>
          </string-name>
          , J.:
          <article-title>Tweets Classification Using Corpus Dependent Tags, Character and POS N-grams - Notebook for PAN at CLEF 2015</article-title>
          .
          <article-title>CLEF 2015 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Heylighen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dewaele</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Variation in the Contextuality of Language: An Empirical Measure</article-title>
          .
          <source>Foundations of Science</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          ),
          <fpage>293</fpage>
          -
          <lpage>340</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Argamon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shimoni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automatically Categorizing Written Texts by Author Gender</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          <volume>17</volume>
          (
          <issue>4</issue>
          ),
          <fpage>401</fpage>
          -
          <lpage>412</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Maharjan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solorio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Using Wide Range of Features for Author Profiling - Notebook for PAN at CLEF 2015</article-title>
          .
          <article-title>In: CLEF 2015 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers, Toulouse, France (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mukherjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Improving gender classification of blog authors</article-title>
          .
          <source>In: Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference</source>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>217</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
          </string-name>
          , E.:
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Celli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Overview of the 3rd Author Profiling Task at PAN 2015</article-title>
          .
          <article-title>In: CLEF 2015 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers, Toulouse, France (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chugur</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trenkmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Overview of the 2nd Author Profiling Task at PAN 2014</article-title>
          .
          <article-title>In: CLEF 2014 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers, Sheffield, UK (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inches</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the Author Profiling Task at PAN 2013</article-title>
          .
          <article-title>In: CLEF 2013 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers, Valencia, Spain (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Evaluations Concerning Cross-genre Author Profiling</article-title>
          .
          <source>In: Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings, CLEF and CEUR-WS.org</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sapkota</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bethard</surname>
            , S., y Gómez,
            <given-names>M. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solorio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Not all character n-grams are created equal: A study in authorship attribution</article-title>
          . In:
          <year>2015</year>
          <article-title>Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT</article-title>
          <year>2015</year>
          ), pp.
          <fpage>93</fpage>
          -
          <lpage>102</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Schler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Argamon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Effects of Age and Gender on Blogging</article-title>
          .
          <source>In: Proceedings of AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs</source>
          , pp.
          <fpage>199</fpage>
          -
          <lpage>205</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Schmid</surname>
          </string-name>
          , H.:
          <article-title>Probabilistic Part-of-Speech Tagging Using Decision Trees</article-title>
          .
          <source>In: Proceedings of International Conference on New Methods in Language Processing</source>
          , Manchester, UK, pp.
          <fpage>44</fpage>
          -
          <lpage>49</lpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Spärck</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          :
          <article-title>A Statistical Interpretation of Term Specificity and its Retrieval</article-title>
          .
          <source>Journal of Documentation</source>
          <volume>28</volume>
          (
          <issue>1</issue>
          ),
          <fpage>11</fpage>
          -
          <lpage>21</lpage>
          (
          <year>1972</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Ṣulea</surname>
            ,
            <given-names>O.-M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dichiu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Automatic Profiling of Twitter Users Based on Their Tweets - Notebook for PAN at CLEF 2015</article-title>
          .
          <article-title>CLEF 2015 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>