The Winning Approach to Cross-Genre Gender Identification in Russian at RUSProfiling 2017

Ilia Markov, Helena Gómez-Adorno, Grigori Sidorov, Alexander Gelbukh
CIC, Instituto Politécnico Nacional, Mexico City, Mexico
imarkov@nlp.cic.ipn.mx, helena.adorno@gmail.com, sidorov@cic.ipn.mx, www.gelbukh.com

ABSTRACT
We present the CIC systems submitted to the 2017 PAN shared task on Cross-Genre Gender Identification in Russian texts (RUSProfiling). We submitted five systems. One of them was based on a statistical approach using only lexical features, and the other four on machine-learning techniques using combinations of gender-specific Russian grammatical features, word and character n-grams, and suffix n-grams. Our systems achieved the highest weighted accuracy across all the test datasets, occupying the first four places in the ranking.

KEYWORDS
Author Profiling, Gender Identification, Cross-Genre, Social Media, Russian, Machine Learning, Computational Linguistics

1 INTRODUCTION
Author profiling (AP) is the task of identifying an author's demographics, such as age, gender, personality traits, or native language, based on a sample of his or her writing. The task has numerous practical applications in forensics, security, and marketing, to name just a few. For example, in forensics and terrorism prevention applications, knowing the characteristics of the suspect can narrow down the search space for the author of a written threat; in marketing applications, this information can help predict a customer's shopping preferences or develop new targeted products.

The rapid growth of social media data available on the Internet has significantly contributed to the increased interest in this task. This interest led to the establishment of the annual PAN evaluation campaign (http://pan.webis.de), which is considered one of the main fora on AP, authorship attribution, plagiarism detection, and other tasks related to the study of authorship and of the characteristics of the author of a text.

Recent trends in the field include the cross-genre AP scenario [17], that is, the setting in which the training corpus consists of texts of one genre, while the test set consists of texts of another genre. Cross-genre AP conditions better match the requirements of a real-life forensic application, where the available texts by the candidate authors can belong to a genre and thematic area different from those of the texts under investigation.

Following these trends, the 2017 PAN shared task on Gender Identification in Russian texts (RUSProfiling) [7] provided a cross-genre AP scenario: the training corpus was composed of tweets, while the provided test datasets covered five different genres: offline texts (such as a letter to a friend or a picture description), Facebook posts, tweets, product and service online reviews, and gender imitation texts.

Machine-learning methods are commonly used for the AP task. From the machine-learning perspective, the task is viewed as a multi-class, single-label classification problem, in which automatic methods assign class labels (e.g., male or female) to text samples. Recently, deep-learning techniques [19], such as character-, word-, and document-embedding approaches [10], have been used for the task; however, linear models still perform better, since they seem to be more robust at capturing stylistic information in the author's writing. Therefore, we employ commonly used linear machine-learning approaches, and in addition propose a novel statistical approach that identifies the gender of an author through a statistical analysis of lexical information.

The paper is organized as follows. In Section 2, we discuss related work. In Section 3, we describe the datasets used in the RUSProfiling 2017 shared task. In Section 4, we describe the conducted experiments, providing the experimental settings of the submitted systems. In Section 5, we present the obtained results and their evaluation. Finally, in Section 6 we draw conclusions and point to possible directions of future work.
2 RELATED WORK
The PAN evaluation campaign has become one of the main platforms for the evaluation of AP approaches and methodologies. Various profiling aspects have been covered by PAN since 2013 [15], including age, gender, personality traits, and language variety identification, under both single- and cross-genre AP conditions.

PAN 2017 [16] attracted 22 submissions. Most of the teams (including the top three systems) used traditional machine-learning algorithms, such as SVM [9, 11, 20] or logistic regression [4, 13]. This edition was characterized by the increased use of deep-learning techniques [5, 18], in particular word and character embeddings [2, 4, 19], which are gaining popularity and achieving competitive results for the AP task, though still lower than those of the linear models.

Content-based and style-based features have been used extensively in previous editions of PAN. As content-based features, bag of words, word n-grams, slang words, locations, brand names, and topic words, among others, were used by several teams. Among style-based features, character n-grams are the most popular feature type for AP; other feature types include the ratio of links, character flooding, typed character n-grams, emoticons, hashtags, and user mentions.

Due to the scarcity of available training data, AP research in the Russian language has been limited. The first corpus in the Russian language annotated with the authors' metadata—the RusPersonality corpus—was introduced by Litvinova et al. [6]. The corpus is composed of texts labeled with the author's gender, age, personality traits, native language, neuropsychological testing data, and educational level. The corpus also contains a subset of truthful and deceptive texts. At the time of publication of [6], the corpus contained over 1,850 documents.

Several experiments were carried out to illustrate the usefulness of the RusPersonality corpus [6, 8]. For gender identification, Litvinova et al. [6] used a range of context-independent features such as part-of-speech (POS) tags, syntactic relations, ratios of POS tags, punctuation marks, and emotion words. They also evaluated different machine-learning algorithms: gradient boosting, AdaBoost, random forest, SVM, and ReLU, among others. The best performance was obtained by ReLU (mean F1-score of 74%).

3 DATASETS
The focus of the RUSProfiling 2017 shared task is cross-genre gender identification. The organizers provided a training dataset composed of tweets and five test datasets of the following genres:

Test 1: Offline texts (such as picture descriptions or a letter to a friend) from the RusPersonality corpus [6].
Test 2: Facebook posts.
Test 3: Twitter messages.
Test 4: Product and service online reviews.
Test 5: Gender imitation corpus, that is, women imitating men and vice versa.

Table 1 presents general statistics of the training and the five test datasets. In the table, No. of docs stands for the number of documents in each dataset. The average number (Avg.) of words and characters per document, as well as the standard deviation (Std.), were calculated after applying the pre-processing steps, which included lowercasing and removal of all non-Cyrillic characters (punctuation marks were also removed).

Table 1: RUSProfiling datasets statistics

Dataset    No. of docs   Words Avg.   Words Std.   Chars Avg.   Chars Std.
Training   600           1,216.16     731.61       7,736.20     4,674.29
Test 1     370           277.75       109.83       1,650.54     639.44
Test 2     228           1,096.60     164.41       6,900.66     1,106.27
Test 3     400           729.22       686.23       4,672.79     4,426.54
Test 4     776           54.40        44.39        354.18       276.56
Test 5     94            272.92       157.07       1,685.84     945.69

In terms of the average number of words and characters, the Test 2 dataset is the most similar to the training corpus. The main difference between the two datasets is the standard deviation, which is larger in the training corpus. The Test 3 dataset is of the same genre as the training corpus, but it contains shorter documents, of 729.22 words on average. The Test 1 and Test 5 datasets have similar statistics in terms of the number of words and characters, but differ in the number of documents (370 and 94, respectively). Finally, the Test 4 dataset contains the shortest documents, of 54.40 words on average.

4 EXPERIMENTAL SETTINGS
To evaluate our systems, we conducted experiments on the provided training dataset both under 10-fold cross-validation and using an 80%–20% split, that is, we used 80% (480 documents) of the training dataset for training and 20% (120 documents) for evaluation. The split was balanced across the genders. Following the official evaluation metric of the shared task, we measured performance in terms of classification accuracy.

We applied several pre-processing steps before feature extraction. Pre-processing has proved to be a useful strategy for author profiling [3, 11] and related tasks, such as authorship attribution [12]. Keeping in mind that the test datasets are of other genres, we kept only Cyrillic characters (non-Cyrillic characters, along with punctuation marks, were removed). We also performed lowercasing, which yielded a slight improvement in accuracy. These pre-processing steps were applied in all our runs (in the context of this shared task, systems are officially called runs).
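These pre-processing steps amount to a character filter plus lowercasing. The following minimal Python sketch illustrates them; the function name and the test string are ours, and the use of the Cyrillic Unicode block as the character filter is our assumption, since the exact character ranges are not specified above:

    import re

    def preprocess(text):
        # Keep Cyrillic letters only; Latin characters, digits, and
        # punctuation marks are all replaced by spaces (assumption:
        # the basic Cyrillic Unicode block U+0400-U+04FF).
        text = re.sub(r'[^\u0400-\u04FF]+', ' ', text)
        # Collapse whitespace and lowercase; lowercasing gave a slight
        # accuracy improvement in the experiments reported above.
        return ' '.join(text.split()).lower()

    print(preprocess('Я пошла в кино :) http://t.co/x'))  # -> 'я пошла в кино'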
In all the runs based on machine-learning techniques, we used the Support Vector Machines (SVM) algorithm, which is considered among the best-performing classification algorithms for text categorization tasks, including the cross-genre AP scenario [17]. We used the liblinear scikit-learn [14] implementation of SVM with the one-vs-rest (OvR) multi-class strategy. We set the penalty hyper-parameter C to 100 based on the evaluation results. In our experiments on the training dataset, SVM showed higher performance than the other classification algorithms we tried, such as random forest, logistic regression, multinomial Naïve Bayes, LDA, and an ensemble classifier.

In our machine-learning approaches, we used two different implementations of the term frequency–inverse document frequency (tf-idf) weighting: the default scikit-learn implementation and tf-idf with sublinear tf scaling, i.e., with tf replaced by 1 + log(tf). In our experiments on the training dataset, tf-idf systematically outperformed the other weighting schemes we examined, such as binary, tf, and log entropy.
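As an illustration, this shared classification machinery can be sketched in scikit-learn as follows. The feature extraction differs from run to run (see below), so the vectorizer settings here are only a generic stand-in, not the configuration of any particular run:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # LinearSVC is scikit-learn's liblinear wrapper; its multi-class
    # strategy is one-vs-rest (OvR) by default.
    pipeline = make_pipeline(
        # sublinear_tf=True replaces tf with 1 + log(tf); min_df=2
        # drops features occurring in fewer than two training documents.
        TfidfVectorizer(sublinear_tf=True, min_df=2),
        LinearSVC(C=100),
    )

    # train_docs, train_labels: pre-processed texts and gender labels
    # pipeline.fit(train_docs, train_labels)
    # predicted = pipeline.predict(test_docs)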
The configurations of the five runs of the CIC team are described below.

4.1 Run CIC-1 (machine learning)
Features. In the Russian language, singular past-tense verb forms are inflected for gender: singular masculine forms have the ending -л "-l", while an indicator of singular feminine forms is the ending -ла "-la". We therefore used "word ending in -ла" as a feature. Moreover, since past-tense reflexive verbs retain the reflexive ending -сь "-s'", we also used the feature "word ending in -лась" "-las'".

We employed the features -ла "-la" and -лась "-las'" in isolation, as well as in combination with the subject of the sentence, if the subject was the first-person singular pronoun я "ya" and this subject was within a window of 6 words after, or 3 words before, the verb. This gave four additional composite features: "я -ла", "я -лась", "-ла я", and "-лась я", with a meaning such as "I -ed[feminine] myself", as in I dressed myself in a skirt. The window size (+6/−3) was selected by grid search.

In addition, since Russian adjectives agree with pronouns in gender, we used the ending -ая "-aya" (nominative feminine singular form) in combination with the first-person singular pronoun я "ya" as a feature, if the pronoun was within the same +6/−3 window as above. This gave two more features: "я -ая" and "-ая я", with a meaning such as "I [feminine singular adjective]", as in I am a professor emerita.

Additionally, we used the last three (Cyrillic) characters of each word as features (suffix n-grams, n = 3), which, in particular, indirectly accounted for other grammatically meaningful endings, such as -ный "-nyĭ" (hinting at a masculine adjective, as in I am a professor emeritus).

Frequency threshold. Fine-tuning the size of the feature set has proved to be of great importance in AP [11]. It significantly reduces the size of the feature set and at the same time improves the results in most cases. In this run, we selected only those features that occurred in at least two documents of the training corpus and at least five times in the entire training corpus (min_df = 2; threshold = 5).

Weighting scheme. Tf-idf weighting with sublinear tf scaling.
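A sketch of how these grammatical features could be extracted from a tokenized, pre-processed document is given below. The feature names, the precedence between endings, and the exact handling of the +6/−3 window are our own reading of the description above, not the authors' code:

    def cic1_features(tokens):
        """Gender-indicative features for one tokenized document (sketch)."""
        feats = []
        for i, tok in enumerate(tokens):
            before = tokens[max(0, i - 3):i]   # up to 3 words before
            after = tokens[i + 1:i + 7]        # up to 6 words after
            if tok.endswith('лась'):           # feminine reflexive past tense
                feats.append('-лась')
                if 'я' in before: feats.append('я -лась')
                if 'я' in after:  feats.append('-лась я')
            elif tok.endswith('ла'):           # feminine past tense
                feats.append('-ла')
                if 'я' in before: feats.append('я -ла')
                if 'я' in after:  feats.append('-ла я')
            elif tok.endswith('ая'):           # feminine nominative adjective;
                if 'я' in before: feats.append('я -ая')   # used only together
                if 'я' in after:  feats.append('-ая я')   # with the pronoun
            feats.append(tok[-3:])             # suffix 3-gram of every word
        return feats

    # 'я пошла в кино' ("I went to the cinema", feminine speaker) yields
    # '-ла' and 'я -ла', plus the suffix 3-grams of all four words.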
4.2 Run CIC-2 (machine learning)
Features. Word features represent the lexical choices of a writer. Such features have proved indicative of an author's gender in other languages, such as English, Spanish, Portuguese, and Arabic [16]. In this run, we used word unigram features (the bag-of-words approach) in combination with the last three characters of each word (suffix 3-grams).

Frequency threshold. The threshold was the same as in the CIC-1 run.

Weighting scheme. Tf-idf weighting without sublinear tf scaling.

4.3 Run CIC-3 (statistical)
First, we labeled the words that occur in the training corpus as male's or female's, depending on whether the word was used (not counting repetitions) more frequently in male's or in female's documents, except when the difference was less than 2.

Next, for each document we calculated the ratio of such male's to female's words (not counting repetitions). We labeled a document as male's if this ratio was above a threshold; otherwise, as female's. Since the dataset was balanced, as the threshold we used the median of the distribution of this ratio.

We also experimented with taking repetitions of words into account, with thresholds other than 2 for classifying words, and with formulas other than the ratio for classifying documents; however, we observed lower performance.
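This statistical procedure can be summarized in a few lines of Python. We read "not counting repetitions" as counting word types per document (i.e., document frequency); the function names, the handling of zero counts, and the choice of computing the median over the documents being classified are our own interpretation:

    from collections import Counter
    from statistics import median

    def train_word_labels(docs, labels, min_diff=2):
        male_df, female_df = Counter(), Counter()
        for doc, label in zip(docs, labels):
            for w in set(doc.split()):           # word types, not tokens
                (male_df if label == 'male' else female_df)[w] += 1
        word_label = {}
        for w in set(male_df) | set(female_df):
            diff = male_df[w] - female_df[w]
            if abs(diff) >= min_diff:            # skip near-ties
                word_label[w] = 'male' if diff > 0 else 'female'
        return word_label

    def male_female_ratio(doc, word_label):
        types = set(doc.split())
        m = sum(word_label.get(w) == 'male' for w in types)
        f = sum(word_label.get(w) == 'female' for w in types)
        return m / max(f, 1)                     # guard against division by zero

    def classify(docs, word_label):
        # The threshold is the median ratio, exploiting the fact that
        # the datasets are gender-balanced.
        ratios = [male_female_ratio(d, word_label) for d in docs]
        t = median(ratios)
        return ['male' if r > t else 'female' for r in ratios]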
4.4 Run CIC-4 (machine learning)
Features. A combination of word and character n-gram features usually provides good results for AP; for instance, such a combination was used by the best-performing system [1] at this year's PAN shared task [16]. In this run, we used a combination of word unigrams with character n-grams (n = 2–3).

Frequency threshold. We selected only those features that occurred in at least two documents of the training corpus and at least four times in the entire training corpus (min_df = 2; threshold = 4).

Weighting scheme. We used tf-idf weighting with sublinear tf scaling.

4.5 Run CIC-5 (machine learning)
Features. Word unigrams, word 3-grams, and character n-grams (n = 2–4).

Frequency threshold. In this run, we set a high frequency threshold: we selected only those features that occurred in at least two documents of the training corpus and at least 50 times in the entire training corpus (min_df = 2; threshold = 50). However, setting this high threshold only marginally affected the 10-fold cross-validation and 80%–20% accuracy, making it very slightly higher or very slightly lower.

Weighting scheme. Tf-idf with sublinear tf scaling.

5 RESULTS
The 10-fold cross-validation results in terms of classification accuracy (acc.) for each run, as well as the results under the 80%–20% split, are shown in Table 2. For each experiment, the results for 10-fold cross-validation (10FCV) and the 80%–20% split, as well as the number of features (No. of features), are provided. The best result for each evaluation procedure is highlighted in bold typeface.

Table 2: 10-fold cross-validation and 80%–20% train-test split results (accuracy)

Run     10FCV acc.   No. of features   80%–20% acc.   No. of features
CIC-1   0.8833       3,136             0.8583         2,922
CIC-2   0.8550       19,139            0.8583         16,155
CIC-3   0.7400       22,847            0.7417         19,353
CIC-4   0.8683       31,045            0.8500         27,222
CIC-5   0.8683       22,625            0.8583         20,003

Our first run, which included gender-specific Russian grammatical features, showed the highest 10-fold cross-validation accuracy with the smallest number of features. Three out of five of our runs (CIC-1, CIC-2, and CIC-5) showed the same accuracy under the 80%–20% split, probably due to the small size of the dataset. The statistical approach (run CIC-3) showed the lowest accuracy under both the 10-fold cross-validation and the 80%–20% setting, though, surprisingly, it showed the best results on several of the final test datasets, as shown in Table 3. We attribute this, again, to the small size of the datasets available for development.

A comparison of the participating systems, including the official ranking, is presented in [7]. Table 3 shows the detailed results of our five runs on the five test datasets, along with the highest result achieved on each test set among all participating systems and the system that achieved it. The best result on each test dataset is highlighted in bold typeface. Avg. stands for the average accuracy of each run across the five test datasets; if a system was not evaluated on some test set, we counted its accuracy on this test set as zero. Weighted stands for the accuracy weighted by the number of documents in each test set (again, counting as zero if a system was not evaluated on a test set); this was the measure used for the official ranking. Norm. is similar to Weighted, but each accuracy is normalized by the highest accuracy on the corresponding test set (note that this is not an accuracy; it is the average closeness of the given system to the best system).

Table 3: Results for the five runs of the CIC team on the five test sets

System          Test 1          Test 2   Test 3   Test 4   Test 5          Avg.     Weighted   Norm.
Best result     0.7838          0.9342   0.6825   0.6186   0.6596          0.6580   0.6456     0.9258
Best system     Bits_Pilani-4   CIC-2    CIC-3    CIC-3    Bits_Pilani-5   CIC-1    CIC-3      CIC-3
CIC-1           0.5865          0.9211   0.6525   0.5979   0.5319          0.6580   0.6435     0.9154
CIC-2           0.5838          0.9342   0.6650   0.5709   0.5213          0.6550   0.6354     0.9014
CIC-3           0.6027          0.7851   0.6825   0.6186   0.5426          0.6463   0.6456     0.9258
CIC-4           0.4676          0.8860   0.5975   0.5116   0.5213          0.5968   0.5675     0.8047
CIC-5           0.4973          0.8991   0.6275   0.5258   0.5000          0.6099   0.5862     0.8313
CIC best rank   4th             1st      1st      1st      4th             1st      1st        1st
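For concreteness, the two aggregate measures can be computed as follows; this is our formulation, inferred from the definitions above (it reproduces the Table 3 values for run CIC-3):

    def weighted_accuracy(accs, n_docs):
        # Accuracy weighted by the number of documents in each test set.
        return sum(a * n for a, n in zip(accs, n_docs)) / sum(n_docs)

    def normalized_score(accs, best_accs, n_docs):
        # As above, but each accuracy is first divided by the best
        # accuracy achieved on that test set.
        return weighted_accuracy(
            [a / b for a, b in zip(accs, best_accs)], n_docs)

    # Sanity check against Table 3 for run CIC-3:
    # weighted_accuracy([0.6027, 0.7851, 0.6825, 0.6186, 0.5426],
    #                   [370, 228, 400, 776, 94])            # ~0.6456
    # normalized_score([0.6027, 0.7851, 0.6825, 0.6186, 0.5426],
    #                  [0.7838, 0.9342, 0.6825, 0.6186, 0.6596],
    #                  [370, 228, 400, 776, 94])             # ~0.9258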
As one can see from Table 3, none of the runs consistently outperformed the others across all the test datasets. The Test 3 set consisted of documents that were collections of various tweets by the same author, similarly to the training corpus, so this was not exactly a cross-genre scenario; however, the documents in the Test 3 set contained fewer tweets than those of the training corpus. On this dataset, as well as on Test 4, with the shortest documents (online reviews), the best performance among our runs was achieved by run CIC-3, based on the statistical approach. Test 2 (Facebook posts) was the only test set on which our statistical approach (CIC-3) failed to produce a good result.

Surprisingly, on the gender imitation corpus (Test 5), CIC-1 was our second-best run (after CIC-3), even though CIC-1 was based on gender-specific Russian grammatical (morphological) features, such as the grammatical gender of verbs and adjectives, which in imitated text follow the patterns of the gender being imitated.

Runs CIC-4 and CIC-5, in spite of showing similar 10-fold cross-validation and 80%–20% accuracy, performed worse on the test datasets than our first three runs. This may be due to the inclusion of character n-grams, which probably caused overfitting. Another reason for the relatively poor performance of CIC-5 could be the too-high frequency threshold set for this run.

For a more in-depth analysis of the obtained results, access to the gold standard for the test datasets would be required.

6 CONCLUSIONS
We have presented the five systems submitted by the CIC team to the 2017 PAN shared task on Gender Identification in Russian texts (RUSProfiling), four of which occupied the first four places in the official ranking [7]. The task focused on a cross-genre author profiling (AP) scenario: the training corpus was composed of tweets, while the test datasets were composed of offline texts, Facebook posts, tweets, online reviews, and gender imitation texts.

Our systems, which were not tuned for a specific genre, showed the highest accuracy on three out of five test datasets (Facebook posts, tweets, and product and service online reviews), performing worse on the two remaining test datasets than more genre-specific systems, which were used only for some of the genres. Our first run, based on a machine-learning approach with gender-specific Russian grammatical features, showed the highest average accuracy across all the test datasets, while our statistical approach based on lexical features showed the best performance according to the weighted (official) and normalized evaluations.

One direction for future work would be to examine in more detail the importance of morphological features for gender identification in Russian texts, as well as to improve our statistical approach by automatically tuning the threshold value according to the size and genre of the test data.

ACKNOWLEDGMENTS
This work was partially supported by the Mexican Government (CONACYT projects 240844, SNI, COFAA-IPN, SIP-IPN 20171813, 20172008, and 20172044).

REFERENCES
[1] Angelo Basile, Gareth Dwyer, Maria Medvedeva, Josine Rawee, Hessel Haagsma, and Malvina Nissim. 2017. N-GrAM: New Groningen Author-profiling Model. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[2] Marc Franco-Salvador, Nataliia Plotnikova, Neha Pawar, and Yassine Benajiba. 2017. Subword-based Deep Averaging Networks for Author Profiling in Social Media. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[3] Helena Gómez-Adorno, Ilia Markov, Grigori Sidorov, Juan-Pablo Posadas-Durán, Miguel A. Sanchez-Perez, and Liliana Chanona-Hernandez. 2016. Improving Feature Representation Based on a Neural Network for Author Profiling in Social Media Texts. Computational Intelligence and Neuroscience 2016 (October 2016), 13 pages. https://doi.org/10.1155/2016/1638936
[4] Andrey Ignatov, Liliya Akhtyamova, and John Cardiff. 2017. Twitter Author Profiling Using Word Embeddings and Logistic Regression. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[5] Don Kodiyan, Florin Hardegger, Stephan Neuhaus, and Mark Cieliebak. 2017. Author Profiling with Bidirectional RNNs using Attention with GRUs. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[6] Tatiana Litvinova, Olga Litvinova, Olga Zagorovskaya, Pavel Seredin, Aleksandr Sboev, and Olga Romanchenko. 2016. "RusPersonality": A Russian Corpus for Authorship Profiling and Deception Detection. In Proceedings of the 2016 International FRUCT Conference on Intelligence, Social Media and Web, ISMW-FRUCT 2016. IEEE, St. Petersburg, Russia, 1–7.
[7] Tatiana Litvinova, Francisco Rangel, Paolo Rosso, Pavel Seredin, and Olga Litvinova. 2017. Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian. In Notebook Papers of FIRE 2017, FIRE 2017 (CEUR Workshop Proceedings). CEUR-WS.org, Bangalore, India.
[8] Tatiana Litvinova, Pavel Seredin, Olga Litvinova, Olga Zagorovskaya, Aleksandr Sboev, Dmitry Gudovskih, Ivan Moloshnikov, and Roman Rybka. 2016. Gender Prediction for Authors of Russian Texts Using Regression and Classification Techniques. In Proceedings of the 3rd Workshop on Concept Discovery in Unstructured Data co-located with the 13th International Conference on Concept Lattices and Their Applications, CDUD@CLA, Vol. 1625. CEUR-WS.org, 44–53.
[9] A. Pastor López-Monroy, Manuel Montes-y-Gómez, Hugo Jair Escalante, Luis Villaseñor Pineda, and Thamar Solorio. 2017. Social-Media Users can be Profiled by their Similarity with other Users. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[10] Ilia Markov, Helena Gómez-Adorno, Juan-Pablo Posadas-Durán, Grigori Sidorov, and Alexander Gelbukh. 2017. Author Profiling with Doc2vec Neural Network-Based Document Embeddings. In Proceedings of the 15th Mexican International Conference on Artificial Intelligence, MICAI 2016, Part II, LNAI, Vol. 10062. Springer, Cancún, Mexico, 117–131.
[11] Ilia Markov, Helena Gómez-Adorno, and Grigori Sidorov. 2017. Language- and Subtask-Dependent Feature Selection and Classifier Parameter Tuning for Author Profiling. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[12] Ilia Markov, Efstathios Stamatatos, and Grigori Sidorov. 2017. Improving Cross-Topic Authorship Attribution: The Role of Pre-Processing. In Proceedings of the 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017. Springer, Budapest, Hungary.
[13] Matej Martinc, Iza Škrjanec, Katja Zupan, and Senja Pollak. 2017. PAN 2017: Author Profiling – Gender and Language Variety Prediction. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[14] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (November 2011), 2825–2830. http://dl.acm.org/citation.cfm?id=1953048.2078195
[15] Francisco Rangel, Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, and Giacomo Inches. 2013. Overview of the Author Profiling Task at PAN 2013. In Working Notes Papers of the CLEF 2013 Evaluation Labs (CEUR Workshop Proceedings). CLEF and CEUR-WS.org, Valencia, Spain, 23–26.
[16] Francisco Rangel, Paolo Rosso, Martin Potthast, and Benno Stein. 2017. Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings). CLEF and CEUR-WS.org, Dublin, Ireland.
[17] Francisco Rangel, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, and Benno Stein. 2016. Overview of the 4th Author Profiling Task at PAN 2016: Cross-genre Evaluations. In Working Notes Papers of the CLEF 2016 Evaluation Labs (CEUR Workshop Proceedings). CLEF and CEUR-WS.org, Évora, Portugal.
[18] Nils Schaetti. 2017. UniNE at CLEF 2017: TF-IDF and Deep-Learning for Author Profiling. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[19] Sebastian Sierra, Manuel Montes-y-Gómez, Thamar Solorio, and Fabio A. González. 2017. Convolutional Neural Networks for Author Profiling. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[20] Eric S. Tellez, Sabino Miranda-Jiménez, Mario Graff, and Daniela Moctezuma. 2017. Gender and Language-Variety Identification with microTC. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.