The Winning Approach to Cross-Genre Gender Identification in Russian at RUSProfiling 2017

Ilia Markov, Helena Gómez-Adorno, Grigori Sidorov, Alexander Gelbukh
CIC, Instituto Politécnico Nacional, Mexico City, Mexico
imarkov@nlp.cic.ipn.mx, helena.adorno@gmail.com, sidorov@cic.ipn.mx, www.gelbukh.com

ABSTRACT
We present the CIC systems submitted to the 2017 PAN shared task on Cross-Genre Gender Identification in Russian texts (RUSProfiling). We submitted five systems. One of them was based on a statistical approach using only lexical features, and the other four on machine-learning techniques using combinations of gender-specific Russian grammatical features, word and character n-grams, and suffix n-grams. Our systems achieved the highest weighted accuracy across all the test datasets, occupying the first four places in the ranking.

KEYWORDS
Author Profiling, Gender Identification, Cross-Genre, Social Media, Russian, Machine Learning, Computational Linguistics

1 INTRODUCTION
Author profiling (AP) is the task of identifying an author's demographics, such as age, gender, personality traits, or native language, based on a sample of his or her writing. The task has numerous practical applications in forensics, security, and marketing, to name just a few. For example, in forensics and terrorism prevention applications, knowing the characteristics of the suspect can narrow down the search space for the author of a written threat; in marketing applications, this information can help predict a customer's shopping preferences or develop new targeted products.

The rapid growth of social media data available on the Internet has significantly contributed to the increased interest in this task. This interest led to the establishment of the annual PAN evaluation campaign (http://pan.webis.de), which is considered one of the main fora on AP, authorship attribution, plagiarism detection, and other tasks related to the study of authorship and of the characteristics of the author of a text.

Recent trends in the field include the cross-genre AP scenario [17], that is, the setting in which the training corpus consists of texts of one genre, while the test set consists of texts of another genre. Cross-genre AP conditions better match the requirements of a real-life forensic application, where the available texts by the candidate authors can belong to a genre and thematic area different from those of the texts under investigation.

Following these trends, the 2017 PAN shared task on Gender Identification in Russian texts (RUSProfiling) [7] provided a cross-genre AP scenario: the training corpus was composed of tweets, while the provided test datasets covered five different genres: offline texts (such as a letter to a friend or a picture description), Facebook posts, tweets, product and service online reviews, and gender imitation texts.

Machine-learning methods are commonly used for the AP task. From the machine-learning perspective, the task is viewed as a multi-class, single-label classification problem, in which automatic methods assign class labels (e.g., male or female) to text samples. Recently, deep-learning techniques [19], such as character-, word-, and document-embedding approaches [10], have been used for the task; however, linear models still perform better, since they seem to be more robust at capturing stylistic information in the author's writing. Therefore, we employ commonly used linear machine-learning approaches, and in addition propose a novel statistical approach that identifies the gender of an author through a statistical analysis of lexical information.

The paper is organized as follows. In Section 2, we discuss related work. In Section 3, we describe the datasets used in the RUSProfiling 2017 shared task. In Section 4, we describe the conducted experiments, providing the experimental settings of the submitted systems. In Section 5, we present the obtained results and their evaluation. Finally, in Section 6 we draw conclusions and point to possible directions of future work.
2 RELATED WORK
The PAN evaluation campaign has become one of the main platforms for the evaluation of AP approaches and methodologies. Various profiling aspects have been covered by PAN since 2013 [15], including age, gender, personality traits, and language variety identification, under both single- and cross-genre AP conditions.

PAN 2017 [16] attracted 22 submissions. Most of the teams (including the top three systems) used traditional machine-learning algorithms, such as SVM [9, 11, 20] or logistic regression [4, 13]. This edition was characterized by the increased use of deep-learning techniques [5, 18], in particular word and character embeddings [2, 4, 19], which are gaining popularity and achieving competitive results for the AP task, though still lower than those of the linear models.

Content-based and style-based features have been used extensively in previous editions of PAN. As content-based features, bag of words, word n-grams, slang words, locations, brand names, and topic words, among others, were used by several teams. Among style-based features, character n-grams are the most popular feature type for AP; other feature types include the ratio of links, character flooding, typed character n-grams, emoticons, hashtags, and user mentions.

Due to the scarcity of available training data, AP research in the Russian language has been limited. The first corpus in the Russian language annotated with the authors' metadata—the RusPersonality corpus—was introduced by Litvinova et al. [6]. The corpus is composed of texts labeled with the author's gender, age, personality traits, native language, neuropsychological testing data, and educational level. The corpus also contains a subset of truthful and deceptive texts. At the time of publication of [6], the corpus contained over 1,850 documents.

Several experiments were carried out to illustrate the usefulness of the RusPersonality corpus [6, 8]. For gender identification, Litvinova et al. [6] used a range of context-independent features such as part-of-speech (POS) tags, syntactic relations, ratios of POS tags, punctuation marks, and emotion words. They also evaluated different machine-learning algorithms: gradient boosting, AdaBoost, random forest, SVM, and ReLU, among others. The best performance was obtained by ReLU (mean F1-score of 74%).

3 DATASETS
The focus of the RUSProfiling 2017 shared task is cross-genre gender identification. The organizers provided a training dataset composed of tweets and five test datasets of the following genres:

Test 1: Offline texts (such as picture descriptions or a letter to a friend) from the RusPersonality corpus [6].
Test 2: Facebook posts.
Test 3: Twitter messages.
Test 4: Product and service online reviews.
Test 5: Gender imitation corpus, that is, women imitating men and vice versa.

Table 1 presents general statistics of the training and the five test datasets. In the table, No. of docs stands for the number of documents in each dataset. The average number (Avg.) of words and characters per document, as well as the standard deviation (Std.), were calculated after applying the pre-processing steps, which included lowercasing and removal of all non-Cyrillic characters (punctuation marks were also removed).

Table 1: RUSProfiling datasets statistics

Dataset    No. of docs   Words Avg.   Words Std.   Chars Avg.   Chars Std.
Training   600           1,216.16     731.61       7,736.20     4,674.29
Test 1     370           277.75       109.83       1,650.54     639.44
Test 2     228           1,096.60     164.41       6,900.66     1,106.27
Test 3     400           729.22       686.23       4,672.79     4,426.54
Test 4     776           54.40        44.39        354.18       276.56
Test 5     94            272.92       157.07       1,685.84     945.69

In terms of the average number of words and characters, the Test 2 dataset is the most similar to the training corpus. The main difference between the two datasets is the standard deviation, which is larger in the training corpus. The Test 3 dataset is of the same genre as the training corpus, but it contains shorter documents, of 729.22 words on average. The Test 1 and Test 5 datasets have similar statistics in terms of the number of words and characters, but differ in the number of documents (370 and 94, respectively). Finally, the Test 4 dataset contains the shortest documents, of 54.40 words on average.

4 EXPERIMENTAL SETTINGS
To evaluate our systems, we conducted experiments on the provided training dataset both under 10-fold cross-validation and using an 80%–20% split, that is, we used 80% (480 documents) of the training dataset for training and 20% (120 documents) for evaluation. The split was balanced across the genders. Following the official evaluation metric of the shared task, we measured performance in terms of classification accuracy.

We applied several pre-processing steps before feature extraction. Pre-processing has proved to be a useful strategy for author profiling [3, 11] and related tasks, such as authorship attribution [12]. Keeping in mind that the test datasets are of other genres, we kept only Cyrillic characters (non-Cyrillic characters, along with punctuation marks, were removed). We also performed lowercasing, which yielded a slight improvement in accuracy. These pre-processing steps were applied in all our runs (in the context of this shared task, systems are officially called runs).
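These pre-processing steps amount to a character filter plus lowercasing. The following minimal Python sketch illustrates them; the function name and the test string are ours, and the use of the Cyrillic Unicode block as the character filter is our assumption, since the exact character ranges are not specified above:

    import re

    def preprocess(text):
        # Keep Cyrillic letters only; Latin characters, digits, and
        # punctuation marks are all replaced by spaces (assumption:
        # the basic Cyrillic Unicode block U+0400-U+04FF).
        text = re.sub(r'[^\u0400-\u04FF]+', ' ', text)
        # Collapse whitespace and lowercase; lowercasing gave a slight
        # accuracy improvement in the experiments reported above.
        return ' '.join(text.split()).lower()

    print(preprocess('Я пошла в кино :) http://t.co/x'))  # -> 'я пошла в кино'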
In all the runs based on machine-learning techniques, we used the Support Vector Machines (SVM) algorithm, which is considered among the best-performing classification algorithms for text categorization tasks, including the cross-genre AP scenario [17]. We used the liblinear scikit-learn [14] implementation of SVM with the one-vs-rest (OvR) multi-class strategy. We set the penalty hyper-parameter C to 100 based on the evaluation results. In our experiments on the training dataset, SVM showed higher performance than the other classification algorithms we tried, such as random forest, logistic regression, multinomial Naïve Bayes, LDA, and an ensemble classifier.

In our machine-learning approaches, we used two different implementations of the term frequency–inverse document frequency (tf-idf) weighting: the default scikit-learn implementation and tf-idf with sublinear tf scaling, i.e., with tf replaced by 1 + log(tf). In our experiments on the training dataset, tf-idf systematically outperformed the other weighting schemes we examined, such as binary, tf, and log entropy.
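As an illustration, this shared classification machinery can be sketched in scikit-learn as follows. The feature extraction differs from run to run (see below), so the vectorizer settings here are only a generic stand-in, not the configuration of any particular run:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # LinearSVC is scikit-learn's liblinear wrapper; its multi-class
    # strategy is one-vs-rest (OvR) by default.
    pipeline = make_pipeline(
        # sublinear_tf=True replaces tf with 1 + log(tf); min_df=2
        # drops features occurring in fewer than two training documents.
        TfidfVectorizer(sublinear_tf=True, min_df=2),
        LinearSVC(C=100),
    )

    # train_docs, train_labels: pre-processed texts and gender labels
    # pipeline.fit(train_docs, train_labels)
    # predicted = pipeline.predict(test_docs)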
The configurations of the five runs of the CIC team are described below.

4.1 Run CIC-1 (machine learning)
Features. In the Russian language, singular past-tense verb forms are inflected for gender: singular masculine forms have the ending -л "-l", while an indicator of singular feminine forms is the ending -ла "-la". We therefore used "word ending in -ла" as a feature. Moreover, since past-tense reflexive verbs retain the reflexive ending -сь "-s'", we also used the feature "word ending in -лась" "-las'".

We employed the features -ла "-la" and -лась "-las'" in isolation, as well as in combination with the subject of the sentence, if the subject was the first-person singular pronoun я "ya" and this subject was within a window of 6 words after, or 3 words before, the verb. This gave four additional composite features: "я -ла", "я -лась", "-ла я", and "-лась я", with a meaning such as "I -ed[feminine] myself", as in I dressed myself in a skirt. The window size (+6/−3) was selected by grid search.

In addition, since Russian adjectives agree with pronouns in gender, we used the ending -ая "-aya" (nominative feminine singular form) in combination with the first-person singular pronoun я "ya" as a feature, if the pronoun was within the same +6/−3 window as above. This gave two more features: "я -ая" and "-ая я", with a meaning such as "I [feminine singular adjective]", as in I am a professor emerita.

Additionally, we used the last three (Cyrillic) characters of each word as features (suffix n-grams, n = 3), which, in particular, indirectly accounted for other grammatically meaningful endings, such as -ный "-nyĭ" (hinting at a masculine adjective, as in I am a professor emeritus).

Frequency threshold. Fine-tuning the size of the feature set has proved to be of great importance in AP [11]. It significantly reduces the size of the feature set and at the same time improves the results in most cases. In this run, we selected only those features that occurred in at least two documents of the training corpus and at least five times in the entire training corpus (min_df = 2; threshold = 5).

Weighting scheme. Tf-idf weighting with sublinear tf scaling.
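A sketch of how these grammatical features could be extracted from a tokenized, pre-processed document is given below. The feature names, the precedence between endings, and the exact handling of the +6/−3 window are our own reading of the description above, not the authors' code:

    def cic1_features(tokens):
        """Gender-indicative features for one tokenized document (sketch)."""
        feats = []
        for i, tok in enumerate(tokens):
            before = tokens[max(0, i - 3):i]   # up to 3 words before
            after = tokens[i + 1:i + 7]        # up to 6 words after
            if tok.endswith('лась'):           # feminine reflexive past tense
                feats.append('-лась')
                if 'я' in before: feats.append('я -лась')
                if 'я' in after:  feats.append('-лась я')
            elif tok.endswith('ла'):           # feminine past tense
                feats.append('-ла')
                if 'я' in before: feats.append('я -ла')
                if 'я' in after:  feats.append('-ла я')
            elif tok.endswith('ая'):           # feminine nominative adjective;
                if 'я' in before: feats.append('я -ая')   # used only together
                if 'я' in after:  feats.append('-ая я')   # with the pronoun
            feats.append(tok[-3:])             # suffix 3-gram of every word
        return feats

    # 'я пошла в кино' ("I went to the cinema", feminine speaker) yields
    # '-ла' and 'я -ла', plus the suffix 3-grams of all four words.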
4.2 Run CIC-2 (machine learning)
Features. Word features represent the lexical choices of a writer. Such features have proved indicative of an author's gender in other languages, such as English, Spanish, Portuguese, and Arabic [16]. In this run, we used word unigram features (the bag-of-words approach) in combination with the last three characters of each word (suffix 3-grams).

Frequency threshold. The threshold was the same as in the CIC-1 run.

Weighting scheme. Tf-idf weighting without sublinear tf scaling.

4.3 Run CIC-3 (statistical)
First, we labeled the words that occur in the training corpus as male's or female's, depending on whether the word was used (not counting repetitions) more frequently in male's or in female's documents, except when the difference was less than 2.

Next, for each document we calculated the ratio of such male's to female's words (not counting repetitions). We labeled a document as male's if this ratio was above a threshold; otherwise, as female's. Since the dataset was balanced, as the threshold we used the median of the distribution of this ratio.

We also experimented with taking repetitions of words into account, with thresholds other than 2 for classifying words, and with formulas other than the ratio for classifying documents; however, we observed lower performance.
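This statistical procedure can be summarized in a few lines of Python. We read "not counting repetitions" as counting word types per document (i.e., document frequency); the function names, the handling of zero counts, and the choice of computing the median over the documents being classified are our own interpretation:

    from collections import Counter
    from statistics import median

    def train_word_labels(docs, labels, min_diff=2):
        male_df, female_df = Counter(), Counter()
        for doc, label in zip(docs, labels):
            for w in set(doc.split()):           # word types, not tokens
                (male_df if label == 'male' else female_df)[w] += 1
        word_label = {}
        for w in set(male_df) | set(female_df):
            diff = male_df[w] - female_df[w]
            if abs(diff) >= min_diff:            # skip near-ties
                word_label[w] = 'male' if diff > 0 else 'female'
        return word_label

    def male_female_ratio(doc, word_label):
        types = set(doc.split())
        m = sum(word_label.get(w) == 'male' for w in types)
        f = sum(word_label.get(w) == 'female' for w in types)
        return m / max(f, 1)                     # guard against division by zero

    def classify(docs, word_label):
        # The threshold is the median ratio, exploiting the fact that
        # the datasets are gender-balanced.
        ratios = [male_female_ratio(d, word_label) for d in docs]
        t = median(ratios)
        return ['male' if r > t else 'female' for r in ratios]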
4.4 Run CIC-4 (machine learning)
Features. A combination of word and character n-gram features usually provides good results for AP; for instance, such a combination was used by the best-performing system [1] at this year's PAN shared task [16]. In this run, we used a combination of word unigrams with character n-grams (n = 2–3).

Frequency threshold. We selected only those features that occurred in at least two documents of the training corpus and at least four times in the entire training corpus (min_df = 2; threshold = 4).

Weighting scheme. We used tf-idf weighting with sublinear tf scaling.

4.5 Run CIC-5 (machine learning)
Features. Word unigrams, word 3-grams, and character n-grams (n = 2–4).

Frequency threshold. In this run, we set a high frequency threshold: we selected only those features that occurred in at least two documents of the training corpus and at least 50 times in the entire training corpus (min_df = 2; threshold = 50). However, setting this high threshold only marginally affected the 10-fold cross-validation and 80%–20% accuracy, making it very slightly higher or very slightly lower.

Weighting scheme. Tf-idf with sublinear tf scaling.

5 RESULTS
The 10-fold cross-validation results in terms of classification accuracy (acc.) for each run, as well as the results under the 80%–20% split, are shown in Table 2. For each experiment, the results for 10-fold cross-validation (10FCV) and the 80%–20% split, as well as the number of features (No. of features), are provided. The best result for each evaluation procedure is highlighted in bold typeface.

Table 2: 10-fold cross-validation and 80%–20% train-test split results (accuracy)

Run     10FCV acc.   No. of features   80%–20% acc.   No. of features
CIC-1   0.8833       3,136             0.8583         2,922
CIC-2   0.8550       19,139            0.8583         16,155
CIC-3   0.7400       22,847            0.7417         19,353
CIC-4   0.8683       31,045            0.8500         27,222
CIC-5   0.8683       22,625            0.8583         20,003

Our first run, which included gender-specific Russian grammatical features, showed the highest 10-fold cross-validation accuracy with the smallest number of features. Three out of five of our runs (CIC-1, CIC-2, and CIC-5) showed the same accuracy under the 80%–20% split, probably due to the small size of the dataset. The statistical approach (run CIC-3) showed the lowest accuracy under both the 10-fold cross-validation and the 80%–20% setting, though, surprisingly, it showed the best results on several of the final test datasets, as shown in Table 3. We attribute this, again, to the small size of the datasets available for development.

A comparison of the participating systems, including the official ranking, is presented in [7]. Table 3 shows the detailed results of our five runs on the five test datasets, along with the highest result achieved on each test set among all participating systems and the system that achieved it. The best result on each test dataset is highlighted in bold typeface. Avg. stands for the average accuracy of each run across the five test datasets; if a system was not evaluated on some test set, we counted its accuracy on this test set as zero. Weighted stands for the accuracy weighted by the number of documents in each test set (again, counting as zero if a system was not evaluated on a test set); this was the measure used for the official ranking. Norm. is similar to Weighted, but each accuracy is normalized by the highest accuracy on the corresponding test set (note that this is not an accuracy; it is the average closeness of the given system to the best system).

Table 3: Results for the five runs of the CIC team on the five test sets

System          Test 1          Test 2   Test 3   Test 4   Test 5          Avg.     Weighted   Norm.
Best result     0.7838          0.9342   0.6825   0.6186   0.6596          0.6580   0.6456     0.9258
Best system     Bits_Pilani-4   CIC-2    CIC-3    CIC-3    Bits_Pilani-5   CIC-1    CIC-3      CIC-3
CIC-1           0.5865          0.9211   0.6525   0.5979   0.5319          0.6580   0.6435     0.9154
CIC-2           0.5838          0.9342   0.6650   0.5709   0.5213          0.6550   0.6354     0.9014
CIC-3           0.6027          0.7851   0.6825   0.6186   0.5426          0.6463   0.6456     0.9258
CIC-4           0.4676          0.8860   0.5975   0.5116   0.5213          0.5968   0.5675     0.8047
CIC-5           0.4973          0.8991   0.6275   0.5258   0.5000          0.6099   0.5862     0.8313
CIC best rank   4th             1st      1st      1st      4th             1st      1st        1st
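For concreteness, the two aggregate measures can be computed as follows; this is our formulation, inferred from the definitions above (it reproduces the Table 3 values for run CIC-3):

    def weighted_accuracy(accs, n_docs):
        # Accuracy weighted by the number of documents in each test set.
        return sum(a * n for a, n in zip(accs, n_docs)) / sum(n_docs)

    def normalized_score(accs, best_accs, n_docs):
        # As above, but each accuracy is first divided by the best
        # accuracy achieved on that test set.
        return weighted_accuracy(
            [a / b for a, b in zip(accs, best_accs)], n_docs)

    # Sanity check against Table 3 for run CIC-3:
    # weighted_accuracy([0.6027, 0.7851, 0.6825, 0.6186, 0.5426],
    #                   [370, 228, 400, 776, 94])            # ~0.6456
    # normalized_score([0.6027, 0.7851, 0.6825, 0.6186, 0.5426],
    #                  [0.7838, 0.9342, 0.6825, 0.6186, 0.6596],
    #                  [370, 228, 400, 776, 94])             # ~0.9258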
As one can see from Table 3, none of the runs consistently outperformed the others across all the test datasets. The Test 3 set consisted of documents that were collections of various tweets by the same author, similarly to the training corpus, so this was not exactly a cross-genre scenario; however, the documents in the Test 3 set contained fewer tweets than those of the training corpus. On this dataset, as well as on Test 4, with the shortest documents (online reviews), the best performance among our runs was achieved by run CIC-3, based on the statistical approach. Test 2 (Facebook posts) was the only test set on which our statistical approach (CIC-3) failed to produce a good result.

Surprisingly, on the gender imitation corpus (Test 5), CIC-1 was our second-best run (after CIC-3), even though CIC-1 was based on gender-specific Russian grammatical (morphological) features, such as the grammatical gender of verbs and adjectives, which in imitated text follow the patterns of the gender being imitated.

Runs CIC-4 and CIC-5, in spite of showing similar 10-fold cross-validation and 80%–20% accuracy, performed worse on the test datasets than our first three runs. This may be due to the inclusion of character n-grams, which probably caused overfitting. Another reason for the relatively poor performance of CIC-5 could be the too-high frequency threshold set for this run.

For a more in-depth analysis of the obtained results, access to the gold standard for the test datasets would be required.

6 CONCLUSIONS
We have presented the five systems submitted by the CIC team to the 2017 PAN shared task on Gender Identification in Russian texts (RUSProfiling), four of which occupied the first four places in the official ranking [7]. The task focused on a cross-genre author profiling (AP) scenario: the training corpus was composed of tweets, while the test datasets were composed of offline texts, Facebook posts, tweets, online reviews, and gender imitation texts.

Our systems, which were not tuned for a specific genre, showed the highest accuracy on three out of five test datasets (Facebook posts, tweets, and product and service online reviews), performing worse on the two remaining test datasets than more genre-specific systems, which were used only for some of the genres. Our first run, based on a machine-learning approach with gender-specific Russian grammatical features, showed the highest average accuracy across all the test datasets, while our statistical approach based on lexical features showed the best performance according to the weighted (official) and normalized evaluations.

One direction for future work would be to examine in more detail the importance of morphological features for gender identification in Russian texts, as well as to improve our statistical approach by automatically tuning the threshold value according to the size and genre of the test data.

ACKNOWLEDGMENTS
This work was partially supported by the Mexican Government (CONACYT projects 240844, SNI, COFAA-IPN, SIP-IPN 20171813, 20172008, and 20172044).

REFERENCES
[1] Angelo Basile, Gareth Dwyer, Maria Medvedeva, Josine Rawee, Hessel Haagsma, and Malvina Nissim. 2017. N-GrAM: New Groningen Author-profiling Model. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[2] Marc Franco-Salvador, Nataliia Plotnikova, Neha Pawar, and Yassine Benajiba. 2017. Subword-based Deep Averaging Networks for Author Profiling in Social Media. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[3] Helena Gómez-Adorno, Ilia Markov, Grigori Sidorov, Juan-Pablo Posadas-Durán, Miguel A. Sanchez-Perez, and Liliana Chanona-Hernandez. 2016. Improving Feature Representation Based on a Neural Network for Author Profiling in Social Media Texts. Computational Intelligence and Neuroscience 2016 (October 2016), 13 pages. https://doi.org/10.1155/2016/1638936
[4] Andrey Ignatov, Liliya Akhtyamova, and John Cardiff. 2017. Twitter Author Profiling Using Word Embeddings and Logistic Regression. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[5] Don Kodiyan, Florin Hardegger, Stephan Neuhaus, and Mark Cieliebak. 2017. Author Profiling with Bidirectional RNNs using Attention with GRUs. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[6] Tatiana Litvinova, Olga Litvinova, Olga Zagorovskaya, Pavel Seredin, Aleksandr Sboev, and Olga Romanchenko. 2016. "RusPersonality": A Russian Corpus for Authorship Profiling and Deception Detection. In Proceedings of the 2016 International FRUCT Conference on Intelligence, Social Media and Web, ISMW-FRUCT 2016. IEEE, St. Petersburg, Russia, 1–7.
[7] Tatiana Litvinova, Francisco Rangel, Paolo Rosso, Pavel Seredin, and Olga Litvinova. 2017. Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian. In Notebook Papers of FIRE 2017, FIRE 2017 (CEUR Workshop Proceedings). CEUR-WS.org, Bangalore, India.
[8] Tatiana Litvinova, Pavel Seredin, Olga Litvinova, Olga Zagorovskaya, Aleksandr Sboev, Dmitry Gudovskih, Ivan Moloshnikov, and Roman Rybka. 2016. Gender Prediction for Authors of Russian Texts Using Regression and Classification Techniques. In Proceedings of the 3rd Workshop on Concept Discovery in Unstructured Data co-located with the 13th International Conference on Concept Lattices and Their Applications, CDUD@CLA, Vol. 1625. CEUR-WS.org, 44–53.
[9] A. Pastor López-Monroy, Manuel Montes-y-Gómez, Hugo Jair Escalante, Luis Villaseñor Pineda, and Thamar Solorio. 2017. Social-Media Users can be Profiled by their Similarity with other Users. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[10] Ilia Markov, Helena Gómez-Adorno, Juan-Pablo Posadas-Durán, Grigori Sidorov, and Alexander Gelbukh. 2017. Author Profiling with Doc2vec Neural Network-Based Document Embeddings. In Proceedings of the 15th Mexican International Conference on Artificial Intelligence, MICAI 2016, Part II, LNAI, Vol. 10062. Springer, Cancún, Mexico, 117–131.
[11] Ilia Markov, Helena Gómez-Adorno, and Grigori Sidorov. 2017. Language- and Subtask-Dependent Feature Selection and Classifier Parameter Tuning for Author Profiling. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[12] Ilia Markov, Efstathios Stamatatos, and Grigori Sidorov. 2017. Improving Cross-Topic Authorship Attribution: The Role of Pre-Processing. In Proceedings of the 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017. Springer, Budapest, Hungary.
[13] Matej Martinc, Iza Škrjanec, Katja Zupan, and Senja Pollak. 2017. PAN 2017: Author Profiling – Gender and Language Variety Prediction. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[14] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (November 2011), 2825–2830. http://dl.acm.org/citation.cfm?id=1953048.2078195
[15] Francisco Rangel, Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, and Giacomo Inches. 2013. Overview of the Author Profiling Task at PAN 2013. In Working Notes Papers of the CLEF 2013 Evaluation Labs (CEUR Workshop Proceedings). CLEF and CEUR-WS.org, Valencia, Spain, 23–26.
[16] Francisco Rangel, Paolo Rosso, Martin Potthast, and Benno Stein. 2017. Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings). CLEF and CEUR-WS.org, Dublin, Ireland.
[17] Francisco Rangel, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, and Benno Stein. 2016. Overview of the 4th Author Profiling Task at PAN 2016: Cross-genre Evaluations. In Working Notes Papers of the CLEF 2016 Evaluation Labs (CEUR Workshop Proceedings). CLEF and CEUR-WS.org, Évora, Portugal.
[18] Nils Schaetti. 2017. UniNE at CLEF 2017: TF-IDF and Deep-Learning for Author Profiling. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[19] Sebastian Sierra, Manuel Montes-y-Gómez, Thamar Solorio, and Fabio A. González. 2017. Convolutional Neural Networks for Author Profiling. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.
[20] Eric S. Tellez, Sabino Miranda-Jiménez, Mario Graff, and Daniela Moctezuma. 2017. Gender and Language-Variety Identification with microTC. In Working Notes Papers of the CLEF 2017 Evaluation Labs (CEUR Workshop Proceedings), Vol. 1866. CLEF and CEUR-WS.org, Dublin, Ireland.