Overview of the RusProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian

Tatiana Litvinova (RusProfiling Lab, Russia) centr_rus_yaz@mail.ru
Francisco Rangel (Autoritas Consulting, Valencia, Spain) francisco.rangel@autoritas.es
Paolo Rosso (PRHLT Research Center, Universitat Politècnica de València, Spain) prosso@dsic.upv.es
Pavel Seredin (RusProfiling Lab & Kurchatov Institute, Russia) paul@phys.vsu.ru
Olga Litvinova (RusProfiling Lab & Kurchatov Institute, Russia) olga_litvinova_teacher@mail.ru

ABSTRACT
Author profiling consists of predicting an author's traits (e.g. age, gender, personality) from her writing. After addressing mainly age and gender identification at PAN@CLEF (http://pan.webis.de/), in this RusProfiling PAN@FIRE track we have addressed the problem of predicting an author's gender in Russian from a cross-genre perspective: given a training set of tweets, the systems have been evaluated on five different genres (essays, Facebook, Twitter, reviews, and texts where the authors imitated the other gender or changed their idiostyle). In this paper we analyse the 22 runs sent by 5 participant teams. The best results (although also the most sparse ones) have been obtained on Facebook.

Keywords
author profiling; gender identification; cross-genre profiling; Russian

1. INTRODUCTION
Author profiling involves predicting an author's demographics, personality traits, education and so on from her writing, with gender identification being the most popular task [10, 8, 12, 13, 11, 2, 5, 6, 15, 16, 4]. Author profiling tasks are popular among participants of PAN, a series of scientific events and shared tasks on digital text forensics (http://pan.webis.de/index.html). Slavic languages, however, are less investigated from an author profiling standpoint and have never been addressed at PAN.

This year at FIRE we have introduced a PAN shared task on Cross-genre Gender Identification in Russian texts (the RusProfiling shared task), where we provided tweets as the training dataset, and Facebook posts, online reviews, texts describing images or letters to a friend, as well as tweets, as test datasets. The focus is thus especially on cross-genre gender profiling.

The rest of this overview is structured as follows. In Section 2 we describe the construction of the corpus and the evaluation metrics. In Section 3 the participants' approaches are presented, and in Section 4 the obtained results are discussed. Finally, in Section 5 we draw some conclusions.

2. EVALUATION FRAMEWORK
In this section we describe the construction of the corpus, covering its particular properties, challenges and novelties. Moreover, the evaluation measures are described.

2.1 Corpus
Here we describe the datasets that have been released for the task. We have designed these datasets using manual and automated techniques and made them available to participants through the task web page (http://en.rusprofilinglab.ru/rusprofiling-at-pan/korpus/).

Twitter dataset: 500 users per gender, split into a training set (300 users per gender) and a test set (200 users per gender). Annotating social media texts is what makes designing such corpora particularly challenging. Some researchers have built Twitter corpora automatically, while others have relied on labor-intensive manual methods. For example, Rao et al. [14] used a focused search methodology followed by manual annotation to produce a dataset of 500 English-speaking users labeled with gender. The gender tag was assigned based on the screen name, the profile picture, the self-description ('bio') and, in the few cases where this was not sufficient, the use of gender markings when users referred to themselves. For this research we used the same approach, labeling the gender of tweet authors manually; users whose gender could not be established were discarded. Retweets were removed. The number of tweets per user varied from 1 to 200, depending on how active the user was at the time the data was collected (September 2016). All tweets from one user were merged together and considered as one text. As our analysis suggests, tweets contain a large amount of non-original information (hashtags, hidden citations such as copied newsfeed items, hyperlinks, etc.), which makes them extremely challenging to analyze.
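The user-level aggregation described above is easy to reproduce. Below is a minimal sketch, assuming the raw data is available as (user_id, text) pairs and that retweets start with "RT"; the function names and the optional noise-stripping step are ours, not part of the released corpus tooling.

```python
import re
from collections import defaultdict

def aggregate_tweets(tweets):
    """Merge all tweets of one user into a single text, skipping retweets.

    `tweets` is an iterable of (user_id, text) pairs; the result maps each
    user_id to one merged document, as in the RusProfiling corpus design.
    """
    by_user = defaultdict(list)
    for user_id, text in tweets:
        if text.startswith("RT"):  # retweets were removed from the corpus
            continue
        by_user[user_id].append(text.strip())
    return {uid: " ".join(texts) for uid, texts in by_user.items()}

def strip_noise(text):
    """Optionally remove non-original elements (hyperlinks, mentions, hashtags)."""
    text = re.sub(r"https?://\S+", " ", text)  # hyperlinks
    text = re.sub(r"[@#]\w+", " ", text)       # mentions and hashtags
    return re.sub(r"\s+", " ", text).strip()
```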
Facebook dataset: 228 users (114 authors per gender) of different age groups (20+, 30+, 40+) from different Russian cities were randomly chosen (so as to minimise mutual friendships). We used the same principles for gender labeling as for Twitter. All posts from one user were merged into one text, with an average length of 1,000 words. As with Twitter, Facebook pages of famous people involved in administration or government, as well as accounts of heads of major companies, were not employed for the study. As our analysis shows, Russian Facebook texts contain less non-original information than tweets.

Essays dataset: 185 authors per gender, with one or two texts per author (in the case of two texts, they were merged and considered as one text). The texts were taken randomly from the manually collected RusPersonality corpus [5]. RusPersonality is the first Russian-language corpus of written texts labeled with data on their authors; a unique aspect of the corpus is the breadth of its metadata (gender, age, personality, neuropsychological testing data, education level, etc.). The texts were written by respondents especially for this corpus, do not contain any borrowings, and are not edited. The topics of the texts were a letter to a friend, a picture description, and a letter to an employer trying to convince her to hire the respondent. The average text length in this dataset was 150 words.

Reviews dataset: 388 authors per gender, one text per author. The texts were collected from Trustpilot (https://ru.trustpilot.com/); the author's gender was identified based on the profile information. The average text length was 80 words.

Gender-imitated dataset: 47 authors per gender, with three texts from each author that were merged together and considered as one text. The texts were randomly selected from the Gender Imitation Corpus we have collected, the first Russian corpus for studies of stylistic deception. Each respondent (n=142) was instructed to write three texts on the same topic (from a list). An example of the task: "Last summer you bought a package tour from a travel agency, but you were not at all pleased with your experience with that company and the trip was not worth the price. You are about to ask for a refund. Write three texts describing your negative experience, providing a detailed account of it. Give a warning that you intend to sue the company." The first text is supposed to be written in the way usual for the writer (without any deception); the second one as if by someone of the opposite gender ('imitation'); the third one as if by another individual of the same gender, so that the author's personal writing style will not be recognized ('obfuscation'). Most of the texts are 80-150 words long. All of the respondents are students of Russian universities. Besides the texts, the corpus includes metadata with the authors' characteristics: gender, age, native language, handedness, and psychological gender (femininity/masculinity). The corpus thus provides countless opportunities for investigating the imitation of properties of written speech in different aspects, including (biological and psychological) gender imitation. To the best of our knowledge, this is the first corpus of its kind. The corpus is currently being prepared to be made available on the RusProfiling Lab website.

In Table 1 a summary of the number of authors per dataset is shown.

Table 1: Distribution of authors per dataset (half per gender).

  Dataset    Genre            Number of authors
  Training   Twitter          600
  Test       Essays           370
             Facebook         228
             Twitter          400
             Reviews          776
             Gender-imitated   94

2.2 Performance measures
For evaluation we have used accuracy, following the author profiling tasks at PAN. In the RusProfiling shared task, we calculate the accuracy per dataset as the number of authors correctly identified divided by the total number of authors in that dataset. The global ranking has been obtained by calculating the average accuracy over all the datasets, weighted by the number of documents in each dataset:

\[ \mathrm{global\ acc} = \frac{\sum_{ds} \mathrm{accuracy}(ds) \cdot \mathrm{size}(ds)}{\sum_{ds} \mathrm{size}(ds)} \tag{1} \]
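Formula (1) is straightforward to implement. The following is a minimal sketch; the per-dataset accuracies in the usage example are made up for illustration, while the sizes are the test-set sizes from Table 1.

```python
def global_accuracy(accuracy, size):
    """Weighted average of per-dataset accuracies (Formula 1).

    `accuracy` and `size` map each dataset name to its accuracy and to
    its number of authors, respectively.
    """
    total = sum(size.values())
    return sum(accuracy[ds] * size[ds] for ds in size) / total

# Usage with the test-set sizes from Table 1 and hypothetical accuracies:
sizes = {"essays": 370, "facebook": 228, "twitter": 400,
         "reviews": 776, "imitated": 94}
accs = {"essays": 0.60, "facebook": 0.75, "twitter": 0.65,
        "reviews": 0.52, "imitated": 0.50}
print(round(global_accuracy(accs, sizes), 4))
```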
2.3 Baselines
To understand the complexity of the task per genre, and with the aim of comparing the performance of the participants' approaches, we propose the following baselines, as we did at PAN at CLEF in 2017 [11] (a sketch of the bow baseline follows this list):

• majority. A statistical baseline that emulates random choice. It depends on the number of classes: two in the case of gender identification.

• bow. This method represents documents as a bag of words over the 5,000 most common words in the training set, weighted by absolute frequency of occurrence, and uses an SVM as the machine learning algorithm. The texts are preprocessed as follows: words are lowercased, punctuation signs and numbers are removed, and stop words for the corresponding language are removed.

• LDR [9]. This method represents documents on the basis of the probability distribution of occurrence of their words in the different classes. The key concept of LDR is a weight representing the probability of a term belonging to one of the different categories (e.g. female vs. male). The distribution of weights for a given document should be closest to the weights of its corresponding category. LDR takes advantage of the whole vocabulary.
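The bow baseline can be approximated with standard tooling. The sketch below follows the description above (the 5,000 most frequent training words, absolute frequencies, an SVM); the concrete scikit-learn choices (LinearSVC, token pattern, the tiny stop word list) are our assumptions, since the official baseline's configuration is not specified beyond the description in the list.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# A deliberately tiny illustrative stop word list; the official baseline
# uses a full language-specific list for Russian.
russian_stopwords = ["и", "в", "не", "на", "я", "что", "с", "это"]

bow_baseline = make_pipeline(
    CountVectorizer(
        max_features=5000,                # 5,000 most common training words
        lowercase=True,                   # lowercase, as described above
        stop_words=russian_stopwords,
        token_pattern=r"(?u)\b\w\w+\b",   # drops punctuation; filtering out
                                          # numbers needs an extra step
    ),
    LinearSVC(),                          # SVM classifier
)

# Usage: bow_baseline.fit(train_texts, train_genders)
#        predictions = bow_baseline.predict(test_texts)
```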
3. OVERVIEW OF THE SUBMITTED APPROACHES
In the following, we briefly describe the systems submitted by the five participants of the task from three perspectives: preprocessing, features used to represent the authors' texts, and classification approaches. In Table 2 the teams and the corresponding references are presented.

Table 2: Participating teams and their references.

  Team         Reference
  AmritaNLP    [18]
  BITS Pilani  [1]
  CIC          [7]
  DUBL         [17]
  RBG          [3]

Preprocessing. Preprocessing was carried out to obtain plain text [1]. Several participants removed stopwords [1, 17], short words [17] and Twitter-specific elements such as user mentions, hashtags and links [1, 17]. Some of them also removed punctuation marks [7, 1] as well as numbers [1], and the authors in [7] removed non-Cyrillic characters. Finally, lemmatisation was performed by the authors in [17].

Features. Traditionally, author profiling tasks have been approached with content- and style-based features. In this vein, the authors in [18] extracted features such as the number of user mentions, hashtags and URLs, emoticons, punctuation marks, and average word length, combined with a tf-idf bag of words. Similarly, the authors in [7] combined different kinds of features in their systems, such as word and character n-grams, the words most frequently used per gender, and linguistic patterns such as word endings or the use of first person singular pronouns within a given distance of a verb in the past tense (in Russian, past tense verbs are marked for the gender of their subject, which makes such patterns strong gender cues). The latter linguistic rule has been combined with deep learning techniques in [1]. Finally, the authors in [17] performed topic modelling, and the authors in [3] developed a representation scheme based on the texts belonging to the corresponding target classes.

Classification Approaches. Traditional features have been used with machine learning methods such as Support Vector Machines (SVM) [18, 7, 3], Random Forest [18] and AdaBoost [18]. The authors in [17] used Additive Regularization for Topic Modelling. Finally, the authors in [1], who combined a rule-based approach with deep learning, used variations of Long Short-Term Memory networks. A sketch of a typical feature-combination pipeline follows.
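To make the feature-combination recipe concrete, here is a minimal sketch in the spirit of the systems above: a few shallow stylistic counts concatenated with tf-idf word n-grams and fed to an SVM. It is an illustration of the general approach, not a reimplementation of any participant's system; all names and parameter values are ours.

```python
import re
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

class StylisticCounts(BaseEstimator, TransformerMixin):
    """Shallow style features: mentions, hashtags, URLs, punctuation
    and average word length, in the spirit of [18]."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for text in X:
            words = text.split()
            rows.append([
                text.count("@"),                      # user mentions
                text.count("#"),                      # hashtags
                len(re.findall(r"https?://", text)),  # URLs
                len(re.findall(r"[!?.,;:]", text)),   # punctuation marks
                np.mean([len(w) for w in words]) if words else 0.0,
            ])
        return np.array(rows)

model = make_pipeline(
    FeatureUnion([
        ("style", StylisticCounts()),
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=10000)),
    ]),
    LinearSVC(),
)
# Usage: model.fit(train_texts, train_genders)
#        predictions = model.predict(test_texts)
```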
4. EVALUATION AND DISCUSSION OF THE SUBMITTED APPROACHES
Due to the cross-genre perspective of the task, five test datasets were provided. Five teams submitted a total of 22 runs and, since each run could be evaluated on each of the test sets, a total of 93 per-dataset results have been analysed, with 18-19 runs per dataset; the distribution is shown in Table 3.

Table 3: Number of participants' runs per dataset.

  Dataset    Number of runs
  Essays     18
  Facebook   19
  Twitter    18
  Reviews    19
  Imitated   19
  Total      93

The distribution of the results per dataset is shown in Figure 1.

[Figure 1: Distribution of results for gender identification in the different datasets.]

It is noteworthy that the highest accuracies were obtained on Facebook, with a median of about 75% and a maximum over 90%. However, the results on this genre are also the most sparse, with a standard deviation of 0.16. On the other hand, the results on the gender-imitated corpus are the lowest, with most of the participants obtaining accuracies close to 50%, which corresponds to the majority class baseline; two participants, however, obtained results of about 65%. In the following subsections we analyse the results per dataset in more depth.

4.1 Essays
Results on the essays dataset (Table 4) show an average accuracy of 55.39%, a median of 54.86% and a total of seven runs below the majority class and bow baselines. Apart from these low results, four runs improve on this baseline by more than 10%, with accuracies between 60.27% and 78.38%. The best result (78.38%) has been obtained by BITS Pilani, who combined linguistic rules with deep learning techniques. The second best result (68.11%) has been obtained by AmritaNLP, who used stylistic features with traditional machine learning algorithms. The first result is more than 10% higher than the second one, and about 23% higher than the average, showing the power of deep learning in this task when training on Twitter and evaluating on essays. However, none of these systems overcame the LDR baseline (81.41%), which performed 3% and 13% better, respectively.

Table 4: Accuracy in gender identification in essays.

  Ranking  Team         Run  Accuracy
  -        LDR          -    0.8141
  1        BITS Pilani  4    0.7838
  2        AmritaNLP    3    0.6811
  3        DUBL         4    0.6297
  4        CIC          3    0.6027
  5        AmritaNLP    2    0.5973
  6        CIC          1    0.5865
  7        CIC          2    0.5838
  8        DUBL         1    0.5486
  9        DUBL         2    0.5486
  10       DUBL         3    0.5486
  11       AmritaNLP    1    0.5243
  -        bow          -    0.5027
  -        majority     -    0.5000
  12       RBG          4    0.5000
  13       CIC          5    0.4973
  14       RBG          2    0.4919
  15       CIC          4    0.4676
  16       RBG          1    0.4595
  17       RBG          3    0.4595
  18       RBG          5    0.4595

  Min 0.4595   Q1 0.4933   Median 0.5486   Mean 0.5539
  SDev 0.0861  Q3 0.5946   Max 0.7838
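The dispersion statistics reported below each ranking (minimum, quartiles, mean, standard deviation) can be recomputed from the run accuracies alone. A small sketch follows; with the inclusive quantile method it reproduces the quartiles reported for Table 4, though quantile conventions vary, so other methods give slightly different Q1/Q3 values.

```python
import statistics

def summarize(accuracies):
    """Summary statistics as reported below each ranking table."""
    acc = sorted(accuracies)
    q1, median, q3 = statistics.quantiles(acc, n=4, method="inclusive")
    return {"Min": acc[0], "Q1": q1, "Median": median,
            "Mean": statistics.mean(acc), "SDev": statistics.stdev(acc),
            "Q3": q3, "Max": acc[-1]}

# The 18 run accuracies from Table 4 (baselines excluded):
essays_runs = [0.7838, 0.6811, 0.6297, 0.6027, 0.5973, 0.5865, 0.5838,
               0.5486, 0.5486, 0.5486, 0.5243, 0.5000, 0.4973, 0.4919,
               0.4676, 0.4595, 0.4595, 0.4595]
print(summarize(essays_runs))
```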
4.2 Facebook
In Table 5 the results on the Facebook dataset are shown. The average value (71.19%), the median (75%), the Q3 (86.19%) and the best value (93.42%) are all the highest among the datasets; indeed, they are even higher than those obtained on the Twitter dataset (shown in Table 6). However, the systems behaved heterogeneously, producing the most sparse results with an inter-quartile range of 34.44%. This sparsity is due to five runs equal to or below the majority baseline, plus another run from the same participant very close to 50%. Furthermore, 12 systems performed worse than the bow baseline, which obtained an accuracy of 76.32%, higher even than the mean (71.19%) and the median (75%).

The four best results have been obtained by CIC, who trained SVMs with combinations of n-grams and linguistic rules, among others. The fifth and sixth best results have been obtained by BITS Pilani with linguistic rules combined with deep learning. The best runs performed 2% and 12% better than the LDR baseline, respectively. In this case, although the deep learning techniques obtained good results, they are more than 5% below the traditional approaches.

Table 5: Accuracy in gender identification in Facebook.

  Ranking  Team         Run  Accuracy
  1        CIC          2    0.9342
  2        CIC          1    0.9211
  3        CIC          5    0.8991
  4        CIC          4    0.8860
  5        BITS Pilani  5    0.8728
  -        LDR          -    0.8596
  6        BITS Pilani  3    0.8509
  7        CIC          3    0.7851
  -        bow          -    0.7632
  8        DUBL         3    0.7588
  9        DUBL         2    0.7544
  10       DUBL         4    0.7500
  11       AmritaNLP    1    0.7456
  12       AmritaNLP    2    0.7237
  13       AmritaNLP    3    0.6228
  14       RBG          2    0.5351
  -        majority     -    0.5000
  15       RBG          3    0.5000
  16       RBG          4    0.5000
  17       RBG          5    0.5000
  18       RBG          1    0.4956
  19       BITS Pilani  2    0.4912

  Min 0.4912   Q1 0.5175   Median 0.7500   Mean 0.7119
  SDev 0.1642  Q3 0.8619   Max 0.9342

4.3 Twitter
The results obtained on the Twitter dataset are shown in Table 6. The two best results (68.25%, 66.50%) have been obtained by the CIC team, with the next result a tie between CIC and BITS Pilani (65.25%). These results are very similar to the one obtained by the LDR baseline (67.59%). The average falls to 57.87%, below the median of 61.12%, due to the low results obtained by most of the runs sent by the RBG team. In this vein, it is noteworthy that the accuracy obtained by the bow baseline (49.37%) is below the majority baseline.

Although the results on the Twitter dataset were expected to be the highest, they are much lower than those obtained on the Facebook dataset. Facebook posts, while maintaining the spontaneity of Twitter, tend to be longer and grammatically richer, with fewer syntactic errors and misspellings; this may be the cause of the increase in accuracy. Furthermore, although the mean is higher, the best result on Twitter (68.25%) is 10% lower than that obtained on the essays dataset (78.38%).

Table 6: Accuracy in gender identification in Twitter.

  Ranking  Team         Run  Accuracy
  1        CIC          3    0.6825
  -        LDR          -    0.6759
  2        CIC          2    0.6650
  3        BITS Pilani  4    0.6525
  4        CIC          1    0.6525
  5        DUBL         3    0.6300
  6        CIC          5    0.6275
  7        DUBL         4    0.6275
  8        AmritaNLP    3    0.6175
  9        DUBL         2    0.6125
  10       AmritaNLP    2    0.6100
  11       CIC          4    0.5975
  12       AmritaNLP    1    0.5700
  13       BITS Pilani  2    0.5400
  14       RBG          2    0.5125
  -        majority     -    0.5000
  15       RBG          4    0.5000
  -        bow          -    0.4937
  16       RBG          1    0.4650
  17       RBG          3    0.4550
  18       RBG          5    0.4000

  Min 0.4000   Q1 0.5194   Median 0.6112   Mean 0.5787
  SDev 0.0815  Q3 0.6294   Max 0.6825
4.4 Reviews
Results on the reviews dataset (Table 7) are lower than on the previous datasets, although with the lowest sparsity: most of the participants obtained results close to the average and the median (52.87% and 52.06%, respectively). As can be observed, these results are very close to the majority class (50%) and the bow baseline (50%), with five runs equal or below them and nine runs less than 5% above. These low results expose the difficulty of the task on this genre when the training data comes from Twitter.

As on the previous datasets, the best results have been achieved by the CIC (61.86% and 59.79%) and BITS Pilani (57.86% and 57.73%) teams, although they are about 4% below the 65.81% obtained by the LDR baseline. Moreover, the best accuracy on reviews is about 7% lower than the best one on Twitter, 17% lower than on essays and 30% lower than on Facebook.

Table 7: Accuracy in gender identification in reviews.

  Ranking  Team         Run  Accuracy
  -        LDR          -    0.6581
  1        CIC          3    0.6186
  2        CIC          1    0.5979
  3        BITS Pilani  5    0.5786
  4        BITS Pilani  4    0.5773
  5        CIC          2    0.5709
  6        AmritaNLP    1    0.5412
  7        AmritaNLP    3    0.5296
  8        CIC          5    0.5258
  9        RBG          2    0.5232
  10       RBG          4    0.5206
  11       AmritaNLP    2    0.5155
  12       BITS Pilani  2    0.5142
  13       CIC          4    0.5116
  14       RBG          3    0.5013
  -        majority     -    0.5000
  -        bow          -    0.5000
  15       RBG          1    0.5000
  16       RBG          5    0.5000
  17       DUBL         3    0.4794
  18       DUBL         2    0.4755
  19       DUBL         4    0.4639

  Min 0.4639   Q1 0.5007   Median 0.5206   Mean 0.5287
  SDev 0.0424  Q3 0.5561   Max 0.6186

4.5 Gender Imitation
In the gender-imitated corpus, the authors were asked to write texts as if they were of the other gender, or obfuscating their style, besides texts without imitation. In Table 8 the results of the gender identification task on this genre are shown. The average and median accuracies obtained by the systems on this dataset are the lowest of all (51.90% and 50%, respectively). Most participants obtained accuracies close to the majority class and the bow baseline: 11 runs with an accuracy equal to or lower than 50%, and 6 runs less than 5% above. Only two runs of the BITS Pilani team obtained a significant improvement, of 13% and 15%, over the majority class. This team combined linguistic rules with deep learning techniques, showing the robustness of these techniques when authors imitate the other gender or obfuscate their style. In this vein, we should highlight that the LDR baseline (55.32%), AmritaNLP (54.26%) and CIC (54.26%), which obtained similar results, performed about 10% worse than the aforementioned deep learning runs.

Table 8: Accuracy in gender identification in gender-imitated texts.

  Ranking  Team         Run  Accuracy
  1        BITS Pilani  5    0.6596
  2        BITS Pilani  3    0.6383
  -        LDR          -    0.5532
  3        AmritaNLP    1    0.5426
  4        CIC          3    0.5426
  5        CIC          1    0.5319
  6        CIC          2    0.5213
  7        CIC          4    0.5213
  8        BITS Pilani  1    0.5106
  -        majority     -    0.5000
  -        bow          -    0.5000
  9        CIC          5    0.5000
  10       DUBL         2    0.5000
  11       DUBL         3    0.5000
  12       DUBL         4    0.5000
  13       RBG          1    0.5000
  14       RBG          3    0.5000
  15       RBG          4    0.5000
  16       RBG          5    0.5000
  17       RBG          2    0.4894
  18       AmritaNLP    2    0.4574
  19       AmritaNLP    3    0.4468

  Min 0.4468   Q1 0.5000   Median 0.5000   Mean 0.5190
  SDev 0.0517  Q3 0.5266   Max 0.6596

4.6 Global Ranking
The global ranking shown in Table 9 has been calculated following Formula (1). It is noteworthy that most participants obtained a weighted accuracy between 47% and 57%, with a median of 54.42%; that is, most of the participants obtained results close to the majority class (50%) and the bow baseline (53.13%). Three runs obtained results much lower than the majority class because they participated only on some of the datasets.

At the top of the ranking, the CIC team obtained the four best results, with accuracies ranging from 58.62% to 64.56%, showing the robustness and homogeneity of their approach. It should be noted, however, that since BITS Pilani ran different systems on the different datasets, a fair comparison has not been possible, even though they obtained one of the best results on each of them. For example, their run 4 obtained 78.38% accuracy on essays (more than 10% above the next result) but was run neither on Facebook nor on the gender-imitated set, where the overall accuracy was lower. It is also worth mentioning that none of the systems outperformed the LDR baseline (71.21%), which performed 6.65% better than the best system.

Table 9: Global ranking by averaging the accuracies on the different datasets, weighted by the size of each dataset.

  Ranking  Team         Run  Accuracy
  -        LDR          -    0.7121
  1        CIC          3    0.6456
  2        CIC          1    0.6435
  3        CIC          2    0.6354
  4        CIC          5    0.5862
  5        AmritaNLP    3    0.5857
  6        AmritaNLP    2    0.5744
  7        AmritaNLP    1    0.5691
  8        DUBL         4    0.5685
  9        CIC          4    0.5675
  10       DUBL         3    0.5605
  11       DUBL         2    0.5546
  12       BITS Pilani  4    0.5337
  -        bow          -    0.5313
  13       RBG          2    0.5145
  14       RBG          4    0.5086
  -        majority     -    0.5000
  15       RBG          1    0.4839
  16       RBG          3    0.4829
  17       RBG          5    0.4706
  18       BITS Pilani  2    0.3881
  19       BITS Pilani  5    0.3790
  20       BITS Pilani  3    0.1344
  21       DUBL         1    0.1065
  22       BITS Pilani  1    0.0236

  Min 0.0236   Q1 0.4737   Median 0.5442   Mean 0.4780
  SDev 0.1740  Q3 0.5731   Max 0.6456
5. CONCLUSION
This paper describes the 22 systems sent by 5 participants to the RusProfiling shared task at PAN-FIRE 2017. Participants submitted a total of 22 runs evaluated on the five different test datasets, yielding 93 per-dataset results with 18-19 runs per dataset. They had to address the identification of the author's gender from a cross-genre perspective: given a training set of Twitter data, the systems have been evaluated on five different sets (essays, Facebook, Twitter, reviews and gender-imitated texts).

Participants used different kinds of approaches, from traditional ones based on hand-crafted features and machine learning techniques such as Support Vector Machines, to the nowadays fashionable deep learning techniques. Which approach performed best depended on the genre: on essays and on the gender-imitated texts, the deep learning approaches improved on the traditional ones by more than 10%.

Contrary to what was expected, the best results have been achieved not on Twitter but on Facebook. The reason may be that, although Facebook maintains the spontaneity of Twitter, its posts tend to be longer and grammatically richer, with fewer syntactic errors and misspellings. On the other hand, almost the worst results have been obtained on reviews. Similar cross-genre effects were also observed at PAN 2014 [8].

In the case of the gender-imitated texts, most systems failed, with 11 runs equal to or below the majority baseline and 6 runs less than 5% above it. Only two systems of BITS Pilani obtained results more than 10% above the baseline. In this more difficult scenario, the deep learning approaches showed their superiority over the traditional ones.

6. ACKNOWLEDGMENTS
The creation of the Gender Imitation Corpus was supported by the Russian Science Foundation, project No. 16-18-10050 "Identifying the Gender and Age of Online Chatters Using Formal Parameters of their Texts". Texts with style obfuscation were collected in the framework of the project "Lie Detection in a Written Text: A Corpus Study" supported by the Russian Foundation for Basic Research, project No. 15-34-01221. The third author acknowledges the SomEMBED TIN2015-71147-C2-1-P MINECO research project.

7. REFERENCES
[1] R. Bhargava, G. Goel, A. Shah, and Y. Sharma. Gender identification in Russian texts. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[2] F. Celli, B. Lepri, J.-I. Biel, D. Gatica-Perez, G. Riccardi, and F. Pianesi. The workshop on computational personality recognition 2014. In Proceedings of the ACM International Conference on Multimedia, pages 1245-1246. ACM, 2014.
[3] B. Ganesh HB, A. Kumar M, and S. KP. Representation of target classes for text classification: Amrita CEN NLP@RusProfiling PAN 2017. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[4] T. Litvinova, D. Gudovskikh, A. Sboev, P. Seredin, O. Litvinova, D. Pisarevskaya, and P. Rosso. Author gender prediction in Russian social media texts. In Conference on Analysis of Images, Social Networks, and Texts (AIST 2017), 2017.
[5] T. Litvinova, O. Litvinova, O. Zagorovskaya, P. Seredin, A. Sboev, and O. Romanchenko. "RusPersonality": A Russian corpus for authorship profiling and deception detection. In Intelligence, Social Media and Web (ISMW FRUCT), 2016 International FRUCT Conference on, pages 1-7. IEEE, 2016.
[6] T. Litvinova, P. Seredin, O. Litvinova, O. Zagorovskaya, A. Sboev, D. Gudovskih, I. Moloshnikov, and R. Rybka. Gender prediction for authors of Russian texts using regression and classification techniques. In CDUD@CLA, pages 44-53, 2016.
[7] I. Markov, H. Gomez-Adorno, G. Sidorov, and A. Gelbukh. The winning approach to cross-genre gender identification in Russian at RusProfiling 2017. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[8] F. Rangel, P. Rosso, I. Chugur, M. Potthast, M. Trenkmann, B. Stein, B. Verhoeven, and W. Daelemans. Overview of the 2nd author profiling task at PAN 2014. In Cappellato L., Ferro N., Halvey M., Kraaij W. (Eds.), CLEF 2014 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1180, 2014.
[9] F. Rangel, P. Rosso, and M. Franco-Salvador. A low dimensionality representation for language variety identification. In 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing). Springer-Verlag, LNCS, arXiv:1705.10754, 2016.
[10] F. Rangel, P. Rosso, M. Koppel, E. Stamatatos, and G. Inches. Overview of the author profiling task at PAN 2013. In Forner P., Navigli R., Tufis D. (Eds.), CLEF 2013 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1179, 2013.
[11] F. Rangel, P. Rosso, M. Potthast, and B. Stein. Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter. In Working Notes Papers of the CLEF 2017 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org, Sept. 2017.
[12] F. Rangel, P. Rosso, M. Potthast, B. Stein, and W. Daelemans. Overview of the 3rd author profiling task at PAN 2015. In Cappellato L., Ferro N., Jones G., San Juan E. (Eds.), CLEF 2015 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings, vol. 1391. CEUR-WS.org, 2015.
[13] F. Rangel, P. Rosso, B. Verhoeven, W. Daelemans, M. Potthast, and B. Stein. Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations. In Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org, Sept. 2016.
[14] D. Rao, D. Yarowsky, A. Shreevats, and M. Gupta. Classifying latent user attributes in Twitter. In Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents, pages 37-44. ACM, 2010.
[15] A. Sboev, T. Litvinova, D. Gudovskikh, R. Rybka, and I. Moloshnikov. Machine learning models of text categorization by author gender using topic-independent features. Procedia Computer Science, 101:135-142, 2016.
[16] A. Sboev, T. Litvinova, I. Voronina, D. Gudovskikh, and R. Rybka. Deep learning network models to categorize texts according to author's gender and to identify text sentiment. In Computational Science and Computational Intelligence (CSCI), 2016 International Conference on, pages 1101-1106. IEEE, 2016.
[17] G. Skitalinskaya, L. Akhtyamova, and J. Cardiff. Cross-genre gender identification in Russian texts using topic modeling: Working note of team DUBL. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[18] V. Vinayan, N. J.R., H. NB, A. Kumar M, and S. K P. AmritaNLP@PAN-RusProfiling: Author profiling using machine learning techniques. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.