Overview of the RusProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian

Tatiana Litvinova (RusProfiling Lab, Russia) centr_rus_yaz@mail.ru
Francisco Rangel (Autoritas Consulting, Valencia, Spain) francisco.rangel@autoritas.es
Paolo Rosso (PRHLT Research Center, Universitat Politècnica de València, Spain) prosso@dsic.upv.es
Pavel Seredin (RusProfiling Lab & Kurchatov Institute, Russia) paul@phys.vsu.ru
Olga Litvinova (RusProfiling Lab & Kurchatov Institute, Russia) olga_litvinova_teacher@mail.ru

ABSTRACT
Author profiling consists of predicting an author's traits (e.g. age, gender, personality) from her writing. After addressing mainly age and gender identification at PAN@CLEF (http://pan.webis.de/), in this RusProfiling PAN@FIRE track we have addressed the problem of predicting an author's gender in Russian from a cross-genre perspective: given a training set of tweets, the systems have been evaluated on five different genres (essays, Facebook, Twitter, reviews, and texts where the authors imitated the other gender or changed their idiostyle). In this paper we analyse the 22 runs sent by 5 participant teams. The best results (although also the most sparse ones) have been obtained on Facebook.

Keywords
author profiling; gender identification; cross-genre profiling; Russian

1. INTRODUCTION
Author profiling involves predicting an author's demographics, personality traits, education and so on from her writing, with gender identification being the most popular task [10, 8, 12, 13, 11, 2, 5, 6, 15, 16, 4]. Author profiling tasks are popular among participants of PAN, a series of scientific events and shared tasks on digital text forensics (http://pan.webis.de/index.html). Slavic languages, however, are less investigated from an author profiling standpoint and have never been addressed at PAN.

This year at FIRE we have introduced a PAN shared task on Cross-genre Gender Identification in Russian texts (the RusProfiling shared task), where we provided tweets as the training dataset, and Facebook posts, online reviews, texts describing images or letters to a friend, as well as tweets, as test datasets. The focus is thus especially on cross-genre gender profiling.

The rest of this overview is structured as follows. In Section 2 we describe the construction of the corpus and the evaluation metrics. In Section 3 the participants' approaches are presented, and in Section 4 the obtained results are discussed. Finally, in Section 5 we draw some conclusions.

2. EVALUATION FRAMEWORK
In this section we describe the construction of the corpus, covering its particular properties, challenges and novelties. Moreover, the evaluation measures are described.

2.1 Corpus
Here we describe the datasets that have been released for the task. We have designed these datasets using manual and automated techniques and made them available to participants through the task web page (http://en.rusprofilinglab.ru/rusprofiling-at-pan/korpus/).

Twitter dataset: 500 users per gender, split into a training set (300 users per gender) and a test set (200 users per gender). Annotating social media texts is what makes designing such corpora particularly challenging. Some researchers have built Twitter corpora automatically, while others have relied on labor-intensive manual methods. For example, Rao et al. [14] used a focused search methodology followed by manual annotation to produce a dataset of 500 English-speaking users labeled with gender. The gender tag was assigned based on the screen name, the profile picture, the self-description ('bio') and, in the few cases where this was not sufficient, the use of gender markings when users referred to themselves. For this research we used the same approach, labeling the gender of tweet authors manually; users whose gender could not be established were discarded. Retweets were removed. The number of tweets per user varied from 1 to 200, depending on how active the user was at the time the data was collected (September 2016). All tweets from one user were merged together and considered as one text. As our analysis suggests, tweets contain a large amount of non-original information (hashtags, hidden citations such as copied newsfeed items, hyperlinks, etc.), which makes them extremely challenging to analyze.
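The user-level aggregation described above is easy to reproduce. Below is a minimal sketch, assuming the raw data is available as (user_id, text) pairs and that retweets start with "RT"; the function names and the optional noise-stripping step are ours, not part of the released corpus tooling.

```python
import re
from collections import defaultdict

def aggregate_tweets(tweets):
    """Merge all tweets of one user into a single text, skipping retweets.

    `tweets` is an iterable of (user_id, text) pairs; the result maps each
    user_id to one merged document, as in the RusProfiling corpus design.
    """
    by_user = defaultdict(list)
    for user_id, text in tweets:
        if text.startswith("RT"):  # retweets were removed from the corpus
            continue
        by_user[user_id].append(text.strip())
    return {uid: " ".join(texts) for uid, texts in by_user.items()}

def strip_noise(text):
    """Optionally remove non-original elements (hyperlinks, mentions, hashtags)."""
    text = re.sub(r"https?://\S+", " ", text)  # hyperlinks
    text = re.sub(r"[@#]\w+", " ", text)       # mentions and hashtags
    return re.sub(r"\s+", " ", text).strip()
```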
Facebook dataset: 228 users (114 authors per gender) of different age groups (20+, 30+, 40+) from different Russian cities were randomly chosen (so as to minimise mutual friendships). We used the same principles for gender labeling as for Twitter. All posts from one user were merged into one text, with an average length of 1,000 words. As with Twitter, Facebook pages of famous people involved in administration or government, as well as accounts of heads of major companies, were not employed for the study. As our analysis shows, Russian Facebook texts contain less non-original information than tweets.

Essays dataset: 185 authors per gender, with one or two texts per author (in the case of two texts, they were merged and considered as one text). The texts were taken randomly from the manually collected RusPersonality corpus [5]. RusPersonality is the first Russian-language corpus of written texts labeled with data on their authors; a unique aspect of the corpus is the breadth of its metadata (gender, age, personality, neuropsychological testing data, education level, etc.). The texts were written by respondents especially for this corpus, do not contain any borrowings, and are not edited. The topics of the texts were a letter to a friend, a picture description, and a letter to an employer trying to convince her to hire the respondent. The average text length in this dataset was 150 words.

Reviews dataset: 388 authors per gender, one text per author. The texts were collected from Trustpilot (https://ru.trustpilot.com/); the author's gender was identified based on the profile information. The average text length was 80 words.

Gender-imitated dataset: 47 authors per gender, with three texts from each author that were merged together and considered as one text. The texts were randomly selected from the Gender Imitation Corpus we have collected, the first Russian corpus for studies of stylistic deception. Each respondent (n=142) was instructed to write three texts on the same topic (from a list). An example of the task: "Last summer you bought a package tour from a travel agency, but you were not at all pleased with your experience with that company and the trip was not worth the price. You are about to ask for a refund. Write three texts describing your negative experience, providing a detailed account of it. Give a warning that you intend to sue the company." The first text is supposed to be written in the way usual for the writer (without any deception); the second one as if by someone of the opposite gender ('imitation'); the third one as if by another individual of the same gender, so that the author's personal writing style will not be recognized ('obfuscation'). Most of the texts are 80-150 words long. All of the respondents are students of Russian universities. Besides the texts, the corpus includes metadata with the authors' characteristics: gender, age, native language, handedness, and psychological gender (femininity/masculinity). The corpus thus provides countless opportunities for investigating the imitation of properties of written speech in different aspects, including (biological and psychological) gender imitation. To the best of our knowledge, this is the first corpus of its kind. The corpus is currently being prepared to be made available on the RusProfiling Lab website.

In Table 1 a summary of the number of authors per dataset is shown.

Table 1: Distribution of authors per dataset (half per gender).

  Dataset    Genre            Number of authors
  Training   Twitter          600
  Test       Essays           370
             Facebook         228
             Twitter          400
             Reviews          776
             Gender-imitated   94

2.2 Performance measures
For evaluation we have used accuracy, following the author profiling tasks at PAN. In the RusProfiling shared task, we calculate the accuracy per dataset as the number of authors correctly identified divided by the total number of authors in that dataset. The global ranking has been obtained by calculating the average accuracy over all the datasets, weighted by the number of documents in each dataset:

\[ \mathrm{global\ acc} = \frac{\sum_{ds} \mathrm{accuracy}(ds) \cdot \mathrm{size}(ds)}{\sum_{ds} \mathrm{size}(ds)} \tag{1} \]
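Formula (1) is straightforward to implement. The following is a minimal sketch; the per-dataset accuracies in the usage example are made up for illustration, while the sizes are the test-set sizes from Table 1.

```python
def global_accuracy(accuracy, size):
    """Weighted average of per-dataset accuracies (Formula 1).

    `accuracy` and `size` map each dataset name to its accuracy and to
    its number of authors, respectively.
    """
    total = sum(size.values())
    return sum(accuracy[ds] * size[ds] for ds in size) / total

# Usage with the test-set sizes from Table 1 and hypothetical accuracies:
sizes = {"essays": 370, "facebook": 228, "twitter": 400,
         "reviews": 776, "imitated": 94}
accs = {"essays": 0.60, "facebook": 0.75, "twitter": 0.65,
        "reviews": 0.52, "imitated": 0.50}
print(round(global_accuracy(accs, sizes), 4))
```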
2.3 Baselines
To understand the complexity of the task per genre, and with the aim of comparing the performance of the participants' approaches, we propose the following baselines, as we did at PAN at CLEF in 2017 [11] (a sketch of the bow baseline follows this list):

• majority. A statistical baseline that emulates random choice. It depends on the number of classes: two in the case of gender identification.

• bow. This method represents documents as a bag of words over the 5,000 most common words in the training set, weighted by absolute frequency of occurrence, and uses an SVM as the machine learning algorithm. The texts are preprocessed as follows: words are lowercased, punctuation signs and numbers are removed, and stop words for the corresponding language are removed.

• LDR [9]. This method represents documents on the basis of the probability distribution of occurrence of their words in the different classes. The key concept of LDR is a weight representing the probability of a term belonging to one of the different categories (e.g. female vs. male). The distribution of weights for a given document should be closest to the weights of its corresponding category. LDR takes advantage of the whole vocabulary.
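The bow baseline can be approximated with standard tooling. The sketch below follows the description above (the 5,000 most frequent training words, absolute frequencies, an SVM); the concrete scikit-learn choices (LinearSVC, token pattern, the tiny stop word list) are our assumptions, since the official baseline's configuration is not specified beyond the description in the list.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# A deliberately tiny illustrative stop word list; the official baseline
# uses a full language-specific list for Russian.
russian_stopwords = ["и", "в", "не", "на", "я", "что", "с", "это"]

bow_baseline = make_pipeline(
    CountVectorizer(
        max_features=5000,                # 5,000 most common training words
        lowercase=True,                   # lowercase, as described above
        stop_words=russian_stopwords,
        token_pattern=r"(?u)\b\w\w+\b",   # drops punctuation; filtering out
                                          # numbers needs an extra step
    ),
    LinearSVC(),                          # SVM classifier
)

# Usage: bow_baseline.fit(train_texts, train_genders)
#        predictions = bow_baseline.predict(test_texts)
```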
3. OVERVIEW OF THE SUBMITTED APPROACHES
In the following, we briefly describe the systems submitted by the five participants of the task from three perspectives: preprocessing, features used to represent the authors' texts, and classification approaches. In Table 2 the teams and the corresponding references are presented.

Table 2: Participating teams and their references.

  Team         Reference
  AmritaNLP    [18]
  BITS Pilani  [1]
  CIC          [7]
  DUBL         [17]
  RBG          [3]

Preprocessing. Preprocessing was carried out to obtain plain text [1]. Several participants removed stopwords [1, 17], short words [17] and Twitter-specific elements such as user mentions, hashtags and links [1, 17]. Some of them also removed punctuation marks [7, 1] as well as numbers [1], and the authors in [7] removed non-Cyrillic characters. Finally, lemmatisation was performed by the authors in [17].

Features. Traditionally, author profiling tasks have been approached with content- and style-based features. In this vein, the authors in [18] extracted features such as the number of user mentions, hashtags and URLs, emoticons, punctuation marks, and average word length, combined with a tf-idf bag of words. Similarly, the authors in [7] combined different kinds of features in their systems, such as word and character n-grams, the words most frequently used per gender, and linguistic patterns such as word endings or the use of first person singular pronouns within a given distance of a verb in the past tense (in Russian, past tense verbs are marked for the gender of their subject, which makes such patterns strong gender cues). The latter linguistic rule has been combined with deep learning techniques in [1]. Finally, the authors in [17] performed topic modelling, and the authors in [3] developed a representation scheme based on the texts belonging to the corresponding target classes.

Classification Approaches. Traditional features have been used with machine learning methods such as Support Vector Machines (SVM) [18, 7, 3], Random Forest [18] and AdaBoost [18]. The authors in [17] used Additive Regularization for Topic Modelling. Finally, the authors in [1], who combined a rule-based approach with deep learning, used variations of Long Short-Term Memory networks. A sketch of a typical feature-combination pipeline follows.
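To make the feature-combination recipe concrete, here is a minimal sketch in the spirit of the systems above: a few shallow stylistic counts concatenated with tf-idf word n-grams and fed to an SVM. It is an illustration of the general approach, not a reimplementation of any participant's system; all names and parameter values are ours.

```python
import re
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

class StylisticCounts(BaseEstimator, TransformerMixin):
    """Shallow style features: mentions, hashtags, URLs, punctuation
    and average word length, in the spirit of [18]."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for text in X:
            words = text.split()
            rows.append([
                text.count("@"),                      # user mentions
                text.count("#"),                      # hashtags
                len(re.findall(r"https?://", text)),  # URLs
                len(re.findall(r"[!?.,;:]", text)),   # punctuation marks
                np.mean([len(w) for w in words]) if words else 0.0,
            ])
        return np.array(rows)

model = make_pipeline(
    FeatureUnion([
        ("style", StylisticCounts()),
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=10000)),
    ]),
    LinearSVC(),
)
# Usage: model.fit(train_texts, train_genders)
#        predictions = model.predict(test_texts)
```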
4. EVALUATION AND DISCUSSION OF THE SUBMITTED APPROACHES
Due to the cross-genre perspective of the task, five test datasets were provided. Five teams submitted a total of 22 runs and, since each run could be evaluated on each of the test sets, a total of 93 per-dataset results have been analysed, with 18-19 runs per dataset; the distribution is shown in Table 3.

Table 3: Number of participants' runs per dataset.

  Dataset    Number of runs
  Essays     18
  Facebook   19
  Twitter    18
  Reviews    19
  Imitated   19
  Total      93

The distribution of the results per dataset is shown in Figure 1.

[Figure 1: Distribution of results for gender identification in the different datasets.]

It is noteworthy that the highest accuracies were obtained on Facebook, with a median of about 75% and a maximum over 90%. However, the results on this genre are also the most sparse, with a standard deviation of 0.16. On the other hand, the results on the gender-imitated corpus are the lowest, with most of the participants obtaining accuracies close to 50%, which corresponds to the majority class baseline; two participants, however, obtained results of about 65%. In the following subsections we analyse the results per dataset in more depth.

4.1 Essays
Results on the essays dataset (Table 4) show an average accuracy of 55.39%, a median of 54.86% and a total of seven runs below the majority class and bow baselines. Apart from these low results, four runs improve on this baseline by more than 10%, with accuracies between 60.27% and 78.38%. The best result (78.38%) has been obtained by BITS Pilani, who combined linguistic rules with deep learning techniques. The second best result (68.11%) has been obtained by AmritaNLP, who used stylistic features with traditional machine learning algorithms. The first result is more than 10% higher than the second one, and about 23% higher than the average, showing the power of deep learning in this task when training on Twitter and evaluating on essays. However, none of these systems overcame the LDR baseline (81.41%), which performed 3% and 13% better, respectively.

Table 4: Accuracy in gender identification in essays.

  Ranking  Team         Run  Accuracy
  -        LDR          -    0.8141
  1        BITS Pilani  4    0.7838
  2        AmritaNLP    3    0.6811
  3        DUBL         4    0.6297
  4        CIC          3    0.6027
  5        AmritaNLP    2    0.5973
  6        CIC          1    0.5865
  7        CIC          2    0.5838
  8        DUBL         1    0.5486
  9        DUBL         2    0.5486
  10       DUBL         3    0.5486
  11       AmritaNLP    1    0.5243
  -        bow          -    0.5027
  -        majority     -    0.5000
  12       RBG          4    0.5000
  13       CIC          5    0.4973
  14       RBG          2    0.4919
  15       CIC          4    0.4676
  16       RBG          1    0.4595
  17       RBG          3    0.4595
  18       RBG          5    0.4595

  Min 0.4595   Q1 0.4933   Median 0.5486   Mean 0.5539
  SDev 0.0861  Q3 0.5946   Max 0.7838
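The dispersion statistics reported below each ranking (minimum, quartiles, mean, standard deviation) can be recomputed from the run accuracies alone. A small sketch follows; with the inclusive quantile method it reproduces the quartiles reported for Table 4, though quantile conventions vary, so other methods give slightly different Q1/Q3 values.

```python
import statistics

def summarize(accuracies):
    """Summary statistics as reported below each ranking table."""
    acc = sorted(accuracies)
    q1, median, q3 = statistics.quantiles(acc, n=4, method="inclusive")
    return {"Min": acc[0], "Q1": q1, "Median": median,
            "Mean": statistics.mean(acc), "SDev": statistics.stdev(acc),
            "Q3": q3, "Max": acc[-1]}

# The 18 run accuracies from Table 4 (baselines excluded):
essays_runs = [0.7838, 0.6811, 0.6297, 0.6027, 0.5973, 0.5865, 0.5838,
               0.5486, 0.5486, 0.5486, 0.5243, 0.5000, 0.4973, 0.4919,
               0.4676, 0.4595, 0.4595, 0.4595]
print(summarize(essays_runs))
```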
4.2 Facebook
In Table 5 the results on the Facebook dataset are shown. The average value (71.19%), the median (75%), the Q3 (86.19%) and the best value (93.42%) are all the highest among the datasets; indeed, they are even higher than those obtained on the Twitter dataset (shown in Table 6). However, the systems behaved heterogeneously, producing the most sparse results with an inter-quartile range of 34.44%. This sparsity is due to five runs equal to or below the majority baseline, plus another run from the same participant very close to 50%. Furthermore, 12 systems performed worse than the bow baseline, which obtained an accuracy of 76.32%, higher even than the mean (71.19%) and the median (75%).

The four best results have been obtained by CIC, who trained SVMs with combinations of n-grams and linguistic rules, among others. The fifth and sixth best results have been obtained by BITS Pilani with linguistic rules combined with deep learning. The best runs performed 2% and 12% better than the LDR baseline, respectively. In this case, although the deep learning techniques obtained good results, they are more than 5% below the traditional approaches.

Table 5: Accuracy in gender identification in Facebook.

  Ranking  Team         Run  Accuracy
  1        CIC          2    0.9342
  2        CIC          1    0.9211
  3        CIC          5    0.8991
  4        CIC          4    0.8860
  5        BITS Pilani  5    0.8728
  -        LDR          -    0.8596
  6        BITS Pilani  3    0.8509
  7        CIC          3    0.7851
  -        bow          -    0.7632
  8        DUBL         3    0.7588
  9        DUBL         2    0.7544
  10       DUBL         4    0.7500
  11       AmritaNLP    1    0.7456
  12       AmritaNLP    2    0.7237
  13       AmritaNLP    3    0.6228
  14       RBG          2    0.5351
  -        majority     -    0.5000
  15       RBG          3    0.5000
  16       RBG          4    0.5000
  17       RBG          5    0.5000
  18       RBG          1    0.4956
  19       BITS Pilani  2    0.4912

  Min 0.4912   Q1 0.5175   Median 0.7500   Mean 0.7119
  SDev 0.1642  Q3 0.8619   Max 0.9342

4.3 Twitter
The results obtained on the Twitter dataset are shown in Table 6. The two best results (68.25%, 66.50%) have been obtained by the CIC team, with the next result a tie between CIC and BITS Pilani (65.25%). These results are very similar to the one obtained by the LDR baseline (67.59%). The average falls to 57.87%, below the median of 61.12%, due to the low results obtained by most of the runs sent by the RBG team. In this vein, it is noteworthy that the accuracy obtained by the bow baseline (49.37%) is below the majority baseline.

Although the results on the Twitter dataset were expected to be the highest, they are much lower than those obtained on the Facebook dataset. Facebook posts, while maintaining the spontaneity of Twitter, tend to be longer and grammatically richer, with fewer syntactic errors and misspellings; this may be the cause of the increase in accuracy. Furthermore, although the mean is higher, the best result on Twitter (68.25%) is 10% lower than that obtained on the essays dataset (78.38%).

Table 6: Accuracy in gender identification in Twitter.

  Ranking  Team         Run  Accuracy
  1        CIC          3    0.6825
  -        LDR          -    0.6759
  2        CIC          2    0.6650
  3        BITS Pilani  4    0.6525
  4        CIC          1    0.6525
  5        DUBL         3    0.6300
  6        CIC          5    0.6275
  7        DUBL         4    0.6275
  8        AmritaNLP    3    0.6175
  9        DUBL         2    0.6125
  10       AmritaNLP    2    0.6100
  11       CIC          4    0.5975
  12       AmritaNLP    1    0.5700
  13       BITS Pilani  2    0.5400
  14       RBG          2    0.5125
  -        majority     -    0.5000
  15       RBG          4    0.5000
  -        bow          -    0.4937
  16       RBG          1    0.4650
  17       RBG          3    0.4550
  18       RBG          5    0.4000

  Min 0.4000   Q1 0.5194   Median 0.6112   Mean 0.5787
  SDev 0.0815  Q3 0.6294   Max 0.6825
4.4 Reviews
Results on the reviews dataset (Table 7) are lower than on the previous datasets, although with the lowest sparsity: most of the participants obtained results close to the average and the median (52.87% and 52.06%, respectively). As can be observed, these results are very close to the majority class (50%) and the bow baseline (50%), with five runs equal or below them and nine runs less than 5% above. These low results expose the difficulty of the task on this genre when the training data comes from Twitter.

As on the previous datasets, the best results have been achieved by the CIC (61.86% and 59.79%) and BITS Pilani (57.86% and 57.73%) teams, although they are about 4% below the 65.81% obtained by the LDR baseline. Moreover, the best accuracy on reviews is about 7% lower than the best one on Twitter, 17% lower than on essays and 30% lower than on Facebook.

Table 7: Accuracy in gender identification in reviews.

  Ranking  Team         Run  Accuracy
  -        LDR          -    0.6581
  1        CIC          3    0.6186
  2        CIC          1    0.5979
  3        BITS Pilani  5    0.5786
  4        BITS Pilani  4    0.5773
  5        CIC          2    0.5709
  6        AmritaNLP    1    0.5412
  7        AmritaNLP    3    0.5296
  8        CIC          5    0.5258
  9        RBG          2    0.5232
  10       RBG          4    0.5206
  11       AmritaNLP    2    0.5155
  12       BITS Pilani  2    0.5142
  13       CIC          4    0.5116
  14       RBG          3    0.5013
  -        majority     -    0.5000
  -        bow          -    0.5000
  15       RBG          1    0.5000
  16       RBG          5    0.5000
  17       DUBL         3    0.4794
  18       DUBL         2    0.4755
  19       DUBL         4    0.4639

  Min 0.4639   Q1 0.5007   Median 0.5206   Mean 0.5287
  SDev 0.0424  Q3 0.5561   Max 0.6186

4.5 Gender Imitation
In the gender-imitated corpus, the authors were asked to write texts as if they were of the other gender, or obfuscating their style, besides texts without imitation. In Table 8 the results of the gender identification task on this genre are shown. The average and median accuracies obtained by the systems on this dataset are the lowest of all (51.90% and 50%, respectively). Most participants obtained accuracies close to the majority class and the bow baseline: 11 runs with an accuracy equal to or lower than 50%, and 6 runs less than 5% above. Only two runs of the BITS Pilani team obtained a significant improvement, of 13% and 15%, over the majority class. This team combined linguistic rules with deep learning techniques, showing the robustness of these techniques when authors imitate the other gender or obfuscate their style. In this vein, we should highlight that the LDR baseline (55.32%), AmritaNLP (54.26%) and CIC (54.26%), which obtained similar results, performed about 10% worse than the aforementioned deep learning runs.

Table 8: Accuracy in gender identification in gender-imitated texts.

  Ranking  Team         Run  Accuracy
  1        BITS Pilani  5    0.6596
  2        BITS Pilani  3    0.6383
  -        LDR          -    0.5532
  3        AmritaNLP    1    0.5426
  4        CIC          3    0.5426
  5        CIC          1    0.5319
  6        CIC          2    0.5213
  7        CIC          4    0.5213
  8        BITS Pilani  1    0.5106
  -        majority     -    0.5000
  -        bow          -    0.5000
  9        CIC          5    0.5000
  10       DUBL         2    0.5000
  11       DUBL         3    0.5000
  12       DUBL         4    0.5000
  13       RBG          1    0.5000
  14       RBG          3    0.5000
  15       RBG          4    0.5000
  16       RBG          5    0.5000
  17       RBG          2    0.4894
  18       AmritaNLP    2    0.4574
  19       AmritaNLP    3    0.4468

  Min 0.4468   Q1 0.5000   Median 0.5000   Mean 0.5190
  SDev 0.0517  Q3 0.5266   Max 0.6596

4.6 Global Ranking
The global ranking shown in Table 9 has been calculated following Formula (1). It is noteworthy that most participants obtained a weighted accuracy between 47% and 57%, with a median of 54.42%; that is, most of the participants obtained results close to the majority class (50%) and the bow baseline (53.13%). Three runs obtained results much lower than the majority class because they participated only on some of the datasets.

At the top of the ranking, the CIC team obtained the four best results, with accuracies ranging from 58.62% to 64.56%, showing the robustness and homogeneity of their approach. It should be noted, however, that since BITS Pilani ran different systems on the different datasets, a fair comparison has not been possible, even though they obtained one of the best results on each of them. For example, their run 4 obtained 78.38% accuracy on essays (more than 10% above the next result) but was run neither on Facebook nor on the gender-imitated set, where the overall accuracy was lower. It is also worth mentioning that none of the systems outperformed the LDR baseline (71.21%), which performed 6.65% better than the best system.

Table 9: Global ranking by averaging the accuracies on the different datasets, weighted by the size of each dataset.

  Ranking  Team         Run  Accuracy
  -        LDR          -    0.7121
  1        CIC          3    0.6456
  2        CIC          1    0.6435
  3        CIC          2    0.6354
  4        CIC          5    0.5862
  5        AmritaNLP    3    0.5857
  6        AmritaNLP    2    0.5744
  7        AmritaNLP    1    0.5691
  8        DUBL         4    0.5685
  9        CIC          4    0.5675
  10       DUBL         3    0.5605
  11       DUBL         2    0.5546
  12       BITS Pilani  4    0.5337
  -        bow          -    0.5313
  13       RBG          2    0.5145
  14       RBG          4    0.5086
  -        majority     -    0.5000
  15       RBG          1    0.4839
  16       RBG          3    0.4829
  17       RBG          5    0.4706
  18       BITS Pilani  2    0.3881
  19       BITS Pilani  5    0.3790
  20       BITS Pilani  3    0.1344
  21       DUBL         1    0.1065
  22       BITS Pilani  1    0.0236

  Min 0.0236   Q1 0.4737   Median 0.5442   Mean 0.4780
  SDev 0.1740  Q3 0.5731   Max 0.6456
5. CONCLUSION
This paper describes the 22 systems sent by 5 participants to the RusProfiling shared task at PAN-FIRE 2017. Participants submitted a total of 22 runs evaluated on the five different test datasets, yielding 93 per-dataset results with 18-19 runs per dataset. They had to address the identification of the author's gender from a cross-genre perspective: given a training set of Twitter data, the systems have been evaluated on five different sets (essays, Facebook, Twitter, reviews and gender-imitated texts).

Participants used different kinds of approaches, from traditional ones based on hand-crafted features and machine learning techniques such as Support Vector Machines, to the nowadays fashionable deep learning techniques. Which approach performed best depended on the genre: on essays and on the gender-imitated texts, the deep learning approaches improved on the traditional ones by more than 10%.

Contrary to what was expected, the best results have been achieved not on Twitter but on Facebook. The reason may be that, although Facebook maintains the spontaneity of Twitter, its posts tend to be longer and grammatically richer, with fewer syntactic errors and misspellings. On the other hand, almost the worst results have been obtained on reviews. Similar cross-genre effects were also observed at PAN 2014 [8].

In the case of the gender-imitated texts, most systems failed, with 11 runs equal to or below the majority baseline and 6 runs less than 5% above it. Only two systems of BITS Pilani obtained results more than 10% above the baseline. In this more difficult scenario, the deep learning approaches showed their superiority over the traditional ones.

6. ACKNOWLEDGMENTS
The creation of the Gender Imitation Corpus was supported by the Russian Science Foundation, project No. 16-18-10050 "Identifying the Gender and Age of Online Chatters Using Formal Parameters of their Texts". Texts with style obfuscation were collected in the framework of the project "Lie Detection in a Written Text: A Corpus Study" supported by the Russian Foundation for Basic Research, project No. 15-34-01221. The third author acknowledges the SomEMBED TIN2015-71147-C2-1-P MINECO research project.

7. REFERENCES
[1] R. Bhargava, G. Goel, A. Shah, and Y. Sharma. Gender identification in Russian texts. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[2] F. Celli, B. Lepri, J.-I. Biel, D. Gatica-Perez, G. Riccardi, and F. Pianesi. The workshop on computational personality recognition 2014. In Proceedings of the ACM International Conference on Multimedia, pages 1245-1246. ACM, 2014.
[3] B. Ganesh HB, A. Kumar M, and S. KP. Representation of target classes for text classification: Amrita CEN NLP@RusProfiling PAN 2017. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[4] T. Litvinova, D. Gudovskikh, A. Sboev, P. Seredin, O. Litvinova, D. Pisarevskaya, and P. Rosso. Author gender prediction in Russian social media texts. In Conference on Analysis of Images, Social Networks, and Texts (AIST 2017), 2017.
[5] T. Litvinova, O. Litvinova, O. Zagorovskaya, P. Seredin, A. Sboev, and O. Romanchenko. "RusPersonality": A Russian corpus for authorship profiling and deception detection. In Intelligence, Social Media and Web (ISMW FRUCT), 2016 International FRUCT Conference on, pages 1-7. IEEE, 2016.
[6] T. Litvinova, P. Seredin, O. Litvinova, O. Zagorovskaya, A. Sboev, D. Gudovskih, I. Moloshnikov, and R. Rybka. Gender prediction for authors of Russian texts using regression and classification techniques. In CDUD@CLA, pages 44-53, 2016.
[7] I. Markov, H. Gomez-Adorno, G. Sidorov, and A. Gelbukh. The winning approach to cross-genre gender identification in Russian at RusProfiling 2017. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[8] F. Rangel, P. Rosso, I. Chugur, M. Potthast, M. Trenkmann, B. Stein, B. Verhoeven, and W. Daelemans. Overview of the 2nd author profiling task at PAN 2014. In Cappellato L., Ferro N., Halvey M., Kraaij W. (Eds.), CLEF 2014 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1180, 2014.
[9] F. Rangel, P. Rosso, and M. Franco-Salvador. A low dimensionality representation for language variety identification. In 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing). Springer-Verlag, LNCS, arXiv:1705.10754, 2016.
[10] F. Rangel, P. Rosso, M. Koppel, E. Stamatatos, and G. Inches. Overview of the author profiling task at PAN 2013. In Forner P., Navigli R., Tufis D. (Eds.), CLEF 2013 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1179, 2013.
[11] F. Rangel, P. Rosso, M. Potthast, and B. Stein. Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter. In Working Notes Papers of the CLEF 2017 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org, Sept. 2017.
[12] F. Rangel, P. Rosso, M. Potthast, B. Stein, and W. Daelemans. Overview of the 3rd author profiling task at PAN 2015. In Cappellato L., Ferro N., Jones G., San Juan E. (Eds.), CLEF 2015 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings, vol. 1391. CEUR-WS.org, 2015.
[13] F. Rangel, P. Rosso, B. Verhoeven, W. Daelemans, M. Potthast, and B. Stein. Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations. In Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org, Sept. 2016.
[14] D. Rao, D. Yarowsky, A. Shreevats, and M. Gupta. Classifying latent user attributes in Twitter. In Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents, pages 37-44. ACM, 2010.
[15] A. Sboev, T. Litvinova, D. Gudovskikh, R. Rybka, and I. Moloshnikov. Machine learning models of text categorization by author gender using topic-independent features. Procedia Computer Science, 101:135-142, 2016.
[16] A. Sboev, T. Litvinova, I. Voronina, D. Gudovskikh, and R. Rybka. Deep learning network models to categorize texts according to author's gender and to identify text sentiment. In Computational Science and Computational Intelligence (CSCI), 2016 International Conference on, pages 1101-1106. IEEE, 2016.
[17] G. Skitalinskaya, L. Akhtyamova, and J. Cardiff. Cross-genre gender identification in Russian texts using topic modeling: Working note of team DUBL. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[18] V. Vinayan, N. J.R., H. NB, A. Kumar M, and S. K P. AmritaNLP@PAN-RusProfiling: Author profiling using machine learning techniques. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.