=Paper=
{{Paper
|id=Vol-2036/T1-1
|storemode=property
|title=Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian
|pdfUrl=https://ceur-ws.org/Vol-2036/T1-1.pdf
|volume=Vol-2036
|authors=Tatiana Litvinova,Francisco Rangel,Paolo Rosso,Pavel Seredin,Olga Litvinova
|dblpUrl=https://dblp.org/rec/conf/fire/LitvinovaPRSL17
}}
==Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian==
Tatiana Litvinova (RusProfiling Lab, Russia) centr_rus_yaz@mail.ru
Francisco Rangel (Autoritas Consulting, Valencia, Spain) francisco.rangel@autoritas.es
Paolo Rosso (PRHLT Research Center, Universitat Politècnica de València, Spain) prosso@dsic.upv.es
Pavel Seredin (RusProfiling Lab & Kurchatov Institute, Russia) paul@phys.vsu.ru
Olga Litvinova (RusProfiling Lab & Kurchatov Institute, Russia) olga_litvinova_teacher@mail.ru
ABSTRACT

Author profiling consists of predicting some of an author's traits (e.g. age, gender, personality) from her writing. After mainly addressing age and gender identification at PAN@CLEF (http://pan.webis.de/), in this RusProfiling PAN@FIRE track we have addressed the problem of predicting an author's gender in Russian from a cross-genre perspective: given a training set of tweets, the systems have been evaluated on five different genres (essays, Facebook, Twitter, reviews, and texts where the authors imitated the other gender or changed their idiostyle). In this paper, we analyse the 22 runs sent by 5 participant teams. The best results (although also the most sparse ones) have been obtained on Facebook.

Keywords
author profiling; gender identification; cross-genre profiling; Russian

1. INTRODUCTION

Author profiling involves predicting an author's demographics, personality traits, education and so on from her writing, with gender identification being the most popular task [10, 8, 12, 13, 11, 2, 5, 6, 15, 16, 4]. Author profiling tasks are popular among participants of PAN (http://pan.webis.de/index.html), a series of scientific events and shared tasks on digital text forensics. Slavic languages, however, are less investigated from an author profiling standpoint and have never been addressed at PAN.

This year at FIRE we have introduced a PAN shared task on Cross-genre Gender Identification in Russian texts (the RusProfiling shared task), where we provided tweets as the training dataset, and Facebook posts, online reviews, texts describing images or letters to a friend, as well as tweets, as test datasets. The focus is thus especially on cross-genre gender profiling.

The rest of this overview paper is structured as follows. In Section 2, we describe the construction of the corpus and the evaluation metrics. In Section 3, participants' approaches are presented, and in Section 4 the obtained results are discussed. Finally, in Section 5 we draw some conclusions.

2. EVALUATION FRAMEWORK

In this section we describe the construction of the corpus, covering its particular properties, challenges and novelties. Moreover, the evaluation measures are described.

2.1 Corpus

In this section, we describe the datasets that have been released for the task and made available to participants through the task web page (http://en.rusprofilinglab.ru/rusprofiling-at-pan/korpus/). We have designed these datasets using both manual and automated techniques.

Twitter dataset: 500 users per gender, split into training (300 users per gender) and test (200 users per gender) datasets. Annotating social media texts is what makes designing such corpora particularly challenging. Some researchers have built Twitter corpora automatically, while others have relied on labor-intensive methods. For example, Rao et al. [14] use a focused search methodology followed by manual annotation to produce a dataset of 500 English users labeled with gender; the gender tag was ascribed based on the screen name, profile picture, self-description ('bio') and, in the few cases where this was not sufficient, the use of gender markings when users refer to themselves. For this corpus we used the same approach, manually labeling each tweet author's gender. Users whose gender could not be established were discarded, and retweets were removed.

The number of tweets per user varied from 1 to 200, depending on how active the user was at the time the data was collected (September 2016). All tweets from one user were merged together and considered as one text. As our analysis suggests, the tweets contain a lot of non-original information (hashtags, hidden citations such as copied newsfeed items, hyperlinks, etc.), which makes them extremely challenging to analyze.
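The per-user merging just described can be sketched in a few lines. Dropping retweets and concatenating each user's remaining tweets into one document follows the text above; stripping hyperlinks and hashtags (flagged in the paper only as non-original noise) is an illustrative extra step, and the `RT @` test and regular expressions are assumptions, not the organisers' actual code.

```python
import re

def merge_user_tweets(tweets):
    """Merge one user's tweets into a single document: drop retweets,
    strip hyperlinks and hashtags (illustrative cleaning; the exact
    rules are not specified in the overview), and concatenate."""
    originals = [t for t in tweets if not t.startswith("RT @")]  # drop retweets
    cleaned = []
    for t in originals:
        t = re.sub(r"https?://\S+", "", t)  # hyperlinks
        t = re.sub(r"#\w+", "", t)          # hashtags
        t = " ".join(t.split())             # collapse leftover whitespace
        if t:
            cleaned.append(t)
    return " ".join(cleaned)
```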
Facebook dataset: 228 users (114 authors per gender) of different age groups (20+, 30+, 40+) from different Russian cities, chosen randomly so as to minimise mutual friendships. We used the same principles for gender labeling as for Twitter. All posts from one user were merged into one text with an average length of 1,000 words.

As for Twitter, Facebook pages of famous people involved in administration or government, and accounts of heads of major companies, were not employed in the study. As our analysis shows, Russian Facebook texts contain less non-original information than tweets.

Essays dataset: 185 authors per gender, one or two texts per author (in case of two texts, they were merged and considered as one text). The texts were taken randomly from the manually collected RusPersonality corpus [5]. RusPersonality is the first Russian-language corpus of written texts labeled with data on their authors; a unique aspect of the corpus is the breadth of its metadata (gender, age, personality, neuropsychological testing data, education level, etc.). The texts were written by respondents especially for this corpus, do not contain any borrowings, and are not edited. The topics were a letter to a friend, a picture description, and a letter to an employer trying to convince her to hire the respondent. The average text length in this dataset was 150 words.

Reviews dataset: 388 authors per gender, one text per author. The texts were collected from Trustpilot (https://ru.trustpilot.com/); the author's gender was identified based on the profile information. The average text length was 80 words.

Gender-imitated dataset: 47 authors per gender, three texts per author, merged together and considered as one text. The texts were randomly selected from the Gender Imitation Corpus we have collected, the first Russian corpus for studies of stylistic deception. Each respondent (n=142) was instructed to write three texts on the same topic (from a list). An example of the task: "Last summer you bought a package tour from a travel agency, but you were not at all pleased with your experience with that company and the trip was not worth the price. You are about to ask for a refund. Write three texts describing your negative experience, providing a detailed account of it. Give a warning that you are intending to sue the company." The first text is supposed to be written in the way usual for the writer (without any deception); the second should be written as if by someone of the opposite gender ("imitation"); the third as if by another individual of the same gender, so that the writer's personal style will not be recognized ("obfuscation"). Most of the texts are 80-150 words long. All of the respondents are students of Russian universities. Besides the texts, the corpus includes metadata with the authors' characteristics: gender, age, native language, handedness, and psychological gender (femininity/masculinity). The corpus therefore provides countless opportunities for investigating problems arising in imitating properties of written speech, as well as gender (biological and psychological) imitation in texts. To the best of our knowledge, this is the first corpus of its kind. Presently, the corpus is being prepared to be made available on the RusProfiling Lab website.

In Table 1 a summary of the number of authors per dataset is shown.

Table 1: Distribution of authors per dataset (half per gender).

  Dataset   Genre            Number of authors
  Training  Twitter          600
  Test      Essays           370
            Facebook         228
            Twitter          400
            Reviews          776
            Gender-imitated   94

2.2 Performance measures

For the evaluation we have used accuracy, following the author profiling tasks at PAN. In the RusProfiling shared task, the accuracy per dataset is the number of authors whose gender is correctly identified, divided by the total number of authors in that dataset. The global ranking has been obtained by calculating the average accuracy over all the datasets, weighted by the number of documents in each dataset:

  global_acc = ( Σ_ds accuracy(ds) × size(ds) ) / Σ_ds size(ds)    (1)

2.3 Baselines

To understand the complexity of the task per genre, and with the aim of comparing the performance of the participants' approaches, we propose the following baselines, as we did at PAN at CLEF in 2017 [11]:

- majority. A statistical baseline that emulates random choice. The baseline depends on the number of classes: two in the case of gender identification.

- bow. This method represents documents as a bag-of-words over the 5,000 most common words in the training set, weighted by absolute frequency of occurrence, and uses an SVM as the machine learning algorithm. The texts are preprocessed as follows: words are lowercased, punctuation signs and numbers are removed, and stop words for the corresponding language are removed.

- LDR [9]. This method represents documents on the basis of the probability distribution of occurrence of their words in the different classes. The key concept of LDR is a weight representing the probability of a term belonging to one of the different categories (e.g. female vs. male). The distribution of weights for a given document should be closer to the weights of its corresponding category. LDR takes advantage of the whole vocabulary.
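The bow baseline can be made concrete with a short sketch. The preprocessing and vocabulary selection follow the description above (lowercasing, removal of punctuation, numbers and stop words, the 5,000 most common training words, absolute frequencies); the tiny stop-word set is a placeholder for a full Russian list, and the resulting vectors would then be fed to an SVM, whose implementation the overview does not specify.

```python
from collections import Counter
import re

# Placeholder stop words; the actual baseline would use a full Russian list.
STOPWORDS = {"и", "в", "не", "на", "я"}

def preprocess(text):
    """Lowercase, keep only alphabetic tokens (dropping punctuation
    and numbers), and remove stop words, as the bow baseline prescribes."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_vocabulary(train_texts, size=5000):
    """Select the `size` most common words in the training set."""
    counts = Counter(t for text in train_texts for t in preprocess(text))
    return [w for w, _ in counts.most_common(size)]

def vectorize(text, vocab):
    """Represent a document by absolute term frequencies over the vocabulary."""
    counts = Counter(preprocess(text))
    return [counts[w] for w in vocab]

# The resulting vectors are then used to train an SVM classifier
# (e.g. scikit-learn's LinearSVC) on the gender labels.
```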
3. OVERVIEW OF THE SUBMITTED APPROACHES

In the following, we briefly describe the systems submitted by the five participants of the task from three perspectives: preprocessing, features to represent the authors' texts, and classification approaches. In Table 2 the teams and the corresponding references are presented.

Table 2: Participating teams and their references.

  Team         Reference
  AmritaNLP    [18]
  BITS Pilani  [1]
  CIC          [7]
  DUBL         [17]
  RBG          [3]

Preprocessing. Preprocessing was carried out to obtain plain text [1]. Various participants removed stopwords [1, 17], short words [17] and Twitter-specific elements (user mentions, hashtags and links) [1, 17]. Some of them also removed punctuation marks [7, 1] as well as numbers [1], and the authors in [7] removed non-Cyrillic characters. Finally, lemmatisation was performed by the authors in [17].

Features. Traditionally, author profiling tasks have been approached with content- and style-based features. In this vein, the authors in [18] extracted features such as the number of user mentions, hashtags and urls, emoticons, punctuation marks, and average word length, combined with a tf-idf bag-of-words. Similarly, the authors in [7] combined different kinds of features in their systems, such as word and character n-grams, the words most frequently used per gender, and linguistic patterns such as word endings or the use of first person singular pronouns within a given distance of a past tense verb. The latter linguistic rule was combined with deep learning techniques in [1]. Finally, the authors in [17] performed topic modelling, and the authors in [3] developed a representation scheme based on the texts belonging to the corresponding target classes.

Classification approaches. Traditional features have been used with machine learning methods such as Support Vector Machines (SVM) [18, 7, 3], Random Forest [18] and AdaBoost [18]. The authors in [17] used Additive Regularization for Topic Modelling. Finally, the authors in [1], who combined a rule-based approach with deep learning, used variations of Long Short-Term Memory networks.

4. EVALUATION AND DISCUSSION OF THE SUBMITTED APPROACHES

Due to the cross-genre perspective of the task, five datasets were provided. Five teams submitted a total of 22 runs, whose distribution per dataset is shown in Table 3. As can be seen, a total of 93 runs have been analysed, with 18-19 runs per dataset.

Table 3: Number of participants' runs per dataset.

  Dataset   Number of runs
  Essays    18
  Facebook  19
  Twitter   18
  Reviews   19
  Imitated  19
  Total     93

The distribution of the results per dataset is shown in Figure 1. Noteworthy is the high accuracy obtained on Facebook, with a median value of about 75% and the highest value over 90%. However, results on this genre are also the most sparse ones, with a standard deviation of 0.16. On the other hand, results on the gender-imitated corpus are the lowest, with most of the participants obtaining accuracies close to 50%, which corresponds to the majority class baseline. However, two participants obtained results of about 65%. In the following subsections we analyse the results per dataset in more depth.

Figure 1: Distribution of results for gender identification in the different datasets.

4.1 Essays

Results on the essays dataset (Table 4) show an average accuracy of 55.39%, a median of 54.86% and a total of seven runs below the majority class and bow baselines. Apart from these low results, there are four runs improving on this baseline by more than 10%, with accuracies between 60.27% and 78.38%.

The best result (78.38%) has been obtained by Bits Pilani, who combined linguistic rules with deep learning techniques. The second best result (68.11%) has been obtained by AmritaNLP, who used stylistic features with traditional machine learning algorithms. As can be seen, the first result is more than 10% higher than the second one, and about 23% higher than the average, showing the power of deep learning in this task when training on Twitter and evaluating on essays. However, none of these systems overcame the LDR baseline (81.41%), which performed 3% and 13% better, respectively.
Table 4: Accuracy in gender identification in essays.

  Ranking  Team         Run  Accuracy
           LDR               0.8141
  1        Bits Pilani  4    0.7838
  2        AmritaNLP    3    0.6811
  3        dubl         4    0.6297
  4        CIC          3    0.6027
  5        AmritaNLP    2    0.5973
  6        CIC          1    0.5865
  7        CIC          2    0.5838
  8        dubl         1    0.5486
  9        dubl         2    0.5486
  10       dubl         3    0.5486
  11       AmritaNLP    1    0.5243
           bow               0.5027
           majority          0.5000
  12       RBG          4    0.5000
  13       CIC          5    0.4973
  14       RBG          2    0.4919
  15       CIC          4    0.4676
  16       RBG          1    0.4595
  17       RBG          3    0.4595
  18       RBG          5    0.4595

  Min 0.4595; Q1 0.4933; Median 0.5486; Mean 0.5539; SDev 0.0861; Q3 0.5946; Max 0.7838

4.2 Facebook

In Table 5 the results on the Facebook dataset are shown. The average (71.19%), the median (75%), the third quartile (86.19%) and the best value (93.42%) are the highest of all the datasets; indeed, they are even higher than those obtained on the Twitter dataset (shown in Table 6). However, the systems behaved heterogeneously, producing the most sparse results, with an inter-quartile range of 34.44%. The reason is five runs equal to or below the majority baseline, plus another run from the same participant very close to 50%. Furthermore, 12 systems performed worse than the bow baseline, which obtained an accuracy of 76.32%, higher even than the mean (71.19%) and the median (75%).

The four best results have been obtained by CIC, who trained SVMs with combinations of n-grams and linguistic rules, among others. The fifth and sixth best results have been obtained by BITS Pilani with linguistic rules combined with deep learning. The best runs outperformed the LDR baseline by 2% and 12%, respectively. In this case, although the deep learning techniques obtained good results, they are more than 5% below the traditional approaches.

Table 5: Accuracy in gender identification in Facebook.

  Ranking  Team         Run  Accuracy
  1        CIC          2    0.9342
  2        CIC          1    0.9211
  3        CIC          5    0.8991
  4        CIC          4    0.8860
  5        Bits Pilani  5    0.8728
           LDR               0.8596
  6        Bits Pilani  3    0.8509
  7        CIC          3    0.7851
           bow               0.7632
  8        dubl         3    0.7588
  9        dubl         2    0.7544
  10       dubl         4    0.7500
  11       AmritaNLP    1    0.7456
  12       AmritaNLP    2    0.7237
  13       AmritaNLP    3    0.6228
  14       RBG          2    0.5351
           majority          0.5000
  15       RBG          3    0.5000
  16       RBG          4    0.5000
  17       RBG          5    0.5000
  18       RBG          1    0.4956
  19       Bits Pilani  2    0.4912

  Min 0.4912; Q1 0.5175; Median 0.7500; Mean 0.7119; SDev 0.1642; Q3 0.8619; Max 0.9342

4.3 Twitter

The results obtained on the Twitter dataset are shown in Table 6. The two best results (68.25%, 66.50%) have been obtained by the CIC team, with the next result a tie between BITS Pilani and CIC (65.25%). These results are very similar to the one obtained by the LDR baseline (67.59%). The average falls to 57.87%, below the median of 61.12%, due to the low results obtained by most of the runs sent by the RBG team. In this vein, it is noteworthy that the bow baseline (49.37%) scored below the majority baseline.

Although the results on the Twitter dataset were expected to be the highest ones, they are much lower than those obtained on the Facebook dataset. Facebook posts, besides maintaining the spontaneity of Twitter, tend to be longer and grammatically richer, with fewer syntactic errors and misspellings; this may be the cause of the higher accuracy. Furthermore, although the mean is higher, the best result on Twitter (68.25%) is 10% lower than the one obtained on the essays dataset (78.38%).

Table 6: Accuracy in gender identification in Twitter.

  Ranking  Team         Run  Accuracy
  1        CIC          3    0.6825
           LDR               0.6759
  2        CIC          2    0.6650
  3        Bits Pilani  4    0.6525
  4        CIC          1    0.6525
  5        dubl         3    0.6300
  6        CIC          5    0.6275
  7        dubl         4    0.6275
  8        AmritaNLP    3    0.6175
  9        dubl         2    0.6125
  10       AmritaNLP    2    0.6100
  11       CIC          4    0.5975
  12       AmritaNLP    1    0.5700
  13       Bits Pilani  2    0.5400
  14       RBG          2    0.5125
           majority          0.5000
  15       RBG          4    0.5000
           bow               0.4937
  16       RBG          1    0.4650
  17       RBG          3    0.4550
  18       RBG          5    0.4000

  Min 0.4000; Q1 0.5194; Median 0.6112; Mean 0.5787; SDev 0.0815; Q3 0.6294; Max 0.6825

4.4 Reviews

Results on the reviews dataset (Table 7) are lower than on the previous datasets, although with the lowest sparsity: most of the participants obtained results close to the average and median (52.87% and 52.06%, respectively). As can be observed, these results are very close to the majority class (50%) and the bow baseline (50%), with five runs equal to or below these baselines and nine runs with less than 5% of improvement. These low results expose the difficulty of the task on this genre when the training data comes from Twitter.

As on the previous datasets, the best results have been achieved by the CIC (61.86% and 59.79%) and Bits Pilani (57.86% and 57.73%) teams, although about 4% lower than the 65.81% obtained by the LDR baseline. Moreover, the gap between the best result on reviews and the best result on the other genres is more than 7% in the case of Twitter, 17% in the case of essays and 30% in the case of Facebook.

Table 7: Accuracy in gender identification in reviews.

  Ranking  Team         Run  Accuracy
           LDR               0.6581
  1        CIC          3    0.6186
  2        CIC          1    0.5979
  3        Bits Pilani  5    0.5786
  4        Bits Pilani  4    0.5773
  5        CIC          2    0.5709
  6        AmritaNLP    1    0.5412
  7        AmritaNLP    3    0.5296
  8        CIC          5    0.5258
  9        RBG          2    0.5232
  10       RBG          4    0.5206
  11       AmritaNLP    2    0.5155
  12       Bits Pilani  2    0.5142
  13       CIC          4    0.5116
  14       RBG          3    0.5013
           majority          0.5000
           bow               0.5000
  15       RBG          1    0.5000
  16       RBG          5    0.5000
  17       dubl         3    0.4794
  18       dubl         2    0.4755
  19       dubl         4    0.4639

  Min 0.4639; Q1 0.5007; Median 0.5206; Mean 0.5287; SDev 0.0424; Q3 0.5561; Max 0.6186

4.5 Gender Imitation

In the gender-imitated corpus, the authors were asked to write texts as if they were of the other gender, or obfuscating their style, besides texts without imitation. In Table 8 the results of the gender identification task on this genre are shown. The average and median accuracies obtained by the systems on this dataset are the lowest (51.90% and 50%, respectively). Most participants obtained accuracies close to the majority class and the bow baseline: 11 runs with an accuracy equal to or lower than 50%, and 6 runs with less than 5% of improvement. Only two runs of the Bits Pilani team obtained a significant improvement, of 13% and 15%, over the majority class. This team combined linguistic rules with deep learning techniques, showing the robustness of these techniques when the authors imitate the other gender or obfuscate their style. In this vein, we should highlight that the LDR baseline (55.32%), AmritaNLP (54.26%) and CIC (54.26%), which obtained similar results, performed about 10% worse than the aforementioned deep learning techniques.

Table 8: Accuracy in gender identification in gender-imitated texts.

  Ranking  Team         Run  Accuracy
  1        Bits Pilani  5    0.6596
  2        Bits Pilani  3    0.6383
           LDR               0.5532
  3        AmritaNLP    1    0.5426
  4        CIC          3    0.5426
  5        CIC          1    0.5319
  6        CIC          2    0.5213
  7        CIC          4    0.5213
  8        Bits Pilani  1    0.5106
           majority          0.5000
           bow               0.5000
  9        CIC          5    0.5000
  10       dubl         2    0.5000
  11       dubl         3    0.5000
  12       dubl         4    0.5000
  13       RBG          1    0.5000
  14       RBG          3    0.5000
  15       RBG          4    0.5000
  16       RBG          5    0.5000
  17       RBG          2    0.4894
  18       AmritaNLP    2    0.4574
  19       AmritaNLP    3    0.4468

  Min 0.4468; Q1 0.5000; Median 0.5000; Mean 0.5190; SDev 0.0517; Q3 0.5266; Max 0.6596

4.6 Global Ranking

The global ranking shown in Table 9 has been calculated following Formula 1. It is noteworthy that most participants obtained a weighted accuracy between 47% and 57%, with a median of 54.42%. This means that most of the participants obtained results close to the majority class (50%) and the bow baseline (53.13%). There are also three runs that obtained results much lower than the majority class because they participated only on some of the datasets.
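The weighted average of Formula 1 can be reproduced directly from the per-dataset tables. A minimal sketch, using the test-set sizes from Table 1 and, as a worked example, the accuracies of CIC's run 3 taken from Tables 4-8:

```python
# Test-set sizes from Table 1 and the per-dataset accuracies of
# CIC's run 3 from Tables 4-8.
sizes = {"essays": 370, "facebook": 228, "twitter": 400,
         "reviews": 776, "gender-imitated": 94}
cic_run3 = {"essays": 0.6027, "facebook": 0.7851, "twitter": 0.6825,
            "reviews": 0.6186, "gender-imitated": 0.5426}

def global_accuracy(accuracy, size):
    """Formula 1: per-dataset accuracies averaged, weighted by dataset size."""
    return sum(accuracy[ds] * size[ds] for ds in accuracy) / sum(size.values())

# Reproduces the 0.6456 reported for CIC run 3 at the top of the
# global ranking (Table 9).
print(round(global_accuracy(cic_run3, sizes), 4))  # 0.6456
```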
At the top of the ranking, the CIC team obtained the four best results, with accuracies ranging from 58.62% to 64.56%, showing the robustness and homogeneity of their approach. However, it should be highlighted that, since Bits Pilani ran different systems on the different datasets, a fair comparison has not been possible, even though they obtained some of the best results on each of them. For example, their run 4 obtained 78.38% accuracy on essays (more than 10% above the next result) but was run neither on the Facebook nor on the gender-imitated set, where its overall accuracy was therefore lower. It is worth mentioning that none of the systems outperformed the LDR baseline (71.21%), which performed 6.65% better than the best system.

Table 9: Global ranking by averaging the accuracies on the different datasets, weighted by dataset size.

  Ranking  Team         Run  Accuracy
           LDR               0.7121
  1        CIC          3    0.6456
  2        CIC          1    0.6435
  3        CIC          2    0.6354
  4        CIC          5    0.5862
  5        AmritaNLP    3    0.5857
  6        AmritaNLP    2    0.5744
  7        AmritaNLP    1    0.5691
  8        dubl         4    0.5685
  9        CIC          4    0.5675
  10       dubl         3    0.5605
  11       dubl         2    0.5546
  12       Bits Pilani  4    0.5337
           bow               0.5313
  13       RBG          2    0.5145
  14       RBG          4    0.5086
           majority          0.5000
  15       RBG          1    0.4839
  16       RBG          3    0.4829
  17       RBG          5    0.4706
  18       Bits Pilani  2    0.3881
  19       Bits Pilani  5    0.3790
  20       Bits Pilani  3    0.1344
  21       dubl         1    0.1065
  22       Bits Pilani  1    0.0236

  Min 0.0236; Q1 0.4737; Median 0.5442; Mean 0.4780; SDev 0.1740; Q3 0.5731; Max 0.6456

5. CONCLUSION

This paper describes the 22 systems sent by 5 participants to the RusProfiling shared task at PAN-FIRE 2017. Participants submitted a total of 93 runs on the five different datasets, with 18-19 runs per dataset. They had to address the identification of the author's gender from a cross-genre perspective: given a training set of Twitter data, the systems have been evaluated on five different sets (essays, Facebook, Twitter, reviews and gender-imitated texts).

Participants have used different kinds of approaches, from traditional ones based on hand-crafted features and machine learning techniques such as Support Vector Machines, to the nowadays fashionable deep learning techniques. Depending on the genre, the deep learning approaches performed best, as in the case of the essays and the gender-imitated texts, where they obtained more than 10% of improvement over the traditional ones.

Contrary to what was expected, the best results have not been achieved on Twitter but on Facebook. The reason may be that, although Facebook maintains the spontaneity of Twitter, its posts tend to be longer and grammatically richer, with fewer syntactic errors and misspellings. On the other hand, almost the worst results have been obtained on reviews. Similar cross-genre effects were also observed at PAN 2014 [8].

In the case of the gender-imitated texts, most systems failed, with 11 runs equal to or below the majority baseline and 6 runs with less than 5% of improvement. Only two systems of Bits Pilani obtained results with more than 10% of improvement over the baseline. In this more difficult scenario, the deep learning approaches showed their superiority over the traditional ones.

6. ACKNOWLEDGMENTS

This work was supported, in the part concerning the creation of the Gender Imitation Corpus, by the Russian Science Foundation, project No. 16-18-10050 "Identifying the Gender and Age of Online Chatters Using Formal Parameters of their Texts". Texts with style obfuscation were collected in the framework of the project "Lie Detection in a Written Text: A Corpus Study", supported by the Russian Foundation for Basic Research, project No. 15-34-01221. The third author acknowledges the SomEMBED TIN2015-71147-C2-1-P MINECO research project.

7. REFERENCES

[1] R. Bhargava, G. Goel, A. Shah, and Y. Sharma. Gender identification in Russian texts. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[2] F. Celli, B. Lepri, J.-I. Biel, D. Gatica-Perez, G. Riccardi, and F. Pianesi. The workshop on computational personality recognition 2014. In Proceedings of the ACM International Conference on Multimedia, pages 1245-1246. ACM, 2014.
[3] B. Ganesh HB, A. Kumar M, and S. KP. Representation of target classes for text classification - Amrita CEN NLP@RusProfiling PAN 2017. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[4] T. Litvinova, D. Gudovskikh, A. Sboev, P. Seredin, O. Litvinova, D. Pisarevskaya, and P. Rosso. Author gender prediction in Russian social media texts. In Conf. on Analysis of Images, Social Networks, and Texts, AIST-2017.
[5] T. Litvinova, O. Litvinova, O. Zagorovskaya, P. Seredin, A. Sboev, and O. Romanchenko. "RusPersonality": A Russian corpus for authorship profiling and deception detection. In Intelligence, Social Media and Web (ISMW FRUCT), 2016 International FRUCT Conference on, pages 1-7. IEEE, 2016.
[6] T. Litvinova, P. Seredin, O. Litvinova, O. Zagorovskaya, A. Sboev, D. Gudovskih, I. Moloshnikov, and R. Rybka. Gender prediction for authors of Russian texts using regression and classification techniques. In CDUD@CLA, pages 44-53, 2016.
[7] I. Markov, H. Gomez-Adorno, G. Sidorov, and A. Gelbukh. The winning approach to cross-genre gender identification in Russian at RusProfiling 2017. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[8] F. Rangel, P. Rosso, I. Chugur, M. Potthast, M. Trenkmann, B. Stein, B. Verhoeven, and W. Daelemans. Overview of the 2nd author profiling task at PAN 2014. In Cappellato L., Ferro N., Halvey M., Kraaij W. (Eds.), CLEF 2014 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1180, 2014.
[9] F. Rangel, P. Rosso, and M. Franco-Salvador. A low dimensionality representation for language variety identification. In 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing. Springer-Verlag, LNCS, arXiv:1705.10754, 2016.
[10] F. Rangel, P. Rosso, M. Koppel, E. Stamatatos, and G. Inches. Overview of the author profiling task at PAN 2013. In Forner P., Navigli R., Tufis D. (Eds.), CLEF 2013 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1179, 2013.
[11] F. Rangel, P. Rosso, M. Potthast, and B. Stein. Overview of the 5th Author Profiling Task at PAN 2017: Gender and language variety identification in Twitter. In Working Notes Papers of the CLEF 2017 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org, Sept. 2017.
[12] F. Rangel, P. Rosso, M. Potthast, B. Stein, and W. Daelemans. Overview of the 3rd author profiling task at PAN 2015. In Cappellato L., Ferro N., Jones G., San Juan E. (Eds.), CLEF 2015 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 1391, 2015.
[13] F. Rangel, P. Rosso, B. Verhoeven, W. Daelemans, M. Potthast, and B. Stein. Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations. In Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org, Sept. 2016.
[14] D. Rao, D. Yarowsky, A. Shreevats, and M. Gupta. Classifying latent user attributes in Twitter. In Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents, pages 37-44. ACM, 2010.
[15] A. Sboev, T. Litvinova, D. Gudovskikh, R. Rybka, and I. Moloshnikov. Machine learning models of text categorization by author gender using topic-independent features. Procedia Computer Science, 101:135-142, 2016.
[16] A. Sboev, T. Litvinova, I. Voronina, D. Gudovskikh, and R. Rybka. Deep learning network models to categorize texts according to author's gender and to identify text sentiment. In Computational Science and Computational Intelligence (CSCI), 2016 International Conference on, pages 1101-1106. IEEE, 2016.
[17] G. Skitalinskaya, L. Akhtyamova, and J. Cardiff. Cross-genre gender identification in Russian texts using topic modeling: Team DUBL working note. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.
[18] V. Vinayan, N. J.R., H. NB, A. Kumar M, and S. K P. AmritaNLP@PAN-RusProfiling: Author profiling using machine learning techniques. In Working Notes for PAN-RusProfiling at FIRE'17, Workshop Proceedings of the 9th International Forum for Information Retrieval Evaluation (FIRE'17), Bangalore, India. CEUR-WS.org, 2017.