=Paper=
{{Paper
|id=Vol-2769/77
|storemode=property
|title=Personal-ITY: A Novel YouTube-based Corpus for Personality Prediction in Italian
|pdfUrl=https://ceur-ws.org/Vol-2769/paper_77.pdf
|volume=Vol-2769
|authors=Elisa Bassignana,Malvina Nissim,Viviana Patti
|dblpUrl=https://dblp.org/rec/conf/clic-it/BassignanaNP20
}}
==Personal-ITY: A Novel YouTube-based Corpus for Personality Prediction in Italian==
Elisa Bassignana, Dipartimento di Informatica, University of Turin, elisa.bassignana@edu.unito.it
Malvina Nissim, CLCG, University of Groningen, m.nissim@rug.nl
Viviana Patti, Dipartimento di Informatica, University of Turin, viviana.patti@unito.it

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

We present a novel corpus for personality prediction in Italian, containing a larger number of authors and a different genre compared to previously available resources. The corpus is built exploiting Distant Supervision, assigning Myers-Briggs Type Indicator (MBTI) labels to YouTube comments, and can lend itself to a variety of experiments. We report on preliminary experiments on Personal-ITY, which can serve as a baseline for future work, showing that some types are easier to predict than others, and discussing the perks of cross-dataset prediction.

1 Introduction

When faced with the same situation, different humans behave differently. This is, of course, due to different backgrounds, education paths, and life experiences, but according to psychologists there is another important aspect: personality (Snyder, 1983; Parks and Guay, 2009).

Human Personality is a psychological construct aimed at explaining the wide variety of human behaviours in terms of a few, stable and measurable individual characteristics (Vinciarelli and Mohammadi, 2014).

Such characteristics are formalised in Trait Models, and there are currently two of these models that are widely adopted: Big Five (John and Srivastava, 1999) and Myers-Briggs Type Indicator (MBTI) (Myers and Myers, 1995). The first examines five dimensions (OPENNESS TO EXPERIENCE, CONSCIENTIOUSNESS, EXTROVERSION, AGREEABLENESS and NEUROTICISM) and for each of them assigns a score in a range. The second one, instead, considers 16 fixed personality types, coming from the combination of the opposite poles of 4 main dimensions (EXTRAVERT-INTROVERT, INTUITIVE-SENSING, FEELING-THINKING, PERCEIVING-JUDGING). Examples of full personality types are therefore four letter labels such as ENTJ or ISFP.

The tests used to detect prevalence of traits include human judgements regarding semantic similarity and relations between adjectives that people use to describe themselves and others. This is because language is believed to be a prime carrier of personality traits (Schwartz et al., 2013). This aspect, together with the progressive increase of available user-generated data on social media, has prompted the task of Personality Detection, i.e., the automatic prediction of personality from written texts (Youyou et al., 2015; Argamon et al., 2009; Litvinova et al., 2016; Whelan and Davies, 2006).

Personality detection can be useful in predicting life outcomes such as substance use, political attitudes and physical health. Other fields of application are marketing, politics and psychological and social assessment.

As a contribution to personality detection in Italian, we present Personal-ITY, a new corpus of YouTube comments annotated with MBTI personality traits, and some preliminary experiments to highlight its characteristics and test its potential. The corpus is made available to the community.¹

¹ https://github.com/elisabassignana/Personal-ITY

2 Related Work

There exist a few datasets annotated for personality traits. For the shared tasks organised within the Workshop on Computational Personality Recognition (Celli et al., 2013), two datasets annotated with the Big Five traits have been released in 2013
(Essays (Pennebaker and King, 2000) and myPersonality²) and two in 2014 (YouTube Personality Dataset (Biel and Gatica-Perez, 2013) and Mobile Phones interactions (Staiano et al., 2012)).

² http://mypersonality.org

For the 2015 PAN Author Profiling Shared Task (Pardo et al., 2015), personality was added to gender and age in the profiling task, with tweets in English, Spanish, Italian and Dutch. These are also annotated according to the Big Five model.

Still in the Big Five landscape, Schwartz et al. (2013) collected a dataset of Facebook comments (700 million words) written by 136,000 users who shared their status updates. Interesting correlations were observed between word usage and personality traits.

Looking at data labelled with the MBTI traits, we find a corpus of 1.2M English tweets annotated with personality and gender (Plank and Hovy, 2015), and the multilingual TwiSty (Verhoeven et al., 2016). The latter is a corpus of data collected from Twitter annotated with MBTI personality labels and gender for six languages (Dutch, German, French, Italian, Portuguese and Spanish) and a total of 18,168 authors. We are interested in the Italian portion of TwiSty.

Table 1 contains an overview of the available Italian corpora labelled with personality traits. We include our own, which is described in Section 3.

Corpus         Model      # user   Avg.
PAN2015        Big Five   38       1,258
TwiSty         MBTI       490      21,343
Personal-ITY   MBTI       1048     10,585

Table 1: Summary of Italian corpora with personality labels. Avg.: average tokens per user.

Regarding detection approaches, Mairesse et al. (2007) tested the usefulness of different sets of textual features, mostly using SVMs.

At the PAN 2015 challenge (see above) a variety of algorithms were tested (such as Random Forests, decision trees, logistic regression for classification, and also various regression models), but overall the most successful participants used SVMs. Regarding features, participants approached the task with combinations of style-based and content-based features, as well as their combination in n-gram models (Pardo et al., 2015).

Experiments on TwiSty were performed by the corpus creators themselves using a LinearSVM with word (1-2) and character (3-4) n-grams. Their results (reported in Table 2 for the Italian portion of the dataset) are obtained through 10-fold cross-validation; the model is compared to a weighted random baseline (WRB) and a majority baseline (MAJ).

Trait   WRB     MAJ     f-score
EI      65.54   77.88   77.78
NS      75.60   85.78   79.21
FT      50.31   53.95   52.13
PJ      50.19   53.05   47.01
Avg     60.41   67.67   64.06

Table 2: TwiSty scores from the original paper. Note that all results are reported as micro-average F-score.

3 Personal-ITY

First, we explain two major choices that we made in creating Personal-ITY, namely the source of the data and the trait model. Second, we describe in detail the procedure we followed to construct the corpus. Lastly, we provide a description of the resulting dataset.

Data. YouTube is the source of data for our corpus. The decision is grounded on the fact that, compared to the more commonly collected tweets, YouTube comments can be longer, so that users are freer to express themselves without limitations. Additionally, there is a substantial amount of available data on the YouTube platform, which is easy to access thanks to the free YouTube APIs.

Trait Model. Our model of choice is the MBTI. The first benefit of this decision is that this model is easy to use in association with a Distant Supervision approach (just checking if a message contains one of the 16 personality types; see Section 3.1). Another benefit is related to the existence of TwiSty. Since both TwiSty and Personal-ITY implement the MBTI model, analyses and experiments over personality detection can be carried out also in a cross-domain setting.

Ethics Statement

Personality profiling must be carefully evaluated from an ethical point of view. In particular, personality detection often involves ethical dilemmas regarding appropriate utilization and interpretations of the prediction outcomes (Weiner and Greene, 2017). Concerns have been raised regarding the inappropriate use of these tests with respect to invasion of privacy, cultural bias and confidentiality (Mehta et al., 2019).

The data included in the Personal-ITY dataset were publicly available on the YouTube platform at the time of the collection. As we will explain in detail in this Section, the information collected consists of comments published under public videos on the YouTube platform by the authors themselves. To better protect user identities, the released corpus mentions only the YouTube usernames of the authors, which are not unique identifiers. The YouTube IDs of the corresponding channels, which are the real identifiers in the platform and would allow tracing the identity of the authors, are not released. Note also that the corpus was created for academic purposes and is not intended to be used for commercial deployment or applications.

3.1 Corpus Creation

The fact that users often self-disclose information about themselves on social media makes it possible to adopt Distant Supervision (DS) for the acquisition of training data. DS is a semi-supervised method that has been abundantly and successfully used in affective computing and profiling to assign silver labels to data on the basis of indicative proxies (Go et al., 2009; Pool and Nissim, 2016; Emmery et al., 2017).

Users left comments on some videos about the MBTI theory in which they stated their own personality type (e.g. Sono ENTJ...chi altro? [en: "I'm ENTJ...anyone else?"]). We exploited such comments to create Personal-ITY with the following procedure.

First, we searched for as many Italian YouTube videos about MBTI as possible, ending up with a selection of ten with a conspicuous number of comments like the ones above.³

Second, we retrieved all the comments to these videos using an AJAX request, and built a list of authors and their associated MBTI label. A label was associated to a user if they included an MBTI combination in one of their comments. Table 3 shows some examples of such associations. The association process is an approximation typical of DS approaches. To assess its validity, we manually checked 300 random comments to see whether the mention of an MBTI label indeed referred to the author's own personality. We found that in 19 cases (6.3%) our method led to a wrong or unsure classification of the user's personality (e.g. O tutti gli INTJ del mondo stanno commentando questo video oppure le statistiche sono sbagliate :-) [en: "Either all the INTJs in the world are commenting on this video or the statistics are wrong :-)"]). We can assume that our dataset might therefore contain about 6-7% of noisy labels.

Comment                                                      User - MBTI label
Io sono ENFJ!!!                                              User1 - ENFJ
[en: "I am ENFJ!!!"]
Ho sempre saputo di essere connessa con Lady Gaga! ISFP!     User2 - ISFP
[en: "I always knew I was connected with Lady Gaga! ISFP!"]

Table 3: Examples of automatic associations user - MBTI personality type.

Using the acquired list of authors, we meant to obtain as many comments as possible written by them. The YouTube API, however, does not allow retrieving all comments by one user on the platform. In order to get around this problem we relied on video similarities, and tried to expand our video collection as much as possible. Therefore, as a third step, we retrieved the list of channels that feature our initial 10 videos, and then all of the videos within those channels.

Fourth, through a second AJAX request, we downloaded all comments appearing below all videos retrieved through the previous step.

Lastly, we filtered all comments, retaining those written by authors included in our original list. This obviously does not cover all comments by a relevant user, but it provided us with additional data per author.

³ Links to the 10 YouTube videos:
https://www.youtube.com/watch?v=VCo9RlDRpz0
https://www.youtube.com/watch?v=N4kC8iqUNyk
https://www.youtube.com/watch?v=Z8S8PgW8t2U
https://www.youtube.com/watch?v=wHZOG8k7nSw
https://www.youtube.com/watch?v=lO2z3_DINqs
https://www.youtube.com/watch?v=NaKPl_y1JXg
https://www.youtube.com/watch?v=8l4o4VBXlGY
https://www.youtube.com/watch?v=GK5J6PLj218
https://www.youtube.com/watch?v=9P95dkVLmps
https://www.youtube.com/watch?v=g0ZIFNgUmoE

3.2 Final Corpus Statistics

For the final dataset, we decided to keep only the authors with a sufficient amount of data. More specifically, we retained only users with at least five comments, each at least five tokens long.

Personal-ITY includes 96,815 comments by 1048 users, each annotated with an MBTI label. The average number of comments per user is 92.
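The labelling step of Section 3.1 and the filter of Section 3.2 can be illustrated with a minimal sketch. It is not the code used to build the corpus: the function names, the case-insensitive whole-token match, the whitespace tokenisation, and the reading of the filter as "at least five comments of at least five tokens each" are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# The 16 MBTI type labels: one pole from each of the four dimensions.
MBTI_TYPES = [a + b + c + d for a in "EI" for b in "NS" for c in "FT" for d in "PJ"]
# Assumption: a type counts only as a standalone token, in any letter case.
MBTI_PATTERN = re.compile(r"\b(" + "|".join(MBTI_TYPES) + r")\b", re.IGNORECASE)


def label_authors(seed_comments):
    """Map each author to the first MBTI type found in their seed comments."""
    labels = {}
    for author, text in seed_comments:
        match = MBTI_PATTERN.search(text)
        if match and author not in labels:
            labels[author] = match.group(1).upper()
    return labels


def filter_corpus(all_comments, labels, min_comments=5, min_tokens=5):
    """Keep labelled authors that have enough sufficiently long comments."""
    by_author = defaultdict(list)
    for author, text in all_comments:
        # Whitespace splitting is a simplification of real tokenisation.
        if author in labels and len(text.split()) >= min_tokens:
            by_author[author].append(text)
    return {a: c for a, c in by_author.items() if len(c) >= min_comments}


# Usage with the two examples from Table 3 plus an unlabelled comment.
seed = [
    ("User1", "Io sono ENFJ!!!"),
    ("User2", "Ho sempre saputo di essere connessa con Lady Gaga! ISFP!"),
    ("User3", "Bel video"),
]
labels = label_authors(seed)
print(labels)
```

A proxy of this kind cannot tell whether the mentioned type is the author's own, which is why a manual check of a sample, as reported in Section 3.1, is needed to estimate label noise.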
Each message has on average 115 tokens.

The distribution of the 16 personality types in the corpus is not uniform. Figure 1 shows this distribution and also compares it with the one in TwiSty. The unbalanced distribution can be due to personality types not being uniformly distributed in the population, and to the fact that different personality types can make different choices about their online presence. Goby (2006), for example, observed that there is a significant correlation between online–offline choices and the MBTI dimension of EXTRAVERT-INTROVERT: extroverts are more likely to opt for offline modes of communication, while online communication is presumably easier for introverts. In Figure 1, we also see that the four most frequent types are introverts in both datasets. The conclusion is that, despite the different biases, collecting linguistic data in this way has the advantage that it reflects actual language use and allows large-scale analysis (Plank and Hovy, 2015).

Figure 1: Distribution of the 16 personality types in the YouTube corpus and in the Italian section of TwiSty.

Figure 2 shows in more detail, trait by trait, the distribution of the opposite poles across the users in Personal-ITY and in TwiSty. As we might have expected, in line with what is observed in Figure 1, the two datasets present very similar trends. Such similarities between Personal-ITY and TwiSty are a further confirmation of the reliability of the data we collected.

Figure 2: Comparison of the distributions of the four MBTI traits between Personal-ITY and the Italian part of TwiSty. Panels: (a) Extravert - Introvert, (b) Sensing - Intuitive, (c) Thinking - Feeling, (d) Judging - Perceiving.

4 Preliminary Experiments

We ran a series of preliminary experiments on Personal-ITY which can also serve as a baseline for future work on this dataset. We pre-processed texts by replacing hashtags, urls, usernames and emojis with four corresponding placeholders. We adopted the sklearn (Pedregosa et al., 2011) implementation of a linear SVM (LinearSVM), with standard parameters. We tested three types of features. At the lexical level, we experimented with word (1-2) and character (3-4) n-grams, both as raw counts and tf-idf weighted. Character n-grams were tested also with a word-boundary option. At a more stylistic level, we considered the use of emojis, hashtags, pronouns, punctuation and capitalisation. Lastly, we also experimented with embeddings-based representations, using, on the one hand, YouTube-specific pre-trained models (Nieuwenhuis and Nissim, 2019) and, on the other hand, more generic embeddings, such as the Italian version of GloVe (Pennington et al., 2014), which is trained on the Italian Wikipedia.⁴ We looked for all the available embeddings of the words written by each author, and used the average as feature. If a word appeared more than once in the string of comments, we considered it multiple times in the final average.

⁴ https://hlt.isti.cnr.it/wordembeddings

We used 10-fold cross-validation, and assessed the models using macro f-score. Note that the original TwiSty paper uses micro f-score. Thus, for the sake of comparison, we also include micro-F in Table 5 for the MAJ baseline and our lexical n-gram model. Table 4 shows the results of our experiments with different feature types.⁵ Overall, lexical features (n-grams) perform best. Combining different feature types did not lead to any improvement. Classification was performed with four separate binary classifiers (one per dimension), and with one single classifier predicting four classes, i.e., the whole MBTI label at once. In the latter case, we observe that the results are quite high considering the increased difficulty of the task.

Trait   MAJ     Lex     Sty     Emb     FL
EI      40.55   51.85   40.46   40.55   51.65
NS      44.34   51.92   44.34   44.34   49.04
FT      35.01   50.67   36.27   35.01   50.86
PJ      29.49   50.53   51.04   47.06   51.03
Avg     37.35   51.24   43.03   41.74   50.65

Table 4: Results of the experiments on Personal-ITY. FL: prediction of the full MBTI label at once, with a character n-gram model.

⁵ In Tables 4–5, we report the highest scores based on averages of the four traits. Considering the dimensions individually, better results can be obtained by using specific models.

Table 5 reports the scores of our models on TwiSty. As for Personal-ITY, best results were achieved using lexical features (tf-idf n-grams); stylistic features and embeddings are just above the baseline. Our model outperforms the one in (Verhoeven et al., 2016) for all traits (micro-F).

        micro F          macro F
Trait   MAJ     Lex      MAJ     Lex     Sty     Emb
EI      77.75   79.18    43.69   55.23   43.69   43.69
NS      85.92   85.92    46.15   46.15   46.15   46.15
FT      53.67   55.31    34.79   52.98   35.34   34.70
PJ      53.06   54.08    34.56   53.01   35.20   34.90
Avg     67.6    68.62    39.80   51.84   40.09   39.86

Table 5: Results of our experiments on TwiSty.

To test compatibility of resources and to assess model portability, we also ran cross-domain experiments on Personal-ITY and TwiSty. In the first setting, we tested the effect of merging the two datasets on the performance of models for personality detection, maintaining the 10-fold cross-validation setting and using the model that performs better on average for YouTube and Twitter data (a character n-grams model). Table 6 contains the results of these experiments.⁶ Scores are almost always lower compared to the in-domain experiments (except for NS with respect to the Twitter scores reported in Table 5: 46.15 → 48.31), but considerably higher than the majority baseline.

Trait   MAJ     Lex
EI      41.64   50.57
NS      44.93   48.31
FT      35.04   51.31
PJ      30.66   48.24
Avg     38.07   49.61

Table 6: Merging Personal-ITY with TwiSty.

⁶ Prediction of the full label at once.

In the second setting, instead, we divided both corpora into fixed training and test sets with a proportion of 80/20 and ran the models using lexical features, in order to run a cross-domain experiment. For direct comparison, we ran the model in-domain again using this split.
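The gap between the micro and macro F columns for the majority baseline can be made concrete with a small pure-Python sketch. The label distribution below is a made-up toy example (78 introverts, 22 extraverts), not corpus data, and the function names are illustrative; the formulas are the standard per-class F1, its unweighted mean (macro), and the pooled score (micro), which for single-label classification equals accuracy.

```python
from collections import Counter


def f1_per_class(gold, pred):
    """Return {label: F1} for every label seen in gold or pred."""
    scores = {}
    for label in set(gold) | set(pred):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores)


def micro_f1(gold, pred):
    """Pooled F1; with one label per item this is the same as accuracy."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test item."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test


# Toy distribution (an assumption for illustration only).
gold = ["I"] * 78 + ["E"] * 22
pred = majority_baseline(gold, len(gold))
print(round(100 * micro_f1(gold, pred), 2))
print(round(100 * macro_f1(gold, pred), 2))
```

Because the minority class receives an F1 of zero under a majority baseline, macro-F roughly halves the score of the majority class, whereas micro-F simply reflects how skewed the class distribution is. This is one way to read the difference between the two MAJ columns in Table 5.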
The results of this second setting are shown in Table 7. Cross-domain scores are obtained with the best in-domain model.⁷ They drop substantially compared to in-domain, but are always above the baseline.

Train   Personal-ITY               TwiSty
Test    IN      CROSS              IN       CROSS
        Pers    MAJ     TwiSty     TwiSty   MAJ     Pers
EI      58.94   44.94   49.33      55.66    44.59   44.59
NS      52.88   47.87   47.31      47.87    45.31   45.31
FT      49.20   37.58   47.09      65.26    39.13   51.04
PJ      54.43   32.41   32.50      56.87    36.56   38.54
Avg     53.86   40.70   44.06      56.42    41.40   44.87

Table 7: Results of the cross-domain experiments. MAJ = baseline on the cross-domain testset.

⁷ Better results can be obtained with other specific models.

5 Conclusions

The experiments show that there is no single best model for personality prediction, as the feature contribution depends on the dimension considered, and on the dataset. Lexical features perform best, but they tend to be strictly related to the context in which the model is trained and so to overfit.

The inherent difficulty of the task itself is confirmed and deserves further investigation, as assigning a definite personality is an extremely subjective and complex task, even for humans.

Personal-ITY is made available to further investigate the above and other issues related to personality detection in Italian. The corpus can lend itself to a psychological analysis of the linguistic cues for the MBTI personality traits. Along this line, it is interesting to investigate the presence of evidence linking linguistic features with psychological theories about the four considered dimensions (EXTRAVERT-INTROVERT, INTUITIVE-SENSING, FEELING-THINKING, PERCEIVING-JUDGING). First results in this direction are presented in (Bassignana et al., 2020).

Acknowledgments

The work of Elisa Bassignana was partially carried out at the University of Groningen within the framework of the Erasmus+ program 2019/20.

References

Shlomo Argamon, Moshe Koppel, James W. Pennebaker, and Jonathan Schler. 2009. Automatically profiling the author of an anonymous text. Commun. ACM, 52(2):119–123, February.

Elisa Bassignana, Malvina Nissim, and Viviana Patti. 2020. Personal-ITY: a YouTube Comments Corpus for Personality Profiling in Italian Social Media. In Viviana Patti, Malvina Nissim, and Barbara Plank, editors, Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, (PEOPLES@COLING 2020). Association for Computational Linguistics.

Joan-Isaac Biel and Daniel Gatica-Perez. 2013. The youtube lens: Crowdsourced personality impressions and audiovisual analysis of vlogs. Multimedia, IEEE Transactions on, 15(1):41–55.

Fabio Celli, Fabio Pianesi, David Stillwell, and Michal Kosinski. 2013. Workshop on computational personality recognition: Shared task. In Seventh International AAAI Conference on Weblogs and Social Media.

Chris Emmery, Grzegorz Chrupała, and Walter Daelemans. 2017. Simple queries as distant labels for predicting gender on Twitter. In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 50–55, Copenhagen, Denmark, September. Association for Computational Linguistics.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. CS224N project report, Stanford, 1(12):2009.

Valerie Goby. 2006. Personality and Online/Offline Choices: MBTI Profiles and Favored Communication Modes in a Singapore Study. Cyberpsychology & behavior : the impact of the Internet, multimedia and virtual reality on behavior and society, 9:5–13, 03.

Oliver P. John and Sanjay Srivastava. 1999. The big five trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin and O. P. John, editors, Handbook of personality: Theory and research, page 102–138. Guilford Press.

Tatiana Litvinova, P. Seredin, Olga Litvinova, and Olga Zagorovskaya. 2016. Profiling a set of personality traits of text author: What our words reveal about us. Research in Language, 14, 12.

François Mairesse, Marilyn A. Walker, Matthias R. Mehl, and Roger K. Moore. 2007. Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research, 30:457–500, sep.

Yash Mehta, Navonil Majumder, Alexander Gelbukh, and Erik Cambria. 2019. Recent trends in deep learning based personality detection. Artificial Intelligence Review, pages 1–27.
I.B. Myers and P.B. Myers. 1995. Gifts Differing: Understanding Personality Type. Mobius.

Moniek Nieuwenhuis and Malvina Nissim. 2019. The Contribution of Embeddings to Sentiment Analysis on YouTube. In Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy, November 13-15, 2019, volume 2481 of CEUR Workshop Proceedings. CEUR-WS.org.

Francisco M. Rangel Pardo, Fabio Celli, Paolo Rosso, Martin Potthast, Benno Stein, and Walter Daelemans. 2015. Overview of the 3rd Author Profiling Task at PAN 2015. In Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum, Toulouse, France, September 8-11, 2015, volume 1391 of CEUR Workshop Proceedings. CEUR-WS.org.

Laura Parks and Russell P Guay. 2009. Personality, values, and motivation. Personality and individual differences, 47(7):675–684.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

James Pennebaker and Laura King. 2000. Linguistic styles: Language use as an individual difference. Journal of personality and social psychology, 77:1296–312, 01.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Barbara Plank and Dirk Hovy. 2015. Personality traits on Twitter—or—How to get 1,500 personality tests in a week. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 92–98, Lisboa, Portugal, September. Association for Computational Linguistics.

Chris Pool and Malvina Nissim. 2016. Distant supervision for emotion detection using Facebook reactions. In Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES), pages 30–39, Osaka, Japan, December. The COLING 2016 Organizing Committee.

H. Andrew Schwartz, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah, Michal Kosinski, David Stillwell, Martin E.P. Seligman, et al. 2013. Personality, gender, and age in the language of social media: The open-vocabulary approach. PloS one, 8(9):e73791.

Mark Snyder. 1983. The influence of individuals on situations: Implications for understanding the links between personality and social behavior. Journal of personality, 51(3):497–516.

Jacopo Staiano, Bruno Lepri, Nadav Aharony, Fabio Pianesi, Nicu Sebe, and Alex Pentland. 2012. Friends don't lie - inferring personality traits from social network structure. In UbiComp'12 - Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pages 321–330, 09.

Ben Verhoeven, Walter Daelemans, and Barbara Plank. 2016. TwiSty: A multilingual Twitter stylometry corpus for gender and personality profiling. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1632–1637, Portorož, Slovenia, May. European Language Resources Association (ELRA).

Alessandro Vinciarelli and Gelareh Mohammadi. 2014. A survey of personality computing. IEEE Transactions on Affective Computing, 5(3):273–291.

Irving B. Weiner and Roger L. Greene, 2017. Ethical Considerations In Personality Assessment, chapter 4, pages 59–74. Wiley.

Susan Whelan and Gary Davies. 2006. Profiling consumers of own brands and national brands using human personality. Journal of Retailing and Consumer Services, 13(6):393–402.

Wu Youyou, Michal Kosinski, and David Stillwell. 2015. Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4):1036–1040.