Personal-ITY: A Novel YouTube-based Corpus for Personality Prediction in Italian

Elisa Bassignana, Dipartimento di Informatica, University of Turin, elisa.bassignana@edu.unito.it
Malvina Nissim, CLCG, University of Groningen, m.nissim@rug.nl
Viviana Patti, Dipartimento di Informatica, University of Turin, viviana.patti@unito.it

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

We present a novel corpus for personality prediction in Italian, containing a larger number of authors and a different genre compared to previously available resources. The corpus is built exploiting Distant Supervision, assigning Myers-Briggs Type Indicator (MBTI) labels to YouTube comments, and can lend itself to a variety of experiments. We report on preliminary experiments on Personal-ITY, which can serve as a baseline for future work, showing that some types are easier to predict than others, and discussing the perks of cross-dataset prediction.

1 Introduction

When faced with the same situation, different humans behave differently. This is, of course, due to different backgrounds, education paths, and life experiences, but according to psychologists there is another important aspect: personality (Snyder, 1983; Parks and Guay, 2009).

Human Personality is a psychological construct aimed at explaining the wide variety of human behaviours in terms of a few stable and measurable individual characteristics (Vinciarelli and Mohammadi, 2014).

Such characteristics are formalised in Trait Models, and there are currently two of these models that are widely adopted: Big Five (John and Srivastava, 1999) and Myers-Briggs Type Indicator (MBTI) (Myers and Myers, 1995). The first examines five dimensions (OPENNESS TO EXPERIENCE, CONSCIENTIOUSNESS, EXTROVERSION, AGREEABLENESS and NEUROTICISM) and assigns a score in a range to each of them. The second one, instead, considers 16 fixed personality types, coming from the combination of the opposite poles of 4 main dimensions (EXTRAVERT-INTROVERT, INTUITIVE-SENSING, FEELING-THINKING, PERCEIVING-JUDGING). Examples of full personality types are therefore four-letter labels such as ENTJ or ISFP.

The tests used to detect prevalence of traits include human judgements regarding semantic similarity and relations between adjectives that people use to describe themselves and others. This is because language is believed to be a prime carrier of personality traits (Schwartz et al., 2013). This aspect, together with the progressive increase of available user-generated data on social media, has prompted the task of Personality Detection, i.e., the automatic prediction of personality from written texts (Youyou et al., 2015; Argamon et al., 2009; Litvinova et al., 2016; Whelan and Davies, 2006).

Personality detection can be useful in predicting life outcomes such as substance use, political attitudes and physical health. Other fields of application are marketing, politics and psychological and social assessment.

As a contribution to personality detection in Italian, we present Personal-ITY, a new corpus of YouTube comments annotated with MBTI personality traits, and some preliminary experiments to highlight its characteristics and test its potential. The corpus is made available to the community.¹

¹ https://github.com/elisabassignana/Personal-ITY

2 Related Work

There exist a few datasets annotated for personality traits. For the shared tasks organised within the Workshop on Computational Personality Recognition (Celli et al., 2013), two datasets annotated with the Big Five traits were released in 2013 (Essays (Pennebaker and King, 2000) and myPersonality²) and two in 2014 (YouTube Personality Dataset (Biel and Gatica-Perez, 2013) and Mobile Phones interactions (Staiano et al., 2012)).

For the 2015 PAN Author Profiling Shared Task (Pardo et al., 2015), personality was added to gender and age in the profiling task, with tweets in English, Spanish, Italian and Dutch. These are also annotated according to the Big Five model.

Still in the Big Five landscape, Schwartz et al. (2013) collected a dataset of Facebook comments (700 million words) written by 136,000 users who shared their status updates. Interesting correlations were observed between word usage and personality traits.

Looking at data labelled with the MBTI traits, we find a corpus of 1.2M English tweets annotated with personality and gender (Plank and Hovy, 2015), and the multilingual TwiSty (Verhoeven et al., 2016). The latter is a corpus of data collected from Twitter annotated with MBTI personality labels and gender for six languages (Dutch, German, French, Italian, Portuguese and Spanish) and a total of 18,168 authors. We are interested in the Italian portion of TwiSty.

Table 1 contains an overview of the available Italian corpora labelled with personality traits. We include our own, which is described in Section 3.

Corpus         Model      # users   Avg.
PAN2015        Big Five        38     1258
TwiSty         MBTI           490   21,343
Personal-ITY   MBTI          1048   10,585

Table 1: Summary of Italian corpora with personality labels. Avg.: average tokens per user.

Regarding detection approaches, Mairesse et al. (2007) tested the usefulness of different sets of textual features, mostly making use of SVMs.

At the PAN 2015 challenge (see above) a variety of algorithms were tested (such as Random Forests, decision trees, logistic regression for classification, and also various regression models), but overall most successful participants used SVMs. Regarding features, participants approached the task with combinations of style-based and content-based features, as well as their combination in n-gram models (Pardo et al., 2015).

Experiments on TwiSty were performed by the corpus creators themselves using a linear SVM with word (1-2) and character (3-4) n-grams. Their results (reported in Table 2 for the Italian portion of the dataset) are obtained through 10-fold cross-validation; the model is compared to a weighted random baseline (WRB) and a majority baseline (MAJ).

Trait   WRB     MAJ     f-score
EI      65.54   77.88   77.78
NS      75.60   85.78   79.21
FT      50.31   53.95   52.13
PJ      50.19   53.05   47.01
Avg     60.41   67.67   64.06

Table 2: TwiSty scores from the original paper. Note that all results are reported as micro-average F-score.

3 Personal-ITY

First, we explain two major choices that we made in creating Personal-ITY, namely the source of the data and the trait model. Second, we describe in detail the procedure we followed to construct the corpus. Lastly, we provide a description of the resulting dataset.

Data. YouTube is the source of data for our corpus. The decision is grounded on the fact that, compared to the more commonly collected tweets, YouTube comments can be longer, so that users are freer to express themselves without limitations. Additionally, there is a substantial amount of available data on the YouTube platform, which is easy to access thanks to the free YouTube APIs.

Trait Model. Our model of choice is the MBTI. The first benefit of this decision is that this model is easy to use in association with a Distant Supervision approach (just checking if a message contains one of the 16 personality types; see Section 3.1). Another benefit is related to the existence of TwiSty. Since both TwiSty and Personal-ITY implement the MBTI model, analyses and experiments over personality detection can be carried out also in a cross-domain setting.

Ethics Statement

Personality profiling must be carefully evaluated from an ethical point of view.
In particular, personality detection often involves ethical dilemmas regarding appropriate utilization and interpretations of the prediction outcomes (Weiner and Greene, 2017). Concerns have been raised regarding the inappropriate use of these tests with respect to invasion of privacy, cultural bias and confidentiality (Mehta et al., 2019).

The data included in the Personal-ITY dataset were publicly available on the YouTube platform at the time of the collection. As we will explain in detail in this Section, the information collected consists of comments published under public videos on the YouTube platform by the authors themselves. To better protect user identities, the released corpus mentions only the YouTube usernames of the authors, which are not unique identifiers. The YouTube IDs of the corresponding channels, which are the real identifiers on the platform and would allow tracing the identity of the authors, are not released. Note also that the corpus was created for academic purposes and is not intended to be used for commercial deployment or applications.

² http://mypersonality.org

3.1 Corpus Creation

The fact that users often self-disclose information about themselves on social media makes it possible to adopt Distant Supervision (DS) for the acquisition of training data. DS is a semi-supervised method that has been abundantly and successfully used in affective computing and profiling to assign silver labels to data on the basis of indicative proxies (Go et al., 2009; Pool and Nissim, 2016; Emmery et al., 2017).

Users left comments on some videos about the MBTI theory in which they stated their own personality type (e.g. Sono ENTJ...chi altro? [en: "I'm ENTJ...anyone else?"]). We exploited such comments to create Personal-ITY with the following procedure.

First, we searched for as many Italian YouTube videos about MBTI as possible, ending up with a selection of ten videos with a conspicuous number of comments like the ones above.³

Second, we retrieved all the comments to these videos using an AJAX request, and built a list of authors and their associated MBTI label. A label was associated to a user if they included an MBTI combination in one of their comments. Table 3 shows some examples of such associations. The association process is an approximation typical of DS approaches. To assess its validity, we manually checked 300 random comments to see whether the mention of an MBTI label indeed referred to the author's own personality. We found that in 19 cases (6.3%) our method led to a wrong or unsure classification of the user's personality (e.g. O tutti gli INTJ del mondo stanno commentando questo video oppure le statistiche sono sbagliate :-) [en: "Either all the INTJs in the world are commenting on this video or the statistics are wrong :-)"]). We can therefore assume that our dataset might contain about 6-7% of noisy labels.

Comment                                                                                                              User - MBTI label
Io sono ENFJ!!! [en: "I am ENFJ!!!"]                                                                                 User1 - ENFJ
Ho sempre saputo di essere connessa con Lady Gaga! ISFP! [en: "I always knew I was connected with Lady Gaga! ISFP!"]   User2 - ISFP

Table 3: Examples of automatic associations user - MBTI personality type.

Using the acquired list of authors, we aimed to obtain as many comments as possible written by them. The YouTube API, however, does not allow retrieving all comments by one user on the platform. To get around this problem we relied on video similarities, and tried to expand our video collection as much as possible. Therefore, as a third step, we retrieved the list of channels that feature our initial 10 videos, and then all of the videos within those channels.

Fourth, through a second AJAX request, we downloaded all comments appearing below all videos retrieved through the previous step.

Lastly, we filtered all comments, retaining those written by authors included in our original list. This obviously does not cover all comments by a relevant user, but it provided us with additional data per author.

³ Links to the 10 YouTube videos:
https://www.youtube.com/watch?v=VCo9RlDRpz0
https://www.youtube.com/watch?v=N4kC8iqUNyk
https://www.youtube.com/watch?v=Z8S8PgW8t2U
https://www.youtube.com/watch?v=wHZOG8k7nSw
https://www.youtube.com/watch?v=lO2z3_DINqs
https://www.youtube.com/watch?v=NaKPl_y1JXg
https://www.youtube.com/watch?v=8l4o4VBXlGY
https://www.youtube.com/watch?v=GK5J6PLj218
https://www.youtube.com/watch?v=9P95dkVLmps
https://www.youtube.com/watch?v=g0ZIFNgUmoE

3.2 Final Corpus Statistics

For the final dataset, we decided to keep only the authors with a sufficient amount of data. More specifically, we retained only users with at least five comments, each at least five tokens long.

Personal-ITY includes 96,815 comments by 1048 users, each annotated with an MBTI label. The average number of comments per user is 92 and each message has on average 115 tokens.

Figure 1: Distribution of the 16 personality types in the YouTube corpus and in the Italian section of TwiSty.

The distribution of the 16 personality types in the corpus is not uniform. Figure 1 shows this distribution and compares it with the one in TwiSty. The unbalanced distribution can be due to personality types not being uniformly distributed in the population, and to the fact that different personality types can make different choices about their online presence. Goby (2006), for example, observed that there is a significant correlation between online–offline choices and the MBTI dimension of EXTRAVERT-INTROVERT: extroverts are more likely to opt for offline modes of communication, while online communication is presumably easier for introverts. In Figure 1, we also see that the four most frequent types are introverts in both datasets.
The conclusion is that, despite the different biases, collecting linguistic data in this way has the advantage that it reflects actual language use and allows large-scale analysis (Plank and Hovy, 2015).

Figure 2: Comparison of the distributions of the four MBTI traits between Personal-ITY and the Italian part of TwiSty. Panels: (a) Extravert - Introvert, (b) Sensing - Intuitive, (c) Thinking - Feeling, (d) Judging - Perceiving.

Figure 2 shows in more detail, trait by trait, the distribution of the opposite poles across the users in Personal-ITY and in TwiSty. As we might have expected, in line with what is observed in Figure 1, the two datasets present very similar trends. Such similarities between Personal-ITY and TwiSty are a further confirmation of the reliability of the data we collected.

4 Preliminary Experiments

We ran a series of preliminary experiments on Personal-ITY which can also serve as a baseline for future work on this dataset. We pre-processed texts by replacing hashtags, urls, usernames and emojis with four corresponding placeholders. We adopted the sklearn (Pedregosa et al., 2011) implementation of a linear SVM (LinearSVC), with standard parameters. We tested three types of features. At the lexical level, we experimented with word (1-2) and character (3-4) n-grams, both as raw counts and tf-idf weighted. Character n-grams were also tested with a word-boundary option. At a more stylistic level, we considered the use of emojis, hashtags, pronouns, punctuation and capitalisation. Lastly, we also experimented with embeddings-based representations, using, on the one hand, YouTube-specific pre-trained models (Nieuwenhuis and Nissim, 2019) and, on the other hand, more generic embeddings, such as the Italian version of GloVe (Pennington et al., 2014), which is trained on the Italian Wikipedia.⁴ We looked for all the available embeddings of the words written by each author, and used the average as feature. If a word appeared more than once in the string of comments, we considered it multiple times in the final average.

We used 10-fold cross-validation, and assessed the models using macro f-score. Note that the original TwiSty paper uses micro f-score. Thus, for the sake of comparison, we also include micro-F in Table 5 for the MAJ baseline and our lexical n-gram model. Table 4 shows the results of our experiments with different feature types.⁵ Overall, lexical features (n-grams) perform best. Combining different feature types did not lead to any improvement. Classification was performed with four separate binary classifiers (one per dimension), and with one single classifier predicting the four traits together, i.e., the whole MBTI label at once. In the latter case, we observe that the results are quite high considering the increased difficulty of the task.

Trait   MAJ     Lex     Sty     Emb     FL
EI      40.55   51.85   40.46   40.55   51.65
NS      44.34   51.92   44.34   44.34   49.04
FT      35.01   50.67   36.27   35.01   50.86
PJ      29.49   50.53   51.04   47.06   51.03
Avg     37.35   51.24   43.03   41.74   50.65

Table 4: Results of the experiments on Personal-ITY. FL: prediction of the full MBTI label at once, with a character n-gram model.

Table 5 reports the scores of our models on TwiSty. As for Personal-ITY, best results were achieved using lexical features (tf-idf n-grams); stylistic features and embeddings are just above the baseline. Our model outperforms the one in (Verhoeven et al., 2016) for all traits (micro-F).

        micro F          macro F
Trait   MAJ     Lex      MAJ     Lex     Sty     Emb
EI      77.75   79.18    43.69   55.23   43.69   43.69
NS      85.92   85.92    46.15   46.15   46.15   46.15
FT      53.67   55.31    34.79   52.98   35.34   34.70
PJ      53.06   54.08    34.56   53.01   35.20   34.90
Avg     67.6    68.62    39.80   51.84   40.09   39.86

Table 5: Results of our experiments on TwiSty.

To test compatibility of resources and to assess model portability, we also ran cross-domain experiments on Personal-ITY and TwiSty. In the first setting, we tested the effect of merging the two datasets on the performance of models for personality detection, maintaining the 10-fold cross-validation setting and using the model that performs better on average for YouTube and Twitter data (a character n-gram model). Table 6 contains the results of these experiments.⁶ Scores are almost always lower compared to the in-domain experiments (except for NS with respect to the Twitter scores reported in Table 5: 46.15 → 48.31), but considerably higher than the majority baseline.

Trait   MAJ     Lex
EI      41.64   50.57
NS      44.93   48.31
FT      35.04   51.31
PJ      30.66   48.24
Avg     38.07   49.61

Table 6: Merging Personal-ITY with TwiSty.

In the second setting, instead, we divided both corpora into fixed training and test sets with a proportion of 80/20 and ran the models using lexical features, in order to run a cross-domain experiment. For direct comparison, we ran the model in-domain again using this split. Results are shown in Table 7. Cross-domain scores are obtained with the best in-domain model.⁷ They drop substantially compared to in-domain, but are always above the baseline.

Train   Personal-ITY              TwiSty
Test    IN      CROSS             IN      CROSS
        Pers    MAJ     TWI       TWI     MAJ     Pers
EI      58.94   44.94   49.33     55.66   44.59   44.59
NS      52.88   47.87   47.31     47.87   45.31   45.31
FT      49.20   37.58   47.09     65.26   39.13   51.04
PJ      54.43   32.41   32.50     56.87   36.56   38.54
Avg     53.86   40.70   44.06     56.42   41.40   44.87

Table 7: Results of the cross-domain experiments. MAJ = baseline on the cross-domain testset.

⁴ https://hlt.isti.cnr.it/wordembeddings
⁵ In Tables 4–5, we report the highest scores based on averages of the four traits. Considering the dimensions individually, better results can be obtained by using specific models.
⁶ Prediction of the full label at once.
⁷ Better results can be obtained with other specific models.

5 Conclusions

The experiments show that there is no single best model for personality prediction, as the feature contribution depends on the dimension considered, and on the dataset. Lexical features perform best, but they tend to be strictly related to the context in which the model is trained and so to overfit.

The inherent difficulty of the task itself is confirmed and deserves further investigation, as assigning a definite personality is an extremely subjective and complex task, even for humans.

Personal-ITY is made available to further investigate the above and other issues related to personality detection in Italian. The corpus can lend itself to a psychological analysis of the linguistic cues for the MBTI personality traits. Along this line, it is interesting to investigate the presence of evidence linking linguistic features with psychological theories about the four considered dimensions (EXTRAVERT-INTROVERT, INTUITIVE-SENSING, FEELING-THINKING, PERCEIVING-JUDGING). First results in this direction are presented in (Bassignana et al., 2020).

Acknowledgments

The work of Elisa Bassignana was partially carried out at the University of Groningen within the framework of the Erasmus+ program 2019/20.

References

Shlomo Argamon, Moshe Koppel, James W. Pennebaker, and Jonathan Schler. 2009. Automatically profiling the author of an anonymous text. Commun. ACM, 52(2):119–123, February.

Elisa Bassignana, Malvina Nissim, and Viviana Patti. 2020. Personal-ITY: a YouTube Comments Corpus for Personality Profiling in Italian Social Media. In Viviana Patti, Malvina Nissim, and Barbara Plank, editors, Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES@COLING 2020). Association for Computational Linguistics.

Joan-Isaac Biel and Daniel Gatica-Perez. 2013. The YouTube lens: Crowdsourced personality impressions and audiovisual analysis of vlogs. Multimedia, IEEE Transactions on, 15(1):41–55.

Fabio Celli, Fabio Pianesi, David Stillwell, and Michal Kosinski. 2013. Workshop on computational personality recognition: Shared task. In Seventh International AAAI Conference on Weblogs and Social Media.

Chris Emmery, Grzegorz Chrupała, and Walter Daelemans. 2017. Simple queries as distant labels for predicting gender on Twitter. In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 50–55, Copenhagen, Denmark, September. Association for Computational Linguistics.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. CS224N project report, Stanford, 1(12):2009.

Valerie Goby. 2006. Personality and Online/Offline Choices: MBTI Profiles and Favored Communication Modes in a Singapore Study. Cyberpsychology & behavior : the impact of the Internet, multimedia and virtual reality on behavior and society, 9:5–13, 03.

Oliver P. John and Sanjay Srivastava. 1999. The big five trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin and O. P. John, editors, Handbook of personality: Theory and research, pages 102–138. Guilford Press.

Tatiana Litvinova, P. Seredin, Olga Litvinova, and Olga Zagorovskaya. 2016. Profiling a set of personality traits of text author: What our words reveal about us. Research in Language, 14, 12.

François Mairesse, Marilyn A. Walker, Matthias R. Mehl, and Roger K. Moore. 2007. Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research, 30:457–500, sep.

Yash Mehta, Navonil Majumder, Alexander Gelbukh, and Erik Cambria. 2019. Recent trends in deep learning based personality detection. Artificial Intelligence Review, pages 1–27.

I.B. Myers and P.B. Myers. 1995. Gifts Differing: Understanding Personality Type. Mobius.

Moniek Nieuwenhuis and Malvina Nissim. 2019. The Contribution of Embeddings to Sentiment Analysis on YouTube. In Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy, November 13-15, 2019, volume 2481 of CEUR Workshop Proceedings. CEUR-WS.org.

Francisco M. Rangel Pardo, Fabio Celli, Paolo Rosso, Martin Potthast, Benno Stein, and Walter Daelemans. 2015. Overview of the 3rd Author Profiling Task at PAN 2015. In Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum, Toulouse, France, September 8-11, 2015, volume 1391 of CEUR Workshop Proceedings. CEUR-WS.org.

Laura Parks and Russell P Guay. 2009. Personality, values, and motivation. Personality and individual differences, 47(7):675–684.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

James Pennebaker and Laura King. 2000. Linguistic styles: Language use as an individual difference. Journal of personality and social psychology, 77:1296–312, 01.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Barbara Plank and Dirk Hovy. 2015. Personality traits on Twitter—or—How to get 1,500 personality tests in a week. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 92–98, Lisboa, Portugal, September. Association for Computational Linguistics.

Chris Pool and Malvina Nissim. 2016. Distant supervision for emotion detection using Facebook reactions. In Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES), pages 30–39, Osaka, Japan, December. The COLING 2016 Organizing Committee.

H. Andrew Schwartz, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah, Michal Kosinski, David Stillwell, Martin E.P. Seligman, et al. 2013. Personality, gender, and age in the language of social media: The open-vocabulary approach. PloS one, 8(9):e73791.

Mark Snyder. 1983. The influence of individuals on situations: Implications for understanding the links between personality and social behavior. Journal of personality, 51(3):497–516.

Jacopo Staiano, Bruno Lepri, Nadav Aharony, Fabio Pianesi, Nicu Sebe, and Alex Pentland. 2012. Friends don't lie - inferring personality traits from social network structure. In UbiComp'12 - Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pages 321–330, 09.

Ben Verhoeven, Walter Daelemans, and Barbara Plank. 2016. TwiSty: A multilingual Twitter stylometry corpus for gender and personality profiling. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1632–1637, Portorož, Slovenia, May. European Language Resources Association (ELRA).

Alessandro Vinciarelli and Gelareh Mohammadi. 2014. A survey of personality computing. IEEE Transactions on Affective Computing, 5(3):273–291.

Irving B. Weiner and Roger L. Greene. 2017. Ethical Considerations In Personality Assessment, chapter 4, pages 59–74. Wiley.

Susan Whelan and Gary Davies. 2006. Profiling consumers of own brands and national brands using human personality. Journal of Retailing and Consumer Services, 13(6):393–402.

Wu Youyou, Michal Kosinski, and David Stillwell. 2015. Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4):1036–1040.
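
As a supplementary illustration of the labelling rule in Section 3.1 and the filtering rule in Section 3.2, the following is a minimal sketch, not the authors' released code. All function names are hypothetical. It assumes that an MBTI mention is a standalone upper-case four-letter type, that tokens are whitespace-separated, and that the filter keeps only comments of at least five tokens and then only authors with at least five such comments; the paper does not specify these details.

```python
import re
from collections import defaultdict

# The 16 MBTI types: one pole from each of the four dimensions.
MBTI_TYPES = {a + b + c + d for a in "EI" for b in "NS" for c in "FT" for d in "PJ"}

# Assumption: a mention is an upper-case type not embedded in a longer word.
MBTI_PATTERN = re.compile(r"(?<![A-Za-z])([EI][NS][FT][PJ])(?![A-Za-z])")


def find_mbti_label(comment):
    """Return the first MBTI type mentioned in a comment, or None."""
    match = MBTI_PATTERN.search(comment)
    return match.group(1) if match else None


def label_authors(comments):
    """Map each author to the first MBTI type found in any of their comments.

    `comments` is a list of (author, text) pairs.
    """
    labels = {}
    for author, text in comments:
        if author not in labels:
            label = find_mbti_label(text)
            if label is not None:
                labels[author] = label
    return labels


def filter_authors(comments, labels, min_comments=5, min_tokens=5):
    """Keep labelled authors that have enough sufficiently long comments."""
    kept = defaultdict(list)
    for author, text in comments:
        # Assumption: whitespace tokenisation stands in for the paper's tokens.
        if author in labels and len(text.split()) >= min_tokens:
            kept[author].append(text)
    return {a: texts for a, texts in kept.items() if len(texts) >= min_comments}
```

Like the procedure described in the paper, this proxy cannot tell a self-description from any other mention of a type, which is the source of the noisy labels estimated in Section 3.1.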