Subjective Well-Being and Social Media: A Semantically Annotated Twitter Corpus on Fertility and Parenthood

Emilio Sulis, Cristina Bosco, Viviana Patti, Mirko Lai
Dipartimento di Informatica, University of Turin, Italy
{bosco,patti,sulis}@di.unito.it, {hernande,lai}@unito.it

Delia Irazú Hernández Farías
University of Turin, Italy / Univ. Politècnica de València, Spain

Letizia Mencarini, Michele Mozzachiodi
Dondena Centre for Research on Social Dynamics and Public Policy, Bocconi University, Italy
letizia.mencarini@unibocconi.it

Daniele Vignoli
University of Florence, Italy
{mozzachiodi,vignoli}@disia.unifi.it

Abstract

English. This article describes a Twitter corpus of social media content in the Subjective Well-Being domain. A multi-layered manual annotation for exploring attitudes on fertility and parenthood has been applied. The corpus was further analysed using sentiment and emotion lexicons in order to highlight relationships between the use of affective language and specific sub-topics in the domain. This analysis is useful for identifying features for the development of an automatic tool for sentiment-related classification tasks in this domain. The gold standard is available to the community.

Italiano. This article describes the creation of a Twitter corpus on the topics of Subjective Well-Being, fertility and parenthood. A lexical analysis showed the link between the use of affective language and specific categories of messages. This analysis is useful in itself and for training automatic classification systems on the domain. The gold standard is available on request.

1 Introduction

The key research questions we address in this paper concern how subjective well-being drives fertility trends (and vice versa). We developed an Italian Twitter corpus annotated with a novel semantic annotation scheme marking information not only about sentiment polarity, but also about the specific semantic areas/sub-topics that are the target of sentiment in the fertility-SWB domain.

The relationship between big data and official statistics is increasingly a subject of attention (Mitchell et al., 2013; Reimsbach-Kounatze, 2015; Sulis et al., 2015; Zagheni and Weber, 2015). In this work we focus on Twitter data for two main reasons. First, opinions on Twitter are posted spontaneously (not in response to a question) and often as a reaction to some emotionally driven observation. Moreover, using Twitter we can incorporate additional measures of attitudes towards children and parenthood, with a wider geographical coverage than is the case for traditional surveys. Sentiment analysis on Twitter has also been used to monitor political sentiment (Tumasjan et al., 2010), to extract critical information during times of mass emergency (Verma et al., 2011; Buscaldi and Hernández Farías, 2015), and to analyse user stance in political debates on controversial topics (Stranisci et al., 2016; Bosco et al., 2016; Mohammad et al., 2015). A comprehensive overview of sentiment analysis with annotated corpora is offered in (Nissim and Patti, 2016). Focusing on Italian, among the existing resources we mention the Senti-TUT corpus (Bosco et al., 2013) and the TWITA corpus (Basile and Nissim, 2013), which were recently exploited in the SENTIment POLarity Classification (SENTIPOLC) shared task (Basile et al., 2014). The corpus described in this paper enriches the scenario of datasets available for Italian, also enabling a finer-grained analysis of sentiment-related phenomena in a novel domain related to parenthood and fertility.
2 Dataset and Methodology

As a reference dataset, we adopted all the tweets posted in Italian in 2014 (TWITA14 henceforth), which were retrieved through the Twitter Streaming API by applying the Italian-language filter proposed within the TWITA project (Basile and Nissim, 2013). The TWITA14 dataset includes 259,893,081 tweets (4,766,342 geotagged). We applied a multi-step methodology in order to filter and select the relevant tweets concerning fertility and parenthood.

2.1 Filtering steps on the dataset

A number of filtering steps have been applied to select from TWITA14 a corpus of tweets where users talk about fertility and parenthood (TW-SWELLFER corpus, henceforth). We could not rely on one or a few hashtags or other elements to identify posts on fertility and parenthood. In fact, these topics are spread throughout the dataset, and messages may contain relevant information on such subjects even when the main topic of the post is different. We therefore face a situation where, on the one hand, the set of data potentially relevant for our specific analysis is wider than usual; on the other hand, it is more difficult to identify the presence of information related to the topics we are interested in. This led us to adopt a multi-step thematic filtering approach. In a first step (keyword-based filtering), eleven hashtags[1] and 19 other keywords were chosen for selecting tweets of interest, including 8 roots covering diminutives, singulars and plurals. This list is the result of a combination of a manual content analysis of 2,500 tweets sampled completely at random (taken as a starting point) and a linguistic analysis of synonyms. We obtained a total of 3.9 million tweets. A second filtering step (user-based filtering) consisted in removing noisy tweets from the corpus, such as off-topic ones (messages not concerning individual expression on fertility and parenthood topics). Tweets posted by company, institution and newspaper accounts were deleted. Finally, duplicated tweets not marked as RT were deleted (duplicate-based filtering). The resulting TW-SWELLFER corpus consists of 2,760,416 tweets.

[1] #papà, #mamma, #babbo, #incinta, #primofiglio, #secondofiglio, #futuremamme, #maternità, #paternità, #allattamento, #gravidanza
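As a rough illustration, the first, keyword-based filtering step can be sketched as follows. The hashtag list comes from footnote 1; the keyword roots shown are invented placeholders, since the paper does not enumerate the 19 keywords and 8 roots in full.

```python
import re

# Hashtags from footnote 1 of the paper.
HASHTAGS = {"#papà", "#mamma", "#babbo", "#incinta", "#primofiglio",
            "#secondofiglio", "#futuremamme", "#maternità", "#paternità",
            "#allattamento", "#gravidanza"}

# Illustrative stand-ins for the keyword roots: a root such as "figli"
# also matches diminutives and plurals ("figlio", "figlia", "figlioletto", ...).
ROOTS = ["figli", "mamm", "papà", "bimb"]
ROOT_RE = re.compile(r"\b(" + "|".join(ROOTS) + r")\w*", re.IGNORECASE)

def is_candidate(tweet_text: str) -> bool:
    """Keep a tweet if it contains one of the selected hashtags
    or a word built on one of the keyword roots."""
    tokens = tweet_text.lower().split()
    if any(tok in HASHTAGS for tok in tokens):
        return True
    return ROOT_RE.search(tweet_text) is not None

print(is_candidate("Il mio primo giorno da papà!"))   # True (root match)
print(is_candidate("Oggi piove a Torino"))            # False
```

A filter of this kind deliberately over-selects; the later user-based and duplicate-based steps, and ultimately the manual IN-/OFF-TOPIC annotation, remove the残 noise it lets through.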
2.2 Annotation scheme

Given the TW-SWELLFER dataset, we developed and applied an annotation model aimed at studying not only the sentiment expressed in the tweets, but also the specific parenthood-related topics discussed on Twitter that are the target of the sentiment. To build our annotation model, we relied on a standard annotation scheme for sentiment polarity (POLARITY), exploiting the same labels POS, NEG, NONE and MIXED provided by the organizers of the shared task on sentiment analysis in Twitter for Italian (Basile et al., 2014). The presence/absence of irony has also been marked, in order to be able to reason on sentiment polarity also when figurative devices are used. Annotating the presence of ironic devices is a challenging task, because inferring this figure of speech does not always rely on semantic and syntactic elements of the text (Ghosh et al., 2015; Reyes et al., 2013; Hernández Farías et al., 2016), but often requires contextual knowledge (Wilson, 2006). In order to mark irony, we introduced two polarized ironic labels: HUMNEG, for ironic tweets with negative polarity, and HUMPOS, for ironic tweets with positive polarity. Finally, a set of labels marks the specific semantic areas (or SUB-TOPICS) of the tweets related to the parenthood domain. This part of the annotation scheme is very important, since it provides us with a semantic grid for analysing which aspects of parenthood are discussed on Twitter. For the annotation of sub-topics we considered 7 labels, suggested by a group of three experts on the SWELLFER (subjective well-being and fertility) domain after a manual analysis of a subset of the tweets:

• TOBEPA - To be parents. This tag marks tweets where the user generically comments on his or her status as a parent.

• TOBESO - To be sons. This tag marks the children's point of view, i.e. cases where the user is a son or daughter commenting on the parent-child relationship.

• DAILYLIFE - Daily life. This tag marks messages commenting on recurring everyday-life situations in the relationship between parents and children.

• JUDGOTHERPA - Judgment over other parents' behaviour. This tag marks comments on the education of children, for instance comments on behaviours which do not seem appropriate for the parental role.

• FUTURE - Children's future. This tag is used for tweets where parents express sentiments about the future of their children.

• BECOMEPA - To become parents. This tag marks tweets where users speak about the prospect or fear of becoming parents.

• POL - Political side. This tag marks tweets talking about laws that have an impact on being parents.
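For concreteness, the polarity/irony and sub-topic layers described above can be written down as a small schema. This is a sketch: the label names follow the paper, while the class structure is our own.

```python
from enum import Enum

class Polarity(Enum):
    """Polarity layer: the four SENTIPOLC labels plus the two
    polarized irony labels introduced in the paper."""
    POS = "positive"
    NEG = "negative"
    NONE = "no sentiment"
    MIXED = "both positive and negative"
    HUMPOS = "ironic, positive polarity"
    HUMNEG = "ironic, negative polarity"

class SubTopic(Enum):
    """Sub-topic layer: the 7 semantic areas of the parenthood domain."""
    TOBEPA = "to be parents"
    TOBESO = "to be sons"
    DAILYLIFE = "daily life"
    JUDGOTHERPA = "judgment over other parents' behaviour"
    FUTURE = "children's future"
    BECOMEPA = "to become parents"
    POL = "political side"
```

A gold-standard record then pairs a tweet with one Polarity value and one SubTopic value, alongside the relevance flag.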
Finally, two additional tags (IN-TOPIC/OFF-TOPIC) have been added, allowing annotators to mark whether the tweet is relevant. This tag was necessary because of the noise still present in the dataset. Moreover, in this way the manual annotation also produces data that can be used to train a supervised topic classifier over the whole TW-SWELLFER corpus. This opens the way to exploiting the corpus for fine-grained sentiment analysis, by identifying the different aspects and topics of the Twitter debate on parenthood and the sentiment expressed towards each aspect/topic.

2.3 Manual annotation

A random sample of 5,566 tweets from TW-SWELLFER has been collected. On this sample we applied crowdsourcing for manual annotation via the CrowdFlower platform, already used in the literature (Nakov et al., 2016). We relied on CrowdFlower controls to exclude unreliable annotators and spammers based on hidden tests, which we created by developing a set of gold-standard test questions equipped with gold reasons[2]. The annotator's task was, first, to mark whether the post is IN- or OFF-TOPIC (or unintelligible), and then, for IN-TOPIC posts, to mark on the one hand the polarity and the presence of irony, and on the other hand the sub-topics. Precise guidelines were provided to the annotators. Overall, at least three independent annotations were collected for each tweet[3]. In order to select the final label we used majority voting.

In-topic vs off-topic: manual annotation on this aspect resulted in 2,355 in-topic tweets (42.3%) and 3,136 off-topic ones (56.3%); the remaining 75 tweets were unknown or null (cases of disagreement). Thanks to the preliminary filtering steps, the proportion of in-topic tweets is quite high compared to common results from other Twitter-based content and opinion analyses (Ceron et al., 2014).

Polarity, irony, sub-topics (in-topic tweets): at the end of the manual annotation process we collected 1,545 tweets labeled with the same tags for all the layers. Notice that the analysis in the next section will also report results on tweets labeled as IN-TOPIC after the manual annotation (2,355) but where annotators did not agree on the polarity, irony and sub-topic labels. We refer to those tweets as NULL messages. Summarizing, the TW-SWELLFER-GOLD corpus includes 1,545 IN-TOPIC tweets labeled with the same tags for all the layers (POLARITY, IRONY and SUB-TOPICS).

[2] Test questions resulted from the agreement of three expert annotators.
[3] We selected CrowdFlower's dynamic judgment option: with the goal of collecting at least 3 reliable annotations for each tweet, the system collected up to a maximum of 5 annotations (to deal with cases where a row's confidence score is low). In our jobs we set 0.7 as the minimum accuracy threshold.
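The aggregation step described in Section 2.3 (three to five judgments per tweet, majority vote, a NULL outcome when annotators disagree) can be sketched as follows; the function name and data layout are ours, not CrowdFlower's.

```python
from collections import Counter

def aggregate(judgments):
    """Majority vote over the 3-5 crowd judgments collected for one tweet.
    Returns the winning label, or "NULL" when no label reaches a strict
    majority (the disagreement case described in Section 2.3)."""
    counts = Counter(judgments)
    label, freq = counts.most_common(1)[0]
    return label if freq > len(judgments) / 2 else "NULL"

print(aggregate(["POS", "POS", "NEG"]))                          # POS
print(aggregate(["POS", "NEG", "NONE"]))                         # NULL
print(aggregate(["HUMNEG", "HUMNEG", "NEG", "HUMNEG", "NEG"]))   # HUMNEG
```

Run per layer (relevance, polarity/irony, sub-topic), this yields exactly the split reported below: tweets with a majority label on every layer enter the gold standard, the rest become NULL messages.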
3 Analysis of the gold standard

Figure 1: TW-SWELLFER: Distribution of polarity tags in IN-TOPIC messages

Regarding IN-TOPIC tweets, 26.4% have been labeled as positive and 22.3% as negative (see Fig. 1), giving us guidance on what the general feeling on Twitter might be about the research topics of happiness and parenthood. Irony is limited to 15.7% of all the messages, and negative irony prevails (10.1% negative ironic tweets vs. 5.6% positive ironic tweets), while neutral tweets are just 8.3%. The amount of mixed tweets is limited to 1.2% (the remaining 26% are labelled as NULL because of annotator disagreement). These results suggest that positive and negative feelings towards family, parenthood and fertility are more or less equally spread across Italian Twitter. Even if the positive posts slightly outnumber the negative ones, ironic tweets must be taken into account: most of them are negative ironic posts (i.e., insulting/damaging the target), balancing the slight difference between pure positive and negative tweets. Furthermore, this particular topic, combined with the nature of Twitter, which favours short direct messages, discourages people from standing in the grey (neutral) area, as can happen in other cases: about 90% of the tweets show an explicit polarity, meaning that people take a side and express their opinions.

Figure 2: TW-SWELLFER: Distribution of sub-topic tags in IN-TOPIC messages

What are these opinions, and what are they about? Going further with the analysis and also looking at the contents, i.e. taking into consideration the "topic specification" attribute and its values (Fig. 2), the largest category refers to sons' tweets (TOBESO) (40.3%), in which children discuss and post about being children and/or about relating to their parents. The parents tag (TOBEPA) settles at 15% and the becoming-parents tag (BECOMEPA) at 10%. The remaining categories have minor impact, all lying between 1% and 6%.

3.1 Sentiment and emotion analysis

The examination of the corpus includes a lexical analysis of different aspects of affect: sentiment and emotions. The distribution of terms in each group of messages reveals interesting patterns. Adopting sentiment lexical resources[4], the overall polarity of messages is computed by summing positive and negative terms. A normalization is finally performed, i.e. the polarity value is divided by the number of terms in each group. In particular, the four lexica considered count more positive terms in positive messages. Similarly, negative terms are more frequent in negative messages. Ironic messages reveal a similar pattern, even if smoothed. Table 1 presents some of these results.

tag         pLIWC   pHuLiu   pEmolex   pAFINN   pAVG
POS          1.06    0.22     0.62      3.51     1.35
NEG         -1.61    0.04     0.12      0.39    -0.27
HUMPOS       0.19    0.12     0.23      2.29     0.71
HUMNEG      -0.34    0.08     0.64      0.61     0.25
TOBESO       1.97    0.88     0.02      1.56     1.11
BECOMEPA     1.5     0.73     0.18     -1.64     0.19
TOBEPA       1.94    1.38     0.18      5.04     2.13
DAILYLIFE    1.13    1.56     0.32      6.04     2.26

Table 1: Polarity values according to different lexicons in tweets tagged with the labels POS, NEG, HUMPOS, HUMNEG (polarity tags) and TOBESO, BECOMEPA, TOBEPA, DAILYLIFE (sub-topic tags).

In addition, the emotion lexicon indicates a larger frequency of terms related to anger, sadness, fear and disgust in negative messages than in positive ones (see Fig. 3). On the contrary, positive messages contain more terms related to joy, anticipation and surprise.

Figure 3: Distribution of emotions by polarity tags

Some insights can be derived from comparing the polarity categories with the corresponding ironic ones. For instance, terms related to joy are more frequent in ironic negative messages than in plainly negative ones. This is a symptom of the polarity reversal phenomenon, where a shift is produced by the adoption of a seemingly positive statement to convey a negative one (Sulis et al., 2016).

[4] EmoLex (Mohammad and Turney, 2013), as well as an in-house Italian version of LIWC (Pennebaker et al., 2001), Hu&Liu (Hu and Liu, 2004) and AFINN (Nielsen, 2011). Lexicons were translated from English in (Buscaldi and Hernández Farías, 2015).
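The lexicon-based score of Section 3.1, on one plausible reading, amounts to signed lexicon hits normalized by the number of terms. The toy lexicon below is an invented placeholder, not an excerpt of LIWC, Hu&Liu, EmoLex or AFINN.

```python
# Invented toy lexicon mapping Italian terms to a polarity value;
# real entries would come from the translated resources of footnote 4.
TOY_LEXICON = {"felice": 1, "amore": 1, "gioia": 1,
               "triste": -1, "paura": -1, "odio": -1}

def polarity_score(tokens):
    """Sum of signed lexicon hits, divided by the number of tokens."""
    total = sum(TOY_LEXICON.get(tok, 0) for tok in tokens)
    return total / len(tokens) if tokens else 0.0

print(polarity_score(["che", "gioia", "essere", "papà"]))   # 0.25
print(polarity_score(["ho", "paura", "del", "futuro"]))     # -0.25
```

Applying such a score separately to each group of gold-labeled messages, with each of the four lexica, yields per-group values comparable to those in Table 1.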
The analysis of topic-specification messages reveals a positive polarity for messages concerning TOBEPA (to be parents), while BECOMEPA (to become parents) has a more negative polarity (see Table 1). Focusing on the emotion lexicon, TOBEPA has a higher incidence of joy words (Fig. 4). Messages concerning the education of children (JUDGOTHERPA) contain a high frequency of anger and disgust terms. The category TOBESO (to be sons) is more controversial, having the highest frequency of negative terms such as fear, but also of trust, as well as the lowest frequency of joy terms. Coherently, anticipation is more frequent in the BECOMEPA group of messages. Summarizing, it seems that children are more critical towards parents; parents, on the contrary, seem to express a more positive attitude towards children.

Figure 4: Distribution of emotions by sub-topic tags

4 Conclusions and Future Work

The contribution of this paper is the exploration of opinions and semantic orientation about fertility and parenthood by scrutinizing about 3 million Italian tweets. This analysis is useful for identifying features for the development of an automatic system addressing classification tasks in this domain. The corpus is available to the community. Its development constitutes a first step and a precondition for further analyses that can be applied to such contents in order to extract, from semantically enriched data, measures of SWB constructed in an indirect way. This will hopefully improve our understanding of attitudes on fertility and parenthood.

We are currently extending the corpus by exploring the very interesting debate around the "Fertility Day" initiative of the Italian Minister of Health Beatrice Lorenzin, which had a remarkable echo on social media such as Twitter, with a substantial number of (often sarcastic) messages posted with the hashtag #fertilityday.

Acknowledgments

The authors gratefully acknowledge financial support from the European Research Council under ERC Grant Agreement n. StG-313617 (SWELL-FER: Subjective Well-being and Fertility, P.I. Letizia Mencarini).

References

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107, Atlanta, Georgia. Association for Computational Linguistics.

Valerio Basile, Andrea Bolioli, Malvina Nissim, Viviana Patti, and Paolo Rosso. 2014. Overview of the Evalita 2014 SENTIment POLarity Classification Task. In Proceedings of EVALITA 2014, pages 50–57, Pisa, Italy. Pisa University Press.

Cristina Bosco, Viviana Patti, and Andrea Bolioli. 2013. Developing corpora for sentiment analysis: The case of irony and Senti-TUT. IEEE Intelligent Systems, 28(2):55–63, March.

Cristina Bosco, Mirko Lai, Viviana Patti, and Daniela Virone. 2016. Tweeting and being ironic in the debate about a political reform: the French annotated corpus TWitter-MariagePourTous. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 1619–1626, Portorož, Slovenia. ELRA.

Davide Buscaldi and Delia Irazú Hernández Farías. 2015. Sentiment analysis on microblogs for natural disasters management: A study on the 2014 Genoa floodings. In Proceedings of the 24th International Conference on World Wide Web, WWW '15 Companion, pages 1185–1188. ACM.

A. Ceron, L. Curini, and S.M. Iacus. 2014. Social Media e Sentiment Analysis: L'evoluzione dei fenomeni sociali attraverso la Rete. SxI - Springer for Innovation / SxI - Springer per l'Innovazione. Springer Milan.

Aniruddha Ghosh, Guofu Li, Tony Veale, Paolo Rosso, Ekaterina Shutova, Antonio Reyes, and John Barnden. 2015. SemEval-2015 Task 11: Sentiment analysis of figurative language in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 470–475, Denver, Colorado, USA. Association for Computational Linguistics.

Delia Irazú Hernández Farías, Viviana Patti, and Paolo Rosso. 2016. Irony detection in Twitter: The role of affective content. ACM Transactions on Internet Technology, 16(3):19:1–19:24.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 168–177, Seattle, WA, USA. ACM.

L. Mitchell, M. R. Frank, K. D. Harris, P. S. Dodds, and C. M. Danforth. 2013. The geography of happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLoS ONE, 8(5), May.

Saif M. Mohammad and Peter D. Turney. 2013. Crowdsourcing a word–emotion association lexicon. Computational Intelligence, 29(3):436–465.

Saif M. Mohammad, Xiaodan Zhu, Svetlana Kiritchenko, and Joel Martin. 2015. Sentiment, emotion, purpose, and style in electoral tweets. Information Processing and Management, 51:480–499.

Preslav Nakov, Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and Veselin Stoyanov. 2016. SemEval-2016 Task 4: Sentiment analysis in Twitter. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 1–18, San Diego, California, June. Association for Computational Linguistics.

Finn Årup Nielsen. 2011. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages, volume 718 of CEUR Workshop Proceedings, pages 93–98, Heraklion, Crete, Greece. CEUR-WS.org.

Malvina Nissim and Viviana Patti. 2016. Semantic aspects in sentiment analysis. In Elisabetta Fersini, Bing Liu, Enza Messina, and Federico Pozzi, editors, Sentiment Analysis in Social Networks, chapter 3, pages 31–48. Elsevier.

James W. Pennebaker, Martha E. Francis, and Roger J. Booth. 2001. Linguistic Inquiry and Word Count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates, 71.

Christian Reimsbach-Kounatze. 2015. The proliferation of big data and implications for official statistics and statistical agencies.

Antonio Reyes, Paolo Rosso, and Tony Veale. 2013. A multidimensional approach for detecting irony in Twitter. Language Resources and Evaluation, 47(1):239–268, March.

Marco Stranisci, Cristina Bosco, Delia Irazú Hernández Farías, and Viviana Patti. 2016. Annotating sentiment and irony in the online Italian political debate on #labuonascuola. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France, May. European Language Resources Association (ELRA).

Emilio Sulis, Mirko Lai, Manuela Vinai, and Manuela Sanguinetti. 2015. Exploring sentiment in social media and official statistics: a general framework. In Proceedings of the 2nd International Workshop on Emotion and Sentiment in Social and Expressive Media, co-located with AAMAS 2015, Istanbul, Turkey, May 5, 2015, volume 1351 of CEUR Workshop Proceedings, pages 96–105. CEUR-WS.org.

Emilio Sulis, Delia Irazú Hernández Farías, Paolo Rosso, Viviana Patti, and Giancarlo Ruffo. 2016. Figurative messages and affect in Twitter: Differences between #irony, #sarcasm and #not. Knowledge-Based Systems, 108:132–143.

Andranik Tumasjan, Timm Sprenger, Philipp Sandner, and Isabell Welpe. 2010. Predicting elections with Twitter: What 140 characters reveal about political sentiment. In International AAAI Conference on Web and Social Media.

Sudha Verma, Sarah Vieweg, William Corvey, Leysia Palen, James Martin, Martha Palmer, Aaron Schram, and Kenneth Anderson. 2011. Natural language processing to the rescue? Extracting "situational awareness" tweets during mass emergency. In International AAAI Conference on Web and Social Media.

Deirdre Wilson. 2006. The pragmatics of verbal irony: Echo or pretence? Lingua, 116(10):1722–1743.

Emilio Zagheni and Ingmar Weber. 2015. Demographic research with non-representative internet data. International Journal of Manpower, 36(1):13–25.