=Paper=
{{Paper
|id=Vol-2253/paper52
|storemode=property
|title="Buon appetito!" - Analyzing Happiness in Italian Tweets
|pdfUrl=https://ceur-ws.org/Vol-2253/paper52.pdf
|volume=Vol-2253
|authors=Pierpaolo Basile,Nicole Novielli
|dblpUrl=https://dblp.org/rec/conf/clic-it/BasileN18
}}
=="Buon appetito!" - Analyzing Happiness in Italian Tweets==
Pierpaolo Basile and Nicole Novielli, Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona, 4 - 70125 Bari (Italy), {firstname.lastname}@uniba.it

===Abstract===
'''English.''' We report the results of an exploratory study aimed at investigating the language of happiness in Italian tweets. Specifically, we conduct a time-wise analysis of the happiness load of tweets by leveraging a lexicon of happiness extracted from 8.6M tweets. Furthermore, we report the results of a statistical linguistic analysis aimed at extracting the most frequent concepts associated with the happy and sad words in our lexicon.

'''Italiano (translated).''' We report the results of an exploratory analysis of a corpus of tweets in Italian, aimed at identifying the concepts typically associated with happiness. We also report the results of a time-wise analysis of the happiness load of tweets across the hours of the day and the days of the week.

===1 Introduction===
The widespread diffusion of social media has reshaped the way we interact and communicate. Among others, microblogging platforms such as Twitter are becoming extremely popular and people constantly use them for sharing opinions about facts of public interest. Furthermore, its worldwide adoption and the fact that tweets are publicly available make Twitter an extremely appealing virtual place for researchers interested in language analysis as a means to investigate social phenomena (Bollen et al., 2009; Garimella et al., 2016).

In addition, recent research showed how microblogging is also used for self-disclosure of individual feelings (Roberts et al., 2012; Andalibi et al., 2017). As such, microblogs constitute an invaluable wealth of data ready to be mined for discovering affective stereotypes (Joseph et al., 2017) using corpus-based approaches to linguistic ethnography (Mihalcea and Liu, 2006). Such analyses can further enhance our understanding of how people conceptualize the experience of emotions and what their most common triggers are. Recent studies even envisaged the emergence of tools for monitoring the public mood (footnote 1: 'What Twitter tells us about our happiness', https://goo.gl/fmYBP3, last accessed Oct. 2018) and health through the analysis of Twitter users' reactions to major social, political, and economic events (Bollen et al., 2009).

In this study we report the results of an exploratory analysis of the language of happiness in Twitter. In particular, we perform a partial replication of the approach proposed by (Mihalcea and Liu, 2006) for mining sources of happiness in blog posts. The contributions of this paper are as follows. First, we extract a happiness dictionary from a sample of about 8.6M tweets from the TWITA corpus of Italian tweets (Basile and Nissim, 2013). For each word in the dictionary, we compute a happiness factor by adapting the approach proposed in the original study. Furthermore, we perform a qualitative investigation of the 100 happiest and saddest words by mapping them into psycholinguistic word categories (see Section 2). As a second step, we use our dictionary to perform a time-wise analysis of happiness as shared in different hours and days of the week (see Section 3). Third, we extract the concepts most frequently associated with happy words in our dictionary, which we map into WordNet super-senses (see Section 4). We discuss limitations and provide suggestions for future work in Section 5.

===2 The Happiness Dictionary===
====2.1 A Dataset of Happy and Sad Tweets====
Our study is based on TWITA (Basile and Nissim, 2013), the largest available corpus of Italian tweets. In particular, we analyze a subset of 400M tweets obtained by filtering out re-tweets from all the 500M tweets collected from February 2012 to September 2015. Following the idea proposed in (Read, 2005; Go et al., 2009), we select positive and negative tweets based on the presence of positive or negative emoticons (footnote 2: we use :-) and :) for happy and :-( and :( for sad). Since a tweet can contain multiple emoticons, we select only tweets that contain a single emoticon appearing at the end of the tweet. Using this procedure we obtain a corpus <math>C_{happy}</math> of 8,648,476 tweets.

====2.2 Happy/Sad Word Extraction and Scoring====
From the <math>C_{happy}</math> corpus, we extract a subset of words and assign each of them a happiness factor (<math>hf</math>), computed as the log of the ratio between the probability that the word occurs in positive tweets, <math>p_{happy}(w_i)</math>, and the probability that it occurs in negative tweets, <math>p_{sad}(w_i)</math>, as in Eq. 1.

:<math>hf(w_i) = \log \frac{p_{happy}(w_i)}{p_{sad}(w_i)} \qquad (1)</math>

We adopt additive smoothing (Laplace smoothing) for computing both the <math>p_{happy}</math> and <math>p_{sad}</math> probabilities. In our lexicon, we include and compute the happiness factor only for words that occur at least 10,000 times, for a total of 718 words. We call this list "the happiness dictionary" (<math>D_h</math>) (footnote 3: the dictionary is available on GitHub, https://github.com/pippokill/happyFactor). Table 1 reports the most happy/sad words with the corresponding happiness factor (score (hf)).

{| class="wikitable"
|+ Table 1: The happiness factor of the most happy/sad words.
! happy !! score (hf) !! sad !! score (hf)
|-
| fback || 4.04 || triste || -2.37
|-
| ricambi || 3.83 || purtroppo || -1.91
|-
| benvenuta || 3.17 || dispiace || -1.68
|-
| grazie || 2.32 || brutto || -1.68
|-
| buon || 2.14 || peccato || -1.63
|-
| piacere || 2.03 || manca || -1.53
|-
| gentile || 1.91 || compiti || -1.35
|-
| auguro || 1.86 || paura || -1.33
|-
| dolcezza || 1.74 || studiare || -1.30
|}

We observe that some happy words (fback, ricambi, benvenuta) are due to several positive tweets that users post when they establish new connections, i.e. when they start following a new user or when they ask somebody to follow them back (fback), as in: "@usermention ciao sono nuova, fback? Grazie mille :)" ("hi, I'm new, fback? Thanks a lot :)"). Sad words refer to negative emotions or evaluations, such as triste, dispiace, brutto, peccato. Interestingly, several negative words emerge from the school domain (compiti, studiare), and the word scuola itself has a negative score of -0.93.

====2.3 Happiness by Psycholinguistic Categories====
We are interested in understanding how happiness words map into psycholinguistic word classes. Hence, we check their distribution along the word categories in the Linguistic Inquiry and Word Count (LIWC) taxonomy (Pennebaker and Francis, 2001). To this aim, we perform a qualitative investigation on the 100 most happy and 100 most sad words, that is, the words with the highest and lowest happiness scores, respectively. We map each word into LIWC word categories. LIWC organizes words into psychologically meaningful categories, based on the assumption that language reflects the cognitive and emotional phenomena involved in communication. It has been used in a wide range of psycholinguistic experimental settings, including investigations of emotions, social relationships, and thinking styles (Tausczik and Pennebaker, 2010).

We code the English translation of the happy/sad words into LIWC categories. When translating, we keep the information about the subject conveyed by the Italian verbs (e.g., 'penso' is translated as 'I think'). The coding is performed manually by the authors: in a first round, one rater associates each word with the corresponding LIWC category; then, the other revises the annotation, checking for consistency and also verifying the correctness of the translation. 22 words are discarded and replaced with others from the dictionary because we could not find a mapping to any of the categories. Furthermore, we add an ad hoc category to model words from the social media domain (retweet, follow).

Figure 1 shows how the happy and sad words distribute along the dimensions associated with the most frequent categories. Sample words for each word category are reported in Table 2. We observe that happy words in the dictionary mainly refer to positive emotions as well as to the social and social media dimensions. Conversely, sad words mainly describe negative emotions with a focus on the author. Words describing cognitive mechanisms are also associated with sadness.

Figure 1: Comparing the most happy/sad words along dimensions associated with word categories.

{| class="wikitable"
|+ Table 2: Mapping the happiness dictionary to word categories.
! Category !! Sample words
|-
| Affect || buono/a, ottimo, triste, brutto
|-
| Cogmech || avrei, pensare, capisco, so, volevo
|-
| Comm || benvenut*, buonanotte, ciao
|-
| I || mi, io, first person verbs
|-
| Negate || mai, nulla, non
|-
| Negemo || difficile, peggio, sola
|-
| Posemo || benvenuta, piacere, sorriso, cara
|-
| Posfeel || cara, contenta, adoro, felice
|-
| Present || avermi, trovi, riesco
|-
| Self || mi, io, first person verbs
|-
| Social || ricambi, gruppo
|-
| S. media || fback, follow, seguire, Instagram
|-
| Time || serata, anticipo, periodo, ultima
|-
| You || te, tuo, second person verbs
|}

===3 Time-wise analysis===
As observed in the original study, happiness is not constant in our life and different degrees of happiness might be observed at different moments in time. As such, we analyze how happiness changes over time. In particular, we take into account the days of the week and the different hours in a day.

For this analysis, we exploit the whole corpus of 400M tweets and compute the distribution of words occurring in the happiness dictionary in each time period. Using this strategy, in each time period a word has a happiness load obtained by multiplying its frequency in that period by its happiness factor. The happiness load of each time period is the average of the happiness loads of all words in that period. The obtained values are mapped to the interval [-1, 1] and plotted in Figure 2a (for days) and in Figure 2b (for hours).

Figure 2: Time-wise analysis. (a) Happiness load by day of the week. (b) Happiness load for a 24-hour day.

Our time-wise analysis reveals a drop in happiness on Thursday, with a subsequent shift towards positive mood on Friday, before the weekend, which is the happiest moment of the week. This is consistent with the findings of the original study, which reports mid-week blues around Wednesday and a happiness peak on Saturday (Mihalcea and Liu, 2006). Regarding the hours, we observe the highest happiness load in the morning, with a peak around 6 AM; it then steadily decreases over the day, with the lowest value observed around 11 PM.

===4 Concept analysis===
We are interested in concepts related to words in the happiness dictionary. In the original study, the authors extract the 'ingredients' for their recipe of happiness by ranking the most relevant 2- and 3-grams from their corpus according to their happiness load. Such an approach is not easy to replicate, as the number of 2- and 3-grams extracted from 400M tweets is potentially huge. Hence, starting from the words in our happiness dictionary, we extract the 50 most frequently co-occurring words in a window of two words. Then we rank all the word pairs (dictionary word, co-occurring word) according to the Pointwise Mutual Information (PMI) multiplied by the happiness factor (footnote 4: we do not take into account the word order in the pairs). Table 3 reports some of the most happy and sad pairs.

{| class="wikitable"
|+ Table 3: The most happy and sad word pairs.
! !! word pair !! score
|-
| rowspan="6" | happy || buon, appetito || 9.74
|-
| buon, auspicio || 8.84
|-
| dolcezza, infinita || 6.94
|-
| grazie, mille || 5.23
|-
| piacere, ciao || 5.12
|-
| grazie, esistere || 4.50
|-
| rowspan="8" | sad || dispiacere, deludervi || -9.28
|-
| brutto, presentimento || -8.45
|-
| triste, arrabbiata || -8.10
|-
| peccato, potevamo || -4.85
|-
| triste, piangere || -3.68
|-
| studiare, matematica || -3.55
|-
| peccato, gola || -2.63
|-
| manca, vederlo || -1.97
|}

Starting from word pairs, we perform another kind of analysis, aimed at mapping the words occurring in each pair to super-senses in WordNet. Super-senses form a general semantic taxonomy, defined by the WordNet lexicographer classes, that logically aggregates the senses in each syntactic category. We assign a happiness score to each super-sense by averaging the happiness factor associated with the dictionary word in the pair. Since each pair contains a dictionary word and a co-occurring word, we map the co-occurring word to its super-sense and increment the score of the super-sense by the happiness factor associated with the dictionary word. Finally, the score of each super-sense is divided by the number of co-occurring words belonging to the super-sense. For ambiguous words, we select the super-sense associated with the most frequent sense. In this study, we do not rely on a Word Sense Disambiguation (WSD) algorithm, since WSD is itself a critical task and we would need to test its performance on tweets before using it. Generally, WSD algorithms perform only slightly above the most-frequent-sense baseline. We plan to test WSD in a further study. As super-senses are defined in the English version of WordNet, we map Italian words to the English WordNet through the use of both Morph-it! (Zanchetta and Baroni, 2005) and MultiWordNet (Pianta et al., 2002), while sense occurrences are extracted from MultiSemCor (Bentivogli and Pianta, 2005).

In Table 4 we report the most happy and sad super-senses with the most frequent words extracted from our corpus. Consistent with the evidence provided by the analysis of the psycholinguistic word categories (see Section 2.3), we observe that socialness is associated with positive feelings, with concepts referring to people (noun.person) and communication (verb.communication, noun.communication) scoring high in happiness. Food (noun.food) also seems to be a major cause of positive mood, as well as money and gifts (noun.possession), sport achievements ('vittoria' and 'gol' in noun.act), and mundane locations and events ('centro', 'piazza', 'concerto', 'viaggio' in noun.location and noun.act). This is consistent with the suggestion by (Mihalcea and Liu, 2006) to enjoy food and drinks in an 'interesting social place' as a recipe for happiness. People also report their desires and preferences (voglio, amo, spero in verb.emotion).

For sadness, too, the results confirm the findings emerging from the analysis of psycholinguistic categories in LIWC. In fact, we observe that people tend to report their own individual negative feelings (rido, piango in verb.body), thoughts (verb.cognition), perceptions (e.g., 'vedo', 'sento'), and personal needs ('bisogno' and 'sonno' in noun.state). We also observe stereotypical complaints about the weather (piove) as well as swear words (noun.body).

{| class="wikitable"
|+ Table 4: The most happy and sad super-senses in our corpus.
! !! super-sense !! most frequent concepts
|-
| rowspan="12" | happy || noun.relation || resto, ricambio
|-
| noun.food || cena, pranzo, colazione, caffé
|-
| noun.attribute || coraggio, voce, numero, bellezza, splendore, silenzio
|-
| noun.person || mamma, ragazz*, amic*, dio, tesoro, donna
|-
| verb.communication || dico(no), parlare, prego, profilo, parla, chiedere
|-
| noun.communication || film, scusa, merda, musica, buongiorno, canzone, concerto
|-
| verb.possession || trov*, dare, perdere, perso, averti, comprato
|-
| verb.emotion || voglio/vorrei, amo, piace, vuoi, spero, odio, auguri
|-
| noun.location || sito, centro, post, piazza, scena, sud, nord, regione
|-
| noun.possession || soldi, regalo, fondo
|-
| noun.event || vittoria, gara, onda, campagna, scarica, fuoco, episodio, meraviglia
|-
| noun.act || cose, partita, gol, colpa, ricerca, viaggio, tour, bacio, corso, sesso
|-
| rowspan="9" | sad || verb.consumption || bisogna, mangiare, usare, mangio/mangiato, usa/o, usato, mangio
|-
| verb.body || piangere, dormire, ridere, sveglia, sorridere, piango, rido
|-
| noun.body || swear words, testa, occhi, mano/i, capelli
|-
| verb.change || inizio/inizia(re), cambiare, finito, morire/morte, successo, finisce
|-
| verb.perception || vedere, vedo, sento, sentire, guarda, guardare, ascoltare, pare
|-
| verb.cognition || so, sai, penso, letto, credo, sa, leggere, sapere, pensare, studiare
|-
| noun.state || bisogno, punto, problemi/a, accordo, pace, crisi, situazione, sonno
|-
| noun.substance || aria, acqua
|-
| verb.weather || piove
|}

===5 Discussion and Conclusions===
We performed an exploratory analysis of the lexicon and concepts associated with happiness in Italian tweets. We leveraged a corpus of happy and sad tweets to extract a "happiness dictionary", which we use to perform a time-wise analysis of happiness on Twitter and to extract the most frequent concepts and psycholinguistic categories associated with positive and negative emotions.

This study is a partial replication of the previous one by (Mihalcea and Liu, 2006) on blog posts. The main differences with respect to the original study are in the size, language and source of the corpus used for extracting the happiness lexicon. Specifically, (Mihalcea and Liu, 2006) rely on a collection of 10,000 blog posts in English from LiveJournal.com to extract a list of happy/sad words with their associated happiness scores, while we leverage a bigger corpus consisting of 8.6M Italian tweets. Furthermore, the blog posts were labeled as happy or sad by their authors. Conversely, for tweets we relied on silver labeling based on the presence of emoticons as a proxy for the author's self-reporting of her own positive or negative emotions.

Our analysis of psycholinguistic categories and the extraction of concepts and WordNet super-senses associated with them reveal interesting findings. Happiness appears related to the social aspects of life, while sad tweets mainly revolve around self-centered negative feelings and thoughts. In addition, our time-wise analysis reveals a mid-week drop in happiness, also observed in the original study. We also observe that happiness is high in the morning and decreases over the day. As future work, it would be interesting to investigate whether the hour-based time-wise analysis produces consistent results when weekdays and weekends are considered separately, and whether emotion-triggering concepts associated with happiness also vary over time.

We are aware of the main limitations of this study. First of all, by relying on microblogs we are probably able to mine emotion triggers that do not necessarily coincide with those shared in daily face-to-face conversations or reported in private logs. Furthermore, we do not attempt to make any categorization of the authors of tweets. Indeed, different target user groups could be studied to fulfill specific research goals and enable prospective applications, e.g. supporting creative writing or providing personalized recommendations based on moods. Finally, we consider only Twitter as a source of data. The same methodology could produce different results if applied to other social media. Indeed, recent research (Andalibi et al., 2017) showed that other media, such as Instagram, are also used for sharing extremely private emotions, such as feelings linked to depression. Based on these observations, further replications could focus on finer-grained emotions, also leveraging corpora from different platforms and including demographic and geographical information (Mitchell et al., 2013; Allisio et al., 2013) as additional dimensions of analysis.

===References===
[Allisio et al. 2013] Leonardo Allisio, Valeria Mussa, Cristina Bosco, Viviana Patti, and Giancarlo Ruffo. 2013. Felicittà: Visualizing and estimating happiness in Italian cities from geotagged tweets. In Proceedings of the First International Workshop on Emotion and Sentiment in Social and Expressive Media: approaches and perspectives from AI (ESSEM 2013), a workshop of the XIII International Conference of the Italian Association for Artificial Intelligence (AI*IA 2013), Turin, Italy, December 3, 2013, pages 95–106.

[Andalibi et al. 2017] Nazanin Andalibi, Pinar Ozturk, and Andrea Forte. 2017. Sensitive self-disclosures, responses, and social support on Instagram: The case of #depression. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW '17, pages 1485–1500, New York, NY, USA. ACM.

[Basile and Nissim 2013] Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

[Bentivogli and Pianta 2005] Luisa Bentivogli and Emanuele Pianta. 2005. Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor corpus. Natural Language Engineering, 11(3):247–261.

[Bollen et al. 2009] Johan Bollen, Alberto Pepe, and Huina Mao. 2009. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. CoRR, abs/0911.1583.

[Garimella et al. 2016] Kiran Garimella, Michael Mathioudakis, Gianmarco De Francisci Morales, and Aristides Gionis. 2016. Exploring controversy in Twitter. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion, CSCW '16 Companion, pages 33–36, New York, NY, USA. ACM.

[Go et al. 2009] Alec Go, Lei Huang, and Richa Bhayani. 2009. Twitter sentiment analysis. Entropy, 17:252.

[Joseph et al. 2017] Kenneth Joseph, Wei Wei, and Kathleen M. Carley. 2017. Girls rule, boys drool: Extracting semantic and affective stereotypes from Twitter. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW 2017, Portland, OR, USA, February 25 - March 1, 2017, pages 1362–1374.

[Mihalcea and Liu 2006] Rada Mihalcea and Hugo Liu. 2006. A corpus-based approach to finding happiness. In Proc. AAAI Spring Symposium and Computational Approaches to Weblogs, 6 pages.

[Mitchell et al. 2013] Lewis Mitchell, Morgan R. Frank, Kameron Decker Harris, Peter Sheridan Dodds, and Christopher M. Danforth. 2013. The geography of happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLOS ONE, 8(5):1–15.

[Pennebaker and Francis 2001] J. Pennebaker and M. Francis. 2001. Linguistic inquiry and word count: LIWC. Mahwah: Lawrence Erlbaum Associates, 71.

[Pianta et al. 2002] Emanuele Pianta, Luisa Bentivogli, and Christian Girardi. 2002. MultiWordNet: developing an aligned multilingual database. In Proceedings of the First International Conference on Global WordNet.

[Read 2005] Jonathon Read. 2005. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In Proceedings of the ACL Student Research Workshop, pages 43–48. Association for Computational Linguistics.

[Roberts et al. 2012] Kirk Roberts, Michael A. Roach, Joseph Johnson, Josh Guthrie, and Sanda M. Harabagiu. 2012. EmpaTweet: Annotating and Detecting Emotions on Twitter. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, May. European Language Resources Association (ELRA).

[Tausczik and Pennebaker 2010] Yla R. Tausczik and James W. Pennebaker. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1):24–54.

[Zanchetta and Baroni 2005] Eros Zanchetta and Marco Baroni. 2005. Morph-it!: a free corpus-based morphological resource for the Italian language.
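The happiness factor of Section 2.2 (Eq. 1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `happiness_factors`, the token-level reading of "probability that the word occurs", the natural logarithm, and the add-one smoothing over the joint vocabulary are all assumptions made here for illustration.

```python
import math
from collections import Counter

def happiness_factors(happy_tweets, sad_tweets, min_count=10_000, alpha=1.0):
    """Return {word: hf}, with hf = log(p_happy / p_sad) as in Eq. 1.

    Each tweet is a list of tokens. Probabilities are token-level relative
    frequencies with additive (Laplace) smoothing; this is an assumption,
    the paper does not spell out the estimator.
    """
    happy = Counter(tok for tweet in happy_tweets for tok in tweet)
    sad = Counter(tok for tweet in sad_tweets for tok in tweet)
    vocab = set(happy) | set(sad)
    n_happy, n_sad, v = sum(happy.values()), sum(sad.values()), len(vocab)

    hf = {}
    for w in vocab:
        # Keep only frequent words (the paper uses a 10,000-occurrence cut-off).
        if happy[w] + sad[w] < min_count:
            continue
        p_happy = (happy[w] + alpha) / (n_happy + alpha * v)
        p_sad = (sad[w] + alpha) / (n_sad + alpha * v)
        hf[w] = math.log(p_happy / p_sad)
    return hf
```

With smoothing, a word seen only in happy tweets still gets a finite positive score, and a word seen only in sad tweets a finite negative one, which is what makes a ranking such as Table 1 possible.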
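The time-wise happiness load of Section 3 could be computed along the following lines. This is a sketch under stated assumptions: "frequency" is read as the relative frequency of a dictionary word among the dictionary words of the period, and the mapping to [-1, 1] is done by min-max scaling; the paper specifies neither. The function name `happiness_load_by_period` is hypothetical.

```python
from collections import Counter, defaultdict

def happiness_load_by_period(tweets, hf):
    """tweets: iterable of (period, tokens); hf: {word: happiness factor}.

    Returns {period: load}, min-max scaled to [-1, 1] (scaling is an assumption).
    """
    counts = defaultdict(Counter)
    for period, tokens in tweets:
        counts[period].update(t for t in tokens if t in hf)

    raw = {}
    for period, c in counts.items():
        total = sum(c.values())
        if total == 0:
            continue  # no dictionary word observed in this period
        # Word load = relative frequency * hf; period load = mean over words.
        loads = [(n / total) * hf[w] for w, n in c.items()]
        raw[period] = sum(loads) / len(loads)

    if not raw:
        return {}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {p: 0.0 for p in raw}
    return {p: 2 * (x - lo) / (hi - lo) - 1 for p, x in raw.items()}
```

The period key can be a weekday name or an hour of the day, so the same function would serve both panels of Figure 2.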
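The pair ranking of Section 4 (PMI multiplied by the happiness factor) can be sketched as follows. The symmetric window of two tokens on each side, the maximum-likelihood PMI estimate, and the name `rank_pairs` are assumptions for illustration; the paper does not give these details.

```python
import math
from collections import Counter

def rank_pairs(tweets, hf, window=2, top_k=50):
    """Rank (dictionary word, co-occurring word) pairs by PMI * hf.

    tweets: iterable of token lists; hf: {word: happiness factor}.
    """
    word_counts, pair_counts = Counter(), Counter()
    for tokens in tweets:
        word_counts.update(tokens)
        for i, w in enumerate(tokens):
            if w not in hf:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # either side of the dictionary word counts
                    pair_counts[(w, tokens[j])] += 1

    n_words, n_pairs = sum(word_counts.values()), sum(pair_counts.values())

    # Keep the top_k most frequent co-occurring words per dictionary word.
    by_word = {}
    for (w, c), n in pair_counts.items():
        by_word.setdefault(w, []).append((n, c))

    scored = []
    for w, candidates in by_word.items():
        for n, c in sorted(candidates, reverse=True)[:top_k]:
            p_pair = n / n_pairs
            p_w, p_c = word_counts[w] / n_words, word_counts[c] / n_words
            pmi = math.log(p_pair / (p_w * p_c))
            scored.append(((w, c), pmi * hf[w]))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Multiplying by the happiness factor gives the score its sign, so strongly associated pairs built on sad dictionary words sink to the bottom of the ranking, as in the lower half of Table 3.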
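The super-sense scoring of Section 4 amounts to a per-class average, sketched below. The lookup table `supersense_of` stands in for the Morph-it!/MultiWordNet/MultiSemCor mapping described in the paper and is a hypothetical input here; counting one contribution per pair is also an assumption.

```python
from collections import defaultdict

def supersense_scores(pairs, hf, supersense_of):
    """Average the hf of the dictionary word over each super-sense.

    pairs: iterable of (dictionary_word, co_occurring_word);
    supersense_of: {word: super-sense of its most frequent sense}
    (hypothetical lookup, assumed to be precomputed).
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for dict_word, co_word in pairs:
        sense = supersense_of.get(co_word)
        if sense is None:
            continue  # co-occurring word has no WordNet mapping
        total[sense] += hf[dict_word]
        count[sense] += 1
    return {sense: total[sense] / count[sense] for sense in total}
```

For instance, with toy inputs `pairs = [("buon", "cena"), ("grazie", "pranzo")]`, `hf = {"buon": 2.0, "grazie": 1.0}` and both co-occurring words mapped to `noun.food`, the score of `noun.food` is the mean of the two factors, 1.5.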