=Paper= {{Paper |id=Vol-2253/paper52 |storemode=property |title="Buon appetito!" - Analyzing Happiness in Italian Tweets |pdfUrl=https://ceur-ws.org/Vol-2253/paper52.pdf |volume=Vol-2253 |authors=Pierpaolo Basile,Nicole Novielli |dblpUrl=https://dblp.org/rec/conf/clic-it/BasileN18 }} =="Buon appetito!" - Analyzing Happiness in Italian Tweets== https://ceur-ws.org/Vol-2253/paper52.pdf
            “Buon appetito!” - Analyzing Happiness in Italian Tweets

                                Pierpaolo Basile and Nicole Novielli
                                  Department of Computer Science
                                     University of Bari Aldo Moro
                                Via E. Orabona, 4 - 70125 Bari (Italy)
                              {firstname.lastname}@uniba.it


Abstract

English. We report the results of an exploratory study aimed at investigating the language of happiness in Italian tweets. Specifically, we conduct a time-wise analysis of the happiness load of tweets by leveraging a lexicon of happiness extracted from 8.6M tweets. Furthermore, we report the results of a statistical linguistic analysis aimed at extracting the most frequent concepts associated with the happy and sad words in our lexicon.

Italian. We report the results of an exploratory analysis of a corpus of Italian tweets, aimed at identifying the concepts typically associated with happiness. We also report the results of a time-wise analysis of the happiness load of tweets across the hours of the day and the days of the week.

1 Introduction

The widespread diffusion of social media has reshaped the way we interact and communicate. Among others, microblogging platforms such as Twitter are becoming extremely popular and people constantly use them for sharing opinions about facts of public interest. Furthermore, its worldwide adoption and the fact that tweets are publicly available make Twitter an extremely appealing virtual place for researchers interested in language analysis as a means to investigate social phenomena (Bollen et al., 2009; Garimella et al., 2016).

In addition, recent research showed how microblogging is also used for self-disclosure of individual feelings (Roberts et al., 2012; Andalibi et al., 2017). As such, microblogs constitute an invaluable wealth of data ready to be mined for discovering affective stereotypes (Joseph et al., 2017) using corpus-based approaches to linguistic ethnography (Mihalcea and Liu, 2006). Such analyses can further enhance our understanding of how people conceptualize the experience of emotions and what their most common triggers are. Recent studies even envisaged the emergence of tools for monitoring the public mood¹ and health through the analysis of Twitter users' reactions to major social, political, and economic events (Bollen et al., 2009).

¹ 'What Twitter tells us about our happiness' https://goo.gl/fmYBP3 - Last accessed: Oct. 2018

In this study we report the results of an exploratory analysis of the language of happiness on Twitter. In particular, we perform a partial replication of the approach proposed by Mihalcea and Liu (2006) for mining sources of happiness in blog posts. The contributions of this paper are as follows. First, we extract a happiness dictionary from a sample of about 8.6M tweets from the TWITA corpus of Italian tweets (Basile and Nissim, 2013). For each word in the dictionary, we compute a happiness factor by adapting the approach proposed in the original study. Furthermore, we perform a qualitative investigation of the 100 happiest and saddest words by mapping them into psycholinguistic word categories (see Section 2). As a second step, we use our dictionary to perform a time-wise analysis of happiness as shared in different hours and days of the week (see Section 3). Third, we extract the concepts most frequently associated with happy words in our dictionary, which we map into WordNet super-senses (see Section 4). We discuss limitations and provide suggestions for future work in Section 5.

2 The Happiness Dictionary

2.1 A Dataset of Happy and Sad Tweets

Our study is based on TWITA (Basile and Nissim, 2013), the largest available corpus of Italian tweets. In particular, we analyze a subset of 400M tweets obtained by filtering out retweets from the 500M tweets collected from February 2012 to September 2015. Following the idea proposed in (Read, 2005; Go et al., 2009), we select positive and negative tweets based on the presence of positive or negative emoticons². Since a tweet can contain multiple emoticons, we select only tweets that contain a single emoticon appearing at the end of the tweet. Using this procedure we obtain a corpus $C_{happy}$ of 8,648,476 tweets.

² We use :-) and :) for happy and :-( and :( for sad.

2.2 Happy/Sad Word Extraction and Scoring

From the $C_{happy}$ corpus, we extract a subset of words and we assign them a happiness factor (hf) computed according to the log of the odds ratio between the probability that the word occurs in positive tweets, $p_{happy}(w_i)$, and the probability that it occurs in negative tweets, $p_{sad}(w_i)$, as in Eq. 1.

\[ hf(w_i) = \log \frac{p_{happy}(w_i)}{p_{sad}(w_i)} \tag{1} \]

We adopt additive smoothing (Laplace smoothing) for computing both the $p_{happy}$ and $p_{sad}$ probabilities. In our lexicon, we include and compute the happiness factor only for words that occur at least 10,000 times, for a total of 718 words. We call this list "the happiness dictionary" ($D_h$)³. Table 1 reports the most happy/sad words with the corresponding happiness factor (score (hf)).

³ The dictionary is available on GitHub: https://github.com/pippokill/happyFactor

Table 1: The happiness factor of the most happy/sad words.

 happy         score (hf)    sad          score (hf)
 fback         4.04          triste       -2.37
 ricambi       3.83          purtroppo    -1.91
 benvenuta     3.17          dispiace     -1.68
 grazie        2.32          brutto       -1.68
 buon          2.14          peccato      -1.63
 piacere       2.03          manca        -1.53
 gentile       1.91          compiti      -1.35
 auguro        1.86          paura        -1.33
 dolcezza      1.74          studiare     -1.30

We observe that some happy words (fback, ricambi, benvenuta) are due to several positive tweets that users post when they establish new connections, i.e. when they start following a new user or when they ask somebody to follow them back (fback) as in: @usermention ciao sono nuova, fback? Grazie mille :) Sad words refer to negative emotions or evaluations, such as triste, dispiace, brutto, peccato. Interestingly, several negative words emerge from the school domain (compiti, studiare) and the word scuola has a negative score of -0.93 itself.

2.3 Happiness by Psycholinguistic Categories

We are interested in understanding how happiness words map into psycholinguistic word classes. Hence, we check their distribution along the word categories in the Linguistic Inquiry and Word Count (LIWC) taxonomy (Pennebaker and Francis, 2001). To this aim, we perform a qualitative investigation on the 100 most happy and 100 most sad words, that is, the words with the highest and lowest happiness scores, respectively. We map each word into LIWC word categories. LIWC organizes words into psychologically meaningful categories, based on the assumption that language reflects the cognitive and emotional phenomena involved in communication. It has been used in a wide range of psycholinguistic experimental settings, including investigations on emotions, social relationships, and thinking styles (Tausczik and Pennebaker, 2010).

We perform a coding of the English translation of the happy/sad words into LIWC categories. When translating, we keep the information about the subject conveyed by the Italian verbs (e.g., 'penso' is translated as 'I think'). The coding is performed manually by the authors: in a first round, one rater associates each word with the corresponding LIWC category; then, the other revises the annotation, checking for consistency and also verifying the correctness of the translation. 22 words are discarded and replaced with others from the dictionary because we could not find a mapping with any of the categories. Furthermore, we add an ad hoc category to enable modeling of words from the social media domain (retweet, follow).

Figure 1 shows how the happy and sad words distribute along the dimensions associated with the most frequent categories. Sample words for each word category are reported in Table 2. We observe that happy words in the dictionary mainly refer to positive emotions as well as to the social and social media dimensions. Conversely, sad words mainly describe negative emotions with focus on the author. Words describing cognitive mechanisms are also associated with sadness.
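The selection of happy and sad tweets (Section 2.1) and the happiness factor of Eq. 1 (Section 2.2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `emoticon_label` and `happiness_factors` are hypothetical, and several details that the text does not specify are assumptions, namely that the probabilities are token-level relative frequencies, that the logarithm is natural, that the smoothing constant is 1, and that the 10,000-occurrence threshold applies to the total count of a word over happy and sad tweets.

```python
import math
from collections import Counter

HAPPY = (":-)", ":)")   # emoticons listed in footnote 2
SAD = (":-(", ":(")


def emoticon_label(tweet):
    """Return 'happy' or 'sad' for a tweet with exactly one emoticon, placed at its end."""
    text = tweet.rstrip()
    if sum(text.count(e) for e in HAPPY + SAD) != 1:
        return None   # no emoticon, or more than one
    if text.endswith(HAPPY):
        return "happy"
    if text.endswith(SAD):
        return "sad"
    return None       # single emoticon, but not in final position


def happiness_factors(happy_docs, sad_docs, min_count=10_000, alpha=1.0):
    """Compute hf(w) = log(p_happy(w) / p_sad(w)) with additive smoothing.

    happy_docs and sad_docs are lists of token lists. The choice of
    token-level frequencies and of alpha=1 is an assumption of this sketch.
    """
    happy = Counter(tok for doc in happy_docs for tok in doc)
    sad = Counter(tok for doc in sad_docs for tok in doc)
    vocab = set(happy) | set(sad)
    n_happy, n_sad, v = sum(happy.values()), sum(sad.values()), len(vocab)

    factors = {}
    for w in vocab:
        if happy[w] + sad[w] < min_count:
            continue  # keep only sufficiently frequent words
        p_happy = (happy[w] + alpha) / (n_happy + alpha * v)
        p_sad = (sad[w] + alpha) / (n_sad + alpha * v)
        factors[w] = math.log(p_happy / p_sad)
    return factors


# Toy usage with a threshold of 1 instead of 10,000.
toy_hf = happiness_factors(
    happy_docs=[["grazie", "mille"], ["grazie"]],
    sad_docs=[["triste"], ["mille"]],
    min_count=1,
)
```

With smoothing, a word that never occurs in one of the two classes still receives a finite score, which is why the sketch does not need a special case for zero counts.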




Figure 1: Comparing the most happy/sad words along dimensions associated with word categories.

Table 2: Mapping the happiness dictionary to word categories

    Category   Sample words
    Affect     buono/a, ottimo, triste, brutto
    Cogmech    avrei, pensare, capisco, so, volevo
    Comm       benvenut*, buonanotte, ciao
    I          mi, io, first person verbs
    Negate     mai, nulla, non
    Negemo     difficile, peggio, sola
    Posemo     benvenuta, piacere, sorriso, cara
    Posfeel    cara, contenta, adoro, felice
    Present    avermi, trovi, riesco
    Self       mi, io, first person verbs
    Social     ricambi, gruppo
    S. media   fback, follow, seguire, Instagram
    Time       serata, anticipo, periodo, ultima
    You        te, tuo, second person verbs

3 Time-wise analysis

As observed in the original study, happiness is not constant in our life and different degrees of happiness might be observed at different moments in time. As such, we analyze how happiness changes over time. In particular, we take into account the days of the week and the different hours in a day. For this analysis, we exploit the whole corpus of 400M tweets and we compute the distribution of words occurring in the happiness dictionary in each different time period. Using this strategy, in each time period a word has a happiness load obtained by multiplying its frequency in that period by its happiness factor. The happiness load of each time period is the average of all the happiness loads in that period. The obtained values are mapped to the interval [-1, 1] and plotted in Figure 2a (for days) and in Figure 2b (for hours).

Figure 2: Time-wise analysis. (a) Happiness load by day of the week. (b) Happiness load for a 24-hour day.

Our time-wise analysis reveals a drop in happiness on Thursday, with a subsequent twist towards positive mood on Friday, before the weekend, which is the happiest moment in the week. This is consistent with the findings of the original study, reporting mid-week blues around Wednesday and a happiness peak on Saturday (Mihalcea and Liu, 2006). Regarding the hours, we observe the highest happiness load in the morning, with a peak around 6 AM; it then steadily decreases over the day, with the lowest value observed around 11 PM.

4 Concept analysis

We are interested in concepts related to words in the happiness dictionary. In the original study, the authors extract the 'ingredients' for their recipe of happiness by ranking the most relevant 2- and 3-grams from their corpus according to their happiness load. Such an approach is not easy to replicate as the number of 2- and 3-grams extracted from 400M tweets is potentially huge. Hence, starting from the words in our happiness dictionary, we extract the 50 most frequently co-occurring words in a window of two words. Then we rank all the word pairs (dictionary word, co-occurring word)⁴ according to the Pointwise Mutual Information (PMI) multiplied by the happiness factor. Table 3 reports some of the most happy and sad pairs.

⁴ We do not take into account the word order in the pairs.

Table 3: The most happy and sad word pairs.

           word pair                     score
 happy     buon, appetito                9.74
           buon, auspicio                8.84
           dolcezza, infinita            6.94
           grazie, mille                 5.23
           piacere, ciao                 5.12
           grazie, esistere              4.50
 sad       dispiacere, deludervi         -9.28
           brutto, presentimento         -8.45
           triste, arrabbiata            -8.10
           peccato, potevamo             -4.85
           triste, piangere              -3.68
           studiare, matematica          -3.55
           peccato, gola                 -2.63
           manca, vederlo                -1.97

Starting from word pairs, we perform another kind of analysis aimed at mapping the words occurring in each pair to super-senses in WordNet. Super-senses form a general semantic taxonomy defined by the WordNet lexicographer classes as a way of logically aggregating senses in each syntactic category. We assign a happiness score to each super-sense by averaging the happiness factor associated with the dictionary word in the pair. Since each pair contains a dictionary word and a co-occurring word, we map the co-occurring word to its super-sense and increment the score of the super-sense by summing the happiness factor associated with the dictionary word. Finally, the score of each super-sense is divided by the number of co-occurring words belonging to the super-sense. For ambiguous words, we select the super-sense associated with the most frequent sense. In this study, we do not rely on a Word Sense Disambiguation (WSD) algorithm since WSD is a critical task, and its performance on tweets needs to be tested before using it. Generally, WSD algorithms give performance slightly above the most frequent sense. We plan to test WSD in a further study. As super-senses are defined in the English version of WordNet, we performed a mapping of Italian words to the English WordNet through the use of both Morph-it! (Zanchetta and Baroni, 2005) and MultiWordNet (Pianta et al., 2002), while sense occurrences are extracted from MultiSemCor (Bentivogli and Pianta, 2005).

In Table 4 we report the most happy and sad super-senses with the most frequent words extracted from our corpus. Consistently with the evidence provided by the analysis of the psycholinguistic word categories (see Section 2.3), we observe that socialness is associated with positive feelings, with concepts referring to people (noun.person) and communication (verb.communication, noun.communication) scoring high in happiness. Food (noun.food) also seems to be a major cause of positive mood, as well as money and gifts (noun.possession), sport achievements ('vittoria' in noun.event and 'gol' in noun.act), and mundane locations and events ('centro', 'piazza', 'concerto', 'viaggio' in noun.location, noun.communication and noun.act). This is consistent with the suggestion by Mihalcea and Liu (2006) to enjoy food and drinks in an 'interesting social place' as a recipe for happiness. People also report their desires and preferences (voglio, amo, spero in verb.emotion).

Also for sadness, results confirm the findings emerging from the analysis of psycholinguistic categories in LIWC. In fact, we observe that people tend to report their own individual negative feelings (rido, piango in verb.body), thoughts (verb.cognition), perceptions (e.g., 'vedo', 'sento'), and personal needs ('bisogno' and 'sonno' in noun.state). We also observe stereotypical complaints about the weather (piove) as well as swear words (noun.body).

Table 4: The most happy and sad super-senses based on our corpus.

        super-sense               most frequent concepts
 happy  noun.relation             resto, ricambio
        noun.food                 cena, pranzo, colazione, caffè
        noun.attribute            coraggio, voce, numero, bellezza, splendore, silenzio
        noun.person               mamma, ragazz*, amic*, dio, tesoro, donna
        verb.communication        dico(no), parlare, prego, profilo, parla, chiedere
        noun.communication        film, scusa, merda, musica, buongiorno, canzone, concerto
        verb.possession           trov*, dare, perdere, perso, averti, comprato
        verb.emotion              voglio/vorrei, amo, piace, vuoi, spero, odio, auguri
        noun.location             sito, centro, post, piazza, scena, sud, nord, regione
        noun.possession           soldi, regalo, fondo
        noun.event                vittoria, gara, onda, campagna, scarica, fuoco, episodio, meraviglia
        noun.act                  cose, partita, gol, colpa, ricerca, viaggio, tour, bacio, corso, sesso
 sad    verb.consumption          bisogna, mangiare, usare, mangio/mangiato, usa/o, usato, mangio
        verb.body                 piangere, dormire, ridere, sveglia, sorridere, piango, rido
        noun.body                 swear words, testa, occhi, mano/i, capelli
        verb.change               inizio/inizia(re), cambiare, finito, morire/morte, successo, finisce
        verb.perception           vedere, vedo, sento, sentire, guarda, guardare, ascoltare, pare
        verb.cognition            so, sai, penso, letto, credo, sa, leggere, sapere, pensare, studiare
        noun.state                bisogno, punto, problemi/a, accordo, pace, crisi, situazione, sonno
        noun.substance            aria, acqua
        verb.weather              piove

5 Discussion and Conclusions

We performed an exploratory analysis of the lexicon and concepts associated with happiness in Italian tweets. We leveraged a corpus of happy and sad tweets to extract a "happiness dictionary", which we use to perform a time-wise analysis of happiness on Twitter and to extract the most frequent concepts and psycholinguistic categories associated with positive and negative emotions.

This study is a partial replication of the previous one by Mihalcea and Liu (2006) on blog posts. The main differences with respect to the original study are in the size, language and source of the corpus used for extracting the happiness
lexicon. Specifically, Mihalcea and Liu (2006) rely on a collection of 10,000 blog posts in English from LiveJournal.com to extract a list of happy/sad words with their associated happiness scores, while we leverage a bigger corpus consisting of 8.6M Italian tweets. Furthermore, the blog posts were labeled as happy or sad by their authors. Conversely, for tweets we relied on silver labeling based on the presence of emoticons as a proxy for the authors' self-reporting of their own positive or negative emotions.

Our analysis of psycholinguistic categories and the extraction of concepts and WordNet super-senses associated with them reveal interesting findings. Happiness appears related to the social aspects of life while sad tweets mainly revolve around self-centered negative feelings and thoughts. In addition, our time-wise analysis reveals a mid-week drop in happiness also observed in the original study. We also observe that happiness is high in the morning and decreases over the day. As future work, it would be interesting to investigate whether the time-wise analysis based on hours produces consistent results when weekdays and the weekend are considered separately, and whether emotion-triggering concepts associated with happiness also vary over time.

We are aware of the main limitations of this study. First of all, by relying on microblogs we are probably able to mine emotion triggers that do not necessarily coincide with those shared in daily face-to-face conversations or reported in private logs. Furthermore, we do not attempt to make any categorization of the authors of tweets. Indeed, different target user groups could be studied to fulfill specific research goals and enable prospective applications, e.g. for supporting creative writing or for providing personalized recommendations based on moods. Finally, we consider only Twitter as a source of data. The same methodology could produce different results if applied to other social media. Indeed, recent research (Andalibi et al., 2017) showed that other media, such as Instagram, are also used for sharing extremely private emotions, such as feelings linked to depression. Based on these observations, further replications could focus on finer-grained emotions, also leveraging corpora from different platforms and including consideration of demographics and geographical information (Mitchell et al., 2013; Allisio et al., 2013) as additional dimensions of analysis.
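The happiness load of Section 3, together with the weekday/weekend split by hour mentioned as future work, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: `happiness_load`, `rescale` and `day_type_and_hour` are hypothetical names, "frequency" is taken to be the relative frequency of a dictionary word among the dictionary words of a period, and the mapping to [-1, 1] is done by dividing by the largest absolute load, since the text does not say how the values are rescaled.

```python
from collections import Counter, defaultdict
from datetime import datetime


def happiness_load(tweets, hf, period_of):
    """Average happiness load per time period.

    tweets: iterable of (datetime, list of tokens)
    hf: dict mapping dictionary words to their happiness factor
    period_of: function mapping a datetime to a period key
    """
    counts = defaultdict(Counter)
    for timestamp, tokens in tweets:
        counts[period_of(timestamp)].update(t for t in tokens if t in hf)

    loads = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        if total == 0:
            continue  # no dictionary word observed in this period
        # Load of a word: relative frequency in the period times its hf.
        per_word = [hf[w] * n / total for w, n in counter.items()]
        loads[period] = sum(per_word) / len(per_word)
    return loads


def rescale(loads):
    """Map loads to [-1, 1] by dividing by the largest absolute value (assumed)."""
    peak = max((abs(v) for v in loads.values()), default=0.0)
    if peak == 0:
        return {p: 0.0 for p in loads}
    return {p: v / peak for p, v in loads.items()}


def day_type_and_hour(timestamp):
    """Period key separating weekdays from the weekend, hour by hour."""
    kind = "weekend" if timestamp.weekday() >= 5 else "weekday"
    return (kind, timestamp.hour)


# Toy usage with two scores taken from Table 1.
hf = {"grazie": 2.32, "triste": -2.37}
tweets = [
    (datetime(2015, 9, 3, 23, 0), ["che", "giornata", "triste"]),
    (datetime(2015, 9, 5, 6, 0), ["grazie", "mille"]),
]
by_day = rescale(happiness_load(tweets, hf, lambda ts: ts.weekday()))
by_split = happiness_load(tweets, hf, day_type_and_hour)
```

Passing the period as a function keeps the same computation usable for days, hours, or any finer split, so the weekday/weekend comparison only requires a different key.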
References

[Allisio et al., 2013] Leonardo Allisio, Valeria Mussa, Cristina Bosco, Viviana Patti, and Giancarlo Ruffo. 2013. Felicittà: Visualizing and estimating happiness in Italian cities from geotagged tweets. In Proceedings of the First International Workshop on Emotion and Sentiment in Social and Expressive Media: approaches and perspectives from AI (ESSEM 2013), a workshop of the XIII International Conference of the Italian Association for Artificial Intelligence (AI*IA 2013), Turin, Italy, December 3, 2013, pages 95–106.

[Andalibi et al., 2017] Nazanin Andalibi, Pinar Ozturk, and Andrea Forte. 2017. Sensitive self-disclosures, responses, and social support on Instagram: The case of #depression. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW '17, pages 1485–1500, New York, NY, USA. ACM.

[Basile and Nissim, 2013] Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

[Bentivogli and Pianta, 2005] Luisa Bentivogli and Emanuele Pianta. 2005. Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor corpus. Natural Language Engineering, 11(3):247–261.

[Bollen et al., 2009] Johan Bollen, Alberto Pepe, and Huina Mao. 2009. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. CoRR, abs/0911.1583.

[Garimella et al., 2016] Kiran Garimella, Michael Mathioudakis, Gianmarco De Francisci Morales, and Aristides Gionis. 2016. Exploring controversy in Twitter. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion, CSCW '16 Companion, pages 33–36, New York, NY, USA. ACM.

[Go et al., 2009] Alec Go, Lei Huang, and Richa Bhayani. 2009. Twitter sentiment analysis. Entropy, 17:252.

[Joseph et al., 2017] Kenneth Joseph, Wei Wei, and Kathleen M. Carley. 2017. Girls rule, boys drool: Extracting semantic and affective stereotypes from Twitter. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW 2017, Portland, OR, USA, February 25 - March 1, 2017, pages 1362–1374.

[Mihalcea and Liu, 2006] Rada Mihalcea and Hugo Liu. 2006. A corpus-based approach to finding happiness. In Proc. AAAI Spring Symposium on Computational Approaches to Weblogs, 6 pages.

[Mitchell et al., 2013] Lewis Mitchell, Morgan R. Frank, Kameron Decker Harris, Peter Sheridan Dodds, and Christopher M. Danforth. 2013. The geography of happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLOS ONE, 8(5):1–15.

[Pennebaker and Francis, 2001] J. Pennebaker and M. Francis. 2001. Linguistic inquiry and word count: LIWC. Mahwah: Lawrence Erlbaum Associates, 71.

[Pianta et al., 2002] Emanuele Pianta, Luisa Bentivogli, and Christian Girardi. 2002. MultiWordNet: developing an aligned multilingual database. In Proceedings of the First International Conference on Global WordNet.

[Read, 2005] Jonathon Read. 2005. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In Proceedings of the ACL Student Research Workshop, pages 43–48. Association for Computational Linguistics.

[Roberts et al., 2012] Kirk Roberts, Michael A. Roach, Joseph Johnson, Josh Guthrie, and Sanda M. Harabagiu. 2012. EmpaTweet: Annotating and Detecting Emotions on Twitter. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet U. Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, May. European Language Resources Association (ELRA).

[Tausczik and Pennebaker, 2010] Yla R. Tausczik and James W. Pennebaker. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1):24–54.

[Zanchetta and Baroni, 2005] Eros Zanchetta and Marco Baroni. 2005. Morph-it!: a free corpus-based morphological resource for the Italian language.