<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.7910/DVN/JZAS66</article-id>
      <title-group>
        <article-title>The CL-Aff Happiness Shared Task: Results and Key Insights</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kokil Jaidka</string-name>
          <email>jaidka@ntu.edu.sg</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saran Mumick</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niyati Chhaya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lyle Ungar</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Adobe Research</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Megagon Labs</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Nanyang Technological University</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Pennsylvania</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This overview describes the official results of the CL-Aff Shared Task 2019 - in Pursuit of Happiness. The Shared Task comprised a semi-supervised classification task and an open-ended knowledge modeling task on a dataset of over 80,000 brief autobiographical accounts of happy moments, crowdsourced from Amazon Mechanical Turk. The Shared Task was organized as a part of the 2nd Workshop on Affective Content Analysis @ AAAI-19, held in Honolulu, USA on January 27, 2019. This paper compares the participating systems in terms of their accuracy and F-1 scores at predicting two facets of happiness. The complete annotated dataset is available on Harvard Dataverse at https://goo.gl/3rcZqf. The annotation instructions and the scripts used for evaluation are available in the Git repository at https://github.com/kj2013/claff-happydb.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The purpose of the CL-Aff Shared Task is to challenge the current understanding
of emotion through a task that models the experiential, contextual and
agentic attributes of happy moments. It has long been known that human affect is
context-driven, and that labeled datasets should account for these factors in
generating predictive models of affect. The Shared Task is organized in collaboration
with researchers at Megagon Labs and builds upon the HappyDB dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
comprising human accounts of `happy moments'. The Shared Task comprised
two sub-tasks for analyzing happiness and well-being in written language, on a
corpus of over 80,000 descriptions of happy moments, as described here:
Given: An account of a happy moment, marked with the individual's demographics,
recollection time and relevant labels.
- Task 1: Predict the Agency and Sociality labels of the happy moment.
- Task 2: Suggest interesting ways to automatically characterize the happy
moments in terms of affect, emotion, participants and content.
(In the annotation task and the Shared Task, the label names we provided were
`Agency' and `Social'. We have since renamed `Social' to `Sociality' so that both
Agency and Sociality can be grammatically consistent.)
      </p>
      <p>The task, given its predictive and open-ended interpretive aspects, is relevant
to the computational linguistics, natural language processing, artificial
intelligence and psycholinguistics communities. The aim is to engage scholarly
interest and crowdsource new ideas and linguistic approaches to define
happiness. Details on the psycholinguistic underpinnings of the annotation task are
provided in a different, forthcoming paper [5].</p>
      <p>Evaluation: The performance of the systems was compared based on their
accuracy and F-1 measure at predicting the Agency and Sociality labels on the
unseen test dataset. This was done using an automatic evaluation script,
available on GitHub at https://github.com/kj2013/claff-happydb/.</p>
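      <p>As a worked illustration of the evaluation protocol, the sketch below computes accuracy and the F-1 measure for one binary label. It is a minimal re-implementation for illustration only; the official script in the Git repository remains authoritative, and the yes/no string encoding of the labels is an assumption.</p>

```python
def accuracy(gold, pred):
    # Fraction of moments whose predicted label matches the gold label.
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1(gold, pred, positive="yes"):
    # F-1 for the positive class: harmonic mean of precision and recall.
    tp = sum(g == p == positive for g, p in zip(gold, pred))
    fp = sum(p == positive and g != positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy run on four moments scored for the Agency label (hypothetical data):
gold = ["yes", "yes", "no", "no"]
pred = ["yes", "no", "no", "no"]
print(accuracy(gold, pred))  # 0.75
print(f1(gold, pred))        # 0.6666666666666666
```

      <p>In practice the same two functions are applied twice per system run: once for the Agency label and once for the Sociality label.</p>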
    </sec>
    <sec id="sec-2">
      <title>Dataset description</title>
      <p>The CL-Aff corpus comprises the following:</p>
      <p>- Labeled training set (N = 10,560): Single-sentence happy moments
from the available HappyDB corpus, annotated with the demographic labels of
the author, with labels that identify the 'agency' of the author and the
'social' characteristic of the moment, and with concept labels describing its
theme.</p>
      <p>- Unlabeled training set (N = 59,846): The remaining single-sentence
HappyDB happy moments, with only the demographic labels of the author.</p>
      <p>- Test set (N = 17,215): Previously unreleased, single-sentence happy
moments, freshly collected in the same manner as the original HappyDB data.
Authors' demographic labels were available to the Shared Task participants
but not the `agency' or `social' characteristics.</p>
      <p>The Agency and Sociality characteristics of each happy moment were decided
by a simple majority agreement between three independent annotators using a
binary (yes/no) coding.</p>
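      <p>The majority rule described above can be sketched in a few lines; the yes/no string encoding of annotator responses is an assumption for illustration.</p>

```python
def majority_label(votes):
    # Resolve one binary dimension (Agency or Sociality) by simple majority.
    # With three independent annotators and a binary yes/no coding, a
    # majority always exists, so no tie-breaking rule is needed.
    yes_votes = sum(v == "yes" for v in votes)
    return "yes" if yes_votes * 2 > len(votes) else "no"

# Three annotators judged the Sociality of one (hypothetical) happy moment:
print(majority_label(["yes", "no", "yes"]))  # yes
```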
      <sec id="sec-2-1">
        <title>Corpus development</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Collecting the happy moments</title>
      <p>
        We followed the format of the original HappyDB AMT task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to collect a second
dataset of 20,000 happy moments, which was to be the unseen test data in the
CL-Aff Shared Task. The following instructions were provided to the workers.
      </p>
      <p>Instructions</p>
      <p>What made you happy? Reflect on the past &lt;duration&gt;, and recall three
actual events that happened to you that made you happy. Describe your happy
moments with a complete sentence. Write three such moments. You will also
be asked to note for how long each event made you happy. This task also has
post-task questions. Please be sure to answer the questions. Examples of happy
moments we are NOT looking for (e.g., events in distant past, incomplete
sentence): The day I married my spouse; My dog.
&lt; Enter moment here &gt;
For how long did that event make you happy? Select the answer that is most
appropriate.</p>
      <p>Each AMT worker was required to enter three happy moments experienced
within a specific time period. Half of the questionnaires specified a time period
of 24 hours, while the other half specified a &lt;time period&gt; of 3 months. The
options provided for the follow-up question about the duration (i.e., the length)
of happiness were `All day, I'm still feeling it,' `Half a day,' `At least one hour,'
`A few minutes' or `Not Applicable.' After the participant answered these
questions, demographic information was collected about their country, age, gender
(`Male', `Female', `Other', `Not Applicable'), marital status (`single', `married',
`divorced', `separated', `widowed' or `Not Applicable'), and whether or not they
have children (`yes', `no').</p>
      <sec id="sec-3-1">
        <title>Annotation</title>
        <p>Annotators were required to annotate each moment along two binary
dimensions - Agency and Sociality. We draw from Paulhus' conceptualization of
self-presentation according to the two factors of Agency and Communion [7].
Previous work exploring the evidence of agency in writing has adapted it to mean the
locus of control, or the degree to which an author is in control of their surroundings
[9]. Sociality conceptualizes interpersonal engagement, evinced in writing as the
description of any activity performed with or in the company of others [6].</p>
        <p>Instructions Read the following happy moment. Choose any of the following
that applies:
Agency: Is the author in control? YES/NO
Examples of sentences where the author is in control (Answer is YES):
- "I ran on the treadmill for 20 minutes straight when I could barely do 5
minutes 3 months ago."
- "Going out to a special birthday lunch for my great-grandmother-in-law's
birthday."
Examples of sentences where the author is not in control (Answer is NO):
- "My youngest daughter got accepted to many prestigious universities and
accepted an offer to attend college in San Diego."
- "A small business deal change over for small profit."
Social: Does this moment involve people other than the author?
YES/NO
Please note that objects (e.g., bus, work) should not be counted as social.
Examples of sentences which involve other people (Answer is YES):
- "Going out to a special birthday lunch for my great-grandmother-in-law's
birthday."
- "My youngest daughter got accepted to many prestigious universities and
accepted an offer to attend college in San Diego."
Note that sometimes a person is implicitly involved although not explicitly
mentioned. In this case, we still wish to label the happy moment as social.
E.g., "I received compliments on my tattoo."
Examples of sentences which are not social (Answer is NO):
- "I ran on the treadmill for 20 minutes straight when I could barely do 5
minutes 3 months ago."
- "The bus came on time, so I reached work early."
&lt;Happy moment appears here&gt;
Agency: Is the author in control? YES/NO
Social: Does this moment involve people other than the author?
YES/NO</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Topic labeling</title>
      <p>Annotators were presented with a happy moment and a set of four potential
topics that it was likely describing. Annotators were asked to mark all the tags
that referred to what the moment was about. Each moment could receive a maximum of
four tags if at least two annotators agreed on them.</p>
      <p>Instructions Read the following text. Select all categories that are relevant
to the text from among those provided. If none of the categories is a great fit,
select "none of the above".
&lt;Topic 1&gt; &lt;Topic 2&gt; &lt;Topic 3&gt; &lt;Topic 4&gt;</p>
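      <p>The aggregation rule above (a tag survives only if at least two annotators selected it) can be sketched as follows; the tag names in the example are hypothetical.</p>

```python
from collections import Counter

def agreed_tags(annotations, min_agreement=2):
    # annotations: one set of selected tags per annotator.
    # Keep every candidate tag chosen by at least `min_agreement` annotators;
    # with four candidates shown, a moment can score at most four tags.
    counts = Counter(tag for tags in annotations for tag in set(tags))
    return {tag for tag, n in counts.items() if n >= min_agreement}

# Hypothetical example: three annotators label the same moment.
votes = [{"family", "leisure"}, {"family"}, {"leisure", "food"}]
print(sorted(agreed_tags(votes)))  # ['family', 'leisure']
```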
      <sec id="sec-4-1">
        <title>Overview of Approaches</title>
        <p>Eleven teams participated in the Shared Task. The following paragraphs discuss
the approaches followed by the participating systems, sorted in the order in
which they signed up to participate in the task.</p>
        <p>
          - Arizona State University (ASU) [10]: The team from ASU proposed a Word
Pair Convolutional Model (WoPCoM) to accomplish Task 1. The proposed
model is motivated by the hypothesis that a small set of word-pair features
is important to capturing the agency/social nature of happy moments.
They trained a convolutional neural network (CNN) to predict on the
unlabeled data.
- University of California Santa Cruz (UCSC) [15]: The UCSC team
participated in both tasks. For Task 1, they explored the use of syntactic, emotional,
and survey features with semi-supervised learning, specifically
experimenting with XGBoosted Forest and CNN models. For Task 2, the team trained
similar models to predict concepts, and based on the difficulty of doing so,
hypothesized about the nature of the themes in the happy moments.
- International Institute of Information Technology Hyderabad (IIIT-H) [12]:
The IIIT-H team employed an inductive transfer learning (ITL) technique.
They pre-trained an AWD-LSTM neural net on the WikiText-103 corpus, and
then introduced an extra step to adapt the model to happy moments.
- Gyrfalcon [11]: The team from Gyrfalcon Technology, California, proposed
an algorithm to map English words into squared glyph images. Then, they
applied a 2D-CNN model over these images in order to capture the sentiment.
- A*STAR [4]: The IHPC-A*STAR team participated in both tasks. For Task
1, they used emotion intensity in happy moments to predict Agency and
Sociality labels. They defined a set of five emotions (valence, joy, anger, fear,
sadness) and used a previously developed tool, CrystalFeel, to label each
moment with the corresponding five emotion intensities. Combining these
features with additional word-embedding features, they trained a logistic
regression model. For Task 2, the team explored how these different emotions
are manifested across the different concept labels.
- University of British Columbia (UBC) [8]: The UBC team primarily
experimented with different embedding methods, such as CoVe and ELMo, on
deep neural networks. They modeled their neural networks as long
short-term memory networks and BiLSTMs, with and without attention.
- University of Ottawa (UOttawa) [16]: The University of Ottawa team also
proposed a deep learning CNN solution. They experimented with different
kinds of word embeddings, and also experimented with training a multi-task
classifier to see whether performance could be enhanced by shared knowledge
between Agency and Sociality.
- Escuela Superior Politecnica del Litoral (ESPOL) [14]: The ESPOL team
proposed a semi-supervised adaptation of traditional k-means clustering
using neural networks.
- Sungkyunkwan team (SKKU) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]: The SKKU team used a semi-supervised
approach. They built four one-class autoencoder models, one each for social,
non-social, agentic, and non-agentic moments. Each autoencoder model had a
deep learning architecture consisting of two neural networks, one for encoding
the input, and the other for reconstructing the compressed vector.
- Jordan University of Science and Technology (JUST) [13]: The JUST team
proposed a Recurrent Convolutional Neural Network, and combined
words with their context in order to get a more precise word embedding.
- Fraunhofer (FKIE) [3]: The team from Fraunhofer FKIE trained a three-layer
CNN. They experimented with different embeddings, including
FastText and GloVe. Additionally, they experimented with splitting the dataset
by demographic location of the author, and showed that training separate
classifiers on the splits enhanced performance.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Task 1: Predicting Agency and Sociality</title>
      <p>This section compares the participating systems in terms of their performance.
Four of the eleven systems that did Task 1 also did the bonus Task 2. The results
are provided in Table 1. The detailed implementations of the individual runs are
described in the system papers included in this proceedings volume.
Some of the systems used their neural models of happiness for Task 1 to
produce visual knowledge representations [11], and general insights about happiness
[10,8,15,3]. Most notably, Gyrfalcon [11] transformed textual moments into
visualizations to explore whether they could encode more multi-dimensional
information in this manner. UBC [8] provided a visualization for "attention" in their
bi-directional long short-term memory networks, which highlights the patterns
that the neural network considered important while predicting Agency
and Sociality when a sequence of words was input into the model. ASU [10]
showed the codependence of the individual Agency and Sociality labels across
the dataset through a t-SNE visualization. Team 33 [3] and UCSC [15] both
attempted to capture the linguistic patterns in the construction of happiness and
their potential cultural underpinnings.</p>
      <sec id="sec-5-1">
        <title>Error Analysis</title>
        <p>In this section, we present a meta-analysis of system performances for Task 1
over all the (a) topics and (b) moments in the test set. Furthermore, in their
data pre-processing step, the team from Fraunhofer [3] identified that in the
subset of happy moments contributed by authors from India alone, there were
duplicate or near-duplicate happy moments in the data, which reduced the total
number of training samples by 25%. We will include data cleaning as an extra
preprocessing step in future data releases.</p>
        <p>Topic-level analysis: We expect that happiness in different situations
would be experienced and expressed differently. Table 3 aggregates the failures
produced by each of the approaches (out of the set of best approaches submitted
by each of the teams).</p>
        <p>Moment-level meta-analysis: We suspect that some of the errors in our
data may occur due to mislabeling, or due to the coding scheme not being applicable to
the moment. In Table 4 we provide the happy moments for which 100% of the
best approaches submitted by each of the teams reported failure. We observe
that in some of the cases (e.g., "Topanga running away to Cory"), the happy
moment was actually mislabeled, and thus the systems did in fact make the
correct prediction. Overall, many of the happy moments in this Table describe a
single moment in the author's life, which seems ordinary when considered in the
context of regular living. In some cases, the authors have attempted to explain
why the moment was special to them (e.g., the second part of the moment "I
finally got a hold of my auto mechanic, and that enabled me to schedule a time
to bring in my car to get my custom exhaust installed" only serves to explain
the significance of the moment to the author).</p>
      </sec>
      <sec id="sec-5-2">
        <title>Conclusion and Future Work</title>
        <p>Eleven teams participated in the inaugural CL-Aff Shared Task at AAAI-19. We
have published the complete dataset to Harvard Dataverse. Furthermore, we
expect to release other resources complementary to the challenges of modeling
affect and emotion from language.</p>
        <p>In summary, our meta-analysis of system performance identifies the following
key takeaways and recommendations:
- Predictive modeling approaches are greatly improved when modeled as a
semi-supervised task, enriched with unlabeled data or with knowledge or
feature vectors trained on a different domain. This also highlights the
generalizability of the Shared Task to other domains.
- Syntactic knowledge is important for modeling Agency and Sociality (and
hence, for modeling happiness). Participants incorporated the importance of
the head noun and subject-verb-object word order in their language
models, either through interacting layers in convolutional neural networks, or by
mining it using lexical pattern analysis methods.
- The CL-Aff dataset offers replicability of more traditional emotion modeling
approaches. It was feasible to apply the models developed on other annotated
emotion datasets to improve the predictive modeling performance on the
Shared Task [4]. We anticipate that language models from the CL-Aff dataset
will also generalize well to other problems and datasets for emotion and affect
analysis.
- In future work, scholars could consider training their classifiers on
domain-specific word embeddings derived from the Shared Task dataset
itself.
- Findings support the emerging notion of the English language as a
contextualized emotional vector space, with the best performances reported by
approaches that incorporated task-specific embeddings from other language
models, such as ELMo and CoVe.</p>
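        <p>To make the first takeaway concrete, the sketch below shows self-training, one simple way to enrich a classifier with unlabeled data. It is a deliberately minimal stand-in, not any participating team's method: the nearest-centroid base model, the single pseudo-labeling round, and the 2-d feature vectors are all illustrative assumptions (the actual entries used autoencoders, transfer learning, confidence thresholds and far stronger base models).</p>

```python
import math

def centroid_fit(X, y):
    # Fit a nearest-centroid classifier: one mean vector per class label.
    centroids = {}
    for label in set(y):
        points = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(c) / len(points) for c in zip(*points)]
    return centroids

def centroid_predict(centroids, x):
    # Predict the label whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def self_train(X_lab, y_lab, X_unlab):
    # One round of self-training: fit on the labeled data, pseudo-label the
    # unlabeled pool, then refit on the combined set.
    model = centroid_fit(X_lab, y_lab)
    pseudo = [centroid_predict(model, x) for x in X_unlab]
    return centroid_fit(list(X_lab) + list(X_unlab), list(y_lab) + pseudo)

# Two labeled and two unlabeled toy moments, as 2-d feature vectors:
model = self_train([[0.0, 0.0], [1.0, 1.0]], ["no", "yes"],
                   [[0.1, 0.2], [0.9, 0.8]])
print(centroid_predict(model, [0.95, 0.9]))  # yes
```

        <p>The unlabeled points shift each class centroid toward the true data distribution, which is the same intuition behind the richer semi-supervised systems submitted to the task.</p>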
        <p>Acknowledgement. We thank Dr. Wang-Chiew Tan for her feedback
and Megagon Labs for contributing funds towards the CL-Aff dataset.</p>
        <p>3. Claeser, D.: Affective content classification using convolutional neural networks.
In: Proceedings of the 2nd Workshop on Affective Content Analysis @ AAAI
(AffCon2019). Honolulu, Hawaii (January 2019)
4. Gupta, R.K., Bhattacharya, P., Yang, Y.: What constitutes happiness? Predicting
and characterizing the ingredients of happiness using emotion intensity analysis.
In: Proceedings of the 2nd Workshop on Affective Content Analysis @ AAAI
(AffCon2019). Honolulu, Hawaii (January 2019)
5. Jaidka, K., Chhaya, N., Mumick, S., Killingsworth, M., Halevy, A., Ungar, L.:
Towards a typology of happiness: The CL-Aff annotated dataset of happy moments
(2019)
6. Paulhus, D.L., Robinson, J.P., Shaver, P.R., Wrightsman, L.S.: Measures of
personality and social psychological attitudes. Measures of Social Psychological Attitudes
Series 1, 17-59 (1991)
7. Paulhus, D.L., Trapnell, P.D.: Self-presentation of personality. Handbook of
Personality Psychology 19, 492-517 (2008)
8. Rajendran, A., Zhang, C., Abdul-Mageed, M.: Happy together: Learning and
understanding appraisal from natural language. In: Proceedings of the 2nd Workshop
on Affective Content Analysis @ AAAI (AffCon2019). Honolulu, Hawaii (January
2019)
9. Rouhizadeh, M., Jaidka, K., Smith, L., Schwartz, H.A., Buffone, A., Ungar, L.:
Identifying locus of control in social media language. In: Proceedings of the 2018
Conference on Empirical Methods in Natural Language Processing (2018)
10. Saxon, M., Bhandari, S., Ruskin, L., Honda, G.: Word pair convolutional model
for happy moment classification. In: Proceedings of the 2nd Workshop on Affective
Content Analysis @ AAAI (AffCon2019). Honolulu, Hawaii (January 2019)
11. Sun, B., Yang, L., Chi, C., Zhang, W., Lin, M.: [CL-Aff Shared Task] Squared English
word: A method of generating glyph to use super characters for sentiment
analysis. In: Proceedings of the 2nd Workshop on Affective Content Analysis @ AAAI
(AffCon2019). Honolulu, Hawaii (January 2019)
12. Syed, B., Indurthi, V., Shah, K., Gupta, M., Varma, V.: Ingredients for happiness:
Modeling constructs via semi-supervised content driven inductive transfer
learning. In: Proceedings of the 2nd Workshop on Affective Content Analysis @ AAAI
(AffCon2019). Honolulu, Hawaii (January 2019)
13. Talafha, B., Al-Ayyoub, M.: IoH-RCNN: Pursuing the ingredients of happiness using
recurrent convolutional neural networks. In: Proceedings of the 2nd Workshop
on Affective Content Analysis @ AAAI (AffCon2019). Honolulu, Hawaii (January
2019)
14. Torres, J., Vaca, C.: Neural semi-supervised learning for short texts. In:
Proceedings of the 2nd Workshop on Affective Content Analysis @ AAAI (AffCon2019).
Honolulu, Hawaii (January 2019)
15. Wu, J., Compton, R., Rakshit, G., Walker, M., Anand, P., Whittaker, S.:
CruzAffect at AffCon 2019 Shared Task: A feature-rich approach to characterize happiness.
In: Proceedings of the 2nd Workshop on Affective Content Analysis @ AAAI
(AffCon2019). Honolulu, Hawaii (January 2019)
16. Xin, W., Inkpen, D.: [CL-Aff Shared Task] Happiness ingredients detection using
multi-task deep learning. In: Proceedings of the 2nd Workshop on Affective
Content Analysis @ AAAI (AffCon2019). Honolulu, Hawaii (January 2019)</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Asai</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evensen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golshan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopatenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stepanov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suhara</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>HappyDB: A corpus of 100,000 crowdsourced happy moments</article-title>
          .
          <source>In: Proceedings of LREC 2018</source>
          .
          European Language Resources Association (ELRA), Miyazaki
          , Japan (May
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cheong</surname>
            ,
            <given-names>Y.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bae</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>[cl-a shared task] modeling happiness using one-class autoencoders</article-title>
          .
          <source>In: Proceedings of the 2nd Workshop on Affective Content Analysis @ AAAI (AffCon2019)</source>
          . Honolulu, Hawaii
          (
          <year>January 2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>