The CL-Aff Happiness Shared Task: Results and Key Insights

Kokil Jaidka1,2, Saran Mumick2,3, Niyati Chhaya4, and Lyle Ungar2
1 Nanyang Technological University, Singapore
2 University of Pennsylvania, USA
3 Megagon Labs, USA
4 Adobe Research, India
jaidka@ntu.edu.sg

Abstract. This overview describes the official results of the CL-Aff Shared Task 2019 – in Pursuit of Happiness. The Shared Task comprised a semi-supervised classification task and an open-ended knowledge modeling task on a dataset of over 80,000 brief autobiographical accounts of happy moments, crowdsourced from Amazon Mechanical Turk. The Shared Task was organized as a part of the 2nd Workshop on Affective Content Analysis @ AAAI-19, held in Honolulu, USA on January 27, 2019. This paper compares the participating systems in terms of their accuracy and F-1 scores at predicting two facets of happiness. The complete annotated dataset is available on Harvard Dataverse at https://goo.gl/3rcZqf. The annotation instructions and the scripts used for evaluation are available in the Git repository at https://github.com/kj2013/claff-happydb.

1 Introduction

The purpose of the CL-Aff Shared Task is to challenge the current understanding of emotion through a task that models the experiential, contextual and agentic attributes of happy moments. It has long been known that human affect is context-driven, and that labeled datasets should account for these factors in generating predictive models of affect. The Shared Task was organized in collaboration with researchers at Megagon Labs and builds upon the HappyDB dataset [1], comprising human accounts of ‘happy moments’. The Shared Task comprised two sub-tasks for analyzing happiness and well-being in written language, on a corpus of over 80,000 descriptions of happy moments, as described here:

Given: An account of a happy moment, marked with the individual’s demographics, recollection time and relevant labels.
– Task 1: Semi-supervised classification task – Predict thematic labels (Agency/Sociality) on unseen data, based on a small labeled and a large unlabeled training dataset.5
– Task 2: Suggest interesting ways to automatically characterize the happy moments in terms of affect, emotion, participants and content.

5 In the annotation task and the Shared Task, the label names we provided were ‘Agency’ and ‘Social’. We have since renamed ‘Social’ to ‘Sociality’ so that both Agency and Sociality are grammatically consistent.

The task, given its predictive and open-ended interpretive aspects, is relevant to the computational linguistics, natural language processing, artificial intelligence and psycholinguistics communities. The aim is to engage scholarly interest and crowdsource new ideas and linguistic approaches to define happiness. Details on the psycholinguistic underpinnings of the annotation task are provided in a separate, forthcoming paper [5].

Evaluation: The performance of systems was compared based on their accuracy and F-1 measure at predicting the Agency and Sociality labels on the unseen test dataset. This was done using an automatic evaluation script, available on GitHub.6

1.1 Dataset description

The CL-Aff corpus comprises the following:

– Labeled training set (N = 10,560): Single-sentence happy moments from the available HappyDB corpus, annotated with demographic labels of the author, labels that identify the ‘agency’ of the author and the ‘social’ characteristic of the moment, and concept labels describing its theme.
– Unlabeled training set (N = 59,846): The remaining single-sentence HappyDB happy moments, with only the demographic labels of the author.
– Test set (N = 17,215): Previously unreleased, single-sentence happy moments, freshly collected in the same manner as the original HappyDB data. Authors’ demographic labels were available to the Shared Task participants, but not the ‘agency’ or ‘social’ characteristics.
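The evaluation described above can be sketched in a few lines of code. The snippet below is a minimal, illustrative implementation of accuracy and F-1 for one binary (yes/no) label such as Agency; the function names and the toy labels are the author's assumptions, not taken from the official evaluation script on GitHub.

```python
# Illustrative sketch of accuracy and F-1 for a binary (yes/no) label,
# as used to score the Agency and Sociality predictions. The example
# labels below are invented for demonstration.

def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1(gold, pred, positive="yes"):
    """F-1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = ["yes", "no", "yes", "yes", "no"]
pred = ["yes", "yes", "yes", "no", "no"]
print(accuracy(gold, pred))           # 0.6
print(round(f1(gold, pred), 3))       # 0.667
```

In practice the same two metrics would be computed twice per system, once for the Agency label and once for the Sociality label.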
The Agency and Sociality characteristics of each happy moment were decided by simple majority agreement among three independent annotators using a binary (yes/no) coding.

6 https://github.com/kj2013/claff-happydb/

2 Corpus development

2.1 Collecting the happy moments

We followed the format of the original HappyDB AMT task [1] to collect a second dataset of 20,000 happy moments, which was to be the unseen test data in the CL-Aff Shared Task. The following instructions were provided to the workers.

Instructions

What made you happy? Reflect on the past, and recall three actual events that happened to you that made you happy. Describe your happy moments with a complete sentence. Write three such moments. You will also be asked to note for how long each event made you happy. This task also has post-task questions. Please be sure to answer the questions. Examples of happy moments we are NOT looking for (e.g., events in the distant past, incomplete sentences): The day I married my spouse; My dog.

< Enter moment here >

For how long did that event make you happy? Select the answer that is most appropriate.

Each AMT worker was required to enter three happy moments experienced within a specific time period. Half of the questionnaires specified a time period of 24 hours, while the other half with a