Emotional Reactions Prediction of News Posts

Anastasia Giachanou¹, Paolo Rosso², Ida Mele³, Fabio Crestani¹

¹ Faculty of Informatics, Università della Svizzera italiana, Lugano, Switzerland
² PRHLT Research Center, Universitat Politècnica de València, Spain
³ ISTI-CNR, Pisa, Italy
anastasia.giachanou@usi.ch, prosso@dsic.upv.es, ida.mele@isti.cnr.it, fabio.crestani@usi.ch

Abstract. Nowadays, online news agents post news articles on social media platforms with the aim of attracting more users. Different types of news trigger different emotions in users, who may feel surprised or sad after reading a piece of news. In this paper, we are interested in predicting the amount of emotional reactions that a news post triggers in its readers. To address the problem, we propose a model that is trained on features extracted from users’ early commenting activity. Our results show that users’ early activity features are very important and that combining those features with terms can effectively predict the amount of emotional reactions that a news post triggers.

1 Introduction

Social media platforms such as Facebook and Twitter allow news agents to post news articles online, which users can then read, comment on, or express their opinion about. Some news articles trigger a large amount of emotional reactions whereas others do not. Predicting the amount of emotional reactions is important for dealing with information overload. For example, a system that can predict the amount of emotional reactions triggered by news articles allows a user to filter the articles she would like to read based not only on the articles’ content but also on the emotions they trigger.

Predicting the amount of triggered emotional reactions is not a trivial problem.
Network properties such as the structure of the platform, or other external factors such as the user’s location, may affect the reactions of users towards a specific news post. Intuitively, content is one of the most important factors that influence emotional reactions [1], since certain terms convey sentiment and emotion.

A problem related to emotional reactions prediction is online content popularity prediction. Most prior work was based on early-stage measurements, whereas little effort has been devoted to pre-publication prediction [4, 3]. Bandari et al. [4] tackled the task as both regression and classification, and concluded that prediction is feasible without any early activity signals. However, Arapakis et al. [3] recently extended the work of [4] and showed that predicting the popularity of news articles prior to their publication is not yet a viable task.

IIR 2018, May 28-30, 2018, Rome, Italy. Copyright held by the authors.

A closely related work is that of Clos et al. [5], who proposed a unigram mixture model to create an emotional lexicon, which was then used to predict the probabilities of different emotional reactions. More recently, Giachanou et al. [6] focused on predicting the amount of emotional reactions triggered in users. However, they only explored pre-publication features, including content-based similarities and frequencies of entities.

2 Emotional Reactions Prediction

The problem of predicting the amount of emotional reactions to news posts published on a social network is defined as follows: given a news article post and data about early activity, the task consists in predicting the amount of emotional reactions that the post will trigger in users. Note that our aim is to classify a news post with regard to the amount of each emotional reaction (e.g., love, surprise, joy, sadness, anger) it will trigger in users. We address the problem as a 3-class task.
Given a news post, we assign to it one of the labels low, medium, or high, which refer to the amount of each emotional reaction that the post will trigger.

2.1 Features

Intuitively, the content of the post is very important for predicting whether a news article will trigger a high number of a certain emotional reaction. To this end, in our study we start with terms. Furthermore, we extract features from users’ early commenting activity to investigate whether there are temporal patterns in commenting activity.

Frequencies. The simplest textual feature is the set of terms that the news post contains. Although this is a simple feature, it is one of the most important features for news articles’ popularity prediction [1, 9] as well as for similar information retrieval tasks [2, 7]. We use the bag-of-words representation to model the terms. Each term in the vector is weighted using the term frequency-inverse document frequency (TF-IDF) approach, which reflects how important the term is in the corpus. In the rest of the paper, we use terms to refer to the TF-IDF representation of the terms.

Commenting Activity. Once a news post is published on a social network, users are allowed to publish their comments about the specific post. These comments, which are published below the news post, are very important because they are an early indicator of users’ interest in and reaction to the news post. We use the activity of users in publishing comments on the news post to extract our early activity features for three time range scenarios: 10, 20, and 30 minutes after the publication of the news article. We use the following features:

1. First comment: time difference in seconds between the publication date of the news post and the first comment, if the first comment is published within the specified time range.
2. Number of comments: number of comments published within the specified time range.
3. Commenting ratio: mean time of commenting for the comments published within the specified time range.
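The three early activity features listed above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `early_activity_features`, the `EarlyActivity` container, and the choice to return `None` when no comment falls inside the window are all assumptions made for illustration. In particular, the text does not spell out how the commenting ratio is computed; here it is interpreted as the mean delay in seconds between the publication of the post and each comment inside the window.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence


class EarlyActivity(NamedTuple):
    """Hypothetical container for the three early activity features."""
    first_comment_s: Optional[float]  # delay of the first comment, in seconds
    num_comments: int                 # number of comments inside the window
    mean_delay_s: Optional[float]     # assumed reading of "commenting ratio"


def early_activity_features(
    published_at: datetime,
    comment_times: Sequence[datetime],
    window_minutes: int,
) -> EarlyActivity:
    """Compute early activity features for one post and one time range."""
    window_s = window_minutes * 60
    # Keep only the delays of comments posted inside the time range.
    delays = sorted(
        delay
        for delay in ((t - published_at).total_seconds() for t in comment_times)
        if 0 <= delay <= window_s
    )
    if not delays:
        # Assumption: with no early comment, the time-based features are undefined.
        return EarlyActivity(None, 0, None)
    return EarlyActivity(delays[0], len(delays), sum(delays) / len(delays))


# Example with made-up timestamps: comments 1, 5, 15 and 40 minutes after posting.
published = datetime(2017, 1, 1, 12, 0, 0)
comments = [published + timedelta(minutes=m) for m in (1, 5, 15, 40)]
for minutes in (10, 20, 30):
    print(minutes, early_activity_features(published, comments, minutes))
```

Before being passed to a classifier, undefined values would need a numeric stand-in (for example the window length); which stand-in is appropriate is left open here.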
3 Experimental Setup

We used the same dataset as in Giachanou et al. [6], which contains news posts from The New York Times group on Facebook together with the amount of 5 different emotional reactions for each post: love, surprise, joy, sadness, and anger. The collection consists of 26,560 news posts that span from April 2016 to September 2017. We used 10-fold cross validation to perform the experiments. We kept training and test sets separate.

We performed a 3-class classification task in which a news post can get one of the following labels: low, medium, high. We predicted the amount level of the following emotional reactions: love, surprise, joy, sadness, and anger, which were addressed individually. For all the experiments, we used the Random Forest classifier. We report the F1 score for each emotional reaction. We compare our results with two baselines: terms, which is based only on the terms of the posts, and All (+terms), which is based on the approach proposed in Giachanou et al. [6]. Significance is measured with the McNemar test.

4 Results and Discussion

From the results in Table 1 we observe that terms are better predictors than the early activity features alone. This suggests that, for this specific task, terms contain more predictive power than early activity, which is considered the most important feature for popularity prediction [8]. When the early activity features are used alone, the best performance is obtained for joy. In addition, we observe that for the emotions surprise and joy the difference between terms and early activity is smaller than for the rest of the emotions. Indeed, in the case of joy, early_t=30 performs only slightly worse than terms. One possible explanation is that for news that trigger joy and surprise, users post more comments than for the rest of the emotions.

Table 1. Performance results (F1-scores) using early activity features.
Scores with ∗ and † indicate statistically significant improvements with respect to the terms and All (+terms) approaches.

                    Love      Surprise  Joy       Sadness   Anger
Terms               0.491     0.494     0.578     0.555     0.597
All (+terms) [6]    0.478     0.486     0.554     0.543     0.576
early_t=10          0.416     0.476     0.549     0.435     0.509
early_t=20          0.415     0.477     0.557     0.444     0.518
early_t=30          0.415     0.480     0.574     0.448     0.535
Terms+early_t=10    0.534∗†   0.563∗†   0.644∗†   0.586∗†   0.633∗†
Terms+early_t=20    0.536∗†   0.573∗†   0.652∗†   0.592∗†   0.642∗†
Terms+early_t=30    0.541∗†   0.577∗†   0.653∗†   0.593∗†   0.645∗†

https://www.facebook.com/nytimes/

Table 1 shows that, in most cases, the performance improves when the time range is increased. The only exception is the reaction love, for which the performance slightly decreases. For some emotions (e.g., surprise) the improvement is small, whereas for other emotions (e.g., anger) it is larger. However, we expect that extracting features from even the first ten minutes is very useful for the prediction, while keeping the advantage of quick access after the post is published. Finally, Table 1 shows that combining terms with early commenting activity is the most effective approach and leads to significant improvements over both the terms and All (+terms) approaches.

5 Conclusions and Future Work

In this study, we presented a methodology for predicting the amount of emotional reactions that will be triggered by a specific news post. Our results suggest that early commenting activity is very important for the emotional prediction task. However, terms contain more predictive power than early activity predictors alone. More importantly, we showed that models trained on both terms and early commenting activity can effectively address the problem.

As future work, we plan to address the task as an ordinal classification or a regression problem, and we will try to predict the exact number of each emotional reaction.

Acknowledgments.
This research was partially funded by the Swiss National Science Foundation (SNSF) under the project OpiTrack. The work of the second author was partially funded by the Spanish MINECO under the research project SomEMBED (TIN2015-71147-C2-1-P).

References

1. Alam, F., Celli, F., Stepanov, E.A., Ghosh, A., Riccardi, G.: The social mood of news: Self-reported annotations to design automatic mood detection systems. In: PEOPLES ’16. pp. 143–152 (2016)
2. Aliannejadi, M., Crestani, F.: Venue suggestion using social-centric scores. CoRR abs/1803.08354 (2018)
3. Arapakis, I., Cambazoglu, B.B., Lalmas, M.: On the feasibility of predicting popular news at cold start. Journal of the Association for Information Science and Technology 68(5), 1149–1164 (2017)
4. Bandari, R., Asur, S., Huberman, B.A.: The pulse of news in social media: Forecasting popularity. In: ICWSM ’12. pp. 26–33 (2012)
5. Clos, J., Bandhakavi, A., Wiratunga, N., Cabanac, G.: Predicting emotional reaction in social networks. In: ECIR ’17. pp. 527–533 (2017)
6. Giachanou, A., Rosso, P., Mele, I., Crestani, F.: Emotional influence prediction of news posts. In: ICWSM ’18 (2018)
7. Paltoglou, G., Giachanou, A.: Opinion Retrieval: Searching for Opinions in Social Media, pp. 193–214. Springer International Publishing (2014)
8. Shulman, B., Sharma, A., Cosley, D.: Predictability of popularity: Gaps between prediction and understanding. In: ICWSM ’16. pp. 348–357 (2016)
9. Tsagkias, M., Weerkamp, W., De Rijke, M.: Predicting the volume of comments on online news stories. In: CIKM ’09. pp. 1765–1768 (2009)