Affective recommender systems: the role of emotions in recommender systems Marko Tkalčič Andrej Košir Jurij Tasič University of Ljubljana Faculty University of Ljubljana Faculty University of Ljubljana Faculty of electrical engineering of electrical engineering of electrical engineering Tržaška 25, Ljubljana, Tržaška 25, Ljubljana, Tržaška 25, Ljubljana, Slovenia Slovenia Slovenia marko.tkalcic@fe.uni-lj.si andrej.kosir@fe.uni-lj.si jurij.tasic@fe.uni-lj.si ABSTRACT of the research community to identify the position of their activities and to benefit from each other’s work. Recommender systems have traditionally relied on data-centric descriptors for content and user modeling. In recent years we have witnessed an increasing number of attempts to use 2. THE UNIFYING FRAMEWORK emotions in di↵erent ways to improve the quality of rec- When using applications with recommender systems the ommender systems. In this paper we introduce a unifying user is constantly receiving various stimuli (e.g. visual, au- framework that positions the research work, that has been ditory etc.) that induce emotive states. These emotions done so far in a scattered manner, in a three stage model. influence, at least partially (according to the bounded ratio- We provide examples of research that cover various aspects nality model [16]) the user’s decisions on which content to of the detection of emotions and the inclusion of emotions choose. Thus it is important for the recommender system into recommender systems. application to detect and make good use of emotive infor- mation. Categories and Subject Descriptors 2.1 Describing emotions H.4 [Information Systems Applications]: Miscellaneous; There are two main approaches to describe the emotive D.2.8 [Software Engineering]: Metrics—complexity mea- state of a user: (i) the universal emotions model and (ii) sures, performance measures the dimensional model. The universal emotions model as- sumes there is a limited set of distinct emotional categories. There is no unanimity as to which are the universal emo- Keywords tions, however, the categories proposed by Ekman [10] (i.e. recommender systems, emotions happiness, anger, sadness, fear, disgust and surprise) appear to be very popular. The dimensional model, on the contrary, describes each emotion as a point in a continuous multidi- 1. INTRODUCTION mensional space where each dimension represents a quality In the pursuit of increasing the accuracy of recommender of the emotion. The dimensions that are used most fre- systems, researchers started to turn to more user-centric quently are valence, arousal and dominance (thus the VAD content descriptors in recent years. The advances made in acronym) although some authors refer to these dimensions a↵ective computing, especially in automatic emotion detec- with di↵erent names (e.g. pleasure instead of valence in tion techniques, paved the way for the exploitation of emo- [20] or activation instead of arousal in [13]). The circum- tions and personality as descriptors that account for a larger plex model, proposed by Posner et al. [24], maps the basic part of variance in user preferences than the generic descrip- emotions into the VAD space (as depicted in Fig. 1) tors (e.g. genre) used so far. However, these research e↵orts have been conducted in- 2.2 The role of emotions in the consumption dependently, stretched among the two major research areas, chain recommender systems and a↵ective computing. In this pa- During the user interaction with a recommender system per we (i) survey the research work that helps improving and the content consumption that follows, emotions play recommender systems with a↵ective information and (ii) we di↵erent roles in di↵erent stages of the process. We divided provide a unifying framework that will allow the members the user interaction process in three stages, based on the role that emotions play (as shown in Fig. 2): (i) the entry stage, (ii) the consumption stage and (iii) the exit stage. The work surveyed in this paper can be divided in two   main categories: (i) generic emotion detection algorithms  (that can be used in all three stages) and (ii) usage of emo-  tion parameters in the various stages. This paper does not  aim at providing an overall survey of related work but rather  Decisions@RecSys 2011, Chicago, USA to point out good examples of how to address various as-  pects of recommender systems with the usage of techniques 9 not aware of it. Furthermore, Pantić et al. [22] argued that explicit acquisition of users’ a↵ect has further negative prop- Arousal erties as users may have side-interests that drive their ex- high plicit a↵ective labeling process (egoistic tagging, reputation- driven tagging or asocial tagging). joy The most commonly used procedure for the explicit asess- anger ment of emotions is the Self Assessment Manikin (SAM) de- surprise veloped by [7]. It is a questionnaire where users assess their disgust emotional state in the three dimensions: valence, arousal and dominance. fear The implicit acquisition of emotions is usually done through Valence neutral a variety of modalities and sensors: video cameras, speech, negative positive EEG, ECG etc. These sensors measure various changes sadness of the human body (e.g. facial changes, posture changes, changes in the skin conductance etc.) that are known to be related to specific emotions. For example, the Facial Ac- tion Coding System (FACS), proposed by Ekman [9], maps emotions to changes of facial characteristic poionts. There are excellent surveys on the topic of multimodal emotion de- tection: [31, 22, 14]. In general, raw data is acquired from low one or more sensors during the user interaction. These sig- nals are processed to extract some low level features (e.g. Gabor based features are popular in the processing of fa- Figure 1: Basic emotions in the valence-arousal cial expression video signals). Then some kind of classifi- plane of the dimensional model cation or regression technique is applied to yield distinct emotional classes or continuous values. The accuracy of emotion detection ranges from over 90% on posed datasets (like the Kanade-Cohn dataset [18]) to slightly better than borrowed from a↵ective computing. coin tossing on spontaneous datasets (like the LDOS-PerA↵- 1 dataset [29]) [27, 6]. time Entry mood Content-induced affective state Exit mood 4. ENTRY STAGE The first part of the proposed framework (see Fig. 2) is choice the entry stage. When a user starts to use a recommender system, she is in an a↵ective state, the entry mood. The entry mood is caused by some previous user’s activities, un- Detect Detect entry Give recommendations Give content Observe user exit known to the system. When the recommender system sug- mood mood gests a limited amount of content items to the user, the entry mood influences the user’s choice. In fact, the user’s decision making process depends on two types of cognitive processes, Content application the rational and the intuitive, the latter being strongly in- fluenced by the emotive state of the user, as explained by the bounded rationality paradigm [16]. For example, a user Entry stage Consumption stage Exit stage might want to consume a di↵erent type of content when she is happy than when she is sad. In order to adapt the list of recommended items to the user’s entry mood the system Figure 2: The unifying framework: the role of emo- must be able to detect the mood and to use it in the content tions in user interaction with a recommender sys- filtering algorithm as contextual information. tem. In the entry part of user-RS interaction one of the aspects where emotions can be exploited is to influence the user’s In the remainder of the paper we address each stage sepa- choice. Creed [8] explored how the way we represent infor- rately by surveying the existing research work and providing mation influences the user’s choices. lists of open research areas. At the end we discuss the pro- It has been observed by Porayska-Pomsta et al. [23] that posed framework and give the final conclusions. in tutoring systems there is a strong relation between the entry mood and learning. They analysed the actions that a human tutor took when the student showed signs of specific 3. DETECTING AFFECTIVE STATES a↵ective states to improve the e↵ectiveness of an interactive A↵ective states of end users (in any stage of the proposed learning environment. interaction chain) can be detected in two ways: (i) explic- A user modeling approach that maps a touristic attraction itly or (ii) implicitly. The implicit detection of emotions is with a piece of music that induces a related emotion has been more accurate but it’s an intrusive process that breaks the developed by Kaminskas and Ricci [17]. Their goal was to interaction. The implicit approach is less accurate but it’s find an appropriate musical score that would reinforce the well suited for user interaction purposes since the user is a↵ective state induced by the touristic attraction. 10 Using the entry mood as a contextual parameter (as de- anticipate that implicit tagging can be used for user profiling scribed by Adomavicius and Tuzhilin in [2]) could improve in recommender systems. the recommender’s performance. Both Koren et al. [19] and Joho et al. [15] used emotion detection from facial ex- Baltrunas et al. [5] suggest using the matrix factorization pressions to provide an a↵ective profile of video clips. They approach and enrich it with contextual parameters. At the used an item profile structure that labels changes of users context-aware recommender systems contest in 2010 1 the emotions through time relative to the video clip start. The goal was to select a number of movies that would fit the authors used their approach for summarizing highlights of user’s entry mood. The contest winners’ contribution, Shi video clips. et al. [25] used several approaches amog which the best was Hanjalić et al. [11] approached the summarization of video the joint matrix factorization model with a mood-specific highlights from the other side: they used the source’s low regularization. level features (audio and video) to detect higlihjts without As an extension to the usage of emotions as contextual taking into account the responses of end users. information an interesting research area is to diversify the The research work described so far in this section is in- recommendations. For example, if a user is sad, would it be teresting because allows us to model the content items (im- better to recommend happy content to cheer her up or to rec- ages, movies, music etc.) with a↵ective labels. These a↵ec- ommend sad content to be in line with the current mood? tive labels describe the emotions experienced by the users Research on information retrieval results diversification is who consume the items. In our previous work [26] we have getting increased attention, especially after the criticism of shown that the usage of such a↵ective labels over generic the recommendation bubble has started2 . Although we are labels (e.g. genre) significantly improves the performance of not aware of any work done on results diversification con- a content-based recommender system for images. We used nected with emotions, a fair amount of work has been done explicitly acquired a↵ective metadata to model the items on political news aggregators in order to stimulate political and the users’ preferences. However, in another experiment pluralism [21]. [28], where we used implicitly acquired a↵ective metadata, the accuracy of the recommender system was significantly 5. CONSUMPTION STAGE lower but still better than with generic metadata only. In a similar experiment, Arapakis et al. [4] built a recom- The second part of the proposed framework is the con- mender system that uses real time emotion detection infor- sumption stage (see Fig. 2). After the user starts with maion. the consumption of the content she experiences a↵ective re- sponses that are induced by the content. Depending on the type of content, these responses can be (i) single values (e.g. 6. EXIT STAGE the emotive response to watching an image) or (ii) a vec- After the user has finished with the content consumption tor of emotions that change over time (e.g. while watch- she is in what we call the exit mood. The main di↵erence ing a movie or a sequence of images). Figure 3 shows how between the consumption stage and the exit stage is that the emotions change over time in the consumption stage. The exit mood will influence the user’s next actions, thus having automatic detection of emotions can help building emotive an active part, while in the consumption stage the induced profiles of users and content items that can be exploited for emotions did not influence any actions but were a passive content-based recommender algorithms. response to the stimuli. In case that the user continues to E use the recommender system the exit mood for the content tT just consumed is the entry mood for the next content to be ✏4 consumed. The automatic detection of the exit mood can be useful as ✏3 an indicator of the user’s satisfaction with the content. Thus ✏2 the detection of the exit mood can be seen as an unobtrusive feedback collection technique. ✏1 Arapakis et al. [3] used the exit mood, detected through videos of users’ facial expressions, as an implicit feedback in ✏N their recommender system for video sequences. In an experiment with games, Yannakakis et al. [30], used t t(h1 ) t(h2 ) t(h3 ) t(h4 ) heart rate activity to infer the “fun” that the subjects expe- rience in physical interactive playgrounds. Figure 3: The user’s emotional state ✏ is continu- ously changing as the time sequence of the visual 7. OPEN RESEARCH AREAS stimuli hi 2 H induce di↵erent emotions. We identified four main areas where further research should be conducted in order to build true a↵ective recommender Using emotional responses for generating implicit a↵ective systems: (i) using emotions as context in the entry stage, tags for content is the main research area in the consumption (ii) modeling a↵ective content profiles, (iii) using a↵ective section. Pantić et al. [22] argued why the usage of automatic profiles for recommending content and (iv) building a set of emotion detection methods improves content tagging: the datasets. minimization of the drawbacks caused by egoistic tagging, Although some work has been carried out on exploiting reputation-driven tagging and asocial tagging. They also the entry mood we believe that there is still the need to 1 http://www.dai-labor.de/camra2010/ answer tha basic question of the entry stage: which items 2 http://www.thefilterbubble.com/ to recommend when the user is in the emotive state A?. We 11 further believe that there are firm di↵erences between what [6] M. S. Bartlett, G. C. Littlewort, M. G. Frank, the user wants now and what is good for a user on a long C. Lainscsek, I. R. Fasel, and J. R. Movellan. run. Thus bringing the research on results diversification Automatic Recognition of Facial Actions in (see the work done in [21, 1]) into a↵ective recommender Spontaneous Expressions. Journal of Multimedia, systems is a highly important topic. 1(6):22–35, Sept. 2006. A↵ective content profiling is still an open question, espe- [7] M. Bradley and P. Lang. Measuring emotion: the cially profiling content items that last longer than a single self-assessment manikin and the semantic di↵erential. emotive response. The time dependancy of content profiles Journal of behavior therapy and experimental has also o strong impact on the algorithms that exploit the psychiatry, 25(1):49–59, 1994. profiles for recommending items. [8] C. Creed and R. Beale. Using emotion simulation to With the except of the LDOS-PerA↵-1 dataset [29] (which influence user attitudes and behavior. HCI workshop is limited in the amount of content items and users), the at BCS 2005, pages 1–3, 2005. research community does not have a suitable dataset upon [9] P. Ekman. Facial expression and emotion. American which to work. It is thus required that a large-scale dataset, Psychologist, 48(4):384, 1993. compareable to the MovieLens or Netflix datasets, is built. [10] P. Ekman. Basic Emotions. In Handbook of Cognition and Emotion, pages 45—-60. 1999. 8. CONCLUSION [11] A. Hanjalic. Adaptive extraction of highlights from a In this paper we have provided a framework that describes sport video based on excitement modeling. IEEE three ways in which emotions can be used to improve the Transactions on Multimedia, 7(6):1114–1122, Dec. quality of recommender systems. We also surveyed some 2005. work that deals with parts of the isuues that arise in te [12] J. L. Herlocker, J. A. Konstan, L. Terveen, and J. T. pursuit of a↵ective recommender systems. Riedl. Evaluating collaborative filtering recommender An important issue in recommender systems, especially systems. ACM Trans. Inf. Syst, 22(1):5–53, 2004. when it comes to user-centric systems, is to move from data- [13] S. V. Ioannou, A. T. Raouzaiou, V. a. Tzouvaras, centric assessment criteria to user-centred assessment crite- T. P. Mailis, K. C. Karpouzis, and S. D. Kollias. ria. We have not addressed this issue in this paper as it Emotion recognition through facial expression analysis appears to larger dimensions. The recsys community has so based on a neurofuzzy network. Neural networks : the far relied on metrics borrowed from information retrieval: official journal of the International Neural Network confusion matrices, precision, recall etc. (see [12] for an Society, 18(4):423–35, May 2005. overview). However recommender systems are used by end [14] A. Jaimes and N. Sebe. Multimodal human–computer users and thus the assessment of the end users should be interaction: A survey. Computer Vision and Image taken more into account. We suggest to move towards met- Understanding, 108(1-2):116–134, 2007. rics that take into account the user experience as pointed [15] H. Joho, J. M. Jose, R. Valenti, and N. Sebe. out in http://www.usabart.nl/portfolio/ Exploiting facial expressions for a↵ective video KnijnenburgWillemsen-UMUAI2011_UIRecSy.pdf. summarisation. Proceeding of the ACM International Conference on Image and Video Retrieval - CIVR ’09, 9. REFERENCES page 1, 2009. [1] G. Adomavicius and Y. Kwon. Improving Aggregate [16] D. Kahneman. A perspective on judgment and choice: Recommendation Diversity Using Ranking-Based mapping bounded rationality. The American Techniques. IEEE Transactions on Knowledge and psychologist, 58(9):697–720, Sept. 2003. Data Engineering, (99):1–1, 2011. [17] M. Kaminskas and F. Ricci. Location-Adapted Music [2] G. Adomavicius, R. Sankaranarayanan, S. Sen, and Recommendation Using Tags. User Modeling, A. Tuzhilin. Incorporating contextual information in Adaption and Personalization, pages 183–194, 2011. recommender systems using a multidimensional [18] T. Kanade, J. Cohn, and Y. Tian. Comprehensive approach. ACM Transactions on Information Systems database for facial expression analysis. In Automatic (TOIS), 23(1):103–145, 2005. Face and Gesture Recognition, 2000. Proceedings. [3] I. Arapakis, J. Jose, and P. Gray. A↵ective feedback: Fourth IEEE International Conference on, pages an investigation into the role of emotions in the 46–53. IEEE, 2000. information seeking process. Proceedings of the 31st [19] Y. Koren. Collaborative filtering with temporal annual international ACM SIGIR conference on dynamics. Communications of the ACM, 53(4):89, Research and development in information retrieval, Apr. 2010. (January):395–402, 2008. [20] A. Mehrabian. Pleasure-arousal-dominance: A general [4] I. Arapakis, Y. Moshfeghi, H. Joho, R. Ren, framework for describing and measuring individual D. Hannah, and J. M. Jose. Enriching user profiling di↵erences in Temperament. Current Psychology, with a↵ective features for the improvement of a 14(4):261–292, Dec. 1996. multimodal recommender system. Proceeding of the [21] S. Munson, D. X. Zhou, and P. Resnick. Sidelines: An ACM International Conference on Image and Video algorithm for increasing diversity in news and opinion Retrieval - CIVR ’09, (i):1, 2009. aggregators. Proceedings of ICWSM09 Conference on [5] L. Baltrunas. Exploiting contextual information in Weblogs and Social Media. San Jose, CA., 2009. recommender systems. Proceedings of the 2008 ACM [22] M. Pantic and A. Vinciarelli. Implicit human-centered conference on Recommender systems - RecSys ’08, tagging [Social Sciences. IEEE Signal Processing page 295, 2008. 12 Magazine, 26(6):173–180, Nov. 2009. [23] K. Porayska-Pomsta, M. Mavrikis, and H. Pain. Diagnosing and acting on student a↵ect: the tutor’s perspective. User Modeling and User-Adapted Interaction: The Journal of Personalization Research, 18(1-2):125–173, 2007. [24] J. Posner, J. a. Russell, and B. S. Peterson. The circumplex model of a↵ect: an integrative approach to a↵ective neuroscience, cognitive development, and psychopathology. Development and psychopathology, 17(3):715–34, Jan. 2005. [25] Y. Shi, M. Larson, and A. Hanjalic. Mining mood-specific movie similarity with matrix factorization for context-aware recommendation. Proceedings of the Workshop on Context-Aware Movie Recommendation, pages 34–40, 2010. [26] M. Tkalčič, U. Burnik, and A. Košir. Using a↵ective parameters in a content-based recommender system for images. User Modeling and User-Adapted Interaction: The Journal of Personalization Research, pages 1–33–33, Sept. 2010. [27] M. Tkalčič, A. Odić, A. Košir, and J. Tasič. Comparison of an Emotion Detection Technique on Posed and Spontaneous Datasets. Proceedings of the 19th ERK conference, Portorož, 2010, 2010. [28] M. Tkalčič, A. Odić, A. Košir, and J. Tasič. Impact of Implicit and Explicit A↵ective Labeling on a Recommender System’s Performance. Joint Proceedings of the Workshop on Decision Making and Recommendation Acceptance Issues in Recommender Systems (DEMRA 2011) and the 2nd Workshop on User Models for Motivational Systems: The a↵ective and the rational routes to persuasion (UMMS 2011), page 112, 2011. [29] M. Tkalčič, J. Tasič, and A. Košir. The LDOS-PerA↵-1 Corpus of Face Video Clips with A↵ective and Personality Metadata. Proceedings of Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality (Malta, 2010), LREC, page 111, 2009. [30] G. N. Yannakakis, J. Hallam, and H. H. Lund. Entertainment capture through heart rate activity in physical interactive playgrounds. User Modeling and User-Adapted Interaction, 18(1-2):207–243, Sept. 2008. [31] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang. A Survey of A↵ect Recognition Methods: Audio, Visual, and Spontaneous Expressions. IEEE Trans. Pattern Analysis & Machine Intelligence, Vol. 31, No., 1pp:39–58, 2009. 13