Affective recommender systems: the role of emotions in
                    recommender systems

                  Marko Tkalčič                       Andrej Košir                             Jurij Tasič
          University of Ljubljana Faculty       University of Ljubljana Faculty       University of Ljubljana Faculty
            of electrical engineering             of electrical engineering             of electrical engineering
              Tržaška 25, Ljubljana,                Tržaška 25, Ljubljana,                Tržaška 25, Ljubljana,
                     Slovenia                              Slovenia                              Slovenia
          marko.tkalcic@fe.uni-lj.si             andrej.kosir@fe.uni-lj.si               jurij.tasic@fe.uni-lj.si

ABSTRACT                                                             of the research community to identify the position of their
                                                                     activities and to benefit from each other’s work.
Recommender systems have traditionally relied on data-centric
descriptors for content and user modeling. In recent years
we have witnessed an increasing number of attempts to use            2.    THE UNIFYING FRAMEWORK
emotions in di↵erent ways to improve the quality of rec-                When using applications with recommender systems the
ommender systems. In this paper we introduce a unifying              user is constantly receiving various stimuli (e.g. visual, au-
framework that positions the research work, that has been            ditory etc.) that induce emotive states. These emotions
done so far in a scattered manner, in a three stage model.           influence, at least partially (according to the bounded ratio-
We provide examples of research that cover various aspects           nality model [16]) the user’s decisions on which content to
of the detection of emotions and the inclusion of emotions           choose. Thus it is important for the recommender system
into recommender systems.                                            application to detect and make good use of emotive infor-
                                                                     mation.
Categories and Subject Descriptors                                   2.1   Describing emotions
H.4 [Information Systems Applications]: Miscellaneous;                  There are two main approaches to describe the emotive
D.2.8 [Software Engineering]: Metrics—complexity mea-                state of a user: (i) the universal emotions model and (ii)
sures, performance measures                                          the dimensional model. The universal emotions model as-
                                                                     sumes there is a limited set of distinct emotional categories.
                                                                     There is no unanimity as to which are the universal emo-
Keywords                                                             tions, however, the categories proposed by Ekman [10] (i.e.
recommender systems, emotions                                        happiness, anger, sadness, fear, disgust and surprise) appear
                                                                     to be very popular. The dimensional model, on the contrary,
                                                                     describes each emotion as a point in a continuous multidi-
1.    INTRODUCTION                                                   mensional space where each dimension represents a quality
   In the pursuit of increasing the accuracy of recommender          of the emotion. The dimensions that are used most fre-
systems, researchers started to turn to more user-centric            quently are valence, arousal and dominance (thus the VAD
content descriptors in recent years. The advances made in            acronym) although some authors refer to these dimensions
a↵ective computing, especially in automatic emotion detec-           with di↵erent names (e.g. pleasure instead of valence in
tion techniques, paved the way for the exploitation of emo-          [20] or activation instead of arousal in [13]). The circum-
tions and personality as descriptors that account for a larger       plex model, proposed by Posner et al. [24], maps the basic
part of variance in user preferences than the generic descrip-       emotions into the VAD space (as depicted in Fig. 1)
tors (e.g. genre) used so far.
   However, these research e↵orts have been conducted in-            2.2   The role of emotions in the consumption
dependently, stretched among the two major research areas,                 chain
recommender systems and a↵ective computing. In this pa-
                                                                        During the user interaction with a recommender system
per we (i) survey the research work that helps improving
                                                                     and the content consumption that follows, emotions play
recommender systems with a↵ective information and (ii) we
                                                                     di↵erent roles in di↵erent stages of the process. We divided
provide a unifying framework that will allow the members
                                                                     the user interaction process in three stages, based on the
                                                                     role that emotions play (as shown in Fig. 2): (i) the entry
                                                                     stage, (ii) the consumption stage and (iii) the exit stage.
                                                                        The work surveyed in this paper can be divided in two

                                                                    main categories: (i) generic emotion detection algorithms
                                                                    (that can be used in all three stages) and (ii) usage of emo-
                                                                    tion parameters in the various stages. This paper does not
                                                                    aim at providing an overall survey of related work but rather

Decisions@RecSys 2011, Chicago, USA                                  to point out good examples of how to address various as-
                                                                    pects of recommender systems with the usage of techniques


                                                                 9
                                                                                                           not aware of it. Furthermore, Pantić et al. [22] argued that
                                                                                                           explicit acquisition of users’ a↵ect has further negative prop-
                                                          Arousal                                          erties as users may have side-interests that drive their ex-
                                                         high                                              plicit a↵ective labeling process (egoistic tagging, reputation-
                                                                                                           driven tagging or asocial tagging).
                                                                      joy
                                                                                                              The most commonly used procedure for the explicit asess-
                                 anger                                                                     ment of emotions is the Self Assessment Manikin (SAM) de-
                                                         surprise
                                                                                                           veloped by [7]. It is a questionnaire where users assess their
                                         disgust                                                           emotional state in the three dimensions: valence, arousal
                                                                                                           and dominance.
                                  fear                                                                        The implicit acquisition of emotions is usually done through
                                                                                        Valence
                                                   neutral                                                 a variety of modalities and sensors: video cameras, speech,
negative                                                                                 positive          EEG, ECG etc. These sensors measure various changes
                             sadness
                                                                                                           of the human body (e.g. facial changes, posture changes,
                                                                                                           changes in the skin conductance etc.) that are known to be
                                                                                                           related to specific emotions. For example, the Facial Ac-
                                                                                                           tion Coding System (FACS), proposed by Ekman [9], maps
                                                                                                           emotions to changes of facial characteristic poionts. There
                                                                                                           are excellent surveys on the topic of multimodal emotion de-
                                                                                                           tection: [31, 22, 14]. In general, raw data is acquired from
                                                          low                                              one or more sensors during the user interaction. These sig-
                                                                                                           nals are processed to extract some low level features (e.g.
                                                                                                           Gabor based features are popular in the processing of fa-
Figure 1: Basic emotions in the valence-arousal                                                            cial expression video signals). Then some kind of classifi-
plane of the dimensional model                                                                             cation or regression technique is applied to yield distinct
                                                                                                           emotional classes or continuous values. The accuracy of
                                                                                                           emotion detection ranges from over 90% on posed datasets
                                                                                                           (like the Kanade-Cohn dataset [18]) to slightly better than
borrowed from a↵ective computing.                                                                          coin tossing on spontaneous datasets (like the LDOS-PerA↵-
                                                                                                           1 dataset [29]) [27, 6].
                                         time


     Entry mood                                       Content-induced affective state     Exit mood        4.   ENTRY STAGE
                                                                                                              The first part of the proposed framework (see Fig. 2) is
                       choice
                                                                                                           the entry stage. When a user starts to use a recommender
                                                                                                           system, she is in an a↵ective state, the entry mood. The
                                                                                                           entry mood is caused by some previous user’s activities, un-
      Detect                                                                               Detect
      entry
                        Give
                   recommendations
                                              Give
                                            content
                                                                     Observe user           exit           known to the system. When the recommender system sug-
      mood                                                                                 mood
                                                                                                           gests a limited amount of content items to the user, the entry
                                                                                                           mood influences the user’s choice. In fact, the user’s decision
                                                                                                           making process depends on two types of cognitive processes,
                                                Content application
                                                                                                           the rational and the intuitive, the latter being strongly in-
                                                                                                           fluenced by the emotive state of the user, as explained by
                                                                                                           the bounded rationality paradigm [16]. For example, a user
               Entry stage                             Consumption stage                Exit stage         might want to consume a di↵erent type of content when she
                                                                                                           is happy than when she is sad. In order to adapt the list
                                                                                                           of recommended items to the user’s entry mood the system
Figure 2: The unifying framework: the role of emo-                                                         must be able to detect the mood and to use it in the content
tions in user interaction with a recommender sys-                                                          filtering algorithm as contextual information.
tem.                                                                                                          In the entry part of user-RS interaction one of the aspects
                                                                                                           where emotions can be exploited is to influence the user’s
   In the remainder of the paper we address each stage sepa-                                               choice. Creed [8] explored how the way we represent infor-
rately by surveying the existing research work and providing                                               mation influences the user’s choices.
lists of open research areas. At the end we discuss the pro-                                                  It has been observed by Porayska-Pomsta et al. [23] that
posed framework and give the final conclusions.                                                            in tutoring systems there is a strong relation between the
                                                                                                           entry mood and learning. They analysed the actions that a
                                                                                                           human tutor took when the student showed signs of specific
3.        DETECTING AFFECTIVE STATES                                                                       a↵ective states to improve the e↵ectiveness of an interactive
   A↵ective states of end users (in any stage of the proposed                                              learning environment.
interaction chain) can be detected in two ways: (i) explic-                                                   A user modeling approach that maps a touristic attraction
itly or (ii) implicitly. The implicit detection of emotions is                                             with a piece of music that induces a related emotion has been
more accurate but it’s an intrusive process that breaks the                                                developed by Kaminskas and Ricci [17]. Their goal was to
interaction. The implicit approach is less accurate but it’s                                               find an appropriate musical score that would reinforce the
well suited for user interaction purposes since the user is                                                a↵ective state induced by the touristic attraction.


                                                                                                      10
  Using the entry mood as a contextual parameter (as de-               anticipate that implicit tagging can be used for user profiling
scribed by Adomavicius and Tuzhilin in [2]) could improve              in recommender systems.
the recommender’s performance. Both Koren et al. [19] and                 Joho et al. [15] used emotion detection from facial ex-
Baltrunas et al. [5] suggest using the matrix factorization            pressions to provide an a↵ective profile of video clips. They
approach and enrich it with contextual parameters. At the              used an item profile structure that labels changes of users
context-aware recommender systems contest in 2010 1 the                emotions through time relative to the video clip start. The
goal was to select a number of movies that would fit the               authors used their approach for summarizing highlights of
user’s entry mood. The contest winners’ contribution, Shi              video clips.
et al. [25] used several approaches amog which the best was               Hanjalić et al. [11] approached the summarization of video
the joint matrix factorization model with a mood-specific              highlights from the other side: they used the source’s low
regularization.                                                        level features (audio and video) to detect higlihjts without
  As an extension to the usage of emotions as contextual               taking into account the responses of end users.
information an interesting research area is to diversify the              The research work described so far in this section is in-
recommendations. For example, if a user is sad, would it be            teresting because allows us to model the content items (im-
better to recommend happy content to cheer her up or to rec-           ages, movies, music etc.) with a↵ective labels. These a↵ec-
ommend sad content to be in line with the current mood?                tive labels describe the emotions experienced by the users
Research on information retrieval results diversification is           who consume the items. In our previous work [26] we have
getting increased attention, especially after the criticism of         shown that the usage of such a↵ective labels over generic
the recommendation bubble has started2 . Although we are               labels (e.g. genre) significantly improves the performance of
not aware of any work done on results diversification con-             a content-based recommender system for images. We used
nected with emotions, a fair amount of work has been done              explicitly acquired a↵ective metadata to model the items
on political news aggregators in order to stimulate political          and the users’ preferences. However, in another experiment
pluralism [21].                                                        [28], where we used implicitly acquired a↵ective metadata,
                                                                       the accuracy of the recommender system was significantly
5.         CONSUMPTION STAGE                                           lower but still better than with generic metadata only.
                                                                          In a similar experiment, Arapakis et al. [4] built a recom-
  The second part of the proposed framework is the con-
                                                                       mender system that uses real time emotion detection infor-
sumption stage (see Fig. 2). After the user starts with
                                                                       maion.
the consumption of the content she experiences a↵ective re-
sponses that are induced by the content. Depending on the
type of content, these responses can be (i) single values (e.g.        6.   EXIT STAGE
the emotive response to watching an image) or (ii) a vec-                 After the user has finished with the content consumption
tor of emotions that change over time (e.g. while watch-               she is in what we call the exit mood. The main di↵erence
ing a movie or a sequence of images). Figure 3 shows how               between the consumption stage and the exit stage is that the
emotions change over time in the consumption stage. The                exit mood will influence the user’s next actions, thus having
automatic detection of emotions can help building emotive              an active part, while in the consumption stage the induced
profiles of users and content items that can be exploited for          emotions did not influence any actions but were a passive
content-based recommender algorithms.                                  response to the stimuli. In case that the user continues to
    E
                                                                       use the recommender system the exit mood for the content
             tT
                                                                       just consumed is the entry mood for the next content to be
    ✏4
                                                                       consumed.
                                                                          The automatic detection of the exit mood can be useful as
    ✏3
                                                                       an indicator of the user’s satisfaction with the content. Thus
    ✏2                                                                 the detection of the exit mood can be seen as an unobtrusive
                                                                       feedback collection technique.
    ✏1
                                                                          Arapakis et al. [3] used the exit mood, detected through
                                                                       videos of users’ facial expressions, as an implicit feedback in
    ✏N
                                                                       their recommender system for video sequences.
                                                                          In an experiment with games, Yannakakis et al. [30], used
                                                             t
         t(h1 )   t(h2 )       t(h3 )   t(h4 )                         heart rate activity to infer the “fun” that the subjects expe-
                                                                       rience in physical interactive playgrounds.
Figure 3: The user’s emotional state ✏ is continu-
ously changing as the time sequence of the visual                      7.   OPEN RESEARCH AREAS
stimuli hi 2 H induce di↵erent emotions.
                                                                          We identified four main areas where further research should
                                                                       be conducted in order to build true a↵ective recommender
  Using emotional responses for generating implicit a↵ective
                                                                       systems: (i) using emotions as context in the entry stage,
tags for content is the main research area in the consumption
                                                                       (ii) modeling a↵ective content profiles, (iii) using a↵ective
section. Pantić et al. [22] argued why the usage of automatic
                                                                       profiles for recommending content and (iv) building a set of
emotion detection methods improves content tagging: the
                                                                       datasets.
minimization of the drawbacks caused by egoistic tagging,
                                                                          Although some work has been carried out on exploiting
reputation-driven tagging and asocial tagging. They also
                                                                       the entry mood we believe that there is still the need to
1
    http://www.dai-labor.de/camra2010/                                 answer tha basic question of the entry stage: which items
2
    http://www.thefilterbubble.com/                                    to recommend when the user is in the emotive state A?. We


                                                                  11
further believe that there are firm di↵erences between what            [6] M. S. Bartlett, G. C. Littlewort, M. G. Frank,
the user wants now and what is good for a user on a long                   C. Lainscsek, I. R. Fasel, and J. R. Movellan.
run. Thus bringing the research on results diversification                 Automatic Recognition of Facial Actions in
(see the work done in [21, 1]) into a↵ective recommender                   Spontaneous Expressions. Journal of Multimedia,
systems is a highly important topic.                                       1(6):22–35, Sept. 2006.
   A↵ective content profiling is still an open question, espe-         [7] M. Bradley and P. Lang. Measuring emotion: the
cially profiling content items that last longer than a single              self-assessment manikin and the semantic di↵erential.
emotive response. The time dependancy of content profiles                  Journal of behavior therapy and experimental
has also o strong impact on the algorithms that exploit the                psychiatry, 25(1):49–59, 1994.
profiles for recommending items.                                       [8] C. Creed and R. Beale. Using emotion simulation to
   With the except of the LDOS-PerA↵-1 dataset [29] (which                 influence user attitudes and behavior. HCI workshop
is limited in the amount of content items and users), the                  at BCS 2005, pages 1–3, 2005.
research community does not have a suitable dataset upon               [9] P. Ekman. Facial expression and emotion. American
which to work. It is thus required that a large-scale dataset,             Psychologist, 48(4):384, 1993.
compareable to the MovieLens or Netflix datasets, is built.           [10] P. Ekman. Basic Emotions. In Handbook of Cognition
                                                                           and Emotion, pages 45—-60. 1999.
8.   CONCLUSION                                                       [11] A. Hanjalic. Adaptive extraction of highlights from a
   In this paper we have provided a framework that describes               sport video based on excitement modeling. IEEE
three ways in which emotions can be used to improve the                    Transactions on Multimedia, 7(6):1114–1122, Dec.
quality of recommender systems. We also surveyed some                      2005.
work that deals with parts of the isuues that arise in te             [12] J. L. Herlocker, J. A. Konstan, L. Terveen, and J. T.
pursuit of a↵ective recommender systems.                                   Riedl. Evaluating collaborative filtering recommender
   An important issue in recommender systems, especially                   systems. ACM Trans. Inf. Syst, 22(1):5–53, 2004.
when it comes to user-centric systems, is to move from data-          [13] S. V. Ioannou, A. T. Raouzaiou, V. a. Tzouvaras,
centric assessment criteria to user-centred assessment crite-              T. P. Mailis, K. C. Karpouzis, and S. D. Kollias.
ria. We have not addressed this issue in this paper as it                  Emotion recognition through facial expression analysis
appears to larger dimensions. The recsys community has so                  based on a neurofuzzy network. Neural networks : the
far relied on metrics borrowed from information retrieval:                 official journal of the International Neural Network
confusion matrices, precision, recall etc. (see [12] for an                Society, 18(4):423–35, May 2005.
overview). However recommender systems are used by end                [14] A. Jaimes and N. Sebe. Multimodal human–computer
users and thus the assessment of the end users should be                   interaction: A survey. Computer Vision and Image
taken more into account. We suggest to move towards met-                   Understanding, 108(1-2):116–134, 2007.
rics that take into account the user experience as pointed            [15] H. Joho, J. M. Jose, R. Valenti, and N. Sebe.
out in http://www.usabart.nl/portfolio/                                    Exploiting facial expressions for a↵ective video
KnijnenburgWillemsen-UMUAI2011_UIRecSy.pdf.                                summarisation. Proceeding of the ACM International
                                                                           Conference on Image and Video Retrieval - CIVR ’09,
9.   REFERENCES                                                            page 1, 2009.
 [1] G. Adomavicius and Y. Kwon. Improving Aggregate                  [16] D. Kahneman. A perspective on judgment and choice:
     Recommendation Diversity Using Ranking-Based                          mapping bounded rationality. The American
     Techniques. IEEE Transactions on Knowledge and                        psychologist, 58(9):697–720, Sept. 2003.
     Data Engineering, (99):1–1, 2011.                                [17] M. Kaminskas and F. Ricci. Location-Adapted Music
 [2] G. Adomavicius, R. Sankaranarayanan, S. Sen, and                      Recommendation Using Tags. User Modeling,
     A. Tuzhilin. Incorporating contextual information in                  Adaption and Personalization, pages 183–194, 2011.
     recommender systems using a multidimensional                     [18] T. Kanade, J. Cohn, and Y. Tian. Comprehensive
     approach. ACM Transactions on Information Systems                     database for facial expression analysis. In Automatic
     (TOIS), 23(1):103–145, 2005.                                          Face and Gesture Recognition, 2000. Proceedings.
 [3] I. Arapakis, J. Jose, and P. Gray. A↵ective feedback:                 Fourth IEEE International Conference on, pages
     an investigation into the role of emotions in the                     46–53. IEEE, 2000.
     information seeking process. Proceedings of the 31st             [19] Y. Koren. Collaborative filtering with temporal
     annual international ACM SIGIR conference on                          dynamics. Communications of the ACM, 53(4):89,
     Research and development in information retrieval,                    Apr. 2010.
     (January):395–402, 2008.                                         [20] A. Mehrabian. Pleasure-arousal-dominance: A general
 [4] I. Arapakis, Y. Moshfeghi, H. Joho, R. Ren,                           framework for describing and measuring individual
     D. Hannah, and J. M. Jose. Enriching user profiling                   di↵erences in Temperament. Current Psychology,
     with a↵ective features for the improvement of a                       14(4):261–292, Dec. 1996.
     multimodal recommender system. Proceeding of the                 [21] S. Munson, D. X. Zhou, and P. Resnick. Sidelines: An
     ACM International Conference on Image and Video                       algorithm for increasing diversity in news and opinion
     Retrieval - CIVR ’09, (i):1, 2009.                                    aggregators. Proceedings of ICWSM09 Conference on
 [5] L. Baltrunas. Exploiting contextual information in                    Weblogs and Social Media. San Jose, CA., 2009.
     recommender systems. Proceedings of the 2008 ACM                 [22] M. Pantic and A. Vinciarelli. Implicit human-centered
     conference on Recommender systems - RecSys ’08,                       tagging [Social Sciences. IEEE Signal Processing
     page 295, 2008.


                                                                 12
     Magazine, 26(6):173–180, Nov. 2009.
[23] K. Porayska-Pomsta, M. Mavrikis, and H. Pain.
     Diagnosing and acting on student a↵ect: the tutor’s
     perspective. User Modeling and User-Adapted
     Interaction: The Journal of Personalization Research,
     18(1-2):125–173, 2007.
[24] J. Posner, J. a. Russell, and B. S. Peterson. The
     circumplex model of a↵ect: an integrative approach to
     a↵ective neuroscience, cognitive development, and
     psychopathology. Development and psychopathology,
     17(3):715–34, Jan. 2005.
[25] Y. Shi, M. Larson, and A. Hanjalic. Mining
     mood-specific movie similarity with matrix
     factorization for context-aware recommendation.
     Proceedings of the Workshop on Context-Aware Movie
     Recommendation, pages 34–40, 2010.
[26] M. Tkalčič, U. Burnik, and A. Košir. Using a↵ective
     parameters in a content-based recommender system
     for images. User Modeling and User-Adapted
     Interaction: The Journal of Personalization Research,
     pages 1–33–33, Sept. 2010.
[27] M. Tkalčič, A. Odić, A. Košir, and J. Tasič.
     Comparison of an Emotion Detection Technique on
     Posed and Spontaneous Datasets. Proceedings of the
     19th ERK conference, Portorož, 2010, 2010.
[28] M. Tkalčič, A. Odić, A. Košir, and J. Tasič. Impact of
     Implicit and Explicit A↵ective Labeling on a
     Recommender System’s Performance. Joint
     Proceedings of the Workshop on Decision Making and
     Recommendation Acceptance Issues in Recommender
     Systems (DEMRA 2011) and the 2nd Workshop on
     User Models for Motivational Systems: The a↵ective
     and the rational routes to persuasion (UMMS 2011),
     page 112, 2011.
[29] M. Tkalčič, J. Tasič, and A. Košir. The
     LDOS-PerA↵-1 Corpus of Face Video Clips with
     A↵ective and Personality Metadata. Proceedings of
     Multimodal Corpora: Advances in Capturing, Coding
     and Analyzing Multimodality (Malta, 2010), LREC,
     page 111, 2009.
[30] G. N. Yannakakis, J. Hallam, and H. H. Lund.
     Entertainment capture through heart rate activity in
     physical interactive playgrounds. User Modeling and
     User-Adapted Interaction, 18(1-2):207–243, Sept. 2008.
[31] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang.
     A Survey of A↵ect Recognition Methods: Audio,
     Visual, and Spontaneous Expressions. IEEE Trans.
     Pattern Analysis & Machine Intelligence, Vol. 31,
     No., 1pp:39–58, 2009.


                                                                   13