=Paper=
{{Paper
|id=Vol-2554/paper10
|storemode=property
|title=Leveraging Emotion Features in News Recommendations
|pdfUrl=https://ceur-ws.org/Vol-2554/paper_10.pdf
|volume=Vol-2554
|authors=Nastaran Babanejad,Ameeta Agrawal,Heidar Davoudi,Aijun An,Manos Papagelis
|dblpUrl=https://dblp.org/rec/conf/recsys/BabanejadADAP19
}}
==Leveraging Emotion Features in News Recommendations==
Nastaran Babanejad, York University, Toronto, Canada (nasba@eecs.yorku.ca)
Ameeta Agrawal, York University, Toronto, Canada (ameeta@eecs.yorku.ca)
Heidar Davoudi, Ontario Tech University, Oshawa, Canada (heidar.davoudi@uoit.ca)
Aijun An, York University, Toronto, Canada (aan@eecs.yorku.ca)
Manos Papagelis, York University, Toronto, Canada (papaggel@eecs.yorku.ca)
ABSTRACT
Online news reading has become very popular as the web provides
access to news articles from millions of sources around the world.
As a specific application domain, news recommender systems aim
to give the most relevant news article recommendations to users
according to their personal interests and preferences. Recently, a
family of models has emerged that aims to improve recommenda-
tions by adapting to the contextual situation of users. These models
provide the premise of being more accurate as they are tailored to
satisfy the continuously changing needs of users. However, little
attention has been paid to the emotional context and its potential
for improving the accuracy of news recommendations. The main
objective of this paper is to investigate whether, how and to what
extent emotion features can improve recommendations. Towards
that end, we derive a large number of emotion features that can be
attributed to both items and users in the domain of news. Then, we
devise state-of-the-art emotion-aware recommendation models by
systematically leveraging these features. We conducted a thorough experimental evaluation on a real dataset from the news domain. Our results demonstrate that the proposed models outperform state-of-the-art non-emotion-based recommendation models. Our study provides evidence of the usefulness of the emotion features at large, as well as the feasibility of incorporating them into existing models to improve recommendations.

Figure 1: Illustrative example of emotions expressed in articles read by two different users, U1 (top) and U2 (bottom), over a three-month period. Can we leverage the emotional context to improve recommendations?
CCS CONCEPTS
• Information systems → Recommender systems; Sentiment analysis; • Computing methodologies → Neural networks.

KEYWORDS
news recommender systems, contextual information, emotion features

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Recommender systems (RS) have widely and successfully been employed in domains as diverse as news and media, entertainment, e-commerce and financial services, to name a few. The main utility of such systems is their ability to suggest items to users that they might like or find useful. Traditionally, research on recommendation algorithms has focused on improving the accuracy of predictive models based on a combination of descriptive features of the items and users themselves (e.g., user behavior, interests and preferences) and the history of a user's interactions with the items through ratings, reviews, clicks and more [20, 33, 34]. However, little attention has been paid to the emotional context and its relation to recommendations.
While emotions can be manifested in various ways, we focus on emotions expressed in textual information that is associated with items or users in the system. For example, the content of a news article, the content of an online review or the lyrics of a song are good examples of textual information directly associated with an item's emotional context. On the other hand, the emotional profile of a user can be determined through explicit or implicit feedback of users to items. Explicit feedback, such as providing ratings and/or submitting reviews to items, can represent an accurate reflection of a user's opinion about the item, but it is considered an intrusive process that disrupts the user-system interaction and negatively impacts user experience [32]. In addition, while it might be available for certain domains (e.g., product recommendations [8], movie recommendations [29], etc.), it is not easily obtainable in domains such as news, where users typically interact with items at a fast pace and are less inclined to provide feedback. In the absence, sparsity or high cost of acquisition of explicit feedback, incorporating implicit feedback, which is generally abundant and non-intrusive, might be beneficial. Therefore, we focus on indirectly capturing the emotional context of users' activity by monitoring their interactions with items over time. For instance, one can monitor the tone of the stories in the news articles users are reading. Effectively, this information can be used to model a user's historical or temporal emotional profile.
To further motivate this, consider Figure 1, which illustrates the emotional profiles of two users, U1 and U2, based on eight basic emotions, expressed in articles read by them over a period of three months. One can notice that emotions of sadness and fear are mostly expressed in the articles read by U1, while other emotions, such as joy, are less expressed. In addition, one can observe trends such as the expression of anger increasing over time. On the other hand, for U2, the emotions of joy and trust are mostly expressed and other emotions, such as disgust, are less expressed. Moreover, emotions of fear and anticipation are increasingly expressed in the articles read by this user. Although the emotional tone derived from news articles read by a user cannot justify the personality and state of mind of the user, it can be considered as the taste or preference of the user, as it shows the type of articles they are more interested in. Inspired by these observations, recent advancements in methods for emotion detection and the success of emotion-aware recommendation algorithms, the main motivation of our research is to investigate whether, how and to what extent emotion features can improve the accuracy of recommendations.
The Problem. More formally, the recommendation task can be described as follows. Let U = {u_1, u_2, ..., u_m} be a set of m users and I = {i_1, i_2, ..., i_n} a set of n items. Let us also assume that each user u_i has already interacted with a set of items I_{u_i} ⊆ I (e.g., consumed news articles). Then, the problem is to accurately predict the probability p_{u_a, i_j} with which a user u_a ∈ U will like an item i_j ∈ I \ I_{u_a}. The task can also take the form of recommending a set I_k ⊆ I \ I_{u_a} of k items that the user will find most interesting (top-k recommendations). For example, in the news domain, the task is that of recommending an unread article.
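To make the top-k formulation concrete, here is a minimal sketch assuming a generic scoring function score(u, i) that returns the predicted probability p_{u,i}; the function name and data structures are illustrative and not part of the system described later in the paper.

```python
from typing import Callable, Dict, List, Set

def recommend_top_k(user: str,
                    all_items: Set[str],
                    interacted: Dict[str, Set[str]],
                    score: Callable[[str, str], float],
                    k: int = 10) -> List[str]:
    """Rank the items the user has not interacted with by predicted
    probability of interest and return the k highest-scoring ones."""
    candidates = all_items - interacted.get(user, set())
    return sorted(candidates, key=lambda item: score(user, item), reverse=True)[:k]
```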
Challenges & Approach. In order to evaluate the importance of the emotional context to recommendations, we had to incorporate emotional features [2, 36, 45] into state-of-the-art recommendation algorithms and evaluate their accuracy performance. Figure 2 shows a schematic diagram of the emotion-aware recommendation process we designed, which consists of three main stages: i) feature engineering, ii) model training, and iii) blending & ensemble learning. Each of these components defines a number of challenges that need to be addressed. During feature engineering, we had to generate a number of features attributed to both users and items. Emphasis was given to capturing the most important non-emotional and emotional features for the prediction task. Once features are extracted, off-the-shelf feature selection methods are employed to select a subset of them that are more relevant for use in model construction. During model training, we experiment with a number of state-of-the-art models for generating recommendations. During blending & ensemble learning, we combine alternative models to obtain better predictive models than any of the constituent models alone.

Figure 2: Overview of an emotion-aware recommendation system and the focus of the main contributions of the paper.

Contributions. The major contributions of this paper are as follows:
• We systematically identify, extract and select the most relevant emotion-based features for use in news recommendation models. These features are associated with both items (e.g., news articles) and users (e.g., readers).
• We devise a number of state-of-the-art models for generating recommendations that incorporate the additional emotion features. These models include variations of gradient boosting decision trees, deep matrix factorization methods and deep neural network architectures. In addition, we use ensembling methods to increase the predictive performance by blending or combining the predictions of multiple constituent models.
• We propose EmoRec, an emotion-aware recommendation model, which demonstrates the best accuracy performance in the news recommendation task. EmoRec itself is an ensemble model.
• We conduct a thorough experimental evaluation on a real dataset from the news domain. Our results demonstrate that the emotion-aware recommendation models consistently outperform state-of-the-art non-emotion-based recommendation models. Our study provides evidence of the usefulness of the emotion features at large, as well as the feasibility of incorporating them into existing models to improve recommendations.
2 RELATED WORK
Prior research has found a range of features to be useful in the context of news recommender systems, such as user location [15], time of the day [26], demographic information [21], or an article's social media profile [50]. However, emotion, which is one of the important elements of human nature and has a significant impact on our behavior and choices [49], has received little attention. A number of studies in the areas of psychology, neurology, and behavioral sciences have shown that individuals' choices are related to their feelings and mental moods [24].
In the context of recommender systems, one of the earliest works [17] pointed out that emotions are crucial for users' decision making and that users transmit their decisions together with emotions. Tkalcic et al. [42] introduced a unifying framework for using emotions in user interactions with a recommender system, and suggested that while an implicit approach of user feedback may be less accurate, it is well suited for user interaction purposes since the user is not aware of it [41].
While emotions as features have been studied in movie recommendations [28, 29], music recommendations [18] and restaurant recommendations [44], to name a few, much less work has explored the role of emotion features in news recommender systems.
Emotion in news articles has been studied for categorizing news stories into eight emotion categories [3]. Specifically for recommender systems, Parizi and Kazemifard [35] introduced a model for Persian news utilizing both the emotion of news and users' preferences. More recently, Mizgajski and Morzy [23] introduced a recommender system for recommending news items by leveraging a multi-dimensional model of emotions, where emotion is derived through users' self-assessed reactions (i.e., explicit feedback), which can be considered an intrusive form of collection. In contrast to previous studies, our work focuses on studying the role of emotion features in news recommender systems using implicit user feedback.

3 FEATURES FOR RECOMMENDATION
This section describes the feature extraction procedure which is utilized in our proposed framework. The features are grouped into two main categories: (i) emotion-based features for items and users, and (ii) non-emotion-based features for items and users.

3.1 Emotion-based Features
The main objective of this paper is to improve the performance of a recommender system by leveraging user/item emotion features. Figure 3 shows an example of the textual content of an item (i.e., an article) in the news domain. As can be observed, there are several words, such as win and gratifying, expressing the emotion of happiness. Moreover, interjections such as yay and oh can be indicators of different emotions [16]. In this section, we describe how we extract such features to improve the effectiveness of the recommendation system. In order to maintain consistency, each news article is preprocessed by tokenizing into words, removing the stopwords and POS-tagging to extract nouns, verbs, adverbs and adjectives. In particular, we focus on two approaches for computing emotion features: sentiment analysis, which classifies text into neutral, positive and negative sentiments, and emotion analysis, which categorizes text into emotions such as happiness, sadness, anger, disgust, fear and so on. Note that we extract emotion features for both users and items.

Figure 3: Example emotions expressed in textual content

Table 1: Emotion Resources
Resources | Size | Emotion Taxonomy
WordNet-Affect [39] | 4,787 words | Several
ISEAR [46] | 7,600 sentences | ISEAR
NRC [25] | 14,182 words | Plutchik
SentiWordNet 3.0 [4] | 11,000+ synsets | Sentiments

3.1.1 Item Emotion-based Features.
Number of Emotion Words: This feature represents the number of words in an emotion lexicon (i.e., WordNet-Affect, see Table 1) that occur in the item (i.e., news article) more than once.
Ekman's Emotion Label: We count the number of emotion words occurring in the text document for each emotion type (Ekman's six emotion categories [13]) and then assign the text the emotion label with the highest number of emotion words appearing in the text. If more than one emotion category has the highest count, 0 is assigned to this feature, leaving the next feature to indicate mixed emotions. A combination of different lexicons (WordNet-Affect and NRC, see Table 1) is used to find the emotion labels. We use multiple resources to have a bigger set of emotion words for each emotion.
Mixed Emotions: This feature indicates whether an item has more than one document-level emotion label based on Ekman's emotion model (i.e., if two or more emotions have the highest score, this feature is valued at 1, otherwise 0). Since the initial annotation effort (previous feature) illustrated that in many cases a sentence can exhibit more than one emotion, we have an additional category called mixed emotion to account for all such instances.
Sentiment Feature: The text is classified into three categories: positive, negative and neutral. We utilize the approach introduced in [30] and use SentiWordNet [4].
Interjections: This feature counts the number of interjections in a document. A short sound, word or phrase spoken suddenly to express an emotion, e.g., oh, look out!, ah, is called an interjection¹. Our preliminary analysis found that interjections were common in quotes in news articles, which can be detected for potential emotions.
Capitalized Words: This feature counts the number of words in a document with all uppercase characters. People use capital words to express an emotion [43] and make it stand out to the readers (e.g., I said I am FINE).

¹ List of interjections derived from: i) https://surveyanyplace.com/the-ultimate-interjection-list, ii) https://7esl.com/interjections-exclamations, and iii) https://www.thoughtco.com/interjections-in-english-1692798
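A rough sketch of how the lexicon- and surface-level item features above (emotion word counts, Ekman's emotion label, mixed emotions, interjections, capitalized words) could be computed. The tiny lexicon and interjection list below are placeholders, not the WordNet-Affect/NRC resources or interjection lists actually used, and the counting details are simplified.

```python
import re
from collections import Counter

# Placeholder resources; the paper uses WordNet-Affect and NRC (Table 1)
# and the interjection lists referenced in footnote 1.
EMOTION_LEXICON = {
    "joy": {"win", "gratifying", "happy"},
    "sadness": {"loss", "grief"},
    "anger": {"furious", "outrage"},
}
INTERJECTIONS = {"oh", "ah", "yay", "wow"}

def item_surface_features(text: str) -> dict:
    tokens = re.findall(r"[A-Za-z']+", text)
    lowered = [t.lower() for t in tokens]

    # Count lexicon words per emotion category.
    counts = Counter({emo: sum(1 for t in lowered if t in words)
                      for emo, words in EMOTION_LEXICON.items()})

    # Document-level emotion label; a tie among the top categories means "mixed".
    top = counts.most_common()
    tied = len(top) > 1 and top[0][1] > 0 and top[0][1] == top[1][1]
    label = top[0][0] if top[0][1] > 0 and not tied else 0

    return {
        "num_emotion_words": sum(counts.values()),
        "emotion_label": label,
        "mixed_emotions": int(tied),
        "num_interjections": sum(1 for t in lowered if t in INTERJECTIONS),
        "num_capitalized_words": sum(1 for t in tokens if t.isupper() and len(t) > 1),
    }
```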
Punctuation: Two features are included to model the occurrence of question marks and exclamation marks repeated more than two times in a document. Using punctuation can clarify the emotional content of texts that is sometimes easy to miss [43].
Grammatical Markers and Extended Words: This feature counts the number of times words appear with a character repeated more than two times (e.g., haaappy or oh yeah!!????) [7], as excessive use of letters in a word (i.e., repetition) is one way to emphasize feelings.
Plutchik Emotion Scores: First, we measure the semantic relatedness score between a word W_i in the text and an emotion category C_j in the NRC lexicon (see Table 1) as follows [1]:

$$PMI(W_i, C_j) = \sqrt[n]{\prod_{k=1}^{n} PMI(W_i, C_j^k)} \qquad (1)$$

where C_j^k (k = 1 ... n) is the k-th word of emotion category C_j. PMI is the Pointwise Mutual Information calculated as follows:

$$PMI(W_i, C_j^k) = \log \frac{P(W_i, C_j^k)}{P(W_i)\,P(C_j^k)} \qquad (2)$$

where P(W_i) and P(C_j^k) are the probabilities that W_i and C_j^k occur in a text corpus, respectively, and P(W_i, C_j^k) is the probability that W_i and C_j^k co-occur within a sliding window in the corpus. Finally, we calculate the average, maximum and minimum of the scores over all words in the text for each emotion category and consider each as an individual feature.
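A hedged sketch of Equations (1) and (2), assuming unigram and sliding-window co-occurrence probabilities have already been estimated from a corpus and are passed in as dictionaries. Clipping negative PMI values to zero is a practical choice made here so the n-th root stays real; it is not something the paper specifies.

```python
import math

def pmi(w, c, p_word, p_pair):
    """Pointwise mutual information between word w and category word c (Eq. 2)."""
    joint = p_pair.get((w, c), 0.0)
    denom = p_word.get(w, 0.0) * p_word.get(c, 0.0)
    if joint <= 0.0 or denom <= 0.0:
        return 0.0
    return math.log(joint / denom)

def emotion_relatedness(w, category_words, p_word, p_pair):
    """Geometric mean of PMI over the n words of an emotion category (Eq. 1)."""
    scores = [max(pmi(w, c, p_word, p_pair), 0.0) for c in category_words]
    product = 1.0
    for s in scores:
        product *= s
    return product ** (1.0 / len(scores))

def plutchik_features(tokens, categories, p_word, p_pair):
    """Average, maximum and minimum relatedness of the text's words,
    per emotion category, kept as three separate features each."""
    feats = {}
    for name, cat_words in categories.items():
        vals = [emotion_relatedness(t, cat_words, p_word, p_pair) for t in tokens]
        if vals:
            feats[name] = (sum(vals) / len(vals), max(vals), min(vals))
    return feats
```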
3.1.2 User Emotion-based Features.
As we do not have access to users' explicit emotions towards items, we develop users' implicit emotional profiles based on their historical interactions with items. By computing the emotion profile of the items with which a user is interacting, we derive the emotional taste of the user over that period of time over the set of items.
User Emotions Across Items: We determine the emotion score (i.e., Plutchik's emotion scores) for the last accessed item before subscription as well as for the last 20 items accessed by the user. Then, we pick the top 3 most frequent emotions.
User Emotions Across Categories: We determine the emotion of the categories of items (e.g., sports in the news domain) accessed by a user by counting the number of items assigned to an emotion in a specific category, with the most frequent emotion considered as the emotion of the category. The feature is calculated over the whole history of the user.
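A minimal sketch of turning per-item emotion labels into the two user-level features above; the history list and lookup dictionaries are illustrative stand-ins for the interaction log.

```python
from collections import Counter

def user_emotions_across_items(history, item_emotion, n_last=20, top=3):
    """history: item ids ordered by access time; item_emotion: item id -> dominant emotion.
    Returns the `top` most frequent emotions among the last `n_last` accessed items."""
    recent = history[-n_last:]
    counts = Counter(item_emotion[i] for i in recent if i in item_emotion)
    return [emotion for emotion, _ in counts.most_common(top)]

def user_emotions_across_categories(history, item_emotion, item_category):
    """Most frequent emotion among the items the user accessed in each category,
    computed over the user's whole history."""
    per_category = {}
    for i in history:
        if i in item_emotion and i in item_category:
            per_category.setdefault(item_category[i], Counter())[item_emotion[i]] += 1
    return {cat: counts.most_common(1)[0][0] for cat, counts in per_category.items()}
```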
3.2 Non-Emotion-based Features
Non-emotion-based features can also be classified into item-based and user-based features.
3.2.1 Item Non-Emotion-based Features.
Item Topic: We extract topics in the article using Latent Dirichlet Allocation (LDA) [6]. In LDA, each topic is a distribution over words, and each document is a mixture of topics. The number of topics for the news articles is 112, which was chosen empirically to minimize the perplexity score of the LDA result. Thus, the item topic is represented by a vector of length 112.
Topic Label: We use lda2vec [27] to generate and label the topics in an item (i.e., document), where each generated topic is labeled by the one of its top k words which is most semantically similar to the other words in the top-k word list. We then label the item (i.e., document) with the label of the most coherent topic among the top m topics of the document. The word vector of this label word is used as the value for this feature.
TF-IDF: This feature represents items as n-grams (unigram, bigram, trigram) with the TF-IDF weighting approach [22].
Coherence: We first calculate the cosine similarity scores between all pairs of words in an item using word2vec pre-trained word vectors², and then record the average of the similarity scores, the standard deviation of the similarity scores, the lowest score that is higher than the standard deviation, and the highest score that is lower than the standard deviation as four features.
Potential to Trigger Subscription: This feature represents the total number of times the item was requested right before a paywall was presented to a user who subsequently made a subscription [10, 11]. In a subscription-based item delivery model, a paywall is the page asking for a subscription before allowing an unsubscribed user to continue accessing items.
3.2.2 User Non-emotion-based Features.
Visit Count: We calculate the average number of items (articles) accessed by a user per visit. A visit is terminated if a user is inactive for more than 30 minutes.
User Spent Time: Two features are represented. One is the average time the user spent per item, and the other is the average time the user spent per visit.
User Interest in Subcategory: This feature represents the empirical probability of subcategory s given a user u and a category c, denoted as P(s | u, c).

² https://code.google.com/archive/p/word2vec/
For example, P(election | u, politics) can be determined by the total number of articles the user read on election over the total number of articles that the user read on politics. In our experiments, the categories and subcategories were provided with the dataset and we consider only the top 50 most frequently visited subcategories for this feature.
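A small sketch of this empirical probability with pandas, assuming one row per article read with user, category and subcategory columns (the column names are illustrative):

```python
import pandas as pd

def subcategory_interest(reads: pd.DataFrame) -> pd.Series:
    """Returns P(subcategory | user, category) for every (user, category, subcategory)
    observed in the read log."""
    counts = reads.groupby(["user", "category", "subcategory"]).size()
    totals = counts.groupby(level=["user", "category"]).transform("sum")
    return counts / totals
```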
User Latent Vector: We calculate the latent vector for each user based on the matrix factorization introduced in [40]. This feature is chosen so that we can compare our method with the Deep Matrix Factorization model in [47], a state-of-the-art recommendation method, which uses this feature as input for a deep neural network.

Table 2: List of Emotion/Non-emotion Feature Importance
Emotion Features | Gain Score
Plutchik emotion scores | 3200.86
User emotions across items | 1985.36
User emotions across categories | 1850.33
Ekman's emotion label | 1101.38
Punctuation | 910.55
Grammatical markers and extended words | 860.13
Interjections | 773.12
Capitalized words | 640.21
Mixed emotions | 526.97
Sentiment features | 360.68

Non-emotion Features | Gain Score
User latent vector | 3640.87
Potential to trigger subscription | 2974.46
User interest in subcategory | 1530.28
Topic labeling | 1421.19
User spent time | 1110.57
Visit count | 920.53
Item topic | 867.12
Coherence | 685.23
TF-IDF | 410.29

3.3 Feature Selection
One of the critical steps after feature extraction is to select important features for recommendation. Table 2 reports the most important features according to the gain importance score for the news data set. We evaluate feature importance by averaging over 10 training runs of a gradient boosting machine learning model, XGBoost [9], to reduce variance³. Also, the model is trained using early stopping with a validation set to prevent over-fitting to the training data. By using the zero importance function, we find features that have zero importance according to XGBoost.

³ Variance refers to the sensitivity of the learning algorithm to the specifics of the training data (e.g., the noise and specific observations).
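A hedged sketch of this selection step: gain importances from XGBoost averaged over several seeded runs with early stopping, and features whose average importance is zero reported for removal. It assumes a recent xgboost release (1.6 or later, where early_stopping_rounds is a constructor argument) and pandas DataFrames so that feature names are preserved; it is not the authors' exact pipeline.

```python
import xgboost as xgb

def average_gain_importance(X_train, y_train, X_valid, y_valid, runs=10):
    """Average gain-based feature importance over `runs` random seeds and
    list the features whose average importance is zero."""
    totals = {name: 0.0 for name in X_train.columns}
    for seed in range(runs):
        model = xgb.XGBClassifier(
            n_estimators=500,
            random_state=seed,
            eval_metric="logloss",
            early_stopping_rounds=20,
        )
        model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
        for name, gain in model.get_booster().get_score(importance_type="gain").items():
            totals[name] += gain
    averages = {name: total / runs for name, total in totals.items()}
    zero_importance = [name for name, value in averages.items() if value == 0.0]
    return averages, zero_importance
```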
4 RECOMMENDATION MODEL
In this section, we introduce a tailored structure of an Emotion-aware Recommender System Model (EmoRec) for personalized recommendation. Our final model is an ensemble of three models leveraging both emotion and non-emotion-based features. We describe the structure of the proposed model and the training methods next.

4.1 Model Training
Model 1 (Boost Model): Gradient Boosting Decision Tree (GBDT) methods are among the most powerful machine learning approaches and have been effectively used in many domains [14], including recommendation [48]. The basic idea in GBDT approaches is to learn a set of base/weak learners (i.e., decision trees) sequentially by using different training splits. More precisely, at each step, we learn a new base model by fitting it to the error residuals (i.e., the difference between the current model predictions and the actual target values) at that step. The new model outcome is the previous model outcome plus the (weighted) new base learner outcome. Eventually, the final model outcome is the weighted average of all base learners' outcomes, where the weights are learned jointly with the base learners. We train two state-of-the-art GBDT models, namely XGBoost [9] and CatBoost [12], on our training datasets with the features selected in Section 3.3 as the input.
XGBoost uses a pre-sorted/histogram-based algorithm to compute the best split, while CatBoost uses ordered boosting, a permutation-based algorithm, to learn the weak learners effectively. Moreover, XGBoost requires one-hot encoding before supplying categorical data, but CatBoost handles categorical features directly. We train both models individually (three base models for each). The final model output (i.e., the probability that a user is interested in an item) is the combination of all base models' outcomes:

$$\sum_{i=1}^{6} \alpha_i\, p_i \qquad (3)$$

where p_i is the probability that the user is interested in the item according to base model i and α_i is the weight of base model i learned by XGBoost/CatBoost.
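A sketch of Model 1 under simplifying assumptions: three seeded XGBoost and three seeded CatBoost classifiers as the six base models, and a weighted sum of their probabilities as in Equation (3). Hyperparameters, categorical-feature handling and the learning of the weights themselves are omitted; the weights are passed in as fixed values here.

```python
import numpy as np
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

def train_base_models(X, y, seeds=(0, 1, 2)):
    """Six base GBDT models: one XGBoost and one CatBoost classifier per seed."""
    models = []
    for seed in seeds:
        models.append(XGBClassifier(n_estimators=300, random_state=seed).fit(X, y))
        models.append(CatBoostClassifier(iterations=300, random_seed=seed, verbose=False).fit(X, y))
    return models

def blend(models, X, weights):
    """Weighted combination of the base models' probabilities, as in Eq. (3)."""
    probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return probs @ np.asarray(weights)
```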
Model 2 (Deep Neural Network (DNN)): Figure 4 shows our proposed Deep Neural Network architecture for leveraging the emotion features (and other commonly available features) for the recommendation purpose. The input is divided into four groups [5]: i) user non-emotion-based features, ii) item non-emotion-based features, iii) user emotion-based features, and iv) item emotion-based features. For the categorical inputs, we utilize one-hot encoding (the second layer is a look-up embedding mapping each categorical feature to a fixed-length embedding vector). In this architecture, a "Dense Layer" can be formalized as Dense(x) = f(Wx + bias), where W and bias are parameters, x is the layer input and f is the activation function (for a linear layer, f is the identity function). We use L2 regularization to prevent over-fitting in the embedding layer and use back-propagation to learn the parameters.

Figure 4: The Structure of Our DNN Model
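A hedged Keras sketch in the spirit of the DNN in Figure 4: four numeric input groups, an embedded categorical input with L2 regularization, a few dense layers, and a sigmoid output for the probability of interest. All feature counts, vocabulary sizes and layer widths are made-up placeholders, and the framework choice is ours, not necessarily the authors'.

```python
from tensorflow.keras import Model, layers, regularizers

def build_dnn(num_user_feats=32, num_item_feats=48,
              num_user_emo=24, num_item_emo=30,
              cat_vocab_size=50, emb_dim=8):
    # Four numeric input groups: user/item x non-emotion/emotion features.
    user_x = layers.Input(shape=(num_user_feats,), name="user_non_emotion")
    item_x = layers.Input(shape=(num_item_feats,), name="item_non_emotion")
    user_e = layers.Input(shape=(num_user_emo,), name="user_emotion")
    item_e = layers.Input(shape=(num_item_emo,), name="item_emotion")

    # One categorical input (e.g., article category), mapped to a fixed-length
    # embedding vector regularized with L2.
    cat = layers.Input(shape=(1,), dtype="int32", name="item_category")
    cat_emb = layers.Flatten()(
        layers.Embedding(cat_vocab_size, emb_dim,
                         embeddings_regularizer=regularizers.l2(1e-5))(cat))

    x = layers.Concatenate()([user_x, item_x, user_e, item_e, cat_emb])
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid", name="p_interested")(x)

    model = Model([user_x, item_x, user_e, item_e, cat], out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```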
Model 3 (Deep Matrix Factorization (Deep MF)): Inspired by the models proposed in [19, 47], we built our Deep MF model (Figure 5) to leverage extra user/item features (i.e., emotion and non-emotion features) in the recommendation prediction task. In [47], the authors construct a user-item matrix with explicit ratings and implicit preference feedback; then, with this matrix as the input, they present a deep neural architecture to learn a low-dimensional space for the representation of both users and items. In [19], by replacing the inner product with a neural architecture, the authors learn an arbitrary function to capture the interactions between user and item latent vectors. Different from their work, we focus on modeling the user/item with rich extra features, such as non-emotion and emotion-based features, as well as using embedding vectors learned in our DNN model. The input of our proposed model is the same as in the DNN model, where the categorical features are encoded using one-hot vectors. The second layer is the look-up embedding. In this layer, we have both MF embedding vectors, which we estimate through the learning process, and DNN embedding vectors, which are concatenations of embedding vectors (for each similar input group) learned from the DNN model (they are fixed in this model). The Generalized Matrix Factorization (GMF) layer combines the two embeddings using a dot product and applies some non-linearity. Similar to the DNN model, the output of the model is the probability that a user is interested in an item.

Figure 5: The Structure of Our Deep MF Model
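A minimal sketch of the interaction layer described for Deep MF. The learned MF embeddings and the frozen DNN embeddings are combined here with element-wise products (the GMF formulation of [19], which reduces to a dot product when summed) followed by a non-linear layer; the dimensions are placeholders and the frozen DNN embeddings are simply passed in as extra inputs.

```python
from tensorflow.keras import Model, layers

def build_deep_mf(num_users, num_items, mf_dim=16, dnn_emb_dim=32):
    user_id = layers.Input(shape=(1,), dtype="int32", name="user_id")
    item_id = layers.Input(shape=(1,), dtype="int32", name="item_id")
    # Pre-computed embeddings taken from the DNN model and kept fixed,
    # represented here as plain numeric inputs.
    user_dnn = layers.Input(shape=(dnn_emb_dim,), name="user_dnn_embedding")
    item_dnn = layers.Input(shape=(dnn_emb_dim,), name="item_dnn_embedding")

    # MF embeddings estimated during training of this model.
    u = layers.Flatten()(layers.Embedding(num_users, mf_dim)(user_id))
    i = layers.Flatten()(layers.Embedding(num_items, mf_dim)(item_id))

    # GMF-style interactions: element-wise products, then a non-linearity.
    mf_interaction = layers.Multiply()([u, i])
    dnn_interaction = layers.Multiply()([user_dnn, item_dnn])
    x = layers.Concatenate()([mf_interaction, dnn_interaction])
    x = layers.Dense(32, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid", name="p_interested")(x)

    model = Model([user_id, item_id, user_dnn, item_dnn], out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```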
Ensemble/Blending Model: The final model, EmoRec, is the weighted average of the three models' predictions. We use the Nelder-Mead method [31] to find the optimum weight of each model.
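A sketch of this blending step with scipy's Nelder-Mead optimizer: the three weights are chosen to maximize the F-score of the blended probabilities on a validation set. Thresholding at 0.5 and normalizing the weights are illustrative choices, not details given in the paper.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import f1_score

def fit_blend_weights(val_probs: np.ndarray, val_labels: np.ndarray) -> np.ndarray:
    """val_probs: shape (n_samples, 3), one column per constituent model
    (Boost Blend, Deep MF, DNN Ensemble). Returns normalized blend weights."""
    def negative_f1(w):
        w = np.abs(w)
        w = w / w.sum()
        blended = val_probs @ w
        return -f1_score(val_labels, (blended >= 0.5).astype(int))

    result = minimize(negative_f1, x0=np.ones(3) / 3, method="Nelder-Mead")
    w = np.abs(result.x)
    return w / w.sum()
```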
5 EXPERIMENTS
In this section, we introduce the data, evaluation protocols and the specific configurations used in our experiments.

5.1 Data
Our experiments are conducted on a real-world news dataset. The Globe and Mail is one of the major newspapers⁴ in Canada. We use the data spanning from January to July 2014 (a 6-month period) in our experiments, where the data in the first four months were used for training, and the last two months for testing. The dataset contains information for 359,145 articles in total and 88,648 users in total, out of which 17,009 became subscribers during this period, and 71,639 were non-subscribers. Every time a user reads an article, watches a video or generally takes an action in the news portal, the interaction is recorded as a hit. Typically, a hit contains information like date, time, user id, visited article, and special events of interest like subscription, sign in, and so on.

⁴ https://www.theglobeandmail.com/

5.2 Evaluation Metrics
We use F-score to measure the predictive performance of a recommender system. For each user in the test data set, we use the original set of read articles in the test period as the ground truth, denoted as T_g. Assuming the set of recommended news articles for the user is T_r, precision, recall, and F-measure are defined as follows:

$$\text{Precision} = \frac{|T_g \cap T_r|}{|T_r|}, \qquad \text{Recall} = \frac{|T_g \cap T_r|}{|T_g|}$$

$$F = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

The F-score on a test data set is the average over all the users in the test data set.
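A small sketch of the per-user evaluation just described: precision, recall and F-score of one user's recommendations against the articles actually read, averaged over all users in the test set.

```python
def f_score(ground_truth: set, recommended: set) -> float:
    """F-measure for one user: ground_truth is T_g, recommended is T_r."""
    hits = len(ground_truth & recommended)
    if hits == 0 or not ground_truth or not recommended:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(ground_truth)
    return 2 * precision * recall / (precision + recall)

def average_f_score(truth_by_user: dict, recs_by_user: dict) -> float:
    """Average F-score over all users in the test set."""
    return sum(f_score(truth_by_user[u], recs_by_user.get(u, set()))
               for u in truth_by_user) / len(truth_by_user)
```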
5.3 Comparing Recommendation Models with and without Emotion Features
Our main objective is to see whether the use of emotion features will boost the performance of recommendation models. For this purpose, we run the three state-of-the-art recommendation models described in the last section, and some ensembles formed by these models, with and without emotion features. The models used in our evaluation are as follows:
• Single Boost Model: We run XGBoost and CatBoost separately to make predictions and collect the average of their F-scores.
• Boost Blend: This is the 6-model ensemble described in Model 1 in Section 4.1.
• Deep MF: This is the deep matrix factorization model described in Section 4.1.
• Single DNN Model: We run the DNN model 5 times with the same hyperparameters but different random seeds and collect the average result over the 5 runs.
• DNN Ensemble: An ensemble of 5 DNN models with different hyperparameters (e.g., different learning rates, etc.) is run 5 times, each with a different random seed. The average result over the 5 runs is collected.
• Boost Blend + Deep MF: This is an ensemble consisting of Boost Blend and Deep MF.
• Boost Blend + DNN Ensemble: This is an ensemble consisting of Boost Blend and DNN Ensemble.
• Deep MF + DNN Ensemble: This is an ensemble consisting of Deep MF and DNN Ensemble.
• Boost Blend + Deep MF + DNN Ensemble: An ensemble consisting of Boost Blend, Deep MF and DNN Ensemble.

Table 3: Results of our Models on News Dataset (F-score)
Model | Non-Emo | All
Single Boost Model | 70.19 | 70.86
Boost Blend | 70.69 | 71.50
Deep MF | 72.93 | 73.29
Single DNN Model | 70.88 | 73.00
DNN Ensemble | 73.62 | 74.30
Boost Blend + Deep MF | 73.07 | 74.98
Boost Blend + DNN Ensemble | 74.00 | 74.23
Deep MF + DNN Ensemble | 74.61 | 75.10
EmoRec (Boost Blend + Deep MF + DNN Ensemble) | 78.20 | 80.30

We train each of the above models using the training data of our data set and use the trained model to make recommendations by predicting a user's interest in an item in the test data. Table 3 shows the results (in F-score) of using these recommendation methods with and without emotion features on the news data set, where the whole set of emotion features described in Section 3.3 is used in the results for "All", while none of the emotion features is used in the results for "Non-Emo". As can be seen, adding emotion features improves the predictive performance for all the recommendation methods.
Among the single recommendation models (i.e., Single Boost Model, Deep MF and Single DNN Model), Deep MF performs the best. The results also show that the ensemble methods perform better than the single/component models. The best performance is produced by the largest ensemble (i.e., Boost Blend + Deep MF + DNN Ensemble). We refer to this best-performing model as our EmoRec model.

5.4 Comparison with Other Baselines
We also compare our EmoRec model with the following three state-of-the-art recommendation methods with well-tuned parameters (that is, the parameters are optimally tuned to ensure a fair comparison). The objective is to investigate whether emotion features can smarten up these recommender systems. A brief description of these three models is as follows:
Basic MF: This is the simple matrix factorization model used for discovering latent features between two entities (i.e., users and articles) [40]. Both user preferences and item characteristics are mapped to latent factor vectors. Each element of the item-specific factor vector measures the extent to which the item possesses one feature. Accordingly, each element of the user-specific factor vector measures the extent of the user's preference for that feature.
FDEN and GBDT: an ensemble of different models, including Field-aware Deep Embedding Networks and Gradient Boosting Decision Trees [5]. The predictions of the FDENs come from a bagging ensemble using the arithmetic mean of many networks, each of which has slight differences in hyper-parameters, including the forms of the activation.
Truncated SVD-based Feature Engineering: a gradient boosted decision trees model with truncated SVD-based embedding features [37]. To overcome the cold start problem, truncated SVD-based embedding features were created using the embedding features together with four different statistics-based features (users, items, artists and time); the final model was the weighted average of the five models' predictions.

Table 4: Comparison of EmoRec with State-of-the-art Baselines on News Dataset (F-score)
Model | Non-Emo | All
Basic MF | 69.10 | 71.23
FDEN and GBDT | 72.02 | 73.28
Truncated SVD-based Feature Engineering | 73.12 | 74.01
EmoRec | 78.20 | 80.30

The results are illustrated in Table 4, which shows that emotion features can also improve the recommendation performance of these three state-of-the-art baselines. In addition, our EmoRec model performs significantly better than these three baselines in both cases, i.e., using emotion features and not using emotion features.
5.5 Effect of Individual Emotion Features
Table 5 presents the results of a feature ablation study conducted in order to further understand the effect of the individual emotion features used in EmoRec. In each run of this study, we keep all the features except one type of emotion feature. The results indicate that removing the Plutchik emotion scores (an item feature), User emotions across categories and User emotions across items (user features) leads to a considerable decline in performance. This also shows that our model is able to capture useful implicit user emotion effectively.

Table 5: Effect of Individual Emotion Features (F-score)
Emotion Features | News
ALL emotion features | 80.30
- Sentiment features | 78.15
- Mixed emotions | 76.90
- Capitalized words | 76.21
- Interjections | 75.84
- Grammatical markers and extended words | 75.23
- Ekman's emotion label | 74.98
- Punctuation | 75.17
- User emotions across categories | 74.15
- User emotions across items | 73.23
- Plutchik emotion scores | 72.10

To further validate the effectiveness of the top emotion features as learned from our experiments, we run a further experiment incorporating only the top three emotion features (i.e., Plutchik emotions, User emotions across categories, and User emotions across items) on six state-of-the-art recommendation models. As the results in Table 6 show, using only these three emotion features can also improve the recommender systems, with Basic MF showing the most gain.

Table 6: Effect of Top Three Emotion Features (Plutchik emotions, User emotions across categories, and User emotions across items) on State-of-the-art Models
Model | No Emotion | Top Three Emotion
Basic MF | 69.10 | 70.38
Boost Blend | 70.69 | 71.00
FDEN and GBDT | 72.02 | 72.77
Deep MF | 72.93 | 73.01
Truncated SVD-based | 73.12 | 73.60
DNN Ensemble | 73.62 | 73.98

6 CONCLUSIONS
Motivated by the recent developments in emotion detection methods (in textual information), we considered the problem of leveraging emotion features to improve recommendations. Towards that end, we derived a large number of emotion features that can be attributed to both items and users in the news domain and can provide an emotional context. Then, we devised state-of-the-art non-emotion and emotion-aware recommendation models to investigate whether, how and to what extent emotion features can improve recommendations. To the best of our knowledge, this is the first attempt to systematically and broadly evaluate the utility of a number of emotion features for the recommendation task. Our results indicate that the emotion-aware recommendation models consistently outperform state-of-the-art non-emotion-based recommendation models.
Furthermore, our study provided evidence of the usefulness of the emotion features at large, as well as the feasibility of incorporating them into existing models to improve recommendations.
As a more tangible outcome of the study, we proposed EmoRec, an emotion-aware recommendation model, which demonstrates the best predictive performance in the news recommendation task. EmoRec itself is an ensemble model combining three models (Boost Blend + Deep MF + DNN Ensemble). It significantly outperforms the other state-of-the-art recommendation methods evaluated in our experiments. We also evaluated the proposed emotion features individually. Among the emotion features examined, the Plutchik emotion scores of items (obtained by computing PMI scores between words) and the user emotion profiles (based on the emotion scores of the items that the user accessed) are the most important.
Employing emotional context in recommendations appears to be a promising direction of research. While the scope of our current study is limited to emotions extracted from textual information, there is evidence that emotions can be extracted through other means of communication, such as audio and video, or other cues [38].

7 ACKNOWLEDGEMENTS
This work is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), The Globe and Mail, and the Big Data Research, Analytics, and Information Network (BRAIN) Alliance established by the Ontario Research Fund Research Excellence Program (ORF-RE). We would like to thank The Globe and Mail for providing the dataset used in this research. In particular, we thank Gordon Edall and the Data Science team of The Globe and Mail for their insights and collaboration in our joint project.
REFERENCES
[1] Ameeta Agrawal and Aijun An. 2012. Unsupervised Emotion Detection from Text Using Semantic and Syntactic Relations. In 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Vol. 1. IEEE, Macau, China, 346–353. https://doi.org/10.1109/WI-IAT.2012.170
[2] Ameeta Agrawal, Aijun An, and Manos Papagelis. 2018. Learning Emotion-enriched Word Representations. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, Santa Fe, New Mexico, USA, 950–961. https://www.aclweb.org/anthology/C18-1081
[3] Mostafa Al Masum Shaikh, Helmut Prendinger, and Mitsuru Ishizuka. 2010. Emotion Sensitive News Agent (ESNA): A system for user centric emotion sensing from the news. Web Intelligence and Agent Systems 8, 4 (2010), 377–396.
[4] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In LREC, Vol. 10. 2200–2204.
[5] Bing Bai and Yushun Fan. 2017. Incorporating Field-aware Deep Embedding Networks and Gradient Boosting Decision Trees for Music Recommendation. In The 11th ACM International Conference on Web Search and Data Mining (WSDM). ACM, London, England, 7.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (March 2003), 993–1022.
[7] Mondher Bouazizi and Tomoaki Otsuki. 2016. A Pattern-Based Approach for Sarcasm Detection on Twitter. IEEE Access 4 (2016), 5477–5488. https://doi.org/10.1109/ACCESS.2016.2594194
[8] Li Chen, Guanliang Chen, and Feng Wang. 2015. Recommender Systems Based on User Reviews: The State of the Art. User Modeling and User-Adapted Interaction 25, 2 (June 2015), 99–154. https://doi.org/10.1007/s11257-015-9155-5
[9] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, NY, USA, 785–794. https://doi.org/10.1145/2939672.2939785
[10] Heidar Davoudi, Aijun An, Morteza Zihayat, and Gordon Edall. 2018. Adaptive Paywall Mechanism for Digital News Media. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). ACM, New York, NY, USA, 205–214. https://doi.org/10.1145/3219819.3219892
[11] H. Davoudi, M. Zihayat, and A. An. 2017. Time-Aware Subscription Prediction Model for User Acquisition in Digital News Media. In Proceedings of the 2017 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, Houston, Texas, USA, 135–143. https://doi.org/10.1137/1.9781611974973.16
[12] Anna Veronika Dorogush, Vasily Ershov, and Andrey Gulin. 2018. CatBoost: gradient boosting with categorical features support. (Oct. 2018).
[13] Paul Ekman. 1984. Expression and the nature of emotion. Approaches to emotion 3 (1984), 19–344.
[14] Ji Feng, Yang Yu, and Zhi-Hua Zhou. 2018. Multi-Layered Gradient Boosting Decision Trees. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Curran Associates, Inc., 3551–3561.
[15] Blaž Fortuna, Carolina Fortuna, and Dunja Mladenić. 2010. Real-time news recommender system. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 583–586.
[16] Cliff Goddard. 2014. Interjections and Emotion (with Special Reference to "Surprise" and "Disgust"). Emotion Review 6, 1 (Jan. 2014), 53–63. https://doi.org/10.1177/1754073913491843
[17] Gustavo Gonzalez, Josep Lluis de la Rosa, Miquel Montaner, and Sonia Delfin. 2007. Embedding Emotional Context in Recommender Systems. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop (ICDEW '07). IEEE Computer Society, Washington, DC, USA, 845–852.
[18] Byeong-Jun Han, Seungmin Rho, Sanghoon Jun, and Eenjun Hwang. 2010. Music emotion classification and context-based music recommendation. Multimedia Tools and Applications 47, 3 (2010), 433–460.
[19] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web - WWW '17. ACM Press, Perth, Australia, 173–182. https://doi.org/10.1145/3038912.3052569
[20] Dhruv Khattar, Vaibhav Kumar, Manish Gupta, and Vasudeva Varma. 2018. Neural Content-Collaborative Filtering for News Recommendation. In NewsIR'18 Workshop. NewsIR@ECIR, Grenoble, France, 1395–1399.
[21] Hong Joo Lee and Sung Joo Park. 2007. MONERS: A news recommender for the mobile web. Expert Systems with Applications 32, 1 (2007), 143–150.
[22] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2009. Introduction to Information Retrieval. (2009), 569.
[23] Jan Mizgajski and Mikołaj Morzy. [n.d.]. Affective recommender systems in online news industry: how emotions influence reading choices. User Modeling and User-Adapted Interaction ([n.d.]), 1–35.
[24] Saif M. Mohammad and Felipe Bravo-Marquez. 2017. WASSA-2017 Shared Task on Emotion Intensity. In Proceedings of the EMNLP 2017 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA). Association for Computational Linguistics, Copenhagen, Denmark, 34–49. https://arxiv.org/abs/1708.03700
[25] Saif M. Mohammad and Peter D. Turney. 2013. Crowdsourcing a Word–Emotion Association Lexicon. Computational Intelligence 29, 3 (Aug. 2013), 436–465. https://doi.org/10.1111/j.1467-8640.2012.00460.x
[26] Alejandro Montes-García, Jose María Álvarez-Rodríguez, Jose Emilio Labra-Gayo, and Marcos Martínez-Merino. 2013. Towards a journalist-based news recommendation system: The Wesomender approach. Expert Systems with Applications 40, 17 (2013), 6735–6741.
[27] Christopher E. Moody. 2016. Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec. arXiv:1605.02019 [cs] (May 2016). http://arxiv.org/abs/1605.02019
[28] Yashar Moshfeghi, Benjamin Piwowarski, and Joemon M. Jose. 2011. Handling data sparsity in collaborative filtering using emotion and semantic based features. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 625–634.
[29] Ante Odić, Marko Tkalčič, Jurij F. Tasič, and Andrej Košir. 2013. Predicting and detecting the relevant contextual information in a movie-recommender system. Interacting with Computers 25, 1 (2013), 74–90.
[30] Sylvester Olubolu Orimaye, Saadat M. Alhashmi, and Siew Eu-gene. 2012. Sentiment Analysis Amidst Ambiguities in Youtube Comments on Yoruba Language (Nollywood) Movies. In Proceedings of the 21st International Conference on World Wide Web (WWW '12 Companion). ACM, New York, NY, USA, 583–584. https://doi.org/10.1145/2187980.2188138
[31] Yoshihiko Ozaki, Masaki Yano, and Masaki Onishi. 2017. Effective hyperparameter optimization using Nelder-Mead method in deep learning. IPSJ Transactions on Computer Vision and Applications 9, 1 (Nov. 2017), 20. https://doi.org/10.1186/s41074-017-0030-7
[32] Maja Pantic and Alessandro Vinciarelli. 2009. Implicit human-centered tagging [Social Sciences]. IEEE Signal Processing Magazine 26, 6 (2009), 173–180.
[33] Manos Papagelis and Dimitris Plexousakis. 2005. Qualitative analysis of user-based and item-based prediction algorithms for recommendation agents. Engineering Applications of Artificial Intelligence 18, 7 (2005), 781–789.
[34] Manos Papagelis, Dimitris Plexousakis, and Themistoklis Kutsuras. 2005. Alleviating the sparsity problem of collaborative filtering using trust inferences. In International Conference on Trust Management. Springer, 224–239.
[35] Ali Hakimi Parizi and Mohammad Kazemifard. 2015. Emotional news recommender system. In 2015 Sixth International Conference of Cognitive Science (ICCS). IEEE, 37–41.
[36] Mikhail Rumiantcev. 2017. Music adviser: emotion-driven music recommendation ecosystem. Ph.D. Dissertation. Department of Mathematical Information Technology, Oleksiy Khriyenko. https://jyx.jyu.fi/handle/123456789/53196
[37] Nima Shahbazi, Mohamed Chahhou, and Jarek Gryz. 2017. Truncated SVD-based Feature Engineering for Music Recommendation. In The 11th ACM International Conference on Web Search and Data Mining (WSDM). ACM, London, England, 7.
[38] Mohammad Soleymani, Sadjad Asghari-Esfeden, Yun Fu, and Maja Pantic. 2016. Analysis of EEG signals and facial expressions for continuous emotion detection. IEEE Transactions on Affective Computing 7, 1 (2016), 17–28.
[39] Carlo Strapparava, Alessandro Valitutti, and others. 2004. WordNet Affect: an Affective Extension of WordNet. In LREC, Vol. 4. European Language Resources Association (ELRA), Lisbon, Portugal, 1083–1086.
[40] Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk. 2008. Investigation of various matrix factorization methods for large recommender systems. In 2008 IEEE International Conference on Data Mining Workshops. IEEE, 553–562.
[41] Marko Tkalčič, Urban Burnik, Ante Odić, Andrej Košir, and Jurij Tasič. 2012. Emotion-aware recommender systems – a framework and a case study. In International Conference on ICT Innovations. Springer, 141–150.
[42] Marko Tkalcic, Andrej Kosir, Jurij Tasivc, and Matevž Kunaver. 2011. Affective recommender systems: the role of emotions in recommender systems. 9–13.
[43] Oren Tsur, Dmitry Davidov, and Ari Rappoport. 2010. ICWSM - A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews. In Fourth International AAAI Conference on Weblogs and Social Media. Association for Computational Linguistics, Washington, DC, 107–116.
[44] Blanca Vargas-Govea, Gabriel González-Serna, and Rafael Ponce-Medellín. 2011. Effects of relevant contextual features in the performance of a restaurant recommender system. ACM RecSys 11, 592 (2011), 56.
[45] Karzan Wakil, Rebwar Bakhtyar, Karwan Ali, and Kozhin Alaadin. 2015. Improving Web Movie Recommender System Based on Emotions. International Journal of Advanced Computer Science and Applications 6, 2 (2015), 9. https://doi.org/10.14569/IJACSA.2015.060232
[46] H. G. Wallbott and K. R. Scherer. 1986. How universal and specific is emotional experience? Evidence from 27 countries on five continents. Social Science Information 25, 4 (Dec. 1986), 763–795. https://doi.org/10.1177/053901886025004001
[47] Hong-Jian Xue, Xinyu Dai, Jianbing Zhang, Shujian Huang, and Jiajun Chen. 2017. Deep Matrix Factorization Models for Recommender Systems. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, Melbourne, Australia, 3203–3209. https://doi.org/10.24963/ijcai.2017/447
[48] Qian Zhao, Yue Shi, and Liangjie Hong. 2017. GB-CENT: Gradient Boosted Categorical Embedding and Numerical Trees. In Proceedings of the 26th International Conference on World Wide Web (WWW '17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland.
[49] Yong Zheng, Robin Burke, and Bamshad Mobasher. 2013. The Role of Emotions in Context-aware Recommendation. In RecSys Workshop in Conjunction with the 7th ACM Conference on Recommender Systems. Hong Kong, China, 8.
[50] Morteza Zihayat, Anteneh Ayanso, Xing Zhao, Heidar Davoudi, and Aijun An. 2019. A utility-based news recommendation system. Decision Support Systems 117 (2019), 14–27.