Incorporating Context Correlation into Context-aware Matrix Factorization
Yong Zheng, Bamshad Mobasher, Robin Burke
Center for Web Intelligence, DePaul University
Chicago, Illinois, USA
{yzheng8, mobasher, rburke}@cs.depaul.edu
Abstract

Context-aware recommender systems (CARS) go beyond traditional recommender systems, which only consider users' profiles, by also adapting their recommendations to users' contextual situations. Several contextual recommendation algorithms have been developed by incorporating context into recommendation algorithms in different ways. The most effective approaches try to model deviations in ratings among contexts, but ignore the correlations that may exist among these contexts. In this paper, we highlight the importance of contextual correlations and propose a correlation-based context-aware matrix factorization algorithm. Through detailed experimental evaluation we demonstrate that adopting contextual correlations leads to improved performance.
1 Introduction

Recommender systems (RS) are an effective way to alleviate information overload by tailoring recommendations to users' personal preferences. Context-aware recommender systems (CARS) emerged to go beyond user preferences and also take into account the contextual situation of users in generating recommendations.

The standard formulation of the recommendation problem begins with a two-dimensional (2D) matrix of ratings, organized by user and item: Users × Items → Ratings. The key insight of context-aware recommender systems is that users' preferences for items may be a function of the context in which those items are encountered. Incorporating contexts requires that we estimate user preferences using a multidimensional rating function – R: Users × Items × Contexts → Ratings [Adomavicius et al., 2011].

In the past decade, a number of context-aware recommendation algorithms have been developed that attempt to integrate context with recommendation algorithms. Some of the most effective methods, such as context-aware matrix factorization (CAMF) [Baltrunas et al., 2011c] and contextual sparse linear methods (CSLIM) [Zheng et al., 2014b; 2014c], incorporate a contextual rating deviation component into the recommendation algorithms. However, these methods generally ignore contextual correlations that may have a bearing on how rating behavior in different contexts is modeled. In this paper, we highlight the importance of contextual correlation and propose a correlation-based context-aware matrix factorization algorithm.

2 Related Work

In context-aware recommender systems, context is usually defined as "any information that can be used to characterize the situation of an entity" [Abowd et al., 1999]; e.g., time and companion may be two influential contexts in the movie domain. According to the two-part classification of contextual information by Adomavicius et al. [Adomavicius et al., 2011], we are concerned with static and fully observable contexts, where we already have a set of known contextual variables at hand which remain stable over time, and we try to model users' contextual preferences to provide recommendations.

Several context-aware recommendation algorithms have been developed in the past decade. Typically, context can be taken into account using three basic strategies: pre-filtering, post-filtering and contextual modeling [Adomavicius et al., 2011]. Pre-filtering techniques, such as context-aware splitting approaches [Baltrunas and Ricci, 2014; Zheng et al., 2014a], simply apply contexts as filters beforehand to filter out irrelevant rating profiles. Post-filtering, on the other hand, applies the context as a filter to the recommended items after the recommendation process. Alternatively, contexts can be used as filters within the recommendation process itself, as in differential context modeling [Zheng et al., 2012; 2013]. In contextual modeling, predictive models are learned using the full contextual data, and context information is used directly in the recommendation process. Most of the recent work, such as CAMF [Baltrunas et al., 2011c], tensor factorization (TF) [Karatzoglou et al., 2010] and CSLIM [Zheng et al., 2014b; 2014c], belongs to contextual modeling.

The most effective context-aware recommendation algorithms, such as CAMF and CSLIM, usually incorporate a contextual rating deviation term which is used to estimate users' rating deviations associated with specific contexts. Alternatively, contextual correlation could be another way to incorporate contextual information. This idea was introduced in the context of the sparse linear method in our previous work [Zheng et al., 2015], but it has not been explored as part of recommendation models based on matrix factorization.
3 Preliminary: Matrix Factorization and Context-aware Matrix Factorization

In this section, we introduce matrix factorization as used in recommender systems, as well as the existing research on context-aware matrix factorization, which utilizes contextual rating deviations.
3.1 Matrix Factorization

Matrix factorization (MF) [Koren et al., 2009] is one of the most effective recommendation algorithms in traditional recommender systems. Simply put, both users and items are represented by vectors: p_u is used to denote a user vector, and q_i an item vector. The values in those vectors indicate the weights on K (e.g., K = 5) latent factors. As a result, the rating prediction can be described by Equation 1:

    r̂_ui = p_u · q_i    (1)

More specifically, the weights in p_u can be viewed as how much the user likes each latent factor, and the weights in q_i represent how strongly the item exhibits each latent factor. Therefore, the dot product of those two vectors indicates how much the user likes the item, where users' preferences on items are captured by the latent factors.

In addition, user and item rating biases can be added to the prediction function, as shown in Equation 2, where µ denotes the global average rating in the data set, and b_u and b_i represent the user bias and item bias respectively:

    r̂_ui = µ + b_u + b_i + p_u · q_i    (2)
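As a concrete illustration, the biased prediction in Equation 2 can be sketched in a few lines of NumPy; the factor dimension K = 5 and all numeric values below are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def predict(mu, b_u, b_i, p_u, q_i):
    """Biased matrix factorization prediction (Equation 2):
    global mean + user bias + item bias + dot product of latent vectors."""
    return mu + b_u + b_i + np.dot(p_u, q_i)

# Toy example with K = 5 latent factors (illustrative values only).
K = 5
rng = np.random.default_rng(0)
p_u = rng.normal(scale=0.1, size=K)  # user latent vector
q_i = rng.normal(scale=0.1, size=K)  # item latent vector

r_hat = predict(mu=3.5, b_u=0.2, b_i=-0.1, p_u=p_u, q_i=q_i)
```

In practice the latent vectors and biases would be learned from the rating data rather than sampled; the sketch only shows the shape of the prediction function.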
3.2 Context-aware Matrix Factorization

Consider the movie recommendation example in Table 1. There is one user U1, one item T1, and three contextual dimensions – Time (weekend or weekday), Location (at home or cinema) and Companion (alone, girlfriend, family). In the following discussion, we use contextual dimension to denote the contextual variable, e.g., "Location". The term contextual condition refers to a specific value in a dimension, e.g., "home" and "cinema" are two contextual conditions for "Location". A context or contextual situation is, therefore, a set of contextual conditions, e.g., {weekend, home, family}.

Table 1: Contextual Ratings on Movies

User  Item  Rating  Time     Location  Companion
U1    T1    3       weekend  home      alone
U1    T1    5       weekend  cinema    girlfriend
U1    T1    ?       weekday  home      family
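The terminology of dimensions, conditions, and situations maps naturally onto a small data structure. The sketch below (the variable and function names are our own illustration, not from the paper) represents a contextual situation as a mapping from dimensions to conditions, using the values from Table 1.

```python
# A contextual situation maps each contextual dimension to one condition.
# Dimension and condition names follow Table 1.
situation_1 = {"Time": "weekend", "Location": "home", "Companion": "alone"}
situation_2 = {"Time": "weekend", "Location": "cinema", "Companion": "girlfriend"}

def shared_dimensions(ck, cm):
    """Return the dimensions on which two contextual situations agree."""
    return {d for d in ck if ck[d] == cm.get(d)}
```

For example, the first two rated situations in Table 1 agree only on the Time dimension.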
More specifically, let c_k and c_m denote two different contextual situations, each of which is composed of a set of contextual conditions. We use c_k,l to denote the lth contextual condition in the context c_k. For example, assume c_k = {weekend, home, alone}; then c_k,2 is the contextual condition "home". In the following discussion, we continue to use those terms and symbols to describe the corresponding contextual dimensions and conditions in the algorithms and equations.

The CAMF algorithm was proposed by Baltrunas et al. [Baltrunas et al., 2011c]. The CAMF rating prediction function is shown in Equation 3:

    r̂_{ui,c_k,1 c_k,2 ... c_k,L} = µ + b_u + Σ_{j=1..L} B_{ij,c_k,j} + p_u · q_i    (3)

Assume there are L contextual dimensions in total; then c_k = {c_k,1, c_k,2, ..., c_k,L} describes a contextual situation, where c_k,j denotes the contextual condition in the jth contextual dimension. Therefore, B_{ij,c_k,j} indicates the contextual rating deviation associated with item i and the contextual condition in the jth dimension.

A comparison between Equation 2 and Equation 3 reveals that CAMF simply replaces the item bias b_i by a contextual rating deviation term Σ_{j=1..L} B_{ij,c_k,j}, where it assumes that the contextual rating deviation is dependent on items. Therefore, this approach is named CAMF_CI. Alternatively, the deviation can also be viewed as being dependent on users, which replaces b_u by the contextual rating deviation term, resulting in the CAMF_CU variant. In addition, the CAMF_C algorithm assumes that the contextual rating deviation is independent of both users and items.

The parameters, such as the user and item vectors, user biases and rating deviations, can be learned by the stochastic gradient descent (SGD) method to minimize the rating prediction errors. In early work [Baltrunas et al., 2011c], CAMF was demonstrated to outperform other contextual recommendation algorithms, such as tensor factorization [Karatzoglou et al., 2010].
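A minimal sketch of the CAMF_CI prediction in Equation 3 follows; the shapes, the indexing of the deviation term B, and all numeric values are illustrative assumptions.

```python
import numpy as np

def predict_camf_ci(mu, b_u, p_u, q_i, B_i, ck):
    """CAMF_CI prediction (Equation 3): the item bias is replaced by a
    sum of per-dimension contextual rating deviations for item i.

    B_i : list of dicts, one per contextual dimension; B_i[j][c] is the
          deviation of item i under condition c of dimension j.
    ck  : the contextual situation, one condition per dimension.
    """
    deviation = sum(B_i[j][c] for j, c in enumerate(ck))
    return mu + b_u + deviation + np.dot(p_u, q_i)

# Toy example: two dimensions (Time, Location), illustrative values.
B_i = [{"weekend": 0.3, "weekday": -0.1},   # deviations for dimension Time
       {"home": -0.2, "cinema": 0.4}]       # deviations for dimension Location
r_hat = predict_camf_ci(mu=3.5, b_u=0.2, p_u=np.zeros(5), q_i=np.zeros(5),
                        B_i=B_i, ck=["weekend", "cinema"])
```

CAMF_CU and CAMF_C would differ only in how the deviation term is indexed: per user for CAMF_CU, and shared across users and items for CAMF_C.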
4 Correlation-Based CAMF

Introducing the contextual rating deviation term is an effective way to build context-aware recommendation algorithms. Our earlier work [Zheng et al., 2014b; 2014c] successfully incorporated the contextual rating deviations into the sparse linear method (SLIM) to develop contextual SLIM (CSLIM), which was demonstrated to outperform the state-of-the-art contextual recommendation algorithms, including CAMF and tensor factorization.

As mentioned before, contextual correlation is an alternative way to build context-aware recommendation algorithms, other than modeling the contextual deviations. Our recent work [Zheng et al., 2015] made the first attempt to introduce the contextual correlation (or context similarity) into SLIM. In this paper, we further explore ways to develop correlation-based CAMF.

The underlying assumption behind the notion of "contextual correlation (or context similarity)" is that the more similar or correlated two contexts are, the more similar the two recommendation lists for the same user within those two contextual situations should be. When integrated into matrix factorization, the prediction function can be described by Equation 4:

    r̂_{ui,c_k} = p_u · q_i · Corr(c_k, c_E)    (4)

where c_E denotes the empty contextual situation – the value in each contextual dimension is empty or "N/A"; that is, c_E,1 = c_E,2 = ... = c_E,L = N/A. Therefore, the function Corr(c_k, c_E) estimates the correlation between c_E and a contextual situation c_k in which at least one contextual condition is not empty. Note that in Equation 3, the contextual rating deviation can be viewed as the deviation from the empty contextual situation to a non-empty contextual situation. Accordingly, the user and item vectors, as well as the contextual correlations, can be learned using stochastic gradient descent.

Table 2: Example of a Correlation Matrix

                  Time=Weekend  Time=Weekday  Location=Home  Location=Cinema
Time=Weekend      1             0.54          N/A            N/A
Time=Weekday      0.54          1             N/A            N/A
Location=Home     N/A           N/A           1              0.82
Location=Cinema   N/A           N/A           0.82           1
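A sketch of the correlation-based prediction in Equation 4; the numeric values are illustrative, and the correlation Corr(c_k, c_E) is assumed to have been learned or precomputed elsewhere.

```python
import numpy as np

def predict_corr_camf(p_u, q_i, corr):
    """Correlation-based CAMF (Equation 4): the two-dimensional MF score
    is scaled by Corr(ck, cE), the correlation between the target
    contextual situation ck and the empty situation cE."""
    return np.dot(p_u, q_i) * corr

# Toy example (illustrative values).
r_hat = predict_corr_camf(np.array([0.5, 1.0]), np.array([2.0, 1.0]), corr=0.8)
```

In contrast with Equation 3, context here rescales the user-item score multiplicatively rather than shifting it additively.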
4.1 Independent Context Similarity (ICS)

An example of a correlation matrix can be seen in Table 2. With Independent Context Similarity, we only measure the context correlation or similarity between two contextual conditions when they lie in the same contextual dimension; e.g., we never measure the correlation between "Time = Weekend" and "Location = Home", since they are from two different dimensions. Each pair of contextual dimensions is assumed to be independent. In this case, the correlation between two contexts can be represented by the product of the correlations across the dimensions. For example, assume c_k is {Time = Weekend, Location = Home} and c_m is {Time = Weekday, Location = Cinema}; then the correlation between c_k and c_m can be represented by the correlation of
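Under the independence assumption above, the ICS computation can be sketched as follows; the correlation values are taken from Table 2, while the data structure and function name are our own illustration.

```python
# Per-dimension correlation tables (values from Table 2).
corr = {
    "Time": {("weekend", "weekend"): 1.0, ("weekend", "weekday"): 0.54,
             ("weekday", "weekend"): 0.54, ("weekday", "weekday"): 1.0},
    "Location": {("home", "home"): 1.0, ("home", "cinema"): 0.82,
                 ("cinema", "home"): 0.82, ("cinema", "cinema"): 1.0},
}

def ics_correlation(ck, cm):
    """Independent Context Similarity: the correlation between two
    contextual situations is the product of the per-dimension
    correlations between their conditions."""
    result = 1.0
    for dim in ck:
        result *= corr[dim][(ck[dim], cm[dim])]
    return result

ck = {"Time": "weekend", "Location": "home"}
cm = {"Time": "weekday", "Location": "cinema"}
```

For the example situations above, ICS multiplies the Time correlation (0.54) by the Location correlation (0.82).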