Incorporating Context Correlation into Context-aware Matrix Factorization

Yong Zheng, Bamshad Mobasher, Robin Burke
Center for Web Intelligence, DePaul University
Chicago, Illinois, USA
{yzheng8, mobasher, rburke}@cs.depaul.edu

Abstract

Context-aware recommender systems (CARS) go beyond traditional recommender systems, which consider only users' profiles, by also adapting their recommendations to users' contextual situations. Several contextual recommendation algorithms have been developed by incorporating context into recommendation algorithms in different ways. The most effective approaches try to model deviations in ratings among contexts, but ignore the correlations that may exist among these contexts. In this paper, we highlight the importance of contextual correlations and propose a correlation-based context-aware matrix factorization algorithm. Through detailed experimental evaluation, we demonstrate that adopting contextual correlations leads to improved performance.

1 Introduction

Recommender systems (RS) are an effective way to alleviate information overload by tailoring recommendations to users' personal preferences. Context-aware recommender systems (CARS) emerged to go beyond user preferences and also take into account the contextual situation of users in generating recommendations.

The standard formulation of the recommendation problem begins with a two-dimensional (2D) matrix of ratings, organized by user and item: Users × Items → Ratings. The key insight of context-aware recommender systems is that users' preferences for items may be a function of the context in which those items are encountered. Incorporating contexts requires that we estimate user preferences using a multidimensional rating function, R: Users × Items × Contexts → Ratings [Adomavicius et al., 2011].

In the past decade, a number of context-aware recommendation algorithms have been developed that attempt to integrate context with recommendation algorithms. Some of the most effective methods, such as context-aware matrix factorization (CAMF) [Baltrunas et al., 2011c] and contextual sparse linear methods (CSLIM) [Zheng et al., 2014b; 2014c], incorporate a contextual rating deviation component into the recommendation algorithms. However, these methods generally ignore contextual correlations that may have a bearing on how the rating behavior in different contexts is modeled. In this paper, we highlight the importance of contextual correlation, and propose a correlation-based context-aware matrix factorization algorithm.

2 Related Work

In context-aware recommender systems, context is usually defined as "any information that can be used to characterize the situation of an entity" [Abowd et al., 1999]; e.g., time and companion may be two influential contexts in the movie domain. According to the two-part classification of contextual information by Adomavicius et al. [Adomavicius et al., 2011], we are concerned with static and fully observable contexts, where we already have a set of known contextual variables at hand which remain stable over time, and we try to model users' contextual preferences to provide recommendations.

Several context-aware recommendation algorithms have been developed in the past decade. Typically, context can be taken into account using three basic strategies: pre-filtering, post-filtering and contextual modeling [Adomavicius et al., 2011]. Pre-filtering techniques, such as context-aware splitting approaches [Baltrunas and Ricci, 2014; Zheng et al., 2014a], simply apply contexts as filters beforehand to filter out irrelevant rating profiles. Post-filtering, on the other hand, applies the contexts as filters to the recommended items after the recommendation process. Alternatively, contexts can be used as filters within the recommendation process itself, as in differential context modeling [Zheng et al., 2012; 2013]. In contextual modeling, predictive models are learned using the full contextual data, and context information is used directly in the recommendation process. Most of the recent work, such as CAMF [Baltrunas et al., 2011c], tensor factorization (TF) [Karatzoglou et al., 2010] and CSLIM [Zheng et al., 2014b; 2014c], belongs to contextual modeling.

The most effective context-aware recommendation algorithms, such as CAMF and CSLIM, usually incorporate a contextual rating deviation term which is used to estimate users' rating deviations associated with specific contexts. Alternatively, contextual correlation could be another way to incorporate contextual information. This idea has been introduced for the sparse linear method in our previous work [Zheng et al., 2015], but it has not been explored as part of recommendation models based on matrix factorization.

3 Preliminary: Matrix Factorization and Context-aware Matrix Factorization

In this section, we introduce matrix factorization as used in recommender systems, as well as the existing research on context-aware matrix factorization, which utilizes contextual rating deviations.

3.1 Matrix Factorization

Matrix factorization (MF) [Koren et al., 2009] is one of the most effective recommendation algorithms in traditional recommender systems. Simply put, both users and items are represented by vectors: p_u denotes a user vector, and q_i an item vector. The values in those vectors indicate the weights on K (e.g., K = 5) latent factors. As a result, the rating prediction can be described by Equation 1:

    r̂_{ui} = p_u · q_i                                                (1)

More specifically, the weights in p_u can be viewed as how much the user likes those latent factors, and the weights in q_i represent how strongly the item exhibits those latent factors. Therefore, the dot product of the two vectors indicates how much the user likes the item, where users' preferences on items are captured by the latent factors.

In addition, user and item rating biases can be added to the prediction function, as shown in Equation 2, where µ denotes the global average rating in the data set, and b_u and b_i represent the user bias and item bias, respectively:

    r̂_{ui} = µ + b_u + b_i + p_u · q_i                                (2)

3.2 Context-aware Matrix Factorization

Consider the movie recommendation example in Table 1. There is one user U1, one item T1, and three contextual dimensions: Time (weekend or weekday), Location (at home or cinema) and Companion (alone, girlfriend, family).

Table 1: Contextual Ratings on Movies

    User  Item  Rating  Time     Location  Companion
    U1    T1    3       weekend  home      alone
    U1    T1    5       weekend  cinema    girlfriend
    U1    T1    ?       weekday  home      family

In the following discussion, we use contextual dimension to denote a contextual variable, e.g. "Location". The term contextual condition refers to a specific value in a dimension; e.g., "home" and "cinema" are two contextual conditions for "Location". A context or contextual situation is, therefore, a set of contextual conditions, e.g. {weekend, home, family}.

More specifically, let c_k and c_m denote two different contextual situations, each of which is composed of a set of contextual conditions. We use c_{k,l} to denote the l-th contextual condition in the context c_k. For example, if c_k = {weekend, home, alone}, then c_{k,2} is the contextual condition "home". In the following discussion, we continue to use these terms and symbols to describe the corresponding contextual dimensions and conditions in the algorithms and equations.

The CAMF algorithm was proposed by Baltrunas et al. [Baltrunas et al., 2011c]. The CAMF rating prediction function is shown in Equation 3:

    r̂_{ui,c_{k,1} c_{k,2} ... c_{k,L}} = µ + b_u + Σ_{j=1}^{L} B_{ij c_{k,j}} + p_u · q_i    (3)

Assume there are L contextual dimensions in total; then c_k = {c_{k,1}, c_{k,2}, ..., c_{k,L}} describes a contextual situation, where c_{k,j} denotes the contextual condition in the j-th contextual dimension. Therefore, B_{ij c_{k,j}} indicates the contextual rating deviation associated with item i and the contextual condition in the j-th dimension.

A comparison between Equation 2 and Equation 3 reveals that CAMF simply replaces the item bias b_i by a contextual rating deviation term Σ_{j=1}^{L} B_{ij c_{k,j}}, under the assumption that the contextual rating deviation depends on items. This approach is therefore named CAMF_CI. Alternatively, the deviation can be viewed as depending on users, in which case b_u is replaced by the contextual rating deviation term, resulting in the CAMF_CU variant. In addition, the CAMF_C algorithm assumes that the contextual rating deviation is independent of both users and items.

The parameters, such as the user and item vectors, user biases and rating deviations, can be learned by stochastic gradient descent (SGD) to minimize the rating prediction errors. In early work [Baltrunas et al., 2011c], CAMF was demonstrated to outperform other contextual recommendation algorithms, such as tensor factorization [Karatzoglou et al., 2010].

4 Correlation-Based CAMF

Introducing a contextual rating deviation term is an effective way to build context-aware recommendation algorithms. Our earlier work [Zheng et al., 2014b; 2014c] successfully incorporated contextual rating deviations into the sparse linear method (SLIM) to develop contextual SLIM (CSLIM), which was demonstrated to outperform state-of-the-art contextual recommendation algorithms, including CAMF and tensor factorization.

As mentioned before, contextual correlation is an alternative way to build context-aware recommendation algorithms, other than modeling contextual deviations. Our recent work [Zheng et al., 2015] made the first attempt to introduce contextual correlation (or context similarity) into SLIM. In this paper, we further explore ways to develop correlation-based CAMF.

The underlying assumption behind the notion of "contextual correlation (or context similarity)" is that the more similar or correlated two contexts are, the more similar the two recommendation lists for the same user within those two contextual situations should be. When integrated into matrix factorization, the prediction function can be described by Equation 4:

    r̂_{ui,c_k} = p_u · q_i · Corr(c_k, c_E)                           (4)

where c_E denotes the empty contextual situation, in which the value in each contextual dimension is empty or "N/A"; that is, c_{E,1} = c_{E,2} = ... = c_{E,L} = N/A. The function Corr(c_k, c_E) therefore estimates the correlation between c_E and a contextual situation c_k in which at least one contextual condition is not empty. Note that in Equation 3, the contextual rating deviation can likewise be viewed as the deviation from the empty contextual situation to a non-empty one. Accordingly, the user and item vectors, as well as the contextual correlations, can be learned using stochastic gradient descent.

Table 2: Example of a Correlation Matrix

                     Time=Weekend  Time=Weekday  Location=Home  Location=Cinema
    Time=Weekend         1             0.54          N/A            N/A
    Time=Weekday         0.54          1             N/A            N/A
    Location=Home        N/A           N/A           1              0.82
    Location=Cinema      N/A           N/A           0.82           1

4.1 Independent Context Similarity (ICS)

An example of a correlation matrix can be seen in Table 2. With Independent Context Similarity, we only measure the correlation or similarity between two contextual conditions when they lie in the same contextual dimension; e.g., we never measure the correlation between "Time = Weekend" and "Location = Home", since they are from two different dimensions. Each pair of contextual dimensions is assumed to be independent. In this case, the correlation between two contexts can be represented by the product of the correlations across the dimensions. For example, assume c_k is {Time = Weekend, Location = Home}, and c_m is {Time = Weekday, Location = Cinema}; the correlation between c_k and c_m can be represented by the correlation of
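To make the preliminaries concrete, the biased MF predictor of Equation 2 and a single SGD update can be sketched in plain Python. This is a minimal illustration, not the paper's implementation; the learning rate, regularization weight, and toy numbers are assumptions chosen for readability.

```python
# Sketch of biased matrix factorization (Equation 2) with one SGD step.
# Hyperparameters (lr, reg) and all numeric values are illustrative.

def predict(mu, b_u, b_i, p_u, q_i):
    """Equation 2: r_hat = mu + b_u + b_i + p_u . q_i"""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))

def sgd_step(r, mu, b_u, b_i, p_u, q_i, lr=0.01, reg=0.02):
    """One stochastic gradient descent update on a single observed rating r,
    minimizing the squared prediction error with L2 regularization."""
    err = r - predict(mu, b_u, b_i, p_u, q_i)
    new_b_u = b_u + lr * (err - reg * b_u)
    new_b_i = b_i + lr * (err - reg * b_i)
    new_p = [p + lr * (err * q - reg * p) for p, q in zip(p_u, q_i)]
    new_q = [q + lr * (err * p - reg * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p, new_q
```

With a small learning rate, each step moves the prediction toward the observed rating, which is how the user/item vectors and biases in Equations 2 and 3 are fitted in practice.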
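The CAMF_CI predictor of Equation 3 replaces the item bias with a sum of per-item, per-condition deviations B_{ij c_{k,j}}. A sketch under stated assumptions: the deviations for one item are stored in a dictionary keyed by contextual condition, and all numbers are invented for illustration.

```python
# Sketch of the CAMF_CI prediction function (Equation 3): the item bias b_i of
# Equation 2 is replaced by a sum of contextual rating deviations, one term per
# contextual dimension. B_item and the toy values below are illustrative.

def camf_ci_predict(mu, b_u, p_u, q_i, B_item, context):
    """context is the list of conditions (c_k1, ..., c_kL) of the situation;
    B_item maps each condition to this item's learned deviation B_ij."""
    deviation = sum(B_item.get(cond, 0.0) for cond in context)
    dot = sum(p * q for p, q in zip(p_u, q_i))
    return mu + b_u + deviation + dot

# Hypothetical learned deviations for item T1 of Table 1.
B_T1 = {"weekend": 0.3, "weekday": -0.2, "home": -0.4,
        "cinema": 0.5, "alone": -0.1, "family": 0.2}

# Predict the unknown rating of Table 1: U1 watching T1 on a weekday,
# at home, with family.
r_hat = camf_ci_predict(mu=3.5, b_u=0.2, p_u=[1.0, 0.0], q_i=[0.5, 2.0],
                        B_item=B_T1, context=["weekday", "home", "family"])
```

Swapping which bias is replaced yields the CAMF_CU variant (deviations keyed by user instead of item), and sharing one deviation per condition across all users and items yields CAMF_C.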
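Equation 4 and the Independent Context Similarity scheme can be sketched together. This is an assumption-laden illustration: the dictionary layout is invented, the per-dimension entries reuse the example values of Table 2, and how the empty context c_E enters the correlation model is left open here, so the sketch exposes a generic pairwise context correlation that Corr(c_k, c_E) would be one instance of.

```python
# Sketch of Independent Context Similarity (ICS) feeding the Equation 4
# predictor. Under ICS, correlations exist only between conditions of the
# same dimension, and the correlation of two contexts is the product of the
# per-dimension entries. Values taken from the Table 2 example.

CORR = {
    "Time": {("Weekend", "Weekend"): 1.0, ("Weekday", "Weekday"): 1.0,
             ("Weekend", "Weekday"): 0.54, ("Weekday", "Weekend"): 0.54},
    "Location": {("Home", "Home"): 1.0, ("Cinema", "Cinema"): 1.0,
                 ("Home", "Cinema"): 0.82, ("Cinema", "Home"): 0.82},
}

def ics_corr(c_k, c_m):
    """ICS correlation of two contexts (dimension name -> condition):
    the product of the per-dimension correlations."""
    corr = 1.0
    for dim in c_k:
        corr *= CORR[dim][(c_k[dim], c_m[dim])]
    return corr

def corr_camf_predict(p_u, q_i, corr):
    """Equation 4: the factor dot product scaled by a context correlation."""
    return sum(p * q for p, q in zip(p_u, q_i)) * corr

c_k = {"Time": "Weekend", "Location": "Home"}
c_m = {"Time": "Weekday", "Location": "Cinema"}
```

Because the correlation multiplies the whole dot product, two highly correlated contexts produce nearly identical predictions for the same user and item, which is exactly the assumption stated above for correlation-based CAMF.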