=Paper= {{Paper |id=Vol-1440/paper5 |storemode=property |title=Incorporating Context Correlation into Context-aware Matrix Factorization |pdfUrl=https://ceur-ws.org/Vol-1440/Paper5.pdf |volume=Vol-1440 |dblpUrl=https://dblp.org/rec/conf/ijcai/ZhengMB15 }} ==Incorporating Context Correlation into Context-aware Matrix Factorization == https://ceur-ws.org/Vol-1440/Paper5.pdf
    Incorporating Context Correlation into Context-aware Matrix Factorization

                                Yong Zheng, Bamshad Mobasher, Robin Burke
                                 Center for Web Intelligence, DePaul University
                                             Chicago, Illinois, USA
                                  {yzheng8, mobasher, rburke}@cs.depaul.edu


Abstract

Context-aware recommender systems (CARS) go beyond traditional
recommender systems, which consider only users' profiles, by also
adapting their recommendations to users' contextual situations.
Several contextual recommendation algorithms have been developed by
incorporating context into recommendation algorithms in different ways.
The most effective approaches try to model deviations in ratings among
contexts, but ignore the correlations that may exist among these
contexts. In this paper, we highlight the importance of contextual
correlations and propose a correlation-based context-aware matrix
factorization algorithm. Through detailed experimental evaluation we
demonstrate that adopting contextual correlations leads to improved
performance.

1   Introduction

Recommender systems (RS) are an effective way to alleviate information
overload by tailoring recommendations to users' personal preferences.
Context-aware recommender systems (CARS) emerged to go beyond user
preferences and also take into account the contextual situation of
users in generating recommendations.

The standard formulation of the recommendation problem begins with a
two-dimensional (2D) matrix of ratings, organized by user and item:
Users × Items → Ratings. The key insight of context-aware recommender
systems is that users' preferences for items may be a function of the
context in which those items are encountered. Incorporating contexts
requires that we estimate user preferences using a multidimensional
rating function R: Users × Items × Contexts → Ratings [Adomavicius et
al., 2011].

In the past decade, a number of context-aware recommendation algorithms
have been developed that attempt to integrate context with
recommendation algorithms. Some of the most effective methods, such as
context-aware matrix factorization (CAMF) [Baltrunas et al., 2011c] and
contextual sparse linear methods (CSLIM) [Zheng et al., 2014b; 2014c],
incorporate a contextual rating deviation component into the
recommendation algorithms. However, these methods generally ignore
contextual correlations that may have bearing on how the rating
behavior in different contexts is modeled. In this paper, we highlight
the importance of contextual correlation, and propose a
correlation-based context-aware matrix factorization algorithm.

2   Related Work

In context-aware recommender systems, context is usually defined as
"any information that can be used to characterize the situation of an
entity" [Abowd et al., 1999]; for example, time and companion may be
two influential contexts in the movie domain.

According to the two-part classification of contextual information by
Adomavicius et al. [Adomavicius et al., 2011], we are concerned with
static and fully observable contexts: we already have a set of known
contextual variables at hand, which remain stable over time, and we try
to model users' contextual preferences to provide recommendations.

Several context-aware recommendation algorithms have been developed in
the past decade. Typically, context can be taken into account using
three basic strategies: pre-filtering, post-filtering and contextual
modeling [Adomavicius et al., 2011]. Pre-filtering techniques, such as
context-aware splitting approaches [Baltrunas and Ricci, 2014; Zheng et
al., 2014a], simply apply contexts as filters beforehand to filter out
irrelevant rating profiles. Post-filtering, on the other hand, applies
the context as a filter to the recommended items after the
recommendation process. Contexts can also be used as filters within the
recommendation process, as in differential context modeling [Zheng et
al., 2012; 2013]. In contextual modeling, predictive models are learned
using the full contextual data, and context information is used in the
recommendation process. Most of the recent work, such as CAMF
[Baltrunas et al., 2011c], tensor factorization (TF) [Karatzoglou et
al., 2010] and CSLIM [Zheng et al., 2014b; 2014c], belongs to
contextual modeling.

The most effective context-aware recommendation algorithms, such as
CAMF and CSLIM, usually incorporate a contextual rating deviation term
which is used to estimate users' rating deviations associated with
specific contexts. Alternatively, contextual correlation could be
another way to incorporate contextual information. This idea was
introduced in the context of the sparse linear method in our previous
work [Zheng et al., 2015], but it has not been
explored as part of recommendation models based on matrix
factorization.

3   Preliminary: Matrix Factorization and
    Context-aware Matrix Factorization

In this section, we introduce matrix factorization as used in
recommender systems, as well as the existing research on context-aware
matrix factorization, which utilizes contextual rating deviations.

3.1   Matrix Factorization

Matrix factorization (MF) [Koren et al., 2009] is one of the most
effective recommendation algorithms in traditional recommender systems.
Simply put, both users and items are represented by vectors. For
example, \vec{p}_u denotes a user vector and \vec{q}_i an item vector.
The values in those vectors indicate the weights on K (e.g., K = 5)
latent factors. As a result, the rating prediction can be described by
Equation 1.

    \hat{r}_{ui} = \vec{p}_u \cdot \vec{q}_i                          (1)

More specifically, the weights in \vec{p}_u can be viewed as how much
the user likes each latent factor, and the weights in \vec{q}_i
represent the extent to which the item exhibits each latent factor.
Therefore, the dot product of those two vectors can be used to indicate
how much the user likes this item, where users' preferences on items
are captured by the latent factors.

In addition, user and item rating biases can be added to the prediction
function, as shown in Equation 2, where \mu denotes the global average
rating in the data set, and b_u and b_i represent the user bias and
item bias respectively.

    \hat{r}_{ui} = \mu + b_u + b_i + \vec{p}_u \cdot \vec{q}_i        (2)

3.2   Context-aware Matrix Factorization

Consider the movie recommendation example in Table 1. There is one user
U1, one item T1, and three contextual dimensions: Time (weekend or
weekday), Location (at home or cinema) and Companion (alone,
girlfriend, family). In the following discussion, we use contextual
dimension to denote the contextual variable, e.g. "Location". The term
contextual condition refers to a specific value in a dimension, e.g.
"home" and "cinema" are two contextual conditions for "Location". A
context or contextual situation is, therefore, a set of contextual
conditions, e.g. {weekend, home, family}.

              Table 1: Contextual Ratings on Movies
    User     Item   Rating    Time      Location   Companion
     U1       T1      3      weekend      home        alone
     U1       T1      5      weekend     cinema     girlfriend
     U1       T1      ?      weekday      home       family

More specifically, let c_k and c_m denote two different contextual
situations, each of which is composed of a set of contextual
conditions. We use c_{k,l} to denote the l-th contextual condition in
the context c_k. For example, assume c_k = {weekend, home, alone}; then
c_{k,2} is the contextual condition "home". In the following
discussion, we continue to use those terms and symbols to describe
corresponding contextual dimensions and conditions in the algorithms or
equations.

The CAMF algorithm was proposed by Baltrunas et al. [Baltrunas et al.,
2011c]. The CAMF rating prediction function is shown in Equation 3.

    \hat{r}_{u i c_{k,1} c_{k,2} \ldots c_{k,L}}
        = \mu + b_u + \sum_{j=1}^{L} B_{i j c_{k,j}}
          + \vec{p}_u \cdot \vec{q}_i                                 (3)

Assume there are L contextual dimensions in total; then
c_k = {c_{k,1}, c_{k,2}, ..., c_{k,L}} describes a contextual
situation, where c_{k,j} denotes the contextual condition in the j-th
context dimension. Therefore, B_{i j c_{k,j}} indicates the contextual
rating deviation associated with item i and the contextual condition in
the j-th dimension.

A comparison between Equation 2 and Equation 3 reveals that CAMF simply
replaces the item bias b_i by a contextual rating deviation term
\sum_{j=1}^{L} B_{i j c_{k,j}}, where it assumes that the contextual
rating deviation is dependent on items. Therefore, this approach is
named CAMF_CI. Alternatively, this deviation can also be viewed as
being dependent on users, which replaces b_u by the contextual rating
deviation term, resulting in the CAMF_CU variant. In addition, the
CAMF_C algorithm assumes that the contextual rating deviation is
independent of both users and items.

The parameters, such as the user and item vectors, user biases and
rating deviations, can be learned by the stochastic gradient descent
(SGD) method to minimize the rating prediction errors. In early work
[Baltrunas et al., 2011c], CAMF was demonstrated to outperform other
contextual recommendation algorithms, such as tensor factorization
[Karatzoglou et al., 2010].

4   Correlation-Based CAMF

Introducing the contextual rating deviation term is an effective way to
build context-aware recommendation algorithms. Our earlier work [Zheng
et al., 2014b; 2014c] successfully incorporated contextual rating
deviations into the sparse linear method (SLIM) to develop contextual
SLIM (CSLIM), which was demonstrated to outperform the state-of-the-art
contextual recommendation algorithms, including CAMF and tensor
factorization.

As mentioned before, contextual correlation is an alternative way to
build context-aware recommendation algorithms, other than modeling the
contextual deviations. Our recent work [Zheng et al., 2015] made the
first attempt to introduce contextual correlation (or context
similarity) into SLIM. In this paper, we further explore ways to
develop correlation-based CAMF.

The underlying assumption behind the notion of "contextual correlation
(or context similarity)" is that the more similar or correlated two
contexts are, the more similar the two recommendation lists for the
same user within those two contextual situations should be. When
integrated into matrix
                                                   Table 2: Example of a Correlation Matrix
                                               Time=Weekend           Time=Weekday    Location=Home      Location=Cinema
                         Time=Weekend                1                    0.54              N/A                N/A
                         Time=Weekday              0.54                    1                N/A                N/A
                         Location=Home             N/A                    N/A                1                 0.82
                        Location=Cinema            N/A                    N/A              0.82                  1
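
To make Table 2 concrete: under the Independent Context Similarity
assumption of Section 4.1, the correlation between two contextual
situations is taken as the product of the per-dimension correlations.
The Python sketch below is only an illustration of that product and is
not the authors' implementation; the dictionary layout and the names
CORRELATION, condition_correlation and ics_correlation are assumptions
introduced here, and the numeric values are the example entries of
Table 2.

```python
# Illustrative sketch only: names and data layout are hypothetical.
# Correlations between conditions of the same dimension (Table 2 values).
CORRELATION = {
    "Time": {frozenset({"Weekend", "Weekday"}): 0.54},
    "Location": {frozenset({"Home", "Cinema"}): 0.82},
}


def condition_correlation(dimension, cond_a, cond_b):
    """Correlation between two conditions of one contextual dimension."""
    if cond_a == cond_b:
        return 1.0  # diagonal entries of the correlation matrix
    return CORRELATION[dimension][frozenset({cond_a, cond_b})]


def ics_correlation(c_k, c_m):
    """Product of per-dimension correlations between two situations.

    Both situations are dicts mapping dimension name -> condition and
    are assumed to cover the same dimensions.
    """
    result = 1.0
    for dimension, cond_k in c_k.items():
        result *= condition_correlation(dimension, cond_k, c_m[dimension])
    return result


c_k = {"Time": "Weekend", "Location": "Home"}
c_m = {"Time": "Weekday", "Location": "Cinema"}
print(ics_correlation(c_k, c_m))
```

For these two situations the product is 0.54 × 0.82, roughly 0.443.
Cross-dimension pairs (the N/A cells of Table 2) are never looked up,
because the loop only compares conditions within the same dimension.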


factorization, the prediction function can be described with                  4.1    Independent Context Similarity (ICS)
Equation 4.                                                                   An example of a correlation matrix can be seen in Table 2.
                                                                              With Independent Context Similarity, we only measure the
    \hat{r}_{u i c_k} = \vec{p}_u \cdot \vec{q}_i \cdot Corr(c_k, c_E)    (4)    context correlation or similarity between two contextual
                                                                              conditions when they lie on the same contextual dimension,
where cE denotes the empty contextual situation – the value                   e.g., we never measure the correlation between “Time =
in each contextual dimension is empty or “NA”; that is, cE,1 =                Weekend” and “Location = Home”, since they are from two
cE,2 = ... = cE,L = N/A. Therefore, the function Corr(ck , cE )               different dimensions. Each pair of contextual dimensions
estimates the correlation between cE and the contextual                       is assumed to be independent. In this case, the correlation
situation ck where at least one contextual condition is not                   between two contexts can be represented by the product of
empty or “NA”. Note that in Equation 3, the contextual rating                 the correlations among different dimensions. For example,
deviation can be viewed as the deviation from the empty                       assume ck is {Time = Weekend, Location = Home}, and cm
contextual situation to a non-empty contextual situation.                     is {Time = Weekday, Location = Cinema}, the correlation
   Accordingly, the user and item vectors, as well as the                     between ck and cm can be represented by the correlation of
contextual correlations, can be learned using stochastic                      between ck and cm can be represented by the correlation of