Incorporating Context Correlation into Context-aware Matrix Factorization

Yong Zheng, Bamshad Mobasher, Robin Burke
Center for Web Intelligence, DePaul University
Chicago, Illinois, USA
{yzheng8, mobasher, rburke}@cs.depaul.edu

Abstract

Context-aware recommender systems (CARS) go beyond traditional recommender systems, which consider only users' profiles, by also adapting their recommendations to users' contextual situations. Several contextual recommendation algorithms have been developed by incorporating context into recommendation algorithms in different ways. The most effective approaches try to model deviations in ratings among contexts, but ignore the correlations that may exist among these contexts. In this paper, we highlight the importance of contextual correlations and propose a correlation-based context-aware matrix factorization algorithm. Through detailed experimental evaluation, we demonstrate that adopting contextual correlations leads to improved performance.

1 Introduction

Recommender systems (RS) are an effective way to alleviate information overload by tailoring recommendations to users' personal preferences. Context-aware recommender systems (CARS) emerged to go beyond user preferences and also take into account the contextual situation of users in generating recommendations.

The standard formulation of the recommendation problem begins with a two-dimensional (2D) matrix of ratings, organized by user and item: Users × Items → Ratings. The key insight of context-aware recommender systems is that users' preferences for items may be a function of the context in which those items are encountered. Incorporating contexts requires that we estimate user preferences using a multidimensional rating function, R: Users × Items × Contexts → Ratings [Adomavicius et al., 2011].

In the past decade, a number of context-aware recommendation algorithms have been developed that attempt to integrate context with recommendation algorithms. Some of the most effective methods, such as context-aware matrix factorization (CAMF) [Baltrunas et al., 2011c] and contextual sparse linear methods (CSLIM) [Zheng et al., 2014b; 2014c], incorporate a contextual rating deviation component into the recommendation algorithms. However, these methods generally ignore contextual correlations that may have a bearing on how the rating behavior in different contexts is modeled. In this paper, we highlight the importance of contextual correlation, and propose a correlation-based context-aware matrix factorization algorithm.

2 Related Work

In context-aware recommender systems, context is usually defined as "any information that can be used to characterize the situation of an entity" [Abowd et al., 1999]; e.g., time and companion may be two influential contexts in the movie domain. According to the two-part classification of contextual information by Adomavicius et al. [Adomavicius et al., 2011], we are concerned with static and fully observable contexts, where we already have a set of known contextual variables at hand which remain stable over time, and we try to model users' contextual preferences to provide recommendations.

Several context-aware recommendation algorithms have been developed in the past decade. Typically, context can be taken into account using three basic strategies: pre-filtering, post-filtering and contextual modeling [Adomavicius et al., 2011]. Pre-filtering techniques, such as context-aware splitting approaches [Baltrunas and Ricci, 2014; Zheng et al., 2014a], simply apply contexts as filters beforehand to filter out irrelevant rating profiles. Post-filtering, on the other hand, applies the contexts as filters to the recommended items after the recommendation process. Alternatively, contexts can be used as filters within the recommendation process itself, as in differential context modeling [Zheng et al., 2012; 2013]. In contextual modeling, predictive models are learned using the full contextual data, and context information is used directly in the recommendation process. Most of the recent work, such as CAMF [Baltrunas et al., 2011c], tensor factorization (TF) [Karatzoglou et al., 2010] and CSLIM [Zheng et al., 2014b; 2014c], belongs to contextual modeling.

The most effective context-aware recommendation algorithms, such as CAMF and CSLIM, usually incorporate a contextual rating deviation term which is used to estimate users' rating deviations associated with specific contexts. Alternatively, contextual correlation could be another way to incorporate contextual information. This idea has been introduced for the sparse linear method in our previous work [Zheng et al., 2015], but it has not been explored as part of recommendation models based on matrix factorization.

3 Preliminary: Matrix Factorization and Context-aware Matrix Factorization

In this section, we introduce matrix factorization as used in recommender systems, as well as the existing research on context-aware matrix factorization, which utilizes contextual rating deviations.

3.1 Matrix Factorization

Matrix factorization (MF) [Koren et al., 2009] is one of the most effective recommendation algorithms in traditional recommender systems. Simply put, both users and items are represented by vectors: p_u denotes a user vector, and q_i an item vector. The values in those vectors indicate the weights on K (e.g., K = 5) latent factors. As a result, the rating prediction can be described by Equation 1:

    r̂_{ui} = p_u · q_i                                                (1)

More specifically, the weights in p_u can be viewed as how much the user likes those latent factors, and the weights in q_i represent how strongly the item exhibits those latent factors. Therefore, the dot product of the two vectors indicates how much the user likes the item, where users' preferences on items are captured by the latent factors.

In addition, user and item rating biases can be added to the prediction function, as shown in Equation 2, where µ denotes the global average rating in the data set, and b_u and b_i represent the user bias and item bias, respectively:

    r̂_{ui} = µ + b_u + b_i + p_u · q_i                                (2)

3.2 Context-aware Matrix Factorization

Consider the movie recommendation example in Table 1. There is one user U1, one item T1, and three contextual dimensions: Time (weekend or weekday), Location (at home or cinema) and Companion (alone, girlfriend, family).

Table 1: Contextual Ratings on Movies

    User  Item  Rating  Time     Location  Companion
    U1    T1    3       weekend  home      alone
    U1    T1    5       weekend  cinema    girlfriend
    U1    T1    ?       weekday  home      family

In the following discussion, we use contextual dimension to denote a contextual variable, e.g. "Location". The term contextual condition refers to a specific value in a dimension; e.g., "home" and "cinema" are two contextual conditions for "Location". A context or contextual situation is, therefore, a set of contextual conditions, e.g. {weekend, home, family}.

More specifically, let c_k and c_m denote two different contextual situations, each of which is composed of a set of contextual conditions. We use c_{k,l} to denote the l-th contextual condition in the context c_k. For example, if c_k = {weekend, home, alone}, then c_{k,2} is the contextual condition "home". In the following discussion, we continue to use these terms and symbols to describe the corresponding contextual dimensions and conditions in the algorithms and equations.

The CAMF algorithm was proposed by Baltrunas et al. [Baltrunas et al., 2011c]. The CAMF rating prediction function is shown in Equation 3:

    r̂_{ui,c_{k,1} c_{k,2} ... c_{k,L}} = µ + b_u + Σ_{j=1}^{L} B_{ij c_{k,j}} + p_u · q_i    (3)

Assume there are L contextual dimensions in total; then c_k = {c_{k,1}, c_{k,2}, ..., c_{k,L}} describes a contextual situation, where c_{k,j} denotes the contextual condition in the j-th contextual dimension. Therefore, B_{ij c_{k,j}} indicates the contextual rating deviation associated with item i and the contextual condition in the j-th dimension.

A comparison between Equation 2 and Equation 3 reveals that CAMF simply replaces the item bias b_i by a contextual rating deviation term Σ_{j=1}^{L} B_{ij c_{k,j}}, under the assumption that the contextual rating deviation depends on items. This approach is therefore named CAMF_CI. Alternatively, the deviation can be viewed as depending on users, in which case b_u is replaced by the contextual rating deviation term, resulting in the CAMF_CU variant. In addition, the CAMF_C algorithm assumes that the contextual rating deviation is independent of both users and items.

The parameters, such as the user and item vectors, user biases and rating deviations, can be learned by stochastic gradient descent (SGD) to minimize the rating prediction errors. In early work [Baltrunas et al., 2011c], CAMF was demonstrated to outperform other contextual recommendation algorithms, such as tensor factorization [Karatzoglou et al., 2010].

4 Correlation-Based CAMF

Introducing a contextual rating deviation term is an effective way to build context-aware recommendation algorithms. Our earlier work [Zheng et al., 2014b; 2014c] successfully incorporated contextual rating deviations into the sparse linear method (SLIM) to develop contextual SLIM (CSLIM), which was demonstrated to outperform state-of-the-art contextual recommendation algorithms, including CAMF and tensor factorization.

As mentioned before, contextual correlation is an alternative way to build context-aware recommendation algorithms, other than modeling contextual deviations. Our recent work [Zheng et al., 2015] made the first attempt to introduce contextual correlation (or context similarity) into SLIM. In this paper, we further explore ways to develop correlation-based CAMF.

The underlying assumption behind the notion of "contextual correlation (or context similarity)" is that the more similar or correlated two contexts are, the more similar the two recommendation lists for the same user within those two contextual situations should be. When integrated into matrix factorization, the prediction function can be described by Equation 4:

    r̂_{ui,c_k} = p_u · q_i · Corr(c_k, c_E)                           (4)

where c_E denotes the empty contextual situation, in which the value in each contextual dimension is empty or "N/A"; that is, c_{E,1} = c_{E,2} = ... = c_{E,L} = N/A. The function Corr(c_k, c_E) therefore estimates the correlation between c_E and a contextual situation c_k in which at least one contextual condition is not empty. Note that in Equation 3, the contextual rating deviation can likewise be viewed as the deviation from the empty contextual situation to a non-empty one. Accordingly, the user and item vectors, as well as the contextual correlations, can be learned using stochastic gradient descent.

Table 2: Example of a Correlation Matrix

                     Time=Weekend  Time=Weekday  Location=Home  Location=Cinema
    Time=Weekend         1             0.54          N/A            N/A
    Time=Weekday         0.54          1             N/A            N/A
    Location=Home        N/A           N/A           1              0.82
    Location=Cinema      N/A           N/A           0.82           1

4.1 Independent Context Similarity (ICS)

An example of a correlation matrix can be seen in Table 2. With Independent Context Similarity, we only measure the correlation or similarity between two contextual conditions when they lie in the same contextual dimension; e.g., we never measure the correlation between "Time = Weekend" and "Location = Home", since they are from two different dimensions. Each pair of contextual dimensions is assumed to be independent. In this case, the correlation between two contexts can be represented by the product of the correlations across the dimensions. For example, assume c_k is {Time = Weekend, Location = Home}, and c_m is {Time = Weekday, Location = Cinema}; the correlation between c_k and c_m can be represented by the correlation of
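To make the preliminaries concrete, the biased MF predictor of Equation 2 and a single SGD update can be sketched in plain Python. This is a minimal illustration, not the paper's implementation; the learning rate, regularization weight, and toy numbers are assumptions chosen for readability.

```python
# Sketch of biased matrix factorization (Equation 2) with one SGD step.
# Hyperparameters (lr, reg) and all numeric values are illustrative.

def predict(mu, b_u, b_i, p_u, q_i):
    """Equation 2: r_hat = mu + b_u + b_i + p_u . q_i"""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))

def sgd_step(r, mu, b_u, b_i, p_u, q_i, lr=0.01, reg=0.02):
    """One stochastic gradient descent update on a single observed rating r,
    minimizing the squared prediction error with L2 regularization."""
    err = r - predict(mu, b_u, b_i, p_u, q_i)
    new_b_u = b_u + lr * (err - reg * b_u)
    new_b_i = b_i + lr * (err - reg * b_i)
    new_p = [p + lr * (err * q - reg * p) for p, q in zip(p_u, q_i)]
    new_q = [q + lr * (err * p - reg * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p, new_q
```

With a small learning rate, each step moves the prediction toward the observed rating, which is how the user/item vectors and biases in Equations 2 and 3 are fitted in practice.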
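The CAMF_CI predictor of Equation 3 replaces the item bias with a sum of per-item, per-condition deviations B_{ij c_{k,j}}. A sketch under stated assumptions: the deviations for one item are stored in a dictionary keyed by contextual condition, and all numbers are invented for illustration.

```python
# Sketch of the CAMF_CI prediction function (Equation 3): the item bias b_i of
# Equation 2 is replaced by a sum of contextual rating deviations, one term per
# contextual dimension. B_item and the toy values below are illustrative.

def camf_ci_predict(mu, b_u, p_u, q_i, B_item, context):
    """context is the list of conditions (c_k1, ..., c_kL) of the situation;
    B_item maps each condition to this item's learned deviation B_ij."""
    deviation = sum(B_item.get(cond, 0.0) for cond in context)
    dot = sum(p * q for p, q in zip(p_u, q_i))
    return mu + b_u + deviation + dot

# Hypothetical learned deviations for item T1 of Table 1.
B_T1 = {"weekend": 0.3, "weekday": -0.2, "home": -0.4,
        "cinema": 0.5, "alone": -0.1, "family": 0.2}

# Predict the unknown rating of Table 1: U1 watching T1 on a weekday,
# at home, with family.
r_hat = camf_ci_predict(mu=3.5, b_u=0.2, p_u=[1.0, 0.0], q_i=[0.5, 2.0],
                        B_item=B_T1, context=["weekday", "home", "family"])
```

Swapping which bias is replaced yields the CAMF_CU variant (deviations keyed by user instead of item), and sharing one deviation per condition across all users and items yields CAMF_C.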
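Equation 4 and the Independent Context Similarity scheme can be sketched together. This is an assumption-laden illustration: the dictionary layout is invented, the per-dimension entries reuse the example values of Table 2, and how the empty context c_E enters the correlation model is left open here, so the sketch exposes a generic pairwise context correlation that Corr(c_k, c_E) would be one instance of.

```python
# Sketch of Independent Context Similarity (ICS) feeding the Equation 4
# predictor. Under ICS, correlations exist only between conditions of the
# same dimension, and the correlation of two contexts is the product of the
# per-dimension entries. Values taken from the Table 2 example.

CORR = {
    "Time": {("Weekend", "Weekend"): 1.0, ("Weekday", "Weekday"): 1.0,
             ("Weekend", "Weekday"): 0.54, ("Weekday", "Weekend"): 0.54},
    "Location": {("Home", "Home"): 1.0, ("Cinema", "Cinema"): 1.0,
                 ("Home", "Cinema"): 0.82, ("Cinema", "Home"): 0.82},
}

def ics_corr(c_k, c_m):
    """ICS correlation of two contexts (dimension name -> condition):
    the product of the per-dimension correlations."""
    corr = 1.0
    for dim in c_k:
        corr *= CORR[dim][(c_k[dim], c_m[dim])]
    return corr

def corr_camf_predict(p_u, q_i, corr):
    """Equation 4: the factor dot product scaled by a context correlation."""
    return sum(p * q for p, q in zip(p_u, q_i)) * corr

c_k = {"Time": "Weekend", "Location": "Home"}
c_m = {"Time": "Weekday", "Location": "Cinema"}
```

Because the correlation multiplies the whole dot product, two highly correlated contexts produce nearly identical predictions for the same user and item, which is exactly the assumption stated above for correlation-based CAMF.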