Towards Improving Top-N Recommendation by Generalization of SLIM

Santiago Larraín, Denis Parra, Alvaro Soto
Computer Science Department, Pontificia Universidad Católica de Chile, Santiago, Chile
slarrain@uc.cl, dparras@uc.cl, asoto@ing.puc.cl

ABSTRACT
Sparse Linear Methods (SLIM) are state-of-the-art recommendation approaches based on matrix factorization, which rely on regularized ℓ1-norm and ℓ2-norm optimization, an alternative to the traditional Frobenius-norm optimization problem. Although they have shown outstanding performance in Top-N recommendation, existing work has not yet analyzed some inherent assumptions that can have an important effect on the performance of these algorithms. In this paper, we attempt to improve the performance of SLIM by proposing a generalized formulation of the aforementioned assumptions. Instead of directly learning a sparse representation of the user-item matrix, we (i) learn the latent factor matrices of the users and the items via a traditional matrix factorization approach, and then (ii) reconstruct the latent user or item matrix via prototypes learned with sparse coding, an alternative to SLIM commonly used in the image processing domain. The results show that by tuning the parameters of our generalized model we are able to outperform SLIM in several Top-N recommendation experiments conducted on two different datasets, using both nDCG and nDCG@10 as evaluation metrics. These preliminary results, although not conclusive, indicate a promising line of research to improve the performance of SLIM recommendation.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; H.3.3 [Information Search and Retrieval]: Information filtering

Keywords
Recommender Systems, matrix factorization, SLIM

Copyright is held by the author(s). RecSys 2015 Poster Proceedings, September 16-20, Vienna, Austria.

1. INTRODUCTION
The main goal of a recommender system is to help users deal with information overload by providing personalized suggestions. This so-called "recommendation problem" has been addressed in different ways, such as predicting unobserved user ratings or as Top-N recommendation [2], where the objective is to optimize the ranking of the N most relevant items in order to eventually recommend them. Under the paradigm of Top-N recommendation, Sparse Linear Methods (SLIM) have shown an outstanding performance over traditional matrix factorization algorithms, motivating the creation of extensions [7, 8] to the original work of Ning and Karypis [6].

If we compare SLIM with other similar applications that use ℓ1-norm and ℓ2-norm optimization, such as sparse coding used for dictionary learning in image processing, we can identify two assumptions. First, SLIM aims at reconstructing the low-density user-item matrix A directly, but low-density datasets decrease the effectiveness of Top-N recommendations [1]. We hypothesize that reconstructing a denser representation of the user-item interactions might produce an improvement in recommendation performance. Second, SLIM does not require learning a dictionary of prototypes to reconstruct the matrix A, while dictionary learning is one of the most important stages of matrix reconstruction in sparse coding. Our intuition is that by testing these two assumptions we might improve the ranking performance of SLIM. Hence, we can summarize the two hypotheses for the generalized SLIM (gSLIM) that motivate our work: (1) in a first stage, learning a denser representation of the user-item interaction matrix, such as the low-rank users' and items' latent factor matrices, can improve the final Top-N recommendation performance, and (2) learning a dictionary of prototypes when reconstructing the low-rank user or item latent matrices can improve the final Top-N recommendation performance.

In this document, we explain our gSLIM approach, present preliminary experiments on two datasets showing promising results, and finally discuss limitations and some interesting ideas for future work.
2. GSLIM RECOMMENDATION MODEL
In order to model each user we use a traditional regularized matrix factorization [3] technique, as defined in Equation 1, where A ∈ R^{U×I} represents our rating matrix of U users and I items, U ∈ R^{k1×U} represents the latent users' matrix in which each user u_i is represented with k1 latent dimensions, and V ∈ R^{k1×I} represents the latent items' matrix in which each item v_i is represented with k1 latent dimensions as well.

    min_{U,V}  ||A - U^T V||_F^2 + λ_1 (||U||_F^2 + ||V||_F^2)        (1)

After learning the latent users' matrix U, we proceed to compute the latent social prototypes. Similarly to Ning and Karypis [6], we reconstruct each user u_i as a sparse linear combination of our prototype matrix P ∈ R^{k1×k2}. In Equation 2 we introduce a sparse coding problem [4] in order to compute our prototype matrix P, where k2 is the number of prototypes, the matrix W ∈ R^{k2×U} represents the sparse coding coefficients, and λ_2 is the sparsity parameter.

    min_{P,W}  ||U - PW||_F^2 + λ_2 ||W||_1,   s.t. ||w_i|| = 1  ∀i        (2)

Finally, to make a recommendation r̂_{i,j} for user i on item j we compute the dot product r̂_{i,j} = u'_i^T · v_j, where the user's reconstruction u'_i, defined in Equation 3, is solved with an Orthogonal Matching Pursuit (OMP) algorithm using a maximum of γ non-negative coefficients in w.

    u'_i = P · argmin_{w} ||u_i - Pw||        (3)

gSLIM compared to SLIM. Unlike our method, SLIM [6] directly solves a problem similar to Equation 2, but instead of the latent users' matrix U, it reconstructs the user-item matrix A without learning the prototypes P, as shown in Equation 4.

    min_W  (1/2) ||A - AW||_F^2 + (β/2) ||W||_F^2 + λ ||W||_1,   s.t. W ≥ 0  ∧  diag(W) = 0        (4)
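To make the three steps concrete, the following is a minimal Python sketch of the gSLIM pipeline on a toy rating matrix. It is not the authors' implementation: the paper does not specify its MF solver, so a few rounds of alternating least squares stand in for Equation 1, and scikit-learn's DictionaryLearning with an OMP transform stands in for Equations 2 and 3 (it normalizes the dictionary atoms rather than the codes, and does not enforce non-negativity of the γ coefficients). All sizes, variable names and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Toy rating matrix A (n_users x n_items); unobserved entries are zeros.
n_users, n_items, k1, k2, gamma, lam1 = 50, 80, 10, 20, 5, 0.01
A = (rng.random((n_users, n_items)) < 0.1) * rng.integers(1, 6, (n_users, n_items))
A = A.astype(float)

# Step 1 (Eq. 1): regularized MF, min ||A - U^T V||_F^2 + lam1 (||U||_F^2 + ||V||_F^2).
# A few alternating-least-squares rounds; unobserved entries are treated as zeros
# for simplicity (the paper does not state how missing entries are handled).
U = 0.1 * rng.standard_normal((k1, n_users))   # latent users' matrix (k1 x U)
V = 0.1 * rng.standard_normal((k1, n_items))   # latent items' matrix (k1 x I)
I_k = np.eye(k1)
for _ in range(20):
    U = np.linalg.solve(V @ V.T + lam1 * I_k, V @ A.T)   # update users given items
    V = np.linalg.solve(U @ U.T + lam1 * I_k, U @ A)     # update items given users

# Step 2 (Eq. 2): learn prototypes P and sparse codes W, min ||U - PW||_F^2 + lam2 ||W||_1.
# DictionaryLearning expects samples in rows, so the users are passed as U.T.
dl = DictionaryLearning(n_components=k2, alpha=1.0,       # alpha plays the role of lam2
                        transform_algorithm='omp',
                        transform_n_nonzero_coefs=gamma,   # gamma from Eq. 3 (sign-unconstrained here)
                        random_state=0)
W = dl.fit_transform(U.T).T    # (k2 x n_users) sparse coefficients
P = dl.components_.T           # (k1 x k2) prototype matrix

# Step 3 (Eq. 3): rebuild each user from prototypes, then score all items.
U_prime = P @ W                # denser, prototype-based user representation (k1 x n_users)
scores = U_prime.T @ V         # r_hat[i, j] = u'_i . v_j   (n_users x n_items)
scores[A > 0] = -np.inf        # never re-recommend already rated items
top_n = np.argsort(-scores, axis=1)[:, :10]
print("Top-10 items for user 0:", top_n[0])
```

For comparison, the SLIM baseline of Equation 4 is commonly implemented as one ℓ1/ℓ2-regularized regression per item column with a positivity constraint and a zeroed diagonal. A sketch using scikit-learn's ElasticNet follows; the mapping from β and λ onto alpha and l1_ratio is only approximate (ElasticNet rescales the loss by the number of samples), and slim_baseline is our own illustrative helper, not the released SLIM code.

```python
from sklearn.linear_model import ElasticNet

def slim_baseline(A, l1_reg=0.001, l2_reg=0.0001):
    """Column-wise fit of Eq. 4: one non-negative ElasticNet regression per item."""
    n_items = A.shape[1]
    W = np.zeros((n_items, n_items))
    model = ElasticNet(alpha=l1_reg + l2_reg,
                       l1_ratio=l1_reg / (l1_reg + l2_reg),
                       positive=True, fit_intercept=False)
    for j in range(n_items):
        a_j = A[:, j].copy()
        A[:, j] = 0.0              # keeps diag(W) = 0 while fitting item j
        model.fit(A, a_j)
        W[:, j] = model.coef_
        A[:, j] = a_j              # restore the column
    return W                       # item scores are then A @ W

W_slim = slim_baseline(A.copy())   # reuses the toy matrix A from the sketch above
slim_scores = A @ W_slim
```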
3. EXPERIMENTS
Using the MovieLens datasets with 100K (ML-100K, |u| = 943, |i| = 1682) and 1M (ML-1M, |u| = 6040, |i| = 3952) ratings¹, we performed top-N recommendation experiments using the LensKit framework², evaluating the performance with the nDCG and nDCG@10 metrics [5]. We compared SLIM³, a regularized matrix factorization (MF) as in Equation 1, and gSLIM. For gSLIM we also conducted a parameter analysis over: 1) k1, the number of latent dimensions of U, 2) k2, the number of social prototypes, and 3) γ, the number of non-negative coefficients in the user's sparse reconstruction. All experiments were conducted using 5-fold cross validation. The parameters λ_1 and λ_2 were set to 0.01 and 1, respectively.

[Figure 1: Results in ML-100K across k1 (x-axis, 10-70), for k2 ∈ {100, 200, 943} and γ ∈ {1, 3, 5, 7, 9, 10}. Line styles: dashed=SLIM, dash&dot=MF, solid=gSLIM. (a) nDCG in ML-100K. (b) nDCG@10 in ML-100K.]

Results. Table 1 presents the results, showing that in the ML-100K dataset SLIM is outperformed by both gSLIM and regularized MF, indicating that this smaller and denser dataset (6.3% density) might be better served by a traditional MF (hypothesis 1). However, in the larger and sparser ML-1M dataset (4.19% density) SLIM outperforms both MF and gSLIM in nDCG@10, but gSLIM outperforms both in terms of nDCG, indicating that reconstructing the denser latent user matrix with prototypes can support a recommendation task beyond a small top-N (hypothesis 2). Finally, Figures 1a and 1b show the effect of the parameters k1, k2 and γ on gSLIM performance in the ML-100K dataset compared to the SLIM and MF baselines; a similar behavior is seen in the ML-1M dataset (Figures 2a and 2b). The analysis shows that larger k2 and γ give better performance, but k1 has its peak between 20 and 30 latent factors for the users' matrix U.

Table 1: Results of nDCG and nDCG@10 for the MovieLens datasets.

                                   ML-100K               ML-1M
  Algorithm                      nDCG    nDCG@10       nDCG    nDCG@10
  gSLIM (k1=30, k2=200, γ=9)     0.9357  0.3043        -       -
  gSLIM (k1=20, k2=943, γ=5)     -       -             0.9398  0.2152
  gSLIM (k1=20, k2=200, γ=10)    -       -             0.9392  0.2155
  SLIM (baseline)                0.9202  0.2927        0.9230  0.2374
  Traditional MF                 0.9361  0.3068        0.9377  0.2111

[Figure 2: Results in the ML-1M dataset across γ (x-axis, 1-10), for k2 ∈ {100, 200, 943}. Line styles: dashed=SLIM, dash&dot=MF, solid=gSLIM. (a) nDCG in ML-1M. (b) nDCG@10 in ML-1M.]
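For reference, the nDCG@10 numbers reported above follow the standard definition [5]; a small self-contained helper is sketched below. LensKit's implementation may differ in details such as the gain and discount functions or the handling of unrated candidates, so this is only an illustrative sketch, and the toy relevance and score vectors are our own.

```python
import numpy as np

def ndcg_at_k(true_rel, pred_scores, k=10):
    """nDCG@k for one user: rank candidate items by predicted score, use the
    held-out ratings as graded relevance, and discount by log2(rank + 1)."""
    order = np.argsort(-np.asarray(pred_scores, dtype=float))
    gains = np.asarray(true_rel, dtype=float)[order][:k]
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = float(np.sum(gains * discounts))
    ideal = np.sort(np.asarray(true_rel, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * (1.0 / np.log2(np.arange(2, ideal.size + 2)))))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: six candidate items, held-out ratings vs. a model's scores.
print(ndcg_at_k([0, 5, 0, 3, 4, 0], [0.3, 0.9, 0.1, 0.2, 0.8, 0.4]))  # ~0.96
```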
4. CONCLUSIONS AND FUTURE WORK
In this work we introduced gSLIM, a general formulation of SLIM. Although we have not addressed the limitation of our model in terms of computational complexity, we think that our results open an opportunity for studying unexplored ideas on Sparse Linear Methods. In the first step, researchers can try different methods to learn a denser representation of the A, U or V matrices. In the sparse coding step, we could try reconstructing the items' latent factor matrix V rather than U, or reconstructing A directly. We can also try alternative algorithms to OMP for prototype learning. Moreover, our intuition is that learning these sparse prototypes can help us find actual "stereotypes" of users and items, which can be used, e.g., for clustering users and items.

5. REFERENCES
[1] Santosh Kabbur, Xia Ning, and George Karypis. FISM: Factored item similarity models for top-N recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 659–667. ACM, 2013.
[2] George Karypis. Evaluation of item-based top-N recommendation algorithms. In Proceedings of the Tenth International Conference on Information and Knowledge Management, CIKM '01, pages 247–254, New York, NY, USA, 2001. ACM.
[3] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, August 2009.
[4] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 689–696, New York, NY, USA, 2009. ACM.
[5] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval, volume 1. Cambridge University Press, Cambridge, 2008.
[6] Xia Ning and George Karypis. SLIM: Sparse linear methods for top-N recommender systems. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 497–506, December 2011.
[7] Xia Ning and George Karypis. Sparse linear methods with side information for top-N recommendations. In Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys '12, pages 155–162, New York, NY, USA, 2012. ACM.
[8] Yong Zheng, Bamshad Mobasher, and Robin Burke. CSLIM: Contextual SLIM recommendation algorithms. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys '14, pages 301–304, New York, NY, USA, 2014. ACM.

¹ http://grouplens.org/datasets/movielens/
² http://lenskit.org/
³ http://www-users.cs.umn.edu/~xning/slim/html/