Towards Improving Top-N Recommendation by Generalization of SLIM

Santiago Larraín, Denis Parra, Alvaro Soto
Computer Science Department, Pontificia Universidad Católica de Chile, Santiago, Chile
slarrain@uc.cl, dparras@uc.cl, asoto@ing.puc.cl

ABSTRACT
Sparse Linear Methods (SLIM) are state-of-the-art recommendation approaches based on matrix factorization, which rely on regularized ℓ1-norm and ℓ2-norm optimization, an alternative to the traditional Frobenius-norm optimization problem. Although they have shown outstanding performance in Top-N recommendation, existing work has not yet analyzed some inherent assumptions that can have an important effect on the performance of these algorithms. In this paper, we attempt to improve the performance of SLIM by proposing a generalized formulation of the aforementioned assumptions. Instead of directly learning a sparse representation of the user-item matrix, we (i) learn the latent factor matrices of the users and the items via a traditional matrix factorization approach, and then (ii) reconstruct the latent user or item matrix via prototypes learned with sparse coding, an alternative to SLIM commonly used in the image processing domain. The results show that by tuning the parameters of our generalized model we are able to outperform SLIM in several Top-N recommendation experiments conducted on two different datasets, using both nDCG and nDCG@10 as evaluation metrics. These preliminary results, although not conclusive, indicate a promising line of research to improve the performance of SLIM recommendation.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; H.3.3 [Information Search and Retrieval]: Information filtering

Keywords
Recommender Systems, matrix factorization, SLIM

Copyright is held by the author(s). RecSys 2015 Poster Proceedings, September 16-20, Vienna, Austria.

1. INTRODUCTION
The main goal of a recommender system is to help users deal with information overload by providing personalized suggestions. This so-called "recommendation problem" has been addressed in different ways, such as predicting unobserved user ratings or as Top-N recommendation [2], where the objective is to optimize the ranking of the N most relevant items in order to eventually recommend them. Under the paradigm of Top-N recommendation, Sparse Linear Methods (SLIM) have shown an outstanding performance over traditional matrix factorization algorithms, motivating the creation of extensions [7, 8] to the original work of Ning and Karypis [6].

If we compare SLIM with other similar applications that use ℓ1-norm and ℓ2-norm optimization, such as sparse coding used for dictionary learning in image processing, we can identify two assumptions. First, SLIM aims at reconstructing the low-density user-item matrix A directly, but low-density datasets decrease the effectiveness of Top-N recommendations [1]. We hypothesize that reconstructing a denser representation of the user-item interactions might produce an improvement in recommendation performance. Second, SLIM does not require learning a dictionary of prototypes to reconstruct the matrix A, while dictionary learning is one of the most important stages of matrix reconstruction in sparse coding. Our intuition is that by testing these two assumptions we might improve the ranking performance of SLIM. Hence, we can summarize the two hypotheses for the generalized SLIM (gSLIM) that motivate our work: (1) in a first stage, learning a denser representation of the user-item interaction matrix, such as the low-rank users' and items' latent factor matrices, can improve the final Top-N recommendation performance, and (2) learning a dictionary of prototypes when reconstructing the low-rank user or item latent matrices can improve the final Top-N recommendation performance.

In this document, we explain our gSLIM approach, present preliminary experiments on two datasets showing promising results, and finally discuss limitations and some interesting ideas for future work.
2. GSLIM RECOMMENDATION MODEL
In order to model each user we use a traditional regularized matrix factorization [3] technique, as defined in Equation 1, where A ∈ R^{U×I} represents our rating matrix of U users and I items, U ∈ R^{k1×U} represents the latent users' matrix in which each user u_i is represented with k1 latent dimensions, and V ∈ R^{k1×I} represents the latent items' matrix in which each item v_i is represented with k1 latent dimensions as well.

    min_{U,V}  ||A - U^T V||_F^2 + λ_1 (||U||_F^2 + ||V||_F^2)        (1)

After learning the latent users' matrix U, we proceed to compute the latent social prototypes. Similarly to Ning and Karypis [6], we reconstruct each user u_i as a sparse linear combination of our prototype matrix P ∈ R^{k1×k2}. In Equation 2 we introduce a sparse coding problem [4] in order to compute our prototype matrix P, where k2 is the number of prototypes, the matrix W ∈ R^{k2×U} represents the sparse coding coefficients, and λ_2 is the sparsity parameter.

    min_{P,W}  ||U - PW||_F^2 + λ_2 ||W||_1,   s.t. ||w_i|| = 1  ∀i        (2)

Finally, to make a recommendation r̂_{i,j} for user i on item j we compute the dot product r̂_{i,j} = u'_i^T · v_j, where the user's reconstruction u'_i, defined in Equation 3, is solved with an Orthogonal Matching Pursuit (OMP) algorithm using a maximum of γ non-negative coefficients in w.

    u'_i = P · argmin_{w} ||u_i - Pw||        (3)

gSLIM compared to SLIM. Unlike our method, SLIM [6] directly solves a problem similar to Equation 2, but instead of the latent users' matrix U, it reconstructs the user-item matrix A without learning the prototypes P, as shown in Equation 4.

    min_W  (1/2) ||A - AW||_F^2 + (β/2) ||W||_F^2 + λ ||W||_1,   s.t. W ≥ 0  ∧  diag(W) = 0        (4)
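To make the three steps concrete, the following is a minimal Python sketch of the gSLIM pipeline on a toy rating matrix. It is not the authors' implementation: the paper does not specify its MF solver, so a few rounds of alternating least squares stand in for Equation 1, and scikit-learn's DictionaryLearning with an OMP transform stands in for Equations 2 and 3 (it normalizes the dictionary atoms rather than the codes, and does not enforce non-negativity of the γ coefficients). All sizes, variable names and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Toy rating matrix A (n_users x n_items); unobserved entries are zeros.
n_users, n_items, k1, k2, gamma, lam1 = 50, 80, 10, 20, 5, 0.01
A = (rng.random((n_users, n_items)) < 0.1) * rng.integers(1, 6, (n_users, n_items))
A = A.astype(float)

# Step 1 (Eq. 1): regularized MF, min ||A - U^T V||_F^2 + lam1 (||U||_F^2 + ||V||_F^2).
# A few alternating-least-squares rounds; unobserved entries are treated as zeros
# for simplicity (the paper does not state how missing entries are handled).
U = 0.1 * rng.standard_normal((k1, n_users))   # latent users' matrix (k1 x U)
V = 0.1 * rng.standard_normal((k1, n_items))   # latent items' matrix (k1 x I)
I_k = np.eye(k1)
for _ in range(20):
    U = np.linalg.solve(V @ V.T + lam1 * I_k, V @ A.T)   # update users given items
    V = np.linalg.solve(U @ U.T + lam1 * I_k, U @ A)     # update items given users

# Step 2 (Eq. 2): learn prototypes P and sparse codes W, min ||U - PW||_F^2 + lam2 ||W||_1.
# DictionaryLearning expects samples in rows, so the users are passed as U.T.
dl = DictionaryLearning(n_components=k2, alpha=1.0,       # alpha plays the role of lam2
                        transform_algorithm='omp',
                        transform_n_nonzero_coefs=gamma,   # gamma from Eq. 3 (sign-unconstrained here)
                        random_state=0)
W = dl.fit_transform(U.T).T    # (k2 x n_users) sparse coefficients
P = dl.components_.T           # (k1 x k2) prototype matrix

# Step 3 (Eq. 3): rebuild each user from prototypes, then score all items.
U_prime = P @ W                # denser, prototype-based user representation (k1 x n_users)
scores = U_prime.T @ V         # r_hat[i, j] = u'_i . v_j   (n_users x n_items)
scores[A > 0] = -np.inf        # never re-recommend already rated items
top_n = np.argsort(-scores, axis=1)[:, :10]
print("Top-10 items for user 0:", top_n[0])
```

For comparison, the SLIM baseline of Equation 4 is commonly implemented as one ℓ1/ℓ2-regularized regression per item column with a positivity constraint and a zeroed diagonal. A sketch using scikit-learn's ElasticNet follows; the mapping from β and λ onto alpha and l1_ratio is only approximate (ElasticNet rescales the loss by the number of samples), and slim_baseline is our own illustrative helper, not the released SLIM code.

```python
from sklearn.linear_model import ElasticNet

def slim_baseline(A, l1_reg=0.001, l2_reg=0.0001):
    """Column-wise fit of Eq. 4: one non-negative ElasticNet regression per item."""
    n_items = A.shape[1]
    W = np.zeros((n_items, n_items))
    model = ElasticNet(alpha=l1_reg + l2_reg,
                       l1_ratio=l1_reg / (l1_reg + l2_reg),
                       positive=True, fit_intercept=False)
    for j in range(n_items):
        a_j = A[:, j].copy()
        A[:, j] = 0.0              # keeps diag(W) = 0 while fitting item j
        model.fit(A, a_j)
        W[:, j] = model.coef_
        A[:, j] = a_j              # restore the column
    return W                       # item scores are then A @ W

W_slim = slim_baseline(A.copy())   # reuses the toy matrix A from the sketch above
slim_scores = A @ W_slim
```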
3. EXPERIMENTS
Using the MovieLens datasets with 100K (ML-100K, |u| = 943, |i| = 1682) and 1M (ML-1M, |u| = 6040, |i| = 3952) ratings¹, we performed top-N recommendation experiments using the LensKit framework², evaluating the performance with the nDCG and nDCG@10 metrics [5]. We compared SLIM³, a regularized matrix factorization (MF) as in Equation 1, and gSLIM. For gSLIM we also conducted a parameter analysis over: 1) k1, the number of latent dimensions of U, 2) k2, the number of social prototypes, and 3) γ, the number of non-negative coefficients in the user's sparse reconstruction. All experiments were conducted using 5-fold cross validation. The parameters λ_1 and λ_2 were set to 0.01 and 1, respectively.

[Figure 1: Results in ML-100K across k1 (x-axis, 10-70), for k2 ∈ {100, 200, 943} and γ ∈ {1, 3, 5, 7, 9, 10}. Line styles: dashed=SLIM, dash&dot=MF, solid=gSLIM. (a) nDCG in ML-100K. (b) nDCG@10 in ML-100K.]

Results. Table 1 presents the results, showing that in the ML-100K dataset SLIM is outperformed by both gSLIM and regularized MF, indicating that this smaller and denser dataset (6.3% density) might be better served by a traditional MF (hypothesis 1). However, in the larger and sparser ML-1M dataset (4.19% density) SLIM outperforms both MF and gSLIM in nDCG@10, but gSLIM outperforms both in terms of nDCG, indicating that reconstructing the denser latent user matrix with prototypes can support a recommendation task beyond a small top-N (hypothesis 2). Finally, Figures 1a and 1b show the effect of the parameters k1, k2 and γ on gSLIM performance in the ML-100K dataset compared to the SLIM and MF baselines; a similar behavior is seen in the ML-1M dataset (Figures 2a and 2b). The analysis shows that larger k2 and γ give better performance, but k1 has its peak between 20 and 30 latent factors for the users' matrix U.

Table 1: Results of nDCG and nDCG@10 for the MovieLens datasets.

                                   ML-100K               ML-1M
  Algorithm                      nDCG    nDCG@10       nDCG    nDCG@10
  gSLIM (k1=30, k2=200, γ=9)     0.9357  0.3043        -       -
  gSLIM (k1=20, k2=943, γ=5)     -       -             0.9398  0.2152
  gSLIM (k1=20, k2=200, γ=10)    -       -             0.9392  0.2155
  SLIM (baseline)                0.9202  0.2927        0.9230  0.2374
  Traditional MF                 0.9361  0.3068        0.9377  0.2111

[Figure 2: Results in the ML-1M dataset across γ (x-axis, 1-10), for k2 ∈ {100, 200, 943}. Line styles: dashed=SLIM, dash&dot=MF, solid=gSLIM. (a) nDCG in ML-1M. (b) nDCG@10 in ML-1M.]
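For reference, the nDCG@10 numbers reported above follow the standard definition [5]; a small self-contained helper is sketched below. LensKit's implementation may differ in details such as the gain and discount functions or the handling of unrated candidates, so this is only an illustrative sketch, and the toy relevance and score vectors are our own.

```python
import numpy as np

def ndcg_at_k(true_rel, pred_scores, k=10):
    """nDCG@k for one user: rank candidate items by predicted score, use the
    held-out ratings as graded relevance, and discount by log2(rank + 1)."""
    order = np.argsort(-np.asarray(pred_scores, dtype=float))
    gains = np.asarray(true_rel, dtype=float)[order][:k]
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = float(np.sum(gains * discounts))
    ideal = np.sort(np.asarray(true_rel, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * (1.0 / np.log2(np.arange(2, ideal.size + 2)))))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: six candidate items, held-out ratings vs. a model's scores.
print(ndcg_at_k([0, 5, 0, 3, 4, 0], [0.3, 0.9, 0.1, 0.2, 0.8, 0.4]))  # ~0.96
```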
4. CONCLUSIONS AND FUTURE WORK
In this work we introduced gSLIM, a general formulation of SLIM. Although we have not addressed the limitation of our model in terms of computational complexity, we think that our results open an opportunity for studying unexplored ideas on Sparse Linear Methods. In the first step, researchers can try different methods to learn a denser representation of the A, U or V matrices. In the sparse coding step, we could try reconstructing the items' latent factor matrix V rather than U, or reconstructing A directly. We can also try alternative algorithms to OMP for prototype learning. Moreover, our intuition is that learning these sparse prototypes can help us find actual "stereotypes" of users and items, which can be used, e.g., for clustering users and items.

5. REFERENCES
[1] Santosh Kabbur, Xia Ning, and George Karypis. FISM: Factored item similarity models for top-N recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 659–667. ACM, 2013.
[2] George Karypis. Evaluation of item-based top-N recommendation algorithms. In Proceedings of the Tenth International Conference on Information and Knowledge Management, CIKM '01, pages 247–254, New York, NY, USA, 2001. ACM.
[3] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, August 2009.
[4] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 689–696, New York, NY, USA, 2009. ACM.
[5] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval, volume 1. Cambridge University Press, Cambridge, 2008.
[6] Xia Ning and George Karypis. SLIM: Sparse linear methods for top-N recommender systems. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 497–506, December 2011.
[7] Xia Ning and George Karypis. Sparse linear methods with side information for top-N recommendations. In Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys '12, pages 155–162, New York, NY, USA, 2012. ACM.
[8] Yong Zheng, Bamshad Mobasher, and Robin Burke. CSLIM: Contextual SLIM recommendation algorithms. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys '14, pages 301–304, New York, NY, USA, 2014. ACM.

¹ http://grouplens.org/datasets/movielens/
² http://lenskit.org/
³ http://www-users.cs.umn.edu/~xning/slim/html/