=Paper=
{{Paper
|id=None
|storemode=property
|title=What about Interpreting Features in Matrix Factorization-based Recommender Systems as Users?
|pdfUrl=https://ceur-ws.org/Vol-1210/SP2014_12.pdf
|volume=Vol-1210
|dblpUrl=https://dblp.org/rec/conf/ht/AleksandrovaBBC14
}}
==What about Interpreting Features in Matrix Factorization-based Recommender Systems as Users?==
Marharyta Aleksandrova (Université de Lorraine - LORIA, France; NTUU “KPI”, Ukraine; firstname.lastname@loria.fr), Armelle Brun (Université de Lorraine - LORIA, Campus Scientifique, 54506 Vandoeuvre les Nancy, France; firstname.lastname@loria.fr), Anne Boyer (Université de Lorraine - LORIA, Campus Scientifique, 54506 Vandoeuvre les Nancy, France; firstname.lastname@loria.fr), Oleg Chertov (NTUU “KPI”, 37, Prospect Peremohy, 03056, Kyiv, Ukraine; chertov@i.ua)

===ABSTRACT===
Matrix factorization (MF) is a powerful approach used in recommender systems. One main drawback of MF is the difficulty of interpreting the automatically formed features. Following the intuition that the relation between users and items can be expressed through a reduced set of users, referred to as representative users, we propose a simple modification of a traditional MF algorithm that forms a set of features corresponding to these representative users. On one state-of-the-art dataset, we show that the proposed representative users-based non-negative matrix factorization (RU-NMF) discovers interpretable features, while slightly (in some cases insignificantly) decreasing the accuracy.

===Keywords===
Recommender systems, matrix factorization, feature interpretation.

===1. INTRODUCTION, RELATED WORKS===
Recommender systems aim to estimate the ratings of target users on previously unseen items. One of the methods used for this task is matrix factorization (MF), which relies on the idea that a small number of latent factors (features) underlies the interactions between users and items [1]. Let M be the number of users and N the number of items. The interaction between these entities is usually represented in the form of a matrix R, with element r_mn corresponding to the rating assigned by user m to item n. MF techniques decompose the original rating matrix R into two low-rank matrices U (dim(U) = K × M) and V (dim(V) = K × N) in such a way that the product of these matrices approximates the original rating matrix: R ≈ R* = U^T V. The set of K factors can be seen as a joint latent space onto which both the user and item spaces are mapped [1]. Features resulting from factorization usually do not have any physical sense, which makes the resulting recommendations unexplainable. Some works [2, 3] attempted to interpret them by using non-negative matrix factorization with multiplicative update rules (for simplicity, further referred to as NMF). However, the proposed interpretation is not easy to perform, as it has to be discovered manually.

Based on the assumption that preferences are correlated across users, we assume that within the entire set of users there is a small set of users that have a specific role or specific preferences. These users can be considered representative of the entire population, and we intend to discover features from MF that are associated with these representative users.

===2. THE PROPOSED APPROACH: RU-NMF===
Let us consider two linear spaces L1 and L2 of dimensionality 6 and 3 respectively, with basis vectors in canonical form {u_m}, m = 1,…,6 and {f_k}, k = 1,…,3. Let the transfer matrix from L1 to L2 be specified by matrix (1).

    P = | 0   0   p13   p14   1   p16 |
        | 1   0   p23   p24   0   p26 |     (1)
        | 0   1   p33   p34   0   p36 |

Then u_5, u_1 and u_2 are direct preimages of f_1, f_2 and f_3 respectively; indeed, P·u_5 = f_1. At the same time, the vectors u_3, u_4 and u_6 are mapped into linear combinations of the basis vectors f_1, f_2, f_3.

Matrix U can be considered as a transfer matrix from the space of users to the space of features. Analyzing the example above, we can say that if matrix U has a form similar to (1), i.e.
U has exactly K unitary columns, each with a single non-zero element equal to 1 in a different position, then the users corresponding to these columns are the direct preimages of the K features. The features can thus be directly interpreted as users. These users will be referred to as representative users. In order to force matrix U to satisfy the imposed conditions, we propose the RU-NMF approach, which consists of 6 steps, detailed below.

Step 1. A traditional matrix factorization is performed. Following [2, 3], NMF is used.

Step 2. Each of the M column vectors of the matrix U is normalized so as to result in unitary columns. The resulting normalized matrix is denoted by U_norm and the vector of normalization coefficients by C.

Step 3. This step is dedicated to the identification of the representative users in the U_norm matrix. A user u_m is considered the best preimage candidate (representative user) for the feature f_k if the vector u_norm_m is the closest to the corresponding canonical vector (the vector whose only non-zero element, equal to 1, is in position k). Closeness between vectors is measured by Euclidean distance. Once all representative users are identified, the matrix U_norm is modified so as to obtain a matrix of the form of (1): the columns corresponding to the representative users are replaced with the appropriate canonical vectors. The resulting modified matrix is denoted by U_norm^mod.

Step 4. Each column of the matrix U_norm^mod is multiplied by the appropriate normalization coefficient from the set C, resulting in matrix U^mod. After this, the representative users remain preimages of the features, but with scaling factors.

Step 5. In order to obtain the best model, we also have to modify the matrix V. The modification of V can be performed using optimization methods, with the starting value obtained during the first step. As the objective of this paper is to determine the relevance of finding preimages of the features and to quantify the decrease in the quality of the recommendations, we did not consider this step.

Step 6. The resulting recommendation model is made up of matrices U^mod and V (R* = (U^mod)^T V).

===3. EXPERIMENTAL RESULTS===
Experiments are performed on the 100k MovieLens dataset (http://grouplens.org/datasets/movielens/), with 80% of the ratings used for learning the model and 20% for testing it. The accuracy is evaluated with two classical measures: mean absolute error (MAE) and root mean square error (RMSE). The goal of the experiments is to compare the accuracies of RU-NMF and NMF. For this reason we compute the accuracy loss ρ = (err(RU-NMF) − err(NMF)) / err(NMF) · 100% for factorizations with 10, 15 and 20 features on 30 different samples. Results are presented in Table 1. A positive loss means that NMF performs better than RU-NMF.

Table 1: Accuracy loss ρ between RU-NMF and traditional NMF, for 10, 15 and 20 features.

                 Learning set          Test set
                 MAE      RMSE         MAE      RMSE
  10 features
    mean         0.17%    0.19%        0.05%    0.05%
    min          0.03%    0.03%       -0.06%   -0.07%
    max          0.38%    0.46%        0.18%    0.20%
  15 features
    mean         0.98%    1.29%        0.29%    0.33%
    min          0.49%    0.61%       -0.06%   -0.04%
    max          1.71%    2.38%        0.77%    0.79%
  20 features
    mean         2.78%    4.08%        0.70%    0.82%
    min          1.38%    1.94%        0.13%    0.12%
    max          4.27%    6.64%        1.43%    1.53%

In the worst case the accuracy loss equals 6.64%, for RMSE with 20 features, which is quite small. The lowest average accuracy loss (0.05%) is obtained with 10 features, for both errors. When comparing the accuracy loss between the test and learning sets, we can note that the average loss is 3 times lower on the test set than on the learning set, for both errors and for all numbers of features: thus we can say that RU-NMF has a lower relative loss between learning and test than NMF. A thorough analysis of the losses obtained on the 30 samples has shown that the accuracy loss on the test set is lower than the one on the learning set in all cases. In some runs, RU-NMF even has a higher accuracy than NMF (Table 1, negative values).

===4. DISCUSSIONS AND FUTURE WORK===
The analysis of the accuracy loss between RU-NMF and traditional NMF has shown that the prediction error rises slightly (in some cases insignificantly) with RU-NMF. However, the features formed with this approach consistently disturb the accuracy on the test set less than on the learning set. This can be seen as a sign that factorization techniques whose features are related to reality have the potential to form predictions that generalize better. The proposed approach also lets us easily explain the resulting recommendations. Indeed, each user of the population is linearly mapped onto the basis related to the representative users (through matrix U), and the preferences of the latter (expressed by matrix V) are used to estimate the ratings of the whole population.

In future work, we would like to focus first of all on verifying the hypothesis that the users associated with the features can really be considered representative. We believe that this can be done while solving the new-item cold-start problem, with the ratings of the representative users on new items used to estimate the ratings of the whole population on these items.

===5. REFERENCES===
[1] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009.
[2] S. Zhang, W. Wang, J. Ford, and F. Makedon, “Learning from incomplete ratings using non-negative matrix factorization,” in Proc. of the 6th SIAM Conf. on Data Mining, 2006.
[3] J.-F. Pessiot, V. Truong, N. Usunier, M. Amini, and P. Gallinari, “Factorisation en matrices non-négatives pour le filtrage collaboratif,” in 3rd Conf. en Recherche d’Information et Applications, 2006, pp. 315–326.