=Paper=
{{Paper
|id=None
|storemode=property
|title=What about Interpreting Features in Matrix Factorization-based Recommender Systems as Users?
|pdfUrl=https://ceur-ws.org/Vol-1210/SP2014_12.pdf
|volume=Vol-1210
|dblpUrl=https://dblp.org/rec/conf/ht/AleksandrovaBBC14
}}
==What about Interpreting Features in Matrix Factorization-based Recommender Systems as Users?==
Marharyta Aleksandrova (Université de Lorraine - LORIA, France; NTUU “KPI”, Ukraine; firstname.lastname@loria.fr), Armelle Brun (Université de Lorraine - LORIA, Campus Scientifique, 54506 Vandoeuvre les Nancy, France; firstname.lastname@loria.fr), Anne Boyer (Université de Lorraine - LORIA, Campus Scientifique, 54506 Vandoeuvre les Nancy, France; firstname.lastname@loria.fr), Oleg Chertov (NTUU “KPI”, 37, Prospect Peremohy, 03056, Kyiv, Ukraine; chertov@i.ua)

===ABSTRACT===
Matrix factorization (MF) is a powerful approach used in recommender systems. One main drawback of MF is the difficulty of interpreting the automatically formed features. Following the intuition that the relation between users and items can be expressed through a reduced set of users, referred to as representative users, we propose a simple modification of a traditional MF algorithm that forms a set of features corresponding to these representative users. On one state-of-the-art dataset, we show that the proposed representative users-based non-negative matrix factorization (RU-NMF) discovers interpretable features, while slightly (in some cases insignificantly) decreasing the accuracy.

===Keywords===
Recommender systems, matrix factorization, feature interpretation.

===1. INTRODUCTION, RELATED WORKS===
Recommender systems aim to estimate the ratings of target users on previously unseen items. One of the methods used for this task is matrix factorization (MF), which relies on the idea that a small number of latent factors (features) underlies the interactions between users and items [1]. Let M be the number of users and N the number of items. The interaction between these entities is usually represented in the form of a matrix R, with element r_mn corresponding to the rating assigned by user m to item n. MF techniques decompose the original rating matrix R into two low-rank matrices U (dim(U) = K × M) and V (dim(V) = K × N) in such a way that the product of these matrices approximates the original rating matrix: R ≈ R* = U^T V. The set of K factors can be seen as a joint latent space onto which both the user and item spaces are mapped [1]. Features resulting from factorization usually do not have any physical sense, which makes the resulting recommendations unexplainable. Some works [2, 3] attempted to interpret them by using non-negative matrix factorization with multiplicative update rules (for simplicity, further referred to as NMF). However, the proposed interpretation is not easy to perform, as it has to be discovered manually.

Based on the assumption that preferences are correlated across users, we assume that within the entire set of users there is a small set of users that have a specific role or specific preferences. These users can be considered representative of the entire population, and we intend to discover features from MF that are associated with these representative users.

===2. THE PROPOSED APPROACH: RU-NMF===
Let us consider two linear spaces L1 and L2 of dimensionality 6 and 3 respectively, with basis vectors in canonical form {u_m}, m = 1,…,6 and {f_k}, k = 1,…,3. Let the transfer matrix from L1 to L2 be specified by matrix (1).

    P = | 0   0   p13   p14   1   p16 |
        | 1   0   p23   p24   0   p26 |     (1)
        | 0   1   p33   p34   0   p36 |

Then u_5, u_1 and u_2 are direct preimages of f_1, f_2 and f_3 respectively; indeed, P·u_5 = f_1. At the same time, the vectors u_3, u_4 and u_6 are mapped into linear combinations of the basis vectors f_1, f_2, f_3.

Matrix U can be considered as a transfer matrix from the space of users to the space of features. Analyzing the example above, we can say that if matrix U has a form similar to (1), i.e.
U has exactly K unitary columns, each with a single non-zero element equal to 1 in a different position, then the users corresponding to these columns are the direct preimages of the K features. The features can thus be directly interpreted as users. These users will be referred to as representative users. In order to force matrix U to satisfy the imposed conditions, we propose the RU-NMF approach, which consists of 6 steps, detailed below.

Step 1. A traditional matrix factorization is performed. Following [2, 3], NMF is used.

Step 2. Each of the M column vectors of the matrix U is normalized so as to result in unitary columns. The resulting normalized matrix is denoted by U_norm and the vector of normalization coefficients by C.

Step 3. This step is dedicated to the identification of the representative users in the U_norm matrix. A user u_m is considered the best preimage candidate (representative user) for the feature f_k if the vector u_norm_m is the closest to the corresponding canonical vector (the vector whose only non-zero element, equal to 1, is in position k). Closeness between vectors is measured by Euclidean distance. Once all representative users are identified, the matrix U_norm is modified so as to obtain a matrix of the form of (1): the columns corresponding to the representative users are replaced with the appropriate canonical vectors. The resulting modified matrix is denoted by U_norm^mod.

Step 4. Each column of the matrix U_norm^mod is multiplied by the appropriate normalization coefficient from the set C, resulting in matrix U^mod. After this, the representative users remain preimages of the features, but with scaling factors.

Step 5. In order to obtain the best model, we also have to modify the matrix V. The modification of V can be performed using optimization methods, with the starting value obtained during the first step. As the objective of this paper is to determine the relevance of finding preimages of the features and to quantify the decrease in the quality of the recommendations, we did not consider this step.

Step 6. The resulting recommendation model is made up of matrices U^mod and V (R* = (U^mod)^T V).

===3. EXPERIMENTAL RESULTS===
Experiments are performed on the 100k MovieLens dataset (http://grouplens.org/datasets/movielens/), with 80% of the ratings used for learning the model and 20% for testing it. The accuracy is evaluated with two classical measures: mean absolute error (MAE) and root mean square error (RMSE). The goal of the experiments is to compare the accuracies of RU-NMF and NMF. For this reason we compute the accuracy loss ρ = (err(RU-NMF) − err(NMF)) / err(NMF) · 100% for factorizations with 10, 15 and 20 features on 30 different samples. Results are presented in Table 1. A positive loss means that NMF performs better than RU-NMF.

Table 1: Accuracy loss ρ between RU-NMF and traditional NMF, for 10, 15 and 20 features.

                 Learning set          Test set
                 MAE      RMSE         MAE      RMSE
  10 features
    mean         0.17%    0.19%        0.05%    0.05%
    min          0.03%    0.03%       -0.06%   -0.07%
    max          0.38%    0.46%        0.18%    0.20%
  15 features
    mean         0.98%    1.29%        0.29%    0.33%
    min          0.49%    0.61%       -0.06%   -0.04%
    max          1.71%    2.38%        0.77%    0.79%
  20 features
    mean         2.78%    4.08%        0.70%    0.82%
    min          1.38%    1.94%        0.13%    0.12%
    max          4.27%    6.64%        1.43%    1.53%

In the worst case the accuracy loss equals 6.64%, for RMSE with 20 features, which is quite small. The lowest average accuracy loss (0.05%) is obtained with 10 features, for both errors. When comparing the accuracy loss between the test and learning sets, we can note that the average loss is 3 times lower on the test set than on the learning set, for both errors and for all numbers of features: thus we can say that RU-NMF has a lower relative loss between learning and test than NMF. A thorough analysis of the losses obtained on the 30 samples has shown that the accuracy loss on the test set is lower than the one on the learning set in all cases. In some runs, RU-NMF even has a higher accuracy than NMF (Table 1, negative values).

===4. DISCUSSIONS AND FUTURE WORK===
The analysis of the accuracy loss between RU-NMF and traditional NMF has shown that the prediction error rises slightly (in some cases insignificantly) with RU-NMF. However, the features formed with this approach consistently disturb the accuracy on the test set less than on the learning set. This can be seen as a sign that factorization techniques whose features are related to reality have the potential to form predictions that generalize better. The proposed approach also lets us easily explain the resulting recommendations. Indeed, each user of the population is linearly mapped onto the basis related to the representative users (through matrix U), and the preferences of the latter (expressed by matrix V) are used to estimate the ratings of the whole population.

In future work, we would like to focus first of all on verifying the hypothesis that the users associated with the features can really be considered representative. We believe that this can be done while solving the new-item cold-start problem, with the ratings of the representative users on new items used to estimate the ratings of the whole population on these items.

===5. REFERENCES===
[1] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009.
[2] S. Zhang, W. Wang, J. Ford, and F. Makedon, “Learning from incomplete ratings using non-negative matrix factorization,” in Proc. of the 6th SIAM Conf. on Data Mining, 2006.
[3] J.-F. Pessiot, V. Truong, N. Usunier, M. Amini, and P. Gallinari, “Factorisation en matrices non-négatives pour le filtrage collaboratif,” in 3rd Conf. en Recherche d’Information et Applications, 2006, pp. 315–326.