Recommendation with the Right Slice: Speeding Up Collaborative Filtering with Factorization Machines

Babak Loni¹, Martha Larson¹, Alexandros Karatzoglou², Alan Hanjalic¹
¹ Delft University of Technology, Delft, Netherlands
² Telefonica Research, Barcelona, Spain
{b.loni, m.a.larson}@tudelft.nl, alexk@tid.es, a.hanjalic@tudelft.nl

ABSTRACT

We propose an alternative way to efficiently exploit rating data for collaborative filtering with Factorization Machines (FMs). Our approach partitions the user-item matrix into ‘slices’ which are mutually exclusive with respect to items. The training phase makes direct use of the slice of interest (the target slice), while incorporating information from the other slices indirectly. FMs represent user-item interactions as feature vectors, and they offer the advantage of easy incorporation of complementary information. We exploit this advantage to integrate information from the other, auxiliary slices. We demonstrate, using experiments on two benchmark datasets, that improved performance can be achieved, while the time complexity of training can be reduced significantly.

1. INTRODUCTION

In this paper, we investigate the idea that the ‘right’ data, rather than all data, should be exploited to build an efficient recommender system. We introduce an approach called ‘Slice and Train’ that trains a model on the right slice of a dataset (a ‘sensible’ subset containing the current items that need to be recommended) and exploits the information in the rest of the dataset indirectly. This approach is particularly interesting in scenarios in which recommendations are needed only for a particular subset of the overall item set. An example is an e-commerce website that does not generate recommendations for out-of-season or discontinued products. The obvious advantage of this approach is that models are trained on a much smaller dataset (only the data in the slice), leading to shorter training times.

The ‘Slice and Train’ approach also has, however, a less expected advantage: it offers highly competitive performance with conventional approaches that make use of the whole dataset, and in some cases it even improves on them. This means that it can be helpful to apply ‘Slice and Train’ even when predictions are needed for all items in the dataset, by training a series of separate models, one for each slice.

Our approach consists of two steps. First, the data is partitioned into a set of slices containing mutually exclusive items. The slices can be formed by grouping items based on their properties (e.g., item category or availability), or by more advanced slicing methods such as clustering. In the second step, the model is trained using the samples in the slice of interest, i.e., the target slice, while the other slices are exploited indirectly as auxiliary slices. To efficiently exploit information from the auxiliary slices, our approach trains the recommender model using Factorization Machines [4].

Factorization Machines (FMs) generate recommendations by working with vector representations of user-item data. A benefit of these representations is that they can easily be extended with additional features. Such extensions are usually used to leverage additional information [5], or additional domains [3], to improve collaborative filtering. Here, the ‘additional’ information is actually drawn from different slices within the same dataset.

[Figure 1: Feature construction in the ‘Slice and Train’ method.]

2. THE SLICE AND TRAIN METHOD

The ‘Slice and Train’ method is implemented with Factorization Machines [4]. FMs are general factorization models which can easily be adapted to different scenarios without requiring scenario-specific models and learning algorithms. In a rating prediction scenario with FMs, user-item interactions are represented by a feature vector x, and the rating is taken as the output y. By learning the model, the rating y can be predicted for unknown user-item interactions. In other words, if user u rated item i, the feature vector x can be represented by its sparse binary representation x(u, i) = {(u, 1), (i, 1)}, where the non-zero elements correspond to the user u and the item i.
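To illustrate, here is a minimal sketch of how such a sparse binary input vector could be assembled, assuming a single index space with user indices first and item indices after; the function and variable names are ours, for illustration only:

```python
import numpy as np
from scipy.sparse import csr_matrix

def basic_feature_vector(u, i, n_users, n_items):
    """Sparse binary FM input x(u, i) = {(u, 1), (i, 1)}: a one-hot
    user block followed by a one-hot item block."""
    cols = np.array([u, n_users + i])   # the two non-zero positions
    vals = np.ones(2)                   # both indicators are 1
    rows = np.zeros(2, dtype=int)       # a single-row vector
    return csr_matrix((vals, (rows, cols)), shape=(1, n_users + n_items))

# e.g., user 3 rated item 7 in a system with 100 users and 500 items
x = basic_feature_vector(3, 7, n_users=100, n_items=500)
```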
The ‘Slice and Train’ method creates feature vectors x only for the ratings in the target slice (i.e., the slice of interest). The rating information from the other slices is exploited indirectly by extending the feature vectors that are created for the target slice. Using this method, the accuracy of the recommender is preserved, while the training time is significantly reduced, since the number of samples in the target slice is lower than in the original dataset.

Figure 1 shows how the feature vectors in the ‘Slice and Train’ method are constructed. The feature vectors have a binary part, which reflects the corresponding user and item of a rating, and a real-valued auxiliary part, which is constructed using the rating information of the auxiliary slices.

To understand how the auxiliary features are built, assume that the dataset is divided into m slices {S_1, ..., S_m}, with S_1 taken to be the target slice. Let us also assume that the set of items rated by user u in slice S_j is represented by s_j(u). By extending the feature vector x(u, i) with auxiliary features, we can represent it with the following sparse representation:

x(u, i) = {(u, 1), (i, 1), z_2(u), ..., z_m(u)}    (1)

where (u, 1) and (i, 1) form the binary target-slice part and z_2(u), ..., z_m(u) form the auxiliary part. Each z_j(u) is a sparse representation of the auxiliary features from slice j and is defined as:

z_j(u) = {(l, ϕ_j(u, l)) : l ∈ s_j(u)}    (2)

where ϕ_j(u, l) is a normalization function that defines the value of the auxiliary features. We define ϕ_j based on the ratings that user u gave to items in slice j, normalized by the average value of the user’s ratings, as follows:

ϕ_j(u, l) = (r_j(u, l) − r̄(u)) / (r_max − r_min) + 1    (3)

where r_j(u, l) indicates the rating of user u for item l in slice j, r̄(u) indicates the average value of user u’s ratings, and r_max and r_min indicate the maximum and minimum possible rating values.
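Under the assumption that items are globally indexed (slices are mutually exclusive in items, so auxiliary item ids never collide with target-slice item ids), a minimal sketch of Equations (1)-(3) could look as follows; the helper names are ours, not part of the original implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix

def phi(r_ul, r_bar_u, r_max=5.0, r_min=1.0):
    """Eq. (3): normalized value of one auxiliary feature,
    assuming a 1-5 rating scale by default."""
    return (r_ul - r_bar_u) / (r_max - r_min) + 1.0

def extended_feature_vector(u, i, n_users, n_items, aux_ratings, r_bar_u):
    """Extended sparse input of Eq. (1) for one target-slice rating.

    aux_ratings maps item l -> r_j(u, l), pooled over all auxiliary
    slices; r_bar_u is the average rating of user u.
    """
    cols = [u, n_users + i]             # binary target-slice part
    vals = [1.0, 1.0]
    for l, r in aux_ratings.items():    # real-valued auxiliary part z_j(u)
        cols.append(n_users + l)
        vals.append(phi(r, r_bar_u))
    rows = np.zeros(len(cols), dtype=int)
    return csr_matrix((vals, (rows, cols)), shape=(1, n_users + n_items))
```

Each such row, paired with its rating y, would then be passed to an FM trainer; libFM, for instance, reads this kind of sparse data from libSVM-style text files.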
3. DATASET AND EXPERIMENTS

In this work we tested our method on two benchmark datasets: the MovieLens 1M dataset (http://grouplens.org/datasets/movielens/) and the Amazon reviews dataset [1]. The Amazon dataset contains product ratings in four different groups of items, namely books, music CDs, DVDs, and video tapes. We use these four groups as the natural slices that exist in this dataset. For the MovieLens dataset we build slices by clustering movies based on their genres using a k-means clustering algorithm. Various numbers of clusters (i.e., slices) can be made for this dataset; via exploratory experiments we found that two or three slices perform well.

In order to test our approach, the data in the target slices is divided into 75% training and 25% test data. For every experiment, 10% of the training data is used only as validation data to tune the hyper-parameters. Factorization Machines can be trained using three different learning methods [4]: Stochastic Gradient Descent (SGD), Alternating Least Squares (ALS), and Markov Chain Monte Carlo (MCMC). In this paper we report only the results obtained with the MCMC learning method, due to space limitations and because it usually performs better than the other two methods.

The experiments are implemented using the WrapRec [2] and LibFM [4] open-source toolkits. Furthermore, we compare the performance of FM with Matrix Factorization (MF) to ensure our FM-based solution is competitive with other state-of-the-art methods. We used the implementation of MF in the MyMediaLite toolkit (http://www.mymedialite.net).

Table 1 presents the performance of our sliced training method compared with the situation in which no slicing is done. The experiments are evaluated using the RMSE and MAE evaluation metrics. The reported results are the metrics averaged over all slices, with each slice in turn considered as the target slice. The four setups that are compared are the following:

• FM-ALL: FM applied to the complete dataset.
• MF-ALL: MF applied to the complete dataset.
• FM-SLICE: FM applied to independent slices. No auxiliary features are used in this setup.
• FM-SLICE-AUX: FM applied to slices with feature vectors extended by auxiliary features derived from the auxiliary slices.

Table 1: The performance of the proposed ‘Slice and Train’ method compared to other experimental setups.

  Setup           RMSE (Amazon)   RMSE (MovieLens)   MAE (Amazon)   MAE (MovieLens)
  FM-ALL              1.1610           0.8894            0.9533          0.8387
  MF-ALL              1.2927           0.8939            1.014           0.8411
  FM-SLICE            1.2330           0.8974            0.9915          0.8427
  FM-SLICE-AUX        1.0539           0.8644            0.9017          0.8260

The first two lines of Table 1 report results when all the data are used. The effectiveness of FM as our model is confirmed when we compare the FM-ALL baseline with MF-ALL, the Matrix-Factorization-based method. Comparing the remaining lines yields some interesting insights. Not surprisingly, when we train the model only on a target slice without using any auxiliary information (FM-SLICE), the performance of the model drops. This drop can be attributed to the smaller number of samples being used for training. However, when the auxiliary features are exploited with the ‘Slice and Train’ method (FM-SLICE-AUX), the performance becomes even better than when the complete data is used (FM-ALL). We attribute this improvement to the items in the slices being more homogeneous than the dataset as a whole. However, relying on homogeneity alone does not lead to the best performance (FM-SLICE). Instead, the best performance is achieved when slicing is combined with auxiliary features. Furthermore, the time complexity of training is significantly reduced when the model is trained on a target slice. In our setup, training on one slice with auxiliary features, compared to training on the complete dataset, reduces the training time from 7431 ms to 1605 ms on the Amazon dataset and from 14569 ms to 6574 ms on the MovieLens dataset.
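For concreteness, the slicing and splitting steps of this pipeline might look as follows with scikit-learn; this is a sketch under our own naming, not the code behind the experiments (which used WrapRec, LibFM, and MyMediaLite):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def make_slices(genre_matrix, n_slices=3, seed=0):
    """Cluster movies into slices by genre. genre_matrix is a binary
    (n_items, n_genres) array; returns a slice id for every item."""
    km = KMeans(n_clusters=n_slices, random_state=seed, n_init=10)
    return km.fit_predict(genre_matrix)

def split_target_slice(ratings, item_slice, target):
    """Keep the (u, i, r) triples whose item falls in the target
    slice, then split them 75% train / 25% test."""
    in_slice = [t for t in ratings if item_slice[t[1]] == target]
    return train_test_split(in_slice, test_size=0.25, random_state=0)
```

The per-slice training data would then be converted to the extended feature vectors of Section 2 and trained with an FM solver (MCMC in our experiments).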
4. CONCLUSION

In this paper, we presented a brief overview of the ‘Slice and Train’ method, which trains a recommender model on a sensible subset of items (the target slice) using Factorization Machines, and exploits the rest of the information indirectly. This sort of targeted training yields improvements in both performance and time complexity.

Acknowledgment

This work is supported by funding from the EU FP7 project CrowdRec, under grant agreement no. 610594.

5. REFERENCES

[1] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Trans. Web, 1(1), 2007.
[2] B. Loni and A. Said. WrapRec: An easy extension of recommender system libraries. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys ’14, 2014.
[3] B. Loni, Y. Shi, M. Larson, and A. Hanjalic. Cross-domain collaborative filtering with factorization machines. In Proceedings of the 36th European Conference on Information Retrieval, ECIR ’14, 2014.
[4] S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3), May 2012.
[5] S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’11, 2011.