Recommendation with the Right Slice: Speeding Up Collaborative Filtering with Factorization Machines

Babak Loni¹, Martha Larson¹, Alexandros Karatzoglou², Alan Hanjalic¹
¹ Delft University of Technology, Delft, Netherlands
² Telefonica Research, Barcelona, Spain
{b.loni, m.a.larson}@tudelft.nl, alexk@tid.es, a.hanjalic@tudelft.nl

ABSTRACT

We propose an alternative way to efficiently exploit rating data for collaborative filtering with Factorization Machines (FMs). Our approach partitions the user-item matrix into ‘slices’ which are mutually exclusive with respect to items. The training phase makes direct use of the slice of interest (the target slice), while incorporating information from the other slices indirectly. FMs represent user-item interactions as feature vectors, and they offer the advantage of easy incorporation of complementary information. We exploit this advantage to integrate information from the other, auxiliary slices. We demonstrate, using experiments on two benchmark datasets, that improved performance can be achieved, while the time complexity of training can be reduced significantly.

1. INTRODUCTION

In this paper, we investigate the idea that the ‘right’ data, rather than all data, should be exploited to build an efficient recommender system. We introduce an approach called ‘Slice and Train’ that trains a model on the right slice of a dataset (a ‘sensible’ subset containing the current items that need to be recommended) and exploits the information in the rest of the dataset indirectly. This approach is particularly interesting in scenarios in which recommendations are needed only for a particular subset of the overall item set. An example is an e-commerce website that does not generate recommendations for out-of-season or discontinued products. The obvious advantage of this approach is that models are trained on a much smaller dataset (only the data in the slice), leading to shorter training times.

The ‘Slice and Train’ approach also has, however, a less expected advantage: it offers highly competitive performance with conventional approaches that make use of the whole dataset, and in some cases it even improves on them. This means that it can be helpful to apply ‘Slice and Train’ even when predictions are needed for all items in the dataset, by training a series of separate models, one for each slice.

Our approach consists of two steps. First, the data is partitioned into a set of slices containing mutually exclusive items. The slices can be formed by grouping items based on their properties (e.g., item category or availability), or by more advanced slicing methods such as clustering. In the second step, the model is trained using the samples in the slice of interest, i.e., the target slice, while the other slices are exploited indirectly as auxiliary slices. To efficiently exploit information from the auxiliary slices, our approach trains the recommender model using Factorization Machines [4].

Factorization Machines (FMs) generate recommendations by working with vector representations of user-item data. A benefit of these representations is that they can easily be extended with additional features. Such extensions are usually used to leverage additional information [5], or additional domains [3], to improve collaborative filtering. Here, the ‘additional’ information is actually drawn from different slices within the same dataset.

[Figure 1: Feature construction in the ‘Slice and Train’ method.]

2. THE SLICE AND TRAIN METHOD

The ‘Slice and Train’ method is implemented with Factorization Machines [4]. FMs are general factorization models which can easily be adapted to different scenarios without requiring scenario-specific models and learning algorithms. In a rating prediction scenario with FMs, user-item interactions are represented by a feature vector x, and the rating is taken as the output y. By learning the model, the rating y can be predicted for unknown user-item interactions. In other words, if user u rated item i, the feature vector x can be represented by its sparse binary representation x(u, i) = {(u, 1), (i, 1)}, where the non-zero elements correspond to the user u and the item i.
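To illustrate, here is a minimal sketch of how such a sparse binary input vector could be assembled, assuming a single index space with user indices first and item indices after; the function and variable names are ours, for illustration only:

```python
import numpy as np
from scipy.sparse import csr_matrix

def basic_feature_vector(u, i, n_users, n_items):
    """Sparse binary FM input x(u, i) = {(u, 1), (i, 1)}: a one-hot
    user block followed by a one-hot item block."""
    cols = np.array([u, n_users + i])   # the two non-zero positions
    vals = np.ones(2)                   # both indicators are 1
    rows = np.zeros(2, dtype=int)       # a single-row vector
    return csr_matrix((vals, (rows, cols)), shape=(1, n_users + n_items))

# e.g., user 3 rated item 7 in a system with 100 users and 500 items
x = basic_feature_vector(3, 7, n_users=100, n_items=500)
```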
The ‘Slice and Train’ method creates feature vectors x only for the ratings in the target slice (i.e., the slice of interest). The rating information from the other slices is exploited indirectly by extending the feature vectors that are created for the target slice. Using this method, the accuracy of the recommender is preserved, while the training time is significantly reduced, since the number of samples in the target slice is lower than in the original dataset.

Figure 1 shows how the feature vectors in the ‘Slice and Train’ method are constructed. The feature vectors have a binary part, which reflects the corresponding user and item of a rating, and a real-valued auxiliary part, which is constructed using the rating information of the auxiliary slices.

To understand how the auxiliary features are built, assume that the dataset is divided into m slices {S_1, ..., S_m}, with S_1 taken to be the target slice. Let us also assume that the set of items rated by user u in slice S_j is represented by s_j(u). By extending the feature vector x(u, i) with auxiliary features, we can represent it with the following sparse representation:

x(u, i) = {(u, 1), (i, 1), z_2(u), ..., z_m(u)}    (1)

where (u, 1) and (i, 1) form the binary target-slice part and z_2(u), ..., z_m(u) form the auxiliary part. Each z_j(u) is a sparse representation of the auxiliary features from slice j and is defined as:

z_j(u) = {(l, ϕ_j(u, l)) : l ∈ s_j(u)}    (2)

where ϕ_j(u, l) is a normalization function that defines the value of the auxiliary features. We define ϕ_j based on the ratings that user u gave to items in slice j, normalized by the average value of the user’s ratings, as follows:

ϕ_j(u, l) = (r_j(u, l) − r̄(u)) / (r_max − r_min) + 1    (3)

where r_j(u, l) indicates the rating of user u for item l in slice j, r̄(u) indicates the average value of user u’s ratings, and r_max and r_min indicate the maximum and minimum possible rating values.
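Under the assumption that items are globally indexed (slices are mutually exclusive in items, so auxiliary item ids never collide with target-slice item ids), a minimal sketch of Equations (1)-(3) could look as follows; the helper names are ours, not part of the original implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix

def phi(r_ul, r_bar_u, r_max=5.0, r_min=1.0):
    """Eq. (3): normalized value of one auxiliary feature,
    assuming a 1-5 rating scale by default."""
    return (r_ul - r_bar_u) / (r_max - r_min) + 1.0

def extended_feature_vector(u, i, n_users, n_items, aux_ratings, r_bar_u):
    """Extended sparse input of Eq. (1) for one target-slice rating.

    aux_ratings maps item l -> r_j(u, l), pooled over all auxiliary
    slices; r_bar_u is the average rating of user u.
    """
    cols = [u, n_users + i]             # binary target-slice part
    vals = [1.0, 1.0]
    for l, r in aux_ratings.items():    # real-valued auxiliary part z_j(u)
        cols.append(n_users + l)
        vals.append(phi(r, r_bar_u))
    rows = np.zeros(len(cols), dtype=int)
    return csr_matrix((vals, (rows, cols)), shape=(1, n_users + n_items))
```

Each such row, paired with its rating y, would then be passed to an FM trainer; libFM, for instance, reads this kind of sparse data from libSVM-style text files.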
3. DATASET AND EXPERIMENTS

In this work we tested our method on two benchmark datasets: the MovieLens 1M dataset (http://grouplens.org/datasets/movielens/) and the Amazon reviews dataset [1]. The Amazon dataset contains product ratings in four different groups of items, namely books, music CDs, DVDs, and video tapes. We use these four groups as the natural slices that exist in this dataset. For the MovieLens dataset we build slices by clustering movies based on their genres using a k-means clustering algorithm. Various numbers of clusters (i.e., slices) can be made for this dataset; via exploratory experiments we found that two or three slices perform well.

In order to test our approach, the data in the target slices is divided into 75% training and 25% test data. For every experiment, 10% of the training data is used only as validation data to tune the hyper-parameters. Factorization Machines can be trained using three different learning methods [4]: Stochastic Gradient Descent (SGD), Alternating Least Squares (ALS), and Markov Chain Monte Carlo (MCMC). In this paper we report only the results obtained with the MCMC learning method, due to space limitations and because it usually performs better than the other two methods.

The experiments are implemented using the WrapRec [2] and LibFM [4] open-source toolkits. Furthermore, we compare the performance of FM with Matrix Factorization (MF) to ensure our FM-based solution is competitive with other state-of-the-art methods. We used the implementation of MF in the MyMediaLite toolkit (http://www.mymedialite.net).

Table 1 presents the performance of our sliced training method compared with the situation in which no slicing is done. The experiments are evaluated using the RMSE and MAE evaluation metrics. The reported results are the metrics averaged over all slices, with each slice in turn considered as the target slice. The four setups that are compared are the following:

• FM-ALL: FM applied to the complete dataset.
• MF-ALL: MF applied to the complete dataset.
• FM-SLICE: FM applied to independent slices. No auxiliary features are used in this setup.
• FM-SLICE-AUX: FM applied to slices with feature vectors extended by auxiliary features derived from the auxiliary slices.

Table 1: The performance of the proposed ‘Slice and Train’ method compared to other experimental setups.

  Setup           RMSE (Amazon)   RMSE (MovieLens)   MAE (Amazon)   MAE (MovieLens)
  FM-ALL              1.1610           0.8894            0.9533          0.8387
  MF-ALL              1.2927           0.8939            1.014           0.8411
  FM-SLICE            1.2330           0.8974            0.9915          0.8427
  FM-SLICE-AUX        1.0539           0.8644            0.9017          0.8260

The first two lines of Table 1 report results when all the data are used. The effectiveness of FM as our model is confirmed when we compare the FM-ALL baseline with MF-ALL, the Matrix-Factorization-based method. Comparing the remaining lines yields some interesting insights. Not surprisingly, when we train the model only on a target slice without using any auxiliary information (FM-SLICE), the performance of the model drops. This drop can be attributed to the smaller number of samples being used for training. However, when the auxiliary features are exploited with the ‘Slice and Train’ method (FM-SLICE-AUX), the performance becomes even better than when the complete data is used (FM-ALL). We attribute this improvement to the items in the slices being more homogeneous than the dataset as a whole. However, relying on homogeneity alone does not lead to the best performance (FM-SLICE). Instead, the best performance is achieved when slicing is combined with auxiliary features. Furthermore, the time complexity of training is significantly reduced when the model is trained on a target slice. In our setup, training on one slice with auxiliary features, compared to training on the complete dataset, reduces the training time from 7431 ms to 1605 ms on the Amazon dataset and from 14569 ms to 6574 ms on the MovieLens dataset.
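For concreteness, the slicing and splitting steps of this pipeline might look as follows with scikit-learn; this is a sketch under our own naming, not the code behind the experiments (which used WrapRec, LibFM, and MyMediaLite):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def make_slices(genre_matrix, n_slices=3, seed=0):
    """Cluster movies into slices by genre. genre_matrix is a binary
    (n_items, n_genres) array; returns a slice id for every item."""
    km = KMeans(n_clusters=n_slices, random_state=seed, n_init=10)
    return km.fit_predict(genre_matrix)

def split_target_slice(ratings, item_slice, target):
    """Keep the (u, i, r) triples whose item falls in the target
    slice, then split them 75% train / 25% test."""
    in_slice = [t for t in ratings if item_slice[t[1]] == target]
    return train_test_split(in_slice, test_size=0.25, random_state=0)
```

The per-slice training data would then be converted to the extended feature vectors of Section 2 and trained with an FM solver (MCMC in our experiments).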
4. CONCLUSION

In this paper, we presented a brief overview of the ‘Slice and Train’ method, which trains a recommender model on a sensible subset of items (the target slice) using Factorization Machines, and exploits the rest of the information indirectly. This sort of targeted training yields improvements in both performance and time complexity.

Acknowledgment

This work is supported by funding from the EU FP7 project CrowdRec, under grant agreement no. 610594.

5. REFERENCES

[1] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Trans. Web, 1(1), 2007.
[2] B. Loni and A. Said. WrapRec: An easy extension of recommender system libraries. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys ’14, 2014.
[3] B. Loni, Y. Shi, M. Larson, and A. Hanjalic. Cross-domain collaborative filtering with factorization machines. In Proceedings of the 36th European Conference on Information Retrieval, ECIR ’14, 2014.
[4] S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3), May 2012.
[5] S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’11, 2011.