<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recommendation with the Right Slice: Speeding Up Collaborative Filtering with Factorization Machines</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Babak Loni</string-name>
          <email>b.loni@tudelft.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martha Larson</string-name>
          <email>m.a.larson@tudelft.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandros Karatzoglou</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan Hanjalic</string-name>
          <email>a.hanjalic@tudelft.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Delft University of Technology</institution>
          ,
          <addr-line>Delft</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Telefonica Research</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <abstract>
        <p>We propose an alternative way to exploit rating data efficiently for collaborative filtering with Factorization Machines (FMs). Our approach partitions the user-item matrix into 'slices' that are mutually exclusive with respect to items. The training phase makes direct use of the slice of interest (the target slice), while incorporating information from the other slices indirectly. FMs represent user-item interactions as feature vectors, which makes it easy to incorporate complementary information. We exploit this advantage to integrate information from the other, auxiliary slices. Experiments on two benchmark datasets demonstrate that performance can be improved while the time complexity of training is reduced significantly.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>In this paper, we investigate the idea that the ‘right’ data, rather
than all data, should be exploited to build an efficient recommender
system. We introduce an approach called ‘Slice and Train’ that
trains a model on the right slice of a dataset (a ‘sensible’ subset
containing the current items that need to be recommended) and
exploits the information in the rest of the dataset indirectly. This
approach is particularly interesting in scenarios in which
recommendations are needed only for a particular subset of the overall
item set. An example is an e-commerce website that does not
generate recommendations for out-of-season or discontinued products.
The obvious advantage of this approach is that models are trained
on a much smaller dataset (only the data in the slice), leading to
shorter training times.</p>
      <p>The ‘Slice and Train’ approach also has a less expected
advantage: its performance is highly competitive with that of
conventional approaches that make use of the whole dataset,
and in some cases even better. This advantage
means that it is helpful to apply ‘Slice and Train’ even in cases in
which predictions are needed for all items in the dataset, by training
a series of separate models, one for each slice.</p>
      <p>
        Our approach consists of two steps. First, the data is partitioned
into a set of slices containing mutually exclusive items. The slices
can be formed by grouping items based on their properties (e.g., the
category or availability of the item), or by more advanced slicing
methods such as clustering. In the second step, the model is trained
using the samples in the slice of interest, i.e., the target slice, while
the other slices are exploited indirectly as auxiliary slices. To
exploit information from the auxiliary slices efficiently, our approach
trains the recommender model using Factorization Machines [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Factorization Machines (FMs) generate recommendations by
working with vector representations of user-item data. A benefit of these
representations is that they can easily be extended with additional
features. Such extensions are usually used to leverage additional
information [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], or additional domains [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], to improve collaborative
filtering. Here, the ‘additional’ information is actually drawn from
different slices within the same dataset.
      </p>
    </sec>
    <sec id="sec-2">
      <title>THE SLICE AND TRAIN METHOD</title>
      <p>
        The ‘Slice and Train’ method is implemented with
Factorization Machines [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. FMs are general factorization models that
can easily be adapted to different scenarios without requiring
specialized models and learning algorithms. In a rating
prediction scenario with FMs, each user-item interaction is represented
by a feature vector x and the rating is taken as the output y. Once
the model is learned, the rating y can be predicted for unknown
user-item interactions. Specifically, if user u rated item i, the feature
vector x can be written in its sparse binary representation as
<inline-formula><tex-math>x(u,i) = \{(u,1), (i,1)\}</tex-math></inline-formula>, where the non-zero elements correspond to
user u and item i.
      </p>
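      <p>As a rough illustration of the representation described above, the following sketch builds the sparse binary vector x(u, i) and scores it with the standard second-order FM model equation from [4]. The function names, the index layout (users first, then items), and the parameter containers are hypothetical choices made for this example; they are not taken from LibFM or WrapRec.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, List


def encode_interaction(user: int, item: int, n_users: int) -> Dict[int, float]:
    """Sparse binary representation x(u, i) = {(u, 1), (i, 1)}.

    Assumed (hypothetical) index layout: users occupy feature ids
    0 .. n_users - 1, and items follow directly after them.
    """
    return {user: 1.0, n_users + item: 1.0}


def fm_predict(x: Dict[int, float], w0: float, w: List[float],
               v: List[List[float]]) -> float:
    """Second-order FM score: bias + linear terms + factorized pairwise terms.

    The pairwise part uses the usual linear-time identity
    0.5 * sum_f [(sum_j v[j][f] x_j)^2 - sum_j (v[j][f] x_j)^2].
    """
    linear = sum(w[j] * xj for j, xj in x.items())
    n_factors = len(v[0])
    pairwise = 0.0
    for f in range(n_factors):
        s = sum(v[j][f] * xj for j, xj in x.items())
        s_sq = sum((v[j][f] * xj) ** 2 for j, xj in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise
```
      </preformat>
      <p>With only the user and item features active, the pairwise term reduces to the dot product of the user and item factor vectors, which is why this configuration behaves like a biased matrix factorization model.</p>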
      <p>The ‘Slice and Train’ method creates feature vectors x only for
the ratings in the target slice (i.e., the slice of interest). The rating
information from the other slices is exploited indirectly by
extending the feature vectors that are created for the target slice. Using
this method, the accuracy of the recommender is preserved, while
the training time is significantly reduced, since the number of
samples in the target slice is lower than in the original dataset.</p>
      <p>Figure 1 shows how the feature vectors in the ‘Slice and Train’
method are constructed. The feature vectors have a binary part,
which reflects the corresponding user and item of a rating, and a
real-valued auxiliary part, which is constructed by using the rating
information of auxiliary slices.</p>
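      <p>To understand how the auxiliary features are built, assume that
the dataset is divided into m slices <inline-formula><tex-math>\{S_1, \ldots, S_m\}</tex-math></inline-formula>, with <inline-formula><tex-math>S_1</tex-math></inline-formula> as the target slice. Let
<inline-formula><tex-math>s_j(u)</tex-math></inline-formula> denote the set of items rated by user u in slice <inline-formula><tex-math>S_j</tex-math></inline-formula>.
By extending the feature vector x(u, i) with auxiliary
features, we obtain the following sparse
representation:
<disp-formula><label>(1)</label><tex-math>x(u,i) = \{\underbrace{(u,1), (i,1)}_{\text{target slice}}, \underbrace{z_2(u), \ldots, z_m(u)}_{\text{auxiliary slices}}\}</tex-math></disp-formula>
where <inline-formula><tex-math>z_j(u)</tex-math></inline-formula> is the sparse representation of the auxiliary features from slice
j, defined as:
<disp-formula><label>(2)</label><tex-math>z_j(u) = \{(l, \phi_j(u,l)) : l \in s_j(u)\}</tex-math></disp-formula>
where <inline-formula><tex-math>\phi_j(u,l)</tex-math></inline-formula> is a normalization function that defines the value of
the auxiliary features. We define <inline-formula><tex-math>\phi_j</tex-math></inline-formula> based on the ratings that user u
gave to items in slice j and normalize it based on the average value
of the user's ratings as follows:
<disp-formula><label>(3)</label><tex-math>\phi_j(u,l) = \frac{r_j(u,l) - \bar{r}(u)}{r_{\max} - r_{\min}}</tex-math></disp-formula>
where <inline-formula><tex-math>r_j(u,l)</tex-math></inline-formula> indicates the rating of user u for item l in slice j, <inline-formula><tex-math>\bar{r}(u)</tex-math></inline-formula>
indicates the average value of the user's ratings, and <inline-formula><tex-math>r_{\max}</tex-math></inline-formula> and <inline-formula><tex-math>r_{\min}</tex-math></inline-formula>
indicate the maximum and minimum possible rating values.</p>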
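      <p>The construction of the auxiliary features can be sketched as follows. This is a minimal illustration under stated assumptions: the normalization is taken to be the user's mean-centered rating divided by the rating range, the 1 to 5 rating scale is only a default for the example, and the string feature keys and all function names are hypothetical rather than part of any toolkit.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user, item) -> rating


def phi(rating: float, user_mean: float, r_min: float, r_max: float) -> float:
    """Assumed normalization: (rating - user's mean rating) / (r_max - r_min)."""
    return (rating - user_mean) / (r_max - r_min)


def auxiliary_features(user: int, aux_ratings: Ratings, user_mean: float,
                       r_min: float, r_max: float) -> Dict[int, float]:
    """z_j(u): one (item, normalized rating) entry per item that the user
    rated in a single auxiliary slice."""
    return {item: phi(r, user_mean, r_min, r_max)
            for (u, item), r in aux_ratings.items() if u == user}


def extended_vector(user: int, item: int, aux_slices: List[Ratings],
                    user_mean: float, r_min: float = 1.0,
                    r_max: float = 5.0) -> Dict[str, float]:
    """x(u, i): the binary user and item indicators followed by the
    real-valued auxiliary features of slices 2 .. m."""
    x = {f"user:{user}": 1.0, f"item:{item}": 1.0}
    for j, aux in enumerate(aux_slices, start=2):
        z_j = auxiliary_features(user, aux, user_mean, r_min, r_max)
        for aux_item, value in z_j.items():
            x[f"aux{j}:{aux_item}"] = value
    return x
```
      </preformat>
      <p>Note that the number of feature vectors stays equal to the number of ratings in the target slice; the auxiliary slices only widen each vector.</p>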
    </sec>
    <sec id="sec-3">
      <title>DATASET AND EXPERIMENTS</title>
      <p>
        We tested our method on two benchmark datasets:
the MovieLens 1M dataset and the Amazon reviews dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The
Amazon dataset contains product ratings for four different groups of
items, namely books, music CDs, DVDs, and video tapes. We use
these four groups as the natural slices of this dataset. For the
MovieLens dataset, we build slices by clustering movies based on
their genres using a k-means clustering algorithm. Various numbers
of clusters (i.e., slices) can be made for this dataset. In exploratory
experiments, we found that two or three slices perform well.
      </p>
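      <p>Once every item has a slice label, whether a product group as in the Amazon dataset or a cluster id as in the MovieLens dataset, the slicing step itself is a simple partition of the ratings. The sketch below illustrates this; the function name and the shape of the input triples are assumptions made for the example.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Rating = Tuple[int, int, float]  # (user, item, rating)


def slice_ratings(ratings: Iterable[Rating],
                  item_to_slice: Dict[int, Hashable]) -> Dict[Hashable, List[Rating]]:
    """Partition rating triples by the slice label of their item.

    Each item carries exactly one label, so the resulting slices are
    mutually exclusive with respect to items, while a user may appear
    in several slices.
    """
    slices = defaultdict(list)
    for user, item, rating in ratings:
        slices[item_to_slice[item]].append((user, item, rating))
    return dict(slices)
```
      </preformat>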
      <p>
        To test our approach, the data in each target slice is divided
into 75% training and 25% test data. In every experiment, 10%
of the training data is used only as validation data to tune the
hyperparameters. Factorization Machines can be trained using three
different learning methods [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: Stochastic Gradient Descent (SGD),
Alternating Least Squares (ALS), and Markov Chain Monte Carlo
(MCMC). In this paper, we report only the results obtained with the MCMC
learning method, due to space limitations and because it usually
performs better than the other two methods.
      </p>
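      <p>The data split described above can be sketched as follows. The sketch assumes a uniformly random split and assumes that the validation portion is removed from the data used for fitting; the function name, the seed, and the rounding of the split sizes are illustrative choices.</p>
      <preformat preformat-type="code">
```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_target_slice(samples: Sequence[T], test_frac: float = 0.25,
                       valid_frac: float = 0.10,
                       seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Split the ratings of one target slice into train, validation and test.

    First 25% of the shuffled samples are set aside for testing; then 10%
    of the remaining training data is held out for hyperparameter tuning.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test, train_full = shuffled[:n_test], shuffled[n_test:]
    n_valid = round(len(train_full) * valid_frac)
    valid, train = train_full[:n_valid], train_full[n_valid:]
    return train, valid, test
```
      </preformat>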
      <p>
        The experiments are implemented using WrapRec [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and LibFM
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] open source toolkits. Furthermore, we compare the
performance of FM with Matrix Factorization (MF) to ensure that our
FM-based solution is competitive with other state-of-the-art methods.
We used the implementation of MF in the MyMediaLite toolkit.
      </p>
      <p>Table 1 presents the performance of our sliced training method
compared with the case in which no slicing is done. The
experiments are evaluated using the RMSE and MAE metrics.
The reported results are the metrics averaged over all slices, where
each slice in turn is considered as the target slice. The following four
setups are compared:</p>
      <p>FM-ALL: FM applied on the complete dataset.</p>
      <p>MF-ALL: MF applied on the complete dataset.</p>
      <p>FM-SLICE: FM applied on independent slices. No auxiliary
features are used for this setup.</p>
      <p>FM-SLICE-AUX: FM applied to slices with feature vectors
extended by auxiliary features derived from auxiliary slices.</p>
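      <p>For reference, the two evaluation metrics used for these setups, and the averaging over target slices, can be sketched as follows. The unweighted mean over slices is an assumption; the text states only that the metrics are averaged over all slices.</p>
      <preformat preformat-type="code">
```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between true and predicted ratings."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_over_slices(per_slice_scores: Sequence[float]) -> float:
    """Unweighted mean of a metric over all target slices (an assumption)."""
    return sum(per_slice_scores) / len(per_slice_scores)
```
      </preformat>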
      <p>The first two lines of Table 1 report results when all the data are
used. The effectiveness of FM as our model is confirmed when we
compare the FM-ALL baseline with MF-ALL, the
Matrix-Factorization-based method. Comparing the remaining lines yields some
interesting insights. Not surprisingly, when we train the model only on
a target slice without using any auxiliary information (FM-SLICE),
the performance of the model drops. This drop can be attributed
to the smaller number of samples used for training.
However, when the auxiliary features are exploited with the ‘Slice
and Train’ method (FM-SLICE-AUX), the performance becomes
even better than when the complete data is used
(FM-ALL). We attribute this improvement to the items in the slices
being more homogeneous than the dataset as a whole. However,
relying on homogeneity alone does not lead to the best performance
(FM-SLICE). Instead, the best performance is achieved when
slicing is combined with auxiliary features. Furthermore, the
time complexity of training is significantly reduced when the model is
trained on a target slice. In our setup, the training time of one slice
including auxiliary features, compared to the training time on the
complete dataset, falls from 7431 ms to 1605 ms for the Amazon dataset
and from 14569 ms to 6574 ms for the MovieLens dataset, i.e., by factors
of about 4.6 and 2.2, respectively.</p>
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION</title>
      <p>In this paper, we presented a brief overview of the ‘Slice and
Train’ method, which trains a recommender model on a sensible
subset of items (the target slice) using Factorization Machines and
exploits the remaining rating information indirectly. This sort of targeted training
yields improvements in both prediction performance and training time.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>This work is supported by funding from the EU FP7 programme under grant
agreement no. 610594 (CrowdRec).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Adamic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Huberman</surname>
          </string-name>
          .
          <article-title>The dynamics of viral marketing</article-title>
          .
          <source>ACM Trans. Web</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Loni</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Said</surname>
          </string-name>
          .
          <article-title>WrapRec: An easy extension of recommender system libraries</article-title>
          .
          <source>In Proceedings of 8th ACM International Conference of Recommender Systems</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Loni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanjalic</surname>
          </string-name>
          .
          <article-title>Cross-domain collaborative filtering with factorization machines</article-title>
          .
          <source>In Proceedings of the 36th European Conference on Information Retrieval, ECIR '14</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rendle</surname>
          </string-name>
          .
          <article-title>Factorization machines with libFM</article-title>
          .
          <source>ACM Trans. Intell. Syst. Technol.</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ), May
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gantner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Freudenthaler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <article-title>Fast context-aware recommendations with factorization machines</article-title>
          .
          <source>In Proceedings of the 34th ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '11. ACM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>