<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scalable Bayesian Matrix Factorization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Avijit Saha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Rishabh Misra</string-name>
          <email>rishabhmisra1994@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Balaraman Ravindran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of CSE, Indian Institute of Technology Madras</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of CSE, Thapar University</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <fpage>43</fpage>
      <lpage>54</lpage>
      <abstract>
<p>Matrix factorization (MF) is the simplest and most well studied factor based model and has been applied successfully in several domains. One of the standard ways to solve MF is by finding the maximum a posteriori estimate of the model parameters, which is equivalent to minimizing a regularized objective function. Stochastic gradient descent (SGD) is a common choice for this minimization. However, SGD suffers from the problem of overfitting and entails the tedious job of finding the learning rate and regularization parameters. A fully Bayesian treatment of MF avoids these problems. However, the existing Bayesian matrix factorization method based on the Markov chain Monte Carlo (MCMC) technique has cubic time complexity with respect to the target rank, which makes it less scalable. In this paper, we propose the Scalable Bayesian Matrix Factorization (SBMF), an MCMC Gibbs sampling algorithm for MF that has linear time complexity with respect to the target rank and linear space complexity with respect to the number of non-zero observations. We also show, through extensive experiments on three sufficiently large real world datasets, that SBMF incurs only a small loss in performance and takes much less time than the baseline method for higher latent dimensions.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Matrix Factorization</kwd>
        <kwd>Bayesian Inference</kwd>
        <kwd>Markov Chain Monte Carlo</kwd>
        <kwd>Scalability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Factor based models have been used extensively in collaborative filtering. In a
factor based model, preferences of each user are represented by a latent factor
vector. Matrix factorization (MF) [1–6] is the simplest and most well studied
factor based model and has been applied successfully in several domains.
Formally, MF recovers a low-rank latent structure of a matrix by approximating it
as a product of two low-rank matrices. For delineation, consider a user-movie
matrix $R \in \mathbb{R}^{I \times J}$ where the cell $r_{ij}$ represents the rating provided to the $j$th
movie by the $i$th user. MF decomposes the matrix $R$ into two low-rank matrices
$U = [u_1, u_2, \ldots, u_I]^T \in \mathbb{R}^{I \times K}$ and $V = [v_1, v_2, \ldots, v_J]^T \in \mathbb{R}^{J \times K}$ ($K$ is the latent
space dimension) such that:</p>
      <p>$R \approx U V^T. \quad (1)$</p>
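      <p>For concreteness, here is a toy Python/NumPy fragment illustrating the product in Eq. (1); the sizes and names are arbitrary choices of this sketch, not from the paper:</p>
      <preformat>
import numpy as np

# Build a rank-K matrix as the product of two low-rank factors (Eq. (1)).
rng = np.random.default_rng(0)
I, J, K = 6, 4, 2
U = rng.standard_normal((I, K))   # one K-dimensional row per user
V = rng.standard_normal((J, K))   # one K-dimensional row per movie
R = U @ V.T                       # I x J matrix; entry (i, j) is u_i . v_j
assert np.linalg.matrix_rank(R) == K   # rank is (almost surely) exactly K
      </preformat>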
      <p>
        Probabilistic Matrix Factorization (PMF) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provides a probabilistic
interpretation for MF. In PMF, latent factor vectors are assumed to be marginally
independent, whereas rating variables, given the latent factor vectors, are assumed
to be conditionally independent. PMF considers the conditional distribution of
the rating variables (the likelihood term) as:
$$p(R \mid U, V, \alpha^{-1}) = \prod_{(i,j) \in \Omega} \mathcal{N}(r_{ij} \mid u_i^T v_j, \alpha^{-1}), \quad (2)$$
where $\Omega$ is the set of all observed entries in $R$ provided during the training and
$\alpha$ is the model precision. Zero-mean spherical Gaussian priors are placed on the
latent factor vectors of users and movies. The main drawback of this model is
that inferring the posterior distribution over the latent factor vectors, given the
ratings, is intractable. PMF handles this intractability by providing a maximum
a posteriori estimation of the model parameters, maximizing the log-posterior
over the model parameters, which is equivalent to minimizing the regularized
square error loss defined as:
      </p>
      <p>$$\mathcal{L} = \sum_{(i,j) \in \Omega} \left( r_{ij} - u_i^T v_j \right)^2 + \lambda \left( \|U\|_F^2 + \|V\|_F^2 \right), \quad (3)$$</p>
      <p>
        where $\lambda$ is the regularization parameter and $\|X\|_F$ is the Frobenius norm of
$X$. The optimization problem in Eq. (3) can be solved using stochastic gradient
descent (SGD) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. SGD is an online algorithm which obviates the need to store
the entire dataset in memory. Although SGD is scalable and enjoys a local
convergence guarantee [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], it often overfits the data and requires manual tuning
of the learning rate and regularization parameters. Hence, maximum a posteriori
estimation of MF suffers from the problem of overfitting and entails the tedious
job of finding the learning rate (if SGD is the choice of optimization) and
regularization parameters.
      </p>
      <p>
        On the other hand, fully Bayesian methods [5, 8–10] for MF do not require
manual tuning of the learning rate and regularization parameters and are
robust to overfitting. As direct evaluation of the posterior is intractable in practice,
approximate inference techniques are adopted to learn the posterior
distribution. One possible choice is to apply the variational
approximate inference technique [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. Bayesian MF based on the variational
approximation [10–13] considers a simplified factorized distribution and
assumes that the latent factors of users are independent of the latent factors of
items while approximating the posterior. But this assumption often leads to
oversimplification and can produce inaccurate results, as shown in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. On the
other hand, Markov chain Monte Carlo (MCMC) based approximation methods
can produce exact results when provided with infinite resources. The MCMC based
Bayesian Probabilistic Matrix Factorization (BPMF) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] directly approximates
the posterior distribution using the Gibbs sampling technique and outperforms
the variational approximation.
      </p>
      <p>In BPMF, user/item latent factor vectors are assumed to follow a
multivariate Gaussian distribution, which results in cubic time complexity with respect
to the latent factor vector dimension. Though BPMF performs well in many
applications, this cubic time complexity makes it difficult to apply BPMF to
very large datasets. In this paper, we propose the Scalable Bayesian Matrix
Factorization (SBMF) based on MCMC Gibbs sampling, where we assume
univariate Gaussian priors on each dimension of the latent factor. Due to this
assumption, the complexity of SBMF reduces to linear with respect to the latent
factor vector dimension. We also consider user and item bias terms in SBMF,
which are missing in BPMF. These bias terms capture the variation in rating
values that is independent of any user-item interaction. Also, the proposed
SBMF algorithm is parallelized for multicore environments. We show through
extensive experiments on three large scale real world datasets that the adopted
univariate approximation in SBMF results in only a small performance loss and
provides significant speedup when compared with the baseline method BPMF
for higher values of the latent dimension.</p>
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <sec id="sec-2-1">
        <title>Model</title>
        <p>
          SBMF models each observed rating as
$$r_{ij} = \mu + \beta_i + \gamma_j + u_i^T v_j + \epsilon_{ij}, \quad (4)$$
where $(i,j) \in \Omega$, $\mu$ is the global bias, $\beta_i$ is the bias associated with the $i$th
user, $\gamma_j$ is the bias associated with the $j$th item, $u_i$ is the latent factor vector of
dimension $K$ associated with the $i$th user, and $v_j$ is the latent factor vector of
dimension $K$ associated with the $j$th item. Uncertainty in the model is absorbed
by the noise $\epsilon_{ij}$, which is generated as $\epsilon_{ij} \sim \mathcal{N}(0, \tau^{-1})$, where $\tau$ is the precision
parameter. Bias terms are particularly helpful in capturing the individual bias
for a user/item: a user may have the tendency to rate all items higher than
other users do, or an item may get higher ratings if it is perceived better than
the others [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>The conditional distribution over the observed entries of $R$ (the likelihood term) can be
written as follows:
$$p(R \mid \Theta) = \prod_{(i,j) \in \Omega} \mathcal{N}(r_{ij} \mid \mu + \beta_i + \gamma_j + u_i^T v_j, \tau^{-1}), \quad (5)$$
where $\Theta = \{\tau, \mu, \{\beta_i\}, \{\gamma_j\}, U, V\}$. We place independent univariate priors on
all the model parameters in $\Theta$ as follows:
$$p(\mu) = \mathcal{N}(\mu \mid \mu_g, \nu_g^{-1}), \quad (6)$$
$$p(\beta_i) = \mathcal{N}(\beta_i \mid \mu_\beta, \nu_\beta^{-1}), \quad (7)$$
$$p(\gamma_j) = \mathcal{N}(\gamma_j \mid \mu_\gamma, \nu_\gamma^{-1}), \quad (8)$$
$$p(U) = \prod_{i=1}^{I} \prod_{k=1}^{K} \mathcal{N}(u_{ik} \mid \mu_{u_k}, \nu_{u_k}^{-1}), \quad (9)$$
$$p(V) = \prod_{j=1}^{J} \prod_{k=1}^{K} \mathcal{N}(v_{jk} \mid \mu_{v_k}, \nu_{v_k}^{-1}), \quad (10)$$
$$p(\tau) = \mathrm{Gamma}(\tau \mid a_0, b_0), \quad (11)$$
and place Normal-Gamma hyperpriors on the hyperparameters
$H = \{\mu_\beta, \nu_\beta, \mu_\gamma, \nu_\gamma, \{\mu_{u_k}, \nu_{u_k}\}_{k=1}^{K}, \{\mu_{v_k}, \nu_{v_k}\}_{k=1}^{K}\}$:
$$p(\mu_\beta, \nu_\beta) = \mathcal{N}(\mu_\beta \mid \mu_0, (\nu_0 \nu_\beta)^{-1})\, \mathrm{Gamma}(\nu_\beta \mid \alpha_0, \beta_0), \quad (12)$$
$$p(\mu_\gamma, \nu_\gamma) = \mathcal{N}(\mu_\gamma \mid \mu_0, (\nu_0 \nu_\gamma)^{-1})\, \mathrm{Gamma}(\nu_\gamma \mid \alpha_0, \beta_0), \quad (13)$$
$$p(\mu_{u_k}, \nu_{u_k}) = \mathcal{N}(\mu_{u_k} \mid \mu_0, (\nu_0 \nu_{u_k})^{-1})\, \mathrm{Gamma}(\nu_{u_k} \mid \alpha_0, \beta_0), \quad (14)$$
$$p(\mu_{v_k}, \nu_{v_k}) = \mathcal{N}(\mu_{v_k} \mid \mu_0, (\nu_0 \nu_{v_k})^{-1})\, \mathrm{Gamma}(\nu_{v_k} \mid \alpha_0, \beta_0). \quad (15)$$
We denote $\{a_0, b_0, \mu_g, \nu_g, \mu_0, \nu_0, \alpha_0, \beta_0\}$ as $\Theta_0$ for notational convenience. The
joint distribution of the observations and the hidden variables can be written as:
$$p(R, \Theta, H \mid \Theta_0) = p(R \mid \Theta)\, p(\tau)\, p(\mu) \prod_{i=1}^{I} p(\beta_i) \prod_{j=1}^{J} p(\gamma_j)\, p(U)\, p(V)\,
p(\mu_\beta, \nu_\beta)\, p(\mu_\gamma, \nu_\gamma) \prod_{k=1}^{K} p(\mu_{u_k}, \nu_{u_k})\, p(\mu_{v_k}, \nu_{v_k}). \quad (16)$$
        </p>
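        <p>As a sanity check of the generative story in Eqs. (4)–(15), the following Python/NumPy sketch draws a synthetic rating matrix; for brevity it fixes all prior means to 0 and precisions to 1 rather than sampling the Normal-Gamma hyperpriors, so it is a simplified illustration only:</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)
I, J, K, tau = 100, 80, 5, 2.0             # toy sizes; tau is the rating precision

mu = rng.normal(0.0, 1.0)                  # global bias
beta = rng.normal(0.0, 1.0, size=I)        # per-user biases beta_i
gamma = rng.normal(0.0, 1.0, size=J)       # per-item biases gamma_j
U = rng.normal(0.0, 1.0, size=(I, K))      # user latent factors u_i
V = rng.normal(0.0, 1.0, size=(J, K))      # item latent factors v_j

# Eq. (4): r_ij = mu + beta_i + gamma_j + u_i^T v_j + eps_ij, eps_ij ~ N(0, 1/tau)
R = (mu + beta[:, None] + gamma[None, :] + U @ V.T
     + rng.normal(0.0, 1.0 / np.sqrt(tau), size=(I, J)))
        </preformat>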
      </sec>
      <sec id="sec-2-2">
        <title>Inference</title>
        <p>
          Since evaluation of the joint distribution in Eq. (16) is intractable, we adopt a
Gibbs sampling based approximate inference technique. As all our model
parameters are conditionally conjugate [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], the equations for Gibbs sampling can be
written in closed form using the joint distribution given in Eq. (16).
Substituting Eqs. (5)–(15) into Eq. (16), the sampling distribution of $u_{ik}$ can be written as
follows:
$$p(u_{ik} \mid -) \propto \mathcal{N}(u_{ik} \mid \mu^{*}, (\nu^{*})^{-1}), \quad (17)$$
where
$$\nu^{*} = \nu_{u_k} + \tau \sum_{j \in \Omega_i} v_{jk}^2, \quad (18)$$
$$\mu^{*} = \frac{1}{\nu^{*}} \left( \nu_{u_k} \mu_{u_k} + \tau \sum_{j \in \Omega_i} v_{jk} \Big( r_{ij} - \mu - \beta_i - \gamma_j - \sum_{l \neq k} u_{il} v_{jl} \Big) \right). \quad (19)$$
Here, $\Omega_i$ is the set of items rated by the $i$th user in the training set. Now,
directly sampling $u_{ik}$ from Eq. (17) requires $O(K|\Omega_i|)$ complexity. However, if
we precompute a quantity $e_{ij} = r_{ij} - (\mu + \beta_i + \gamma_j + u_i^T v_j)$ for all $(i,j) \in \Omega$ and
write Eq. (19) as
$$\mu^{*} = \frac{1}{\nu^{*}} \left( \nu_{u_k} \mu_{u_k} + \tau \sum_{j \in \Omega_i} v_{jk} (e_{ij} + u_{ik} v_{jk}) \right), \quad (20)$$
then the sampling complexity of $u_{ik}$ reduces to $O(|\Omega_i|)$. Table 1 shows the space
and time complexities of SBMF and BPMF. We sample model parameters in
parallel whenever they are independent of each other. Algorithm 1 describes the
detailed Gibbs sampling procedure.
        </p>
        <p>Algorithm 1 (Scalable Bayesian Matrix Factorization (SBMF)). Require: $\Theta_0$; initialize $\Theta$ and $H$; compute $e_{ij}$ for all $(i,j) \in \Omega$. For $t = 1$ to $T$: sample the hyperparameters in $H$ (for instance, the posterior Gamma shape for $\nu_\beta$ is $\alpha_0 + \frac{1}{2}(I + 1)$), then sample $\tau$, $\mu$, $\{\beta_i\}$, $\{\gamma_j\}$, $U$, and $V$ from their conditional posteriors, updating the residuals $e_{ij}$ after every draw.</p>
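        <p>The residual trick of Eq. (20) is the heart of SBMF's linear complexity, and a minimal Python/NumPy sketch of the resulting $O(|\Omega_i|)$ update is given below; the dense residual matrix E and all variable names are choices of this illustration (an actual implementation would store the residuals sparsely):</p>
        <preformat>
import numpy as np

def sample_u_ik(i, k, U, V, E, omega_i, tau, mu_uk, nu_uk, rng):
    """Draw u_ik from Eq. (17) in O(|omega_i|) time using cached residuals
    E[i, j] = e_ij = r_ij - (mu + beta_i + gamma_j + u_i^T v_j)."""
    vk = V[omega_i, k]                    # v_jk for every item j rated by user i
    e = E[i, omega_i] + U[i, k] * vk      # add back u_ik's own contribution
    prec = nu_uk + tau * np.sum(vk ** 2)                  # Eq. (18)
    mean = (nu_uk * mu_uk + tau * np.sum(vk * e)) / prec  # Eq. (20)
    u_new = rng.normal(mean, 1.0 / np.sqrt(prec))         # draw from Eq. (17)
    E[i, omega_i] = e - u_new * vk        # restore residuals with the new draw
    U[i, k] = u_new
        </preformat>
        <p>The same pattern, with the roles of users and items exchanged, gives the update for $v_{jk}$, and analogous conjugate updates apply to the bias terms and the hyperparameters.</p>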
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <sec id="sec-3-1">
        <title>Datasets</title>
        <p>In this section, we show empirical results on three large real world movie-rating
datasets (the Movielens datasets are available at http://grouplens.org/datasets/movielens/
and the Netflix dataset at http://www.netflixprize.com/) to validate the effectiveness of SBMF.
The details of these datasets are provided in Table 2. Both Movielens datasets are
publicly available, and a 90:10 split is used to create their train and test sets. For
Netflix, the probe data is used as the test set.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Experimental Setup and Parameter Selection</title>
        <p>
          All the experiments are run on an Intel i5 machine with 16GB RAM. We have
considered the serial as well as the parallel implementation of SBMF for all the
experiments. In the parallel implementation, SBMF is parallelized in a multicore
environment using the OpenMP library. Although BPMF can also be parallelized,
the base paper [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and its publicly available code provide only the serial
implementation. So in our experiments, we have compared only the serial
implementation of BPMF against the serial and the parallel implementations of SBMF.
The serial and parallel versions of SBMF are denoted as SBMF-S and
SBMF-P, respectively. Since the performance of both SBMF and BPMF depends on
the dimension of the latent factor vector ($K$), it is necessary to investigate how the
models work with different values of $K$. So three sets of experiments are run for
each dataset, corresponding to $K = \{50, 100, 200\}$, for SBMF-S, SBMF-P, and
BPMF. As our main aim is to validate that SBMF is more scalable than
BPMF under the same conditions, we choose 50 burn-in iterations for all the
experiments with SBMF-S, SBMF-P, and BPMF. In the Gibbs sampling process, burn-in
refers to the practice of discarding an initial portion of a Markov chain sample,
so that the effect of initial values on the posterior inference is minimized. Note
that if SBMF takes less time than BPMF for a particular burn-in period, then
increasing the number of burn-in iterations will make SBMF even more scalable
compared to BPMF. Additionally, we allow the methods to have 100 collection
iterations.
        </p>
        <p>
          In SBMF, we initialize the parameters in $\Theta$ using a Gaussian distribution with mean 0
and variance 0.01. All the parameters in $H$ are set to 0. Also, $a_0$, $b_0$, $\alpha_0$, $\beta_0$,
and $\nu_0$ are set to 1, $\mu_0$ and $\mu_g$ are set to 0, and $\nu_g$ is initialized to 0.01. In
BPMF, we use the standard parameter setting provided in the paper [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. We
collect samples of the user and item latent factor vectors and bias terms from the
collection iterations and approximate a rating $r_{ij}$ as:
        </p>
        <p>$$\hat{r}_{ij} = \frac{1}{C} \sum_{c=1}^{C} \left( \mu^{(c)} + \beta_i^{(c)} + \gamma_j^{(c)} + u_i^{(c)T} v_j^{(c)} \right), \quad (21)$$</p>
        <p>
          where $u_i^{(c)}$ and $v_j^{(c)}$ are the $c$th drawn samples of the $i$th user and the $j$th item
latent factor vectors respectively, and $\mu^{(c)}$, $\beta_i^{(c)}$, and $\gamma_j^{(c)}$ are the $c$th drawn
samples of the global bias, the $i$th user bias, and the $j$th item bias, respectively. $C$ is the
number of drawn samples. The Root Mean Square Error (RMSE) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is used as the
evaluation metric for all the experiments. The code for SBMF will be made publicly
available (https://github.com/avijit1990, https://github.com/rishabhmisra).
        </p>
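        <p>A small sketch of how Eq. (21) and the RMSE evaluation might be realized, assuming each collected sample is stored as a (mu, beta, gamma, U, V) tuple (a storage convention of this illustration, not specified in the paper):</p>
        <preformat>
import numpy as np

def predict(samples, i, j):
    """Eq. (21): average the prediction over the C collected Gibbs samples."""
    return np.mean([mu + beta[i] + gamma[j] + U[i] @ V[j]
                    for mu, beta, gamma, U, V in samples])

def rmse(samples, test):
    """Root Mean Square Error over held-out (i, j, r_ij) triples."""
    errors = [(r - predict(samples, i, j)) ** 2 for i, j, r in test]
    return np.sqrt(np.mean(errors))
        </preformat>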
        <p>
          In all the graphs of Fig. 2, the X-axis represents the time elapsed since the start
of the experiment and the Y-axis presents the RMSE value. Since we allow 50 burn-in
iterations for all the experiments and each iteration of BPMF takes more time
than SBMF-P, the collection iterations of SBMF-P begin earlier than BPMF's; thus
we get the initial RMSE value of SBMF-P earlier. Similarly, each iteration of
SBMF-S takes less time compared to BPMF (except for $K = \{50, 100\}$ on the
Netflix dataset). We believe that on the Netflix dataset (for $K = \{50, 100\}$), BPMF
takes less time than SBMF-S because BPMF is implemented in Matlab, where
matrix computations are efficient, whereas SBMF is implemented in C++, where
the matrix storage is unoptimized. As the Netflix data is large with
respect to the number of entries and the number of users and items, it involves
more matrix operations than the other datasets. So for lower values of $K$, the cost
of matrix operations for SBMF-S dominates the cost incurred due to the $O(K^3)$
complexity of BPMF, and BPMF takes less time than SBMF-S. However, for large
values of $K$, BPMF starts taking more time as its $O(K^3)$ complexity becomes
dominant. We leave the task of optimizing the code of SBMF as future work.
        </p>
        <p>
          We can observe from Fig. 2 that SBMF-P takes much less time than BPMF in all
the experiments and incurs only a small loss in performance.
Similarly, SBMF-S also takes less time than BPMF (except for $K = \{50, 100\}$
on the Netflix dataset) and incurs only a small performance loss. An important
point to note is that the total time difference between both variants of SBMF and
BPMF increases with the dimension of the latent factor vector, and the speedup is
significantly high for $K = 200$. Table 3 shows the final RMSE values and the total
time taken corresponding to each dataset and $K$. We find that the RMSE values
for SBMF-S and SBMF-P are very close in all the experiments. We also observe
that increasing the latent space dimension reduces the RMSE value on the Netflix
dataset. With a high latent dimension, the running time of BPMF is significantly
high due to its cubic time complexity with respect to the latent space dimension,
and it takes approximately 150 hours on the Netflix dataset with $K = 200$. However,
SBMF has linear time complexity with respect to the latent space dimension, and
SBMF-P and SBMF-S take only about 35 and 90 hours, respectively, on the
Netflix dataset with $K = 200$. Thus SBMF is better suited for large datasets
with a large latent space dimension. Similar speedup patterns are found on the
other datasets as well.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>
        MF [1–6] is widely used in several domains because of its performance and
scalability. Stochastic gradient descent [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is the simplest method to solve MF, but it often
suffers from the problem of overfitting and requires manual tuning of the
learning rate and regularization parameters. Thus many Bayesian methods [
        <xref ref-type="bibr" rid="ref11 ref13 ref5">5, 11, 13</xref>
        ]
have been developed for MF that automatically select all the model parameters
and avoid the problem of overfitting. Variational Bayesian approximation based
MF [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] considers a simplified distribution to approximate the posterior, but
this method does not scale well to large datasets. Consequently, scalable
variational Bayesian methods [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ] have been proposed. However,
variational approximation based Bayesian methods might give
inaccurate results [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] because of their oversimplified assumptions. Thus, Gibbs sampling
based MF [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has been proposed, which gives better performance than its
variational Bayesian MF counterpart.
      </p>
      <p>
        Since the performance of MF depends on the latent dimensionality, several
nonparametric MF methods [14–16] have been proposed that set the number of
latent factors automatically. Non-negative matrix factorization (NMF) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a
variant of MF which recovers two low rank matrices, each of which is
non-negative. Bayesian NMF [
        <xref ref-type="bibr" rid="ref17 ref6">6, 17</xref>
        ] considers a Poisson likelihood and different types
of priors, generating a family of MF models based on the prior imposed on the
latent factors. Also, in the real world the preferences of users change over time. To
incorporate these dynamics into the model, several dynamic MF models [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ]
have been developed.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>We have proposed the Scalable Bayesian Matrix Factorization (SBMF), which
is a Markov chain Monte Carlo based Gibbs sampling algorithm for matrix
factorization and has linear time complexity with respect to the target rank and
linear space complexity with respect to the number of non-zero observations.
SBMF gives competitive performance in less time compared to the baseline
method. Experiments on several real world datasets show the effectiveness of
SBMF. In the future, it would be interesting to extend this method to applications
like matrix factorization with side information, where the time complexity is
cubic with respect to the number of features (which can be very large in practice).</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This project was supported by Ericsson India.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. N. Srebro and T. Jaakkola, "Weighted low-rank approximations," in Proc. of ICML, pp. 720–727, AAAI Press, 2003.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, vol. 42, pp. 30–37, Aug. 2009.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proc. of NIPS, pp. 556–562, 2000.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. R. Salakhutdinov and A. Mnih, "Probabilistic matrix factorization," in Proc. of NIPS, 2007.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. R. Salakhutdinov and A. Mnih, "Bayesian probabilistic matrix factorization using Markov chain Monte Carlo," in Proc. of ICML, pp. 880–887, 2008.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. P. Gopalan, J. Hofman, and D. Blei, "Scalable recommendation with Poisson factorization," CoRR, vol. abs/1311.1704, 2013.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. M.-A. Sato, "Online model selection based on the variational Bayes," Neural Computation, vol. 13, no. 7, pp. 1649–1681, 2001.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. M. J. Beal, "Variational algorithms for approximate Bayesian inference," PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. D. Tzikas, A. Likas, and N. Galatsanos, "The variational approximation for Bayesian inference," IEEE Signal Processing Magazine, vol. 25, pp. 131–146, Nov. 2008.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. M. Hoffman, D. Blei, C. Wang, and J. Paisley, "Stochastic variational inference," JMLR, vol. 14, pp. 1303–1347, May 2013.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Y. Lim and Y. Teh, "Variational Bayesian approach to movie rating prediction," in Proc. of KDD Cup, 2007.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. J. Silva and L. Carin, "Active learning for online Bayesian matrix factorization," in Proc. of KDD, pp. 325–333, 2012.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Y. Kim and S. Choi, "Scalable variational Bayesian matrix factorization with side information," in Proc. of AISTATS, pp. 493–502, 2014.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. M. Zhou and L. Carin, "Negative binomial process count and mixture modeling," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 99, no. PrePrints, p. 1, 2013.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. M. Zhou, L. Hannah, D. B. Dunson, and L. Carin, "Beta-negative binomial process and Poisson factor analysis," in Proc. of AISTATS, pp. 1462–1471, 2012.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. M. Xu, J. Zhu, and B. Zhang, "Nonparametric max-margin matrix factorization for collaborative prediction," in Proc. of NIPS, pp. 64–72, 2012.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. P. Gopalan, F. Ruiz, R. Ranganath, and D. Blei, "Bayesian nonparametric Poisson factorization for recommendation systems," in Proc. of AISTATS, pp. 275–283, 2014.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. Y. Koren, "Factorization meets the neighborhood: A multifaceted collaborative filtering model," in Proc. of KDD, pp. 426–434, 2008.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. L. Xiong, X. Chen, T. K. Huang, J. Schneider, and J. G. Carbonell, "Temporal collaborative filtering with Bayesian probabilistic tensor factorization," in Proc. of SDM, pp. 211–222, SIAM, 2010.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>