<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Sequential Modeling for Recommendation (DISCUSSION PAPER)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe Manco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ettore Ritacco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Noveen Sachdeva</string-name>
          <email>noveen.sachdeva@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massimo Guarascio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ICAR-CNR</institution>
          ,
          <addr-line>Via P. Bucci, 8/9c, Rende</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>International Institute of Information Technology</institution>
          ,
          <addr-line>Hyderabad</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a model which extends variational autoencoders by exploiting the rich information present in the past preference history. We introduce a recurrent version of the VAE where, instead of passing a subset of the whole history regardless of temporal dependencies, we pass the consumption sequence through a recurrent neural network. At each time-step of the RNN, the sequence is fed through a series of fully-connected layers, whose output models the probability distribution of the most likely future preferences. We show that handling temporal information is crucial for improving the accuracy of recommendation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Learning accurate models able to suggest interesting items to the user is an
important problem for many scientific and industrial applications. To address
this problem, collaborative filtering approaches to recommendation have been extensively
investigated in the literature. Among these, latent variable models [
        <xref ref-type="bibr" rid="ref14 ref17 ref5 ref6">5, 14, 17,
6</xref>
        ] gained substantial attention, due to their capability of modeling the
hidden causal relationships that influence user preferences. Recently, however, new
approaches based on neural architectures have been proposed, achieving competitive
performance with respect to the current state of the art. Also, new paradigms
based on the combination of deep learning and latent variable modeling [
        <xref ref-type="bibr" rid="ref12 ref7">7, 12</xref>
        ]
have proven quite successful in domains such as computer vision and speech
processing. However, their adoption for modeling user preferences is still largely unexplored,
although it has recently started to gain attention [
        <xref ref-type="bibr" rid="ref8 ref9">9, 8</xref>
        ].
      </p>
      <p>
        The aforementioned approaches rely on the \bag-of-word" assumption: when
considering a user and her preferences, the order of such preferences can be
neglected and all preferences are exchangeable. This assumption works with general
Copyright c 2019 for the individual papers by the papers' authors. Copying
permitted for private and academic purposes. This volume is published and copyrighted
by its editors. SEBD 2019, June 16-19, 2019, Castiglione della Pescaia, Italy.
user trends which re ect a long-term behavior. However, it fails to capture the
short-term preferences that are speci c of several application scenarios,
especially in the context of the Web. Sequential data can express causalities and
dependencies that require ad-hoc modeling and algorithms. And in fact, e orts
to capture this notion of causality have been made, both in the context of latent
variable modeling [
        <xref ref-type="bibr" rid="ref1 ref11 ref3">11, 1, 3</xref>
        ] and deep learning [
        <xref ref-type="bibr" rid="ref15 ref18 ref2 ref4">4, 2, 18, 15</xref>
        ].
      </p>
      <p>
        In this paper, we consider the task of sequence recommendation from the
perspective of combining deep learning and latent variable modeling. Inspired by
the approach in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], we assume that at a given timestamp the choice of a given
item is in uenced by a latent factor that models user trends and preferences.
However, the latent factor itself can be in uenced by user history and modeled
to capture both long-term preferences and short-term behavior.
      </p>
      <p>Our contribution can be summarized as follows: (i) we extend the
framework to the case of sequential recommendation, where users' preferences exhibit
temporal dependencies; (ii) we evaluate the proposed framework on standard
benchmark datasets, showing that (a) approaches that do not consider
temporal dynamics are not totally adequate to model user preferences, and (b) the
combination of latent variable and temporal dependency modeling produces a
substantial gain, even with regard to other approaches that only focus on
temporal dependencies through recurrent relationships.</p>
      <p>The paper is organized as follows. Section 2 proposes the modeling of user
preferences in a variational setting, describing how the framework can be
adapted to the case of temporally ordered dependencies. The effectiveness of the
proposed modeling is illustrated in section 3, and pointers to future developments
are discussed in section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Variational Autoencoders for User Preferences</title>
      <p>
        The reader is referred to [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] for more details of the approach illustrated here. We
shall use the following shared notation: u 2 U = f1; : : : ; M g indexes a user and
i 2 I = f1; : : : ; N g indexes an item for which the user can express a preference.
We model implicit feedback, thus assuming a preference matrix X 2 f0; 1gN M ,
so that x u represents the (binary) row with all the item preferences for user u.
Given x u, we de ne Iu = fi 2 Ijxu;i = 1g (with Nu = jIuj). Analogously,
Ui = jfu 2 U jxu;i = 1gj and Mi = jUij.
      </p>
      <p>We also consider precedence and temporal relationships within X. First of
all, the preference matrix induces a natural ordering relationship between items:
i ≻_u j means that x_{u,i} &gt; x_{u,j} in the rating matrix. Also, we assume the
existence of timing information T ∈ (ℝ⁺ ∪ {∅})^(M×N), where the term t_{u,i} represents
the time when i was chosen by u (with t_{u,i} = ∅ if x_{u,i} = 0). Then, i &lt;_u j denotes
that t_{u,i} &lt; t_{u,j}. With an abuse of notation, we also introduce a temporal mark
in the elements of x_u: the term x_u(t) (with 1 ≤ t ≤ N_u) represents the t-th
item in I_u in the sorting induced by &lt;_u, whereas x_u(1:t) represents the sequence
x_u(1), …, x_u(t).</p>
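      <p>As a concrete (toy) illustration of this notation, the following sketch builds the time-sorted sequence x_u(1:t) from hypothetical timestamps; all item ids and times are made up for illustration.</p>
      <preformat>
```python
# Toy preference data for one user u: items with their consumption timestamps.
# (Hypothetical values; t_{u,i} = None encodes x_{u,i} = 0, i.e. t_{u,i} = empty.)
t_u = {0: 17.0, 1: None, 2: 3.0, 3: 9.0, 4: None}

# I_u: the items the user actually consumed (x_{u,i} = 1).
I_u = [i for i, t in t_u.items() if t is not None]
N_u = len(I_u)

# The ordering <_u sorts I_u by timestamp; x_u(t) is the t-th item (1-based).
x_u = sorted(I_u, key=lambda i: t_u[i])

print(N_u)      # 3
print(x_u)      # [2, 3, 0]  -- item 2 first (t=3.0), then 3 (t=9.0), then 0
print(x_u[:2])  # x_u(1:2) = [2, 3]
```
      </preformat>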
      <p>
        The reference model is the Multinomial variational autoencoder (MVAE)
proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Within this framework, for a given user u it is possible to devise
x u according to the generative setting:
z u
      </p>
      <p>N (0 ; I K ) ;
(z u)</p>
      <p>exp ff (z u)g ;
x u</p>
      <p>Multi (Nu; (z u)) :
(2.1)
The underlying \decoder" is modeled by log P (x ujz u) = P xu;i log i(z u) ;
i
thus enabling a complete speci cation of the overall variational framework.
Prediction for new items is accomplished by resorting to the learned functions f
and Q : given a user history x , we compute z = (x ) and then devise the
probabilities for the whole item set through (z ). Unseen items can then be
ranked according to their associated probabilities.</p>
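      <p>The generative setting of eq. 2.1 and the associated ranking step can be sketched as follows. This is a shape-level illustration only: the decoder f_θ is replaced by a random linear map, and all sizes are hypothetical.</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 4, 6          # latent dimension, number of items (toy sizes)

# Random stand-in for the learned decoder f_theta (the real model is an MLP).
W = rng.normal(size=(K, N))

def pi(z):
    """Softmax over items: pi(z) proportional to exp{f_theta(z)}."""
    logits = z @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Generative process of eq. 2.1: z_u ~ N(0, I_K), x_u ~ Multi(N_u, pi(z_u)).
z_u = rng.normal(size=K)
N_u = 5
x_u = rng.multinomial(N_u, pi(z_u))

# Prediction: rank unseen items by their probability under pi(z).
seen = x_u > 0
scores = pi(z_u)
ranking = [i for i in np.argsort(-scores) if not seen[i]]
print(ranking)  # unseen items, most probable first
```
      </preformat>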
      <p>Ideally, latent variable modeling should be able to express temporal dynamics
and hence causalities and dependencies among preferences in a user's history. In
the following we elaborate on this intuition. Within a probabilistic framework,
we can model temporal dependencies by conditioning each event on the previous
events: given a sequence x(1:T), we have
P(x(1:T)) = ∏_{t=0}^{T−1} P(x(t+1) | x(1:t)).
This specification suggests two key aspects:
- there is a recurrent relationship between x(t+1) and x(1:t), devised by P(x(t+1) | x(1:t)),
upon which the modeling can take advantage, and
- each time-step can be handled separately, and in particular it can be modeled
through a conditional VAE.</p>
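      <p>The chain-rule factorization above can be checked on a toy example where, purely for illustration, each conditional depends only on the last item (the factorization itself holds for arbitrary histories); all probabilities are made-up numbers.</p>
      <preformat>
```python
import numpy as np

# Toy conditional model over 3 items: P(x(t+1)=j | x(t)=i) as a
# row-stochastic matrix (hypothetical numbers; the initial step uses p0).
p0 = np.array([0.5, 0.3, 0.2])
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])

seq = [0, 1, 2, 1]  # x(1:4)

# P(x(1:T)) = prod_t P(x(t+1) | x(1:t)); here the conditional only
# looks at the last item, but the product structure is general.
log_p = np.log(p0[seq[0]])
for prev, nxt in zip(seq, seq[1:]):
    log_p += np.log(P[prev, nxt])

print(np.exp(log_p))  # 0.5 * 0.6 * 0.2 * 0.3 = 0.018
```
      </preformat>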
      <p>Let us consider the following (simple) generative model:
z_u(t) ∼ N(0, I_K);  π(z_u(t)) ∝ exp{f_θ(z_u(t))};  x_u(t) ∼ Multi(1, π(z_u(t))), (2.2)</p>
      <sec id="sec-2-1">
        <title>The SVAE model</title>
        <p>which results in the joint likelihood over the whole sequence.
Here, we can approximate the posterior P(z_u(1:T) | x_u(1:T)) with the factorized
proposal distribution</p>
        <p>Q_φ(z_u(1:T) | x_u(1:T)) = ∏_t q_φ(z_u(t) | x_u(1:t−1)),</p>
        <p>where q_φ(z_u(t) | x_u(1:t−1)) is a Gaussian distribution whose parameters µ(t) and
σ(t) depend upon the current history x_u(1:t−1), by means of a recurrent layer
h_t:
µ(t), σ(t) = φ(h_t),  h_t = RNN(h_{t−1}, x_u(t−1)). (2.3)
The resulting loss function can be formalized as follows:</p>
        <p>L(θ, φ; X) = Σ_u Σ_{t=1}^{N_u} { E_{q_φ}[log P(x_u(t) | z_u(t))] − KL(q_φ(z_u(t) | x_u(1:t−1)) || P(z_u(t))) }.</p>
        <p>The proposal distribution introduces a dependency of the latent variable on
a recurrent layer, which allows the model to recover information from the previous
history. We call this model SVAE.</p>
        <p>Notably, the prediction step can easily be accomplished in a similar way as for
MVAE: given a user history x_u(1:t−1), we can resort to eq. 2.3 and set z = µ(t),
upon which we can devise the probability for x_u(t) by means of π(z).</p>
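        <p>This prediction step can be sketched as follows. All weights are random stand-ins for the learned parameters, and the GRU used by the actual model is replaced here by a single tanh recurrence for brevity.</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(1)
N, E, H, K = 6, 8, 5, 3   # items, embedding, hidden, latent (toy sizes)

# Random stand-ins for learned weights (illustration only).
Emb = rng.normal(size=(N, E))                 # item embeddings
Wh, Wx = rng.normal(size=(H, H)), rng.normal(size=(H, E))
W_mu, W_sig = rng.normal(size=(K, H)), rng.normal(size=(K, H))
W_dec = rng.normal(size=(K, N))

def rnn_step(h, x_emb):
    # Simplified recurrence h_t = RNN(h_{t-1}, x_u(t-1)); the paper uses a GRU.
    return np.tanh(Wh @ h + Wx @ x_emb)

def phi(h):
    # Eq. 2.3: (mu(t), sigma(t)) = phi(h_t); sigma kept positive via exp.
    return W_mu @ h, np.exp(W_sig @ h)

def pi(z):
    logits = W_dec.T @ z
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Run the user's history x_u(1:t-1) through the recurrence...
history = [2, 0, 4]
h = np.zeros(H)
for item in history:
    h = rnn_step(h, Emb[item])

# ...then predict x_u(t): set z = mu(t) and rank items by pi(z).
mu, sigma = phi(h)
ranking = np.argsort(-pi(mu))
print(ranking[:3])  # top-3 predicted next items
```
        </preformat>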
        <p>The generative model of eq. 2.2 only focuses on the next item in the sequence.
The base model, however, is flexible enough to extend its focus to the next k items,
regardless of the time: x_u(t:t+k−1) ∼ Multi(k, π(z_u(t))). (2.4)
Again, the resulting joint likelihood can be modeled in different ways. The simplest way consists in
considering x as a time-ordered multi-set.
Alternatively, we can consider the probability of an item as a mixture relative
to all the time-steps where it is considered:
P(x_u(t)) = Σ_{j=t−k+1}^{t} w_j P(x_u(t) | z_u(j)), (2.5)
where P(x_u(t) | z_u(j)) is the probability of observing x_u(t) according to π(z_u(j)).
In both cases, the variational approximation is modeled exactly as shown above,
and the only difference lies in the second component of the loss function, which
has to be adapted according to the above equations.</p>
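        <p>The two variants can be illustrated numerically. The item distributions below are made-up toy values, and the mixture weights w_j are assumed uniform purely for the sake of the example.</p>
        <preformat>
```python
import numpy as np
from math import factorial

# Toy item distributions pi(z_u(j)) at k previous time-steps
# (hypothetical numbers, one row per step).
Pi = np.array([[0.1, 0.4, 0.2, 0.3],
               [0.6, 0.1, 0.1, 0.2]])
k = Pi.shape[0]

# Eq. 2.4: likelihood of the next-k items x_u(t:t+k-1), viewed as a
# multi-set drawn from Multi(k, pi(z_u(t))) -- here pi(z_u(t)) = Pi[0].
counts = np.array([0, 1, 0, 1])   # one draw of item 1, one of item 3
coef = factorial(k) / np.prod([factorial(c) for c in counts])
p_multi = coef * np.prod(Pi[0] ** counts)
print(p_multi)                    # 2 * 0.4 * 0.3 = 0.24

# Eq. 2.5: probability of a single item x_u(t) as a (uniform) mixture
# over the k previous latent states z_u(t-k+1), ..., z_u(t).
item = 1
p_mix = np.mean(Pi[:, item])      # (0.4 + 0.1) / 2 = 0.25
print(p_mix)
```
        </preformat>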
        <p>
          There is an interesting analogy between eq. 2.5 and the attention
mechanism [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. In fact, it can be noticed in the equation that the prediction of
x u(t) depends on the latent status of the k previous steps in the sequence. In
practice, this enables to capture short-term dependencies and to encapsulate
them in the same probabilistic framework by weighting the likelihood based on
z u(t k+1); : : : ; z u(t).
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>We evaluate SVAE on some benchmark datasets, comparing with various
baselines and the current state-of-the-art competitors, in order to assess its
capabilities in modeling preference data. Additionally, we provide a sensitivity
analysis relative to the configurations/contour conditions under which SVAE is
better suited.</p>
      <p>We evaluate our model along with the competitors on two popular publicly
available datasets, namely Movielens-1M and Netflix. Movielens-1M is a time
series dataset containing user-item rating pairs along with the corresponding
timestamps. Since we work on implicit feedback, we binarize the data by
considering only the user-item pairs where the rating provided by the user was strictly
greater than 3 on a 1-5 range. Netflix has the same data format as Movielens-1M,
and the same technique is used to binarize the ratings. We use a subset of the
full dataset that matches the user distribution of the full dataset. The subset
is built by stratifying users according to their history length, and then sampling
a subset of size inversely proportional to the size of the strata.</p>
      <p>Since we are considering implicit preferences, the evaluation is done on top-n
recommendation, and it relies on the following metrics.</p>
      <p>Normalized Discounted Cumulative Gain. Also abbreviated as NDCG@n, this
metric gives more weight to the relevance of items at the top of the recommendation
list. Precision. Defining Hits = Σ_i r_i as the number of items occurring in the
recommendation list that were actually preferred by the user, we have
Precision@n = Hits@n / n.
Recall. Defined as the percentage of items actually preferred by the user that
were present in the recommendation list: Recall@n = Hits@n / |R|.</p>
      <sec id="sec-3-1">
        <title>Experimental protocol and results</title>
        <p>
In our experiments, we use the above metrics with two values of n, respectively
10 and 100. The evaluation protocol works as follows. We partition users into
training, validation and test set. The model is trained using the full histories
of the users in the training set. During evaluation, for each user in the
validation/test set we split the time-sorted user history into two parts, fold-in and
fold-out splits. The fold-in split is used as a basis to learn the necessary
representations and provide a recommendation list which is then evaluated with the
fold-out split of the user history using the metrics de ned above. We believe that
this strategy is more robust when compared to other methodologies wherein the
same user can be in both the training as well as testing sets.</p>
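        <p>The metrics and the temporal fold-in/fold-out split can be sketched as follows. The NDCG implementation shown is the standard binary-relevance form (the exact formula in the original text was lost in extraction), and all item lists are hypothetical.</p>
        <preformat>
```python
import numpy as np

def ndcg_at_n(ranked, relevant, n):
    """NDCG@n: discounted gain of relevant items in the top-n list,
    normalized by the ideal DCG (all relevant items ranked first)."""
    r = [1.0 if i in relevant else 0.0 for i in ranked[:n]]
    dcg = sum(rel / np.log2(pos + 2) for pos, rel in enumerate(r))
    ideal = sum(1.0 / np.log2(pos + 2) for pos in range(min(len(relevant), n)))
    return dcg / ideal if ideal > 0 else 0.0

def precision_at_n(ranked, relevant, n):
    hits = sum(1 for i in ranked[:n] if i in relevant)
    return hits / n

def recall_at_n(ranked, relevant, n):
    hits = sum(1 for i in ranked[:n] if i in relevant)
    return hits / len(relevant)

# Temporal fold-in / fold-out split of one test user's time-sorted history
# (hypothetical items): the fold-in prefix drives the recommendation,
# the fold-out suffix is the relevance set R.
history = [3, 7, 1, 9, 4, 6]          # sorted by timestamp
fold_in, fold_out = history[:4], set(history[4:])

ranked = [4, 2, 6, 8, 5]              # a hypothetical recommendation list
n = 3
print(precision_at_n(ranked, fold_out, n))  # 2 hits in top-3 -> 2/3
print(recall_at_n(ranked, fold_out, n))     # 2 of |R| = 2    -> 1.0
```
        </preformat>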
        <p>
          It is worth noticing that Liang et. al [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] follow a similar strategy but
they do not consider any sorting for user histories. That is, for the
validation/test users, the fold-in set doesn't precede the fold-out with respect to time.
        </p>
        <p>
          We compare our model with current state-of-the-art models including
recently published neural architectures and we now present a brief summary about
our competitors to provide a better understanding of these models.
{ POP is a simple baseline where the most popular items are recommended.
{ BPR is a state of the art model based on Matrix Factorization, which ranks
items di erently for each user [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. There is a subtle issue concerning BPR:
by separating users on training/validation/test as discussed above, the latent
representation of users in the validation/test is not meaningful. Then, BPR is
only capable of providing meaningful predictions for users that were already
exploited in the training phase. To solve this, we extended the training set
to include the partial history in fold-in for each user in the validation/test.
        </p>
        <p>
          The evaluation takes place on their corresponding fold-out sets.
{ FPMC [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is a model which clubs both Matrix Factorization and Markov
Chains together using personalized transition graphs over underlying Markov
chains.
{ CASER [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] is a convolutional model that uses vertical and horizontal
convolutional layers to embed a sequence of recent items thereby learning
sequential patterns for next-item recommendation. The authors have shown
that this model outperforms other approaches based on recurrent neural
network modeling, such as GRU4Rec. We use the implementation provided by
the authors and tune the network by keeping the number of horizontal lters
to be 16, and vertical lters to be 4.
{ MVAE from which the SVAE model draws heavily. We use the
implementation provided by the authors, with the default hyperparameter settings.
        </p>
        <p>The experiments consider the basic SVAE model and the extensions to
next-k prediction, as discussed in the previous section. The model is trained
end-to-end on the full histories of the training users. Model hyperparameters
are set using the evaluation metrics obtained on validation users. The SVAE
architecture includes an embedding layer of size 256, a recurrent layer realized
as a GRU with 200 cells, two encoding layers (of size 150 and 64) and finally
two decoding layers (again, of size 64 and 150). We set the number K of latent
factors for the variational autoencoder to 64. Adam was used to optimize the
loss function, coupled with a weight decay of 0.01. As for RVAE, the architecture
includes user/item embedding layers (of size 128), two encoding layers (of size 100
and 64), and a final layer that produces the score f_θ(z_{u,i}). Both SVAE and
RVAE were implemented in PyTorch and trained on a single GTX 1080Ti GPU.
The source code is available on GitHub at https://github.com/noveens/svae_cf.</p>
        <p>In a first set of experiments, we compare SVAE with all the competitors
described above. Table 1 shows the results of the comparison. SVAE outperforms
the competitors on both datasets, with a significant gain on all the metrics. It
is important here to highlight how the temporal fold-in/fold-out split is crucial
for a fair evaluation of the predictive capabilities: MVAE was evaluated both
on temporal and random splits, exhibiting totally different performances. Our
interpretation is that, with random splits, the prediction for an item is easier
if the encoding phase is aware of forthcoming items in the same user history.
This severely affects the performance and overrates the predictive capabilities
of a model: the accuracy of MVAE drops substantially when a temporal split is
considered.</p>
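        <p>The stated SVAE layer sizes can be traced shape-by-shape with a crude stand-in: weights are random and untrained, the GRU is replaced by a single tanh recurrence, and the second encoding layer is assumed to emit the 64-dimensional µ and log σ; only the tensor shapes reflect the description above.</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(2)

# Layer sizes as stated for SVAE: embedding 256, GRU with 200 cells,
# encoder 150 -> 64, K = 64 latent factors, decoder 64 -> 150 -> N items.
N_ITEMS, EMB, GRU_H, ENC1, DEC1, K = 1000, 256, 200, 150, 64, 64

def dense(n_in, n_out):
    return rng.normal(scale=0.01, size=(n_in, n_out))

emb = rng.normal(size=(N_ITEMS, EMB))
W_gru = dense(EMB + GRU_H, GRU_H)                    # crude GRU stand-in
W_e1, W_e2 = dense(GRU_H, ENC1), dense(ENC1, 2 * K)  # encoder -> (mu, log sigma)
W_d1, W_d2 = dense(K, DEC1), dense(DEC1, 150)
W_out = dense(150, N_ITEMS)

# One time-step for one item, just to trace the tensor shapes.
h = np.zeros(GRU_H)
x = emb[42]
h = np.tanh(np.concatenate([x, h]) @ W_gru)          # (200,)
enc = np.tanh(h @ W_e1)                              # (150,)
mu, log_sig = np.split(enc @ W_e2, 2)                # (64,), (64,)
z = mu                                               # prediction uses z = mu(t)
logits = np.tanh(np.tanh(z @ W_d1) @ W_d2) @ W_out
print(logits.shape)                                  # (1000,)
```
        </preformat>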
        <p>By contrast, SVAE is trained to capture the actual temporal dependencies,
which ultimately results in better predictive accuracy. This is also shown in fig. 1,
where we show that SVAE outperforms the competitors irrespective of the size
of the fold-in. The only exception is with very short sequences (less than 10 items),
where MVAE gets better results with respect to the sequential models. It is also
worth noticing how the performance of both sequential models tends to degrade
with increasing sequence length, but SVAE maintains its advantage over CASER.</p>
        <p>We discussed in the previous section how the basic SVAE framework can be
extended to focus on predicting the next k items, rather than just the next item.
We analyse this capability in fig. 2, where the accuracy for different values of k
is considered according to the modeling in eq. 2.4. On Movielens, the best value is
achieved for k = 4, and acceptable values range within the interval [1, 10].</p>
        <p>
          Combining the representation power of latent spaces, provided by variational
autoencoders, with the sequence modeling capabilities of recurrent neural
networks is an e ective strategy to sequence recommendation. To prove this, we
[
          <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15 ref16 ref17 ref18">11-20</xref>
          ]
[21-50]
[50-10H0]istoryleng[t1h01-150] [151-200] [201-300]
        </p>
        <p>[301-]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and future work</title>
      <p>devised SVAE, a simple yet robust mathematical framework capable of
modeling temporal dynamics upon di erent perspectives, within the fundamentals of
variational autoencoders. The experimental evaluation highlights the capability
of SVAE to consistently outperform state-of-the-art models.</p>
      <p>The framework proposed here is worth further extensions that we plan to
pursue in future work. In particular, from an architectural point of view,
the attention mechanism outlined in section 2 requires a better
understanding and a more detailed analysis of its possible impact, in view of recent
developments in the literature.</p>
      <p>Moreover, different architectures (e.g., based on convolution or translation
invariance) are worth investigating within a probabilistic variational setting.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manco</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritacco</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carnuccio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bevacqua</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Probabilistic topic models for sequence data</article-title>
          .
          <source>Machine Learning</source>
          pp.
          <volume>1</volume>
-
          <issue>25</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Devooght</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bersini</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Long and short-term recommendations with recurrent neural networks</article-title>
          .
          <source>In: UMAP '17</source>
          . pp.
          <volume>13</volume>
-
          <issue>21</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McAuley</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Translation-based recommendation</article-title>
          .
          <source>In: RecSys '17</source>
          . pp.
          <volume>161</volume>
-
          <issue>169</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hidasi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karatzoglou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baltrunas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tikk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Session-based recommendations with recurrent neural networks</article-title>
          .
          <source>In: ICLR '16</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Latent semantic models for collaborative filtering</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          <volume>22</volume>
          (
          <issue>1</issue>
          ),
          <volume>89</volume>
-
          <fpage>115</fpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kabbur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ning</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karypis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>FISM: Factored item similarity models for top-n recommender systems</article-title>
          .
          <source>In: KDD '13</source>
          . pp.
          <volume>659</volume>
-
          <issue>667</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Auto-encoding variational bayes</article-title>
          .
          <source>In: ICLR '14</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>She</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Collaborative variational autoencoder for recommender systems</article-title>
          .
          <source>In: KDD '17</source>
          . pp.
          <volume>305</volume>
-
          <issue>314</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnan</surname>
            ,
            <given-names>R.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoffman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jebara</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Variational autoencoders for collaborative filtering</article-title>
          .
          <source>In: WWW '18</source>
          . pp.
          <volume>689</volume>
-
          <issue>698</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rendle</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freudenthaler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gantner</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt-Thieme</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>BPR: Bayesian personalized ranking from implicit feedback</article-title>
          .
          <source>In: UAI '09</source>
          . pp.
          <volume>452</volume>
-
          <issue>461</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rendle</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freudenthaler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt-Thieme</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Factorizing personalized markov chains for next-basket recommendation</article-title>
          .
          <source>In: WWW '10</source>
          . pp.
          <volume>811</volume>
-
          <issue>820</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rezende</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohamed</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wierstra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Stochastic backpropagation and approximate inference in deep generative models</article-title>
          .
          <source>In: ICML '14</source>
          . pp.
          <volume>1278</volume>
-
          <issue>1286</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sachdeva</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manco</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritacco</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pudi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Sequential variational autoencoders for collaborative filtering</article-title>
          . pp.
          <volume>600</volume>
          -
          <fpage>608</fpage>
          .
          <source>WSDM '19</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mnih</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Probabilistic matrix factorization</article-title>
          .
          <source>In: NIPS '08</source>
          . pp.
          <fpage>1257</fpage>
          –
          <lpage>1264</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Personalized top-n sequential recommendation via convolutional sequence embedding</article-title>
          .
          <source>In: WSDM '18</source>
          . pp.
          <fpage>565</fpage>
          –
          <lpage>573</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pp.
          <fpage>5998</fpage>
          –
          <lpage>6008</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Collaborative topic modeling for recommending scientific articles</article-title>
          .
          <source>In: KDD '11</source>
          . pp.
          <fpage>448</fpage>
          –
          <lpage>456</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beutel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jing</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Recurrent recommender networks</article-title>
          .
          <source>In: WSDM '17</source>
          . pp.
          <fpage>495</fpage>
          –
          <lpage>503</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>