<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An LSTM-Based Dynamic Customer Model for Fashion Recommendation∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Heinz</string-name>
          <email>sebastian.heinz@zalando.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Bracher</string-name>
          <email>christian.bracher@zalando.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roland Vollgraf</string-name>
          <email>roland.vollgraf@zalando.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Zalando Research</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>Online fashion sales present a challenging use case for personalized recommendation: Stores offer a huge variety of items in multiple sizes. Small stocks, high return rates, seasonality, and changing trends cause continuous turnover of articles for sale on all time scales. Customers tend to shop rarely, but often buy multiple items at once. We report on backtest experiments with sales data of 100k frequent shoppers at Zalando, Europe's leading online fashion platform. To model changing customer and store environments, our recommendation method employs a pair of neural networks: To overcome the cold start problem, a feedforward network generates article embeddings in “fashion space,” which serve as input to a recurrent neural network that predicts a style vector in this space for each client, based on their past purchase sequence. We compare our results with a static collaborative filtering approach, and a popularity ranking baseline.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Recommender systems; Content
analysis and feature selection; • Human-centered computing →
Collaborative filtering; • Computing methodologies → Neural
networks;</p>
    </sec>
    <sec id="sec-2">
      <label>1</label>
      <title>INTRODUCTION</title>
      <p>The recommendation task in the setting of online fashion sales
presents unique challenges. Consumer tastes and body shapes are
idiosyncratic, so a huge selection of items in different sizes must be
kept on offer. On a typical day, Zalando, Europe’s leading online
fashion platform with ∼20M active customers, offers ∼200k product
choices for sale. Being physical goods rather than digital
information, fashion articles must be stocked in warehouses; as most of
them are rarely ordered, items are generally available in small,
fluctuating numbers. In addition, shoppers commonly return articles.
The result is a rapid turnover of the inventory, with many items
going in and out of stock daily. Superimposed on short-scale
variations, there are periodic alterations associated with the seasonal
cycle, and secular changes caused by fashion trends. Regarding
consumer behavior, a noteworthy difference from, e.g., streaming media
services is customers’ propensity to buy rarely (a few sales annually), but
then multiple items at once. Hence, their purchase histories are
sparse, only partially ordered sequences.</p>
      <p>∗Copyright © 2017 for this paper by its authors. Copying permitted for private and
academic purposes.</p>
      <p>
        We previously introduced a recommendation algorithm for
fashion items that combines article images, tags, and other catalog
information with customer response, tethering curated content to
collaborative filtering by minimizing the cross-entropy loss of a
deep neural network for the sales record across a large selection of
customers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Like logistic matrix factorization methods [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ], our
technique yields low-dimensional embeddings for articles (“Fashion
DNA”) and customers (“style vectors”), but has the advantage of
circumventing the cold-start problem that plagues collaborative
methods by injecting catalog information for newly added articles. Our
model proves capable of recognizing individual style preferences
from a modest number of purchases; as cumulative sales events
extend over a multi-year period, however, it creates only a static
style “fingerprint” of a customer.
      </p>
      <p>
        In this contribution, we start from the static model, but extend it
by including time-of-sale information. To contend with the
ever-varying article stock, we use the static model to generate Fashion
DNA from curated article data, and employ it as a fixed item
descriptor. This allows us to focus on the temporal sequence of sales events
for individual customers, which we feed into a neural network to
estimate their style vectors. As these are updated with every purchase,
the approach models the evolution of our customers’ tastes, and we
may employ the style vectors at a given date to create a
personalized preference ranking of the articles then in store, in a way fully
analogous to the static model. Recurrent neural networks (RNN) are
specifically designed to handle sequential data (see Chapter 10 in
Ref. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for an overview). Our network, introduced in Section 2,
employs long short-term memory (LSTM) cells [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to learn temporal
correlations between sales. As the model shares network weights
between customers, it has comparatively few parameters, and easily
scales to millions of clients during inference.
      </p>
      <p>
        Recently, evaluations have appeared in the literature [
        <xref ref-type="bibr" rid="ref10 ref2 ref8">2, 8, 10</xref>
        ]
that indicate superiority of RNN-based recommender systems on
standard data sets (LastFM, Netflix) over static models. Comparing
the dynamic customer style model with predictions from the static
counterpart [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and a baseline model built on global customer
preferences, we confirm that fashion recommendation benefits
from temporal information (Section 3). However, we also find that
peculiarities innate to the fashion context, like the prevalence of
partially ordered purchase sequences and the variability of in-store
content, are prone to impact recommendation quality; care must
be taken in designing RNN architecture, training, and evaluation
schemes to accommodate them. Further avenues for research are
discussed in Section 4.
      </p>
    </sec>
    <sec id="sec-3">
      <label>2</label>
      <title>A DYNAMIC RECOMMENDER SYSTEM</title>
      <p>We now lay out the elements of our proposed model – the data used
for training and validation, the static network learning the article
embeddings (Fashion DNA), the recurrent network responsible for
predicting the customer response, and the training scheme.
</p>
    </sec>
    <sec id="sec-4">
      <label>2.1</label>
      <title>Data overview</title>
      <p>This study is based on article and sales data from Zalando’s online
fashion store, collected from its start in 2008, up to a cutoff date of
July 1, 2015. The data set contains information about ∼1M fashion
items and millions of individual sales events (excluding customer
returns). Merchandise is characterized by a thumbnail image of each
item (size 108×156), categorical data (brand, color, gender, etc.) that
has been rolled out into ∼7k one-hot encoded “tags,” and as
numerical data, the logarithm of the manufacturer-suggested retail price,
and, for garments only, the fabric composition across ∼50 fibers
as percentages. Each sales record contains a unique, anonymized
customer ID, the article bought (disregarding size information), and
the time of sale, with one-minute granularity. Customer data is
limited to sales; in particular, article ratings were not available.
</p>
    </sec>
    <sec id="sec-5">
      <label>2.2</label>
      <title>Fashion DNA</title>
      <p>
        Our first task is to encode the properties of the articles in a dense
numerical representation. As the curated data has multiple formats
and carries diverse information, a natural vehicle for this
transformation is a deep neural network that learns suitable combinations
of features on its own. We discussed such a model at length in an
earlier paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and we will only give an overview here.
      </p>
      <p>
        The representation of an article ν , its “Fashion DNA” vector
fν , is obtained as the activation in a low-dimensional “bottleneck”
layer near the top of the network. At its base, the network receives
the catalog information as its input: RGB image data is first
processed with a pretrained residual neural network [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] whose output
is concatenated with the categorical and numerical article data and
further transformed with a stack of fully connected layers,
resulting in Fashion DNA. As we are ultimately interested in customer
preferences, it is sensible to train the model on the sales record:
Disregarding the timestamp information, we arrange the sales
information for a large number of frequent customers (∼100k) into
a sparse binary purchase matrix Π whose elements Πν k ∈ {0, 1}
indicate whether customer k has bought item ν . The network is
then trained to minimize the average cross-entropy loss per article
over these customers. In effect, the network learns both an optimal
representation of the article fν across the customer base, and a
logistic regression from Fashion DNA to the sales record for each
customer k, with weight vectors sk and bias βk that encode their
style preferences and purchase propensity, respectively. The model
architecture is sketched in Figure 1.</p>
      <p>[Figure 1: Fashion DNA network. Article data → DNN (Θ) → Fashion DNA fν; scalar product with customer style sk; sigmoid → forecast pνk; cross-entropy loss against purchase Πνk.]</p>
      <p>
        The result is a low-rank logistic factorization of the purchase
matrix akin to collaborative filtering [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ],
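<disp-formula><label>(1)</label><tex-math><![CDATA[\Pi_{\nu k} \approx p_{\nu k} = \sigma\left(f_\nu \cdot s_k + \beta_k\right),]]></tex-math></disp-formula>
(where σ(·) denotes the logistic function), except that the Fashion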
DNA fν is now clamped to the catalog data via the encoding neural
network. This is a decisive advantage for our setting where we
are faced with a continuously changing inventory of goods, as the
Fashion DNA for new articles is obtained from their curated data
by a simple forward pass through the neural network.
      </p>
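      <p>As an illustration of Eq. (1), the following minimal Python sketch scores a few articles for one customer and sorts them by purchase probability. The article names, the two-dimensional Fashion DNA vectors, the style vector, and the bias are hypothetical toy values chosen for this example; they are not taken from the Zalando data, and the code is a sketch rather than the implementation used in the experiments.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def purchase_probability(f_nu, s_k, beta_k):
    """Eq. (1): p = sigma(f_nu . s_k + beta_k) for one article and one customer."""
    affinity = sum(f * s for f, s in zip(f_nu, s_k))
    return sigmoid(affinity + beta_k)


def rank_articles(fashion_dna, s_k, beta_k):
    """Sort article ids by decreasing purchase probability for customer k."""
    return sorted(
        fashion_dna,
        key=lambda nu: purchase_probability(fashion_dna[nu], s_k, beta_k),
        reverse=True,
    )


# Hypothetical 2-dimensional Fashion DNA for three in-store articles.
fashion_dna = {
    "sneaker": [0.9, 0.1],
    "boot": [0.2, 0.8],
    "scarf": [-0.5, 0.3],
}
s_k = [1.0, -0.5]  # hypothetical static style vector of customer k
beta_k = -1.0      # hypothetical purchase-propensity bias of customer k

ranking = rank_articles(fashion_dna, s_k, beta_k)
```
]]></preformat>
      <p>Because the bias βk is the same for all articles of a given customer, the ranking depends only on the scalar products fν · sk; a newly listed article can be ranked as soon as its Fashion DNA is available.</p>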
      <p>
        Ranking the purchase probabilities pν k in Eq. (1) naturally
induces recommendations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a model we use for comparison in
Section 3.2. We emphasize that the lack of time-of-sale
information enforces static customer styles. Hence, to invoke dynamically
evolving customer tastes, we have to modify the style vectors sk .
      </p>
    </sec>
    <sec id="sec-6">
      <label>2.3</label>
      <title>LSTM network for purchase sequences</title>
      <p>Fashion DNA provides a compact encoding of all available content
information of an item, and largely solves the cold-start problem for
new articles entering the store. For these reasons, we use the static
model Fashion DNA as article representations in the dynamic model.
We also want to preserve the association between customer-item
affinity, and the scalar product of Fashion DNA and customer style,
akin to Eq. (1). Hence, we make our model dynamic by allowing
the customer style to change over time t . To distinguish between
static and dynamic customer styles, we denote the latter dk (t ).</p>
      <p>
        While we could add time as a dimension to the static model,
and attempt to factorize the resulting three-dimensional purchase
data tensor (as is done, for example, in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]), we chose to follow a
different approach featuring LSTM cells. We also reverse the role
of articles and customers: While our implementation of the static
model used batches of articles as input, and learned the response
of all customers simultaneously, the input to the LSTM network
is customer based. Batches now contain Fashion DNA sequences
of the form ( fk,1, . . . , fk, Nk ), representing the purchase history
νk,1, . . . , νk, Nk of customer k. When customers buy multiple items
at once, the purchase sequence is ambiguous. To prevent the LSTM
from interpreting these non-sequential parts as time series, we
put purchases with the same time stamp in random order. Beyond
the order sequence, the absolute time of purchases tk,1, . . . , tk, Nk
carries important context information for our problem. For example,
the model may use temporal data to infer the in-store availability
of an article, and the season. We thus additionally supply the time
stamp of each purchase to the network.
      </p>
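      <p>The treatment of simultaneous purchases can be sketched as follows: purchases are sorted by time stamp, and items sharing a time stamp are put in random order. The function name, the (time stamp, article) tuple layout, and the toy history are assumptions made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from itertools import groupby


def order_purchases(purchases, rng):
    """Sort (timestamp, article) pairs by time; shuffle items with equal time stamps."""
    ordered = []
    by_time = sorted(purchases, key=lambda p: p[0])
    for _, group in groupby(by_time, key=lambda p: p[0]):
        basket = list(group)
        rng.shuffle(basket)  # items bought together carry no intrinsic order
        ordered.extend(basket)
    return ordered


# Hypothetical purchase history of one customer (time stamps in minutes).
purchases = [(20, "dress"), (10, "bikini top"), (10, "bikini bottom"), (30, "coat")]
sequence = order_purchases(purchases, random.Random(0))
```
]]></preformat>
      <p>Calling the function again with a different random state at each training pass presents the network with a different ordering of the same basket.</p>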
      <p>A single pass of the LSTM network processing customer
purchase histories is illustrated in Figure 2. For a fixed customer k and
purchase number i, the LSTM takes as input the concatenation of
the time stamp tk,i−1 and Fashion DNA fk,i−1 of the previous
purchase, and the time stamp tk,i of the current purchase. In addition,
the LSTM accesses the content of its own memory, mk,i−1, which
stores information on the purchase history of customer k it has seen
so far. The output of the LSTM is projected by a fully connected
layer which results in the current customer style dk,i . Note that
the first purchase of the sequence ( i = 1) is treated specially: Since
there is no previous purchase, we flush fk,0, tk,0, and mk,0 with
zero entries. Consequently, the customer style dk,1 just depends on
the time stamp tk,1 and favors the most popular items at that time.
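The assembly of the per-step input described above can be sketched in Python as follows. The sketch assumes that each time stamp is a single number and that Fashion DNA vectors are plain lists; the function name and the toy values are hypothetical, and the LSTM itself (including its zero-initialized memory) is not shown.
<preformat preformat-type="code"><![CDATA[
```python
def lstm_inputs(timestamps, fashion_dna):
    """Build one input vector per purchase i: [t_{i-1}, f_{i-1}, t_i].

    For the first purchase there is no predecessor, so the previous
    time stamp and the previous Fashion DNA are filled with zeros.
    """
    dim = len(fashion_dna[0])
    inputs = []
    for i, t_now in enumerate(timestamps):
        if i == 0:
            t_prev, f_prev = 0.0, [0.0] * dim
        else:
            t_prev, f_prev = timestamps[i - 1], fashion_dna[i - 1]
        inputs.append([t_prev, *f_prev, t_now])
    return inputs


# Hypothetical history: three purchases with 2-dimensional Fashion DNA.
timestamps = [100.0, 100.0, 250.0]
fashion_dna = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
inputs = lstm_inputs(timestamps, fashion_dna)
```
]]></preformat>
Note that the Fashion DNA of the current purchase never enters its own input vector; only its time stamp does.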
</p>
    </sec>
    <sec id="sec-7">
      <label>2.4</label>
      <title>Training scheme</title>
      <p>For recommendation, we aim to predict customer style vectors
dk,i that maximize the affinity fk,i · dk,i to the next-bought article,
while minimizing the affinity to all other items in store at that
time. Because it is expensive to compute the customer affinities for
every article, we only pick a small sample of “negative” examples
among the articles not bought. We denote their corresponding
Fashion DNA vectors by f̃k,i,1, . . . , f̃k,i,n . The number of negative
examples n &gt; 0 is a hyperparameter of the model.</p>
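      <p>A minimal sketch of drawing negative examples is given below. It assumes uniform sampling without replacement among the in-store articles the customer did not buy; the sampling distribution is not specified above, so this choice, the function name, and the toy article ids are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def sample_negatives(in_store, bought, n, rng):
    """Draw n distinct in-store articles that the customer did not buy."""
    candidates = [a for a in in_store if a not in bought]
    if n > len(candidates):
        raise ValueError("not enough unbought articles to sample from")
    return rng.sample(candidates, n)


# Hypothetical store content and purchase at one point in time.
in_store = ["a", "b", "c", "d", "e"]
bought = {"b"}
negatives = sample_negatives(in_store, bought, n=2, rng=random.Random(1))
```
]]></preformat>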
      <p>
        We tested three choices of loss functions for training the network,
sigmoid cross-entropy loss Lσ (as in the static model), softmax loss
Lsmax, and sigmoid-rank loss Lrank [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and varied the number n
of negative examples. The loss functions are given by:
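<disp-formula><label>(2)</label><tex-math><![CDATA[
\begin{aligned}
L_\sigma &= -\log \sigma\left(f_{k,i} \cdot d_{k,i}\right) - \sum_{j=1}^{n} \log \sigma\left(-\tilde{f}_{k,i,j} \cdot d_{k,i}\right), \\
L_{\mathrm{smax}} &= -\log \left( \frac{\exp\left(f_{k,i} \cdot d_{k,i}\right)}{\exp\left(f_{k,i} \cdot d_{k,i}\right) + \sum_{j=1}^{n} \exp\left(\tilde{f}_{k,i,j} \cdot d_{k,i}\right)} \right), \\
L_{\mathrm{rank}} &= \frac{1}{n} \sum_{j=1}^{n} \sigma\left(\tilde{f}_{k,i,j} \cdot d_{k,i} - f_{k,i} \cdot d_{k,i}\right).
\end{aligned}
]]></tex-math></disp-formula>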
Only Lsmax permits a probabilistic interpretation of the dynamical
model (when n reaches the number of all available articles).</p>
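      <p>For concreteness, the three loss functions can be written for a single purchase as in the following Python sketch. It evaluates the losses only (no gradients or training loop), uses plain lists for vectors, and the toy vectors at the end are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_xent_loss(f_pos, f_negs, d):
    """Sigmoid cross-entropy loss: bought article vs. n negative examples."""
    return -log_sigmoid(dot(f_pos, d)) - sum(log_sigmoid(-dot(f, d)) for f in f_negs)


def softmax_loss(f_pos, f_negs, d):
    """Softmax loss over the bought article and the negative examples."""
    scores = [dot(f_pos, d)] + [dot(f, d) for f in f_negs]
    m = max(scores)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_norm - scores[0]


def sigmoid_rank_loss(f_pos, f_negs, d):
    """Sigmoid-rank loss: mean sigmoid of (negative affinity - positive affinity)."""
    pos = dot(f_pos, d)
    return sum(sigmoid(dot(f, d) - pos) for f in f_negs) / len(f_negs)


# Hypothetical toy example with n = 2 negative examples.
f_pos = [1.0, 0.0]
f_negs = [[0.0, 1.0], [-1.0, 0.0]]
d_zero = [0.0, 0.0]  # uninformative style vector
d_good = [2.0, 0.0]  # style vector aligned with the bought article
```
]]></preformat>
      <p>For the uninformative style vector, the sigmoid-rank loss equals 1/2 for any n, whereas the other two losses grow with the number of negative examples.</p>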
      <p>The minimization landscape for Lσ and Lsmax depends on the
number of negative examples, as their contribution to the loss
increases with n. Our experiments show that recommendation quality
improves when we use more negative examples. Yet, no significant
additional benefit is observed when n exceeds 50. In contrast, n
has no effect on the minimization landscape for the sigmoid-rank
loss. Still, for larger n fewer training epochs are needed to adjust
the network parameters. We find that n = 20 is a good tradeoff
between faster convergence of the weights, and the computational
costs caused by using more negative examples.</p>
      <p>A subtle yet important aspect of the recommendation problem
is that we try to predict items in the next order of the customer,
rather than inferring articles within a single order. As items that are
bought together tend to be related (consider, e.g., a swimwear top
and bottom), an LSTM network trained on full purchase sequences
quickly focuses on multiple orders and overfits. To circumvent the
problem, we let only the first article in the purchase sequence
contribute to the loss when a multiple order is encountered. (Because
purchases with the same time stamp are always shuffled before
feeding, the LSTM receives a variety of article sequences during
training.)
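</p>
      <p>To rank articles for customer k at a time t, we use the scalar product of Fashion DNA and dynamic customer style as an intent-of-purchase score,
<disp-formula><label>(3)</label><tex-math><![CDATA[ip_{\nu,k}(t) = f_\nu \cdot d_k(t).]]></tex-math></disp-formula>
Here, dk (t ) is the dynamic style vector emitted by the LSTM
network after feeding all sales to customer k that occurred before the
time t (with randomly assigned sequence for items purchased
together); for the final sale, we replace the time stamp of the next
purchase by the evaluation time t . We note that ipν,k (t ), unlike
pν k in Eq. (1), cannot be interpreted as a likelihood of sale.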
</p>
    </sec>
    <sec id="sec-8">
      <label>3</label>
      <title>COMPARISON OF MODELS</title>
      <p>To evaluate our dynamic customer model, we assembled sales data
from the online fashion store for an eight-day period immediately
following training, July 1–8, 2015. We identified customers with
orders during this test interval, representing ∼10<sup>5</sup> individual sales,
among ∼190k items that were available for purchase in at least one
size, for at least one day in this period. For comparison, we also
score the static recommendation model (Section 2.2), and a simple
empirical baseline that disregards customer specifics.
</p>
    </sec>
    <sec id="sec-9">
      <label>3.1</label>
      <title>Empirical baseline</title>
      <p>Fashion articles in the Zalando catalog vary greatly in popularity,
with few articles representing most of the sales. This skewed
distribution enables a simple, non-personalized baseline recommender
that projects the recent popularity of items into the future. In detail,
we accumulated article sales for the week immediately preceding
the evaluation interval (June 23–30, 2015), and defined a popularity
score for each article by its sales count if it was still available
after July 1. For those articles (re-)entering inventory during the
evaluation period, we assigned the average number of sales among
all articles as a preliminary score. The empirical baseline model
then ranks the articles by descending popularity score.
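The baseline can be sketched in a few lines of Python. The sketch assumes that the "average number of sales among all articles" is taken over the articles still available, and it breaks ties by article id; both choices, as well as the function name and the toy data, are assumptions made for this example.
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def popularity_ranking(recent_sales, still_available, newly_listed):
    """Rank articles by recent sales count; new articles get the mean score."""
    counts = Counter(recent_sales)
    scores = {a: float(counts[a]) for a in still_available}
    mean_score = sum(scores.values()) / len(scores) if scores else 0.0
    for a in newly_listed:
        scores[a] = mean_score  # preliminary score for (re-)entering articles
    # Descending score; ties are broken by article id (an assumption).
    return sorted(scores, key=lambda a: (-scores[a], a))


# Hypothetical sales of the preceding week and store content.
recent_sales = ["tee", "tee", "tee", "jeans"]
still_available = ["tee", "jeans", "socks"]
newly_listed = ["parka"]
baseline = popularity_ranking(recent_sales, still_available, newly_listed)
```
]]></preformat>
In this toy example the new article receives the mean score 4/3 and is therefore placed between the best seller and the remaining articles.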
</p>
    </sec>
    <sec id="sec-10">
      <label>3.2</label>
      <title>Static Fashion DNA model</title>
      <p>
        The Fashion DNA network (Section 2.2) provides the basis for a
more sophisticated, personalized recommender system, based on
the static customer style vectors sk and the predicted probability of
purchase pν k from Eq. (1), as detailed in Ref. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Indeed, pν k proves to be an
unbiased estimate for the probability of purchase over the lifetime of
customer and article. These assumptions are not met here, because
the evaluation interval is outside the training period, and lasts only
eight days. Still, we may assume that the inner products fν · sk
underlying Eq. (1) are a measure of the affinity of an individual
customer k to the in-store items {ν }(t ) during the time of evaluation,
and sort them by decreasing value to create a static article ranking.
      </p>
    </sec>
    <sec id="sec-11">
      <label>3.3</label>
      <title>Dynamic recommender system</title>
      <p>
        For the dynamic customer model, we rank the in-store articles for
each customer k according to their intent-of-purchase ipν,k (tk ), see
Eq. (3), evaluated at the time of first sale tk during the evaluation period.
We experimented with the three loss models detailed in Section 2.4,
and found comparable results for the sigmoid cross-entropy loss
Lσ and sigmoid-rank loss Lrank, while the softmax loss Lsmax
performed significantly worse. The following results are based on
a pretrained 128-float Fashion DNA and an LSTM implementation
with 256 cells, sigmoid-rank loss and n = 20 negative examples.
Note that 1 − Lrank provides a smooth approximation for the area
under the ROC curve [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], used for model evaluation below.
      </p>
    </sec>
    <sec id="sec-12">
      <label>3.4</label>
      <title>Results</title>
      <p>To compare model performance, we compile recommendation
rankings of the z ≈ 190k items in store for each customer (for the
baseline, the ranking is shared among customers), and identify the
positions rν k of the articles {ν }(k ) purchased by customer k during
evaluation. We then determine the cumulative distribution of ranks:
<disp-formula><label>(4)</label><tex-math><![CDATA[R_j = \sum_{k} \sum_{\nu \in \{\nu\}(k)} H\left(j - r_{\nu k}\right).]]></tex-math></disp-formula>
H (·) denotes the Heaviside function. The normalized cumulative
rank Rj /Rz interpolates among customers and serves as a
collective receiver operating characteristic (ROC) of the recommender
schemes (Figure 3). The inset displays a double-logarithmic detail
of the origin region, representing high-quality recommendations.</p>
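      <p>The cumulative rank distribution of Eq. (4) and an area under the normalized curve can be computed as in the following Python sketch. It assumes ranks are 1-based, takes H(0) = 1 so that Rj counts purchases with rank at most j, and approximates the area by the mean of Rj /Rz over all positions; these conventions and the toy ranks are assumptions, not details given above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def cumulative_rank_counts(purchase_ranks, z):
    """R_j for j = 1..z: number of purchases whose rank is at most j."""
    hist = [0] * (z + 1)
    for r in purchase_ranks:
        if not 1 <= r <= z:
            raise ValueError("ranks must lie between 1 and z")
        hist[r] += 1
    counts, total = [], 0
    for j in range(1, z + 1):
        total += hist[j]
        counts.append(total)
    return counts


def normalized_auc(counts):
    """Mean of R_j / R_z over all positions j (a rectangle-rule area)."""
    return sum(counts) / (len(counts) * counts[-1])


# Hypothetical ranks of purchased articles, pooled over all customers,
# in a store with z = 5 articles.
purchase_ranks = [1, 2, 2, 5]
R = cumulative_rank_counts(purchase_ranks, z=5)
auc = normalized_auc(R)
```
]]></preformat>
      <p>A recommender that places every purchased article at rank 1 reaches an area of 1 under this convention.</p>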
      <p>Table 1 lists the area under the curves (AUC) as a global
performance measure, together with quantiles of the distributions
Rj . We find that our dynamic model outperforms the static model
throughout, and both models are superior to the baseline popularity
model, except for the leading ∼10 recommendations, representing
less than 0.5% of the purchases (inset in Figure 3). The table also
lists the number of model parameters. Weights are shared among
customers for the LSTM network, but not the static model, resulting
in a reduction of complexity by orders of magnitude.</p>
      <p>More than 3% of the purchased articles from the test interval have
not been sold before and, hence, were completely ignored during
training. For those new articles, the cold start problem applies and
the AUC of the baseline, static, and dynamic model decreases to
64.4%, 83.3%, and 87.7%, respectively. In comparison to the numbers
displayed in Table 1, the baseline shows a drastic performance drop
as would also be expected from any other recommender system
solely based on collaborative filtering. The static and dynamic models,
however, circumvent this problem thanks to Fashion DNA.
</p>
    </sec>
    <sec id="sec-13">
      <label>4</label>
      <title>OUTLOOK</title>
      <p>We find that a personalized recommendation model, based on a
recurrent network, outperforms a static customer model in the
fashion context. By encoding temporal awareness into the LSTM
memory of the network, the dynamic model can infer the
seasonality of items, and also record when certain articles are trending—a
distinct advantage over the static model, which is limited to learning
only long-term customer style preferences.</p>
      <p>An important element currently missing in the recommendation
model is short-term customer intent. In the fashion setting, goods
for sale belong to varied classes (clothes, shoes, accessories, etc.),
and shoppers, irrespective of their style profile, often have a
particular category in mind during a session. These implicit interests
strongly influence item preference, but due to their transient nature,
are hard to infer from the purchase record. Complementary data
sources like search queries, or the sequence of items viewed online,
will pick up the relevant signals instead. Models that successfully
integrate long-term style evolution and short-term customer intent
promise to greatly enhance recommendation quality and relevance,
and we plan to investigate them in future studies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bracher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heinz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          .
          <article-title>Fashion DNA: Merging content and sales data for recommendation and article mapping</article-title>
          .
          <source>In Workshop Machine learning meets fashion, KDD</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Devooght</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Bersini</surname>
          </string-name>
          .
          <article-title>Long and Short-Term Recommendations with Recurrent Neural Networks</article-title>
          .
          <source>Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization</source>
          (
          <year>2017</year>
          ), pp.
          <fpage>13</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          .
          <article-title>Deep learning</article-title>
          . MIT Press (Cambridge, Mass., USA),
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>CoRR abs/1512.03385</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Herschtal</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Raskutti</surname>
          </string-name>
          .
          <article-title>Optimising area under the ROC curve using gradient descent</article-title>
          .
          <source>ICML: Conference Proceedings</source>
          (
          <year>2004</year>
          ), p.
          <fpage>49</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>9</volume>
          (
          <year>1997</year>
          ), p.
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Johnson</surname>
          </string-name>
          .
          <article-title>Logistic matrix factorization for implicit feedback data</article-title>
          .
          <source>In NIPS Workshop on Distributed Matrix Computations</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.-J.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Maystre</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Grossglauser</surname>
          </string-name>
          .
          <article-title>Collaborative recurrent neural networks for dynamic recommender systems</article-title>
          .
          <source>JMLR: Workshop and Conference Proceedings</source>
          <volume>63</volume>
          (
          <year>2016</year>
          ), p.
          <fpage>366</fpage>
          -
          <lpage>381</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Volinsky</surname>
          </string-name>
          .
          <article-title>Matrix factorization techniques for recommender systems</article-title>
          .
          <source>IEEE Computer</source>
          <volume>42</volume>
          (
          <year>2009</year>
          ), p.
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Yeung</surname>
          </string-name>
          .
          <article-title>Collaborative recurrent autoencoder: recommend while learning to fill in the blanks</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          <volume>29</volume>
          (
          <year>2016</year>
          ), pp.
          <fpage>415</fpage>
          -
          <lpage>423</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          .
          <article-title>Temporal collaborative filtering with Bayesian probabilistic tensor factorization</article-title>
          .
          <source>Proceedings of the 2010 SIAM International Conference on Data Mining</source>
          (
          <year>2010</year>
          ), pp.
          <fpage>211</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dodier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Mozer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Wolniewicz</surname>
          </string-name>
          .
          <article-title>Optimizing classifier performance via approximation to the Wilcoxon-Mann-Whitney statistic</article-title>
          .
          <source>ICML: Conference Proceedings</source>
          (
          <year>2003</year>
          ), pp.
          <fpage>848</fpage>
          -
          <lpage>855</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>