A Simple Deep Personalized Recommendation System

Pavlos Mitsoulis-Ntompos* (pntompos@expediagroup.com), Meisam Hejazinia* (mnia@expediagroup.com), Serena Zhang* (shuazhang@expediagroup.com), Travis Brady (tbrady@expediagroup.com)
Vrbo, part of Expedia Group

*Equal contribution to this research.

RecTour 2019, September 19th, 2019, Copenhagen, Denmark. Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
Recommender systems are critical tools for matching listings and travelers in two-sided vacation rental marketplaces. Such systems require high capacity to extract user preferences for items from implicit signals at scale. To learn those preferences, we propose a Simple Deep Personalized Recommendation System that computes travelers' conditional embeddings. Our method combines listing embeddings in a supervised structure to build short-term historical context and personalize recommendations for travelers. Deployed in the production environment, this approach is computationally efficient and scalable, and allows us to capture non-linear dependencies. Our offline evaluation indicates that traveler embeddings created using a Deep Average Network can improve the precision of a downstream conversion prediction model by seven percent, outperforming more complex benchmark methods for online shopping experience personalization.

KEYWORDS
travel, recommender system, deep learning, embeddings, e-commerce

1 INTRODUCTION
Personalizing recommender systems is the cornerstone of two-sided marketplace platforms in the vacation rental sector. Such a system needs to be scalable enough to serve millions of travelers and listings. On one side, travelers show complex non-linear behavior. For example, during a shopping cycle travelers might collect and weight different signals based on their heterogeneous preferences across various days, searching either sequentially or simultaneously. Furthermore, travelers might forget and revisit items in their consideration set [5, 7]. On the other side, marketplace platforms should match each traveler with the most personalized listing out of millions of heterogeneous listings. Many of these listings have never been viewed by any traveler or have only recently been onboarded, imposing data sparsity issues. In addition, the context of each trip might differ for travelers within and across seasons and destinations (e.g. a winter trip to the mountains with friends, a summer trip to the beach with family). Moreover, such a personalized recommender system should always be available and trained on the most relevant data, allowing quick test-and-learn iterations that adapt to ever-changing business requirements. This personalized recommender system should suggest a handful of relevant listings to the millions of travelers visiting site pages (e.g. home page, landing page, or listing detail page), travelers receiving targeted marketing emails, or travelers facing cancelled bookings for various reasons.

To develop such a recommender system, we need to extract travelers' preferences from the implicit signals of their interactions using machine learning or statistical-economics models. Given the complexity and scale of this problem, we require high capacity models. While powerful, high-capacity models frequently require prohibitive amounts of computing power and memory, particularly for big data problems. Many approaches have been proposed to learn item embeddings for recommender systems [3, 4, 14, 21], yet learning travelers' preferences from those listing embeddings at scale is still an open problem. Indeed, such a solution needs to capture traveler heterogeneity while remaining generic and robust to cold start problems. We propose a modular solution that learns listing and traveler embeddings non-linearly using a combination of shallow and deep networks. We used down-funnel booking signals, in addition to implicit signals (such as listing page views), to validate our extracted traveler embeddings.
We deployed this system in the production environment. We compared our model with three benchmark models and found that adding these traveler features to the extant feature set of the already-existing Traveler Booking Intent model adds significant marginal value. Our findings suggest that this simple approach can outperform LSTM models, which have significantly higher time complexity. In the next sections we review related work, explain our model, review the results, and conclude.

2 RELATED WORKS
Representation learning has been widely explored for large-scale session-based recommender systems (SBRS) [9, 12, 21], among which collaborative filtering and content-based settings are most commonly used to generate user and item representations [9, 14, 18]. Recent works have addressed the cold start and adaptability problems in factorization machine and latent factor based approaches [11, 17, 22]. Other works have employed non-linear functions and neural models to learn the complex relationships and interactions between users and items on e-commerce platforms [12, 22]. In particular, word2vec techniques with shallow neural networks [16] from the Natural Language Processing (NLP) community have inspired authors to generate non-linear entity embeddings [9] using historical contextual information. State-of-the-art methods have used attention neural networks to aggregate representations in order to focus on relevant inputs and select the most important portion of the context [6]. Attention has been found effective in assigning weights to user-item interactions within encoder-decoder and Long Short Term Memory (LSTM) architectures and the collaborative filtering framework, capturing both long and short term preferences [8, 12, 20]. Similar in spirit to our work, recent studies have suggested simple neural networks, showing promising results in terms of performance, computational efficiency and scalability [2, 10, 26].

3 ARCHITECTURE AND MODEL
In this section we describe our model, which is based on a session-based local embedding model. Our model has two modular stages. In the first stage, we train a skip-gram sequence model to capture a local embedding representation for each listing; we then extrapolate latent embeddings for listings subject to the cold start problem. In the second stage, we train a Deep Average Network (DAN), stacked with encoder and decoder layers, that predicts purchase events in order to capture a given traveler's embedding, i.e. their latent preference over listing embeddings. We also mention a couple of alternatives we evaluated for traveler embeddings. We denote each listing by x_i, so each traveler session s_k(t_j) is defined as a sequence x_1, x_2, ... for traveler t_j. We denote a booking event conditional on listings recently viewed by the traveler with b_k(t_j | x_{j1}, x_{j2}, ..., x_{jt}). Our contribution in this paper is mainly the second stage, which we validate using a downstream shopping funnel signal.

Skip-gram Sequence Model
The skip-gram model [16] in our context attempts to predict the listings x_{i-c}, ..., x_{i+c} surrounding a listing x_i viewed in a traveler session s_k, based on the premise that a traveler viewing listings in the same session signals the similarity of those listings. We use a shallow neural network with one lower-dimensional hidden layer for this purpose. The training objective is to find a local representation for each listing that captures the manifold of its most similar neighbors. More formally, the objective can be specified as the log probability maximization problem

\frac{1}{S} \sum_{s=1}^{S} \sum_{-c \le j \le c,\, j \ne 0} \log p(x_{i+j} \mid x_i)

where c is the window size representing the listing context. The basic skip-gram formulation defines p(x_{i+j} | x_i) using the softmax function

p(x_{i+j} \mid x_i) = \frac{\exp(\nu_{x_{i+j}}^{T} \nu_{x_i})}{\sum_{x=1}^{X} \exp(\nu_{x}^{T} \nu_{x_i})}

where \nu_x and \nu_{x_i} are the input and output representation vectors (i.e. the neural network weights), and X is the number of listings available on our platform. To simplify the task, we used the sigmoid formulation, which turns the model into a binary classifier with negative samples drawn randomly from the list of all available listings on our platform. Formally, we use p(x_{i+j} \mid x_i) = \frac{\exp(\nu_{x_{i+j}}^{T} \nu_{x_i})}{1 + \exp(\nu_{x_{i+j}}^{T} \nu_{x_i})} for positive samples, and p(x_{i+j} \mid x_i) = \frac{1}{1 + \exp(\nu_{x_{i+j}}^{T} \nu_{x_i})} for negative ones.
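To make the first-stage training concrete, here is a minimal sketch of skip-gram training with negative sampling over listing-view sessions, in plain numpy rather than the paper's TensorFlow pipeline. The toy sessions, dimensions, and hyperparameters are illustrative assumptions, not the production setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sessions: each is the sequence of listing ids one traveler viewed.
sessions = [[0, 1, 2, 3], [2, 3, 4], [0, 4, 1]]
NUM_LISTINGS = 5   # X, the number of listings on the platform
EMBED_DIM = 8      # width of the single hidden layer
WINDOW = 2         # c, the context window size
NEG_SAMPLES = 3    # negatives drawn per positive pair
LR = 0.05

# Input and output representations (the nu vectors in the text).
v_in = rng.normal(scale=0.1, size=(NUM_LISTINGS, EMBED_DIM))
v_out = rng.normal(scale=0.1, size=(NUM_LISTINGS, EMBED_DIM))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(50):
    for session in sessions:
        for i, target in enumerate(session):
            lo, hi = max(0, i - WINDOW), min(len(session), i + WINDOW + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # One positive (target, context) pair plus uniform random
                # negatives (a toy version; production sampling would exclude
                # the positive context from the negatives).
                pairs = [(session[j], 1.0)]
                pairs += [(int(n), 0.0)
                          for n in rng.integers(0, NUM_LISTINGS, NEG_SAMPLES)]
                for listing, label in pairs:
                    score = sigmoid(v_in[target] @ v_out[listing])
                    err = score - label          # gradient of the BCE loss
                    g_in = err * v_out[listing]
                    g_out = err * v_in[target]
                    v_in[target] -= LR * g_in
                    v_out[listing] -= LR * g_out

# Rows of v_in serve as the learned listing embeddings.
print(v_in.shape)  # (5, 8)
```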
We have two more issues to address: sparsity and heterogeneity in views per item. It is not uncommon to observe a long tail distribution of views over listings. For this purpose we leverage the approach of [16], wherein especially frequent items are downsampled using the inverse square root of their frequency. Additionally, we removed listings with very low frequency. To resolve the cold start issue, we leverage the contextual information that relates destinations (or search terms) to listings based on booking information. Formally, considering that the destinations d_1, d_2, ..., d_D drive proportions p_{id_1}, ..., p_{id_D} of the demand for a given listing i, we form the expectation of the latent representation of each destination as \nu_d = \frac{1}{N} \sum_{l=1}^{L} p_{ld} \nu_{x_l}, where N is the normalizing factor and L is the total number of listings. Then, given the latitude and longitude of a cold listing (for which we have no interaction data), we form a belief about the proportion of demand driven by each of the search terms, p_{jd_1}, ..., p_{jd_D}. Finally, we use the destination embeddings from the previous step to find the expected embedding of the cold listing as \nu_{x_j} = \sum_{d=1}^{D} p_{jd} \nu_d.
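A small sketch of how this cold-start fallback could be computed, assuming trained listing embeddings and booking-derived demand proportions are already available. The array names and shapes, and the choice of N as the per-destination total weight, are our assumptions for illustration.

```python
import numpy as np

def destination_embeddings(v_listings, p_ld):
    # nu_d = (1/N) * sum_l p_ld * nu_xl: one embedding per destination,
    # with N taken here as the total demand weight per destination.
    weighted = p_ld.T @ v_listings               # (D, dim)
    norm = p_ld.sum(axis=0, keepdims=True).T     # (D, 1), the N factor
    return weighted / np.clip(norm, 1e-9, None)

def cold_listing_embedding(v_dest, p_jd):
    # nu_xj = sum_d p_jd * nu_d: demand-weighted expectation for the
    # cold listing, given its believed destination demand shares.
    return p_jd @ v_dest

# Toy usage: 4 warm listings with 8-dim embeddings, 3 destinations.
rng = np.random.default_rng(1)
v_listings = rng.normal(size=(4, 8))
p_ld = rng.dirichlet(np.ones(3), size=4)   # each row sums to 1
v_dest = destination_embeddings(v_listings, p_ld)

p_jd = np.array([0.7, 0.2, 0.1])           # belief for one cold listing
v_cold = cold_listing_embedding(v_dest, p_jd)
print(v_cold.shape)  # (8,)
```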
Deep Average Network and Alternatives
In the second stage, given the listing embeddings from the previous stage, we model traveler embeddings using a sandwiched encoder-decoder of non-linear ReLU layers. In contrast to the relatively weak implicit view signals, in this stage we leverage strong booking signals as the target variable, based on historical traveler-listing interactions. We have various choices for this purpose, including a Deep Average Network with an auto-encoder-decoder, a Long Short Term Memory (LSTM) network, and attention networks. The simplest approach is to take the point-wise average of the embedding vectors and use it directly in the model. The second approach is to feed the average embedding into a dimensionality expansion and reduction non-linear encoder-decoder architecture, i.e. a Deep Average Network, to extract the signals [10]. The third approach incorporates an LSTM network [13, 19], testing the hypothesis that travelers accumulate information by looking at different listings in the shopping funnel. The fourth approach adds an attention layer on top of the LSTM [25], hypothesizing that travelers allocate different weights to various latent features before booking.

We take a probabilistic approach to model traveler booking events P(Y_j) based on the embedding vectors \nu_{j1}, ..., \nu_{jt} of the historical units the traveler has interacted with. Formally, given the traveler embedding (the last layer f(\nu_{j\cdot}) of the traveler booking prediction network), the probability of booking is defined as

P(Y_j \mid \nu_{j1}, \nu_{j2}, \ldots, \nu_{jt}) = \mathrm{sigmoid}(f(\nu_{j\cdot}))    (1)

where the Deep Average Network layers and f are defined as

f(\nu_{j\cdot}) = \mathrm{relu}(\omega_1 \cdot h_2(\nu_{j\cdot}) + \beta_1)    (2)
h_2(\nu_{j\cdot}) = \mathrm{relu}(\omega_2 \cdot h_1(\nu_{j\cdot}) + \beta_2)    (3)
h_1(\nu_{j\cdot}) = \mathrm{relu}\Big(\omega_3 \cdot \frac{1}{t} \sum_{i=1}^{t} \nu_{ji} + \beta_3\Big)    (4)

(a runnable sketch of this architecture appears at the end of this section). Alternatively, we can use an LSTM network with forget, input, and output gates:

f(\nu_{jt}) = \mathrm{sigmoid}(\omega_f [h_t, \nu_{jt}] + \beta_f) \cdot f(\nu_{j,t-1}) + \mathrm{sigmoid}(\omega_i [h_t, \nu_{jt}] + \beta_i) \cdot \tanh(\omega_c [h_{t-1}, \nu_{jt}] + \beta_c)    (5)

And finally, we can use an attention network on top of the LSTM:

f(\nu_j) = \mathrm{softmax}(\omega_T \cdot h_T) \tanh(h_T)    (6)

where the \omega and \beta terms are weight and bias parameters to estimate, and h_t represents the hidden state at step t.

Among these models, DAN is the most consistent with Occam's razor principle: it is more parsimonious and faster to train. However, the LSTM and the attention network on top of it are more theoretically appealing. From a pragmatic standpoint, for millions of listings and travelers, DAN is the most appealing for deployment, as depicted in Figure 1.

[Figure 1: Deep Average Network (DAN) on top of the skip-gram network.]

We use an adaptive stochastic gradient descent method to train these networks under a binary cross entropy loss. The last question to answer is how we plan to combine the traveler and listing embeddings for personalized recommendations. This is a particularly challenging task, as the traveler embedding is a non-linear projection of listing embeddings with a different dimension; as a result, the two are not in the same space, and cosine similarity cannot be computed directly. We have various choices here, including approaches such as factorization machines and SVMs with kernels that allow modeling higher level interactions at scale. We defer this to our next study.
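To make Eqs. (1)-(4) concrete, the following is a minimal Keras sketch of a DAN of this shape: average the session's listing embeddings, pass them through stacked ReLU layers, and predict the booking event through a sigmoid. The layer widths, the zero-padding scheme, and the variable names are illustrative assumptions, not the production configuration.

```python
import numpy as np
import tensorflow as tf

EMBED_DIM = 32   # dimension of the first-stage listing embeddings

inputs = tf.keras.Input(shape=(None, EMBED_DIM))           # variable-length session
avg = tf.keras.layers.GlobalAveragePooling1D()(inputs)     # (1/t) * sum_i nu_ji
h1 = tf.keras.layers.Dense(64, activation="relu")(avg)     # expansion (encoder)
h2 = tf.keras.layers.Dense(16, activation="relu")(h1)      # reduction (decoder)
out = tf.keras.layers.Dense(1, activation="sigmoid")(h2)   # P(Y_j | nu_j1..nu_jt)

model = tf.keras.Model(inputs, out)
# "adam" stands in for the adaptive SGD method, on binary cross entropy.
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy batch: 4 travelers, sessions padded to 10 listing views each.
x = np.random.normal(size=(4, 10, EMBED_DIM)).astype("float32")
y = np.array([0.0, 1.0, 0.0, 1.0], dtype="float32")
model.fit(x, y, epochs=1, verbose=0)

# The penultimate activations (h2 here) can be exported as the traveler
# embedding and concatenated to downstream features.
traveler_embed = tf.keras.Model(inputs, h2)
print(traveler_embed(x).shape)   # (4, 16)
```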
4 EXPERIMENTS AND RESULTS
In this section we describe the experimental setup and the results obtained when comparing the accuracy uplift of our Deep Average Network based approach against various baselines on a downstream conversion prediction model. The Traveler Booking Intent model is such a downstream model: a gradient boosted tree model trained using LightGBM [15] that uses a rich set of hand-crafted historical session-based product interaction features to predict the booking intent probability.¹ To evaluate our proposed methodology offline, we concatenated the hand-crafted features with the traveler embeddings generated by each of the different model settings.

¹We call it booking intent because our model predicts booking requests from travelers, which need a couple more steps to be confirmed as bookings.

The three baseline methods that we compare against our proposed Deep Average Network on top of the skip-gram include the following:

(1) Random: a heuristic rule that chooses a random listing embedding among those listings a traveler has previously interacted with in the current session.
(2) Averaging Embeddings: a simple point-wise average of the listing embeddings a traveler has previously interacted with in the current session.
(3) LSTM with Attention: a recurrent neural network, inspired by [13, 19, 23], that uses LSTM units and an attention mechanism on top of them to combine the embeddings of listings a user has previously interacted with in the current session.

Datasets
For the experiments, anonymized clickstream data was collected for millions of users from two different seven-day periods. Specifically, the clickstream data includes user views and clicks on listing detail pages, search requests, responses, views and clicks, homepage views, landing page logs, and conversion events, per visitor and session. The first clickstream dataset was used to generate embeddings using the Deep Average Network and the LSTM with Attention. The second clickstream dataset was used to evaluate the learned embeddings on the Traveler Booking Intent model. We split each dataset into train and test sets in a 70:30 proportion, randomly, based on users: users in the train set are excluded from the test set, and vice versa (a minimal sketch of this split appears at the end of this section).

Results
We ran our training pipeline on both CPU and GPU production systems using TensorFlow [1]. We cleaned the data using Apache Spark [24], and the input to the training pipeline had observations from millions of traveler sessions. Training the LSTM models typically took 3 full days, while training the DAN took less than 8 hours on CPU. Given that our recommender system needs fast iteration for improvement and real-time inference with high coverage, the DAN model scales better. Moreover, we modified the cost function to give more weight to the minority class (i.e. positive booking intent) to combat the imbalanced classes in the datasets.

We evaluated the performance of the Traveler Booking Intent model under the different settings on the test dataset using AUC, Precision, Recall and F1 scores. The best results of each model are shown in Table 1. It shows that our proposed Deep Average Network approach contributes the most uplift to the downstream Traveler Booking Intent model.

Table 1: Comparison between Model Settings

Algorithm              AUC    Precision  Recall  F-Score
Random                 0.973  0.821      0.633   0.715
Averaging Embeddings   0.971  0.816      0.628   0.710
LSTM + Attention       0.976  0.877      0.620   0.727
DAN                    0.978  0.888      0.628   0.735

Moreover, Table 2 shows the performance improvement to the Traveler Booking Intent (TBI) model when the traveler embeddings generated by the Deep Average Network are concatenated to the initial hand-crafted features.

Table 2: Performance Uplift to TBI Model

Settings                  AUC    Precision  Recall  F-Score
Only Hand-Crafted Feat.   0.975  0.817      0.651   0.724
Hand-Crafted + DAN Feat.  0.978  0.888      0.628   0.735

We noticed that the Deep Average Network traveler embeddings have predictive power competitive with the hand-crafted features in the downstream TBI model. Based on randomly re-sampling the dataset and re-running the pipeline, we find that our results are reproducible.
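As referenced above, here is a minimal sketch of the user-disjoint 70:30 split: users are assigned wholly to train or test, so no user's sessions appear in both. The column names and toy data frame are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def split_by_user(df, user_col="user_id", train_frac=0.7, seed=0):
    # Shuffle the unique users, then assign the first 70% (and all of
    # their rows) to train and the remaining 30% to test.
    users = df[user_col].unique()
    rng = np.random.default_rng(seed)
    rng.shuffle(users)
    n_train = int(len(users) * train_frac)
    train_users = set(users[:n_train])
    mask = df[user_col].isin(train_users)
    return df[mask], df[~mask]

sessions = pd.DataFrame({
    "user_id":    [1, 1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10],
    "listing_id": [11, 12, 13, 11, 14, 15, 16, 11, 17, 18, 19, 20],
})
train_df, test_df = split_by_user(sessions)
# No user may appear on both sides of the split.
assert not set(train_df.user_id) & set(test_df.user_id)
```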
5 CONCLUSION
We presented a method that combines deep and shallow neural networks to learn traveler and listing embeddings for a large online two-sided vacation rental marketplace platform. We deployed this system in the production environment. Our results show that Deep Average Networks can outperform more complex neural networks in this context. There are various avenues to extend our study. First, we plan to test an attention network without the LSTM. Second, we plan to infuse other contextual information into our model. Third, we want to build a scoring layer that combines traveler and listing embeddings to personalize recommendations. Finally, we plan to evaluate numerous spatio-temporal features, representation learning approaches, and bidirectional recurrent neural networks in our framework.

6 ACKNOWLEDGMENTS
This project is a collaborative effort between the recommendation, marketing data science and growth marketing teams. The authors would like to thank Ali Miraftab, Ravi Divvela, Chandri Krishnan and Wenjun Ke for their contributions to this paper.

REFERENCES
[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. http://tensorflow.org/
[2] Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2016. A simple but tough-to-beat baseline for sentence embeddings. (2016).
[3] Veronika Bogina and Tsvi Kuflik. 2017. Incorporating Dwell Time in Session-Based Recommendations with Recurrent Neural Networks. In RecTemp@RecSys. 57–59.
[4] Hugo Caselles-Dupré, Florian Lesaint, and Jimena Royo-Letelier. 2018. Word2vec applied to recommendation: Hyperparameters matter. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 352–356.
[5] Hector Chade, Jan Eeckhout, and Lones Smith. 2017. Sorting through search and matching models in economics. Journal of Economic Literature 55, 2 (2017), 493–544.
[6] Sneha Chaudhari, Gungor Polatkan, Rohan Ramanath, and Varun Mithal. 2019. An Attentive Survey of Attention Models. arXiv preprint arXiv:1904.02874 (2019).
[7] Babur De los Santos, Ali Hortaçsu, and Matthijs R Wildenbeest. 2012. Testing models of consumer search using data on web browsing and purchasing behavior. American Economic Review 102, 6 (2012), 2955–80.
[8] Simen Eide and Ning Zhou. 2018. Deep neural network marketplace recommenders in online experiments. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 387–391.
[9] Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, and Doug Sharp. 2015. E-commerce in your inbox: Product recommendations at scale. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1809–1818.
[10] Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III. 2015. Deep unordered composition rivals syntactic methods for text classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1. 1681–1691.
[11] Christopher C Johnson. 2014. Logistic matrix factorization for implicit feedback data. Advances in Neural Information Processing Systems 27 (2014).
[12] Thom Lake, Sinead A Williamson, Alexander T Hawk, Christopher C Johnson, and Benjamin P Wing. 2019. Large-scale Collaborative Filtering with Product Embeddings. arXiv preprint arXiv:1901.04321 (2019).
[13] Tobias Lang and Matthias Rettenmeier. 2017. Understanding consumer behavior with recurrent neural networks. In Workshop on Machine Learning Methods for Recommender Systems.
[14] Dawen Liang, Jaan Altosaar, Laurent Charlin, and David M Blei. 2016. Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 59–66.
[15] Microsoft. 2019. LightGBM. https://lightgbm.readthedocs.io
[16] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[17] Andriy Mnih and Ruslan R Salakhutdinov. 2008. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems. 1257–1264.
[18] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. AutoRec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web. ACM, 111–112.
[19] Humphrey Sheil, Omer Rana, and Ronan G. Reilly. 2018. Predicting Purchasing Intent: Automatic Feature Learning using Recurrent Neural Networks. CoRR abs/1807.08207 (2018).
[20] Chu Wang, Lei Tang, Shujun Bian, Da Zhang, Zuohua Zhang, and Yongning Wu. 2019. Reference Product Search. arXiv preprint arXiv:1904.05985 (2019).
[21] Shoujin Wang, Longbing Cao, and Yan Wang. 2019. A Survey on Session-based Recommender Systems. arXiv preprint arXiv:1902.04864 (2019).
[22] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2018. Session-based Recommendation with Graph Neural Networks. arXiv preprint arXiv:1811.00855 (2018).
[23] Yuan Xia, Jingbo Zhou, Jingjia Cao, Yanyan Li, Fei Gao, Kun Liu, Haishan Wu, and Hui Xiong. 2019. Intent-Aware Audience Targeting for Ride-Hailing Service. In Machine Learning and Knowledge Discovery in Databases, Ulf Brefeld, Edward Curry, Elizabeth Daly, Brian MacNamee, Alice Marascu, Fabio Pinelli, Michele Berlingerio, and Neil Hurley (Eds.). Springer International Publishing, Cham, 136–151.
[24] Matei Zaharia, Reynold Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, and Ion Stoica. 2016. Apache Spark: a unified engine for big data processing. Commun. ACM 59 (2016), 56–65.
[25] Peng Zhou, Wei Shi, Jun Tian, Zhenyu Qi, Bingchen Li, Hongwei Hao, and Bo Xu. 2016. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vol. 2. 207–212.
[26] Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, and Kun Gai. 2018. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1079–1088.