<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Simple Deep Personalized Recommendation System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pavlos Mitsoulis-Ntompos∗</string-name>
          <email>pntompos@expediagroup.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meisam Hejazinia∗</string-name>
          <email>mnia@expediagroup.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serena Zhang∗</string-name>
          <email>shuazhang@expediagroup.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Travis Brady</string-name>
          <email>tbrady@expediagroup.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Vrbo, part of Expedia Group</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>22</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>Recommender systems are critical tools for matching listings and travelers in two-sided vacation rental marketplaces. Such systems require high capacity to extract user preferences for items from implicit signals at scale. To learn those preferences, we propose a Simple Deep Personalized Recommendation System that computes travelers' conditional embeddings. Our method combines listing embeddings in a supervised structure to build short-term historical context and personalize recommendations for travelers. Deployed in the production environment, this approach is computationally efficient and scalable, and it allows us to capture non-linear dependencies. Our offline evaluation indicates that traveler embeddings created using a Deep Average Network can improve the precision of a downstream conversion prediction model by seven percent, outperforming more complex benchmark methods for personalizing the online shopping experience.</p>
      </abstract>
      <kwd-group>
        <kwd>travel</kwd>
        <kwd>recommender system</kwd>
        <kwd>deep learning</kwd>
        <kwd>embeddings</kwd>
        <kwd>ecommerce</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        Personalized recommender systems are a cornerstone of
two-sided marketplace platforms in the vacation rental
sector. Such a system needs to scale to millions of travelers
and listings. On one side, travelers show complex non-linear
behavior. For example, during a shopping cycle travelers
might collect and weigh different signals based on their
heterogeneous preferences across various days, searching
either sequentially or simultaneously. Furthermore, travelers
might forget and revisit items in their consideration set [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ]. On the other side, marketplace platforms should match
each traveler with the most relevant of millions of
heterogeneous listings. Many of these listings have never been
viewed by any traveler or have only recently been onboarded,
which creates a data sparsity problem. In addition, the context
of each trip might differ within and across seasons and
destinations (e.g., a winter trip to the mountains with friends
or a summer trip to the beach with family). Moreover, such a
personalized recommender system should always be available
and trained on the most relevant data, allowing quick
test-and-learn iterations that adapt to the ever-changing
requirements of the business. It should suggest a handful of
relevant listings to the millions of travelers visiting site pages
(e.g., the home page, landing pages, or listing detail pages), to
travelers receiving targeted marketing emails, and to travelers
whose bookings were cancelled for various reasons.<fn><p>∗Equal contribution to this research.</p></fn>
      </p>
      <p>
        To develop such a recommender system, we need to
extract travelers' preferences from the implicit signals of their
interactions using machine learning or statistical-economic
models. Given the complexity and scale of this problem, we
require high-capacity models. While powerful, high-capacity
models frequently require prohibitive amounts of computing
power and memory, particularly for big data problems.
Many approaches have been proposed to learn item
embeddings for recommender systems [
        <xref ref-type="bibr" rid="ref14 ref21 ref3 ref4">3, 4, 14, 21</xref>
        ], yet learning
travelers' preferences from those listing embeddings at scale
is still an open problem. Such a solution needs to capture
traveler heterogeneity while being generic and robust to
cold start problems. We propose a modular solution that
learns listing and traveler embeddings non-linearly, using a
combination of shallow and deep networks. We used
down-funnel booking signals, in addition to implicit signals
(such as listing page views), to validate the extracted traveler
embeddings. We deployed this system in the production
environment. We compared our model with three benchmark
models and found that adding these traveler features to the
existing feature set of the Traveler Booking Intent model
adds significant marginal value. Our findings suggest that
this simple approach can outperform LSTM models, which
have significantly higher time complexity. In the next
sections we review related work, explain our model, review
the results, and conclude.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 RELATED WORK</title>
      <p>
        Representation learning has been widely explored for
large-scale session-based recommender systems (SBRS) [
        <xref ref-type="bibr" rid="ref12 ref21 ref9">9, 12, 21</xref>
        ],
where collaborative filtering and content-based settings are
most commonly used to generate user and item
representations [
        <xref ref-type="bibr" rid="ref14 ref18 ref9">9, 14, 18</xref>
        ]. Recent works have addressed
the cold start and adaptability problems in factorization
machine and latent factor based approaches [
        <xref ref-type="bibr" rid="ref11 ref17 ref22">11, 17, 22</xref>
        ]. Other
works have employed non-linear functions and neural
models to learn the complex relationships and interactions
between users and items on e-commerce platforms [
        <xref ref-type="bibr" rid="ref12 ref22">12, 22</xref>
        ]. In
particular, word2vec techniques with shallow neural networks
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] from the Natural Language Processing (NLP)
community have inspired authors to generate non-linear entity
embeddings [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] using historical contextual information.
State-of-the-art methods have used attention neural networks
to aggregate representations, focusing on relevant inputs and
selecting the most important portion of the context
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Attention has been found effective in assigning weights
to user-item interactions within encoder-decoder and
Long Short-Term Memory (LSTM) architectures and the
collaborative filtering framework, capturing both long- and
short-term preferences [
        <xref ref-type="bibr" rid="ref12 ref20 ref8">8, 12, 20</xref>
        ]. In the spirit of our work,
recent studies have proposed simple neural networks, showing
promising results in terms of performance, computational
efficiency, and scalability [
        <xref ref-type="bibr" rid="ref10 ref2 ref26">2, 10, 26</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 ARCHITECTURE AND MODEL</title>
      <p>In this section we describe our model, which builds on
session-based local embeddings. Our model has two modular
stages. In the first stage, we train a skip-gram sequence model
to learn a local embedding for each listing, and then
extrapolate latent embeddings for listings subject to the cold
start problem. In the second stage, we train a Deep Average
Network (DAN), stacked with encoder and decoder layers, to
predict booking (purchase) events; this captures each traveler's
embedding, i.e., the traveler's latent preference over listing
embeddings. We also describe alternatives we evaluated for
traveler embeddings. We denote each listing by \(x_i\), so each
session \(s_k(t_j)\) of traveler \(t_j\) is a sequence \(x_1, x_2, \ldots\).
We denote the booking event conditional on the listings recently
viewed by the traveler by \(b_k(t_j \mid x_{j1}, x_{j2}, \ldots, x_{jt})\).
Our main contribution in this paper is the second stage, which
we validate using a downstream shopping funnel signal.</p>
    </sec>
    <sec id="sec-4">
      <title>Skip-gram Sequence Model</title>
      <p>
        The skip-gram model [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] in our context predicts the listings \(x_{i+j}\), with
\(-c \le j \le c\) and \(j \neq 0\), that surround a listing \(x_i\) viewed in
a traveler session \(s_k\), based on the premise that listings viewed
in the same session are similar. We use a shallow neural
network with a single, lower-dimensional hidden layer for this
purpose. The training objective is to find a local representation
of each listing that best predicts its surrounding listings. More
formally, the objective is to maximize the average log probability
        <disp-formula>
          <tex-math>\[ \frac{1}{S} \sum_{s=1}^{S} \sum_{-c \le j \le c,\; j \neq 0} \log p(x_{i+j} \mid x_i) \]</tex-math>
        </disp-formula>
where \(c\) is the window size that defines a listing's context and
\(S\) is the number of sessions. The basic skip-gram formulation
defines \(p(x_{i+j} \mid x_i)\) using the softmax function
        <disp-formula>
          <tex-math>\[ p(x_{i+j} \mid x_i) = \frac{\exp(\nu_{x_{i+j}}^{\top} \nu_{x_i})}{\sum_{x=1}^{X} \exp(\nu_{x}^{\top} \nu_{x_i})} \]</tex-math>
        </disp-formula>
where \(\nu_x\) and \(\nu_{x_i}\) are the input and output
representation vectors (the neural network weights), and \(X\) is
the number of listings available on our platform. To simplify the
task, we use the sigmoid function instead, which turns the model
into a binary classifier trained with negative samples drawn at
random from all listings available on our platform. Formally, for
positive samples we use
        <disp-formula>
          <tex-math>\[ p(x_{i+j} \mid x_i) = \frac{\exp(\nu_{x_{i+j}}^{\top} \nu_{x_i})}{1 + \exp(\nu_{x_{i+j}}^{\top} \nu_{x_i})} \]</tex-math>
        </disp-formula>
and for negative samples
        <disp-formula>
          <tex-math>\[ p(x_{i+j} \mid x_i) = \frac{1}{1 + \exp(\nu_{x_{i+j}}^{\top} \nu_{x_i})} \]</tex-math>
        </disp-formula>
      </p>
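      <p>The following NumPy sketch illustrates the skip-gram training signal described above: it enumerates the (center, context) pairs within a window of size \(c\) and scores each pair with the sigmoid positive term plus the negative-sample terms. It is an illustration under stated assumptions, not the production implementation: listings are assumed to be integer ids, the function names and embedding sizes are hypothetical, and negatives are drawn uniformly at random, whereas the original word2vec recipe samples them from a smoothed unigram distribution.</p>
      <preformat>
```python
import numpy as np


def skipgram_pairs(session, window):
    """Return the (center, context) pairs of one session.

    Every listing at most `window` positions away from position i
    (excluding i itself) is a context for the center listing session[i].
    """
    pairs = []
    for i, center in enumerate(session):
        start = max(0, i - window)
        stop = min(len(session), i + window + 1)
        for k in range(start, stop):
            if k != i:
                pairs.append((center, session[k]))
    return pairs


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pair_log_likelihood(v_in, v_out, center, context, negatives):
    """Negative-sampling log-likelihood of one (center, context) pair.

    Positive term:  log sigmoid(v_out[context] . v_in[center]).
    Negative terms: log(1 / (1 + exp(score))) = log sigmoid(-score).
    """
    u = v_in[center]
    positive = np.log(sigmoid(v_out[context] @ u))
    negative = np.sum(np.log(sigmoid(-(v_out[negatives] @ u))))
    return positive + negative


# Hypothetical example: 50 listings with 8-dimensional embeddings.
rng = np.random.default_rng(0)
num_listings, dim = 50, 8
v_in = rng.normal(scale=0.1, size=(num_listings, dim))
v_out = rng.normal(scale=0.1, size=(num_listings, dim))

session = [3, 17, 8, 42]                  # listing ids viewed in one session
pairs = skipgram_pairs(session, window=1)
# pairs == [(3, 17), (17, 3), (17, 8), (8, 17), (8, 42), (42, 8)]
negatives = rng.integers(0, num_listings, size=5)   # uniform negative samples
log_lik = sum(pair_log_likelihood(v_in, v_out, c, o, negatives) for c, o in pairs)
```
      </preformat>
      <p>In training, the embeddings would be updated by gradient ascent on this quantity summed over all sessions; the sketch only evaluates it.</p>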
      <p>
        Two more issues remain: sparsity and heterogeneity in the
number of views per listing, since views typically follow a
long-tail distribution across listings. To address this,
we follow [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and
downsample especially frequent listings according to the
inverse square root of their frequency. Additionally, we remove
listings with very low frequency. To resolve the cold start issue,
we leverage contextual information that relates destinations
(or search terms) to listings, based on booking information.
Formally, let destinations \(d_1, d_2, \ldots, d_D\) drive proportions
\(p_{id_1}, \ldots, p_{id_D}\) of the demand for a given listing \(x_i\).
We form the expected latent representation of each destination as
        <disp-formula>
          <tex-math>\[ \nu_d = \frac{1}{N} \sum_{l=1}^{L} p_{ld}\, \nu_{x_l} \]</tex-math>
        </disp-formula>
where \(N\) is a normalizing factor and \(L\) is the total number
of listings. Then, given the latitude and longitude of a cold
listing \(x_j\) (for which we have no interaction data), we form a
belief about the proportions of its demand driven by each of the
search terms, \(p_{jd_1}, \ldots, p_{jd_D}\). Finally, we use the
destination embeddings from the previous step to compute the
expected embedding of the cold listing:
        <disp-formula>
          <tex-math>\[ \nu_{x_j} = \sum_{d=1}^{D} p_{jd}\, \nu_d \]</tex-math>
        </disp-formula>
      </p>
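      <p>A minimal NumPy sketch of the two preprocessing steps above follows. The keep probability uses the word2vec subsampling rule of [16], which discards an occurrence with probability 1 − sqrt(t/f); the threshold t = 1e-5 is an assumed value, not one reported here. For the cold-start step, the demand-share matrices are taken as given (how they are estimated from latitude and longitude is not specified), the normalizing factor \(N\) is assumed to be the total demand share of each destination, and all function names are hypothetical.</p>
      <preformat>
```python
import numpy as np


def keep_probability(freq, t=1e-5):
    """word2vec-style subsampling: keep an occurrence with prob. min(1, sqrt(t / f)).

    freq is the relative frequency f of each listing; t is an assumed threshold.
    """
    freq = np.asarray(freq, dtype=float)
    return np.minimum(1.0, np.sqrt(t / freq))


def destination_embeddings(listing_emb, demand_share):
    """nu_d = (1 / N_d) * sum_l p_ld * nu_{x_l} for every destination d.

    listing_emb:  (L, dim) skip-gram embeddings of listings with data.
    demand_share: (L, D) matrix; p_ld is the share of listing l's demand
                  driven by destination d.
    N_d is assumed to be sum_l p_ld.
    """
    weighted = demand_share.T @ listing_emb        # (D, dim)
    norm = demand_share.sum(axis=0)[:, None]       # (D, 1)
    return weighted / norm


def cold_listing_embedding(dest_emb, cold_share):
    """nu_{x_j} = sum_d p_jd * nu_d for a listing without interaction data."""
    return cold_share @ dest_emb


keep = keep_probability([1e-5, 4e-5, 1e-6])                    # about [1.0, 0.5, 1.0]

listing_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3 listings, dim 2
demand_share = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # 2 destinations
dest = destination_embeddings(listing_emb, demand_share)       # about [[2/3, 1/3], [2/3, 1]]
cold = cold_listing_embedding(dest, np.array([0.25, 0.75]))    # about [2/3, 5/6]
```
      </preformat>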
    </sec>
    <sec id="sec-5">
      <title>Deep Average Network and Alternatives</title>
      <p>
        In the second stage, given the listing embeddings from
the previous stage, we model traveler embeddings using a
sandwiched encoder-decoder of non-linear ReLU layers. In
contrast to the relatively weak implicit view signals, in this
stage we use strong booking signals as the target variable,
based on historical traveler-listing interactions. There are
various choices for this purpose, including a Deep Average
Network with an auto-encoder-decoder, Long Short-Term Memory
(LSTM) networks, and attention networks. The simplest approach
is to take the point-wise average of the listing embedding
vectors and use it directly in the model. The second approach is
to feed the average embedding into a non-linear encoder-decoder
architecture that expands and then reduces dimensionality, i.e.,
a Deep Average Network, to extract the signal [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The third
approach incorporates an LSTM network [
        <xref ref-type="bibr" rid="ref13 ref19">13, 19</xref>
        ], testing
the hypothesis that travelers' behavior reflects information they
gather sequentially by looking at different listings in the
shopping funnel. The fourth approach adds an attention layer on
top of the LSTM [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], hypothesizing that travelers allocate different
weights to various latent features before booking.
      </p>
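      <p>We take a probabilistic approach to model a traveler's
booking event \(P(Y_j)\) based on the embedding vectors
\(\nu_{j1}, \ldots, \nu_{jt}\) of the historical units the traveler has
interacted with. Formally, given the traveler embedding (the last
layer \(f(\nu_{j\cdot})\) of the traveler booking prediction neural
network), the probability of booking is defined as
        <disp-formula>
          <label>(1)</label>
          <tex-math>\[ P(Y_j \mid \nu_{j1}, \nu_{j2}, \ldots, \nu_{jt}) = \mathrm{sigmoid}\big(f(\nu_{j\cdot})\big) \]</tex-math>
        </disp-formula>
where the Deep Average Network layers and \(f\) are defined as
        <disp-formula>
          <label>(2)</label>
          <tex-math>\[ f(\nu_{j\cdot}) = \mathrm{relu}\big(\omega_1 \cdot h_1(\nu_{j\cdot}) + \beta_1\big) \]</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(3)</label>
          <tex-math>\[ h_1(\nu_{j\cdot}) = \mathrm{relu}\big(\omega_2 \cdot h_2(\nu_{j\cdot}) + \beta_2\big) \]</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(4)</label>
          <tex-math>\[ h_2(\nu_{j\cdot}) = \mathrm{relu}\Big(\omega_3 \cdot \frac{1}{t} \sum_{i=1}^{t} \nu_{ji} + \beta_3\Big) \]</tex-math>
        </disp-formula>
      </p>
      <p>Alternatively, we can use an LSTM network with forget,
input, and output gates, whose cell state is updated as
        <disp-formula>
          <label>(5)</label>
          <tex-math>\[ f(\nu_{jt}) = \mathrm{sigmoid}\big(\omega_f [h_{t-1}, \nu_{jt}] + \beta_f\big) \cdot f(\nu_{j,t-1}) + \mathrm{sigmoid}\big(\omega_i [h_{t-1}, \nu_{jt}] + \beta_i\big) \cdot \tanh\big(\omega_c [h_{t-1}, \nu_{jt}] + \beta_c\big) \]</tex-math>
        </disp-formula>
Finally, we can also use an attention network on top of the
LSTM network:
        <disp-formula>
          <label>(6)</label>
          <tex-math>\[ f(\nu_{j\cdot}) = \mathrm{softmax}\big(\omega^{\top} \cdot h_T\big) \tanh(h_T) \]</tex-math>
        </disp-formula>
where \(\omega_{\cdot}\) and \(\beta_{\cdot}\) are weight and bias parameters
to estimate, and \(h_t\) represents the hidden layer at step \(t\).</p>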
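      <p>The DAN forward pass can be sketched in NumPy as below, reading Eqs. (2) to (4) as a chain from the averaged embedding through \(h_2\) and \(h_1\) to \(f\). The layer sizes (32, expanded to 128, reduced to 64 and then 16) are hypothetical and only illustrate the expand-then-reduce shape. Because Eq. (1) applies a sigmoid to \(f\), which is a vector, the sketch adds a hypothetical output weight vector <monospace>w_out</monospace> to obtain a scalar booking probability.</p>
      <preformat>
```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(sizes, rng):
    """Random weights for hypothetical layer sizes (dim, h2, h1, f)."""
    dim, n_h2, n_h1, n_f = sizes
    return {
        "w3": rng.normal(scale=0.1, size=(n_h2, dim)), "b3": np.zeros(n_h2),
        "w2": rng.normal(scale=0.1, size=(n_h1, n_h2)), "b2": np.zeros(n_h1),
        "w1": rng.normal(scale=0.1, size=(n_f, n_h1)), "b1": np.zeros(n_f),
        "w_out": rng.normal(scale=0.1, size=n_f), "b_out": 0.0,  # hypothetical head
    }


def dan_forward(viewed_emb, params):
    """Return (traveler embedding f, booking probability) for one traveler.

    viewed_emb has shape (t, dim): the embeddings nu_j1, ..., nu_jt of the
    listings the traveler viewed. Their order does not matter.
    """
    avg = viewed_emb.mean(axis=0)                          # (1/t) * sum_i nu_ji
    h2 = relu(params["w3"] @ avg + params["b3"])           # Eq. (4): expand
    h1 = relu(params["w2"] @ h2 + params["b2"])            # Eq. (3)
    f = relu(params["w1"] @ h1 + params["b1"])             # Eq. (2): reduce
    prob = sigmoid(params["w_out"] @ f + params["b_out"])  # Eq. (1), via w_out
    return f, prob


rng = np.random.default_rng(0)
params = init_params((32, 128, 64, 16), rng)
viewed = rng.normal(size=(5, 32))        # five hypothetical viewed listings
traveler_emb, p_book = dan_forward(viewed, params)
```
      </preformat>
      <p>The only interaction across viewed listings is a single mean, so the cost grows linearly with the number of views and involves no sequential dependency, which is what makes this architecture cheap to train and serve.</p>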
      <p>Among these models, DAN is the most consistent with the
principle of Occam's razor: it is more parsimonious and faster
to train. LSTM, and attention networks on top of it, are more
theoretically appealing. From a pragmatic standpoint, however,
DAN is more appealing for deployment with millions of listings
and travelers, as depicted in Figure 1.</p>
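      <p>To make the cost difference concrete, the following NumPy sketch runs a standard LSTM cell over the viewed listings and then attention-pools the hidden states. It follows the cell update of Eq. (5) with \(h_{t-1}\) in every gate, adds the standard output gate, and interprets \(h_T\) in Eq. (6) as the matrix stacking all hidden states \(h_1, \ldots, h_T\); these readings, the parameter names, and the sizes are assumptions. Note the Python loop: each step depends on the previous hidden state, so the steps cannot be computed in parallel, whereas DAN needs a single mean.</p>
      <preformat>
```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def lstm_attention_forward(viewed_emb, p):
    """Standard LSTM over the viewed listings, then attention pooling.

    viewed_emb has shape (t, dim); p holds hypothetical weights and biases.
    """
    hidden = p["b_f"].shape[0]
    h = np.zeros(hidden)                         # h_{t-1}
    c = np.zeros(hidden)                         # cell state, f(nu_{j,t-1}) in Eq. (5)
    states = []
    for x in viewed_emb:                         # sequential: step t needs step t-1
        hx = np.concatenate([h, x])              # [h_{t-1}, nu_jt]
        forget = sigmoid(p["w_f"] @ hx + p["b_f"])
        inp = sigmoid(p["w_i"] @ hx + p["b_i"])
        cand = np.tanh(p["w_c"] @ hx + p["b_c"])
        c = forget * c + inp * cand              # Eq. (5)
        out = sigmoid(p["w_o"] @ hx + p["b_o"])  # output gate
        h = out * np.tanh(c)
        states.append(h)
    H = np.stack(states)                         # (t, hidden): h_1 ... h_T
    alpha = softmax(H @ p["w_att"])              # one attention weight per step
    return alpha @ np.tanh(H), alpha             # Eq. (6): weighted sum of tanh(h_t)


rng = np.random.default_rng(0)
dim, hidden = 32, 16                             # hypothetical sizes
p = {"w_att": rng.normal(scale=0.1, size=hidden)}
for gate in ("f", "i", "c", "o"):
    p["w_" + gate] = rng.normal(scale=0.1, size=(hidden, hidden + dim))
    p["b_" + gate] = np.zeros(hidden)
pooled, alpha = lstm_attention_forward(rng.normal(size=(5, dim)), p)
```
      </preformat>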
      <p>We use an adaptive stochastic gradient descent method to
minimize the binary cross-entropy loss of these neural networks.
The last question is how to combine traveler and listing
embeddings for personalized recommendations. This is a
particularly challenging task because traveler embeddings are a
non-linear projection of listing embeddings with a different
dimension; as a result, the two are not in the same space, and
their cosine similarity cannot be computed. There are various
choices for this, including factorization machines and kernel
SVMs, which allow modeling higher-order interactions at scale.
We defer this to future work.</p>
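      <p>The training objective can be illustrated on the simplest possible model, a logistic regression, so that the gradient fits on one line. The text does not name the adaptive method; Adam is used here as a representative adaptive stochastic gradient method with its customary default hyperparameters, and full-batch gradients are used for brevity (a stochastic version would use mini-batches). The data are synthetic and the function names hypothetical.</p>
      <preformat>
```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(y, p, eps=1e-7):
    """Mean binary cross-entropy between labels y and probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def train_logistic_adam(X, y, lr=0.05, steps=500, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise the BCE of a logistic model with Adam (full-batch for brevity)."""
    w = np.zeros(X.shape[1])
    m = np.zeros_like(w)   # first-moment (mean) estimate of the gradient
    v = np.zeros_like(w)   # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)   # gradient of the mean BCE
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        m_hat = m / (1.0 - beta1 ** t)               # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)     # per-coordinate step size
    return w


# Synthetic data: three features plus a constant bias column.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(400, 3)), np.ones((400, 1))])
true_w = np.array([2.0, -1.0, 0.5, 0.0])
y = (sigmoid(X @ true_w) > rng.random(400)).astype(float)

w_hat = train_logistic_adam(X, y)
loss_before = bce(y, np.full(len(y), 0.5))   # all-zero weights give p = 0.5
loss_after = bce(y, sigmoid(X @ w_hat))
```
      </preformat>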
    </sec>
    <sec id="sec-6">
      <title>4 EXPERIMENTS AND RESULTS</title>
      <p>
        In this section we describe the experimental setup and the
results obtained when comparing the accuracy uplift that our
Deep Average Network based approach and various baselines
bring to a downstream conversion prediction model. The
Traveler Booking Intent model is such a downstream model: a
gradient-boosted tree model trained using LightGBM [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] on a rich set of hand-crafted historical session-based
product interaction features to predict the booking intent
probability.<fn><p>We call it booking intent because our model predicts booking requests from travelers, which need a couple of further steps to be confirmed as bookings.</p></fn>
To evaluate our proposed methodology offline, we
concatenated the hand-crafted features with the traveler
embeddings generated by each of the model settings.
      </p>
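      <p>The offline evaluation setup amounts to widening the downstream model's feature matrix. A minimal sketch, assuming the hand-crafted feature rows are aligned with traveler ids and that travelers without an embedding receive a zero vector (both assumptions, as are the function and variable names):</p>
      <preformat>
```python
import numpy as np


def augment_features(handcrafted, traveler_ids, embeddings, dim):
    """Append each row's traveler embedding to its hand-crafted features.

    handcrafted:  (n, k) array of hand-crafted session features.
    traveler_ids: n traveler ids, aligned with the rows of `handcrafted`.
    embeddings:   dict from traveler id to a (dim,) embedding.
    Travelers without an embedding get a zero vector (an assumption).
    """
    emb_rows = np.stack([embeddings.get(t, np.zeros(dim)) for t in traveler_ids])
    return np.hstack([handcrafted, emb_rows])


X_hand = np.array([[0.3, 12.0], [0.9, 2.0], [0.1, 7.0]])   # hypothetical features
emb = {"a": np.array([1.0, -1.0]), "b": np.array([0.5, 0.5])}
X_aug = augment_features(X_hand, ["a", "b", "c"], emb, dim=2)
# X_aug has shape (3, 4); the row of traveler "c" ends with [0.0, 0.0].
```
      </preformat>
      <p>The augmented matrix would then take the place of the hand-crafted matrix as input to the downstream booking intent model; the sketch stops before model training.</p>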
      <p>
        The three baseline methods that we compare against our
proposed Deep Average Network on top of skip-gram embeddings
are the following:
        <list list-type="order">
          <list-item>
            <label>(1)</label>
            <p>Random: a heuristic rule that chooses the embedding of a random listing among those the traveler has previously interacted with in the current session.</p>
          </list-item>
          <list-item>
            <label>(2)</label>
            <p>Averaging Embeddings: a simple point-wise average of the embeddings of the listings the traveler has previously interacted with in the current session.</p>
          </list-item>
          <list-item>
            <label>(3)</label>
            <p>LSTM with Attention: a recurrent neural network, inspired by [
            <xref ref-type="bibr" rid="ref13 ref19 ref23">13, 19, 23</xref>
            ], that uses LSTM units with an attention mechanism on top in order to combine the embeddings of the listings a user has previously interacted with in the current session.</p>
          </list-item>
        </list>
      </p>
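      <p>Baselines (1) and (2) are simple enough to state directly in code. The sketch below assumes the current session's listing embeddings are stacked row-wise in a NumPy array; the function names are hypothetical. The averaging baseline is exactly the input the Deep Average Network receives before its non-linear layers, so the gap between the two isolates the contribution of those layers.</p>
      <preformat>
```python
import numpy as np


def random_baseline(session_emb, rng):
    """Baseline (1): the embedding of one listing drawn at random from the session."""
    return session_emb[rng.integers(len(session_emb))]


def average_baseline(session_emb):
    """Baseline (2): the point-wise average of the session's listing embeddings."""
    return session_emb.mean(axis=0)


# Hypothetical session with three viewed listings and 2-dimensional embeddings.
session_emb = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
avg = average_baseline(session_emb)                              # [1.0, 1.0]
picked = random_baseline(session_emb, np.random.default_rng(0))
```
      </preformat>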
    </sec>
    <sec id="sec-7">
      <title>Datasets</title>
      <p>For the experiments, anonymized clickstream data was
collected for millions of users over two different seven-day
periods. Specifically, the clickstream data includes, per visitor
and session, logs of listing detail page views and clicks; search
requests, responses, views, and clicks; homepage views and
landing pages; and conversion events. The first clickstream
dataset was used to generate embeddings with the Deep Average
Network and the LSTM with Attention. The second clickstream
dataset was used to evaluate the learned embeddings on the
Traveler Booking Intent model. We split each data set randomly
into train and test sets in a 70:30 proportion based on users; in
other words, users in the train set are excluded from the test set,
and vice versa.</p>
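      <p>Splitting by user rather than by row is what keeps the train and test sets disjoint. A minimal sketch, assuming one user id per row and a fixed random seed (the function name and data are hypothetical); note that the 70:30 proportion applies to users, so the proportion of rows can differ when users have different numbers of rows.</p>
      <preformat>
```python
import numpy as np


def split_by_user(user_ids, train_frac=0.7, seed=0):
    """Assign whole users at random to train or test.

    Returns two boolean row masks; no user id appears under both masks.
    """
    user_ids = np.asarray(user_ids)
    users = np.unique(user_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(users)
    n_train = int(round(train_frac * len(users)))
    train_mask = np.isin(user_ids, users[:n_train])
    return train_mask, ~train_mask


# Hypothetical clickstream rows, one user id per row (10 distinct users).
rows_user = ["u1", "u1", "u2", "u3", "u3", "u3", "u4",
             "u5", "u6", "u7", "u8", "u9", "u10"]
train_mask, test_mask = split_by_user(rows_user)
```
      </preformat>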
    </sec>
    <sec id="sec-8">
      <title>Results</title>
      <p>
        We ran our training pipeline on both CPU and GPU
production systems using TensorFlow [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We cleaned the
data using Apache Spark [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], and the input data to the training
pipeline contained observations from millions of traveler
sessions. Training the LSTM models typically took three full
days, while training DAN took less than 8 hours on CPU. Since
our recommender system needs to be iterated on quickly and to
infer in real time with high coverage, the DAN model scales
better for our purposes. Moreover, we modified the cost
function to give more weight to the minority class (i.e.,
positive booking intent) in order to counter the class imbalance
in the data sets.
      </p>
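      <p>One common way to give the minority class more weight is a class-weighted binary cross-entropy. The text does not say which weights were used; the sketch assumes the frequently used heuristic of weighting positives by the ratio of negatives to positives, and its names and labels are hypothetical.</p>
      <preformat>
```python
import numpy as np


def weighted_bce(y, p, pos_weight, eps=1e-7):
    """Mean binary cross-entropy with positive examples weighted by pos_weight."""
    p = np.clip(p, eps, 1.0 - eps)
    losses = -(pos_weight * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return losses.mean()


# Hypothetical labels: one positive booking intent among ten sessions.
y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)
pos_weight = (y == 0).sum() / (y == 1).sum()   # negatives per positive: 9.0
p_uniform = np.full(len(y), 0.5)
loss = weighted_bce(y, p_uniform, pos_weight)  # 1.8 * ln 2
```
      </preformat>
      <p>With this weight, the single positive contributes as much to the loss as all nine negatives together, which is the rebalancing effect described above.</p>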
      <p>We evaluated the performance of the Traveler Booking
Intent model under the different settings on the test data
set, using AUC, precision, recall, and F1 scores. The best
results of each model are shown in Table 1, which shows that
our proposed Deep Average Network approach contributes more
uplift to the downstream Traveler Booking Intent model than the baselines.</p>
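      <p>For reference, the four reported metrics can be computed as follows. The sketch assumes a 0.5 decision threshold for precision, recall, and F1 (the threshold used for Table 1 is not stated) and computes AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, counting ties as one half. The pairwise comparison uses memory proportional to the number of positive-negative pairs, so it is meant for small illustrations only; the labels and scores are hypothetical.</p>
      <preformat>
```python
import numpy as np


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels and binary predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(np.logical_and(y_true, y_pred))
    fp = np.sum(np.logical_and(np.logical_not(y_true), y_pred))
    fn = np.sum(np.logical_and(y_true, np.logical_not(y_pred)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true][:, None] - scores[~y_true][None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / diff.size


y = [1, 0, 1, 0, 1, 0]
scores = np.array([0.9, 0.2, 0.6, 0.7, 0.4, 0.1])
precision, recall, f1 = precision_recall_f1(y, scores >= 0.5)   # each 2/3
auc = roc_auc(y, scores)                                        # 7/9
```
      </preformat>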
      <p>Moreover, Table 2 shows the performance improvement
to the Traveler Booking Intent (TBI) model when the Deep
Average Network generated traveler embeddings are
concatenated to the initial hand-crafted features.</p>
      <p>We noticed that the Deep Average Network traveler
embeddings have predictive power competitive with the
hand-crafted features in the downstream TBI model. By
randomly re-sampling the dataset and re-running the pipeline,
we found that our results are reproducible.</p>
    </sec>
    <sec id="sec-9">
      <title>5 CONCLUSION</title>
      <p>We presented a method that combines deep and shallow
neural networks to learn traveler and listing embeddings for a
large online two-sided vacation rental marketplace platform.
We deployed this system in the production environment.
Our results show that Deep Average Networks can outperform
more complex neural networks in this context. There are
various avenues to extend our study. First, we plan to test an
attention network without the LSTM. Second, we plan to infuse
other contextual information into our model. Third, we want
to build a scoring layer that combines traveler and listing
embeddings to personalize recommendations. Finally, we
plan to evaluate numerous spatio-temporal features,
representation learning approaches, and bidirectional recurrent
neural networks in our framework.</p>
    </sec>
    <sec id="sec-10">
      <title>6 ACKNOWLEDGMENTS</title>
      <p>This project is a collaborative effort between the
recommendation, marketing data science, and growth marketing teams.
The authors would like to thank Ali Miraftab, Ravi Divvela,
Chandri Krishnan and Wenjun Ke for their contribution to
this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Martín</given-names>
            <surname>Abadi</surname>
          </string-name>
          , Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis,
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthieu</given-names>
            <surname>Devin</surname>
          </string-name>
          , Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke,
          <string-name>
            <given-names>Yuan</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaoqiang</given-names>
            <surname>Zheng</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems</article-title>
          . http://tensorflow.org/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Sanjeev</given-names>
            <surname>Arora</surname>
          </string-name>
          , Yingyu Liang, and Tengyu Ma.
          <year>2016</year>
          .
          <article-title>A simple but tough-to-beat baseline for sentence embeddings</article-title>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Veronika</given-names>
            <surname>Bogina</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tsvi</given-names>
            <surname>Kuflik</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Incorporating Dwell Time in Session-Based Recommendations with Recurrent Neural Networks</article-title>
          .
          <source>In RecTemp@ RecSys</source>
          .
          <fpage>57</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Caselles-Dupré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Florian</given-names>
            <surname>Lesaint</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jimena</given-names>
            <surname>Royo-Letelier</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Word2vec applied to recommendation: Hyperparameters matter</article-title>
          .
          <source>In Proceedings of the 12th ACM Conference on Recommender Systems. ACM</source>
          ,
          <fpage>352</fpage>
          -
          <lpage>356</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Hector</given-names>
            <surname>Chade</surname>
          </string-name>
          , Jan Eeckhout, and
          <string-name>
            <given-names>Lones</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Sorting through search and matching models in economics</article-title>
          .
          <source>Journal of Economic Literature</source>
          <volume>55</volume>
          ,
          <issue>2</issue>
          (
          <year>2017</year>
          ),
          <fpage>493</fpage>
          -
          <lpage>544</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Sneha</given-names>
            <surname>Chaudhari</surname>
          </string-name>
          , Gungor Polatkan, Rohan Ramanath, and
          <string-name>
            <given-names>Varun</given-names>
            <surname>Mithal</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>An Attentive Survey of Attention Models</article-title>
          . arXiv preprint arXiv:1904.02874
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Babur</given-names>
            <surname>De los Santos</surname>
          </string-name>
          , Ali Hortaçsu, and
          <string-name>
            <given-names>Matthijs R</given-names>
            <surname>Wildenbeest</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Testing models of consumer search using data on web browsing and purchasing behavior</article-title>
          .
          <source>American Economic Review</source>
          <volume>102</volume>
          ,
          <issue>6</issue>
          (
          <year>2012</year>
          ),
          <fpage>2955</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Simen</given-names>
            <surname>Eide</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ning</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep neural network marketplace recommenders in online experiments</article-title>
          .
          <source>In Proceedings of the 12th ACM Conference on Recommender Systems. ACM</source>
          ,
          <fpage>387</fpage>
          -
          <lpage>391</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Mihajlo</given-names>
            <surname>Grbovic</surname>
          </string-name>
          , Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, and
          <string-name>
            <given-names>Doug</given-names>
            <surname>Sharp</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>E-commerce in your inbox: Product recommendations at scale</article-title>
          .
          <source>In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM</source>
          ,
          <fpage>1809</fpage>
          -
          <lpage>1818</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Mohit</given-names>
            <surname>Iyyer</surname>
          </string-name>
          , Varun Manjunatha,
          <string-name>
            <given-names>Jordan</given-names>
            <surname>Boyd-Graber</surname>
          </string-name>
          , and Hal Daumé III.
          <year>2015</year>
          .
          <article-title>Deep unordered composition rivals syntactic methods for text classification</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</source>
          , Vol.
          <volume>1</volume>
          .
          <fpage>1681</fpage>
          -
          <lpage>1691</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Christopher C</given-names>
            <surname>Johnson</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Logistic matrix factorization for implicit feedback data</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          <volume>27</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Thom</given-names>
            <surname>Lake</surname>
          </string-name>
          , Sinead A Williamson, Alexander T Hawk, Christopher C Johnson, and Benjamin P Wing.
          <year>2019</year>
          .
          <article-title>Large-scale Collaborative Filtering with Product Embeddings</article-title>
          . arXiv preprint arXiv:1901.04321
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Lang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Rettenmeier</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Understanding consumer behavior with recurrent neural networks</article-title>
          .
          <source>In Workshop on Machine Learning Methods for Recommender Systems.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Dawen</given-names>
            <surname>Liang</surname>
          </string-name>
          , Jaan Altosaar, Laurent Charlin, and David M Blei.
          <year>2016</year>
          .
          <article-title>Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence</article-title>
          .
          <source>In Proceedings of the 10th ACM Conference on Recommender Systems. ACM</source>
          ,
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Microsoft</surname>
          </string-name>
          .
          <year>2019</year>
          . LightGBM. https://lightgbm.readthedocs.io
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          .
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Andriy</given-names>
            <surname>Mnih</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ruslan R</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Probabilistic matrix factorization</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          .
          <fpage>1257</fpage>
          -
          <lpage>1264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Suvash</given-names>
            <surname>Sedhain</surname>
          </string-name>
          , Aditya Krishna Menon,
          <string-name>
            <given-names>Scott</given-names>
            <surname>Sanner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lexing</given-names>
            <surname>Xie</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>AutoRec: Autoencoders meet collaborative filtering</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web. ACM</source>
          ,
          <fpage>111</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Humphrey</given-names>
            <surname>Sheil</surname>
          </string-name>
          , Omer Rana, and
          <string-name>
            <given-names>Ronan G.</given-names>
            <surname>Reilly</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Predicting Purchasing Intent: Automatic Feature Learning using Recurrent Neural Networks</article-title>
          . CoRR abs/1807.08207 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Chu</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lei</given-names>
            <surname>Tang</surname>
          </string-name>
          , Shujun Bian, Da Zhang, Zuohua Zhang, and
          <string-name>
            <given-names>Yongning</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Reference Product Search</article-title>
          . arXiv preprint arXiv:1904.05985.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Shoujin</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Longbing</given-names>
            <surname>Cao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yan</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A Survey on Session-based Recommender Systems</article-title>
          . arXiv preprint arXiv:1902.04864
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Shu</given-names>
            <surname>Wu</surname>
          </string-name>
          , Yuyuan Tang, Yanqiao Zhu, Liang Wang,
          <string-name>
            <given-names>Xing</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tieniu</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Session-based Recommendation with Graph Neural Networks</article-title>
          . arXiv preprint arXiv:1811.00855
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Yuan</given-names>
            <surname>Xia</surname>
          </string-name>
          , Jingbo Zhou, Jingjia Cao,
          <string-name>
            <given-names>Yanyan</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fei</given-names>
            <surname>Gao</surname>
          </string-name>
          , Kun Liu,
          <string-name>
            <given-names>Haishan</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hui</given-names>
            <surname>Xiong</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Intent-Aware Audience Targeting for Ride-Hailing Service</article-title>
          .
          <source>In Machine Learning and Knowledge Discovery in Databases</source>
          , Ulf Brefeld, Edward Curry, Elizabeth Daly,
          <string-name>
            <given-names>Brian</given-names>
            <surname>MacNamee</surname>
          </string-name>
          , Alice Marascu, Fabio Pinelli, Michele Berlingerio, and Neil Hurley (Eds.). Springer International Publishing, Cham,
          <fpage>136</fpage>
          -
          <lpage>151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Matei</given-names>
            <surname>Zaharia</surname>
          </string-name>
          , Reynold Xin, Patrick Wendell,
          <string-name>
            <given-names>Tathagata</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Armbrust</surname>
          </string-name>
          , Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman,
          <string-name>
            <given-names>Michael J.</given-names>
            <surname>Franklin</surname>
          </string-name>
          , Ali Ghodsi,
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Scott</given-names>
            <surname>Shenker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ion</given-names>
            <surname>Stoica</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Apache Spark: a unified engine for big data processing</article-title>
          .
          <source>Commun. ACM</source>
          <volume>59</volume>
          (
          <year>2016</year>
          ),
          <fpage>56</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Peng</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Wei Shi, Jun Tian, Zhenyu Qi,
          <string-name>
            <given-names>Bingchen</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hongwei</given-names>
            <surname>Hao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bo</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Attention-based bidirectional long short-term memory networks for relation classification</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          .
          <fpage>207</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Han</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiang</given-names>
            <surname>Li</surname>
          </string-name>
          , Pengye Zhang, Guozheng Li, Jie He,
          <string-name>
            <given-names>Han</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kun</given-names>
            <surname>Gai</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Learning Tree-based Deep Model for Recommender Systems</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining. ACM</source>
          ,
          <fpage>1079</fpage>
          -
          <lpage>1088</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>