<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hotel2vec: Learning Hotel Embeddings from User Click Sessions with Side Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>IOANNIS PARTALAS</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ANNE MORVAN</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ALI SADEGHIAN</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>SHERVIN MINAEE</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>XINXIN LI</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>BROOKE COWAN</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>DAISY ZHE WANG</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Expedia Group</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Florida</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Expedia Group</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <country country="US">USA</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Apex Clearing Corporation</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>We propose a new neural network architecture for learning vector representations of items with attributes, specifically hotels. Unlike previous works, which typically only rely on modeling of user-item interactions for learning item embeddings, we propose a framework that combines several sources of data, including user clicks, hotel attributes (e.g., property type, star rating, average user rating), amenity information (e.g., if the hotel has free Wi-Fi or free breakfast), and geographic information that leverages a hexagonal geospatial system as well as spatial encoders. During model training, a joint embedding is learned from all of the above information. We show that including structured attributes about hotels enables us to make better predictions in a downstream task than when we rely exclusively on click data. We train our embedding model on more than 60 million user click sessions from a leading online travel platform, and learn embeddings for more than one million hotels. Our final learned embeddings integrate distinct sub-embeddings for user clicks, hotel attributes, and geographic information, providing a representation that can be used flexibly depending on the application. An important advantage of the proposed neural model is that it addresses the cold-start problem for hotels with insufficient historical click information by incorporating additional hotel attributes, which are available for all hotels. We show through the results of an online A/B test that our model generates high-quality representations that boost the performance of a hotel recommendation system on a large online travel platform. CCS Concepts: • Computing methodologies → Ranking; Learning latent representations; Learning from implicit feedback; Neural networks. Additional Key Words and Phrases: neural networks, embeddings</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        Learning semantic representations of different entities, whether textual, commercial, or physical, has been a recent and
active area of research. Such representations can facilitate applications that rely on a notion of similarity, for example,
recommendation systems and ranking algorithms in e-commerce [
        <xref ref-type="bibr" rid="ref18 ref2 ref22 ref39 ref5 ref6 ref9">2, 5, 6, 9, 18, 22, 39</xref>
        ].
      </p>
      <p>
        In natural language processing, word2vec [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] learns vector representations of words from large quantities of text,
where each word is mapped to a d-dimensional vector such that semantically similar words have geometrically closer
vectors. This is achieved by predicting either the context words appearing in a window around a given target word
(skip-gram model), or the target word given the context (CBOW model). The main assumption is that words appearing
frequently in similar contexts share statistical properties (the distributional hypothesis). Crucially, word2vec models,
like many other word embedding models, preserve sequential information encoded in text so as to leverage word
co-occurrence statistics. The skip-gram model has been adapted to other domains in order to learn dense representations
of items other than words. For example, product embeddings in e-commerce [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or vacation rental embeddings in the
hospitality domain [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] can be learned by treating purchase histories or user click sequences as sentences, and applying
a word2vec approach.
      </p>
      <p>Most prior work on item embeddings exploits the co-occurrence of items in a sequence as the main signal for
learning the representation. One disadvantage of this approach is that it fails to incorporate rich structured information
associated with the embedded items. For example, in the travel domain, where we seek to embed hotels and other
travel-related entities, it could be helpful to encode explicit information such as user ratings, star ratings, hotel amenities,
and location in addition to implicit information encoded in the click-stream.</p>
      <p>In this work, we propose an algorithm for learning hotel embeddings that combines sequential user click information
in a skip-gram approach with additional structured information about hotels. We propose a neural architecture that
adopts and extends the skip-gram model to accommodate arbitrary relevant information of embedded items, including
but not limited to geographic information, ratings, and item attributes. In experimental results, we show that enhancing
the neural network to jointly encode click and supplemental structured information, outperforms a skip-gram model
that encodes the click information alone. The proposed architecture also naturally handles the cold-start problem for
hotels with little or no historical clicks. Specifically, we can infer an embedding for these properties by leveraging their
supplemental structured metadata.</p>
      <p>Compared to previous work on item embeddings, the novel contributions of this paper are as follows:
(1) We propose a novel yet straightforward framework for fusing multiple sources of information about an item (such
as user click sequences and item-specific information) to learn item embeddings via self-supervised learning.
(2) We generate an embedding that consists of three sub-embeddings for clicks, geography, and amenities attributes,
which can be employed either as separate component embeddings or a single, unified embedding.
(3) We address the cold-start problem by including hotel metadata which are independent of user click-stream
interactions and available for all hotels. This helps us to better impute embeddings for sparse items/hotels.
(4) We show significant gains over previous work based on click-embeddings in several experimental studies.</p>
      <p>The structure of the remainder of this paper is as follows. Section 2 gives an overview of some of the recent works
on neural embedding. Section 3 provides details of the proposed framework, including the neural network architecture,
training methodology, and how the cold-start problem is addressed. In Section 4, we present experimental results on
several different tasks and a comparison with previous state-of-the-art work. Section 5 highlights online A/B test results
obtained for ranking hotels on a search result page by including these embeddings as features in the search ranking
model. Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-1b">
      <title>2 RELATED WORK</title>
    </sec>
    <sec id="sec-2">
      <title>2.1 Embeddings from user sequences for different application domains</title>
      <p>
        Recommendation is an inherently challenging task that requires learning user interests and behavior. There has
been a significant body of research on advancing it using various frameworks [
        <xref ref-type="bibr" rid="ref13 ref25 ref3 ref30 ref42">3, 13, 25, 30, 42</xref>
        ]. Learning a semantic
representation/embedding of the items being recommended is a critical piece of most of these frameworks. Building
recommender systems for hotels is an especially hard task due to challenges such as balancing between popular
hotels and newly added ones (without enough clicks), and the very large space of candidates.
      </p>
      <p>
        Neural network models have been widely used for learning embeddings from user sessions [
        <xref ref-type="bibr" rid="ref37 ref38">37, 38</xref>
        ]. One prominent
use case is learning product embeddings for e-commerce. In [
        <xref ref-type="bibr" rid="ref12 ref4">4, 12</xref>
        ], the authors develop an approach based on the
skip-gram model [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], frequently used in natural language processing. They leverage users’ purchase histories obtained from
their e-mail receipts to learn a dense representation of products. Each user’s complete purchase history is represented
as a sequence, which is treated as a sentence in which the items are considered words. For music recommendation,
the authors of [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] define ground-truth track representations using a word2vec continuous bag-of-words
model, and represent a music session (a collection of tracks) as the average of the track embeddings it contains. A session-level
user embedding is further learned via an LSTM model to maximize the cosine similarity between the predicted user
session-level embedding and the observed ground-truth music session representation. In the online travel space,
authors in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] use the skip-gram framework to learn embeddings for vacation rental properties. They extend the
ideas in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to take into account a user’s click stream data during a session. A key contribution of their method is
the modification of the skip-gram model to always include the booked hotels in the context of each target token, so
that special attention is paid to bookings. They also improve negative sampling by sampling from the same market,
which leads to better within-market listing similarities. Nevertheless, their model relies exclusively on large amounts of
historical user engagement data, which is a major drawback when such data are sparse.
      </p>
    </sec>
    <sec id="sec-2b">
      <title>2.2 Link to graph approaches</title>
      <p>
        The skip-gram loss can also be seen as a graph-based objective, and this paves the way to graph embedding approaches
which share the same similarity assumption. Graph embedding methods aim to learn, in an unsupervised manner,
embeddings such that pairs of edge-linked nodes are more similar to each other than pairs of
nodes without an edge. In our case, the considered graph is constructed from co-clicks: nodes are
the items and an edge designates a co-click between two items. In this area, graphSAGE [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is the first approach to
work in the inductive mode. By learning aggregation functions, graphSAGE predicts the embedding of a new node
without re-training, based on the node's features and neighborhood. Two recent methods, PyTorch-BigGraph (PBG) [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and
Cleora [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] offer more scalability by partitioning the graph. PBG proposes a margin-based ranking objective function
between positive and negative pairs of nodes. Cleora relies exclusively on the graph structure and does not use a
contrastive learning objective, and hence does not require sampling positive or negative examples. Instead, Cleora obtains
node embeddings by iteratively aggregating each node's neighbor embeddings, followed by an ℓ2-normalization. Cleora
prevents the embeddings from collapsing through a careful initialization and the normalization step.
      </p>
    </sec>
    <sec id="sec-2c">
      <title>2.3 Merging all side information to capture "context" of user activity</title>
      <p>
        In another relevant work [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], authors propose a framework for YouTube video recommendation which fuses multiple
features (e.g., video watches, search tokens, geo embeddings) into a unified representation via a neural architecture.
They then use these embeddings for candidate generation and ranking. The main limitation of this work is that the
individual embeddings are learned separately, and then combined via a neural network to perform classification.
      </p>
      <p>
        There are also several works which use attention mechanisms to capture the "context" of users' activities
on the basis of actions they have performed recently, such as the contextual self-attention network for sequential user
recommendation [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], multi-pointer co-attention network [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ], multi-order attentive ranking model [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ], neural
attentive interpretable recommendation system [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ], self-attentive sequential recommendation [
        <xref ref-type="bibr" rid="ref21 ref29">21, 29</xref>
        ].
      </p>
      <p>
        Similar to our work on hotel2vec, there are also some works which attempt to include explicit item attributes (e.g.,
size, artist, model, color) within the sequence prediction framework using various strategies. In [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ], the item metadata
is injected into the model as side information to regularize the item embeddings. Their approach only uses one feature
(singer id) in the experiments. In addition, it does not accommodate learning independent embedding vectors for each
attribute group. Most recently, [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] propose a method where they train separate encoders for text data, click-stream
session data, and product image data, and then use a simple weighted average to unify these embeddings. The weights
are learned using grid search on the downstream task. While their approach allows for exploring independent embedding
vectors, the sub-embeddings of different attribute groups are learned independently rather than jointly. In addition to
efforts extending the skip-gram framework, emerging research attempts to extend GloVe [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] by incorporating various
attributes. Authors in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] incorporate attribute information into GloVe by modifying the loss function such that the
representation of a location can be learned by combining both text and structural data.
      </p>
    </sec>
    <sec id="sec-2e">
      <title>3 THE PROPOSED FRAMEWORK</title>
      <p>
        Similar to word2vec [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], by treating the clicks made by users within an interactive web session as words, and sequences
of clicks as sentences, we seek to predict the context hotels (words), given a target hotel (word) in the session (sentence).
On a high level, this is the approach proposed in [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. We refer to this approach as a session-only model.
      </p>
      <p>As mentioned earlier, one drawback of this approach is that it does not use any information apart from the click
data, making it very challenging to make predictions for unseen hotels or hotels with sparse click data. In addition,
the model may be forced to learn certain semantic features which capture aspects of user interest, hotel geographic
information, hotel attributes, and so on, as latent variables as opposed to leveraging them as explicitly-provided input
features. To address these shortcomings, we propose adding more explicit information about each hotel as model input.
Intuitively, this should make the model more efficient during training, as well as provide information that it can use
when making predictions on unseen or sparse hotels.</p>
      <p>Another major advantage of our model is its use of different projection layers for various hotel/item attributes. This
enables us to learn independent embedding vectors representing different facets of the property, in addition to an
enriched, unified embedding for each hotel. The model also provides a dynamic framework for updating the embedding
of a hotel, once its user-rating or other attribute information changes over time. This is not trivial in session-only
models, unless we re-train a new model based on recent click data post attribute changes. In the remainder of the paper,
we refer to our proposed hotel2vec model as an enriched model, in contrast to the session-only model introduced above.
</p>
    </sec>
    <sec id="sec-3">
      <title>3.1 Neural Network Architecture for the enriched hotel2vec</title>
      <p>
        where x_c is the one-hot encoding of hotels in the click session, and x_g ∈ R^d is a continuous vector with geographical
information about the hotel, whose content we explain below. f(x; W) is a normalized
projection layer parameterized with trainable weights W, i.e., f(x; W) = ReLU(xW/∥W∥2), where ReLU is the Rectified
Linear Unit activation function [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
3.1.1 Geographical features. As geographical features we use a spatial encoder [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] (later called space2vec) as well as
the H3 hierarchical geo-spatial system1. H3 maps the world into hexagons, which can be defined at different resolutions.
Both space2vec and H3 embeddings are concatenated to form x_g.
      </p>
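A minimal numeric sketch of the normalized projection described above. We read ∥W∥2 as a matrix norm of W and use the Frobenius norm here as an assumption; the shapes are illustrative.

```python
import numpy as np

# Sketch of the normalized projection layer f(x; W) = ReLU(x W / ||W||_2) that
# maps each raw input (clicks, amenities, geography) to its sub-embedding.
# Assumption: ||W||_2 is taken as the Frobenius norm of W.

def normalized_projection(x, W):
    return np.maximum(0.0, x @ W / np.linalg.norm(W))

rng = np.random.default_rng(0)
x = rng.random(5)        # a 5-dim raw feature vector (illustrative)
W = rng.random((5, 3))   # projection to a 3-dim sub-embedding
emb = normalized_projection(x, W)
print(emb.shape)  # (3,)
```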
      <p>To obtain the space2vec embedding, we encode a spatial point x (in our case the latitude and longitude of the hotel) with a point encoder PE as a concatenation of multi-scale representations, where S is the total number of grid scales and each component of x is encoded separately, as follows:</p>
      <p>PE(x) = [PE_0(x); . . . ; PE_s(x); . . . ; PE_{S−1}(x)]</p>
      <p>∀s ∈ {0, . . . , S − 1}, PE_s(x) = [PE_{s,1}(x); PE_{s,2}(x)]</p>
      <p>∀v ∈ {1, 2}, PE_{s,v}(x) = [cos(x[v]/(λ_min · g^{s/(S−1)})); sin(x[v]/(λ_min · g^{s/(S−1)}))],
where λ_min and λ_max are the minimum and maximum grid scales and g = λ_max/λ_min. The multi-scale representation
is encoded with a fully connected layer in the hotel2vec model. As activation function we tried both ReLU and a linear
one, and found that linear works best.</p>
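The multi-scale encoder can be sketched as follows. The scale bounds lam_min and lam_max are illustrative placeholders (the paper's exact values are not reproduced here), and S matches the tuned value reported later.

```python
import numpy as np

# Sketch of the multi-scale sinusoidal point encoder (space2vec-style): each
# coordinate of x = (lat, lon) is encoded at S grid scales spaced
# geometrically between lam_min and lam_max. Scale bounds are illustrative.

def point_encode(x, S=16, lam_min=10.0, lam_max=1000.0):
    g = lam_max / lam_min
    feats = []
    for s in range(S):
        scale = lam_min * g ** (s / (S - 1))
        for v in range(len(x)):           # v indexes latitude / longitude
            feats += [np.cos(x[v] / scale), np.sin(x[v] / scale)]
    return np.array(feats)

pe = point_encode([40.7, -74.0])  # lat/lon of a hypothetical hotel
print(pe.shape)                   # (64,): S scales x 2 coords x (cos, sin)
```

In hotel2vec this multi-scale vector is then passed through a fully connected layer (with linear activation) to produce the space2vec part of the geo-embedding.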
      <p>
        To obtain H3 embedding, we use the index at resolution 8 which we find reasonable for our use case. From the
latitude and longitude of a hotel we can get the unique id (given a resolution) of the H3 hexagon which we embed in
the model.
3.1.2 Amenity features. Amenity features (e.g., PetsAllowed, GuestRating, SpaServices, etc.) can be categorical or
numerical, with possible missing values. Thus, x_a ∈ R^58 is partitioned per feature: for a numerical feature we
simply use one element of x_a set to the value of that feature, and for a categorical feature with C categories, we
assign C elements of x_a and set the corresponding category to 1 and the others to 0. If the feature is missing, we set
all of its elements to 0.
3.1.3 Loss function. We train our model by optimizing the Noise Contrastive Estimation (NCE) loss [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. More formally,
given h_i as the target, we estimate the probability of h_j being a context hotel as
log P(h_j | h_i) = log σ(V_e⊤ W_{j,:})
(2)
where σ is the sigmoid function, V_e is the enriched embedding of h_i, and W_{j,:} is the j-th row of the output projection weights
W_o. The h_1, · · · , h_n vectors in Figure 1 represent the one-hot encodings of the other hotels in the training window of
h_i (the window extends 2 hotels before and after h_i). We find the parameters of the model by maximizing the
probabilities of correct predictions. We train the model using backpropagation, minimizing the following loss
function (ℓ2-regularization is also applied):
      </p>
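The per-feature amenity encoding of Section 3.1.2 can be sketched as follows; the feature schema here is hypothetical (the paper's actual x_a has 58 dimensions).

```python
import numpy as np

# Sketch of the amenity encoding of Section 3.1.2: a numerical feature maps to
# one element, a categorical feature to a one-hot span, and a missing value to
# all zeros. The feature schema below is hypothetical.

SCHEMA = [
    ("GuestRating", "numerical"),
    ("PetsAllowed", ["yes", "no"]),                   # 2 categories
    ("PropertyType", ["hotel", "bnb", "apartment"]),  # 3 categories
]

def encode_amenities(hotel):
    parts = []
    for name, kind in SCHEMA:
        value = hotel.get(name)          # None means the feature is missing
        if kind == "numerical":
            parts.append([0.0 if value is None else float(value)])
        else:
            one_hot = [0.0] * len(kind)
            if value in kind:
                one_hot[kind.index(value)] = 1.0
            parts.append(one_hot)
    return np.concatenate(parts)

x_a = encode_amenities({"GuestRating": 4.5, "PropertyType": "hotel"})
print(x_a)  # [4.5 0. 0. 1. 0. 0.]; PetsAllowed missing, so its span is zeros
```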
      <p>J(θ) = −(1/m) Σ_{i=1}^{m} [ log P(h_j | h_i) + Σ_{h_k ∈ N} log σ(−V_e⊤ W_{k,:}) ]
(3)
where θ includes all the parameters of the model, m is the size of the batch, W_{k,:} is the k-th row of W_o, N = {h_k | 1 ≤
k ≤ K, h_k ∼ P_n(h)} is the set of negative examples, and P_n(h) is the distribution from which we pick the negative
samples, discussed in Section 3.2. Finally, we train the model by minimizing Eq. (3) using batch stochastic gradient
descent.</p>
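The per-example term of the loss above can be sketched numerically as follows; vocabulary size, embedding dimension, and the chosen indices are illustrative.

```python
import numpy as np

# Sketch of one example's negative-sampling loss: negative log-likelihood of
# the context hotel plus the pushed-down negative samples. Shapes and the
# chosen indices are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def example_loss(v_e, W_out, context_idx, negative_idx):
    pos = np.log(sigmoid(v_e @ W_out[context_idx]))
    neg = sum(np.log(sigmoid(-(v_e @ W_out[k]))) for k in negative_idx)
    return -(pos + neg)

rng = np.random.default_rng(0)
v_e = rng.normal(size=8)            # enriched embedding of the target hotel
W_out = rng.normal(size=(100, 8))   # output projection, one row per hotel
loss = example_loss(v_e, W_out, context_idx=3, negative_idx=[10, 42, 77])
print(loss)  # a positive scalar
```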
    </sec>
    <sec id="sec-4">
      <title>3.2 Negative Sampling</title>
      <p>
        It is well known [
        <xref ref-type="bibr" rid="ref14 ref28">14, 28</xref>
        ] that using negative sampling, a version of noise contrastive estimation, significantly decreases
the amount of time required to train a classifier with a large number of possible classes. In the case of recommendation,
there is typically a large inventory of items available to recommend to the user, and thus we train our skip-gram model
using negative sampling. However, users frequently search exclusively within a particular
subdomain. For example, in hotel search, a customer looking to stay in Miami will focus on that market and rarely
look across different markets. This motivates a more targeted strategy when selecting negative samples: we select negative
samples within the market that the target and context hotels belong to. Throughout this paper, a market is defined as a
set of hotels in the same geographic region. It is worth noting that there may be multiple markets in the same city or
other geographical region.
      </p>
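The market-aware sampling strategy can be sketched as follows; hotel IDs and the uniform draw are illustrative, whereas the paper samples from a noise distribution P_n restricted to the market.

```python
import random

# Sketch of market-aware negative sampling (Section 3.2): negatives are drawn
# from the target hotel's market, excluding the target and context themselves.
# Hotel IDs and the uniform draw are hypothetical.

def sample_negatives(market_hotels, target, context, k, rng=random):
    candidates = [h for h in market_hotels if h not in (target, context)]
    return rng.sample(candidates, min(k, len(candidates)))

miami = ["h1", "h2", "h3", "h4", "h5"]
print(sample_negatives(miami, target="h1", context="h2", k=2))
```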
    </sec>
    <sec id="sec-5">
      <title>3.3 Cold Start Problem</title>
      <p>In practice, many hotels/items appear infrequently or never in historical data. Recommender systems typically have
difficulty handling these items effectively due to the lack of relevant training data. Apart from the obvious negative
impacts on searchability and sales, neglecting these items can introduce a Matthew effect where the "rich get richer and
the poor get poorer". That is, the less these items are recommended, or the more they are recommended in inappropriate
circumstances, the more the data reinforces their apparent lack of popularity.</p>
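One simple imputation for such sparse items, sketched below under the assumption that a new hotel's click embedding is borrowed from its market, is to average the click embeddings of hotels in the same market; the embedding values are illustrative.

```python
import numpy as np

# Sketch of a simple cold-start imputation: a new hotel's click embedding is
# set to the average of the click embeddings of hotels in its market.
# Embedding values below are illustrative.

def impute_click_embedding(market_embeddings):
    return np.mean(np.stack(market_embeddings), axis=0)

market = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(impute_click_embedding(market))  # [0.5 0.5]
```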
      <p>
        Dealing with such hotels/items and choosing appropriate weights for them is referred to as the "cold start problem."
One of the main advantages of the enriched hotel2vec model over session-only approaches is its ability to better handle
cold start cases. Although an item might lack sufficient prior user engagement, there are often other attributes available.
For example, in our use case, thousands of new properties are added to the lodging platform's inventory each quarter.
While we do not have prior user engagement data from which to learn a click embedding V_c, we do have other attributes
such as geographical location, star rating, amenities, etc. hotel2vec can take advantage of this supplemental information
to provide a better cold-start embedding. For newly listed hotels with no click-session information, one can simply
initialize V_c at random, and hotel2vec computes the enriched embedding V_e using the randomly initialized V_c and the other hotel
attributes, which are known even for recently listed hotels. In Section 4.4, we compare with the session-only model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
when setting V_c as the average of other hotels' embeddings in the same market, and show a 70% gain on Hits@10 for
the cold start hotels.
      </p>
    </sec>
    <sec id="sec-5a">
      <title>4 EXPERIMENTAL RESULTS</title>
      <p>In this section, we present several experiments to evaluate the performance of the trained hotel2vec embeddings. We
refer the reader to Section 5 for the results of an online A/B test. Before diving into the details of the experiments,
we first describe the dataset and model parameters.</p>
    </sec>
    <sec id="sec-5b">
      <title>4.1 Experimental Framework</title>
      <p>
        4.1.1 Real-world large-scale dataset from Expedia Group, a leading Online Travel Agency (OTA). Our dataset, collected
in 2019 (pre-Covid period) contains more than 65 million user click sessions, which includes more than 1.4 million
unique hotels. A click session is defined as a span of clicks performed by a user with no gap of more than 7 days for the
same destination and search parameters. Data are summarized in Table 1. We randomly split the sessions into training,
validation, and test with a ratio of 8:1:1.
4.1.2 Experiment configuration. We use a system with 64GB RAM, 8 CPU cores, and a Tesla V100 GPU. We use Python
3 as the programming language and the Tensorflow [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] library for the neural network architecture and gradient
calculations. All weight matrices are initialized with a he_normal [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] initializer, and click embedding vectors are
initialized uniformly at random. As mentioned previously, ℓ2-regularization is applied to the weights and added to
the loss function.
4.1.3 Downstream tasks. In the following sections, we provide the performance evaluation of our trained embeddings
on several experimental tasks. We start with the quantitative results, focusing on the next-item prediction task based on
the model's output probabilities (Section 4.2.1) and on cosine similarity (Section 4.2.2), then present some qualitative results.
      </p>
      <p>
        We also evaluate each model’s performance on the cold start problem and provide insights on the efect of some of the
hyper-parameters.
4.1.4 Comparison against state-of-the-art embedding baselines. For next-item prediction task based on cosine similarity
in Section 4.2.2, we compare results against the following embedding methods:
• the state-of-the-art session-only model proposed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As explained in Section 2, this model can only learn
from historical user click sessions without direct use of the item’s attributes.
• our hotel2vec which combines both hotel/item attributes and the click session data.
• Matrix Factorization (MF) approach [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], where we factorize the matrix of the co-occurrence of the clicked hotels.
      </p>
      <p>
        Specifically, we factorize the log of the co-occurrence matrix.
• Cleora [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], and
• graphSAGE [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] presented in Section 2. We used the implementation from Stellargraph 2.
      </p>
      <p>
        For all considered experiments, we tune the hyperparameters of all models on the validation set. In particular for
the state-of-the-art baseline session-only model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], we search for a learning rate from {0.01, 0.1, 0.5, 1.0, 2.5} and
embedding dimensions from {32, 128}. To train the model weights, we use stochastic gradient descent (SGD) with
exponential learning-rate decay (decay rate 0.99, applied in staircase mode every 40k training steps), since it performed better than other optimizers
in our case, and a batch size of 1024. We found that a learning rate of 0.5 and an embedding dimension of 32 worked
best. Hence, throughout the remainder of the paper, all embeddings will have dimension 32.
      </p>
      <p>
        For hotel2vec, an initial learning rate of 0.05 worked best; for the dimensions of the embedding vectors, we found
that letting V_c, V_e ∈ R32, V_a ∈ R15, and V_g ∈ R36 worked best. For the multi-scale parameter in the space2vec module we
tuned S to 16 and the output dimension to 28 with linear activation. For the H3 layer we set the embedding size to 8.
These two representations are concatenated to form the geo-embedding part of the model. For both session-only [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
and hotel2vec models, the number of negative samples is 2000.
      </p>
      <p>For the MF approach, we constructed the co-click matrix using a skip window of size 2 and factorized it with the Alternating
Least Squares (ALS) algorithm, where we tuned the maximum number of iterations to 10 and the regularization parameter to 0.02. Note
that since the result consists of two matrices, one for the target hotel and one for the context hotel, we obtained the best results
by averaging the two corresponding vectors for each hotel to obtain the final representation.</p>
      <p>To make a fair comparison with graphSAGE, we choose 200 neighbors to sample from the direct neighborhood
and 10 from the 2-hop neighborhood. Stellargraph's implementation of graphSAGE samples as many negative as
positive neighbors. We then follow the recommendations of the authors and use no dropout, ℓ2-regularization, and the Adam
optimizer. The binary cross-entropy loss is used for the link prediction task to learn the embeddings, with
the sigmoid function as the activation.</p>
    </sec>
    <sec id="sec-6">
      <title>4.2 Quantitative Analysis: Next-item prediction task</title>
      <p>A robust metric for evaluating a set of hotel embeddings (or, more generally, any set of items displayed to a user in
response to an information need) is its ability to predict a user’s next click/selection. In this section, we compare our
model based on the Hits@k and MRR@k metrics in various cases.
• Hits@k measures the average number of times the correct selection (i.e., the hotels clicked by the users in a
session) appears in the top k predicted hotels (i.e., the hotels with the highest predicted probabilities, conditioned on
the current hotel).
• MRR@k (Mean Reciprocal Rank) evaluates the average quality of the list of k items returned by the model (ordered
by predicted probabilities) by looking at the rank of the first correctly predicted item.</p>
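The two metrics can be sketched as follows for a single next-click prediction; the hotel IDs are illustrative.

```python
# Sketch of the two evaluation metrics for one next-click prediction:
# Hits@k checks whether the clicked hotel is in the top k, and MRR@k scores
# the reciprocal rank of the first correct item. Hotel IDs are illustrative.

def hits_at_k(ranked, clicked, k):
    return 1.0 if clicked in ranked[:k] else 0.0

def mrr_at_k(ranked, clicked, k):
    for rank, hotel in enumerate(ranked[:k], start=1):
        if hotel == clicked:
            return 1.0 / rank
    return 0.0

ranked = ["h3", "h7", "h1", "h9"]   # hotels ordered by predicted probability
print(hits_at_k(ranked, "h1", 3))   # 1.0 ("h1" is within the top 3)
print(mrr_at_k(ranked, "h1", 3))    # 0.3333... ("h1" sits at rank 3)
```

In the experiments these per-prediction scores are averaged over all (current hotel, next click) pairs in the test sessions.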
      <p>4.2.1 Next-item prediction task based on the model's output probabilities. We consider two main scenarios:
• Raw evaluation: We are given the current hotel clicked by the user, and we try to predict the next clicked hotel
among all approximately 1.4M hotels.
• Filtered evaluation: The second scenario is identical except we limit the candidates to hotels within the same
market.</p>
      <p>For the latter scenario, we also include three simple baselines that rank the properties by their average guest rating,
their last-week popularity, and their last-year popularity, where popularity is the raw number of bookings the property
received in the last seven days or twelve months, respectively.</p>
      <p>
        Table 2 shows Hits@k and MRR@k for k &#8712; {10, 100} for the hotel2vec and Session-32 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] approach. We also present
results for hotel2vec when we drop the geographical features.
      </p>
      <p>We notice that hotel2vec outperforms the session-only model by a wide margin, demonstrating the utility of including
item attributes when learning embeddings. Removing the geographical features causes a drop in all metrics
for hotel2vec.</p>
      <p>We also compare both models in the filtered scenario. This is a more realistic case because limiting hotels to the same
market reduces the effect of other information the recommender system can use to provide more relevant suggestions
to the user. Table 2b shows prediction results in the filtered scenario. The proposed hotel2vec model outperforms the
"highest rated" and "most popular" baselines by a large margin.</p>
      <p>
        As demonstrated by Table 2, the hotel2vec model outperforms the baseline session model from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] significantly in
both scenarios, showing the effectiveness of hotel2vec in incorporating both click sessions and item/hotel attributes
for better recommendations.
4.2.2 Next-item prediction task based on cosine similarity. In this section, rather than using the model&#8217;s output
probabilities to induce a ranking over hotels, we measure Hits@k and MRR@k over the ranking induced by the cosine
similarity of the embedding vectors. This is useful in scenarios where directly using the model&#8217;s probabilities is not
feasible. In particular, because it is easier to compare the different baselines based solely on the embeddings, we include
more competing baselines here. Table 3 shows the results for various embeddings: Session-32 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and hotel2vec but
also Cleora [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], MF [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and graphSAGE [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The simple baselines "Highest rated", "Most Popular (last year)", and "Most
Popular (last week)", which are not embedding approaches, are omitted since they already performed poorly in
the previous experiment. For conciseness, we focus only on the raw evaluation scenario.
      </p>
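<p>Ranking by cosine similarity of the embedding vectors can be sketched as follows (the embeddings below are toy values):</p>

```python
import numpy as np

def topk_by_cosine(embeddings, query_idx, k):
    """Rank all other items by cosine similarity to the query item's embedding."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = E @ E[query_idx]
    sims[query_idx] = -np.inf  # exclude the query item itself
    return np.argsort(-sims)[:k]

# Toy embeddings: items 0 and 2 point in similar directions.
emb = np.array([[1.0, 0.1], [0.0, 1.0], [0.9, 0.2], [-1.0, 0.0]])
print(topk_by_cosine(emb, query_idx=0, k=2))  # item 2 ranks first
```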
      <p>Hotel2vec embeddings achieve the highest performance. We believe the poor performance of graphSAGE is
due to its scalability issues: increasing the number of sampled neighbors would likely improve the metrics, but
training time would be prohibitive (about 150 hours of training were already needed to obtain these results). We also see
from Table 3 that using cosine similarity instead of the whole network does not result in a large decrease in performance.</p>
    </sec>
    <sec id="sec-7">
      <title>4.3 Qualitative Analysis</title>
      <p>
        The learned hotel embeddings can be used to recommend similar hotels in various situations. In this section, we
illustrate their usefulness with real examples of hotels from our dataset.
4.3.1 Visualization of embedding clusters. To further illuminate the nature of the embeddings learned by the hotel2vec
model, we examine a low-dimensional projection (UMAP [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]) of hotel embeddings in the Miami market (Fig. 2b and 2a).
The colors signify the grouping of hotels into various competing subcategories (i.e., similar hotels), manually annotated
by a human domain expert. The hotel2vec model is significantly better at clustering similar hotels than the session-only
model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
4.3.2 Finding the top-k most similar hotels. A common scenario is finding hotels similar to a target hotel in other
destinations. For example, when the user searches for a specific hotel name (e.g., Hotel Beacon, NY), we would like
to be able to recommend a few similar hotels. The learned embeddings can be used to find the top-k hotels most similar
to a given one. Given a target hotel &#8462;, we compute the cosine similarity of every other hotel with &#8462; and pick the
most similar ones. Rigorous evaluation of this system requires A/B testing; here we show a few examples comparing our
hotel2vec embeddings and the session-only [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] embeddings in Fig. 3 to provide some intuition for the behavior of the
two models.
(a) hotel2vec embeddings
(b) Session-32 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] embeddings
Here we analyze how well the model learns embeddings for hotels absent from the training data. To demonstrate
the effectiveness of our model, we compare hotel2vec&#8217;s Hits@k with the session-only model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]&#8217;s Hits@k, for
target hotels that were absent during training, in same-market predictions (filtered evaluation). For hotel2vec, cold-start
concerns the click embedding associated with co-clicked properties within the same session. We use a simple heuristic
for cold-start imputation and compare the results with the hotel2vec model on cold-start hotels. To impute vectors for
cold-start hotels, we borrow the idea in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and use price, star rating, geodesic distance, type of the property (e.g.,
hotel, vacation rental, etc.), size in terms of number of rooms, and the geographic market information. For each imputed
property, we collect the top 100 most similar properties in the same market based on the above features, considering
only those properties that fall within a radius of 5 km of the target hotel and for which we have an existing click (resp.
Session-32) embedding. We then average these embeddings to obtain the click (resp. final) embedding of the hotel to
impute. Results are shown in Table 4. The enriched embeddings of the hotel2vec model significantly outperform
the session-based embeddings for cold-start hotels.
In this section, we first look at the learning curves for both the session-32 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and hotel2vec models. Then, we analyse
the effect of the number of negative samples and of the learning rate, alongside the optimization algorithm, on the
performance of our model.
4.5.1 Learning curves. Fig. 4 shows the overall training progress of both the session-32 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and hotel2vec models with
their respective best hyperparameters, optimized for Hits@100. Our model achieves similar performance with less
data.
4.5.2 Number of negative samples. An interesting phenomenon is the effect of increasing the number of negative
samples on training time and accuracy. Although it takes more time to create a large number of negative samples, as
Fig. 5a shows, using more negative samples results in faster training times.
[Fig. 5a: validation Hits@100 over training time for N &#8712; {10, 100, 200, 500} negative samples.]
      </p>
      <p>
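<p>The cold-start imputation heuristic described above (averaging the embeddings of the most attribute-similar in-market properties within 5 km) can be sketched as follows; the data structures and feature encoding are illustrative assumptions, not the production implementation:</p>

```python
import numpy as np

def impute_cold_start(target, candidates, embeddings, k=100, radius_km=5.0):
    """Average the embeddings of the top-k most attribute-similar properties
    in the same market and within radius_km of the cold-start target."""
    pool = [c for c in candidates
            if c["market"] == target["market"]
            and c["dist_km"] <= radius_km
            and c["id"] in embeddings]
    # Rank by attribute similarity (Euclidean distance over price, stars, ...).
    pool.sort(key=lambda c: np.linalg.norm(c["features"] - target["features"]))
    return np.mean([embeddings[c["id"]] for c in pool[:k]], axis=0)

# Toy data: two eligible neighbors; "c" is filtered out by the 5 km radius.
target = {"market": "rome", "features": np.array([100.0, 3.0])}
cands = [
    {"id": "a", "market": "rome", "dist_km": 1.0, "features": np.array([110.0, 3.0])},
    {"id": "b", "market": "rome", "dist_km": 2.0, "features": np.array([90.0, 3.0])},
    {"id": "c", "market": "rome", "dist_km": 9.0, "features": np.array([100.0, 3.0])},
]
emb = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0]), "c": np.array([5.0, 5.0])}
print(impute_cold_start(target, cands, emb))  # [0.5 0.5]
```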
4.5.3 Learning rate and optimization techniques. We show empirical experiments with various optimization algorithms
and learning rates, summarized in Fig. 5b. Surprisingly, SGD with exponential learning-rate decay outperforms
most optimizers with sophisticated learning-rate adaptations. We believe this is due to large variance and overfitting in
the early stages of training. Similar issues have been observed in other tasks [
        <xref ref-type="bibr" rid="ref32 ref8">8, 32</xref>
        ], suggesting the need for
tricks such as warm-up heuristics when using momentum-based optimization algorithms to learn embeddings on large,
diverse datasets such as ours.
      </p>
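<p>A minimal sketch of the exponential learning-rate decay schedule (the initial rate and decay factor here are illustrative, not the values tuned in our experiments):</p>

```python
def exp_decayed_lr(lr0, decay_rate, step, decay_steps):
    """Exponentially decayed learning rate: lr0 * decay_rate^(step / decay_steps)."""
    return lr0 * decay_rate ** (step / decay_steps)

# Illustrative schedule: decay by a factor of 0.9 every 1000 SGD steps.
for step in (0, 1000, 2000):
    print(step, exp_decayed_lr(0.5, 0.9, step, decay_steps=1000))
```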
      <p>(a) Effect of negative sampling on prediction (x-axis: training time in hours). A higher number of negative samples results in faster training times.</p>
    </sec>
    <sec id="sec-8">
      <title>5 ONLINE A/B TESTS</title>
      <p>We performed online tests on search ranking in order to evaluate the embeddings. Specifically, we use the embeddings
as input features in the ranking model. The model implements a neural architecture and takes as input search features
(destination, dates, number of travelers, etc.) and property features (price, geographical information, ratings, etc.). The
model is trained in a Learning-to-Rank setting with a pairwise loss, and the main offline metric is NDCG.</p>
      <p>We performed two tests to validate the effectiveness of the embeddings. In the first test, we compared the
ranking model with hotel embeddings learned with the MF approach (described in Section 4) against the ranker
trained without hotel embeddings. This was our initial version of hotel embeddings.</p>
      <p>In the next test, we compared the ranker trained with hotel2vec embeddings against the ranker
that leverages MF embeddings. Note that these tests were run sequentially because the two approaches
were designed one after the other.</p>
      <p>Tables 5a and 5b present the results of the tests in terms of the main metrics, conversion rate (CVR) and gross
profit (GP). The first test had no effect on CVR while significantly improving GP3. Our analysis showed
that this was because the embeddings favor better-quality hotels, which are slightly more expensive.</p>
      <p>Table 5b shows the results of the second online test in terms of CVR and GP uplift. In this case, we observed a
positive uplift in CVR without hurting GP. Hotel2vec captures better similarities and as a consequence
helps the ranker propose properties with higher utility for the user. Also, the fact that we could impute embeddings for hotels
not seen during hotel2vec training had a positive effect, as we observed an uplift in the main metrics for
new and less popular hotels.</p>
      <p>3Disclosure of specific numbers was not allowed by the Legal Department.</p>
      <p>After the tests were completed, as the ranking model is linear, we also looked at feature importances on logged
searches in order to understand the different behavior. We found that the hotel2vec features were four times more important
than the MF features. This result reinforced the conclusion that hotel2vec captures better similarities among hotels,
which the ranker later leverages to propose higher-utility hotels to the user.</p>
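<p>For a linear ranker, one common importance proxy, used here as an illustrative assumption rather than a description of our exact procedure, is the absolute coefficient scaled by the feature&#8217;s standard deviation on logged data:</p>

```python
import numpy as np

# Hypothetical linear-ranker coefficients and per-feature standard deviations
# measured on logged searches (toy values, not the production model).
weights = np.array([0.8, -0.1, 0.3])
feature_std = np.array([1.0, 2.0, 0.5])

# Importance proxy: |w_j| * std_j, normalized to sum to one.
importance = np.abs(weights) * feature_std
importance = importance / importance.sum()
print(importance)
```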
    </sec>
    <sec id="sec-9">
      <title>6 CONCLUSION</title>
      <p>In this work, we propose a framework to learn a semantic representation of hotels by jointly embedding hotel click
data, geographic information, user rating, and attributes (such as stars, whether it has free breakfast, whether pets
are allowed, etc.). Our neural network architecture extends the skip-gram model to accommodate multiple features
and encodes each one separately. We then fuse the sub-embeddings to predict hotels in the same session. Through
experimental results, we show that enriching the neural network with supplemental, structured hotel information
results in superior embeddings compared to a model that relies solely on click information. Our final embedding
is composed of multiple stacked sub-embeddings, each encoding the representation of a different hotel aspect,
resulting in a modular representation. It is also adaptive, in the sense that if one of the attributes or user ratings changes
for a hotel, we can feed the updated data to the model and easily obtain a new embedding. Although we mainly focus
on learning embeddings for hotels, the same framework can be applied to general item embedding, such as product
embedding on Amazon, eBay, Netflix, or Spotify.</p>
    </sec>
    <sec id="sec-10">
      <title>ACKNOWLEDGMENTS</title>
      <p>The authors would like to thank Ion Lesan, Peter Barszczewski, Daniele Donghi, and Ankur Aggrawal for helping us
collect hotel attribute, click, and geographical data. We would also like to thank Dan Friedman and Thomas Mulc
for their useful comments and feedback.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Abadi</surname>
          </string-name>
          , Ashish Agarwal, Paul Barham, and et. al.
          <year>2015</year>
          .
          <article-title>TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems</article-title>
          . https://www.tensorflow.org/. Software available from tensorflow.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jens</given-names>
            <surname>Adamczak</surname>
          </string-name>
          ,
          <string-name>
            <surname>Gerard-Paul Leyson</surname>
            , Peter Knees, Yashar Deldjoo, Farshad Bakhshandegan Moghaddam, Julia Neidhardt, Wolfgang Wörndl, and
            <given-names>Philipp</given-names>
          </string-name>
          <string-name>
            <surname>Monreal</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Session-Based Hotel Recommendations: Challenges and Future Directions</article-title>
          . arXiv preprint arXiv:
          <year>1908</year>
          .
          <volume>00071</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Deepak</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <surname>Bee-Chung</surname>
            <given-names>Chen</given-names>
          </string-name>
          , Pradheep Elango, and
          <string-name>
            <given-names>Raghu</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Content recommendation on web portals</article-title>
          .
          <source>Commun. ACM 56</source>
          ,
          <issue>6</issue>
          (
          <year>2013</year>
          ),
          <fpage>92</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Oren</given-names>
            <surname>Barkan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Noam</given-names>
            <surname>Koenigstein</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Item2vec: neural item embedding for collaborative filtering</article-title>
          .
          <source>In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)</source>
          .
          <source>IEEE</source>
          , 1-
          <fpage>6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Mostafa</given-names>
            <surname>Bayomi</surname>
          </string-name>
          , Annalina Caputo, Matthew Nicholson, Anirban Chakraborty, and
          <string-name>
            <given-names>Sèamus</given-names>
            <surname>Lawless</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>CoRE: a cold-start resistant and extensible recommender system</article-title>
          .
          <source>In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing</source>
          .
          <volume>1679</volume>
          -
          <fpage>1682</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Caselles-Dupré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Florian</given-names>
            <surname>Lesaint</surname>
          </string-name>
          , and
          <string-name>
            <surname>Jimena</surname>
          </string-name>
          Royo-Letelier.
          <year>2018</year>
          .
          <article-title>Word2vec Applied to Recommendation: Hyperparameters Matter</article-title>
          .
          <source>In Proceedings of the 12th ACM Conference on Recommender Systems</source>
          (Vancouver, British Columbia, Canada) (
          <source>RecSys '18)</source>
          .
          <article-title>Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <fpage>352</fpage>
          -
          <lpage>356</lpage>
          . https://doi.org/10.1145/3240323.3240377
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Covington</surname>
          </string-name>
          , Jay Adams, and
          <string-name>
            <given-names>Emre</given-names>
            <surname>Sargin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Deep Neural Networks for YouTube Recommendations</article-title>
          .
          <source>In Proceedings of the 10th ACM Conference on Recommender Systems</source>
          (Boston, Massachusetts, USA) (
          <source>RecSys '16)</source>
          . ACM, New York, NY, USA,
          <fpage>191</fpage>
          -
          <lpage>198</lpage>
          . https://doi.org/10.1145/2959100.2959190
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Wei</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:
          <year>1810</year>
          .
          <volume>04805</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Kallirroi</given-names>
            <surname>Dogani</surname>
          </string-name>
          , Matteo Tomassetti, Sofie De Cnudde, Saúl Vargas, and
          <string-name>
            <given-names>Ben</given-names>
            <surname>Chamberlain</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Learning Embeddings for Product Size Recommendations</article-title>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Xavier</surname>
            <given-names>Glorot</given-names>
          </string-name>
          , Antoine Bordes, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Deep sparse rectifier neural networks</article-title>
          .
          <source>In Proceedings of the fourteenth international conference on artificial intelligence and statistics</source>
          . 315-
          <fpage>323</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Mihajlo</given-names>
            <surname>Grbovic</surname>
          </string-name>
          and Haibin Cheng.
          <year>2018</year>
          .
          <article-title>Real-Time Personalization Using Embeddings for Search Ranking at Airbnb</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery Data Mining (London, United Kingdom) (KDD '18)</source>
          .
          <article-title>Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <fpage>311</fpage>
          -
          <lpage>320</lpage>
          . https://doi.org/10.1145/3219819.3219885
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Mihajlo</surname>
            <given-names>Grbovic</given-names>
          </string-name>
          , Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, and
          <string-name>
            <given-names>Doug</given-names>
            <surname>Sharp</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>E-commerce in your inbox: Product recommendations at scale</article-title>
          .
          <source>In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining</source>
          . 1809-
          <fpage>1818</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Rachid</surname>
            <given-names>Guerraoui</given-names>
          </string-name>
          , Erwan Le Merrer, Rhicheek Patra, and
          <string-name>
            <surname>Jean-Ronan Vigouroux</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Sequences, items and latent links: Recommendation with consumed item packs</article-title>
          .
          <source>arXiv preprint arXiv:1711.06100</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Gutmann</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aapo</given-names>
            <surname>Hyvärinen</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Noise-contrastive estimation: A new estimation principle for unnormalized statistical models</article-title>
          .
          <source>In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics</source>
          .
          <volume>297</volume>
          -
          <fpage>304</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Will</surname>
            <given-names>Hamilton</given-names>
          </string-name>
          , Zhitao Ying, and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Inductive Representation Learning on Large Graphs</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , and R.
          <source>Garnett (Eds.)</source>
          , Vol.
          <volume>30</volume>
          . Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Casper</surname>
            <given-names>Hansen</given-names>
          </string-name>
          , Christian Hansen, Lucas Maystre, Rishabh Mehrotra, Brian Brost, Federico Tomasi, and
          <string-name>
            <given-names>Mounia</given-names>
            <surname>Lalmas</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Contextual and Sequential User Embeddings for Large-Scale Music Recommendation</article-title>
          .
          <source>In Fourteenth ACM Conference on Recommender Systems (Virtual Event</source>
          , Brazil) (
          <source>RecSys '20)</source>
          .
          <article-title>Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <fpage>53</fpage>
          -
          <lpage>62</lpage>
          . https://doi.org/10.1145/3383313.3412248
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Kaiming</surname>
            <given-names>He</given-names>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Delving deep into rectifiers: Surpassing human-level performance on imagenet classification</article-title>
          .
          <source>In Proceedings of the IEEE international conference on computer vision</source>
          . 1026-
          <fpage>1034</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Peng</surname>
            <given-names>Hu</given-names>
          </string-name>
          , Rong Du, Yao Hu, and
          <string-name>
            <given-names>Nan</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2019</year>
          . Du., R.:
          <article-title>Hybrid item-item recommendation via semi-parametric embedding</article-title>
          .
          <source>In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI</source>
          .
          <fpage>10</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Xiaowen</surname>
            <given-names>Huang</given-names>
          </string-name>
          , Shengsheng Qian, Quan Fang, Jitao Sang, and
          <string-name>
            <given-names>Changsheng</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Csan: Contextual self-attention network for user sequential recommendation</article-title>
          .
          <source>In Proceedings of the 26th ACM international conference on Multimedia</source>
          .
          <volume>447</volume>
          -
          <fpage>455</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Schockaert</surname>
            <given-names>Jeawak</given-names>
          </string-name>
          , Jones.
          <year>2018</year>
          .
          <article-title>Embedding Geographic Locations for Modelling the Natural Environment using Flickr Tags and Structured Data</article-title>
          . arXiv preprint arXiv:
          <year>1810</year>
          .
          <volume>12091</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Wang-Cheng Kang</surname>
          </string-name>
          and
          <string-name>
            <surname>Julian McAuley</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Self-attentive sequential recommendation</article-title>
          .
          <source>In 2018 IEEE International Conference on Data Mining (ICDM)</source>
          . IEEE,
          <fpage>197</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Buket</given-names>
            <surname>Kaya</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Hotel recommendation system by bipartite networks and link prediction</article-title>
          .
          <source>Journal of Information Science</source>
          (
          <year>2019</year>
          ),
          <fpage>0165551518824577</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Yehuda</surname>
            <given-names>Koren</given-names>
          </string-name>
          , Robert Bell, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Volinsky</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Matrix Factorization Techniques for Recommender Systems</article-title>
          .
          <source>Computer 42</source>
          , 8 (Aug.
          <year>2009</year>
          ),
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          . https://doi.org/10.1109/MC.2009.263
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Adam</surname>
            <given-names>Lerer</given-names>
          </string-name>
          , Ledell Wu, Jiajun Shen, Timothée Lacroix, Luca Wehrstedt, Abhijit Bose, and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Peysakhovich</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>PyTorch-BigGraph: A Large-scale Graph Embedding System</article-title>
          .
          <source>In Proceedings of the 2nd SysML Conference</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Jiahui</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Dolan</surname>
          </string-name>
          , and Elin Rønby Pedersen.
          <year>2010</year>
          .
          <article-title>Personalized news recommendation based on click behavior</article-title>
          .
          <source>In Proceedings of the 15th international conference on Intelligent user interfaces. ACM</source>
          ,
          <fpage>31</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Gengchen</given-names>
            <surname>Mai</surname>
          </string-name>
          , Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, and
          <string-name>
            <given-names>Ni</given-names>
            <surname>Lao</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells</article-title>
          .
          <source>In International Conference on Learning Representations</source>
          . https://openreview.net/forum?id=rJljdh4KDH
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Leland</given-names>
            <surname>McInnes</surname>
          </string-name>
          , John Healy, and
          <string-name>
            <given-names>James</given-names>
            <surname>Melville</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction</article-title>
          . arXiv:1802.03426 [stat.ML]
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Zhiqiang</given-names>
            <surname>Pan</surname>
          </string-name>
          , Fei Cai, Yanxiang Ling, and Maarten de Rijke.
          <year>2020</year>
          .
          <article-title>Rethinking Item Importance in Session-Based Recommendation</article-title>
          . Association for Computing Machinery, New York, NY, USA,
          <fpage>1837</fpage>
          -
          <lpage>1840</lpage>
          . https://doi.org/10.1145/3397271.3401274
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Michael J.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Billsus</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Content-based recommendation systems</article-title>
          .
          <source>In The adaptive web</source>
          . Springer,
          <fpage>325</fpage>
          -
          <lpage>341</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>In Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . http://www.aclweb.org/anthology/D14-1162
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Popel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ondřej</given-names>
            <surname>Bojar</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Training tips for the transformer model</article-title>
          .
          <source>The Prague Bulletin of Mathematical Linguistics</source>
          <volume>110</volume>
          ,
          <issue>1</issue>
          (
          <year>2018</year>
          ),
          <fpage>43</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Barbara</given-names>
            <surname>Rychalska</surname>
          </string-name>
          , Piotr Bąbel, Konrad Gołuchowski, Andrzej Michałowski, and
          <string-name>
            <given-names>Jacek</given-names>
            <surname>Dąbrowski</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Cleora: A Simple, Strong and Scalable Graph Embedding Scheme</article-title>
          . arXiv:2102.02302 [cs.LG]
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Loveperteek</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shreya</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sagar</given-names>
            <surname>Arora</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Sumit</given-names>
            <surname>Borar</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>One Embedding To Do Them All</article-title>
          . arXiv preprint arXiv:1906.12120 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Yi</given-names>
            <surname>Tay</surname>
          </string-name>
          , Anh Tuan Luu, and Siu Cheung Hui.
          <year>2018</year>
          .
          <article-title>Multi-pointer co-attention networks for recommendation</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          .
          <fpage>2309</fpage>
          -
          <lpage>2318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Flavian</given-names>
            <surname>Vasile</surname>
          </string-name>
          , Elena Smirnova, and
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Conneau</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Meta-prod2vec: Product embeddings using side-information for recommendation</article-title>
          .
          <source>In Proceedings of the 10th ACM Conference on Recommender Systems. ACM</source>
          ,
          <fpage>225</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Shoujin</given-names>
            <surname>Wang</surname>
          </string-name>
          , Liang Hu, Longbing Cao, Xiaoshui Huang,
          <string-name>
            <given-names>Defu</given-names>
            <surname>Lian</surname>
          </string-name>
          , and Wei Liu.
          <year>2018</year>
          .
          <article-title>Attention-based transactional context embedding for next-item recommendation</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Shu</given-names>
            <surname>Wu</surname>
          </string-name>
          , Yuyuan Tang, Yanqiao Zhu, Liang Wang,
          <string-name>
            <given-names>Xing</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tieniu</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Session-based recommendation with graph neural networks</article-title>
          .
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , Vol.
          <volume>33</volume>
          .
          <fpage>346</fpage>
          -
          <lpage>353</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Da</given-names>
            <surname>Xu</surname>
          </string-name>
          , Chuanwei Ruan, Evren Körpeoglu, Sushant Kumar, and
          <string-name>
            <given-names>Kannan</given-names>
            <surname>Achan</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Modeling Complementary Products and Customer Preferences with Context Knowledge for Online Recommendation</article-title>
          .
          <source>In WSDM</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Lu</given-names>
            <surname>Yu</surname>
          </string-name>
          , Chuxu Zhang, Shangsong Liang, and
          <string-name>
            <given-names>Xiangliang</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Multi-order attentive ranking model for sequential recommendation</article-title>
          .
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , Vol.
          <volume>33</volume>
          .
          <fpage>5709</fpage>
          -
          <lpage>5716</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Shuai</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yongbo</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Min</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Baocheng</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Qu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jialie</given-names>
            <surname>Shen</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>NAIRS: A neural attentive interpretable recommendation system</article-title>
          .
          <source>In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining</source>
          .
          <fpage>790</fpage>
          -
          <lpage>793</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Angelina</given-names>
            <surname>Ziesemer</surname>
          </string-name>
          and
          <string-name>
            <given-names>J</given-names>
            <surname>Oliveira</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>How to know what do you want? a survey of recommender systems and the next generation</article-title>
          .
          <source>In Proceedings of the Eighth Brazilian Symposium on Collaborative Systems, SBSC</source>
          .
          <fpage>104</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>