<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recommending Accommodation Filters with Online Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lucas Bernardi</string-name>
          <email>lucas.bernardi@booking.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matias Eidis</string-name>
          <email>matias.eidis@booking.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Estevez</string-name>
          <email>pablo.estevez@booking.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eqbal Osama</string-name>
          <email>eqbal.osama@booking.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Booking.com</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Online Accommodations Platforms match guests searching for accommodation with hospitality service providers. A fundamental characteristic of efficient platforms is the ability to satisfy the needs and preferences of the guests. To achieve this goal, a common search tool is the Results Filtering capability, which allows users to refine query results with explicit criteria. However, as supply grows and diversifies, more filtering options become available, reaching hundreds of different criteria for one query and making it hard for customers to find the ones that are relevant to them. In this work we present the implementation of an Accommodation Filters Recommender System addressing this issue. The problem poses several challenges around recommendation feedback, user experience constraints and non stationarity, among others. We provide an end-to-end description of the System, discuss implementation issues and provide techniques to address them, including a large scale distributed online learning architecture. The solution was validated through several Online Controlled Experiments performed in Booking.com, a top Online Travel Agency serving millions of daily users, showing statistically significant results on various user behaviour metrics that indicate a strong positive effect on User Engagement.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Applied computing → Online shopping; E-commerce infrastructure; • Information systems → Collaborative filtering; Content ranking; Search interfaces.</p>
      <p>Keywords: recommender systems, online machine learning, information filtering, distributed systems</p>
      <p>Reference Format: Lucas Bernardi, Pablo Estevez, Matias Eidis, and Eqbal Osama. 2020. Recommending Accommodation Filters with Online Learning. In 3rd Workshop on Online Recommender Systems and User Modeling (ORSUM 2020), in conjunction with the 14th ACM Conference on Recommender Systems, September 25th, 2020, Virtual Event, Brazil.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        With the advent of e-commerce, customers have access to a broad
supply when searching for an item to purchase, which brings many
benefits, but also choice and information overload [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
prevalence of these issues in tourism has been highlighted in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] gives an extensive review of the literature on this topic.
Given this context, the ability to apply filters such as Pet Friendly Hotels or Breakfast Included is an important tool that benefits all parties: it helps users browse a large supply of options, it helps accommodation suppliers better market their services, and it helps the platform by eliciting explicit guest preferences, which can be used to further personalize the user experience and potentially increase the probability of a purchase. On the other hand, the constant growth of available filters, driven by supply diversification (such as vacation rentals) and by the increase of product details available as filtering criteria, defeats the very purpose of this tool. This motivates the need for filter recommendations that allow users to refine their query without scanning a long list of options. These recommendations can be displayed, for example, on the side of the Search Results Page as a quick-access shortlist of filters before the full list of options. In this paper we describe a Recommender System for the case of Accommodation Filters. The problem poses
several common challenges such as Large Scale, Continuous Cold Start
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Sparsity among others. Previous work has addressed some
of these issues in similar scenarios (e.g. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), but
our setting deviates from them and from the standard Recommender Systems or Information Retrieval settings, mainly because the item space, the filters, is not what users are interested in, but a means to find accommodations that fit their preferences. This brings several uncommon challenges around feedback and user experience consistency. Our work focuses on the design of an integral system that considers all the intricacies of a full recommender system. It relies on well-established and effective techniques and puts the focus on their composition into a fully functional system that allows us to experiment with different approaches. The contributions of this work are:
      </p>
      <p>• A thorough description of a Recommender System successfully deployed in Booking.com, helping millions of new users daily to browse millions of accommodation options.</p>
      <p>• An architecture for Online Recommender Systems capable of producing a clean, well-formed and reliable data stream, suitable for incremental learning.</p>
      <p>• An algorithm-agnostic architecture for Large Scale Distributed Online Learning.</p>
      <p>• Online Controlled Experiments that demonstrate the effectiveness of our approach at improving the shopping experience through more effective filtering actions.</p>
      <p>The paper is organized as follows: Section 2 presents the problem, main challenges and related work. Section 3 describes the solution framework. Section 4 details the system architecture, while Section 5 focuses on a Machine Learning Model as implemented in Booking.com. Section 6 reports experiments on its website and Section 7 concludes.</p>
    </sec>
    <sec id="sec-3">
      <title>PROBLEM STATEMENT AND RELATED WORK</title>
      <p>Our goal is to construct a system that recommends Accommodation Filters maximizing the utility users get from them, with the following requirements: it must be able to quickly on-board new filters under specific User Experience Guidelines, in particular browsing consistency; it must scale to hundreds of millions of daily users and thousands of filters; and it must be available at all times, globally, with latency &lt; 100ms. The problem poses many challenges; we highlight three that we consider particularly relevant and that strongly influenced the design of our architecture.</p>
      <p>
        User Feedback: Implicit, Delayed and Split. We elicit Filter Utility from implicit signals based on user behaviour, such as applying a filter, clicking on an accommodation to see further details, or completing a reservation. Several characteristics of these signals make learning from them challenging. First, the feedback is delayed, with delays ranging from minutes to days. Since negative feedback can only be assumed implicitly from the lack of positive feedback after a certain time has elapsed, it is necessary to model the delay (at least in principle). This has been studied by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and more
recently by [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and references therein. Delayed Feedback requires keeping track of the state of each impression, which at the scale of large e-commerce platforms is an engineering challenge. Another issue is the Split Feedback Problem: in the classic recommender system setting, the feedback indicates both the degree of satisfaction with a specific item and which item is receiving such feedback. In our case, these two pieces of information are split: first we observe that the user applied a filter, but only later, when and if the user clicks through or completes a transaction, do we observe the utility. To the best of our knowledge, this problem has not been explicitly addressed by the relevant literature.
      </p>
      <p>
        User Experience Consistency. Filter recommendations impose certain consistency constraints. In principle, it is critical that the recommendations don’t flicker with each page load, but there are nuances: in some situations the system must return the same recommendations (e.g. right after the user applied a filter), in others it can return new recommendations (e.g. when the query dates change), and in others it must return new recommendations (e.g. when the destination changes). Which variables trigger a new recommendation is a design choice that trades off User Experience Consistency against the number of samples to learn from. Related work like [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] focuses on recommendations
consistency under small deviations of the user profile while [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] studies the effect of recommending the same items multiple times and focuses on the diversification-accuracy trade-off. To our knowledge, no previous work has focused specifically on guaranteeing that the system complies with the User Experience Consistency constraints.
      </p>
      <p>
        Non Stationarity. Stationarity is a rather strong assumption. For example, the set of matching results after applying a filter depends on the number of available rooms, which are constantly booked, cancelled and replenished. A filter giving very few results might suddenly match many options, drastically affecting its utility. Experiments in other areas, such as the Ranking Algorithm or the User Interface (UI), also affect the utility of filters. Lastly, new filters affect the utility of other filters: for example, introducing a Family Friendly Property filter reduced the utility of the Family related facilities filters. Non Stationarity motivates exploration, which introduces the exploration-exploitation trade-off. This has been widely studied in the contextual-bandits literature, for example in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and more
recently by [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and references therein.
      </p>
      <p>
        All these issues (and others, omitted due to space limitations)
motivate the construction of a complex system that enables their
systematic treatment. As proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we adopt the Online
Recommender System Setting that considers an event stream
produced by user interactions, and relies on incremental algorithms.
The following sections describe the architecture, including components that produce a reliable data stream and components that host general incremental algorithms at scale, highlighting how they contribute to solving the Accommodation Filters Recommendation problem.
      </p>
    </sec>
    <sec id="sec-5">
      <title>SOLUTION FRAMEWORK</title>
      <p>The main abstraction in our system is the Feedback Loop, which models the interactions between the UI and our Recommender System. In its simplest version, a Feedback Loop is defined by the following sequence:</p>
      <p>(1) The UI requests recommendations (opens the loop).</p>
      <p>(2) The System recommends filters based on the query, context and filter features.</p>
      <p>(3) The UI provides feedback (closes the loop).</p>
      <p>(4) The System uses closed loops to update the model.</p>
      <p>
This simple framework captures the dynamic nature of the problem, providing flexibility to experiment with different ways to address the aforementioned issues while abstracting away specific implementations. The Feedback Loop sequence needs to be extended in order to address two issues. The first one is Censored Recommendations, which occurs when filters are not seen by the user (e.g. because they didn’t scroll enough). This introduces ambiguity, since the lack of feedback might be caused either by dissatisfaction or by censorship. To solve this issue we introduce another step between 2 and 3: the exposure report, which notifies the System that a user was exposed to a filter, indicating that lack of positive feedback implies negative feedback. If no exposure report is received, the observation is not used for learning, since the user never saw the recommendation. The second issue, rather specific to our setting, is Split Feedback, which appears when the item and the level of satisfaction are reported separately. For example, if the user applies a filter and then completes a reservation, we would like to credit that filter with positive feedback. But these two events are usually separated by many other browsing events (clicking back and forth on the search results page), making it hard at reservation time to know which filters were applied by the user, and therefore impossible to send a feedback report. Another instance occurs when different UI components know different pieces of the feedback. This is the case of after-filtering click-through: we would like to credit an applied filter if it leads to a click on a details card. Which filter is applied is known by the Search Engine, but whether a click happens or not is known by the Details Page Service, which has no information about the applied filters. This is addressed by a technique that we call Confirmable Feedback, which splits the feedback report (step 3): first, the user interface reports unconfirmed feedback as soon as the filter is applied, indicating which item might be credited. The System waits for confirmation before using it to learn. The user interface sends the confirmation, for example, when a reservation is made or a details page is loaded, without the need to know which filters were applied. As a result of a confirmation, the loop is closed and the system can learn from it. If no confirmation is sent (e.g. the user applied a filter but no click or reservation happened), the system ignores the unconfirmed feedback. Finally, if a confirmation is received but no feedback was sent before, the confirmation is ignored.</p>
      <p>The main benefit of this framework is that the state of a loop is maintained by the System: the UI is stateless and relies on four simple primitives (open loop, report exposure, report feedback and confirm feedback). This requires minimal intervention in the UI software, which is a key factor in successfully deploying the full system into production.</p>
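      <p>The four primitives and the Confirmable Feedback rule can be illustrated with a minimal sketch. It is not the production implementation: the class and method names (FeedbackLoop, open_loop, report_exposure, report_feedback, confirm_feedback) are hypothetical, loop expiration is left out, and treating an applied filter as implicitly exposed is an assumption made here for brevity.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from enum import Enum


class LoopState(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class FeedbackLoop:
    """Server-side state of one loop; the UI only calls the four primitives."""
    recommended: list
    state: LoopState = LoopState.OPEN
    exposed: set = field(default_factory=set)
    unconfirmed: set = field(default_factory=set)

    def report_exposure(self, filter_id):
        # Only filters recommended in this loop can be reported as seen.
        if self.state is LoopState.OPEN and filter_id in self.recommended:
            self.exposed.add(filter_id)

    def report_feedback(self, filter_id):
        # Unconfirmed feedback: the filter was applied and might be credited.
        # Assumption: a filter that was applied was also seen by the user.
        if self.state is LoopState.OPEN and filter_id in self.recommended:
            self.exposed.add(filter_id)
            self.unconfirmed.add(filter_id)

    def confirm_feedback(self):
        """Close the loop and return (filter_id, label) training examples."""
        # A confirmation with no prior feedback is ignored, as is any
        # confirmation arriving after the loop was closed.
        if self.state is not LoopState.OPEN or not self.unconfirmed:
            return []
        self.state = LoopState.CLOSED
        # Exposed filters without feedback become negatives; filters that
        # were never exposed are censored and produce no example.
        return [(f, 1 if f in self.unconfirmed else 0)
                for f in self.recommended if f in self.exposed]


def open_loop(recommended):
    """Open a loop for a list of recommended filter ids."""
    return FeedbackLoop(recommended=list(recommended))
```
      </preformat>
      <p>In this sketch the UI never holds loop state: the Details Page Service, for instance, would only call confirm_feedback without knowing which filters were applied.</p>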
    </sec>
    <sec id="sec-6">
      <title>ARCHITECTURE</title>
      <p>The Architecture implements the Feedback Loops framework and
consists of two main components described below.</p>
      <p>
        Instrumentation Layer. This component implements the full Feedback Loops sequence and is used by the UI to integrate the recommendations into the platform. Its first responsibility is serving recommendation requests using the Machine Learning Model (ML Model) while providing User Experience Consistency guarantees. User Experience Consistency presents a trade-off with sampling efficiency, which is addressed by a technique we call Contextual Caching: the first time a specific user requests recommendations, the ML Model is invoked, a new loop is initialized, and the recommended filters are stored in a cache together with the query, context and filter features. Subsequent requests are served from the cache, but if at least one feature included in the Refresh Trigger Feature Set changes, the loop is finalized, the cache is invalidated, and a new loop is initialized with a new request to the ML Model. Notice that at most one loop is active for a given user at any point in time, and that one loop encapsulates information about the set of recommended filters. This approach allows us to control the trade-off by specifying which features must trigger a loop reset. The second responsibility is to produce a data stream based on the interactions with the UI, suitable for training an ML Model in an online fashion. The instrumentation layer maintains a state machine for each loop, consisting of all the feature values when the loop was opened, which filters were recommended, which ones were actually seen by the user, their expiration status, the feedback received so far, etc. This state is updated as the UI reports exposure and feedback. From these transitions a data stream is produced containing all the information needed to train a machine learning model. This process is very dependent on the way we handle Delayed Feedback. We considered two approaches: Fully Delayed, where the state machine waits for a fixed period before assuming negative feedback, and Fake Negatives Calibration, where the negative feedback is assumed on exposure as described in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In our experiments both methods were effective, showing no significant difference. We favored Fake Negatives Calibration due to its robustness to changes in the delay distribution.
      </p>
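      <p>Contextual Caching can be sketched as a per-user cache keyed by the values of the Refresh Trigger Feature Set. The sketch below is illustrative only: the ContextualCache class, the in-memory dictionary and the two trigger features chosen here are assumptions, and finalizing the previous loop is reduced to overwriting the cache entry.</p>
      <preformat>
```python
# Hypothetical trigger set; the real set is a design choice of the system.
REFRESH_TRIGGER_FEATURES = ("destination", "num_guests")


class ContextualCache:
    """Serves the same recommendations until a refresh-trigger feature changes."""

    def __init__(self, recommend, trigger_features=REFRESH_TRIGGER_FEATURES):
        self._recommend = recommend      # callable: features -> list of filters
        self._triggers = trigger_features
        self._active = {}                # user_id -> (trigger key, filters)

    def _key(self, features):
        # Only trigger features take part in the cache key.
        return tuple(features.get(name) for name in self._triggers)

    def get(self, user_id, features):
        key = self._key(features)
        cached = self._active.get(user_id)
        if cached is not None and cached[0] == key:
            return cached[1]             # same loop: consistent experience
        # First request, or a trigger feature changed: start a new loop.
        filters = self._recommend(features)
        self._active[user_id] = (key, filters)
        return filters
```
      </preformat>
      <p>Adding a feature to the trigger set yields more loops (more samples to learn from) at the cost of recommendations that change more often, which is the trade-off discussed above.</p>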
      <p>The life-cycle of a loop is enforced by prohibiting invalid state transitions. This brings high robustness to chaotic UI interactions (users refreshing pages, clicking copied links, opening multiple browser tabs, etc.), guaranteeing a well-formed stream of events that can be used to train robust machine learning models in an online fashion.</p>
      <p>Distributed Online Machine Learning. This component is responsible for maintaining a model that recommends filters when requested by the Instrumentation Layer, and for updating it as soon as feedback is available. To handle a high volume of requests, we distribute the model across a cluster: each node runs one model instance that serves a random sample of the recommendation requests (sharding) while learning from the full feedback stream produced by the Instrumentation Layer (replication), which is consumed from a persistent message queue. The feedback is consumed in the order in which it is produced, so although each node learns independently without node-to-node interaction, all the models are exact replicas. To achieve high availability, a special node saves checkpoints of the model to persistent storage. Each checkpoint contains the serialized model, the learning state and a pointer to the last feedback message processed in the queue. If a node fails, a new one is created, which reads the latest available version from the checkpoint and continues the learning process from the corresponding point in the stream.</p>
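      <p>The replication and recovery logic can be illustrated with a toy sketch in which the persistent queue is a Python list and the model is a dictionary of counters. ReplicaNode, consume, checkpoint and restore are hypothetical names, not the interfaces of the deployed system; the point is only that replaying the same ordered log from a stored offset reproduces the same model.</p>
      <preformat>
```python
import copy


class ReplicaNode:
    """One model replica that learns from an ordered feedback log."""

    def __init__(self, model=None, offset=0):
        # Toy model: accumulated reward per filter id.
        self.model = model if model is not None else {}
        self.offset = offset             # next log position to read

    def consume(self, log):
        # Apply every message not yet processed, strictly in log order.
        for filter_id, reward in log[self.offset:]:
            self.model[filter_id] = self.model.get(filter_id, 0) + reward
            self.offset += 1

    def checkpoint(self):
        # Model state plus a pointer into the feedback stream.
        return {"model": copy.deepcopy(self.model), "offset": self.offset}

    @classmethod
    def restore(cls, snapshot):
        # A replacement node resumes from the checkpointed position.
        return cls(model=copy.deepcopy(snapshot["model"]),
                   offset=snapshot["offset"])
```
      </preformat>
      <p>Because every update is a deterministic function of the log prefix, a node restored from a checkpoint converges to the same state as nodes that never failed once it has caught up with the stream.</p>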
      <p>This design is agnostic to concrete learning algorithms. Specific implementations can be constructed with little attention to fault tolerance, high availability and latency. This is an important advantage when deploying models to production, since it simplifies algorithm development and debugging. Another important consequence is that all changes have an immediate effect, which accelerates experimentation: it allows us to quickly assess modeling choices such as the Refresh Trigger Feature Set, the Delayed Feedback approach, etc., ultimately streamlining the iterative process.</p>
    </sec>
    <sec id="sec-7">
      <title>MACHINE LEARNING MODEL</title>
      <p>
        The model must select ~10 Filters out of ~20000, optimizing the total Utility. The latency requirement is strict: if fulfilling a request takes more than 100ms, it is canceled by the UI and no recommendations are shown to the user. This limit covers all the steps, including network latency, Contextual Caching and Ranking. Because of this, we favor simple, fast and scalable models; in particular, we rely on the library Vowpal Wabbit [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] embedded in an online process. We use a point-wise model that estimates the expected utility conditioned on context, query and filter features. At recommendation time we rank by the estimated expected utility, modeled with logistic regression. Most users are not logged in while browsing, which means user history is not available; therefore we rely on query, context and item features. Example Query Features are number of guests, destination, number of children, anticipation, length of stay and traveler type (couple, family, etc.); Contextual Features include temporal and geographical attributes; and some item features are Category, Id and Number of Matching Properties. The Refresh Trigger Feature Set contains all the query features except anticipation, plus all the temporal features. Quadratic interaction features are created between the Filter features and the Query and Context Features, keeping the linear and independent terms. The full model with Fake Negatives Calibration [
        <xref ref-type="bibr" rid="ref12">12</xref>
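        ] is given by Equation 3, where r ∈ {0, 1} is the binary output, and f, q and c are the Filter, Query and Context feature vectors with dimensions d<sub>f</sub>, d<sub>q</sub> and d<sub>c</sub> respectively. All the Greek letters are model parameters: θ<sub>0</sub> ∈ ℝ is the global bias; θ ∈ ℝ<sup>d<sub>f</sub></sup>, ω ∈ ℝ<sup>d<sub>q</sub></sup> and ϕ ∈ ℝ<sup>d<sub>c</sub></sup> are linear parameters; and α ∈ ℝ<sup>d<sub>q</sub>×d<sub>f</sub></sup> and β ∈ ℝ<sup>d<sub>c</sub>×d<sub>f</sub></sup> are the interaction weights of Query and Context with Filter features.
        <disp-formula><label>(1)</label><tex-math>z(f, q, c) = \theta_0 + \sum_{i}^{d_f} f_i \theta_i + \sum_{i}^{d_q} q_i \omega_i + \sum_{i}^{d_c} c_i \phi_i + \sum_{i}^{d_q} \sum_{j}^{d_f} q_i f_j \alpha_{ij} + \sum_{i}^{d_c} \sum_{j}^{d_f} c_i f_j \beta_{ij}</tex-math></disp-formula>
        <disp-formula><label>(2)</label><tex-math>b(f, q, c) = \frac{1}{1 + e^{-z(f, q, c)}}</tex-math></disp-formula>
        <disp-formula><label>(3)</label><tex-math>\hat{p}(r = 1 \mid f, q, c) = \frac{b(f, q, c)}{1 - b(f, q, c)}</tex-math></disp-formula>
      </p>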
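      <p>As a reading aid, Equations 1 to 3 can be evaluated directly with dense Python lists. This is a sketch for illustration, not the Vowpal Wabbit implementation: the function name predict and the dense representation are assumptions (the deployed model uses hashed sparse features). Note that b / (1 − b) simplifies algebraically to e<sup>z</sup>, so the calibrated estimate stays below 1 only while b is below 0.5.</p>
      <preformat>
```python
import math


def predict(f, q, c, theta0, theta, omega, phi, alpha, beta):
    """Evaluate Eq. 1-3 for dense feature vectors f, q and c.

    alpha[i][j] weights the query-filter interaction q[i] * f[j] and
    beta[i][j] weights the context-filter interaction c[i] * f[j].
    """
    z = theta0
    z += sum(fi * ti for fi, ti in zip(f, theta))
    z += sum(qi * wi for qi, wi in zip(q, omega))
    z += sum(ci * ph for ci, ph in zip(c, phi))
    z += sum(q[i] * f[j] * alpha[i][j]
             for i in range(len(q)) for j in range(len(f)))
    z += sum(c[i] * f[j] * beta[i][j]
             for i in range(len(c)) for j in range(len(f)))
    b = 1.0 / (1.0 + math.exp(-z))   # Eq. 2: estimate trained with fake negatives
    return b / (1.0 - b)             # Eq. 3: calibrated estimate
```
      </preformat>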
      <p>
        The filter representation includes the unique id, which allows the model to learn the specifics of the particular filter option; the number of matching options, which captures contextual utility (a generally very useful filter might be less useful in a context where it matches many properties); and the category (e.g. Facilities), which captures general properties shared by filters in the same category. The category is effective in addressing the cold start problem: when a new filter is added, it inherits what is known about its category. In practice, all the features except Number of Matching Properties are categorical and are encoded using the hashing trick [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. We estimated
about 8 million features (including interactions); following [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] we
used a hashing space of 2<sup>28</sup> buckets, which resulted in a collision rate of about 3%. All parameters are learned through Stochastic Gradient
Descent with constant learning rate and Normalized Updates [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
(Algorithm 1).
      </p>
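      <p>The hashing trick and the order of magnitude of the collision rate can be checked with a short sketch. The hash function (CRC32 from the Python standard library), the token format and the function names are assumptions made for illustration; Vowpal Wabbit uses its own hash. The estimate assumes uniform hashing and defines the collision rate as the probability that a given feature shares its bucket with at least one other feature, which may differ from the definition behind the reported figure.</p>
      <preformat>
```python
import math
import zlib

NUM_BUCKETS = 2 ** 28    # hashing space reported in the text


def bucket(namespace, name, value):
    """Map a categorical feature to a weight index (CRC32 is an assumption)."""
    token = "{}|{}={}".format(namespace, name, value).encode("utf-8")
    return zlib.crc32(token) % NUM_BUCKETS


def interaction_bucket(filter_token, other_token):
    """Weight index for a quadratic filter x query/context feature."""
    token = (filter_token + "^" + other_token).encode("utf-8")
    return zlib.crc32(token) % NUM_BUCKETS


def expected_collision_rate(num_features, num_buckets=NUM_BUCKETS):
    """Chance that one feature shares its bucket with any other feature,
    computed as 1 - (1 - 1/m)^(n - 1) under uniform hashing."""
    return 1.0 - math.exp((num_features - 1) * math.log1p(-1.0 / num_buckets))
```
      </preformat>
      <p>With 8 million features, expected_collision_rate(8_000_000) evaluates to roughly 0.029 under these assumptions, which is consistent with the collision rate of about 3% reported above.</p>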
      <p>
        To address non stationarity, we rely on active exploration for
which we adopt the contextual ϵ-greedy algorithm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], in which
a proportion ϵ of the recommendation requests (after Contextual Caching) is served with uniformly random recommendations (exploration), and the rest by choosing the best filters according to the estimates of the model (exploitation). Two small adaptations are required for our case. First, since our system recommends many items, the exploration branch is computed by sampling from a uniform distribution between 0 and 1 for each candidate filter and returning the top-k filters. The exploitation branch simply sorts the candidate filters by their estimated utility given the context and query, and returns the top-k. Second, the model cannot be updated right after the recommendations are made, since the feedback is delayed. We update the model independently from the recommendation requests, as soon as the feedback is available. All weights are initialized to zero and ties are broken uniformly at random
(see Algorithm 1).
      </p>
      <p>Algorithm 1 top-k Delayed Contextual ϵ-Greedy</p>
      <preformat>ϵ: exploration factor
k: number of filters to recommend
c, q: context and query features
F: set of candidate filters
procedure recommendFilters(ϵ, k, c, q, F)
    With probability ϵ:    ▷ (explore)
        for each filter F<sub>i</sub> in F with features f do
            scores[F<sub>i</sub>] ← sample from Uniform(0,1)
        end for
    or with probability 1 − ϵ:    ▷ (exploit)
        for each filter F<sub>i</sub> in F with features f do
            scores[F<sub>i</sub>] ← p̂(r = 1 | f, q, c)    ▷ as given by Eq. 3
        end for
    return topk(scores, k)    ▷ (breaks ties uniformly)
end procedure

c, q: context and query features, f: rewarded filter, r<sub>o</sub>: observed feedback
procedure onFeedbackAvailable(c, f, q, r<sub>o</sub>)
    Update p̂(r = 1 | c, q, f) with r<sub>o</sub>    ▷ detailed update rule in Algorithm 1 in [<xref ref-type="bibr" rid="ref19">19</xref>]
end procedure</preformat>
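      <p>The recommendation procedure of Algorithm 1 can be written compactly in Python. This is a sketch under stated assumptions: recommend_filters and score_fn are hypothetical names, score_fn stands in for the model estimate of Equation 3, and the delayed update procedure is omitted. Ties are broken with a random secondary sort key.</p>
      <preformat>
```python
import random


def recommend_filters(epsilon, k, candidates, score_fn, rng=random):
    """Top-k epsilon-greedy: one coin flip per request, then rank candidates."""
    if rng.random() >= epsilon:
        # Exploit: score every candidate with the model estimate.
        scores = {f: score_fn(f) for f in candidates}
    else:
        # Explore: one Uniform(0, 1) draw per candidate.
        scores = {f: rng.random() for f in candidates}
    # The random secondary key breaks ties between equal scores uniformly.
    ranked = sorted(candidates,
                    key=lambda f: (scores[f], rng.random()),
                    reverse=True)
    return ranked[:k]
```
      </preformat>
      <p>With all weights at zero every candidate gets the same score, so the exploit branch also returns a uniformly random top-k, which matches the cold-start behaviour described in the text.</p>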
    </sec>
    <sec id="sec-8">
      <title>EXPERIMENTS</title>
      <p>We performed several Online Controlled Experiments to validate our approach. We highlight two in which our model was used to feed a quick filters section of the search results page of Booking.com. 50% of the traffic was exposed to the current baseline model, and 50% to our new model. Statistical significance was computed at 90% confidence (two-sided) using a G-test at a predefined time duration. In these experiments we used click after filtering as feedback, with ϵ = 2%. The baseline is a popularity model based on the same features with a thick layer of business logic on top, blacking out some filters, up-ranking others, etc. It is based on years of data on filter usage and is updated manually at arbitrary moments. It is considered a robust baseline, since many attempts at using more complex models failed in the past. The metrics of interest for these experiments are: Overall Filter Usage (proportion of users using at least one filter), Recommended Filter Usage (proportion of users using at least one recommended filter), After Filtering Click Through Rate (AFCTR, proportion of users filtering (any filter) and landing on a property detail page), and Recommended Filter Utility (ratio between the number of users applying a recommended filter and the number of users applying any filter). All these metrics indicate the utility of the recommendations from the users’ point of view. In particular, Recommended Filter Utility is a strong indicator, since it quantifies how easy it is for customers to find a relevant filter.</p>
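      <p>For reference, a G-test on a proportion metric such as Overall Filter Usage can be computed with the standard library alone. The sketch assumes a 2x2 table (variant by outcome) with one degree of freedom; the function name g_test_2x2 and any counts passed to it are hypothetical and are not data from the experiments reported here.</p>
      <preformat>
```python
import math


def g_test_2x2(success_a, total_a, success_b, total_b):
    """G-test of independence on a 2x2 table; returns (G, two-sided p-value)."""
    observed = [
        [success_a, total_a - success_a],
        [success_b, total_b - success_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    grand = total_a + total_b
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            if observed[i][j] > 0:   # empty cells contribute nothing
                g += 2.0 * observed[i][j] * math.log(observed[i][j] / expected)
    # With one degree of freedom the chi-square tail is erfc(sqrt(G / 2)).
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value
```
      </preformat>
      <p>At 90% confidence, a difference would be called significant when the returned p-value is below 0.10.</p>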
      <p>First, an experiment was run for two weeks to validate the technical health of the system. The variance of the cluster was very low (median 8.2e-6, 99th percentile 0.0012), indicating that the models are indeed close replicas. Figure 2 shows the hourly average for the first 2 weeks of the experiment. The peaks around hour 220 are due to one of the nodes lagging behind (because resource allocation is dynamic and not uniform across the cluster); the node quickly caught up. The model starts with all the weights set to 0 (making random recommendations), so we measured the time needed to learn reasonable recommendations, defined as the moment when the AFCTR matches the baseline (y=0 in Figure 3), which was roughly 20 hours. This is evidence of the ability of the model to effectively incorporate new filters and contexts, since at the beginning all filters and features are new to the model. Logarithmic loss (Figure 1) showed a periodic pattern, likely following the general probability of clicking after filtering during the day. The peaks are likely due to changes in the environment. The general trend is negative, suggesting that the system is able to adapt to changes. We remark that many new filters were added while this experiment was running, and some of them were consistently picked in the top 10. Regarding the latency requirement, we observed a degradation in total page load time of 15ms, which is in line with the requirements. We let the experiment run for two more weeks to make sure the results were stable.</p>
      <p>In a second experiment we stress-tested the adaptability of the system by allowing automated traffic from scrapers and crawlers for a few days during week 6, which completely changed the utility of almost all filters (the proportion of automated traffic was significant). The system degraded, but after the traffic normalized it recovered; we interpret this as evidence of adaptability to changes in the environment. This behaviour is depicted in Figure 4.</p>
      <p>Results of both experiments are summarized in Table 1. From these results we conclude that our system is able to make recommendations that are useful for our customers and superior to those of the baseline model. We also conclude that the system is reliable, stable and robust, meeting all the requirements specified in Section 2.</p>
    </sec>
    <sec id="sec-9">
      <title>CONCLUSION</title>
      <p>This work presented a Recommender System for Accommodation Filters, a relevant problem for Booking.com and other e-commerce platforms. Our solution combines well-established techniques, whose practical aspects were discussed in detail, with new ideas addressing several setting-specific problems. The Feedback Loops Framework and the Distributed Online Learning Architecture allowed us to address requirements and trade-offs in a systematic way, enabling a fast iterative process. The effectiveness of our solution was demonstrated by Online Controlled Experiments conducted in Booking.com, on millions of users and accommodation options, showing clear positive effects on User Engagement. Table 2 summarizes the challenges and the techniques addressing them. Looking forward, the presented system will allow us to experiment with more advanced online learning algorithms, such as tree-based models, and more sophisticated exploration techniques, and to solve new and different business cases.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Lucas</given-names>
            <surname>Bernardi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Don't be tricked by the Hashing Trick</article-title>
          .
          <uri>https://booking.ai/dont-be-tricked-by-the-hashing-trick-192a6aae3087</uri>
          Retrieved 2020-06-16.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Lucas</given-names>
            <surname>Bernardi</surname>
          </string-name>
          , Jaap Kamps, Julia Kiseleva, and Melanie JI Müller
          .
          <year>2015</year>
          .
          <article-title>The continuous cold start problem in e-commerce recommender systems</article-title>
          .
          <source>arXiv preprint arXiv:1508.01177</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Lucas</given-names>
            <surname>Bernardi</surname>
          </string-name>
          , Themistoklis Mavridis, and
          <string-name>
            <given-names>Pablo</given-names>
            <surname>Estevez</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>150 successful machine learning models: 6 lessons learned at Booking.com</article-title>
          .
          <source>In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          .
          <fpage>1743</fpage>
          -
          <lpage>1751</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Shiyu</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Jiliang Tang, Dawei Yin,
          <string-name>
            <given-names>Yi</given-names>
            <surname>Chang</surname>
          </string-name>
          , Mark A Hasegawa-Johnson, and Thomas S Huang.
          <year>2017</year>
          .
          <article-title>Streaming recommender systems</article-title>
          .
          <source>In Proceedings of the 26th International Conference on World Wide Web</source>
          .
          <fpage>381</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Chapelle</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Modeling Delayed Feedback in Display Advertising</article-title>
          .
          <source>In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          (New York, New York, USA) (KDD '14).
          <publisher-name>Association for Computing Machinery</publisher-name>
          , New York, NY, USA,
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          . https://doi.org/10.1145/2623330.2623634
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>David</given-names>
            <surname>Cortes</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Adapting multi-armed bandits policies to contextual bandits scenarios</article-title>
          .
          <source>arXiv preprint arXiv:1811.04383</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Turrin</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Controlling Consistency in Top-N Recommender Systems</article-title>
          .
          <source>In 2010 IEEE International Conference on Data Mining Workshops</source>
          .
          <fpage>919</fpage>
          -
          <lpage>926</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Basak</given-names>
            <surname>Denizci Guillet</surname>
          </string-name>
          , Anna Mattila, and
          <string-name>
            <given-names>Lisa</given-names>
            <surname>Gao</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The effects of choice set size and information filtering mechanisms on online hotel booking</article-title>
          .
          <source>International Journal of Hospitality Management</source>
          (
          <year>2019</year>
          ),
          <fpage>102379</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Xinran</given-names>
            <surname>He</surname>
          </string-name>
          , Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich,
          <string-name>
            <given-names>Stuart</given-names>
            <surname>Bowers</surname>
          </string-name>
          , et al.
          <year>2014</year>
          .
          <article-title>Practical lessons from predicting clicks on ads at facebook</article-title>
          .
          <source>In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising</source>
          .
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Sheena S</given-names>
            <surname>Iyengar</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark R</given-names>
            <surname>Lepper</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>When choice is demotivating: Can one desire too much of a good thing?</article-title>
          <source>Journal of personality and social psychology</source>
          <volume>79</volume>
          ,
          <issue>6</issue>
          (
          <year>2000</year>
          ),
          <fpage>995</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Rishabh</given-names>
            <surname>Iyer</surname>
          </string-name>
          , Nimit Acharya, Tanuja Bompada, Denis Charles, and
          <string-name>
            <given-names>Eren</given-names>
            <surname>Manavoglu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A Unified Batch Online Learning Framework for Click Prediction</article-title>
          .
          <source>arXiv preprint arXiv:1809.04673</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] Sofia Ira Ktena, Alykhan Tejani, Lucas Theis, Pranay Kumar Myana, Deepak Dilipkumar, Ferenc Huszár, Steven Yoo, and
          <string-name>
            <given-names>Wenzhe</given-names>
            <surname>Shi</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Addressing delayed feedback for continuous training with neural networks in CTR prediction</article-title>
          .
          <source>In Proceedings of the 13th ACM Conference on Recommender Systems</source>
          .
          <fpage>187</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>John</given-names>
            <surname>Langford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lihong</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alex</given-names>
            <surname>Strehl</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Vowpal wabbit online learning project</article-title>
          . http://hunch.net/?p=309
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>John</given-names>
            <surname>Langford</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tong</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>The epoch-greedy algorithm for contextual multi-armed bandits</article-title>
          .
          <source>In Proceedings of the 20th International Conference on Neural Information Processing Systems</source>
          . Citeseer,
          <fpage>817</fpage>
          -
          <lpage>824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Neal</given-names>
            <surname>Lathia</surname>
          </string-name>
          , Stephen Hailes, Licia Capra, and
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Amatriain</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Temporal diversity in recommender systems</article-title>
          .
          <source>In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval</source>
          .
          <fpage>210</fpage>
          -
          <lpage>217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Cheng</given-names>
            <surname>Li</surname>
          </string-name>
          , Yue Lu, Qiaozhu Mei,
          <string-name>
            <given-names>Dong</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Sandeep</given-names>
            <surname>Pandey</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Clickthrough prediction for advertising in twitter timeline</article-title>
          .
          <source>In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          .
          <fpage>1959</fpage>
          -
          <lpage>1968</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Lihong</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wei</given-names>
            <surname>Chu</surname>
          </string-name>
          , John Langford, and
          <string-name>
            <given-names>Robert E</given-names>
            <surname>Schapire</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A contextual-bandit approach to personalized news article recommendation</article-title>
          .
          <source>In Proceedings of the 19th international conference on World wide web</source>
          .
          <fpage>661</fpage>
          -
          <lpage>670</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Jeong-Yeol</given-names>
            <surname>Park</surname>
          </string-name>
          and SooCheong Shawn Jang.
          <year>2013</year>
          .
          <article-title>Confused by too many choices? Choice overload in tourism</article-title>
          .
          <source>Tourism Management</source>
          <volume>35</volume>
          (
          <year>2013</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Stéphane</given-names>
            <surname>Ross</surname>
          </string-name>
          , Paul Mineiro, and
          <string-name>
            <given-names>John</given-names>
            <surname>Langford</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Normalized online learning</article-title>
          .
          <source>arXiv preprint arXiv:1305.6646</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Nguyen T</given-names>
            <surname>Thai</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ulku</given-names>
            <surname>Yuksel</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>What can tourists and travel advisors learn from choice overload research?</article-title>
          <source>Consumer Behavior in Tourism and Hospitality Research</source>
          (
          <year>2017</year>
          ),
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Kilian</given-names>
            <surname>Weinberger</surname>
          </string-name>
          , Anirban Dasgupta, John Langford, Alex Smola, and
          <string-name>
            <given-names>Josh</given-names>
            <surname>Attenberg</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Feature hashing for large scale multitask learning</article-title>
          .
          <source>In Proceedings of the 26th annual international conference on machine learning</source>
          .
          <fpage>1113</fpage>
          -
          <lpage>1120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Qingyun</given-names>
            <surname>Wu</surname>
          </string-name>
          , Naveen Iyer, and
          <string-name>
            <given-names>Hongning</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Learning contextual bandits in a non-stationary environment</article-title>
          .
          <source>In The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval</source>
          .
          <fpage>495</fpage>
          -
          <lpage>504</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>