Recommending Accommodation Filters with Online Learning
Lucas Bernardi, lucas.bernardi@booking.com, Booking.com, Amsterdam, The Netherlands
Pablo Estevez, pablo.estevez@booking.com, Booking.com, Amsterdam, The Netherlands
Matias Eidis, matias.eidis@booking.com, Booking.com, Amsterdam, The Netherlands
Eqbal Osama, eqbal.osama@booking.com, Booking.com, Amsterdam, The Netherlands

ABSTRACT
Online Accommodation Platforms match guests searching for accommodation with hospitality service providers. A fundamental characteristic of efficient platforms is the ability to satisfy the needs and preferences of guests. To achieve this goal, a common search tool is Results Filtering, which allows users to refine query results with explicit criteria. However, as supply grows and diversifies, more filtering options become available, reaching hundreds of different criteria for a single query and making it hard for customers to find the ones that are relevant to them. In this work we present the implementation of an Accommodation Filters Recommender System that addresses this issue. The problem poses several challenges, among others around recommendation feedback, user experience constraints, and non-stationarity. We provide an end-to-end description of the System, discuss implementation issues, and present techniques to address them, including a large-scale distributed online learning architecture. The solution was validated through several Online Controlled Experiments performed at Booking.com, a top Online Travel Agency serving millions of daily users, which showed statistically significant results on various user behaviour metrics, indicating a strong positive effect on User Engagement.

CCS CONCEPTS
• Applied computing → Online shopping; E-commerce infrastructure; • Information systems → Collaborative filtering; Content ranking; Search interfaces.

KEYWORDS
recommender systems, online machine learning, information filtering, distributed systems

Reference Format:
Lucas Bernardi, Pablo Estevez, Matias Eidis, and Eqbal Osama. 2020. Recommending Accommodation Filters with Online Learning. In 3rd Workshop on Online Recommender Systems and User Modeling (ORSUM 2020), in conjunction with the 14th ACM Conference on Recommender Systems, September 25th, 2020, Virtual Event, Brazil.

ORSUM@ACM RecSys 2020, September 25th, 2020, Virtual Event, Brazil
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
With the advent of e-commerce, customers have access to a broad supply when searching for an item to purchase. This brings many benefits, but also choice and information overload [10]. The prevalence of these issues in tourism has been highlighted in [18] and [20], and [8] gives an extensive review of the literature on this topic. In this context, the ability to apply filters such as Pet Friendly Hotels or Breakfast Included is an important tool that benefits all parties: it helps users browse a large supply of options, helps accommodation suppliers better market their services, and helps the platform elicit explicit guest preferences, which can be used to further personalize the user experience and potentially increase the probability of a purchase. On the other hand, the constant growth in available filters due to supply diversification (such as vacation rentals), together with the increasing number of product details available as filtering criteria, defeats the very purpose of this tool. This motivates the need for filter recommendations that allow users to refine their query without scanning a long list of options. These recommendations can be displayed, for example, on the side of the Search Results Page as a quick-access shortlist of filters shown before the full list of options. In this paper we describe a Recommender System for Accommodation Filters. The problem poses several common challenges such as Large Scale, Continuous Cold Start [2] and Sparsity, among others. Previous work has addressed some of these issues in similar scenarios (e.g. [9], [11], [16] and [3]), but our setting deviates from them, and from the standard Recommender Systems and Information Retrieval settings, mainly because the items, the filters, are not what users are interested in, but a means to find accommodations that fit their preferences. This brings several uncommon challenges around feedback and user experience consistency. Our work focuses on the design of an integral system that considers all the intricacies of a full recommender system. It relies on well-established and effective techniques and puts the focus on composing them into a fully functional system that allows us to experiment with different approaches. The contributions of this work are:

• A thorough description of a Recommender System successfully deployed at Booking.com, helping millions of new users daily to browse millions of accommodation options
• An architecture for Online Recommender Systems capable of producing a clean, well-formed and reliable data stream, suitable for incremental learning
• An algorithm-agnostic architecture for Large Scale Distributed Online Learning
• Online Controlled Experiments that demonstrate the effectiveness of our approach at improving the shopping experience through more effective filtering actions

The paper is organized as follows: Section 2 presents the problem, main challenges and related work. Section 3 describes the solution framework. Section 4 details the system architecture, while Section 5 focuses on the Machine Learning Model as implemented at Booking.com. Section 6 reports experiments on the Booking.com website and Section 7 concludes.

2 PROBLEM STATEMENT AND RELATED WORK
Our goal is to build a system that recommends Accommodation Filters so as to maximize the utility users get from them, subject to the following requirements: it must be able to quickly on-board new filters under specific User Experience Guidelines, in particular browsing consistency; it must scale to hundreds of millions of daily users and thousands of filters; and it must be available at all times, globally, with latency below 100 ms. The problem poses many challenges; we highlight three of them that we consider particularly relevant and that strongly influenced the design of our architecture.

User Feedback: Implicit, Delayed and Split. We elicit Filter Utility from implicit signals based on user behaviour, such as applying a filter, clicking on an accommodation to see further details, or completing a reservation. Several characteristics of these signals make learning from them challenging. First, the feedback is delayed, with delays ranging from minutes to days. Since negative feedback can only be inferred from the absence of positive feedback after a certain time has elapsed, it is necessary, at least in principle, to model the delay; this has been studied by [5] and more recently by [12] and references therein. Delayed Feedback requires keeping track of the state of each impression, which at the scale of large e-commerce platforms is an engineering challenge. Another issue is the Split Feedback Problem: in the classic recommender system setting, the feedback indicates both the degree of satisfaction with a specific item and which item is receiving such feedback. In our case, these two pieces of information are split: first we observe that the user applied a filter, and only later, when and if the user clicks through or completes a transaction, do we observe the utility. To the best of our knowledge, this problem has not been explicitly addressed in the relevant literature.

User Experience Consistency. Filter recommendations are subject to certain consistency constraints. It is critical that the recommendations do not flicker with each page load, but there are nuances: in some situations the system must return the same recommendations (e.g. right after the user applied a filter), in others it may return new recommendations (e.g. when the query dates change), and in others it must return new recommendations (e.g. when the destination changes). Which variables trigger a new recommendation is a design choice that trades off User Experience Consistency against the number of samples to learn from. Related work such as [7] focuses on recommendation consistency under small deviations of the user profile, while [15] studies the effect of recommending the same items multiple times and focuses on the diversification-accuracy trade-off. To our knowledge, no previous work has focused specifically on guaranteeing that the system complies with User Experience Consistency constraints.

Non-Stationarity. Stationarity is a rather strong assumption. For example, the set of matching results after applying a filter depends on the number of available rooms, which are constantly booked, cancelled and replenished. A filter giving very few results might suddenly match many options, drastically affecting its utility. Experiments in other areas, such as the Ranking Algorithm or the User Interface (UI), also affect filter utility. Lastly, new filters affect the utility of other filters; for example, introducing a Family Friendly Property filter reduced the utility of filters for family-related facilities. Non-stationarity motivates exploration, which introduces the exploration-exploitation trade-off. This has been extensively studied in the contextual-bandits literature, for example in [14] and [17], and more recently in [22] and references therein.

All these issues (and others, omitted due to space limitations) motivate the construction of a complex system that enables their systematic treatment. As proposed in [4], we adopt the Online Recommender System Setting, which considers an event stream produced by user interactions and relies on incremental algorithms. The following sections describe the architecture, including components that produce a reliable data stream and components that host general incremental algorithms at scale, highlighting how they contribute to solving the Accommodation Filters Recommendation problem.

3 SOLUTION FRAMEWORK
The main abstraction in our system is the Feedback Loop, which models the interactions between the UI and our Recommender System. In its simplest version, a Feedback Loop is defined by the following sequence:

(1) The UI requests recommendations (opens the loop).
(2) The System recommends filters based on the query, context and filter features.
(3) The UI provides feedback (closes the loop).
(4) The System uses closed loops to update the model.

This simple framework captures the dynamic nature of the problem, provides flexibility to experiment with different ways of addressing the aforementioned issues, and abstracts away specific implementations. The Feedback Loop sequence needs to be extended to address two issues. The first is Censored Recommendations, which occur when filters are not seen by the user (e.g. because they did not scroll far enough). This introduces ambiguity, since the lack of feedback might be caused either by dissatisfaction or by censoring. To solve this issue we introduce an additional step between steps 2 and 3: the exposure report, which notifies the System that a user was exposed to a filter, so that the lack of positive feedback can be treated as negative feedback. If no exposure report is received, the observation is not used for learning, since the user never saw the recommendation. The second issue, rather specific to our setting, is Split Feedback, which appears when the item and the level of satisfaction are reported separately. For example, if the user applies a filter and then completes a reservation, we would like to credit that filter with positive feedback. But these two events are usually
separated by many other browsing events (navigating back and forth on the search results page), making it hard to know at reservation time which filters were applied by the user, and therefore impossible to send a feedback report. Another instance occurs when different UI components know different pieces of the feedback. This is the case for after-filtering click-through: we would like to credit an applied filter if it leads to a click on a details card. Which filter was applied is known by the Search Engine, but whether a click happens is known by the Details Page Service, which has no information about the applied filters. This is addressed by a technique we call Confirmable Feedback, which splits the feedback report (step 3) in two. First, the user interface reports unconfirmed feedback as soon as the filter is applied, indicating which item might be credited; the System waits for confirmation before using it to learn. Then the user interface sends the confirmation, for example when a reservation is made or a details page is loaded, without needing to know which filters were applied. A confirmation closes the loop, and the System can learn from it. If no confirmation is sent (e.g. the user applied a filter but no click or reservation happened), the System ignores the unconfirmed feedback. Finally, if a confirmation is received but no feedback was sent before, the confirmation is ignored.

The main benefit of this framework is that the state of a loop is maintained by the System: the UI is stateless and relies on four simple primitives (open loop, report exposure, report feedback and confirm feedback), requiring minimal changes to the UI software, which is a key factor in successfully deploying the full system into production.

4 ARCHITECTURE
The Architecture implements the Feedback Loops framework and consists of two main components, described below.

Instrumentation Layer. This component implements the full Feedback Loop sequence and is used by the UI to integrate the recommendations into the platform. Its first responsibility is serving recommendation requests using the Machine Learning Model (ML Model) while providing User Experience Consistency guarantees. User Experience Consistency presents a trade-off with sampling efficiency, which is addressed by a technique we call Contextual Caching: the first time a specific user requests recommendations, the ML Model is invoked, a new loop is initialized, and the recommended filters are stored in a cache together with the query, context and filter features. Subsequent requests are served from the cache, but if at least one feature included in the Refresh Trigger Feature Set changes, the loop is finalized, the cache is invalidated, and a new loop is initialized with a new request to the ML Model. Notice that at most one loop is active for a given user at any point in time, and that a loop encapsulates information about the set of recommended filters. This approach allows us to control the trade-off by specifying which features must trigger a loop reset.

The second responsibility is to produce a data stream, based on the interactions with the UI, that is suitable for training an ML Model in an online fashion. The Instrumentation Layer maintains a state machine for each loop consisting of all the feature values at the time the loop was opened, which filters were recommended, which ones were actually seen by the user, their expiration status, the feedback received so far, etc. This state is updated as the UI reports exposure and feedback. From these transitions, a data stream is produced containing all the information needed to train a machine learning model. This process depends heavily on the way we handle Delayed Feedback. We considered two approaches: Fully Delayed, where the state machine waits for a fixed period before assuming negative feedback, and Fake Negatives Calibration, where negative feedback is assumed on exposure as described in [12]. In our experiments both methods were effective, showing no significant difference. We favored Fake Negatives Calibration due to its robustness to changes in the delay distribution.

The life-cycle of a loop is enforced by prohibiting invalid state transitions. This makes the system highly robust to chaotic UI interactions (users refreshing pages, clicking copied links, using multiple browser tabs, etc.), guaranteeing a well-formed stream of events that can be used to train robust machine learning models in an online fashion.

Distributed Online Machine Learning. This component is responsible for maintaining a model to recommend filters when requested by the Instrumentation Layer, and for updating it as soon as feedback is available. To handle the high request volume, we distribute the model over a cluster: each node runs one model instance that serves a random sample of the recommendation requests (sharding) while learning from the full feedback stream produced by the Instrumentation Layer (replication), which is consumed from a persistent message queue. The feedback is consumed in the order it is produced, so although each node learns independently, without node-to-node interaction, all the models are exact replicas. To achieve high availability, a special node saves checkpoints of the model to persistent storage. A checkpoint contains the serialized model, the learning state, and a pointer to the last feedback message processed in the queue. If a node fails, a new one is created, which reads the latest available checkpoint and continues the learning process from the corresponding point in the stream.

This design is agnostic to the concrete learning algorithm. Specific implementations can be built with little attention to fault tolerance, high availability and latency. This is an important advantage when deploying models to production, since it simplifies algorithm development and debugging. Another important consequence is that all changes take effect immediately, which accelerates experimentation and allows us to quickly assess modeling choices such as the Refresh Trigger Feature Set or the Delayed Feedback approach, ultimately streamlining the iterative process.

5 MACHINE LEARNING MODEL
The model must select ~10 filters out of ~20000, optimizing the total Utility. The latency requirement is strict: if fulfilling a request takes more than 100 ms, it is canceled by the UI and no recommendations are shown to the user. This limit covers all the steps, including network latency, Contextual Caching and Ranking. Because of this, we favor simple, fast and scalable models; in particular, we rely on the Vowpal Wabbit library [13] embedded in an online process. We use a point-wise model that estimates the expected utility conditioned on context, query and filter features, modeled with logistic regression, and at recommendation time we rank filters by this estimated expected utility. Most users are not logged in while browsing, which means user history is not available; therefore we rely on query, context and
item features. Example Query Features are number of guests, des-                       delayed. We update the model independently from the recommen-
tination, number of children, anticipation, length of stay, traveler                   dations requests, as soon as the feedback is available. All weights
type (couple, family, etc.); Contextual Features include temporal                      are initialized with zero and ties are broken randomly uniformly
and geographical attributes; and some item features are Category,                      (see Algorithm 1).
Id and Number of Matching Properties. The Refresh Trigger Feature
Set contains all the query features except anticipation plus all the
                                                                                       Algorithm 1 top-k Delayed Contextual ϵ-Greedy
temporal features. Quadratic interaction features are created be-
tween the Filter and the Query and Context Features, keeping the                           ϵ: exploration factor
linear and independent terms. The full model with Fake Negatives                           k: number of filters to recommend
Calibration[12] is given by Equation 3, where r ∈ {0, 1} is the binary                     c, q: context and query features
output, f , q and c are the Filter, Query and Context feature vectors                      F : set of candidate filters
with dimensions d f , dq and dc respectively. All the Greek letters                        procedure recommendFilters(ϵ, k, c, q, F )
                                                                                                With probability ϵ:                                  ▷ (explore)
are model parameters. θ 0 ∈ R is the global bias, θ ∈ Rd f , ω ∈ Rdq ,
                                                                                                for each filter Fi in F with features f do
ϕ ∈ Rdc are linear parameters, α ∈ Rdq xd f and β ∈ Rdc xd f are the
                                                                                                    scores[Fi ] ← sample from Uniform(0,1)
interaction weights of Query and Context with Filter features.
                                                                                                end for
                                                                                                or with probability 1 − ϵ:                            ▷ (exploit)
                                   df              dq
                                   X               X              dc
                                                                  X                             for each filter Fi in F with features f do
           z( f , q, c) = θ 0 +         fi θ i +        qi ωi +        ci ϕi                        scores[Fi ] ← p̂(r = 1| f , q, c)       ▷ as given by Eq. 3
                                 i                 i              i                             end for
                                                                                 (1)
                              dq X
                              X   df                         df
                                                          dc X
                                                          X                                     return topk(scores, k)                ▷ (breaks ties uniformly)
                          +              qi f j α i j +           c i f j βi j             end procedure
                               i    j                     i   j
                                                                                           c, q: context and query features, f : rewarded filter, ro : observed
                                              1                                            feedback
                       b ( f , q, c) =                                           (2)       procedure onFeedbackAvailable(c, f , q, ro )
                                       1 + e −z (f ,q,c )
                                             b ( f , q, c)                                      Update p̂(r=1|c, q, f) with ro        ▷ detailed update rule in
                    p̂(r = 1| f , q, c) =                                        (3)
                                          1 − b ( f , q, c)                                Algorithm 1 in [19])
                                                                                           end procedure

    The filter representation includes the unique id which allows
the model to learn the specifics of the particular filter option, the
number of matching options, which captures contextual utility (a                       6     EXPERIMENTS
generally very useful filter in a context where it matches many                        We performed several Online Controlled Experiments to validate
properties might be less useful) and the category (e.g. Facilities)                    our approach. We highlight 2 where our model was used to feed
that captures general properties across filters in the same category,                  a quick filters section of the search results page of Booking.com.
which is effective to address the cold start problem: when a new fil-                  50% of the traffic was exposed to the current baseline model, and
ter is added, it inherits what is known about its category. In practice,               50% to our new model. Statistical significance was computed at 90%
all the features except Number of Matching Properties are categor-                     confidence (two-sided) using g-test at a predefined time duration.
ical and are encoded using the hashing trick [21]. We estimated                        In these experiments we used click after filtering as feedback with
about 8 millions features (including interactions), following [1] we                   ϵ = 2%. The baseline is a popularity model based on the same
used a hashing space of 2^28 buckets, which resulted in a collision rate of about 3%. All parameters are learned through Stochastic Gradient Descent with a constant learning rate and Normalized Updates [19] (Algorithm 1).
    To address non-stationarity, we rely on active exploration, for which we adopt the contextual ϵ-greedy algorithm [6]: a proportion ϵ of the recommendation requests (after Contextual Caching) is served with uniformly random recommendations (exploration), and the rest by choosing the best filters according to the model's estimates (exploitation). Two small adaptations are required for our case. First, since our system recommends many items, the exploration branch is computed by sampling a value from a uniform distribution between 0 and 1 for each candidate filter and returning the top-k filters by that value. The exploitation branch simply sorts the candidate filters by their estimated utility given the context and query, and returns the top-k. Second, the model cannot be updated right after the recommendations are made since the feedback is

features with a thick layer of business logic on top, blacking out some filters, up-ranking others, etc. It is based on years of data on filter usage and is updated manually at arbitrary moments. It is considered a robust baseline since many past attempts at using more complex models failed. The metrics of interest for these experiments are: Overall Filter Usage (proportion of users using at least one filter), Recommended Filter Usage (proportion of users using at least one recommended filter), After Filtering Click Through Rate (AFCTR: proportion of users who apply any filter and then land on a property detail page), and Recommended Filter Utility (ratio between the number of users applying a recommended filter and the number of users applying any filter). All these metrics indicate the utility of the recommendations from the users' point of view. In particular, Recommended Filter Utility is a strong indicator since it quantifies how easy it is for customers to find a relevant filter.
    First, an experiment was run for two weeks to validate the technical health of the system. The prediction variance across the model replicas in the cluster was very
Recommending Accommodation Filters with Online Learning                                    ORSUM@ACM RecSys 2020, September 25th, 2020, Virtual Event, Brazil


Figure 1: Logarithmic loss of model predictions.
Figure 2: Prediction variance of replicas across 6 nodes.
Figure 3: Cumulative Uplift w.r.t. Baseline on After Filtering CTR with 90% CIs.
Figure 4: Weekly Uplift w.r.t. Baseline on After Filtering CTR with 90% CIs.
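As a minimal sketch of the top-k adaptation of contextual ϵ-greedy described in the method text (an illustration in Python, not the production code), the function below draws one uniform score per candidate when exploring and sorts by estimated utility when exploiting. The function name `recommend_filters`, the `score` callable (a stand-in for the model's utility estimate with the context and query already bound) and the filter names in the example are hypothetical; Contextual Caching and delayed feedback handling are left out.

```python
import random
from typing import Callable, List, Sequence


def recommend_filters(
    candidates: Sequence[str],
    score: Callable[[str], float],
    k: int,
    epsilon: float,
    rng: random.Random,
) -> List[str]:
    """Return k filters using a top-k variant of contextual epsilon-greedy.

    `score` is a hypothetical stand-in for the model's estimated utility of a
    filter, with the current context and query already bound.
    """
    if rng.random() < epsilon:
        # Exploration: one independent Uniform(0, 1) draw per candidate filter.
        keys = {f: rng.random() for f in candidates}
    else:
        # Exploitation: the model's estimated utility for each candidate.
        keys = {f: score(f) for f in candidates}
    # Both branches return the top-k candidates by their key.
    return sorted(candidates, key=keys.__getitem__, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical utilities for one context/query pair.
    utility = {"pet_friendly": 0.9, "free_wifi": 0.7, "parking": 0.5, "pool": 0.2}
    rng = random.Random(42)
    # epsilon=0.0 always exploits, so this prints ['pet_friendly', 'free_wifi'].
    print(recommend_filters(list(utility), utility.__getitem__, k=2, epsilon=0.0, rng=rng))
```

Because the per-candidate uniform draws are independent, every subset of k filters is equally likely in the exploration branch, so newly added filters get exposure without special handling.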

Table 1: Experiment results with 90% CIs. All results are statistically significant with p-value < 0.001.

Uplift w.r.t. Baseline (%)     Exp. 1, 2 weeks    Exp. 1, 4 weeks    Exp. 2, 2 weeks    Exp. 2, 4 weeks
Overall Filter Usage           0.26 ± 0.06        0.22 ± 0.04        0.19 ± 0.06        0.22 ± 0.03
Recommended Filter Usage       37.59 ± 0.1        40.85 ± 0.08       38.02 ± 0.1        32.98 ± 0.05
After Filtering CTR            0.19 ± 0.1         0.23 ± 0.08        0.30 ± 0.1         0.24 ± 0.05
Recommended Filter Utility     37.22 ± 0.1        40.53 ± 0.07       32.67 ± 0.1        37.75 ± 0.07
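The four metrics reported in Table 1 can be computed from per-user logs roughly as sketched below. The `UserSession` record and its field names are hypothetical. Using all users as the AFCTR denominator follows the literal wording of its definition and is an assumption, as is the relative-uplift formula, which is a common convention that the text does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class UserSession:
    """Hypothetical per-user record; field names are illustrative only."""
    applied: FrozenSet[str]      # filters the user applied
    recommended: FrozenSet[str]  # filters recommended to the user
    landed_after_filter: bool    # reached a property detail page after filtering


def filter_metrics(sessions: List[UserSession]) -> Dict[str, float]:
    n = len(sessions)
    used_any = sum(1 for s in sessions if s.applied)
    used_rec = sum(1 for s in sessions if s.applied & s.recommended)
    filtered_and_landed = sum(
        1 for s in sessions if s.applied and s.landed_after_filter
    )
    return {
        "overall_filter_usage": used_any / n,
        "recommended_filter_usage": used_rec / n,
        # Assumption: all users form the AFCTR denominator.
        "after_filtering_ctr": filtered_and_landed / n,
        # Users applying a recommended filter / users applying any filter.
        "recommended_filter_utility": used_rec / used_any if used_any else 0.0,
    }


def relative_uplift(treatment: float, baseline: float) -> float:
    """Assumed convention: percentage change of treatment w.r.t. baseline."""
    return 100.0 * (treatment - baseline) / baseline


if __name__ == "__main__":
    logs = [
        UserSession(frozenset({"pet_friendly"}), frozenset({"pet_friendly", "pool"}), True),
        UserSession(frozenset({"parking"}), frozenset({"pet_friendly"}), False),
        UserSession(frozenset(), frozenset({"pool"}), False),
        UserSession(frozenset({"pool"}), frozenset({"pool"}), True),
    ]
    print(filter_metrics(logs))
```

Recommended Filter Utility is simply Recommended Filter Usage divided by Overall Filter Usage, since both are measured on the same user population.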


low (median 8.2e-6, 99th percentile 0.0012), indicating that the models are indeed close replicas. Figure 2 shows the hourly average for the first 2 weeks of the experiment. The peaks around hour 220 are due to one of the nodes lagging behind (because resource allocation is dynamic and not uniform across the cluster), but it quickly caught up. The model starts with all the weights set to 0 (making random recommendations), so we measured the time to learn reasonable recommendations, defined as the moment when the AFCTR matches the baseline (y=0 in Figure 3); this took roughly 20 hours. This is evidence of the model's ability to effectively incorporate new filters and contexts since, at the beginning, all filters and features are new to the model. Logarithmic


                                 Table 2: Challenges and corresponding techniques applied in our system.

                                   Problem                           Techniques
                                   User Experience Consistency       Contextual Caching
                                   Delayed Feedback                  Fake Negatives Calibration [12]
                                   Censored Recommendations          Exposure Reports
                                   Split Feedback                    Confirmable Feedback
                                   Non Stationarity                  Online Learning and ϵ-greedy
                                   Low Latency                       Sharded Inference, Replicated Learning
                                   High Availability                 Replicated Learning, Redundant Checkpoints
                                   Continuous Cold Start             ϵ-greedy and Item Features
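The learning setup described in the method text (hashed features in a 2^28-bucket space, SGD with a constant learning rate and Normalized Updates [19]) can be sketched as an online logistic regression. This is an illustration under assumptions, not the authors' Algorithm 1: the CRC32 hash, binary indicator features, the logistic loss, the feature strings and the class and method names are hypothetical choices, and the update follows our reading of the normalized gradient (NG) rule of [19] with a constant learning rate. Weights start at 0, as in the first experiment.

```python
import math
import zlib
from typing import Dict, Iterable


def hash_features(features: Iterable[str], num_bits: int = 28) -> Dict[int, float]:
    """Hashing trick: map string features into 2**num_bits buckets.

    Binary indicator features are assumed; colliding features share a
    bucket and their values are summed.
    """
    mask = (1 << num_bits) - 1
    x: Dict[int, float] = {}
    for f in features:
        i = zlib.crc32(f.encode("utf-8")) & mask
        x[i] = x.get(i, 0.0) + 1.0
    return x


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class NormalizedOnlineLogReg:
    """Online logistic regression trained with an NG-style normalized update."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.eta = learning_rate         # constant learning rate
        self.w: Dict[int, float] = {}    # weights, implicitly 0 at the start
        self.s: Dict[int, float] = {}    # largest |x_i| seen so far per feature
        self.N = 0.0                     # running sum of normalized squared norms
        self.t = 0                       # number of updates so far

    def predict(self, x: Dict[int, float]) -> float:
        return sigmoid(sum(self.w.get(i, 0.0) * v for i, v in x.items()))

    def update(self, x: Dict[int, float], y: int) -> float:
        """One SGD step on log loss; returns the prediction made before it."""
        self.t += 1
        for i, v in x.items():
            a, s_old = abs(v), self.s.get(i, 0.0)
            if a > s_old:
                # NG rule: shrink the weight when the feature's scale grows.
                self.w[i] = self.w.get(i, 0.0) * s_old**2 / a**2
                self.s[i] = a
        p = self.predict(x)
        self.N += sum((v / self.s[i]) ** 2 for i, v in x.items() if v != 0.0)
        g = p - y  # gradient of log loss w.r.t. the logit
        for i, v in x.items():
            if v != 0.0:
                # Per-feature step scaled by 1/s_i^2 and globally by t/N.
                self.w[i] -= self.eta * (self.t / self.N) * g * v / self.s[i] ** 2
        return p


if __name__ == "__main__":
    model = NormalizedOnlineLogReg()
    x = hash_features(["filter=pet_friendly", "ctx=family_trip"])  # hypothetical features
    for _ in range(20):
        model.update(x, 1)  # e.g. the user applied the recommended filter
    print(round(model.predict(x), 3))
```

With binary features every s_i is 1, so the update reduces to plain SGD whose constant learning rate is divided by the average number of active buckets per example (the factor t/N).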


loss (Figure 1) showed a periodic pattern, likely following the general probability of clicking after filtering during the day. The peaks are likely due to changes in the environment. The general trend is decreasing, suggesting that the system is able to adapt to changes. We remark that many new filters were added while this experiment was running, and some of them were consistently picked in the top 10. Regarding the latency requirement, we observed a degradation in total page load time of 15 ms, which is in line with the requirements. We let the experiment run for two more weeks to make sure the results were stable.
   In a second experiment we stress-tested the adaptability of the system by allowing automated traffic from scrapers and crawlers for a few days during week 6, which completely changed the utility of almost all filters (the proportion of automated traffic was significant). The system degraded, but it recovered after the traffic normalized; we interpret this as evidence of adaptability to changes in the environment. This behaviour is depicted in Figure 4.
   Results of both experiments are summarized in Table 1. From these results we conclude that our system is able to make recommendations that are useful for our customers and superior to those of the baseline model. We also conclude that the system is reliable, stable and robust, meeting all the requirements specified in Section 2.

7    CONCLUSION
This work presented a Recommender System for Accommodation Filters, a relevant problem for Booking.com and other e-commerce platforms. Our solution features the implementation of well-established techniques, whose practical aspects were discussed in detail, and new ideas addressing several setting-specific problems. The Feedback Loops Framework and the Distributed Online Learning Architecture allowed us to address requirements and trade-offs in a systematic way, enabling a fast iterative process. The effectiveness of our solution was demonstrated by Online Controlled Experiments conducted at Booking.com on millions of users and accommodation options, showing clear positive effects on User Engagement. Table 2 summarizes the challenges and the techniques addressing them. Looking forward, the presented system will allow us to experiment with more advanced online learning algorithms, such as tree-based models, and more sophisticated exploration techniques, and to solve new and different business cases.

REFERENCES
[1] Lucas Bernardi. 2018. Don't be tricked by the Hashing Trick. https://booking.ai/dont-be-tricked-by-the-hashing-trick-192a6aae3087 Retrieved 2020-06-16.
[2] Lucas Bernardi, Jaap Kamps, Julia Kiseleva, and Melanie JI Müller. 2015. The continuous cold start problem in e-commerce recommender systems. arXiv preprint arXiv:1508.01177 (2015).
[3] Lucas Bernardi, Themistoklis Mavridis, and Pablo Estevez. 2019. 150 successful machine learning models: 6 lessons learned at Booking.com. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1743–1751.
[4] Shiyu Chang, Yang Zhang, Jiliang Tang, Dawei Yin, Yi Chang, Mark A Hasegawa-Johnson, and Thomas S Huang. 2017. Streaming recommender systems. In Proceedings of the 26th International Conference on World Wide Web. 381–389.
[5] Olivier Chapelle. 2014. Modeling Delayed Feedback in Display Advertising. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, New York, USA) (KDD '14). Association for Computing Machinery, New York, NY, USA, 1097–1105. https://doi.org/10.1145/2623330.2623634
[6] David Cortes. 2018. Adapting multi-armed bandits policies to contextual bandits scenarios. ArXiv abs/1811.04383 (2018).
[7] P. Cremonesi and R. Turrin. 2010. Controlling Consistency in Top-N Recommender Systems. In 2010 IEEE International Conference on Data Mining Workshops. 919–926.
[8] Basak Denizci Guillet, Anna Mattila, and Lisa Gao. 2019. The effects of choice set size and information filtering mechanisms on online hotel booking. International Journal of Hospitality Management (2019), 102379.
[9] Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, et al. 2014. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. 1–9.
[10] Sheena S Iyengar and Mark R Lepper. 2000. When choice is demotivating: Can one desire too much of a good thing? Journal of personality and social psychology 79, 6 (2000), 995.
[11] Rishabh Iyer, Nimit Acharya, Tanuja Bompada, Denis Charles, and Eren Manavoglu. 2018. A Unified Batch Online Learning Framework for Click Prediction. arXiv preprint arXiv:1809.04673 (2018).
[12] Sofia Ira Ktena, Alykhan Tejani, Lucas Theis, Pranay Kumar Myana, Deepak Dilipkumar, Ferenc Huszár, Steven Yoo, and Wenzhe Shi. 2019. Addressing delayed feedback for continuous training with neural networks in CTR prediction. In Proceedings of the 13th ACM Conference on Recommender Systems. 187–195.
[13] John Langford, Lihong Li, and Alex Strehl. 2007. Vowpal wabbit online learning project. http://hunch.net/?p=309
[14] John Langford and Tong Zhang. 2007. The epoch-greedy algorithm for contextual multi-armed bandits. In Proceedings of the 20th International Conference on Neural Information Processing Systems. Citeseer, 817–824.
[15] Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain. 2010. Temporal diversity in recommender systems. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. 210–217.
[16] Cheng Li, Yue Lu, Qiaozhu Mei, Dong Wang, and Sandeep Pandey. 2015. Click-through prediction for advertising in twitter timeline. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1959–1968.
[17] Lihong Li, Wei Chu, John Langford, and Robert E Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web. 661–670.
[18] Jeong-Yeol Park and SooCheong Shawn Jang. 2013. Confused by too many choices? Choice overload in tourism. Tourism Management 35 (2013), 1–12.


[19] Stéphane Ross, Paul Mineiro, and John Langford. 2013. Normalized online learning. arXiv preprint arXiv:1305.6646 (2013).
[20] Nguyen T Thai and Ulku Yuksel. 2017. What can tourists and travel advisors learn from choice overload research? Consumer Behavior in Tourism and Hospitality Research (2017), 1.
[21] Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. Feature hashing for large scale multitask learning. In Proceedings of the 26th annual international conference on machine learning. 1113–1120.
[22] Qingyun Wu, Naveen Iyer, and Hongning Wang. 2018. Learning contextual bandits in a non-stationary environment. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 495–504.