Recommending Accommodation Filters with Online Learning

Lucas Bernardi (lucas.bernardi@booking.com), Booking.com, Amsterdam, The Netherlands
Pablo Estevez (pablo.estevez@booking.com), Booking.com, Amsterdam, The Netherlands
Matias Eidis (matias.eidis@booking.com), Booking.com, Amsterdam, The Netherlands
Eqbal Osama (eqbal.osama@booking.com), Booking.com, Amsterdam, The Netherlands

ABSTRACT

Online Accommodations Platforms match guests searching for accommodation with hospitality service providers. A fundamental characteristic of efficient platforms is the ability to satisfy the needs and preferences of the guests. To achieve this goal, a common search tool is the Results Filtering capability, which allows users to refine query results with explicit criteria. However, as supply grows and diversifies, more filtering options become available, reaching hundreds of different criteria for one query and making it hard for customers to find the ones that are relevant to them. In this work we present the implementation of an Accommodation Filters Recommender System addressing this issue. The problem poses several challenges around recommendations feedback, user experience constraints, and non stationarity, among others. We provide an end-to-end description of the System, discuss implementation issues and provide techniques to address them, including a large scale distributed online learning architecture. The solution was validated through several Online Controlled Experiments performed in Booking.com, a top Online Travel Agency serving millions of daily users, showing statistically significant results on various user behaviour metrics indicating a strong positive effect on User Engagement.

CCS CONCEPTS

• Applied computing → Online shopping; E-commerce infrastructure; • Information systems → Collaborative filtering; Content ranking; Search interfaces.

KEYWORDS

recommender systems, online machine learning, information filtering, distributed systems

1 INTRODUCTION

With the advent of e-commerce, customers have access to a broad supply when searching for an item to purchase, which brings many benefits, but also choice and information overload [10]. The prevalence of these issues in tourism has been highlighted in [18] and [20]. [8] gives an extensive review of the literature on this topic. Given this context, the ability to apply filters such as Pet Friendly Hotels or Breakfast Included is an important tool benefiting all parties: it helps users to browse a large supply of options, it helps accommodation suppliers to better market their services, and it helps the platform by eliciting explicit guests' preferences which can be used to further personalize the user experience, potentially increasing the probability of a purchase.

On the other hand, the constant growth of available filters due to supply diversification (such as vacation rentals) and the increase of product details available to use as filtering criteria defeats the very purpose of this tool. This motivates the need for filter recommendations that allow users to refine their query without scanning a long list of options. These recommendations can be displayed, for example, on the side of the Search Results Page as a quick access shortlist of filters before the full list of options. In this paper we describe a Recommender System for the case of Accommodation Filters. The problem poses several common challenges like Large Scale, Continuous Cold Start [2] and Sparsity, among others. Previous work has addressed some of these issues in similar scenarios (e.g. [9], [11], [16] and [3]), but our setting deviates from them and from the standard Recommender Systems or Information Retrieval settings, mainly because the item space, the filters, are not what users are interested in, but a means to find accommodations that fit their preferences. This brings several uncommon challenges around feedback and user experience consistency.
Our work focuses on the design of an integral system considering all the intricacies of a full recommender system, relying on well established and effective techniques and putting focus on their composition into a fully functional system that allows us to experiment with different approaches. The contributions of this work are:

• A thorough description of a Recommender System successfully deployed in Booking.com helping millions of new users daily to browse millions of accommodations options
• An architecture for Online Recommender Systems capable of producing a clean, well-formed and reliable data stream, suitable for incremental learning
• An algorithm-agnostic architecture for Large Scale Distributed Online Learning
• Online Controlled Experiments that demonstrate the effectiveness of our approach at improving the shopping experience through more effective filtering actions

The paper is organized as follows: Section 2 presents the problem, main challenges and related work. Section 3 describes the solution framework. Section 4 details the system architecture while Section 5 focuses on a Machine Learning Model as implemented in Booking.com. Section 6 reports experiments on the Booking.com website and Section 7 concludes.

Reference Format:
Lucas Bernardi, Pablo Estevez, Matias Eidis, and Eqbal Osama. 2020. Recommending Accommodation Filters with Online Learning. In 3rd Workshop on Online Recommender Systems and User Modeling (ORSUM 2020), in conjunction with the 14th ACM Conference on Recommender Systems, September 25th, 2020, Virtual Event, Brazil.

ORSUM@ACM RecSys 2020, September 25th, 2020, Virtual Event, Brazil
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 PROBLEM STATEMENT AND RELATED WORK

Our goal is to construct a system that recommends Accommodation Filters maximizing the utility users get from them, with the following requirements: ability to quickly on-board new filters under specific User Experience Guidelines, in particular, browsing consistency; it must scale to hundreds of millions of daily users and thousands of filters; it must be available at all times, globally, with latency < 100ms. The problem poses many challenges. We highlight three of them that we consider of particular relevance and that strongly influenced the design of our architecture.

User Feedback: Implicit, Delayed and Split. We elicit Filter Utility from implicit signals based on user behaviour such as applying a filter, clicking on an accommodation to see further details, or completing a reservation. Several characteristics of these signals make learning from them challenging. First, the feedback is delayed, with delays ranging from minutes to days, and since negative feedback can only be implicitly assumed from the lack of positive feedback after a certain time has elapsed, it is necessary to model the delay (at least in principle). This has been studied by [5] and more recently by [12] and references therein. Delayed Feedback requires keeping track of the state of each impression, which at the scale of large e-commerce platforms is an engineering challenge. Another issue is the Split Feedback Problem: in the classic recommender system setting, the feedback indicates both the degree of satisfaction with a specific item and which item is receiving such feedback. In our case, these two pieces of information are split; first we observe that the user applied a filter, but only later, when and if the user clicks through or completes a transaction, do we observe the utility. To the best of our knowledge this problem has not been explicitly addressed by the relevant literature.

User Experience Consistency. Filter recommendations impose certain consistency constraints. In principle, it is critical that the recommendations don't flicker with each page load, but there are some nuances: in some situations the system must return the same recommendations (e.g. right after the user applied a filter), in other situations the system can return new recommendations (e.g. when the query dates change), and in others the system must return new recommendations (e.g. when the destination changes). Which variables trigger a new recommendation is a design choice that trades off User Experience Consistency against the number of samples to learn from. Related work like [7] focuses on recommendations consistency under small deviations of the user profile, while [15] studies the effect of recommending the same items multiple times and focuses on the diversification-accuracy trade off. To our knowledge, no previous work focused specifically on guaranteeing that the system is compliant with User Experience Consistency constraints.

Non Stationarity. Stationarity is a rather strong assumption. For example, the set of matching results after applying a filter depends on the number of available rooms, which are constantly booked, cancelled, and replenished. A filter giving very few results might suddenly match many options, drastically affecting its utility. Experiments in other areas such as the Ranking Algorithm or the User Interface (UI) also affect filters utility. Last, new filters affect the utility of other filters; for example, introducing a Family Friendly Property filter reduced the utility of Family related facilities. Non Stationarity motivates exploration, which introduces the exploration-exploitation trade off. This has been largely studied by the contextual-bandits literature, for example in [14], [17] and more recently by [22] and references therein.

All these issues (and others, omitted due to space limitations) motivate the construction of a complex system that enables their systematic treatment. As proposed in [4], we adopt the Online Recommender System Setting that considers an event stream produced by user interactions and relies on incremental algorithms. The following sections describe the architecture, including components that produce a reliable data stream and components that host general incremental algorithms at scale, highlighting how they contribute to the solution of the Accommodation Filters Recommendations.

3 SOLUTION FRAMEWORK

The main abstraction in our system is the Feedback Loop, which models the interactions between the UI and our Recommender System. In its simplest version a Feedback Loop is defined by the following sequence:

(1) The UI requests recommendations (opens the loop).
(2) The System recommends filters based on the query, context and filter features.
(3) The UI provides feedback (closes the loop).
(4) The System uses closed loops to update the model.

This simple framework captures the dynamic nature of the problem, providing flexibility to experiment with different ways to address the aforementioned issues and abstracting away specific implementations. The Feedback Loop sequence needs to be extended in order to address two issues. The first one is Censored Recommendations, which occurs when filters are not seen by the user (e.g. because they didn't scroll enough), introducing ambiguity since the lack of feedback might be caused by either dissatisfaction or by censorship. To solve this issue we introduce another step in between 2 and 3: the exposure report, which notifies the System that a user was exposed to a filter, indicating that lack of positive feedback implies negative feedback. If no exposure report is received, the observation is not used for learning since the user never saw the recommendation.

The second issue, rather specific to our setting, is Split Feedback, which appears when the item and the level of satisfaction are reported separately. For example, if the user applies a filter and then completes a reservation we would like to credit that filter with positive feedback. But these two events are usually separated by many other browsing events (clicking back and forth on the search results page), making it hard at reservation time to know which filters were applied by the user, and therefore impossible to send a feedback report. Another instance occurs when different UI components know different pieces of the feedback. This is the case of after filtering click-through: we would like to credit an applied filter if it leads to a click on a details card. Which filter is applied is known by the Search Engine, but whether a click happens or not is known by the Details Page Service, which has no information about the applied filters. This is addressed by a technique that we call Confirmable Feedback, which splits the feedback report (step 3): first, the user interface reports unconfirmed feedback as soon as the filter is applied, indicating which item might be credited. The System waits for confirmation before using it to learn. The user interface sends the confirmation, for example, when a reservation is made or a details page is loaded, without the need to know which filters were applied. As a result of a confirmation the loop is closed and the system can learn from it. If no confirmation is sent (e.g. the user applied a filter but no click or reservation happened), the system ignores the unconfirmed feedback. Finally, if a confirmation is received but no feedback was sent before, the confirmation is ignored.

The main benefit of this framework is that the state of a loop is maintained by the System; the UI is stateless and relies on four simple primitives: open loop, report exposure, report feedback and confirm feedback. This requires minimal interventions in the UI software, which is a key factor to successfully deploy the full system into production.

4 ARCHITECTURE

The Architecture implements the Feedback Loops framework and consists of two main components described below.

Instrumentation Layer. This component implements the full Feedback Loops sequence; it is used by the UI to integrate the recommendations in the platform. Its first responsibility is serving recommendations requests using the Machine Learning Model (ML Model) and providing User Experience Consistency guarantees. User Experience Consistency presents a trade-off with sampling efficiency, which is addressed by a technique we call Contextual Caching: the first time a specific user requests recommendations, the ML Model is invoked, a new loop is initialized, and the recommended filters are stored in a cache together with the query, context and filter features. Subsequent requests are served from the cache, but if at least one feature included in the Refresh Trigger Feature Set changes, the loop is finalized, the cache is invalidated, and a new loop is initialized with a new request to the ML Model. Notice that at most one loop is active for a given user at any point in time, and that one loop encapsulates information about the set of recommended filters. This approach allows us to control the trade-off by specifying which features must trigger a loop reset. The second responsibility is to produce a data stream based on the interactions with the UI, suitable to train an ML Model in an online fashion. The instrumentation layer maintains a state machine for each loop consisting of all the feature values when the loop was opened, which filters were recommended, which ones were actually seen by the user, their expiration status, the feedback received so far, etc. This state is updated as the UI reports exposure and feedback. From these transitions a data stream is produced containing all the information needed to train a machine learning model. This process is very dependent on the way we handle Delayed Feedback. We considered two approaches: Fully Delayed, where the state machine waits for a fixed period before assuming negative feedback, and Fake Negatives Calibration, where the negative feedback is assumed on exposure as described in [12]. In our experiments both methods were effective, showing no significant difference. We favored Fake Negatives Calibration due to its robustness to changes in the delay distribution.

The life-cycle of a loop is enforced by prohibiting invalid state transitions. This brings high robustness to chaotic UI interactions (users refreshing pages, clicking copied links, multiple browser tabs, etc.), guaranteeing a well formed stream of events that can be used to train robust machine learning models in an online fashion.

Distributed Online Machine Learning. This component is responsible for maintaining a model to recommend filters when requested by the Instrumentation Layer and for updating it as soon as feedback is available. In order to handle high request volume, we distribute the model in a cluster: each node runs one model instance serving a random sample of the recommendations requests (sharding) while learning from the full feedback stream produced by the Instrumentation Layer (replication), which is consumed from a persistent message queue. The feedback is consumed in the order it is produced, so although each node learns independently without node-node interaction, all the models are exact replicas. To achieve high availability, a special node saves checkpoints of the model to a persistent storage. The checkpoint contains the serialized model, the learning state and a pointer to the last feedback message processed in the queue. If a node fails, a new one is created which reads the latest available version from the checkpoint and continues the learning process from the corresponding point in the stream.

This design is agnostic to concrete learning algorithms. Specific implementations can be constructed with little attention to fault tolerance, high availability and latency. This is an important advantage when deploying models to production since it simplifies algorithm development and debugging. Another important consequence is that all changes have an immediate effect, accelerating experimentation and allowing us to quickly assess modeling techniques such as the Refresh Trigger Feature Set, the Delayed Feedback approach, etc., ultimately streamlining the iterative process.

5 MACHINE LEARNING MODEL

The model must select ~10 Filters out of ~20000 optimizing the total Utility. The latency requirement is strict: if fulfilling a request takes more than 100ms, it is canceled by the UI and no recommendations are shown to the user. This limit involves all the steps including network latency, Contextual Caching and Ranking. Because of this, we favor simple, fast and scalable models; in particular we rely on the library Vowpal Wabbit [13] embedded in an on-line process. We use a point-wise model estimating the expected utility conditioned on context, query and filter features. At recommendation time we rank by the estimated expected utility modeled with logistic regression. Most of the users are not logged-in while browsing, which means user history is not available, therefore we rely on query, context and item features. Example Query Features are number of guests, destination, number of children, anticipation, length of stay, traveler type (couple, family, etc.); Contextual Features include temporal and geographical attributes; and some item features are Category, Id and Number of Matching Properties. The Refresh Trigger Feature Set contains all the query features except anticipation, plus all the temporal features. Quadratic interaction features are created between the Filter and the Query and Context Features, keeping the linear and independent terms. The full model with Fake Negatives Calibration [12] is given by Equation 3, where $r \in \{0, 1\}$ is the binary output, and $f$, $q$ and $c$ are the Filter, Query and Context feature vectors with dimensions $d_f$, $d_q$ and $d_c$ respectively. All the Greek letters are model parameters: $\theta_0 \in \mathbb{R}$ is the global bias, $\theta \in \mathbb{R}^{d_f}$, $\omega \in \mathbb{R}^{d_q}$ and $\phi \in \mathbb{R}^{d_c}$ are linear parameters, and $\alpha \in \mathbb{R}^{d_q \times d_f}$ and $\beta \in \mathbb{R}^{d_c \times d_f}$ are the interaction weights of Query and Context with Filter features.

\[ z(f, q, c) = \theta_0 + \sum_{i}^{d_f} f_i \theta_i + \sum_{i}^{d_q} q_i \omega_i + \sum_{i}^{d_c} c_i \phi_i + \sum_{i}^{d_q} \sum_{j}^{d_f} q_i f_j \alpha_{ij} + \sum_{i}^{d_c} \sum_{j}^{d_f} c_i f_j \beta_{ij} \tag{1} \]

\[ b(f, q, c) = \frac{1}{1 + e^{-z(f, q, c)}} \tag{2} \]

\[ \hat{p}(r = 1 \mid f, q, c) = \frac{b(f, q, c)}{1 - b(f, q, c)} \tag{3} \]

The filter representation includes the unique id, which allows the model to learn the specifics of the particular filter option; the number of matching options, which captures contextual utility (a generally very useful filter might be less useful in a context where it matches many properties); and the category (e.g. Facilities), which captures general properties across filters in the same category and is effective to address the cold start problem: when a new filter is added, it inherits what is known about its category. In practice, all the features except Number of Matching Properties are categorical and are encoded using the hashing trick [21]. We estimated about 8 million features (including interactions); following [1] we used a hashing space of $2^{28}$ buckets, which resulted in about 3% collision rate. All parameters are learned through Stochastic Gradient Descent with constant learning rate and Normalized Updates [19] (Algorithm 1).

To address non stationarity, we rely on active exploration, for which we adopt the contextual ϵ-greedy algorithm [6], in which a proportion ϵ of the recommendation requests (after Contextual Caching) is served with random uniform recommendations (exploration), and the rest by choosing the best according to the estimations of the model (exploitation). Two small adaptations are required for our case. First, since our system recommends many items, the exploration branch is computed by sampling from a uniform distribution between 0 and 1 for each candidate filter, and returning the top-k filters. The exploitation branch simply sorts the candidate filters by their estimated utility given the context and query, and returns the top-k. Second, the model cannot be updated right after the recommendations are made since the feedback is delayed. We update the model independently from the recommendations requests, as soon as the feedback is available. All weights are initialized with zero and ties are broken uniformly at random (see Algorithm 1).

Algorithm 1 top-k Delayed Contextual ϵ-Greedy

  ϵ: exploration factor
  k: number of filters to recommend
  c, q: context and query features
  F: set of candidate filters
  procedure recommendFilters(ϵ, k, c, q, F)
    With probability ϵ:                          ▷ (explore)
      for each filter F_i in F with features f do
        scores[F_i] ← sample from Uniform(0,1)
      end for
    or with probability 1 − ϵ:                   ▷ (exploit)
      for each filter F_i in F with features f do
        scores[F_i] ← p̂(r = 1 | f, q, c)         ▷ as given by Eq. 3
      end for
    return topk(scores, k)                       ▷ (breaks ties uniformly)
  end procedure

  c, q: context and query features, f: rewarded filter, r_o: observed feedback
  procedure onFeedbackAvailable(c, f, q, r_o)
    Update p̂(r = 1 | c, q, f) with r_o           ▷ (detailed update rule in Algorithm 1 in [19])
  end procedure

6 EXPERIMENTS

We performed several Online Controlled Experiments to validate our approach. We highlight 2 where our model was used to feed a quick filters section of the search results page of Booking.com. 50% of the traffic was exposed to the current baseline model, and 50% to our new model. Statistical significance was computed at 90% confidence (two-sided) using a g-test at a predefined time duration. In these experiments we used click after filtering as feedback with ϵ = 2%. The baseline is a popularity model based on the same features with a thick layer of business logic on top, blacking out some filters, up-ranking others, etc. It is based on years of data on filter usage and it is updated manually at arbitrary moments. It is considered a robust baseline since many attempts at using more complex models failed in the past. The metrics of interest for these experiments are: Overall Filter Usage (proportion of users using at least one filter), Recommended Filter Usage (proportion of users using at least one recommended filter), After Filtering Click Through Rate (AFCTR, proportion of users filtering (any filter) and landing on a property detail page), and Recommended Filter Utility (ratio between number of users applying a recommended filter and number of users applying any filter). All these metrics indicate the utility of the recommendations from the users' point of view. In particular, Recommended Filter Utility is a strong indicator since it quantifies how easy it is for customers to find a relevant filter.

Figure 1: Logarithmic loss of model predictions.
Figure 2: Prediction variance of replicas across 6 nodes.
Figure 3: Cumulative Uplift w.r.t. Baseline on After Filtering CTR with 90% CIs.
Figure 4: Weekly Uplift w.r.t. Baseline on After Filtering CTR with 90% CIs.

Table 1: Experiments results with 90% CIs, reported as Uplift w.r.t. Baseline (%). All statistically significant with p-value < 0.001.

Metric                       Exp. 1           Exp. 1           Exp. 2           Exp. 2
                             After 2 weeks    After 4 weeks    After 2 weeks    After 4 weeks
Overall Filter Usage         0.26 ± 0.06      0.22 ± 0.04      0.19 ± 0.06      0.22 ± 0.03
Recommended Filter Usage     37.59 ± 0.1      40.85 ± 0.08     38.02 ± 0.1      32.98 ± 0.05
After Filtering CTR          0.19 ± 0.1       0.23 ± 0.08      0.30 ± 0.1       0.24 ± 0.05
Recommended Filter Utility   37.22 ± 0.1      40.53 ± 0.07     32.67 ± 0.1      37.75 ± 0.07

First, an experiment was run for two weeks to validate the technical health of the system. The variance of the cluster was very low (median 8.2e-6, 99th percentile 0.0012), indicating that the models are indeed close replicas. Figure 2 shows the hourly average for the first 2 weeks of the experiment. The peaks around hour 220 are due to one of the nodes lagging behind (due to uneven resource allocation, which is dynamic and not uniform across the cluster), but it quickly caught up. The model starts with all the weights set to 0 (making random recommendations), so we measured the time to learn reasonable recommendations, defined as the moment where the AFCTR matches the baseline (y=0 in Figure 3), which was roughly 20 hours.
This is evidence of the ability of the model to effectively incorporate new filters and contexts, since at the beginning all filters and features are new for the model. Logarithmic loss (Figure 1) showed a periodic pattern, likely following the general probability of clicking after filtering during the day. The peaks are likely due to changes in the environment. The general trend is negative, suggesting that the system is able to adapt to changes. We remark that many new filters were added while this experiment was running, and some of them were consistently picked in the top 10. Regarding the latency requirement, we observed a degradation in total page load time of 15ms, which is in line with the requirements. We let the experiment run for two more weeks to make sure results are stable.

In a second experiment we stress-tested the adaptability of the system by allowing automated traffic from scrapers and crawlers for a few days during week 6, which completely changes the utility of almost all filters (the proportion of automated traffic was significant). The system degraded, but after normalization of the traffic it recovered. We interpret this as evidence of adaptability to changes in the environment. This behaviour is depicted in Figure 4.

Results of both experiments are summarized in Table 1. From these results we conclude that our system is able to make recommendations that are useful for our customers and superior to the ones by the baseline model. We also conclude that the system is reliable, stable and robust, meeting all the requirements specified in Section 2.

7 CONCLUSION

This work presented a Recommender System for Accommodation Filters, a relevant problem for Booking.com and other e-commerce platforms. Our solution features the implementation of well established techniques, for which practical aspects were discussed in detail, and new ideas addressing several setting-specific problems. The Feedback Loops Framework and the Distributed Online Learning Architecture allowed us to address requirements and trade-offs in a systematic way, enabling a fast iterative process. The effectiveness of our solution was demonstrated by Online Controlled Experiments conducted in Booking.com, on millions of users and accommodations options, showing clear positive effects on User Engagement. Table 2 summarizes challenges and techniques addressing them. Looking forward, the presented system will allow us to experiment with more advanced online learning algorithms such as tree based models and more sophisticated exploration techniques, and to solve new and different business cases.

Table 2: Challenges and corresponding techniques applied in our system.

Problem                        Techniques
User Experience Consistency    Contextual Caching
Delayed Feedback               Fake Negatives Calibration [12]
Censored Recommendations       Exposure Reports
Split Feedback                 Confirmable Feedback
Non Stationarity               Online Learning and ϵ-greedy
Low Latency                    Sharded Inference, Replicated Learning
High Availability              Replicated Learning, Redundant Checkpoints
Continuous Cold Start          ϵ-greedy and Item Features

REFERENCES

[1] Lucas Bernardi. 2018. Don't be tricked by the Hashing Trick. https://booking.ai/dont-be-tricked-by-the-hashing-trick-192a6aae3087 Retrieved 2020-06-16.
[2] Lucas Bernardi, Jaap Kamps, Julia Kiseleva, and Melanie JI Müller. 2015. The continuous cold start problem in e-commerce recommender systems. arXiv preprint arXiv:1508.01177 (2015).
[3] Lucas Bernardi, Themistoklis Mavridis, and Pablo Estevez. 2019. 150 successful machine learning models: 6 lessons learned at Booking.com. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1743–1751.
[4] Shiyu Chang, Yang Zhang, Jiliang Tang, Dawei Yin, Yi Chang, Mark A Hasegawa-Johnson, and Thomas S Huang. 2017. Streaming recommender systems. In Proceedings of the 26th International Conference on World Wide Web. 381–389.
[5] Olivier Chapelle. 2014. Modeling Delayed Feedback in Display Advertising. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, New York, USA) (KDD '14). Association for Computing Machinery, New York, NY, USA, 1097–1105. https://doi.org/10.1145/2623330.2623634
[6] David Cortes. 2018. Adapting multi-armed bandits policies to contextual bandits scenarios. ArXiv abs/1811.04383 (2018).
[7] P. Cremonesi and R. Turrin. 2010. Controlling Consistency in Top-N Recommender Systems. In 2010 IEEE International Conference on Data Mining Workshops. 919–926.
[8] Basak Denizci Guillet, Anna Mattila, and Lisa Gao. 2019. The effects of choice set size and information filtering mechanisms on online hotel booking. International Journal of Hospitality Management (2019), 102379.
[9] Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, et al. 2014. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. 1–9.
[10] Sheena S Iyengar and Mark R Lepper. 2000. When choice is demotivating: Can one desire too much of a good thing? Journal of personality and social psychology 79, 6 (2000), 995.
[11] Rishabh Iyer, Nimit Acharya, Tanuja Bompada, Denis Charles, and Eren Manavoglu. 2018. A Unified Batch Online Learning Framework for Click Prediction. arXiv preprint arXiv:1809.04673 (2018).
[12] Sofia Ira Ktena, Alykhan Tejani, Lucas Theis, Pranay Kumar Myana, Deepak Dilipkumar, Ferenc Huszár, Steven Yoo, and Wenzhe Shi. 2019. Addressing delayed feedback for continuous training with neural networks in CTR prediction. In Proceedings of the 13th ACM Conference on Recommender Systems. 187–195.
[13] John Langford, Lihong Li, and Alex Strehl. 2007. Vowpal wabbit online learning project. http://hunch.net/?p=309
[14] John Langford and Tong Zhang. 2007. The epoch-greedy algorithm for contextual multi-armed bandits. In Proceedings of the 20th International Conference on Neural Information Processing Systems. Citeseer, 817–824.
[15] Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain. 2010. Temporal diversity in recommender systems. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. 210–217.
[16] Cheng Li, Yue Lu, Qiaozhu Mei, Dong Wang, and Sandeep Pandey. 2015. Click-through prediction for advertising in twitter timeline. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1959–1968.
[17] Lihong Li, Wei Chu, John Langford, and Robert E Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web. 661–670.
[18] Jeong-Yeol Park and SooCheong Shawn Jang. 2013. Confused by too many choices? Choice overload in tourism. Tourism Management 35 (2013), 1–12.
[19] Stéphane Ross, Paul Mineiro, and John Langford. 2013. Normalized online learning. arXiv preprint arXiv:1305.6646 (2013).
[20] Nguyen T Thai and Ulku Yuksel. 2017. What can tourists and travel advisors learn from choice overload research? Consumer Behavior in Tourism and Hospitality Research (2017), 1.
[21] Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. Feature hashing for large scale multitask learning. In Proceedings of the 26th annual international conference on machine learning. 1113–1120.
[22] Qingyun Wu, Naveen Iyer, and Hongning Wang. 2018. Learning contextual bandits in a non-stationary environment. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 495–504.
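
As a supplementary illustration of the recommendation step of Algorithm 1 (Section 5), the following is a minimal Python sketch of top-k ϵ-greedy selection. It is not the production implementation: the function name `recommend_filters`, the `estimate_utility` callable (standing in for the model estimate of Eq. 3) and the example filter names are all hypothetical, and the delayed model update is deliberately left out.

```python
import random
from typing import Callable, Hashable, List, Optional, Sequence


def recommend_filters(
    epsilon: float,
    k: int,
    candidates: Sequence[Hashable],
    estimate_utility: Callable[[Hashable], float],
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    """Return k filters: random with probability epsilon, best-scored otherwise.

    `estimate_utility` is a hypothetical stand-in for the model estimate;
    any callable mapping a candidate filter to a float works.
    """
    rng = rng or random.Random()
    # One coin flip per request decides between exploring and exploiting.
    explore = rng.random() < epsilon
    scored = []
    for filter_id in candidates:
        # Explore: score each candidate with a Uniform(0, 1) sample.
        # Exploit: score each candidate with the estimated utility.
        score = rng.random() if explore else estimate_utility(filter_id)
        # The second random key breaks ties between equal scores uniformly,
        # which matters when all weights (and so all scores) start at zero.
        scored.append((score, rng.random(), filter_id))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [filter_id for _, _, filter_id in scored[:k]]


if __name__ == "__main__":
    # Hypothetical utilities, for illustration only.
    utilities = {"breakfast": 0.9, "pool": 0.2, "pets": 0.5, "wifi": 0.7}
    print(recommend_filters(0.02, 2, list(utilities), utilities.get))
```

Drawing one uniform score per candidate and keeping the top k is equivalent to sampling k filters without replacement, so the explore and exploit branches can share the same top-k code path.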