RecTour 2019, September 19th, 2019, Copenhagen, Denmark.

Cascaded Machine Learning Model for Efficient Hotel Recommendations from Air Travel Bookings

Eoin Thomas*, Benoit Lardeux, Mourad Boudia, Antonio Gonzalez Ferrer*, Christian Haas-Frangii, Rodrigo Acuna Agost
Amadeus SAS, Sophia Antipolis, France
eoin.thomas@amadeus.com

* Both authors contributed equally to this research.
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
Recommending a hotel for a vacation or a business trip can be a challenging task due to the large number of alternatives and considerations to take into account. In this study, a recommendation engine is designed to identify relevant hotels based on features of the facilities and the context of the trip via flight information. The system was designed as a cascaded machine learning pipeline, with one model predicting the conversion probability of each hotel and another predicting the conversion of a set of hotels as presented to the traveller. By analysing the feature importance of the model based on sets of hotels, we are able to construct optimal lists of hotels by selecting individual hotels that maximise the probability of conversion.

CCS CONCEPTS
• Computing methodologies → Machine learning;

KEYWORDS
Recommender systems, machine learning, hotels, conversion.

1 INTRODUCTION
In the United States, the travel industry is estimated to be the third largest industry after the automotive and food sectors, contributing approximately 5% of the gross domestic product. Travel has experienced rapid growth as users are willing to pay for new experiences, unexpected situations, and moments of meditation [9, 28], while the cost of travel has decreased over time, in part due to low-cost carriers and the sharing economy. At the same time, traditional travel players such as airlines, hotels, and travel agencies aim to increase revenue from these activities. The supply side must identify its market segments, create the respective products with the right features and prices, and find a distribution channel. The traveller has to find the right product, its conditions, its price, and how and where to buy it. In fact, the vast quantity of information available to users makes this selection more challenging.

Finding the best alternative can become a complicated and time-consuming process. Consumers used to rely mostly on recommendations from other people by word of mouth and on known products from advertisements [20], or inform themselves by reading reviews [6, 18]. However, the Internet has overtaken word of mouth as the primary medium for choosing destinations [23] by guiding the user in a personalized way to interesting or useful products from a large space of possible options.

Many players have emerged in the past decades mediating the communication between consumers and suppliers. One type of player is the Global Distribution System (GDS), which allows customer-facing travel agencies (online or physical) to search and book content from most airlines and hotels. Increased conversion is a beneficial goal for the supplier and the broker, as it implies more revenue for a lower cost of operation, and for the traveller, as it implies quicker decision making and thus less time spent on search and shopping activities.

In this study, we aim to increase the conversion rate for hospitality recommendations after users book air travel. In Section 2, the problem is formulated in order to highlight the considerations which separate this work from many recommender system paradigms. Section 3 presents the main techniques and concepts used in this study. In Section 4, a brief overview is given of the industry data used in this study. Section 5 discusses the results obtained for different machine learning models, including feature analysis. A discussion of the main outcomes of this study is provided in Section 6.
2 PROBLEM FORMULATION

2.1 Industry background
Booking a major holiday is typically a yearly or bi-yearly activity for travellers, requiring research on destinations, activities and pricing. According to a study from Expedia [12], on average, travellers visit 38 sites up to 45 days prior to booking. The travel sector is characterized by Burke and Ramezani [5] as a domain with the following factors:
• Low heterogeneity: the needs that the items can satisfy are not very diverse.
• High risk: the price of items is comparatively high.
• Low churn: the relevance of items does not change rapidly.
• Explicit interaction style: the user needs to explicitly interact with the system in order to add personal data. Although some implicit preferences can be tracked from web activity and past history, the information is mainly gathered in an explicit way (e.g. when/where do you want to travel?).
• Unstable preferences: information collected about the user in the past might no longer be trustworthy today.

Researchers have tried to relate touristic behavioural patterns to psychological needs and expectations by 1) defining a characterization of travel personalities and 2) building a computational model based on a proper description of these profiles [27]. Recommender systems are a particular form of information filtering that exploit past behaviours and user similarities. They have become fundamental in e-commerce applications, providing suggestions that adequately reduce large search spaces so that users are directed toward items that best meet their preferences. There are several core techniques that are applied to predict whether an item is in fact useful to the user [4]. With a content-based approach, items are recommended based on attributes of the items chosen by the user in the past [3, 26]. In collaborative filtering techniques, recommendations to each user are based on information provided by similar users, typically without any characterization of the content [19, 24, 25]. More recently, session-based recommenders have been proposed, where content is selected based on previous activity made by the user on a website or application [17].
2.2 Terminology
In order to clearly define our goal, let us first define some terminology:
• Hotel conversion: a hotel recommendation leads to a conversion when the user books that specific hotel.
• Hotel model: a machine learning model trained to predict the conversion probability of individual hotels.
• Passenger Name Record (PNR): a digital record that contains information about the passenger data and flight details.
• Session: after a traveller completes a flight booking through a reservation system, a session is defined by the context of the flight, the context of the reservation, and a set of five recommended hotels proposed by the recommender system.
• Session conversion: a session leads to a conversion when the user books any of the hotels suggested during the session.
• Session model: a machine learning model trained using features related to the session context and the hotels; its output is the conversion probability of the session.

The end goal of the recommender system is to increase session conversion. We can estimate the probability of booking of a list of hotels using the session model, and thus we can compare different lists using the session model to determine the one which will maximise the probability of conversion of the session. Note that in this case conversion is defined as a selection or "click" of a hotel on the interface, rather than a booking.
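To make the terminology concrete, the minimal sketch below shows one possible way a recommendation and a session could be represented in code. It is purely illustrative: the class and field names (HotelRecommendation, Session, and so on) are hypothetical and are not taken from the production system described in this paper.

from dataclasses import dataclass, field
from typing import List

@dataclass
class HotelRecommendation:
    # Static hotel attributes plus the price quoted at recommendation time.
    hotel_id: str
    provider_id: str
    price: float
    currency: str
    rating: float             # numerical rating, from 0 to 5
    converted: bool = False   # True if this recommendation was clicked

@dataclass
class Session:
    # Context of the flight booking that triggered the recommendations.
    booking_country: str
    destination_city: str
    stay_duration_days: int
    hotels: List[HotelRecommendation] = field(default_factory=list)  # five hotels

    @property
    def converted(self) -> bool:
        # A session converts when any of the recommended hotels converts.
        return any(h.converted for h in self.hotels)

Under such a representation, the session conversion label used later as the target of the session model is simply whether any of the five recommended hotels was converted.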
2.3 Hotel recommendations
The content sold through a GDS is diverse, including flight segments, hotel stays, cruises, car rental, and airport-hotel transfers. The core GDS business concerns the delivery of appropriate travel solutions to travel retailers. Therefore, state-of-the-art recommendation engines capable of analysing historical bookings and automatically recommending the appropriate travel solutions need to be designed. Figure 1 shows an outline of the rule-based recommendation system currently in use. After a user books a flight, information related to the trip is sent to the recommender engine.

Figure 1: A hotel recommendation system. When a flight booking is completed, the flight details are passed to the hotel recommender engine, which selects a set of available hotels for the user based on historical hotel bookings, hotel facilities and a corporate policy check.

However, this system does not take into account valuable information such as the context of the request (e.g. where did the booking originate from?), details about the associated flight (e.g. how many days is the user staying in the city?) nor historical recommendations (e.g. are similar users likely to book similar hotels?), which are key assets to fine-tune the recommendations.

The problem is novel due to the richness of the available data sources (bookings, ratings, passenger information) and the variety of distribution channels: indirect through travel agencies or direct (website, mobile, mailbox). However, it is important to consider that, by design, no personally identifiable information (PII) or traveller-specific history is used as part of the model, which therefore excludes collaborative-filtering or content-based approaches. The contributions of this work are:
• The combination of data feeds to generate the context of travel, including flights booked by the traveller, historical hotels proposed and booked at the destination by other travellers, and hotel content information.
• The definition of a two-stage machine learning recommender tailored to the travel context. Two machine learning models are required to build the new recommendation set. The output of the first machine learning algorithm (prediction of the probability of hotel booking) is a key input for the second algorithm, based on the idea of [13].
• The comparison of several machine learning algorithms for modelling the hospitality conversion in the travel industry.
• The design and implementation of a recommendation builder engine which generates the hotel recommendations that maximize the conversion rate of the session. This engine is built based on the analysis of the feature importance of the session model at the individual level [29].

3 METHODOLOGY

3.1 Pipeline
Using machine learning and the historical dataset of recommendations, we can train a model which is capable of predicting with high confidence whether a proposed set of recommended hotels leads to a booking. Once we have fit the model, we can evaluate other combinations of hotels and recommend a list of hotels to the user that maximizes the conversion. Instead of proposing a completely new set of hotels, we decide to modify the existing suggestions given by the existing rule-based system. Our approach, shown in Figure 2, removes one of the initial hotels and introduces an additional one that increases the conversion probability.

Figure 2: The goal of the system is to improve the probability of conversion. To provide a better set of recommendations, the session builder replaces hotels in the original list.

We have identified two different ways to select the hotel that is going to be introduced within the set of recommendations (a sketch of the first option is given after this list):
• We can create and evaluate all possible combinations and choose the one with the highest conversion probability. This means that, each time, one out of the five hotels from the initial list is removed and a new one from the pool of hotels is inserted. However, this brute-force solution is computationally inefficient and time-consuming (e.g., in Paris this results in 5 × 1,653 different combinations for a single swap, the length of the list multiplied by the number of available hotels).
• Alternatively, a hotel from the list of selected hotels can be replaced with an available hotel based on some criteria. Typically, the criteria might be the price of the hotel room, the average review score, or a combination of multiple indicators. In this work, the criteria used to optimise the overall list of hotels are determined via feature analysis.
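As an illustration of the first, brute-force option, the sketch below enumerates every single-swap variant of the current list and keeps the one with the highest predicted session conversion probability. It is a simplified sketch: session_model is assumed to be any fitted classifier exposing predict_proba, and build_session_features is a hypothetical helper that turns a context and a list of hotels into the session feature vector described later in Section 3.7; neither name comes from the production system.

from itertools import product

def best_single_swap(current_list, candidate_pool, session_context,
                     build_session_features, session_model):
    """Brute-force variant: try every (removed hotel, inserted hotel) pair and
    keep the list with the highest predicted session conversion probability."""
    def score(hotels):
        x = build_session_features(session_context, hotels)   # hypothetical helper
        return session_model.predict_proba([x])[0][1]         # P(session conversion)

    best_list, best_score = current_list, score(current_list)
    for removed_idx, candidate in product(range(len(current_list)), candidate_pool):
        if candidate in current_list:
            continue
        new_list = current_list[:removed_idx] + current_list[removed_idx + 1:] + [candidate]
        new_score = score(new_list)
        if new_score > best_score:
            best_list, best_score = new_list, new_score
    return best_list, best_score

The number of session-model evaluations grows with the list length times the candidate pool size, which is exactly the cost that motivates the criteria-based alternative.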
hotel for enhancing the conversion probability might depend There are several advantages of using cascade generalization on different features. Furthermore, the existence of categor- over other ensemble algorithms: ical features makes this optimization even harder. Can we • The new attributes are continuous since they are probability convert it into a univariate optimization problem? class distributions. The novelty of this study comes from the use of two related works • Each classifier has access to the original attributes and any to address the above points. First, we design a two-stage cascaded new attribute included at lower levels is considered exactly machine learning model [13] where the output probabilities of the in the same way as any of the original attributes. first model are a new feature of the second one. Second, we interpret • It does not use internal cross validation which affects the the feature importance of the positive instances (i.e. conversions) computational efficiency of the method. with a local interpretable model-agnostic (LIME) technique [29]. • The new probabilities can act as a dimensionality reduc- Thus, we can study the feature importance of particular instances tion technique. The relationship between the independent Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 12 features and the target variable are captured by these new Machines (GBMs) were used to evaluate Decision Tree based ensem- attributes. bles and fully connected Neural Networks (NN) were also assessed. Furthermore, the model ensembling technique of Stacking (STK) As will be shown in further sections, this last point is a key was also assessed. Stacking comprises of learning a linear model aspect of the proposed system, as the probabilities generated by the to predict the target variable based on the output probabilities of hotel model can be used to directly select new hotels to include in multiple machine learning algorithms as features. the recommendation. However, the session model uses aggregated features from the hotel model, and as such an interpretable feature 3.6 Hotel Model analysis is required to determine how best to select hotels based on their features. The first step is to train a machine learning model on individual hotels, as shown is Figure 3. The features used for training this model are not exclusively related to hotels, but also with the session 3.3 Interpretability in Machine Learning and flight context. Evaluating this model, we get the probability Machine learning has grown in popularity in the last decade by that a certain hotel will be booked for a given location. The model producing more reliable, more accurate, and faster results in areas is learned by framing the problem as a supervised classification such as speech recognition [16], natural language understanding problem, using the conversion (i.e. click) as a label. Note that for the [8], and image processing [22]. Nevertheless, machine learning hotel model, the probabilities of conversion are independent of other models act mostly as black boxes. That is, given an input the system hotels presented in the session. This leads to several advantages: produces an output with little interpretable knowledge on how it • Cold start problem: the model does not penalise items or achieved that result. 
3.3 Interpretability in Machine Learning
Machine learning has grown in popularity in the last decade by producing more reliable, more accurate, and faster results in areas such as speech recognition [16], natural language understanding [8], and image processing [22]. Nevertheless, machine learning models act mostly as black boxes: given an input, the system produces an output with little interpretable knowledge on how it achieved that result. The necessity for interpretability comes from an incompleteness in the problem formalisation, meaning that, for certain problems, it is not enough to get the solution; it also matters how the model came to that answer [11]. Several studies on the interpretability of machine learning models can be found in the literature [1, 15, 32].

3.4 Local Interpretable Model-Agnostic Explanations (LIME)
In this section, we focus on the work from Ribeiro et al. [29] called Local Interpretable Model-Agnostic Explanations. The method explains the predictions of any classifier (model-agnostic) in an interpretable and faithful manner by learning an interpretable model locally around the prediction:
• Interpretable. In the context of machine learning systems, we define interpretability as the ability to explain or to present in understandable terms to a human [11].
• Local fidelity. Global interpretability implies describing the patterns present in the overall model, while local interpretability describes the reasons for a specific decision on a unique sample. For interpreting a specific observation, we assume it is sufficient to understand how the model behaves locally.
• Model-agnostic. The goal is to provide a set of techniques that can be applied to any classifier or regressor, in contrast to other domain-specific techniques [33].

In practice, LIME creates interpretable explanations for an individual sample by fitting a linear model to a set of perturbed variations of the sample, using the resulting predictions of the complex model as the output.

3.5 Predictive Models
The selection of which machine learning model to use depends highly on the nature of the problem and the constraints and limitations that apply. In this work, algorithms from different families of machine learning were investigated. Specifically, the Naive Bayes Classifier (NBC) and Generalised Linear Model (GLM) were investigated as linear models, Random Forests (RF) and Gradient Boosting Machines (GBMs) were used to evaluate decision-tree-based ensembles, and fully connected Neural Networks (NN) were also assessed. Furthermore, the model ensembling technique of Stacking (STK) was also assessed. Stacking consists of learning a linear model to predict the target variable based on the output probabilities of multiple machine learning algorithms used as features.

3.6 Hotel Model
The first step is to train a machine learning model on individual hotels, as shown in Figure 3. The features used for training this model are not exclusively related to hotels, but also to the session and flight context. Evaluating this model, we obtain the probability that a certain hotel will be booked for a given location. The model is learned by framing the problem as a supervised classification problem, using the conversion (i.e. click) as the label. Note that for the hotel model, the probabilities of conversion are independent of the other hotels presented in the session. This leads to several advantages:
• Cold start problem [31]: the model does not penalise items or users that have not been recommended yet, since no hotel identifier or personally identifiable information is used.
• Dimensionality reduction: the output probabilities of the hotel model can be interpreted as a feature that captures the relationship between the independent variables and the target variable. This is a key concept of the Cascade Generalization technique; the output of the hotel model is therefore combined with the features to create the feature vector for the session model, as shown in Figure 4.

Note that the features used as input to the hotel model are discussed in Section 4.

Figure 3: Sketch of the Hotel Model. The machine learning model is trained to predict the probability that each hotel will be booked.

3.7 Session Model
The second machine learning model predicts whether a session leads to a conversion or not, see Figure 4. A session is composed of five different hotels, and the aim of the recommender system is to propose a set of hotels that results in the user booking any one of them. Aggregates of the features from the hotel model (contextual, passenger, and hotel features) are used, as well as the hotel probabilities obtained from the hotel model. The numerical features related to the hotels are aggregated in different ways (for example, the max, min, standard deviation and average of price and probability). The features related to the context (e.g. attributes about the session or the flight) do not change, as these are identical for each element in the session.

Figure 4: Sketch of the session model pipeline. This machine learning model predicts the probability that a session leads to a conversion, given a list of hotels. This is achieved using cascaded machine learning, in which the hotel model predictions are used as features of the session model.
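A minimal sketch of this aggregation step is shown below using pandas. The column names (session_id, price, hotel_proba, market, stay_duration_days) are hypothetical placeholders for the features described in Section 4, and the set of aggregates is only indicative of the max/min/std/avg statistics mentioned above.

import pandas as pd

def build_session_features(hotel_df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate hotel-level rows (five per session) into one session-level row.
    Numerical hotel features are summarised; contextual features are identical
    within a session, so only the first value is kept."""
    numeric = hotel_df.groupby("session_id").agg(
        price_min=("price", "min"),
        price_max=("price", "max"),
        price_avg=("price", "mean"),
        price_std=("price", "std"),
        proba_min=("hotel_proba", "min"),
        proba_max=("hotel_proba", "max"),
        proba_avg=("hotel_proba", "mean"),
        proba_std=("hotel_proba", "std"),
    )
    context = hotel_df.groupby("session_id")[["market", "stay_duration_days"]].first()
    return numeric.join(context)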
3.8 Session Builder
The session model estimates the conversion probability of the session using contextual and content information. Thus, part of the session builder is to create and evaluate new lists of hotels to determine whether these lists will result in a higher conversion probability than the original list. Figure 5 shows how this process is performed. First, a reference session with the recommendations given by the existing rule-based system is scored. For each of the proposed hotels, we estimate the booking probability using the hotel model. Next, we calculate the booking probability at session level, using the probabilities of the hotel model as an input feature of the session model. Then, we aim to improve the conversion probability of the session by removing one of the hotels from the list and introducing a new one. After including the new hotel, if the booking probability of the current session is greater than the probability of the previous session, then this new hotel list is the one that will be proposed to the user.

A rule must be defined to select which hotel to remove and which new hotel to introduce in the recommendation list. Once we have trained the session model, we can analyse the feature importance of the variables for the positive cases that were correctly classified (i.e. true positive cases). With the Local Interpretable Model-Agnostic Explanations model [29], we can understand the behaviour of the model for these particular instances. Based on the importance of features from LIME, a heuristic can be defined to replace a hotel from the list in order to improve the session conversion probability.

Note that the LIME analysis is performed only on true positive cases from the training set. In this dataset, the classes are highly imbalanced due to a low conversion rate; as such, standard feature analysis techniques may be overly influenced by negative samples, i.e., sessions which did not result in clicks. As LIME is designed to be used on individual decisions, a linear model is fitted and analysed for each true positive. The feature weights of each linear model are then averaged, giving a feature importance ranking over all correctly classified converted sessions.
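The sketch below shows one way this averaging could be implemented with the lime package, assuming the true positive sessions are available as rows of a feature matrix and the session model exposes predict_proba. It illustrates the procedure described above under those assumptions and is not the authors' exact implementation.

import numpy as np
from collections import defaultdict
from lime.lime_tabular import LimeTabularExplainer

def mean_lime_weights(X_train, feature_names, predict_proba, X_true_positives,
                      num_features=10):
    """Explain each correctly classified converted session with LIME and average
    the per-feature weights to obtain a single importance ranking."""
    explainer = LimeTabularExplainer(np.asarray(X_train),
                                     feature_names=feature_names,
                                     mode="classification",
                                     discretize_continuous=False)
    sums, counts = defaultdict(float), defaultdict(int)
    for row in np.asarray(X_true_positives):
        exp = explainer.explain_instance(row, predict_proba, num_features=num_features)
        for name, weight in exp.as_list():          # local linear weights
            sums[name] += weight
            counts[name] += 1
    averaged = {name: sums[name] / counts[name] for name in sums}
    return sorted(averaged.items(), key=lambda kv: -abs(kv[1]))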
3.9 Evaluation Metrics
As with many conversion problems, the classes are highly imbalanced, and as such the metrics used to assess performance must be carefully chosen.

F-measure (F_β). The generalization of the F1 metric is given by [7]:

F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R}

β is a parameter that controls the balance between precision P and recall R. When β = 1, F_β is equivalent to the harmonic mean of P and R. If β > 1, F becomes more recall-oriented (by placing more emphasis on false negatives), and if β < 1, it becomes more precision-oriented (by attenuating the influence of false negatives). Commonly used metrics are the F2 and F0.5 scores.

Area Under the ROC curve. The receiver operating characteristic (ROC) curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold levels. However, this can present an optimistic view of a classifier's performance if there is a large skew in the class distribution, because the metric takes into account true negatives.

Average Precision (AP). The precision-recall curve is a similar evaluation measure that is based on recall and precision at different threshold levels. An equivalent metric is the Average Precision (AP), which is the weighted mean of the precisions achieved at each threshold, with the increase in recall from the previous threshold as the weight:

AP = \sum_n (R_n - R_{n-1})\, P_n

Precision-recall curves are better for highlighting differences between models on unbalanced datasets, due to the fact that they evaluate the fraction of true positives among positive instances. In highly imbalanced settings, the AP curve will likely exhibit larger differences and will be more informative than the area under the ROC curve. Note that the relative ranking of the algorithms does not change, since a curve dominates in ROC space if and only if it dominates in PR space [10, 30].
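For reference, the sketch below computes these metrics for a single model with scikit-learn. The 0.5 decision threshold used for the F-scores is an assumption for illustration; AUC and AP are threshold-free.

import numpy as np
from sklearn.metrics import fbeta_score, roc_auc_score, average_precision_score

def summarise(y_true, y_proba, threshold=0.5):
    """Compute the metrics above for one model. y_proba holds predicted
    conversion probabilities; the threshold only affects the F-scores."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_proba) >= threshold).astype(int)
    return {
        "F1":   fbeta_score(y_true, y_pred, beta=1.0),
        "F0.5": fbeta_score(y_true, y_pred, beta=0.5),
        "AUC":  roc_auc_score(y_true, y_proba),            # area under the ROC curve
        "AP":   average_precision_score(y_true, y_proba),  # area under the PR curve
    }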
4 DATA

4.1 Hotel Recommendation Logs
The dataset in this study consists of 715,952 elements. Out of these recommendations, there are a total of 3,588 clicks, which are considered conversions. Therefore, the dataset is unbalanced, since only 0.5% of the instances are session conversions.

Each row contains information regarding the context of the session, the recommended hotel, and whether the recommendation led to a conversion. In particular, the features are the number of recommendations (from 1 to 5), the date of the recommendation, the country where the booking was made, the country where the passenger is traveling, the hotel identifier, the hotel provider identifier, the price of the hotel at the time of the recommendation, the price currency, and whether the recommendation led to a conversion. Additionally, the logs were enriched with supplementary information regarding each hotel, including a numerical hotel rating (from 0 to 5), a categorical hotel rating and the hotel chain.

4.2 Passenger Name Record
In the travel industry, a Passenger Name Record (PNR) is the basic form of computerized travel record. A PNR is a set of data created when a travel reservation is made. PNRs include the travel itinerary information (e.g., flight numbers, dates) and the passenger information (e.g., name, gender, and sometimes passport details). A PNR may also include many other data elements such as payment information (currency, total price, etc.), additional ancillary services sold with the ticket (such as extra baggage and hotel reservations) and other airline-related information (cabin code, special meal requests, etc.).

For the purpose of this study, we retrieve and extract features related to the air travel of the traveller. These include the date of PNR creation, airline code, origin city, destination city, date of departure, time of departure, date of arrival, time of arrival, days between the departure and booking date, travel class, number of stops (if any), duration of the flight in minutes (including stops) and the number of days the passenger is staying at the destination.

Figure 5: Sketch of the full recommendation pipeline. The session builder is designed to select hotels which will maximise the session conversion, based on the LIME feature importance of the session model.

5 RESULTS
Table 1 shows the results of the experiment comparing different algorithms for the hotel model in terms of the AUC, AP, F1 and F0.5 scores. In Figure 6, the ROC and AP curves can be seen in detail. The low AUC values for the GLM model and the Naive Bayes Classifier suggest that linear classification techniques do not lead to the best results and that more complex models are needed to correctly represent the data. The non-linear techniques have closer results, with the Random Forest obtaining the best values for AP, F1 and F0.5. A Stacked Ensemble using all the previous models was created, but it does not improve on the previous outcome.

Table 1: Summary of AUC, AP, F1 and F0.5 metrics for the hotel model.

Model            AUC    AP     F1     F0.5
GLM              0.625  0.128  0.247  0.274
NBC              0.819  0.058  0.175  0.159
RF               0.966  0.249  0.320  0.334
GBM              0.953  0.210  0.294  0.288
NN               0.965  0.165  0.245  0.219
STK (all)        0.924  0.182  0.271  0.288
STK (RF + NN)    0.969  0.242  0.314  0.284

5.1 Contribution of PNR data
The PNR data is an important asset, since it contains rich attributes related to the trip and the passenger. However, in this case personally identifiable information is not used in the recommender system; the PNR features thus help to provide context about the trip rather than the traveller. Incorporating this data into the models substantially enhanced their performance, as can be observed in Figure 6. Features of the PNR, including the number of travellers in the booking and the trip duration, among others, contributed to an increase in the area under the PR curve from 0.183 to 0.249.

Figure 6: Representation of ROC and AP curves for two Random Forest models predicting individual hotel conversion with and without the PNR data.

5.2 Session Model
After we have trained the hotel model, we predict the probability of conversion of each hotel individually. Then, we create the sessions based on 5 recommended hotels. The results are shown in Table 2. In this case, the best model for both AUC and AP is the Stacked Ensemble composed of a Random Forest, a Generalized Linear Model and a Naïve Bayes Classifier. Although the F0.5 score of the GBM model is slightly better than that of the STK model, the latter clearly outperforms it on the rest of the metrics.

Table 2: Summary of AUC, AP, F1 and F0.5 metrics for the session model.

Model                   AUC    AP     F1     F0.5
GLM                     0.822  0.395  0.520  0.538
NBC                     0.933  0.342  0.467  0.408
RF                      0.971  0.446  0.529  0.508
GBM                     0.958  0.383  0.531  0.542
NN                      0.967  0.433  0.483  0.467
STK (RF + GLM + NBC)    0.972  0.453  0.539  0.529
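A stacked ensemble of the kind described in Section 3.5 can be assembled, for example, with scikit-learn's StackingClassifier, as sketched below for the RF + GLM + NBC combination reported in Table 2. The exact base learners, hyperparameters and stacking procedure used in the study may differ; this is only an illustrative construction.

from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Base learners mirroring the RF + GLM + NBC combination of Table 2; the
# meta-learner is a linear model over their predicted probabilities.
stacked_session_model = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("glm", LogisticRegression(max_iter=1000)),
        ("nbc", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
)
# Usage: stacked_session_model.fit(X_train, y_train)
#        stacked_session_model.predict_proba(X_test)[:, 1]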
both AUC and AP is the Stacked Ensemble composed of a Random As the standard deviation of the individual hotel conversions Forest, a Generalized Linear Model and a Naïve Bayes Classifier. is the most important criteria, the following rule for the session Although the F 0.5 score of the GBM model is slightly better than the builder is defined: from the original hotel list remove one hotel STK model, the latter clearly outperforms the rest of the metrics. with the closest conversion probability to the mean conversion probability of the list, and replace it with the hotel with the high- 5.3 Feature Importance est conversion probability from the set of available hotels for a After the Session model has been trained, we analyse its feature im- particular city. portance to study which variables contribute the most to the model using LIME. Concretely, we evaluate the model on the true positive 5.4 Simulated conversion using Hotel List instances from the training dataset, since we want to optimise the Builder conversion. Results from the hotel list builder are shown in Table 3 for the two largest cities in the dataset and for the complete dataset. For both cities, we observe a large increase in conversion when using the LIME based session builder. However, a brute force approach to evaluating all possible lists does lead to higher conversion rates, at the cost of a significant increase in processing time. When we con- sider the complete dataset, we once again observe a large increase in conversion from the baseline for the LIME model. With respect to brute force, we observe that the LIME session builder performs much closer to the brute force builder in terms of conversion. This is attributed to the impact of smaller cities in the complete dataset, and thus less choice in hotels for the builders, resulting in the LIME session builder finding the optimal list. Additionally, on the com- plete dataset, the processing time of the brute force builder is 2.8 times the duration of the LIME builder, whereas larger gains were observed on the individual cities, where more options for hotels were available. 6 DISCUSSION In this study, an algorithm was created to improve hotel recom- Figure 7: Feature importance of the true positive cases from mendations based on historical hotel bookings and flight booking the Session Model using LIME. attributes. Different machine learning models are used in a cascaded Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 16 Table 3: Conversion rates and processing times for two large Journal of Machine Learning Research 12, Aug (2011), 2493–2537. cities and the complete dataset. The baseline performance is [9] Antónia Correia, Patricia Oom do Valle, and Cláudia Moço. 2007. Why people travel to exotic places. International Journal of Culture, Tourism and Hospitality given prior to any optimisation of the hotel lists, the LIME Research 1, 1 (2007), 45–61. based optimisation is compared to brute force. [10] Jesse Davis and Mark Goadrich. 2006. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd international conference on Machine learning. ACM, 233–240. Nice Barcelona Complete [11] Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of inter- Base Conversion 0.0019 0 0.0005 pretable machine learning. (2017). [12] Expedia. 2013. 
5.4 Simulated conversion using Hotel List Builder
Results from the hotel list builder are shown in Table 3 for the two largest cities in the dataset and for the complete dataset. For both cities, we observe a large increase in conversion when using the LIME-based session builder. However, a brute-force approach to evaluating all possible lists does lead to higher conversion rates, at the cost of a significant increase in processing time. When we consider the complete dataset, we once again observe a large increase in conversion from the baseline for the LIME model. With respect to brute force, we observe that the LIME session builder performs much closer to the brute-force builder in terms of conversion. This is attributed to the impact of smaller cities in the complete dataset, and thus less choice in hotels for the builders, resulting in the LIME session builder finding the optimal list. Additionally, on the complete dataset, the processing time of the brute-force builder is 2.8 times the duration of the LIME builder, whereas larger gains were observed on the individual cities, where more options for hotels were available.

Table 3: Conversion rates and processing times for two large cities and the complete dataset. The baseline performance is given prior to any optimisation of the hotel lists; the LIME-based optimisation is compared to brute force.

                         Nice    Barcelona  Complete
Base conversion          0.0019  0          0.0005
Conversion LIME          0.0207  0.0089     0.0019
Conversion brute         0.0338  0.0125     0.0026
Processing time LIME     23s     23s        4h48m
Processing time brute    314s    496s       13h36m

6 DISCUSSION
In this study, an algorithm was created to improve hotel recommendations based on historical hotel bookings and flight booking attributes. Different machine learning models are used in a cascaded fashion. First, a model estimates the conversion probability of the individual hotels independently. Note that adding trip context, via PNR-based features, resulted in a better PR AUC. The output of the first model is then combined with aggregates of the hotels in the list in order to create a feature vector for the session model, which estimates the probability that any hotel in the list will be converted. LIME analysis revealed that the hotel model conversion probabilities are the most important features, specifically the standard deviation, mean and maximum of the individual hotel conversion probabilities in the list. This allows a simple heuristic to be defined to increase the session conversion probability. In this study, a single change is performed in the list of hotels; however, this could be expanded to allow multiple changes.

Variations on this pipeline could also be considered. For instance, LIME is used in this study for feature importance ranking in the session builder; however, a similar methodology was recently proposed using a mixture regression model, referred to as LEMNA [14].

Here, the session builder relies on insights gained from analysis of the feature importance ranking of the session model using LIME over all sessions which lead to a conversion. Thus, the same heuristic is applied to all datapoints in the session builder. However, a key aspect of LIME is that it provides an interpretation of a model for a single datapoint. As such, an evolution of the approach would be to compute the most important features for each recommendation in real time, and to use this information to build an optimal hotel list based on the attributes most likely to lead to conversion.
REFERENCES
[1] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11, Jun (2010), 1803–1831.
[2] Eric Bauer and Ron Kohavi. 1998. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning 36, 1 (1998), 2.
[3] Yolanda Blanco-Fernandez, Jose J Pazos-Arias, Alberto Gil-Solla, Manuel Ramos-Cabrer, and Martin Lopez-Nores. 2008. Providing entertainment by content-based filtering and semantic reasoning in intelligent recommender systems. IEEE Transactions on Consumer Electronics 54, 2 (2008).
[4] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. 2013. Recommender systems survey. Knowledge-Based Systems 46 (July 2013), 109–132. https://doi.org/10.1016/j.knosys.2013.03.012
[5] Robin Burke and Maryam Ramezani. 2011. Matching recommendation technologies and domains. In Recommender Systems Handbook. Springer, 367–386.
[6] Marcirio Silveira Chaves, Rodrigo Gomes, and Cristiane Pedron. 2012. Analysing reviews in the Web 2.0: Small and medium hotels in Portugal. Tourism Management 33, 5 (2012), 1286–1287.
[7] Nancy Chinchor. 1992. MUC-4 Evaluation Metrics. In Proceedings of the 4th Conference on Message Understanding (MUC4 '92). Association for Computational Linguistics, Stroudsburg, PA, USA, 22–29. https://doi.org/10.3115/1072064.1072067
[8] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, Aug (2011), 2493–2537.
[9] Antónia Correia, Patricia Oom do Valle, and Cláudia Moço. 2007. Why people travel to exotic places. International Journal of Culture, Tourism and Hospitality Research 1, 1 (2007), 45–61.
[10] Jesse Davis and Mark Goadrich. 2006. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 233–240.
[11] Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. (2017).
[12] Expedia. 2013. Retail and Travel Site Visitation Aligns As Consumers Plan and Book Vacation Packages. https://advertising.expedia.com/about/press-releases/retail-and-travel-site-visitation-aligns-consumers-plan-and-book-vacation-packages
[13] João Gama and Pavel Brazdil. 2000. Cascade generalization. Machine Learning 41, 3 (2000), 315–343.
[14] Wenbo Guo, Dongliang Mu, Jun Xu, Purui Su, Gang Wang, and Xinyu Xing. 2018. Lemna: Explaining deep learning based security applications. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 364–379.
[15] Jonathan L Herlocker, Joseph A Konstan, and John Riedl. 2000. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work. ACM, 241–250.
[16] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82–97.
[17] Dietmar Jannach, Malte Ludewig, and Lukas Lerche. 2017. Session-based item recommendation in e-commerce: on short-term intents, reminders, trends and discounts. User Modeling and User-Adapted Interaction 27, 3-5 (2017), 351–392.
[18] Ingrid Jeacle and Chris Carter. 2011. In TripAdvisor we trust: Rankings, calculative regimes and abstract systems. Accounting, Organizations and Society 36, 4 (2011), 293–309.
[19] Michael Kenteris, Damianos Gavalas, and Aristides Mpitziopoulos. 2010. A mobile tourism recommender system. In Computers and Communications (ISCC), 2010 IEEE Symposium on. IEEE, 840–845.
[20] Dae-Young Kim, Yeong-Hyeon Hwang, and Daniel R Fesenmaier. 2005. Modeling tourism advertising effectiveness. Journal of Travel Research 44, 1 (2005), 42–49.
[21] Ron Kohavi, David H Wolpert, et al. 1996. Bias plus variance decomposition for zero-one loss functions. In ICML, Vol. 96. 275–83.
[22] Yann Le Cun, LD Jackel, B Boser, JS Denker, HP Graf, Isabelle Guyon, Don Henderson, RE Howard, and W Hubbard. 1989. Handwritten digit recognition: Applications of neural network chips and automatic learning. IEEE Communications Magazine 27, 11 (1989), 41–46.
[23] Asher Levi, Osnat Mokryn, Christophe Diot, and Nina Taft. 2012. Finding a needle in a haystack of reviews: cold start context-based hotel recommender system. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 115–122.
[24] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing 7, 1 (2003), 76–80.
[25] Stanley Loh, Fabiana Lorenzi, Ramiro Saldaña, and Daniel Licthnow. 2003. A tourism recommender system based on collaboration and text analysis. Information Technology & Tourism 6, 3 (2003), 157–165.
[26] Raymond J Mooney and Loriene Roy. 2000. Content-based book recommending using learning for text categorization. In Proceedings of the Fifth ACM Conference on Digital Libraries. ACM, 195–204.
[27] Julia Neidhardt, Leonhard Seyfang, Rainer Schuster, and Hannes Werthner. 2014. A picture-based approach to recommender systems. Information Technology & Tourism 15, 1 (2014), 49–69. https://doi.org/10.1007/s40558-014-0017-5
[28] Andreas Papatheodorou. 2001. Why people travel to different places. Annals of Tourism Research 28, 1 (2001), 164–179.
[29] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135–1144.
[30] Takaya Saito and Marc Rehmsmeier. 2015. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, 3 (2015), e0118432.
[31] Andrew I Schein, Alexandrin Popescul, Lyle H Ungar, and David M Pennock. 2002. Methods and metrics for cold-start recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 253–260.
[32] Alfredo Vellido, José David Martín-Guerrero, and Paulo JG Lisboa. 2012. Making machine learning models interpretable. In ESANN, Vol. 12. Citeseer, 163–172.
[33] Peng Zhang, Jiuling Wang, Ali Farhadi, Martial Hebert, and Devi Parikh. 2014. Predicting failures of vision systems. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3566–3573.