=Paper= {{Paper |id=Vol-2435/paper2 |storemode=property |title=Cascaded Machine Learning Model for Efficient Hotel Recommendations from Air Travel Bookings |pdfUrl=https://ceur-ws.org/Vol-2435/paper2.pdf |volume=Vol-2435 |authors=Eoin Thomas,Antonio Gonzalez Ferrer,Benoit Lardeux,Mourad Boudia,Christian Haas-Frangii,Rodrigo Acuna Agost |dblpUrl=https://dblp.org/rec/conf/rectour/ThomasFLBHA19 }} ==Cascaded Machine Learning Model for Efficient Hotel Recommendations from Air Travel Bookings== https://ceur-ws.org/Vol-2435/paper2.pdf

RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 9

Cascaded Machine Learning Model for Efficient Hotel Recommendations from Air Travel Bookings

Eoin Thomas∗ and Antonio Gonzalez Ferrer∗
Amadeus SAS, Sophia Antipolis, France
eoin.thomas@amadeus.com

Benoit Lardeux, Mourad Boudia, Christian Haas-Frangii, Rodrigo Acuna Agost
Amadeus SAS, Sophia Antipolis, France

∗ Both authors contributed equally to this research.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
Recommending a hotel for vacations or a business trip can be a challenging task due to the large number of alternatives and considerations to take into account. In this study, a recommendation engine is designed to identify relevant hotels based on features of the facilities and the context of the trip via flight information. The system was designed as a cascaded machine learning pipeline, with a model to predict the conversion probability of each hotel and another to predict the conversion of a set of hotels as presented to the traveller. By analysing the feature importance of the model based on sets of hotels, we are able to construct optimal lists of hotels by selecting individual hotels that will maximise the probability of conversion.

CCS CONCEPTS
• Computing methodologies → Machine learning;

KEYWORDS
Recommender systems, machine learning, hotels, conversion.

1 INTRODUCTION
In the United States, the travel industry is estimated to be the third largest industry after the automotive and food sectors and contributes approximately 5% of the gross domestic product. Travel has experienced rapid growth as users are willing to pay for new experiences, unexpected situations, and moments of meditation [9, 28], while the cost of travel has decreased over time, in part due to low-cost carriers and the sharing economy. At the same time, traditional travel players such as airlines, hotels, and travel agencies aim to increase revenue from these activities. The supply side must identify its market segments, create products with the right features and prices, and find a distribution channel. The traveller has to find the right product, its conditions, its price, and how and where to buy it. The vast quantity of information available to users makes this selection more challenging.

Finding the best alternative can become a complicated and time-consuming process. Consumers used to rely mostly on recommendations from other people by word of mouth, on known products from advertisements [20], or on reviews [6, 18]. However, the Internet has overtaken word of mouth as the primary medium for choosing destinations [23] by guiding the user in a personalized way to interesting or useful products from a large space of possible options.

Many players have emerged in the past decades to mediate the communication between consumers and suppliers. One type of player is the Global Distribution System (GDS), which allows customer-facing travel agencies (online or physical) to search and book content from most airlines and hotels. Increased conversion is a beneficial goal for the supplier and the broker, as it implies more revenue for a lower cost of operation, and for the traveller, as it implies quicker decision making and thus less time spent on search and shopping activities.

In this study, we aim to increase the conversion rate for hospitality recommendations after users book air travel. In Section 2, the problem is formulated in order to highlight the considerations which separate this work from many recommender system paradigms. Section 3 presents the main techniques and concepts used in this study. In Section 4, a brief overview is given of the industry data used in this study. Section 5 discusses the results obtained for different machine learning models, including feature analysis. A discussion of the main outcomes of this study is provided in Section 6.

2 PROBLEM FORMULATION

2.1 Industry background
Booking a major holiday is typically a yearly or bi-yearly activity for travellers, requiring research on destinations, activities and pricing. According to a study from Expedia [12], travellers visit on average 38 sites up to 45 days prior to booking. Burke and Ramezani [5] characterize the travel sector as a domain with the following factors:
• Low heterogeneity: the needs that the items can satisfy are not very diverse.
• High risk: the price of items is comparatively high.
• Low churn: the relevance of items does not change rapidly.
• Explicit interaction style: the user needs to explicitly interact with the system in order to add personal data. Although some implicit preferences can be tracked from web activity and
past history, most of the information is gathered explicitly (e.g. when/where do you want to travel?).
• Unstable preferences: information collected about the user in the past may no longer be trustworthy today.

Researchers have tried to relate touristic behavioural patterns to psychological needs and expectations by 1) defining a characterization of travel personalities and 2) building a computational model based on a proper description of these profiles [27]. Recommender systems are a particular form of information filtering that exploits past behaviours and user similarities. They have become fundamental in e-commerce applications, providing suggestions that reduce large search spaces so that users are directed toward the items that best meet their preferences. Several core techniques are used to predict whether an item is actually useful to the user [4]. With a content-based approach, items are recommended based on attributes of the items chosen by the user in the past [3, 26]. In collaborative filtering techniques, recommendations to each user are based on information provided by similar users, typically without any characterization of the content [19, 24, 25]. More recently, session-based recommenders have been proposed, where content is selected based on the user's previous activity on a website or application [17].

2.2 Terminology
In order to clearly define our goal, let us first define some terminology:
• Hotel Conversion: a hotel recommendation leads to a conversion when the user books a specific hotel.
• Hotel Model: machine learning model trained to predict the conversion probability of individual hotels.
• Passenger Name Record (PNR): digital record that contains the passenger data and flight details.
• Session: after a traveller completes a flight booking through a reservation system, a session is defined by the context of the flight, the context of the reservation, and a set of five recommended hotels proposed by the recommender system.
• Session Conversion: a session leads to a conversion when the user books any of the hotels suggested during the session.
• Session Model: machine learning model trained using features related to the session context and hotels; its output is the conversion probability of the session.

The end goal of the recommender system is to increase session conversion. We can estimate the booking probability of a list of hotels using the session model, and thus compare different lists to determine the one which maximises the probability of conversion of the session. Note that in this case conversion is defined as a selection or "click" of a hotel on the interface, rather than a booking.

2.3 Hotel recommendations
The content sold through a GDS is diverse, including flight segments, hotel stays, cruises, car rental, and airport-hotel transfers. The core GDS business concerns the delivery of appropriate travel solutions to travel retailers. Therefore, state-of-the-art recommendation engines capable of analysing historical bookings and automatically recommending the appropriate travel solutions need to be designed. Figure 1 shows an outline of the rule-based recommendation system currently in use. After a user books a flight, information related to the trip is sent to the recommender engine. However, this system does not take into account valuable information such as the context of the request (e.g. where did the booking originate from?), details about the associated flight (e.g. how many days is the user staying in the city?) or historical recommendations (e.g. are similar users likely to book similar hotels?), which are key assets to fine-tune the recommendations.

The problem is novel due to the richness of the available data sources (bookings, ratings, passenger information) and the variety of distribution channels: indirect through travel agencies or direct (website, mobile, mailbox). However, by design, no personally identifiable information (PII) or traveller-specific history is used as part of the model, which therefore excludes collaborative-filtering or content-based approaches. The contributions of this work are:
• The combination of data feeds to generate the context of travel, including flights booked by the traveller, historical hotels proposed and booked at the destination by other travellers, and hotel content information.
• The definition of a 2-stage machine learning recommender tailored to the travel context. Two machine learning models are required to build the new recommendation set. The output of the first machine learning algorithm (prediction of the probability of hotel booking) is a key input for the second algorithm, based on the idea of [13].
• The comparison of several machine learning algorithms for modelling hospitality conversion in the travel industry.
• The design and implementation of a recommendation builder engine which generates the hotel recommendations that maximize the conversion rate of the session. This engine is built on the analysis of the feature importance of the session model at the individual level [29].

3 METHODOLOGY

3.1 Pipeline
Using machine learning and the historical dataset of recommendations, we can train a model which predicts whether a proposed set of recommended hotels leads to a booking.

Once the model is fitted, we can evaluate other combinations of hotels and recommend to the user the list of hotels that maximizes the conversion. Instead of proposing a completely new set of hotels, we modify the suggestions given by the existing rule-based system. Our approach, shown in Figure 2, removes one of the initial hotels and introduces a new one that increases the conversion probability.

We have identified two different ways to select the hotel that is going to be introduced into the set of recommendations:
• We can create and evaluate all possible combinations and choose the one with the highest conversion probability. This means that each time, one of the five hotels in the initial list is removed and a new one from the pool of hotels is inserted. However, this brute-force solution is computationally inefficient and time-consuming (e.g., in Paris this results in
5 × 1,653 = 8,265 different combinations for a single swap, i.e., the length of the list multiplied by the number of available hotels).
• Alternatively, a hotel from the list of selected hotels can be replaced with an available hotel based on some criteria. Typically, the criteria might be the price of the hotel room, the average review score, or a combination of multiple indicators. In this work, the criteria used to optimise the overall list of hotels are determined via feature analysis.

Figure 1: A hotel recommendation system. When a flight booking is completed, the flight details are passed to the hotel recommender engine which selects a set of available hotels for the user based on historical hotel bookings, hotel facilities and a corporate policy check.

Figure 2: The goal of the system is to improve the probability of conversion. To provide a better set of recommendations, the session builder replaces hotels in the original list.

Nevertheless, the latter solution presents some challenges that need to be discussed and solved:
(1) How can the feature importance of complex non-linear models be studied?
(2) How can the feature importance be best interpreted in an unbalanced dataset?
(3) How many features should be used during the selection process of building an optimal list? Initially, we are facing a multi-objective optimization problem, since the choice of a hotel for enhancing the conversion probability might depend on different features. Furthermore, the existence of categorical features makes this optimization even harder. Can we convert it into a univariate optimization problem?

The novelty of this study comes from the use of two related works to address the above points. First, we design a two-stage cascaded machine learning model [13] where the output probabilities of the first model are a new feature of the second one. Second, we interpret the feature importance of the positive instances (i.e. conversions) with the local interpretable model-agnostic explanations (LIME) technique [29]. Thus, we can study the feature importance of particular instances in complex models, allowing the switch from a multi-objective to a univariate optimization problem when one feature is dominant.

3.2 Cascade Generalization
Ensembling techniques combine the decisions of multiple classifiers in order to reduce the test error on unseen data. After studying the bias-variance decomposition of the error in bagging and boosting, Kohavi observed that the reduction of the error is mainly due to a reduction in the variance [21]. An issue with boosting is its lack of robustness to noise: noisy examples tend to be misclassified, and therefore their weights increase [2]. A new direction in ensemble methods, called Cascade Generalization, was proposed by Gama and Brazdil [13]. The basic idea is to use the set of classifiers sequentially (similarly to boosting), where at each step new attributes are added to the original data. The new attributes are derived from the class probability distribution given by the base classifiers.

There are several advantages of using cascade generalization over other ensemble algorithms:
• The new attributes are continuous, since they are class probability distributions.
• Each classifier has access to the original attributes, and any new attribute included at lower levels is treated in exactly the same way as the original attributes.
• It does not rely on internal cross-validation, which would otherwise reduce the computational efficiency of the method.
• The new probabilities can act as a dimensionality reduction technique. The relationship between the independent
features and the target variable is captured by these new attributes.

As will be shown in later sections, this last point is a key aspect of the proposed system, as the probabilities generated by the hotel model can be used to directly select new hotels to include in the recommendation. However, the session model uses aggregated features from the hotel model, and as such an interpretable feature analysis is required to determine how best to select hotels based on their features.

3.3 Interpretability in Machine Learning
Machine learning has grown in popularity in the last decade by producing more reliable, more accurate, and faster results in areas such as speech recognition [16], natural language understanding [8], and image processing [22]. Nevertheless, machine learning models act mostly as black boxes. That is, given an input, the system produces an output with little interpretable knowledge of how it achieved that result. The need for interpretability comes from an incompleteness in the problem formalisation, meaning that for certain problems it is not enough to get the solution; one must also understand how the model came to that answer [11]. Several studies on the interpretability of machine learning models can be found in the literature [1, 15, 32].

3.4 Local Interpretable Model-Agnostic Explanations (LIME)
In this section, we focus on the work of Ribeiro et al. [29] called Local Interpretable Model-Agnostic Explanations (LIME). LIME explains the predictions of any classifier (model-agnostic) in an interpretable and faithful manner by learning an interpretable model locally around the prediction:
• Interpretable. In the context of machine learning systems, we define interpretability as the ability to explain or to present in understandable terms to a human [11].
• Local fidelity. Global interpretability implies describing the patterns present in the overall model, while local interpretability describes the reasons for a specific decision on a single sample. To interpret a specific observation, we assume it is sufficient to understand how the model behaves locally.
• Model-agnostic. The goal is to provide a set of techniques that can be applied to any classifier or regressor, in contrast to domain-specific techniques [33].

In practice, LIME creates an interpretable explanation for an individual sample by generating perturbed variations of the sample, obtaining the complex model's predictions for them, and fitting a linear model to these predictions, with each perturbed sample weighted by its proximity to the original one.

3.5 Predictive Models
The choice of machine learning model depends highly on the nature, constraints and limitations of the problem being solved. In this work, algorithms from different families of machine learning were investigated. Specifically, the Naive Bayes Classifier (NBC) and the Generalised Linear Model (GLM) were investigated as linear models; Random Forests (RF) and Gradient Boosting Machines (GBMs) were used to evaluate decision-tree-based ensembles; and fully connected Neural Networks (NN) were also assessed. Furthermore, the model ensembling technique of Stacking (STK) was assessed. Stacking consists of learning a linear model that predicts the target variable using the output probabilities of multiple machine learning algorithms as features.

3.6 Hotel Model
The first step is to train a machine learning model on individual hotels, as shown in Figure 3. The features used for training this model are not exclusively related to hotels, but also to the session and flight context. Evaluating this model, we get the probability that a certain hotel will be booked for a given location. The model is learned by framing the problem as a supervised classification problem, using the conversion (i.e. click) as the label. Note that for the hotel model, the conversion probabilities are independent of the other hotels presented in the session. This leads to several advantages:
• Cold start problem: the model does not penalise items or users that have not been recommended yet, since no hotel identifier or personally identifiable information is used [31].
• Dimensionality reduction: the output probability of the hotel model can be interpreted as a single feature that summarises the relationship between the independent variables and the target variable. This is a key concept of the Cascade Generalization technique; thus, the output of the hotel model is combined with the features to create the feature vector for the session model, as shown in Figure 4.

Note that the features used as input to the hotel model are discussed in Section 4.

Figure 3: Sketch of the Hotel Model. The machine learning model is trained to predict the probability that each hotel will be booked.

3.7 Session Model
The second machine learning model predicts whether a session leads to a conversion or not (see Figure 4). A session is composed of five different hotels, and the aim of the recommender system is to propose a set of hotels that results in the user booking any one of them. Aggregates of the features from the Hotel Model (contextual, passenger, and hotel features) are used, as well as the hotel probabilities obtained from the hotel model. The numerical features related to the hotels are aggregated in different ways (for example, the maximum, minimum, standard deviation and average of price and probability). The features related to the context (e.g. attributes of the session or the flight) are used without aggregation, as they are identical for each element in the session.
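The cascade step described above, in which hotel-model probabilities become inputs of the session model, can be illustrated with a small sketch. It assumes the hotel model has already produced one conversion probability per listed hotel. The function name `session_features`, the feature names, and the use of the population standard deviation are our own assumptions, not details taken from the paper. The two price-derived features follow the price features listed later in Section 5.3.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def session_features(hotel_probs: Sequence[float],
                     hotel_prices: Sequence[float],
                     context: Dict[str, float]) -> Dict[str, float]:
    """Build one session-level feature vector (hypothetical layout).

    hotel_probs: hotel-model conversion probabilities, one per listed hotel.
    hotel_prices: prices of the same hotels, in the same order.
    context: session/flight features shared by every hotel in the session.
    """
    if not hotel_probs or len(hotel_probs) != len(hotel_prices):
        raise ValueError("need one probability and one price per hotel")

    # Context features are identical for every hotel, so they are copied as-is.
    features: Dict[str, float] = dict(context)

    # Cascade step: the hotel-model outputs are aggregated exactly like any
    # other numeric hotel feature and fed to the session model.
    for name, values in (("prob", hotel_probs), ("price", hotel_prices)):
        features[f"{name}_max"] = max(values)
        features[f"{name}_min"] = min(values)
        features[f"{name}_avg"] = mean(values)
        features[f"{name}_std"] = pstdev(values)  # population std (assumption)

    # Price-derived features of the kind mentioned in Section 5.3
    # (assumes strictly positive prices).
    features["price_avg_minus_min"] = features["price_avg"] - features["price_min"]
    features["price_min_over_avg"] = features["price_min"] / features["price_avg"]
    return features


if __name__ == "__main__":
    feats = session_features(
        hotel_probs=[0.01, 0.02, 0.05, 0.02, 0.10],
        hotel_prices=[120.0, 95.0, 150.0, 80.0, 200.0],
        context={"stay_days": 3.0},
    )
    for key, value in sorted(feats.items()):
        print(f"{key}: {value:.4f}")
```

Each historical session yields one such row. Under these assumptions, a session classifier would then be trained on these rows, using the session-conversion label as the target.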

Figure 4: Sketch of the session model pipeline. This machine learning model predicts the probability that a session leads to a conversion, given a list of hotels. This is achieved using cascaded machine learning in which the hotel model predictions are used as features for the session model.

3.8 Session Builder
The Session Model estimates the conversion probability of the session using contextual and content information. Thus, part of the session builder's task is to create and evaluate new lists of hotels to determine whether these lists will result in a higher conversion probability than the original list. Figure 5 shows how this process is performed. First, a reference session containing the recommendations given by the existing rule-based system is scored. For each of the proposed hotels, the booking probability is estimated using the Hotel Model. Next, the booking probability at session level is calculated using the Hotel Model probabilities as input features of the Session Model. Then, we aim to improve the conversion probability of the session by removing one of the hotels from the list and introducing a new one. If, after including the new hotel, the booking probability of the current session is greater than that of the previous session, this new hotel list is the one proposed to the user.

A rule must be defined to select which hotel to remove and which new hotel to introduce into the recommendation list. Once we have trained the Session Model, we can analyse the feature importance of the variables for the positive cases that were correctly classified (i.e. true positive cases). With LIME [29], we can understand the behaviour of the model for these particular instances. Based on the feature importances from LIME, a heuristic can be defined to replace a hotel in the list in order to improve the session conversion probability.

Note that the LIME analysis is performed only on true positive cases from the training set. In this dataset, the classes are highly imbalanced due to a low conversion rate; as such, standard feature analysis techniques may be overly influenced by negative samples, i.e., sessions which did not result in clicks. As LIME is designed to be used on individual decisions, a linear model is fitted and analysed for each true positive. The feature weights of these linear models are then averaged, giving a feature importance ranking over all correctly classified converted sessions.

3.9 Evaluation Metrics
As with many conversion problems, the classes are highly imbalanced, and as such the metrics used to assess performance must be carefully chosen.

F-measure (Fβ). The generalization of the F1 metric is given by [7]:

$$F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R}$$

where β is a parameter that controls the balance between precision P and recall R. When β = 1, Fβ reduces to F1, the harmonic mean of P and R. If β > 1, Fβ becomes more recall-oriented (by placing more emphasis on false negatives), and if β < 1, it becomes more precision-oriented (by attenuating the influence of false negatives). Commonly used variants are the F2 and F0.5 scores.

Area Under the ROC curve (AUC). The receiver operating characteristic (ROC) curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold levels. However, it can present an optimistic view of a classifier's performance when the class distribution is highly skewed, because the FPR is normalised by the number of actual negatives: when negatives vastly outnumber positives, even many false positives yield a small FPR.

Average Precision (AP). The precision-recall (PR) curve is a similar evaluation measure based on recall and precision at different threshold levels. A related summary metric is the Average Precision (AP), the weighted mean of the precisions achieved at each threshold, with the increase in recall from the previous threshold as the weight:

$$\mathrm{AP} = \sum_{n} (R_n - R_{n-1})\, P_n$$

where P_n and R_n are the precision and recall at the n-th threshold.

Precision-recall curves are better at highlighting differences between models on unbalanced datasets, because precision measures the fraction of true positives among the instances predicted as positive. In highly imbalanced settings, the PR curve will likely exhibit larger differences and be more informative than the ROC curve. Note that dominance is preserved between the two spaces: a curve dominates in ROC space if and only if it dominates in PR space [10, 30]. When curves cross, however, the rankings given by AUC and AP can differ.

4 DATA

4.1 Hotel Recommendation Logs
The dataset in this study consists of 715,952 hotel recommendations. Of these recommendations, a total of 3,588 resulted in clicks, which are considered conversions. The dataset is therefore highly unbalanced, since only 0.5% of the instances are conversions.

Each row contains information about the context of the session, the recommended hotel, and whether the recommendation led to a conversion. In particular, the features are the number of recommendations (from 1 to 5), the date of the recommendation, the country where the booking was made, the country where the passenger is travelling, the hotel identifier, the hotel provider identifier, the price of the hotel at the time of the recommendation, the price currency, and whether the recommendation led to a conversion. Additionally, the logs were enriched with supplementary information about each hotel, including a numerical hotel rating (from 0 to 5), a categorical hotel rating and the hotel chain.

4.2 Passenger Name Record
In the travel industry, a Passenger Name Record (PNR) is the basic form of computerized travel record. A PNR is a set of data created when a travel reservation is made. PNRs include the travel itinerary
information (e.g., flight numbers, dates) and the passenger information (e.g., name, gender, and sometimes passport details). A PNR may also include many other data elements, such as payment information (currency, total price, etc.), additional ancillary services sold with the ticket (such as extra baggage and hotel reservation) and other airline-related information (cabin code, special meal request, etc.).

For the purpose of this study, we retrieve and extract features related to the traveller's air travel. These include the date of PNR creation, airline code, origin city, destination city, date of departure, time of departure, date of arrival, time of arrival, days between the departure and booking date, travel class, number of stops (if any), duration of the flight in minutes (including stops) and the number of days the passenger is staying at the destination.

Figure 5: Sketch of the full recommendation pipeline. The session builder is designed to select hotels which will maximise the session conversion, based on the LIME feature importance of the session model.

Figure 6: ROC and precision-recall curves for two Random Forest models predicting individual hotel conversion, with and without the PNR data.

5 RESULTS
Table 1 shows the results of the experiment comparing different algorithms for the hotel model in terms of AUC, AP, F1 and F0.5 scores. The ROC and precision-recall curves can be seen in detail in Figure 6. The low AUC values of the GLM and the Naive Bayes Classifier suggest that linear classification techniques do not lead to the best results and that more complex models are needed to correctly represent the data. The non-linear techniques have closer results, with the Random Forest obtaining the best values for AP, F1 and F0.5. A Stacked Ensemble using all the previous models was also created, but it does not improve on these results.

5.1 Contribution of PNR data
The PNR data is an important data source, since it contains rich attributes related to the trip and the passenger. However, in this case personally identifiable information is not used in the recommender system; thus, the PNR features help to provide context about the
Table 1: Summary of AUC, AP, F 1 and F 0.5 metrics for the Table 2: Summary of AUC, AP, F 1 and F 0.5 metrics for the
hotel model. session model.

Model AUC AP F1 F0.5 Model AUC AP F1 F0.5
GLM 0.625 0.128 0.247 0.274 GLM 0.822 0.395 0.520 0.538
NBC 0.819 0.058 0.175 0.159 NBC 0.933 0.342 0.467 0.408
RF 0.966 0.249 0.320 0.334 RF 0.971 0.446 0.529 0.508
GBM 0.953 0.210 0.294 0.288 GBM 0.958 0.383 0.531 0.542
NN 0.965 0.165 0.245 0.219 NN 0.967 0.433 0.483 0.467
STK (all) 0.924 0.182 0.271 0.288 STK (RF + GLM + NBC) 0.972 0.453 0.539 0.529
STK (RF + NN) 0.969 0.242 0.314 0.284

trip rather than the traveller. Incorporating this data into the models substantially improved their performance, as can be observed in Figure 6. Features of the PNR, including the number of travellers in the booking and the trip duration, among others, increased the area under the PR curve from 0.183 to 0.249.

5.2 Session Model
After training the hotel model, we use it to predict the conversion probability of each hotel individually. We then create sessions of 5 recommended hotels.
The results are shown in Table 2. In this case, the best model for both AUC and AP is the stacked ensemble (STK) composed of a Random Forest, a Generalized Linear Model and a Naïve Bayes Classifier. Although the GBM model achieves a slightly better F0.5 score than the STK model, the latter is clearly better on the remaining metrics.

5.3 Feature Importance
After the session model has been trained, we use LIME to analyse its feature importance and identify which variables contribute the most to the model. Concretely, we evaluate the model on the true positive instances from the training dataset, since the goal is to optimise conversion.
As can be seen in Figure 7, the most important features according to LIME are all derived from the hotel model: the standard deviation, maximum, and average of the individual hotel conversion probabilities. Some features that are important to the model, such as "market" (the country from which the booking is made), the flight class of service, the destination city, and the arrival and departure times of the flight, cannot be used to manipulate the results of the session builder, as they are all part of the context of the recommendation. Features extracted from prices (the difference between the average price and the minimum price, and the ratio of the lowest price to the average price) are also considered important by LIME, but rank lower than many of the hotel conversion probability features.
As the standard deviation of the individual hotel conversion probabilities is the most important criterion, the following rule is defined for the session builder: from the original hotel list, remove the hotel whose conversion probability is closest to the mean conversion probability of the list, and replace it with the hotel that has the highest conversion probability among the available hotels for that city.

5.4 Simulated conversion using Hotel List Builder
Results from the hotel list builder are shown in Table 3 for the two largest cities in the dataset and for the complete dataset. For both cities, we observe a large increase in conversion when using the LIME based session builder. However, a brute force approach that evaluates all possible lists does lead to higher conversion rates, at the cost of a significant increase in processing time. When we consider the complete dataset, we again observe a large increase in conversion from the baseline for the LIME builder. With respect to brute force, the LIME session builder performs much closer to the brute force builder in terms of conversion (0.0019 versus 0.0026). This is attributed to the impact of the smaller cities in the complete dataset: with less choice of hotels for the builders, the LIME session builder often finds the optimal list. Additionally, on the complete dataset the processing time of the brute force builder is 2.8 times that of the LIME builder (13h36m versus 4h48m), whereas larger gains were observed on the individual cities, where more hotel options were available (314s versus 23s for Nice, and 496s versus 23s for Barcelona).
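The session-builder rule defined in Section 5.3, and the brute force builder it is compared against in Table 3, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Hotel` record, all function names, the toy session scorer (a hypothetical linear stand-in for the trained stacked session model) and the example probabilities and prices are invented for illustration. The session features follow the aggregates identified by LIME (standard deviation, maximum and mean of the hotel conversion probabilities, plus the two price features), and we assume that "available hotels" excludes those already in the list.

```python
import itertools
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Hotel:
    """Hypothetical hotel record for one trip context."""
    hotel_id: str
    p_conv: float  # conversion probability from the first-stage hotel model
    price: float


def session_features(hotels: Sequence[Hotel]) -> Dict[str, float]:
    """Aggregate a hotel list into the session-level features ranked by LIME."""
    probs = [h.p_conv for h in hotels]
    prices = [h.price for h in hotels]
    avg_price = statistics.mean(prices)
    return {
        "p_std": statistics.pstdev(probs),  # population std of probabilities
        "p_max": max(probs),
        "p_mean": statistics.mean(probs),
        "price_avg_minus_min": avg_price - min(prices),
        "price_min_over_avg": min(prices) / avg_price,
    }


def lime_rule_builder(hotel_list: Sequence[Hotel],
                      available: Sequence[Hotel]) -> List[Hotel]:
    """Single swap: replace the hotel whose probability is closest to the list
    mean with the highest-probability available hotel not already listed."""
    mean_p = statistics.mean([h.p_conv for h in hotel_list])
    drop_idx = min(range(len(hotel_list)),
                   key=lambda i: abs(hotel_list[i].p_conv - mean_p))
    listed = {h.hotel_id for h in hotel_list}
    candidates = [h for h in available if h.hotel_id not in listed]
    new_list = list(hotel_list)
    if candidates:  # otherwise there is nothing to swap in
        new_list[drop_idx] = max(candidates, key=lambda h: h.p_conv)
    return new_list


def brute_force_builder(available: Sequence[Hotel], k: int,
                        session_score: Callable[[Dict[str, float]], float]
                        ) -> List[Hotel]:
    """Score every k-hotel combination with the session model and keep the
    best one; the number of combinations grows as C(n, k)."""
    best = max(itertools.combinations(available, k),
               key=lambda combo: session_score(session_features(combo)))
    return list(best)


def toy_session_score(f: Dict[str, float]) -> float:
    """Hypothetical linear stand-in for the trained session model; the
    weights are illustrative and do not come from the paper."""
    return 2.0 * f["p_std"] + f["p_max"] + f["p_mean"]


if __name__ == "__main__":
    city_hotels = [
        Hotel("h1", 0.02, 120.0), Hotel("h2", 0.05, 90.0),
        Hotel("h3", 0.01, 200.0), Hotel("h4", 0.08, 150.0),
        Hotel("h5", 0.03, 110.0), Hotel("h6", 0.10, 95.0),
        Hotel("h7", 0.04, 130.0),
    ]
    original = city_hotels[:5]  # the list as originally recommended
    lime_list = lime_rule_builder(original, city_hotels)
    brute_list = brute_force_builder(city_hotels, 5, toy_session_score)
    for name, hotels in [("original", original), ("LIME rule", lime_list),
                         ("brute force", brute_list)]:
        score = toy_session_score(session_features(hotels))
        print(name, [h.hotel_id for h in hotels], round(score, 4))
```

The heuristic needs a single pass over the list and the city's hotels, whereas the brute force builder scores C(n, 5) candidate lists, which is consistent with the larger speed-ups reported for the individual cities, where more hotels are available. In this toy example with only seven hotels, both builders return the same five hotels, in line with the observation that the heuristic can find the optimal list when choice is limited.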

Figure 7: Feature importance of the true positive cases from the Session Model using LIME.

Table 3: Conversion rates and processing times for two large cities and the complete dataset. The baseline performance is given prior to any optimisation of the hotel lists, and the LIME based optimisation is compared to brute force.

                         Nice     Barcelona   Complete
Base conversion          0.0019   0           0.0005
Conversion LIME          0.0207   0.0089      0.0019
Conversion brute         0.0338   0.0125      0.0026
Processing time LIME     23s      23s         4h48m
Processing time brute    314s     496s        13h36m

6 DISCUSSION
In this study, an algorithm was created to improve hotel recommendations based on historical hotel bookings and flight booking attributes. Different machine learning models are used in a cascaded fashion. First, a model estimates the conversion probability of the individual hotels independently. Note that adding trip context, via PNR based features, resulted in a higher PR AUC. The output of the first model is then combined with aggregates of the hotels in the list in order to create a feature vector for the session model, which estimates the probability that any hotel in the list will be converted. LIME analysis revealed that the hotel model conversion probabilities are the most important features, specifically the standard deviation, mean and maximum of the individual hotel conversion probabilities in the list. This allows a simple heuristic to be defined to increase the session conversion probability. In this study, a single change is performed in the list of hotels; however, this could be expanded to allow multiple changes.
Variations on this pipeline could also be considered. For instance, LIME is used in this study for feature importance ranking in the session builder; however, a similar methodology was recently proposed using a mixture regression model, referred to as LEMNA [14]. Here, the session builder relies on insights gained from the LIME feature importance ranking of the session model, computed over all sessions that led to a conversion. Thus, the same heuristic is applied to all datapoints in the session builder. However, a key aspect of LIME is that it provides an interpretation of a model for a single datapoint. As such, an evolution of the approach would be to compute the most important features for each recommendation in real time, and to use this information to build an optimal hotel list based on the attributes most likely to lead to conversion.

REFERENCES
[1] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11, Jun (2010), 1803–1831.
[2] Eric Bauer and Ron Kohavi. 1998. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning 36, 1 (1998), 2.
[3] Yolanda Blanco-Fernandez, Jose J Pazos-Arias, Alberto Gil-Solla, Manuel Ramos-Cabrer, and Martin Lopez-Nores. 2008. Providing entertainment by content-based filtering and semantic reasoning in intelligent recommender systems. IEEE Transactions on Consumer Electronics 54, 2 (2008).
[4] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. 2013. Recommender systems survey. Knowledge-Based Systems 46 (July 2013), 109–132. https://doi.org/10.1016/j.knosys.2013.03.012
[5] Robin Burke and Maryam Ramezani. 2011. Matching recommendation technologies and domains. In Recommender Systems Handbook. Springer, 367–386.
[6] Marcirio Silveira Chaves, Rodrigo Gomes, and Cristiane Pedron. 2012. Analysing reviews in the Web 2.0: Small and medium hotels in Portugal. Tourism Management 33, 5 (2012), 1286–1287.
[7] Nancy Chinchor. 1992. MUC-4 Evaluation Metrics. In Proceedings of the 4th Conference on Message Understanding (MUC4 ’92). Association for Computational Linguistics, Stroudsburg, PA, USA, 22–29. https://doi.org/10.3115/1072064.1072067
[8] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, Aug (2011), 2493–2537.
[9] Antónia Correia, Patricia Oom do Valle, and Cláudia Moço. 2007. Why people travel to exotic places. International Journal of Culture, Tourism and Hospitality Research 1, 1 (2007), 45–61.
[10] Jesse Davis and Mark Goadrich. 2006. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 233–240.
[11] Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. (2017).
[12] Expedia. 2013. Retail and Travel Site Visitation Aligns As Consumers Plan and Book Vacation Packages. https://advertising.expedia.com/about/press-releases/retail-and-travel-site-visitation-aligns-consumers-plan-and-book-vacation-packages
[13] João Gama and Pavel Brazdil. 2000. Cascade generalization. Machine Learning 41, 3 (2000), 315–343.
[14] Wenbo Guo, Dongliang Mu, Jun Xu, Purui Su, Gang Wang, and Xinyu Xing. 2018. LEMNA: Explaining deep learning based security applications. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 364–379.
[15] Jonathan L Herlocker, Joseph A Konstan, and John Riedl. 2000. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work. ACM, 241–250.
[16] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82–97.
[17] Dietmar Jannach, Malte Ludewig, and Lukas Lerche. 2017. Session-based item recommendation in e-commerce: on short-term intents, reminders, trends and discounts. User Modeling and User-Adapted Interaction 27, 3-5 (2017), 351–392.
[18] Ingrid Jeacle and Chris Carter. 2011. In TripAdvisor we trust: Rankings, calculative regimes and abstract systems. Accounting, Organizations and Society 36, 4 (2011), 293–309.
[19] Michael Kenteris, Damianos Gavalas, and Aristides Mpitziopoulos. 2010. A mobile tourism recommender system. In Computers and Communications (ISCC), 2010 IEEE Symposium on. IEEE, 840–845.
[20] Dae-Young Kim, Yeong-Hyeon Hwang, and Daniel R Fesenmaier. 2005. Modeling tourism advertising effectiveness. Journal of Travel Research 44, 1 (2005), 42–49.
[21] Ron Kohavi, David H Wolpert, et al. 1996. Bias plus variance decomposition for zero-one loss functions. In ICML, Vol. 96. 275–83.
[22] Yann Le Cun, LD Jackel, B Boser, JS Denker, HP Graf, Isabelle Guyon, Don Henderson, RE Howard, and W Hubbard. 1989. Handwritten digit recognition: Applications of neural network chips and automatic learning. IEEE Communications Magazine 27, 11 (1989), 41–46.
[23] Asher Levi, Osnat Mokryn, Christophe Diot, and Nina Taft. 2012. Finding a needle in a haystack of reviews: cold start context-based hotel recommender system. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 115–122.
[24] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing 7, 1 (2003), 76–80.
[25] Stanley Loh, Fabiana Lorenzi, Ramiro Saldaña, and Daniel Licthnow. 2003. A tourism recommender system based on collaboration and text analysis. Information Technology & Tourism 6, 3 (2003), 157–165.
[26] Raymond J Mooney and Loriene Roy. 2000. Content-based book recommending using learning for text categorization. In Proceedings of the Fifth ACM Conference on Digital Libraries. ACM, 195–204.
[27] Julia Neidhardt, Leonhard Seyfang, Rainer Schuster, and Hannes Werthner. 2014. A picture-based approach to recommender systems. Information Technology & Tourism 15, 1 (September 2014), 49–69. https://doi.org/10.1007/s40558-014-0017-5
[28] Andreas Papatheodorou. 2001. Why people travel to different places. Annals of Tourism Research 28, 1 (2001), 164–179.
[29] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135–1144.
[30] Takaya Saito and Marc Rehmsmeier. 2015. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, 3 (2015), e0118432.
[31] Andrew I Schein, Alexandrin Popescul, Lyle H Ungar, and David M Pennock. 2002. Methods and metrics for cold-start recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 253–260.
[32] Alfredo Vellido, José David Martín-Guerrero, and Paulo JG Lisboa. 2012. Making machine learning models interpretable. In ESANN, Vol. 12. Citeseer, 163–172.
[33] Peng Zhang, Jiuling Wang, Ali Farhadi, Martial Hebert, and Devi Parikh. 2014. Predicting failures of vision systems. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3566–3573.