RecTour 2019, September 19th, 2019, Copenhagen, Denmark.

Cascaded Machine Learning Model for Efficient Hotel Recommendations from Air Travel Bookings

Eoin Thomas*, Benoit Lardeux, Mourad Boudia, Antonio Gonzalez Ferrer*, Christian Haas-Frangii, Rodrigo Acuna Agost
Amadeus SAS, Sophia Antipolis, France
eoin.thomas@amadeus.com

* Both authors contributed equally to this research.
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
Recommending a hotel for a vacation or a business trip can be a challenging task due to the large number of alternatives and considerations to take into account. In this study, a recommendation engine is designed to identify relevant hotels based on features of the facilities and the context of the trip via flight information. The system was designed as a cascaded machine learning pipeline, with one model predicting the conversion probability of each hotel and another predicting the conversion of a set of hotels as presented to the traveller. By analysing the feature importance of the model based on sets of hotels, we are able to construct optimal lists of hotels by selecting individual hotels that maximise the probability of conversion.

CCS CONCEPTS
• Computing methodologies → Machine learning;

KEYWORDS
Recommender systems, machine learning, hotels, conversion.

1 INTRODUCTION
In the United States, the travel industry is estimated to be the third largest industry after the automotive and food sectors, contributing approximately 5% of the gross domestic product. Travel has experienced rapid growth as users are willing to pay for new experiences, unexpected situations, and moments of meditation [9, 28], while the cost of travel has decreased over time, in part due to low-cost carriers and the sharing economy. At the same time, traditional travel players such as airlines, hotels, and travel agencies aim to increase revenue from these activities. The supply side must identify its market segments, create the respective products with the right features and prices, and find a distribution channel. The traveller has to find the right product, its conditions, its price, and how and where to buy it. In fact, the vast quantity of information available to users makes this selection more challenging.

Finding the best alternative can become a complicated and time-consuming process. Consumers used to rely mostly on recommendations from other people by word of mouth and on known products from advertisements [20], or inform themselves by reading reviews [6, 18]. However, the Internet has overtaken word of mouth as the primary medium for choosing destinations [23] by guiding the user in a personalized way to interesting or useful products from a large space of possible options.

Many players have emerged in the past decades mediating the communication between consumers and suppliers. One type of player is the Global Distribution System (GDS), which allows customer-facing travel agencies (online or physical) to search and book content from most airlines and hotels. Increased conversion is a beneficial goal for the supplier and the broker, as it implies more revenue for a lower cost of operation, and for the traveller, as it implies quicker decision making and thus less time spent on search and shopping activities.

In this study, we aim to increase the conversion rate for hospitality recommendations after users book air travel. In Section 2, the problem is formulated in order to highlight the considerations which separate this work from many recommender system paradigms. Section 3 presents the main techniques and concepts used in this study. In Section 4, a brief overview is given of the industry data used in this study. Section 5 discusses the results obtained for different machine learning models, including feature analysis. A discussion of the main outcomes of this study is provided in Section 6.
2 PROBLEM FORMULATION

2.1 Industry background
Booking a major holiday is typically a yearly or bi-yearly activity for travellers, requiring research on destinations, activities and pricing. According to a study from Expedia [12], on average, travellers visit 38 sites up to 45 days prior to booking. The travel sector is characterized by Burke and Ramezani [5] as a domain with the following factors:
• Low heterogeneity: the needs that the items can satisfy are not very diverse.
• High risk: the price of items is comparatively high.
• Low churn: the relevance of items does not change rapidly.
• Explicit interaction style: the user needs to explicitly interact with the system in order to add personal data. Although some implicit preferences can be tracked from web activity and past history, the information is mainly gathered in an explicit way (e.g. when/where do you want to travel?).
• Unstable preferences: information collected about the user in the past might no longer be trustworthy today.

Researchers have tried to relate touristic behavioural patterns to psychological needs and expectations by 1) defining a characterization of travel personalities and 2) building a computational model based on a proper description of these profiles [27]. Recommender systems are a particular form of information filtering that exploit past behaviours and user similarities. They have become fundamental in e-commerce applications, providing suggestions that adequately reduce large search spaces so that users are directed toward items that best meet their preferences. There are several core techniques that are applied to predict whether an item is in fact useful to the user [4]. With a content-based approach, items are recommended based on attributes of the items chosen by the user in the past [3, 26]. In collaborative filtering techniques, recommendations to each user are based on information provided by similar users, typically without any characterization of the content [19, 24, 25]. More recently, session-based recommenders have been proposed, where content is selected based on previous activity made by the user on a website or application [17].
2.2 Terminology
In order to clearly define our goal, let us first define some terminology:
• Hotel conversion: a hotel recommendation leads to a conversion when the user books that specific hotel.
• Hotel model: a machine learning model trained to predict the conversion probability of individual hotels.
• Passenger Name Record (PNR): a digital record that contains information about the passenger data and flight details.
• Session: after a traveller completes a flight booking through a reservation system, a session is defined by the context of the flight, the context of the reservation, and a set of five recommended hotels proposed by the recommender system.
• Session conversion: a session leads to a conversion when the user books any of the hotels suggested during the session.
• Session model: a machine learning model trained using features related to the session context and the hotels; its output is the conversion probability of the session.

The end goal of the recommender system is to increase session conversion. We can estimate the probability of booking of a list of hotels using the session model, and thus we can compare different lists using the session model to determine the one which will maximise the probability of conversion of the session. Note that in this case conversion is defined as a selection or "click" of a hotel on the interface, rather than a booking.
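To make the terminology concrete, the minimal sketch below shows one possible way a recommendation and a session could be represented in code. It is purely illustrative: the class and field names (HotelRecommendation, Session, and so on) are hypothetical and are not taken from the production system described in this paper.

from dataclasses import dataclass, field
from typing import List

@dataclass
class HotelRecommendation:
    # Static hotel attributes plus the price quoted at recommendation time.
    hotel_id: str
    provider_id: str
    price: float
    currency: str
    rating: float             # numerical rating, from 0 to 5
    converted: bool = False   # True if this recommendation was clicked

@dataclass
class Session:
    # Context of the flight booking that triggered the recommendations.
    booking_country: str
    destination_city: str
    stay_duration_days: int
    hotels: List[HotelRecommendation] = field(default_factory=list)  # five hotels

    @property
    def converted(self) -> bool:
        # A session converts when any of the recommended hotels converts.
        return any(h.converted for h in self.hotels)

Under such a representation, the session conversion label used later as the target of the session model is simply whether any of the five recommended hotels was converted.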
2.3 Hotel recommendations
The content sold through a GDS is diverse, including flight segments, hotel stays, cruises, car rental, and airport-hotel transfers. The core GDS business concerns the delivery of appropriate travel solutions to travel retailers. Therefore, state-of-the-art recommendation engines capable of analysing historical bookings and automatically recommending the appropriate travel solutions need to be designed. Figure 1 shows an outline of the rule-based recommendation system currently in use. After a user books a flight, information related to the trip is sent to the recommender engine.

Figure 1: A hotel recommendation system. When a flight booking is completed, the flight details are passed to the hotel recommender engine, which selects a set of available hotels for the user based on historical hotel bookings, hotel facilities and a corporate policy check.

However, this system does not take into account valuable information such as the context of the request (e.g. where did the booking originate from?), details about the associated flight (e.g. how many days is the user staying in the city?) nor historical recommendations (e.g. are similar users likely to book similar hotels?), which are key assets to fine-tune the recommendations.

The problem is novel due to the richness of the available data sources (bookings, ratings, passenger information) and the variety of distribution channels: indirect through travel agencies or direct (website, mobile, mailbox). However, it is important to consider that, by design, no personally identifiable information (PII) or traveller-specific history is used as part of the model, which therefore excludes collaborative-filtering or content-based approaches. The contributions of this work are:
• The combination of data feeds to generate the context of travel, including flights booked by the traveller, historical hotels proposed and booked at the destination by other travellers, and hotel content information.
• The definition of a two-stage machine learning recommender tailored to the travel context. Two machine learning models are required to build the new recommendation set. The output of the first machine learning algorithm (prediction of the probability of hotel booking) is a key input for the second algorithm, based on the idea of [13].
• The comparison of several machine learning algorithms for modelling the hospitality conversion in the travel industry.
• The design and implementation of a recommendation builder engine which generates the hotel recommendations that maximize the conversion rate of the session. This engine is built based on the analysis of the feature importance of the session model at the individual level [29].

3 METHODOLOGY

3.1 Pipeline
Using machine learning and the historical dataset of recommendations, we can train a model which is capable of predicting with high confidence whether a proposed set of recommended hotels leads to a booking. Once we have fit the model, we can evaluate other combinations of hotels and recommend a list of hotels to the user that maximizes the conversion. Instead of proposing a completely new set of hotels, we decide to modify the existing suggestions given by the existing rule-based system. Our approach, shown in Figure 2, removes one of the initial hotels and introduces an additional one that increases the conversion probability.

Figure 2: The goal of the system is to improve the probability of conversion. To provide a better set of recommendations, the session builder replaces hotels in the original list.

We have identified two different ways to select the hotel that is going to be introduced within the set of recommendations (a sketch of the first option is given after this list):
• We can create and evaluate all possible combinations and choose the one with the highest conversion probability. This means that, each time, one out of the five hotels from the initial list is removed and a new one from the pool of hotels is inserted. However, this brute-force solution is computationally inefficient and time-consuming (e.g., in Paris this results in 5 × 1,653 different combinations for a single swap, the length of the list multiplied by the number of available hotels).
• Alternatively, a hotel from the list of selected hotels can be replaced with an available hotel based on some criteria. Typically, the criteria might be the price of the hotel room, the average review score, or a combination of multiple indicators. In this work, the criteria used to optimise the overall list of hotels are determined via feature analysis.
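As an illustration of the first, brute-force option, the sketch below enumerates every single-swap variant of the current list and keeps the one with the highest predicted session conversion probability. It is a simplified sketch: session_model is assumed to be any fitted classifier exposing predict_proba, and build_session_features is a hypothetical helper that turns a context and a list of hotels into the session feature vector described later in Section 3.7; neither name comes from the production system.

from itertools import product

def best_single_swap(current_list, candidate_pool, session_context,
                     build_session_features, session_model):
    """Brute-force variant: try every (removed hotel, inserted hotel) pair and
    keep the list with the highest predicted session conversion probability."""
    def score(hotels):
        x = build_session_features(session_context, hotels)   # hypothetical helper
        return session_model.predict_proba([x])[0][1]         # P(session conversion)

    best_list, best_score = current_list, score(current_list)
    for removed_idx, candidate in product(range(len(current_list)), candidate_pool):
        if candidate in current_list:
            continue
        new_list = current_list[:removed_idx] + current_list[removed_idx + 1:] + [candidate]
        new_score = score(new_list)
        if new_score > best_score:
            best_list, best_score = new_list, new_score
    return best_list, best_score

The number of session-model evaluations grows with the list length times the candidate pool size, which is exactly the cost that motivates the criteria-based alternative.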
hotel for enhancing the conversion probability might depend There are several advantages of using cascade generalization on different features. Furthermore, the existence of categor- over other ensemble algorithms: ical features makes this optimization even harder. Can we • The new attributes are continuous since they are probability convert it into a univariate optimization problem? class distributions. The novelty of this study comes from the use of two related works • Each classifier has access to the original attributes and any to address the above points. First, we design a two-stage cascaded new attribute included at lower levels is considered exactly machine learning model [13] where the output probabilities of the in the same way as any of the original attributes. first model are a new feature of the second one. Second, we interpret • It does not use internal cross validation which affects the the feature importance of the positive instances (i.e. conversions) computational efficiency of the method. with a local interpretable model-agnostic (LIME) technique [29]. • The new probabilities can act as a dimensionality reduc- Thus, we can study the feature importance of particular instances tion technique. The relationship between the independent Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 12 features and the target variable are captured by these new Machines (GBMs) were used to evaluate Decision Tree based ensem- attributes. bles and fully connected Neural Networks (NN) were also assessed. Furthermore, the model ensembling technique of Stacking (STK) As will be shown in further sections, this last point is a key was also assessed. Stacking comprises of learning a linear model aspect of the proposed system, as the probabilities generated by the to predict the target variable based on the output probabilities of hotel model can be used to directly select new hotels to include in multiple machine learning algorithms as features. the recommendation. However, the session model uses aggregated features from the hotel model, and as such an interpretable feature 3.6 Hotel Model analysis is required to determine how best to select hotels based on their features. The first step is to train a machine learning model on individual hotels, as shown is Figure 3. The features used for training this model are not exclusively related to hotels, but also with the session 3.3 Interpretability in Machine Learning and flight context. Evaluating this model, we get the probability Machine learning has grown in popularity in the last decade by that a certain hotel will be booked for a given location. The model producing more reliable, more accurate, and faster results in areas is learned by framing the problem as a supervised classification such as speech recognition [16], natural language understanding problem, using the conversion (i.e. click) as a label. Note that for the [8], and image processing [22]. Nevertheless, machine learning hotel model, the probabilities of conversion are independent of other models act mostly as black boxes. That is, given an input the system hotels presented in the session. This leads to several advantages: produces an output with little interpretable knowledge on how it • Cold start problem: the model does not penalise items or achieved that result. 
3.3 Interpretability in Machine Learning
Machine learning has grown in popularity in the last decade by producing more reliable, more accurate, and faster results in areas such as speech recognition [16], natural language understanding [8], and image processing [22]. Nevertheless, machine learning models act mostly as black boxes: given an input, the system produces an output with little interpretable knowledge on how it achieved that result. The necessity for interpretability comes from an incompleteness in the problem formalisation, meaning that, for certain problems, it is not enough to get the solution; it also matters how the model came to that answer [11]. Several studies on the interpretability of machine learning models can be found in the literature [1, 15, 32].

3.4 Local Interpretable Model-Agnostic Explanations (LIME)
In this section, we focus on the work from Ribeiro et al. [29] called Local Interpretable Model-Agnostic Explanations. The method explains the predictions of any classifier (model-agnostic) in an interpretable and faithful manner by learning an interpretable model locally around the prediction:
• Interpretable. In the context of machine learning systems, we define interpretability as the ability to explain or to present in understandable terms to a human [11].
• Local fidelity. Global interpretability implies describing the patterns present in the overall model, while local interpretability describes the reasons for a specific decision on a unique sample. For interpreting a specific observation, we assume it is sufficient to understand how the model behaves locally.
• Model-agnostic. The goal is to provide a set of techniques that can be applied to any classifier or regressor, in contrast to other domain-specific techniques [33].

In practice, LIME creates interpretable explanations for an individual sample by fitting a linear model to a set of perturbed variations of the sample, using the resulting predictions of the complex model as the output.

3.5 Predictive Models
The selection of which machine learning model to use depends highly on the nature of the problem and the constraints and limitations that apply. In this work, algorithms from different families of machine learning were investigated. Specifically, the Naive Bayes Classifier (NBC) and Generalised Linear Model (GLM) were investigated as linear models, Random Forests (RF) and Gradient Boosting Machines (GBMs) were used to evaluate decision-tree-based ensembles, and fully connected Neural Networks (NN) were also assessed. Furthermore, the model ensembling technique of Stacking (STK) was also assessed. Stacking consists of learning a linear model to predict the target variable based on the output probabilities of multiple machine learning algorithms used as features.

3.6 Hotel Model
The first step is to train a machine learning model on individual hotels, as shown in Figure 3. The features used for training this model are not exclusively related to hotels, but also to the session and flight context. Evaluating this model, we obtain the probability that a certain hotel will be booked for a given location. The model is learned by framing the problem as a supervised classification problem, using the conversion (i.e. click) as the label. Note that for the hotel model, the probabilities of conversion are independent of the other hotels presented in the session. This leads to several advantages:
• Cold start problem [31]: the model does not penalise items or users that have not been recommended yet, since no hotel identifier or personally identifiable information is used.
• Dimensionality reduction: the output probabilities of the hotel model can be interpreted as a feature that captures the relationship between the independent variables and the target variable. This is a key concept of the Cascade Generalization technique; the output of the hotel model is therefore combined with the features to create the feature vector for the session model, as shown in Figure 4.

Note that the features used as input to the hotel model are discussed in Section 4.

Figure 3: Sketch of the Hotel Model. The machine learning model is trained to predict the probability that each hotel will be booked.

3.7 Session Model
The second machine learning model predicts whether a session leads to a conversion or not, see Figure 4. A session is composed of five different hotels, and the aim of the recommender system is to propose a set of hotels that results in the user booking any one of them. Aggregates of the features from the hotel model (contextual, passenger, and hotel features) are used, as well as the hotel probabilities obtained from the hotel model. The numerical features related to the hotels are aggregated in different ways (for example, the max, min, standard deviation and average of price and probability). The features related to the context (e.g. attributes about the session or the flight) do not change, as these are identical for each element in the session.

Figure 4: Sketch of the session model pipeline. This machine learning model predicts the probability that a session leads to a conversion, given a list of hotels. This is achieved using cascaded machine learning, in which the hotel model predictions are used as features of the session model.
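A minimal sketch of this aggregation step is shown below using pandas. The column names (session_id, price, hotel_proba, market, stay_duration_days) are hypothetical placeholders for the features described in Section 4, and the set of aggregates is only indicative of the max/min/std/avg statistics mentioned above.

import pandas as pd

def build_session_features(hotel_df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate hotel-level rows (five per session) into one session-level row.
    Numerical hotel features are summarised; contextual features are identical
    within a session, so only the first value is kept."""
    numeric = hotel_df.groupby("session_id").agg(
        price_min=("price", "min"),
        price_max=("price", "max"),
        price_avg=("price", "mean"),
        price_std=("price", "std"),
        proba_min=("hotel_proba", "min"),
        proba_max=("hotel_proba", "max"),
        proba_avg=("hotel_proba", "mean"),
        proba_std=("hotel_proba", "std"),
    )
    context = hotel_df.groupby("session_id")[["market", "stay_duration_days"]].first()
    return numeric.join(context)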
3.8 Session Builder
The session model estimates the conversion probability of the session using contextual and content information. Thus, part of the session builder is to create and evaluate new lists of hotels to determine whether these lists will result in a higher conversion probability than the original list. Figure 5 shows how this process is performed. First, a reference session with the recommendations given by the existing rule-based system is scored. For each of the proposed hotels, we estimate the booking probability using the hotel model. Next, we calculate the booking probability at session level, using the probabilities of the hotel model as an input feature of the session model. Then, we aim to improve the conversion probability of the session by removing one of the hotels from the list and introducing a new one. After including the new hotel, if the booking probability of the current session is greater than the probability of the previous session, then this new hotel list is the one that will be proposed to the user.

A rule must be defined to select which hotel to remove and which new hotel to introduce in the recommendation list. Once we have trained the session model, we can analyse the feature importance of the variables for the positive cases that were correctly classified (i.e. true positive cases). With the Local Interpretable Model-Agnostic Explanations model [29], we can understand the behaviour of the model for these particular instances. Based on the importance of features from LIME, a heuristic can be defined to replace a hotel from the list in order to improve the session conversion probability.

Note that the LIME analysis is performed only on true positive cases from the training set. In this dataset, the classes are highly imbalanced due to a low conversion rate; as such, standard feature analysis techniques may be overly influenced by negative samples, i.e., sessions which did not result in clicks. As LIME is designed to be used on individual decisions, a linear model is fitted and analysed for each true positive. The feature weights of each linear model are then averaged, giving a feature importance ranking over all correctly classified converted sessions.
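The sketch below shows one way this averaging could be implemented with the lime package, assuming the true positive sessions are available as rows of a feature matrix and the session model exposes predict_proba. It illustrates the procedure described above under those assumptions and is not the authors' exact implementation.

import numpy as np
from collections import defaultdict
from lime.lime_tabular import LimeTabularExplainer

def mean_lime_weights(X_train, feature_names, predict_proba, X_true_positives,
                      num_features=10):
    """Explain each correctly classified converted session with LIME and average
    the per-feature weights to obtain a single importance ranking."""
    explainer = LimeTabularExplainer(np.asarray(X_train),
                                     feature_names=feature_names,
                                     mode="classification",
                                     discretize_continuous=False)
    sums, counts = defaultdict(float), defaultdict(int)
    for row in np.asarray(X_true_positives):
        exp = explainer.explain_instance(row, predict_proba, num_features=num_features)
        for name, weight in exp.as_list():          # local linear weights
            sums[name] += weight
            counts[name] += 1
    averaged = {name: sums[name] / counts[name] for name in sums}
    return sorted(averaged.items(), key=lambda kv: -abs(kv[1]))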
3.9 Evaluation Metrics
As with many conversion problems, the classes are highly imbalanced, and as such the metrics used to assess performance must be carefully chosen.

F-measure (F_β). The generalization of the F1 metric is given by [7]:

F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R}

β is a parameter that controls the balance between precision P and recall R. When β = 1, F_β is equivalent to the harmonic mean of P and R. If β > 1, F becomes more recall-oriented (by placing more emphasis on false negatives), and if β < 1, it becomes more precision-oriented (by attenuating the influence of false negatives). Commonly used metrics are the F2 and F0.5 scores.

Area Under the ROC curve. The receiver operating characteristic (ROC) curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold levels. However, this can present an optimistic view of a classifier's performance if there is a large skew in the class distribution, because the metric takes into account true negatives.

Average Precision (AP). The precision-recall curve is a similar evaluation measure that is based on recall and precision at different threshold levels. An equivalent metric is the Average Precision (AP), which is the weighted mean of the precisions achieved at each threshold, with the increase in recall from the previous threshold as the weight:

AP = \sum_n (R_n - R_{n-1})\, P_n

Precision-recall curves are better for highlighting differences between models on unbalanced datasets, due to the fact that they evaluate the fraction of true positives among positive instances. In highly imbalanced settings, the AP curve will likely exhibit larger differences and will be more informative than the area under the ROC curve. Note that the relative ranking of the algorithms does not change, since a curve dominates in ROC space if and only if it dominates in PR space [10, 30].
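For reference, the sketch below computes these metrics for a single model with scikit-learn. The 0.5 decision threshold used for the F-scores is an assumption for illustration; AUC and AP are threshold-free.

import numpy as np
from sklearn.metrics import fbeta_score, roc_auc_score, average_precision_score

def summarise(y_true, y_proba, threshold=0.5):
    """Compute the metrics above for one model. y_proba holds predicted
    conversion probabilities; the threshold only affects the F-scores."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_proba) >= threshold).astype(int)
    return {
        "F1":   fbeta_score(y_true, y_pred, beta=1.0),
        "F0.5": fbeta_score(y_true, y_pred, beta=0.5),
        "AUC":  roc_auc_score(y_true, y_proba),            # area under the ROC curve
        "AP":   average_precision_score(y_true, y_proba),  # area under the PR curve
    }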
4 DATA

4.1 Hotel Recommendation Logs
The dataset in this study consists of 715,952 elements. Out of these recommendations, there are a total of 3,588 clicks, which are considered conversions. Therefore, the dataset is unbalanced, since only 0.5% of the instances are session conversions.

Each row contains information regarding the context of the session, the recommended hotel, and whether the recommendation led to a conversion. In particular, the features are the number of recommendations (from 1 to 5), the date of the recommendation, the country where the booking was made, the country where the passenger is traveling, the hotel identifier, the hotel provider identifier, the price of the hotel at the time of the recommendation, the price currency, and whether the recommendation led to a conversion. Additionally, the logs were enriched with supplementary information regarding each hotel, including a numerical hotel rating (from 0 to 5), a categorical hotel rating and the hotel chain.

4.2 Passenger Name Record
In the travel industry, a Passenger Name Record (PNR) is the basic form of computerized travel record. A PNR is a set of data created when a travel reservation is made. PNRs include the travel itinerary information (e.g., flight numbers, dates) and the passenger information (e.g., name, gender, and sometimes passport details). A PNR may also include many other data elements such as payment information (currency, total price, etc.), additional ancillary services sold with the ticket (such as extra baggage and hotel reservations) and other airline-related information (cabin code, special meal requests, etc.).

For the purpose of this study, we retrieve and extract features related to the air travel of the traveller. These include the date of PNR creation, airline code, origin city, destination city, date of departure, time of departure, date of arrival, time of arrival, days between the departure and booking date, travel class, number of stops (if any), duration of the flight in minutes (including stops) and the number of days the passenger is staying at the destination.

Figure 5: Sketch of the full recommendation pipeline. The session builder is designed to select hotels which will maximise the session conversion, based on the LIME feature importance of the session model.

5 RESULTS
Table 1 shows the results of the experiment comparing different algorithms for the hotel model in terms of the AUC, AP, F1 and F0.5 scores. In Figure 6, the ROC and AP curves can be seen in detail. The low AUC values for the GLM model and the Naive Bayes Classifier suggest that linear classification techniques do not lead to the best results and that more complex models are needed to correctly represent the data. The non-linear techniques have closer results, with the Random Forest obtaining the best values for AP, F1 and F0.5. A Stacked Ensemble using all the previous models was created, but it does not improve on the previous outcome.

Table 1: Summary of AUC, AP, F1 and F0.5 metrics for the hotel model.

Model            AUC    AP     F1     F0.5
GLM              0.625  0.128  0.247  0.274
NBC              0.819  0.058  0.175  0.159
RF               0.966  0.249  0.320  0.334
GBM              0.953  0.210  0.294  0.288
NN               0.965  0.165  0.245  0.219
STK (all)        0.924  0.182  0.271  0.288
STK (RF + NN)    0.969  0.242  0.314  0.284

5.1 Contribution of PNR data
The PNR data is an important asset, since it contains rich attributes related to the trip and the passenger. However, in this case personally identifiable information is not used in the recommender system; the PNR features thus help to provide context about the trip rather than the traveller. Incorporating this data into the models substantially enhanced their performance, as can be observed in Figure 6. Features of the PNR, including the number of travellers in the booking and the trip duration, among others, contributed to an increase in the area under the PR curve from 0.183 to 0.249.

Figure 6: Representation of ROC and AP curves for two Random Forest models predicting individual hotel conversion with and without the PNR data.

5.2 Session Model
After we have trained the hotel model, we predict the probability of conversion of each hotel individually. Then, we create the sessions based on 5 recommended hotels. The results are shown in Table 2. In this case, the best model for both AUC and AP is the Stacked Ensemble composed of a Random Forest, a Generalized Linear Model and a Naïve Bayes Classifier. Although the F0.5 score of the GBM model is slightly better than that of the STK model, the latter clearly outperforms it on the rest of the metrics.

Table 2: Summary of AUC, AP, F1 and F0.5 metrics for the session model.

Model                   AUC    AP     F1     F0.5
GLM                     0.822  0.395  0.520  0.538
NBC                     0.933  0.342  0.467  0.408
RF                      0.971  0.446  0.529  0.508
GBM                     0.958  0.383  0.531  0.542
NN                      0.967  0.433  0.483  0.467
STK (RF + GLM + NBC)    0.972  0.453  0.539  0.529
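A stacked ensemble of the kind described in Section 3.5 can be assembled, for example, with scikit-learn's StackingClassifier, as sketched below for the RF + GLM + NBC combination reported in Table 2. The exact base learners, hyperparameters and stacking procedure used in the study may differ; this is only an illustrative construction.

from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Base learners mirroring the RF + GLM + NBC combination of Table 2; the
# meta-learner is a linear model over their predicted probabilities.
stacked_session_model = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("glm", LogisticRegression(max_iter=1000)),
        ("nbc", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
)
# Usage: stacked_session_model.fit(X_train, y_train)
#        stacked_session_model.predict_proba(X_test)[:, 1]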
both AUC and AP is the Stacked Ensemble composed of a Random As the standard deviation of the individual hotel conversions Forest, a Generalized Linear Model and a Naïve Bayes Classifier. is the most important criteria, the following rule for the session Although the F 0.5 score of the GBM model is slightly better than the builder is defined: from the original hotel list remove one hotel STK model, the latter clearly outperforms the rest of the metrics. with the closest conversion probability to the mean conversion probability of the list, and replace it with the hotel with the high- 5.3 Feature Importance est conversion probability from the set of available hotels for a After the Session model has been trained, we analyse its feature im- particular city. portance to study which variables contribute the most to the model using LIME. Concretely, we evaluate the model on the true positive 5.4 Simulated conversion using Hotel List instances from the training dataset, since we want to optimise the Builder conversion. Results from the hotel list builder are shown in Table 3 for the two largest cities in the dataset and for the complete dataset. For both cities, we observe a large increase in conversion when using the LIME based session builder. However, a brute force approach to evaluating all possible lists does lead to higher conversion rates, at the cost of a significant increase in processing time. When we con- sider the complete dataset, we once again observe a large increase in conversion from the baseline for the LIME model. With respect to brute force, we observe that the LIME session builder performs much closer to the brute force builder in terms of conversion. This is attributed to the impact of smaller cities in the complete dataset, and thus less choice in hotels for the builders, resulting in the LIME session builder finding the optimal list. Additionally, on the com- plete dataset, the processing time of the brute force builder is 2.8 times the duration of the LIME builder, whereas larger gains were observed on the individual cities, where more options for hotels were available. 6 DISCUSSION In this study, an algorithm was created to improve hotel recom- Figure 7: Feature importance of the true positive cases from mendations based on historical hotel bookings and flight booking the Session Model using LIME. attributes. Different machine learning models are used in a cascaded Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 16 Table 3: Conversion rates and processing times for two large Journal of Machine Learning Research 12, Aug (2011), 2493–2537. cities and the complete dataset. The baseline performance is [9] Antónia Correia, Patricia Oom do Valle, and Cláudia Moço. 2007. Why people travel to exotic places. International Journal of Culture, Tourism and Hospitality given prior to any optimisation of the hotel lists, the LIME Research 1, 1 (2007), 45–61. based optimisation is compared to brute force. [10] Jesse Davis and Mark Goadrich. 2006. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd international conference on Machine learning. ACM, 233–240. Nice Barcelona Complete [11] Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of inter- Base Conversion 0.0019 0 0.0005 pretable machine learning. (2017). [12] Expedia. 2013. 
5.4 Simulated conversion using Hotel List Builder
Results from the hotel list builder are shown in Table 3 for the two largest cities in the dataset and for the complete dataset. For both cities, we observe a large increase in conversion when using the LIME-based session builder. However, a brute-force approach to evaluating all possible lists does lead to higher conversion rates, at the cost of a significant increase in processing time. When we consider the complete dataset, we once again observe a large increase in conversion from the baseline for the LIME model. With respect to brute force, we observe that the LIME session builder performs much closer to the brute-force builder in terms of conversion. This is attributed to the impact of smaller cities in the complete dataset, and thus less choice in hotels for the builders, resulting in the LIME session builder finding the optimal list. Additionally, on the complete dataset, the processing time of the brute-force builder is 2.8 times the duration of the LIME builder, whereas larger gains were observed on the individual cities, where more options for hotels were available.

Table 3: Conversion rates and processing times for two large cities and the complete dataset. The baseline performance is given prior to any optimisation of the hotel lists; the LIME-based optimisation is compared to brute force.

                         Nice    Barcelona  Complete
Base conversion          0.0019  0          0.0005
Conversion LIME          0.0207  0.0089     0.0019
Conversion brute         0.0338  0.0125     0.0026
Processing time LIME     23s     23s        4h48m
Processing time brute    314s    496s       13h36m

6 DISCUSSION
In this study, an algorithm was created to improve hotel recommendations based on historical hotel bookings and flight booking attributes. Different machine learning models are used in a cascaded fashion. First, a model estimates the conversion probability of the individual hotels independently. Note that adding trip context, via PNR-based features, resulted in a better PR AUC. The output of the first model is then combined with aggregates of the hotels in the list in order to create a feature vector for the session model, which estimates the probability that any hotel in the list will be converted. LIME analysis revealed that the hotel model conversion probabilities are the most important features, specifically the standard deviation, mean and maximum of the individual hotel conversion probabilities in the list. This allows a simple heuristic to be defined to increase the session conversion probability. In this study, a single change is performed in the list of hotels; however, this could be expanded to allow multiple changes.

Variations on this pipeline could also be considered. For instance, LIME is used in this study for feature importance ranking in the session builder; however, a similar methodology was recently proposed using a mixture regression model, referred to as LEMNA [14].

Here, the session builder relies on insights gained from analysis of the feature importance ranking of the session model using LIME over all sessions which lead to a conversion. Thus, the same heuristic is applied to all datapoints in the session builder. However, a key aspect of LIME is that it provides an interpretation of a model for a single datapoint. As such, an evolution of the approach would be to compute the most important features for each recommendation in real time, and to use this information to build an optimal hotel list based on the attributes most likely to lead to conversion.
REFERENCES
[1] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11, Jun (2010), 1803–1831.
[2] Eric Bauer and Ron Kohavi. 1998. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning 36, 1 (1998), 2.
[3] Yolanda Blanco-Fernandez, Jose J Pazos-Arias, Alberto Gil-Solla, Manuel Ramos-Cabrer, and Martin Lopez-Nores. 2008. Providing entertainment by content-based filtering and semantic reasoning in intelligent recommender systems. IEEE Transactions on Consumer Electronics 54, 2 (2008).
[4] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. 2013. Recommender systems survey. Knowledge-Based Systems 46 (July 2013), 109–132. https://doi.org/10.1016/j.knosys.2013.03.012
[5] Robin Burke and Maryam Ramezani. 2011. Matching recommendation technologies and domains. In Recommender Systems Handbook. Springer, 367–386.
[6] Marcirio Silveira Chaves, Rodrigo Gomes, and Cristiane Pedron. 2012. Analysing reviews in the Web 2.0: Small and medium hotels in Portugal. Tourism Management 33, 5 (2012), 1286–1287.
[7] Nancy Chinchor. 1992. MUC-4 Evaluation Metrics. In Proceedings of the 4th Conference on Message Understanding (MUC4 '92). Association for Computational Linguistics, Stroudsburg, PA, USA, 22–29. https://doi.org/10.3115/1072064.1072067
[8] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, Aug (2011), 2493–2537.
[9] Antónia Correia, Patricia Oom do Valle, and Cláudia Moço. 2007. Why people travel to exotic places. International Journal of Culture, Tourism and Hospitality Research 1, 1 (2007), 45–61.
[10] Jesse Davis and Mark Goadrich. 2006. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 233–240.
[11] Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. (2017).
[12] Expedia. 2013. Retail and Travel Site Visitation Aligns As Consumers Plan and Book Vacation Packages. https://advertising.expedia.com/about/press-releases/retail-and-travel-site-visitation-aligns-consumers-plan-and-book-vacation-packages
[13] João Gama and Pavel Brazdil. 2000. Cascade generalization. Machine Learning 41, 3 (2000), 315–343.
[14] Wenbo Guo, Dongliang Mu, Jun Xu, Purui Su, Gang Wang, and Xinyu Xing. 2018. Lemna: Explaining deep learning based security applications. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 364–379.
[15] Jonathan L Herlocker, Joseph A Konstan, and John Riedl. 2000. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work. ACM, 241–250.
[16] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82–97.
[17] Dietmar Jannach, Malte Ludewig, and Lukas Lerche. 2017. Session-based item recommendation in e-commerce: on short-term intents, reminders, trends and discounts. User Modeling and User-Adapted Interaction 27, 3-5 (2017), 351–392.
[18] Ingrid Jeacle and Chris Carter. 2011. In TripAdvisor we trust: Rankings, calculative regimes and abstract systems. Accounting, Organizations and Society 36, 4 (2011), 293–309.
[19] Michael Kenteris, Damianos Gavalas, and Aristides Mpitziopoulos. 2010. A mobile tourism recommender system. In Computers and Communications (ISCC), 2010 IEEE Symposium on. IEEE, 840–845.
[20] Dae-Young Kim, Yeong-Hyeon Hwang, and Daniel R Fesenmaier. 2005. Modeling tourism advertising effectiveness. Journal of Travel Research 44, 1 (2005), 42–49.
[21] Ron Kohavi, David H Wolpert, et al. 1996. Bias plus variance decomposition for zero-one loss functions. In ICML, Vol. 96. 275–83.
[22] Yann Le Cun, LD Jackel, B Boser, JS Denker, HP Graf, Isabelle Guyon, Don Henderson, RE Howard, and W Hubbard. 1989. Handwritten digit recognition: Applications of neural network chips and automatic learning. IEEE Communications Magazine 27, 11 (1989), 41–46.
[23] Asher Levi, Osnat Mokryn, Christophe Diot, and Nina Taft. 2012. Finding a needle in a haystack of reviews: cold start context-based hotel recommender system. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 115–122.
[24] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing 7, 1 (2003), 76–80.
[25] Stanley Loh, Fabiana Lorenzi, Ramiro Saldaña, and Daniel Licthnow. 2003. A tourism recommender system based on collaboration and text analysis. Information Technology & Tourism 6, 3 (2003), 157–165.
[26] Raymond J Mooney and Loriene Roy. 2000. Content-based book recommending using learning for text categorization. In Proceedings of the Fifth ACM Conference on Digital Libraries. ACM, 195–204.
[27] Julia Neidhardt, Leonhard Seyfang, Rainer Schuster, and Hannes Werthner. 2014. A picture-based approach to recommender systems. Information Technology & Tourism 15, 1 (2014), 49–69. https://doi.org/10.1007/s40558-014-0017-5
[28] Andreas Papatheodorou. 2001. Why people travel to different places. Annals of Tourism Research 28, 1 (2001), 164–179.
[29] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135–1144.
[30] Takaya Saito and Marc Rehmsmeier. 2015. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, 3 (2015), e0118432.
[31] Andrew I Schein, Alexandrin Popescul, Lyle H Ungar, and David M Pennock. 2002. Methods and metrics for cold-start recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 253–260.
[32] Alfredo Vellido, José David Martín-Guerrero, and Paulo JG Lisboa. 2012. Making machine learning models interpretable. In ESANN, Vol. 12. Citeseer, 163–172.
[33] Peng Zhang, Jiuling Wang, Ali Farhadi, Martial Hebert, and Devi Parikh. 2014. Predicting failures of vision systems. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3566–3573.