Understanding Customer Choices to Improve Recommendations in the Air Travel Industry Alejandro Mottini Alix Lhéritier Amadeus SAS Amadeus SAS Sophia Antipolis, France Sophia Antipolis, France Rodrigo Acuna-Agost Maria A. Zuluaga Amadeus SAS Amadeus SAS Sophia Antipolis, France Sophia Antipolis, France ABSTRACT as a black box [7]. Collaborative and content-based methods recom- Recommender systems aim at suggesting relevant items to users mend items based on similarities among users or items but, cannot to support them in various decision-making processes, on the ba- provide further insight. In the flight industry, it is key to under- sis of available information on items or users. In the latter, the standing passenger behaviour and their flight itinerary preferences. customer’s interests and tastes can be learnt and expressed using Players in the sector use this knowledge to adapt their offers to historical browsing data, purchase histories, and even other non- market conditions and customer needs, thus having an impact on traditional data sources such as social networks. Despite its proven airline’s revenue management and price optimisation systems [4]. success in the on-line retailing industry, in electronic commerce To tackle the flight itinerary choice problem and overcome these and, even tourism, recommender systems have been less popular limitations, the airline industry has historically resorted to Discrete in flight itinerary selection processes. This could be partially ex- Choice Modeling (CM). Due to its good performance, efficiency and plained by the fact that customers’ interests are only expressed as a ease of interpretation, the Multinomial Logit model (MNL) [11], a flight search request. As a result, this problem has been historically specific CM technique is the most popular approach for the flight tackled using classical Discrete Choice Modelling techniques and, itinerary choice problem. In spite of its numerous advantages, CM more recently, through the use of data-driven approaches such as also presents some weaknesses. For instance, MNL only considers Machine and Deep Learning techniques. At Amadeus, we are in- linear combinations of the input features, limiting its predictive terested in the use of choice models with recommender systems capability and requiring expert knowledge to perform feature engi- for the problem of airline itinerary selection. This work presents neering. Also, they lack the flexibility to handle collinear attributes a benchmark on three family of methods to identify which is the and correlations between options and it is difficult to model in- most suitable for the problem we tackle. dividual’s heterogeneities. These shortcomings might be overly restrictive or affect performance [12]. As an example, industrial ap- plications require to develop different models for distinct markets. KEYWORDS In the case of the flight itinerary choice prediction problem, this Choice Modeling; Choice-based Recommendations; Air Travel In- involves estimating models at a city-pair level [5] and/or customer dustry demographic segments [19]. In an effort to cope with CM limitations, recently machine learn- ing and deep learning techniques have been proposed. These algo- 1 INTRODUCTION rithms can more easily model non-linear relationships and handle In the recent years, recommender systems (RecSys) have proven correlated features, and have more modelling power which allows invaluable for solving problems in the on-line retail industry and to predict choices on an individual level, thus improving the pre- e-commerce[15]. While tourism has not been the exception to this diction performance. success [3], with applications covering almost every area of the Inspired by the work from Chaptini [6], at Amadeus we are travel and hospitality industry [14], RecSys have been less popu- working towards the use of CM with recommender systems for lar on the airline itinerary decision-making process. This can be the problem of airline itinerary selection. Combining the two ap- explained by two factors. On one hand, the available information proaches should leverage the strengths of both, leading to robust about users and items is not as rich as for most RecSys in tourism. In and scalable, but more interpretable models. In this first work, we the traveller’s flight itinerary choice problem, i.e. the task of select- seek to explore, evaluate and compare three different CM models ing a flight given a proposed list of itinerary recommendations, the which can be used as the predictive back-bone of a choice-based user’s interests are only expressed as a flight search request, user RecSys framework. In the remainder of this paper, first we present sessions are usually anonymous and there is no user history in the the theoretical background of CM and demonstrate why CM can travel provider’s databases. Therefore, classical RecSys algorithms be seen as a RecSys problem. Then, we present our experimen- cannot be applied directly. tal setup by describing the data, the evaluated algorithms and the On the other hand, RecSys techniques suffer from a lack of theo- performance measures. retical understanding of the underlying behavioural process that led to a particular choice [6] by seeing the decision-making process RecTour 2018, October 7th, 2018, Vancouver, Canada. 28 Copyright held by the author(s). 2 BACKGROUND jˆ that maximizes an utility function U (i, j), representing the user’s In this section, first we provide a brief background on classical utility on any item j [1]: discrete choice modelling theory and then show how it is equivalent jˆ = arg max U (i, j). (6) to the recommendation problem. j ∈ Ai Conceptually this is the same optimisation problem as that one 2.1 Discrete Choice Models. formulated by choice theory [2], and described previously in this CM defines four basic components: 1) the decision-maker, 2) the section. Equation (6) is equivalent to choosing the alternative with alternatives, 3) the attributes, and 4) the decision rules [2]. For- the highest utility for a decision-maker, in choice modelling theory. mally stated, a decision-maker i ∈ I chooses from a choice set Ai More formally: composed of Ji alternatives, with with j ∈ {1, . . . , Ji } the index of the j th alternative. For the sake of simplicity and without loss jˆ = arg max U (i, j) ⇔ Ui jˆ ≥ Ui j ; ∀ j ∈ Ai , (7) j ∈ Ai of generality, we will refer to the number of alternatives simply as J , although decision-makers might not be faced with the same which implies that the recommendation problem can be seen as a set and/or number of alternatives. The decision-maker i obtains an choice prediction problem. Therefore, the models and techniques utility Ui j from each j and chooses alternative jˆ if and only if: developed in CM can be applied to RecSys. Ui , jˆ ≥ Ui j ; ∀ j ∈ Ai . (1) 3 MATERIALS AND METHODS The utility function is unknown and not observable. However, as 3.1 Data it is possible to determine the attributes xi j perceived by decision- Experiments were conducted on real datasets of flight search logs maker i for each j, as well as Si the vector of characteristics of i, and bookings from MIDT, an Amadeus database containing book- there exists a function V (·) which relates the observed features to ings from over 93000 travel agencies. the decision-maker’s utility: Bookings are stored using Personal Name Records (PNR), which are created at reservation time by airlines or other air travel providers, Vi j = V (Xi j ), (2) and are then stored in the airline’s or Global Distribution System where Vi j is referred to as the representative utility and Xi j = (GDS) data centers. PNRs contain the travel itinerary of the passen- h(xi j , Si ), a simplified representation of xi j and Si through the use ger, personal and payment information, and/or additional ancillary of any appropriate vector valued function h. Vi j is generally a linear services sold with the ticket. As these only contain information combination of the features. For example, if an airline is trying to about the purchased ticket (final choice), and not about the alterna- predict which itinerary a user will choose, a very simple model tives considered before the purchase, we must also consider flight could be: search logs. These contain both itinerary requests (origin, desti- Vi j = a ∗ pricei j + b ∗ tripDurationi j nation and dates), and the different alternatives presented to the passenger. with a, b parameters of the model to be estimated, and which are Both data sources are combined into a final dataset containing the commonly refered to as β. alternatives presented to each user and their final choice (Figure 1). Since there are aspects of the utility function that cannot be ob- The matching process is in itself a challenging problem due to served, Vi j , Ui j . To reflect uncertainty, the utility can be modelled the high volume of data (i.e., around 100 GB of daily search logs) as a random variable, and to the difference in data sources and formats. Moreover, the Ui j = Vi j + εi j , (3) process cannot be perfectly accurate since there is not a direct link between the two data sources and booking/search times differ. An where εi j is a random variable that captures the unknown fac- approximate matching is performed using data fields which are tors that affect Ui j . As Ui j is now a random variable, the decision shared between booking and logs (i.e. origin, destination, time and rule needs to be expressed as the probability that decision-maker i booking agency). chooses the k th alternative: The choice set presented to a user, which we denote a session, contains up to 50 itineraries. The features used for each alternative P(k |Ai ) = P(Uik ≥ Ui j ; ∀j ∈ Ai ). (4) are summarized in Table 1. The considered dataset contains 33951 By replacing Ui j accordingly: sessions split into training/tests sets. P(k |Ai ) = P(Vik − Vi j ≥ εi j − εik ; ∀j ∈ Ai ). (5) 3.2 Algorithms Different assumptions about the random term εi j and the determin- Methods from three different families of algorithms, classical CM, istic term Vi j produce specific models. machine learning- and deep learning-based CM, are explored. 3.2.1 Classical CM. Two classical CM approaches are consid- 2.2 Choice-based Recommender Systems. ered: The Multinomial Logit (MNL) model [11], perhaps the most Given a set Ai of J available items presented to a user i, the rec- common CM model, and Latent class choice models (LCM) [8]. ommender problem can be seen as an optimisation task that first McFadden [11] demonstrated that if εi j is an i.i.d. Gumbel ran- estimates the utility of each item j ∈ Ai , and then chooses the item dom variable, the probability that a decision-maker i chooses the RecTour 2018, October 7th, 2018, Vancouver, Canada. 29 Copyright held by the author(s). Table 1: Feature set classified according to owner (individual decision maker i chooses the j-th alternative. As RF assumes inde- or alternative) and data type (numerical, categorical, binary pendence of the samples, at training stage, every Xi j is assumed or time). i.i.d., even if they belong to the same decision-maker. At predic- tion, each unseen alternative Xi j is propagated through the trained Owner Data Type Features forest to obtain the posterior probability of being chosen: T Individual Categorical Origin, destination, office 1Õ P(yi j |Xi j ) = P (yi j (Xi j ) = 1) (10) Numerical Days to trip, Trip weekday T t =1 lt Binary Stays Saturday, Continental trip, Domestic trip where T denotes the number of trees and Plt (·) denotes the pos- Alternatives Categorical Airline of first flight terior probability function of a leaf node l in tree t. However, the Numerical Price, Stay length, Trip duration, alternatives associated to an individual’s session cannot be treated Connections, Num. of airlines as independent. There is an inherent dependence among them: only DateTime Arrival time, Departure time one alternative per session can be selected. To cope with this, the predicted probabilities are considered scores used to rank the alter- natives. More formally, the index jˆ of the selected alternative a jˆ by Booking Data: Flight Search Logs Amadeus MIDT Query and Results decision-maker i is: Ticket Reservation: Search logs: jˆ = arg max P(yi j |Xi j ) Itinerary (Origin, Destination, Flights, 1≤j ≤ J From city A to B Dates) Payment information Searched from time/place/agency Booking time/place/agency Results: shown alternatives 3.2.3 DL. The assessed Deep learning choice modeling (DL) method [13] is based on an encoder-decoder network architecture using a modified pointer-network mechanism [18]. As with ML, Pre-processing/Matching: Find search session that the model is trained to predict the chosen alternative using a su- corresponds to each booking pervised learning approach. However, DL does not break the i.i.d. Final Dataset: assumption among samples, as ML-based CM does. Given the se- Query, shown alternatives and final choice quential nature of pointer networks, sessions are represented as sequences of itineraries, Z = {Xi1 , ..., Xi J }, which are fed sequen- Figure 1: Dataset generation through MIDT bookings and tially to the model. The encoder network "encodes" the input into a search log matching. hidden (encoder) state e. The decoder network will use the encoded information to output a vector u. Finally, a softmax function use the decoder’s output to estimate the posterior probability of being alternative k (Eq. 5), the logit choice probability, is given by: chosen for the k th element in the input sequence Z : exp(Vik ) P(k |Ai ) = Í . (8) P(yk = 1|Z ) = Í J exp(uk ) (11) j ∈J exp(Vi j ) j=1 exp(u j ) LCMs have been proposed to capture unobserved heterogeneity. Under LCM, the probability of choosing an alternative k can be with uk = d T W1ek , the pointer vector to the k th element of Z , ek expressed as: the k th encoder state, d = tanh(W2e J ) the decoder, W1 ,W2 learnable Q parameters and yk the binary indicator of whether k was chosen (yk = 1) or not. P(yk = 1|Z ) can be interpreted as an estimate of Õ P(k |Xi j ) = P(k |Xi j ; βq , Aq )P(q|Xi j ; θ ) (9) q=1 P(k |Ai ). where Q is the number of latent classes, βq are the choice model 3.3 Performance measurement parameters specific to class q, Aq is the choice set specific to class q, θ is an unknown parameter vector, and Xi j the simplified vector We used Top-N accuracy to asses and compare the models. Top-N representation of attributes of alternatives and characteristics of accuracy evaluates if the user’s choice is among the top-N predicted decision-maker i. alternatives. It is equivalent to the commonly used top-N error in Finally, both MNL and LCM models are optimized using maxi- image classification [16], as it can be formulated in terms of the mum likelihood estimation as they can not be solved in a closed latter as: form. accuracy = 1 − error 3.2.2 ML. Lheritier et al. a have proposed machine-learning 4 RESULTS based CM (ML) [10] technique which formulates the choice mod- Figure 2 presents the Top-N accuracy for MNL, ML and DL methods. elling problem as a supervised learning one through the use of Overall, DL presents the highest accuracies across all values of N. Random Forests (RF), a learning algorithm based on an ensemble These results are confirmed, in more detail, in Table 2 where Top-1, of decision trees. The training data consists of the set of sample 5 and 15 accuracies are detailed. Top-15 accuracy has a particular pairs T = {(Xi j , yi j )} 1 , with yi j the binary indicator of whether importance for ranking flight search recommendations since most 1 In the context of RF, X referred to as the feature vector of a sample websites show approximately 15 results per page. ij RecTour 2018, October 7th, 2018, Vancouver, Canada. 30 Copyright held by the author(s). Figure 3: Top-1 accuracy of LCM, , as a function of the num- ber of latent classes Q, compared to ML- and DL-based ap- Figure 2: Top-N accuracy using the full data set (solid proaches. line) and a subset of the dataset consisting of a single ori- gin/destination (O&D) pair (dashed line). Table 2: Top-N with N = 1, 5 and 15. The best result for each N is presented in bold. Trivial choices of cheapest and shortest flight are included for reference. Method Top-1. Top-5 Top-15 DL 25.3 66.37 93.1 ML 23.1 61.7 92.9 MNL 21.2 60.6 86.4 Cheapest 16.4 16.4 - Shortest 15.4 15.4 - Figure 4: Top 8 feature importance for the ML method. To simulate data segmentation, a second experiment was per- match a decision-maker’s profile based on previous data. While this formed in a simplified subset containing a single origin-destination is an important criterion, its limited assessment of a recommender (O&D) pair chosen at random. This resulted in 1617 decision-makers quality has been criticized for not taking the decision-makers’ situ- (users) with an associated booking to the O&D. The Top-N accuracy ational needs into account [9]. Due to their well-known readability, curve (Figure 2 dashed lines) shows how the difference in perfor- Discrete Choice Modelling appears as a natural alternative to over- mance between the methods is less significant w.r.t. that one using come this current limitation of RecSys. However, despite CM being the full data set. Despite MNL being the simplest method, results a well-studied problem in various fields of research, literature on show that, on simpler datasets, it is able to perform as well as more its use with recommender systems is very scarce. Existing works complex methods. have adopted classical CM in combination RecSys [6, 17], while This behaviour explains the motivation behind dataset pre-seg- suggesting CM as a promising paradigm in the field of RecSys. mentation often used in classical CM. This is further confirmed by However, classical choice models tend to suffer from scalability investigating the performance of LCM, as a function of the number issues as expert knowledge is usually required for model optimisa- of latent classes Q. Figure 3 reports top-1 accuracy of LCM, ML and tion. ML- and DL-based [10, 13] choice models are non-parametric DL, and demonstrates how it is possible to increase classic CM accu- approaches that overcome this limitation, easing the deployment of racy in complex data through a good estimation of Q. While MNL choice-based RecSys at large scale. On the down side, model read- reported accuracies lower than ML and DM, LCM can outperform ability can diminish. Although this might not be relevant for some them when Q is estimated correctly. This improvements comes, applications, understanding the reasons behind a decision-maker’s however, at some cost: LCM requires additional hand engineered choice is of high relevance in the air travel industry. ML-based features to achieve the segmentation and a good choice of Q. methods appear to be a suitable compromise into readability but, Although ML and MNL are not as accurate as DL, they have they make strong assumptions on the independence of data that is the advantage of having less hyper-parameters to tune. Moreover, arguable. Overall, it is possible to say that there is no ideal method they are more interpretable than DL. ML methods based on RF are and that the selection of one might depend on the specific rec- known for their capacity to provide information on feature impor- ommendation application that they target. As a guideline, Table 3 tance (Figure 4). This type of information can help to understand summarises the strengths and pitfalls of the different methods here the rationale behind the decision-maker’s choices, which can be evaluated when considering choice-based RecSys. important for some applications in the air travel industry. At Amadeus, we work towards the development of informative, readable and interpretable RecSys that suit the needs of the air 5 FINAL REMARKS travel industry. Our hypothesis is that the combination of discrete RecSys research has so far predominantly focused on optimizing the choice modeling with RecSys can provide improvements to current algorithms used for generating recommendations to increase preci- systems in the air travel industry by keeping readability while im- sion [9]. Precision measures how well the suggested alternatives proving performance. In that sense, an ML method like the random RecTour 2018, October 7th, 2018, Vancouver, Canada. 31 Copyright held by the author(s). Table 3: Advantages and disadvantages of the different families of CM methods. Method Advantages Disadvantages CM • Simple and interpretable • Feature engineering is required • Accurate on simple cases • Limited in handling big data ML • Interpretable • Assumes independence of samples • Accurate • Feature engineering might be required • Suitable for big data • Handles non-linear and latent relationships DL • No assumptions on data • Non-interpretable • Highly accurate • Many hyper-parameters • Suitable for big data • Computationally expensive • Handles non-linear and latent relationships forests evaluated here represents a good compromise and a promis- [8] William H Greene and David A Hensher. 2003. A latent class model for discrete ing path to pursue in what we are looking for. On one hand, the choice analysis: contrasts with mixed logit. Transportation Research Part B: Methodological 37, 8 (2003), 681–698. method provides information on the relevance of features. On the [9] Joseph A. Konstan and John Riedl. 2012. Recommender systems: from algorithms other one it avoids the limitations of classical CM models. In that to user experience. User Modeling and User-Adapted Interaction 22, 1-2 (2012), 101–123. sense, although DL approaches have higher accuracy, they are not [10] Alix Lhéritier, Michael Bocamazo, Thierry Delahaye, and Rodrigo Acuna-Agost. as advantageous given their limited interpretability. 2018. Airline Itinerary Choice Modeling using Machine Learning. International This work represents an initial benchmark that evaluates three Journal of Choice Modeling (2018), https://doi.org/10.1016/j.jocm.2018.02.002. [11] Daniel McFadden. 1973. Conditional Logit Analysis of Qualitative Choice Be- families of CM methods in the context of flight itinerary selec- haviour. In Frontiers in Econometrics, P. Zarembka (Ed.). Academic Press New tion/recommmendation. Our future work will focus in the develop- York, New York, NY, USA, 105–142. [12] D. McFadden. 2001. Economic choices. The American Economic Review 91, 3 ment of a unified framework that can leverage the strengths of the (2001), 351–378. explored CM methods. [13] Alejandro Mottini and Rodrigo Acuna-Agost. 2017. Deep Choice Model Using Pointer Networks for Airline Itinerary Predictions. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. REFERENCES ACM, New York, NY, USA, 1575–1583. [1] Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next gen- [14] Julia Neidhardt, Tsvi Kuflik, and Wolfgang WÃűrndl. 2018. Special section on eration of recommender systems: a survey of the state-of-the-art and possible recommender systems in tourism. Information Technology & Tourism 19, 1-4 extensions. IEEE Transactions on Knowledge and Data Engineering 17, 6 (2005), (2018), 83–85. 734–749. [15] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor (Eds.). 2011. [2] Moshe Ben-Akiva and Michel Bierlaire. 1985. Discrete choice analysis: theory and Recommender Systems Handbook. Springer Nature, New York. application to travel demand. MIT press, Cambridge, MA. [16] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, [3] Joan Borràs, Antonio Moreno, and Aida Valls. 2014. Intelligent tourism rec- Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. ommender systems: A survey. Expert Systems with Applications 41, 16 (2014), Berg, and Li Fei-Fei. 2015. ImageNet Large Scale Visual Recognition Challenge. 7370–7389. International Journal of Computer Vision 115, 3 (2015), 211–252. [4] J. Broder and P. Rusmevichientong. 2012. Dynamic pricing under a general [17] Paula Saavedra, Pablo Barreiro, Roi Duran, Rosa Crujeiras, María Loureiro, and parametric choice model. Operations Research 60, 4 (2012), 965–980. Eduardo Sánchez Vila. 2016. Choice-Based Recommender Systems. In RecTour@ [5] Judit G Busquets, Antony D Evans, and Eduardo Alonso. 2016. Predicting Aggre- RecSys - Workshop on Recommenders in Tourism held in conjunction with the 10th gate Air Itinerary Shares Using Discrete Choice Modeling. In 16th AIAA Aviation ACM Conference on Recommender Systems (RecSys). CEUR Workshop Proceedings, Technology, Integration, and Operations Conference. 4076. Boston, MA, USA, 38–46. [6] Bassam H. Chaptini. 2005. Use of discrete choice models with recommender systems. [18] O. Vinyals, M. Fortunato, and N. Jaitly. 2015. Pointer networks. In Advances Ph.D. Dissertation. Massachusetts Institute of Technology. in Neural Information Processing Systems (NIPS 2015). Curran Associates, Inc., [7] Li Chen, Marco de Gemmis, Alexander Felfernig, Pasquale Lops, Francesco Ricci, Montreal, Canada, 2692–2700. and Giovanni Semeraro. 2013. Human Decision Making and Recommender [19] V. Warburg, C. Bhat, and T. Adler. 2006. Modeling demographic and unobserved Systems. ACM Transactions on Interactive Intelligent Systems 3, 3 (2013), 1–7. heterogeneity in air passengersâĂŹ sensitivity to service attributes in itinerary choice. Journal of the Transportation Research Board 1951 (2006). RecTour 2018, October 7th, 2018, Vancouver, Canada. 32 Copyright held by the author(s).