Itinerary Recommendation for Cruises: User Study

Diana Nurbakova∗, LIRIS - INSA Lyon, 20 avenue Albert Einstein, Villeurbanne 69621 cedex, France, diana.nurbakova@insa-lyon.fr
Léa Laporte, LIRIS - INSA Lyon, 20 avenue Albert Einstein, Villeurbanne 69621 cedex, France, lea.laporte@insa-lyon.fr
Sylvie Calabretto, LIRIS - INSA Lyon, 20 avenue Albert Einstein, Villeurbanne 69621 cedex, France, sylvie.calabretto@insa-lyon.fr
Jérôme Gensel, Université Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble F-38000, France, jerome.gensel@univ-grenoble-alpes.fr

ABSTRACT
Vacations and leisure activities constitute an important part of human life. Nowadays, a lot of attention is paid to cruising, which is reported to be a favourite vacation choice for families with kids and for Millennials. Like other distributed events (events that gather multiple activities distributed in space and time under one umbrella), such as big festivals, conventions and conferences, cruises offer a vast variety of simultaneous on-board activities for all ages and tastes. This results in information overload for cruisers, in particular given the very limited availability of activities. Recommender systems appear as a desirable solution in such an environment. Due to the numerous time constraints, it is more convenient to get a personalised itinerary of activities rather than a top-n list. In this paper, we present a user study conducted in order to create a preliminary dataset that simulates users' attendance of a cruise and sheds light on activity selection behaviour. We discuss the challenges faced by itinerary recommendation and illustrate them with examples from the user study.

CCS CONCEPTS
• Information systems → Personalization;

KEYWORDS
recommendation of leisure activities, itinerary recommendation

∗ D. Nurbakova held a doctoral fellowship from la Région Auvergne-Rhône-Alpes.
RecTour 2017, August 27th, 2017, Como, Italy. 31 Copyright held by the author(s).

1 INTRODUCTION
Nowadays, the field of leisure activities is experiencing substantial growth. In this context, a rising phenomenon is that of distributed events, which gather various activities under one umbrella and attract more and more attendees. Examples of such events are cruises, festivals, big conferences, conventions, etc. Attendees of distributed events are overwhelmed by the number of parallel activities and are looking for a personalised experience. Recommender systems appear as a natural solution in such an environment. Note that, given the density of activities and their limited availability, participants are interested in a personalised itinerary (a sequence of activities to undertake) rather than in a list of top-n activities that may compete in terms of time.

In this work, we consider the case of a cruise. According to the Florida-Caribbean Cruise Association (F-CCA) [6], about 25.3M passengers are expected to cruise globally in 2017, reflecting a 7% average annual passenger growth rate over the last 30 years. Cruising has become a preferred vacation choice for families, especially those with kids, making the cruiser population younger and more diverse than the non-cruiser population. F-CCA reports [6] that cruising is the favourite vacation choice of Millennials and Generation X. Cruisers appreciate the opportunity to relax, get away from it all, and see and do new things. Cruise lines offer a vast variety of activities, both on board and in ports of call.

In this paper, we focus on itinerary recommendation and present a user study based on a 7-night Disney Fantasy cruise. More precisely, we aim at answering the following research questions.
RQ1: What is itinerary recommendation and what makes it challenging?
RQ2: What are the characteristics of the data treated by itinerary recommendation? Is there any dataset that could be used as is?

The remainder of the paper is organised as follows. In Section 2, we define the itinerary recommendation problem and the challenges it faces. Section 3 gives an overview of existing datasets, presents our user study that simulates users' attendance of a cruise, and discusses the conducted analysis. Section 4 concludes the paper.

2 PROBLEM STATEMENT AND CHALLENGES
In this paper, we aim at finding a personalised itinerary for a given user that maximises his/her satisfaction and takes spatio-temporal constraints into account. More precisely, given a set of activities with their locations, descriptions, availability time windows, durations and category vectors, a set of users, and the users' binary history (attendance) matrix, the goal is to find, for every given user, a feasible sequence of activities (or itinerary) that maximises the user's satisfaction. The user's satisfaction with respect to an itinerary is defined as the sum of the user's satisfaction scores for all the activities within the itinerary. For more details on the itinerary recommendation problem, see [9].

Itinerary recommendation faces the following challenges.

C-1: Implicit Feedback. Since activities take place in the future, as in the case of event recommendation [8], there is very little information to handle, and there are far fewer user-item interactions than in traditional recommendation scenarios. We deal with implicit feedback, meaning that the degree to which a user likes an item or not is unknown. The use of multiple contexts may increase the recommendation performance of the algorithms.

C-2: Interest vs. Attendance. Due to the limited availability and the multiple parallel activities, we deal with attendance bias: a user may miss an activity of interest to him/her or, conversely, may join an activity that is of no particular interest to him/her.

C-3: List vs. Itinerary. Activities are competitive and short-lived, which results in the user preferring one activity over the others in a given time slot.
In this context, an itinerary (a feasible sequence of activities) is more desirable than a list of interesting activities.

We will illustrate these challenges in the next section.

3 USER STUDY
In this section, we formulate a list of characteristics of a dataset satisfying the needs of the target problem, provide a comparison of available datasets (see Tab. 1), and describe a user study conducted in order to collect data with the desired characteristics.

3.1 Data Characteristics and Existing Datasets
We categorise the existing datasets w.r.t. the focus of the data into 3 groups: Single Item, Schedule, and Sequence. We define a list of characteristics (column "Characteristic" in Tab. 1) based on the activity attributes and on the consecutive nature of the activities performed during distributed events. We cluster the characteristics into 5 types w.r.t. the entity they describe: item (unit under consideration), sequence (ordered sequence of items), user (information about users), user-item (user-item interactions), and user-user (relations between users). We distinguish 5 essential characteristics (highlighted in Tab. 1): (1) time windows (start and end time of activity availability), (2) coordinates (geographical location of an activity), (3) service time (duration of an activity), (4) categories (associated categories), and (5) users' historical data. Though we indicate only these 5 elements as essential, all the others listed in Tab. 1 are also important, as they may enhance the recommendation. As can be seen, none of the existing datasets contains all the essential characteristics. Thus, we have made an attempt to create an integral dataset that contains all the required features and provides an insight into users' behaviour.

3.2 Data Collection
In order to collect the required data, we performed a user study via an online survey. Participants were recruited via a link to the online questionnaire sent by email to several research and university mailing lists. The stated aim of the study was to create a dataset that simulates cruise attendance and could be used to make personalised recommendations of itineraries. The list of activities used in the survey was taken from the personal navigators of the Disney Fantasy 7-night Eastern Caribbean cruise. Activities dedicated exclusively to kids were excluded from the list. The original personal navigators can be found online³, and the deck plan of the ship is available on the web⁴. The questionnaire consisted of 4 parts; an overview of the survey with examples of questions is given in Tab. 2.

In total, 23 responses were collected. Statistics concerning the participants are provided in Tab. 3, and the main statistics of the dataset are given in Tab. 4. The average duration of an activity is 45 minutes, and on average 5 activities take place simultaneously.

3.3 Data Analysis
The conducted user study gives a more practical insight into personalised itinerary recommendation and the activity selection process. In the following, we illustrate the challenges from Section 2.

C-2: Interest vs. Attendance. Figure 1 displays, for each user, the number of activities that the user: (1) was interested in and joined (Interested & Going), (2) was interested in but did not join (Interested & Not Going), (3) was not interested in but joined (Not Interested & Going), and (4) was neither interested in nor joined (Not Interested & Not Going)⁵. A user is considered interested in an activity if he/she rated it 4 or higher, or rated it 3 when 3 is the highest rating he/she gave to any activity. The chart shows that individuals miss many activities that are of interest to them: the number of Interested & Not Going activities is almost twice as high (1.7621 times) as the number of Interested & Going activities. It is also surprising that Not Interested & Going activities constitute about 43% of all joined activities.

Figure 1: Distribution of interest in activities and attendance per user. (x-axis: User, 1 to 23; y-axis: #activities; series: Interested & Going, Not Interested & Going, Interested & Not Going, Not Interested & Not Going.)

C-3: List vs. Itinerary. Let us consider the following setting. We compare several top-n item recommendation algorithms against an itinerary recommendation algorithm from the literature. As history data, we consider a binary attendance matrix.
- Category-based: This algorithm ranks the candidate activities based on the weighted frequency of their corresponding categories.
- Content-based: The candidate activities are ranked in descending order of their textual similarity with the user's past activities. An activity is represented as a TF-IDF vector, and the user's profile is built over the TF-IDF vectors of the activities joined by the user in the past.
- Logistic Regression: We feed a vector of the aforementioned scores into a logistic regression model.

¹ Yelp challenge dataset, http://www.yelp.com/dataset_challenge
² https://github.com/jalbertbowden/foursquare-user-dataset
³ http://disneycruiselineblog.com/2015/07/personal-navigators-7-night-eastern-caribbean-cruise-on-disney-fantasy-itinerary-a-june-20-2015/
⁴ http://disneycruiselineblog.com/ships/deck-plans-disney-dream-disney-fantasy/
⁵ Ratings are used only for this part of the study. We do not consider them when estimating users' interest in activities, as we assume that only a binary attendance matrix is available.

Table 1: Comparison of the available datasets.
Datasets compared (grouped into Single Item, Schedule, and Sequence): Other OP-based [12], MCTOPMTW [10], Other OP-TW [12], Foursquare_1 [14], TREC CS'13 [3], TREC CS'14 [4], TREC CS'15 [2], Foursquare_2², TripBuilder [1], GeoLife [15], Meetup [7], Twitter [5], Flickr [11], Yelp¹. Essential characteristics are marked with *.

Entity | Characteristic | Datasets providing it (✓)
Item | Time windows* | ✓ ✓ ✓
Item | Coordinates* | ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Item | Service Time* | ✓ ✓ ✓
Item | Categories* | ✓ ✓ ✓
Item | Price | ✓ ✓ ✓
Item | Item Additional Attributes | ✓ ✓ ✓
Item | Description | ✓ ✓
Sequence | Time budget | ✓ ✓ ✓
Sequence | Starting/Ending Point | ✓ ✓ ✓ ✓
Sequence | Tour Additional Attributes | ✓ ✓
User | User's personal data | ✓ ✓
User-Item | Historical Data* | ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
User-Item | Score | ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
User-User | Social links | ✓ ✓ ✓ ✓

Table 2: Description of the parts of the survey. Qnt denotes the number of questions in a section.

Section | Qnt | Description | Question examples
User Profile | 10 | Questions on the user's basic features and cruising experience | "Your gender: [ ] Female [ ] Male"; "Have you already experienced DCL (Disney Cruise Line)?"; "Are you aiming to attend the maximum amount of activities mentioned in your Personal Navigator or just a few must-see?"
Users' Preferences | 311 | The user evaluates a list of proposed activities by selecting one of the following grades for each listed activity: 1 - Never (not interested at all and won't recommend anyone to attend it); 2 - Not interested; 3 - Neutral; 4 - Interested; 5 - Won't miss | "Sailing Away. Don't Miss Event. Description: It's time to go Sailing Away! Join Mickey and Minnie along with Tinker Bell and the rest of the gang as they welcome you aboard the Disney Fantasy. Available: Day 1, 16:30-17:15. Location: Deck Stage." (grade scale from Never to Won't miss)
Itinerary Planner | 593 | Organisation of the activities into a day-wise itinerary. Given an ordered list of activities with their availability hours, the respondents were asked to indicate whether they intended to join each activity by clicking on "Going" or "Not going". | "11:30 - 15:00. Character Meet & Greet Ticket Distribution. Category: Characters. Location: Port Adventures Desk. Don't Miss Event." ([ ] Going [ ] Not going)
Afterwards questions | 5 | Conclusion questions | "When you were having a choice among different activities of your interest, did you consider the distance to the venue while making your choice?"; "How do you usually manage the list of activities to perform during your vacations?"

- ILS+Scores: We also tested a state-of-the-art itinerary construction algorithm [9] that is based on the Iterated Local Search (ILS) algorithm [13], with activity scores calculated using hybrid scores (content-based, category-based and time-based) and transition probabilities between activities.

Table 3: Participants Statistics

Statistic | Value
# Female users | 7
# Users who have already experienced DCL | 1
# Users who have already experienced any cruise | 4
# Users considering the distance between venues | 8
Managing activities (Not-to-miss list : Daily planning : No planning) | 14 : 4 : 5
Age group (21-30 : > 30) | 16 : 7

Table 4: Dataset Statistics

# Activities | # Days | # Users | # Locations | # Categories
593 | 7 | 23 | 47 | 52

Figure 2: Precision w.r.t. the number of history days.

The algorithms were evaluated in terms of precision. Using the top-n recommendation algorithms, we returned the top-20 activities for each day⁶. Figure 2 displays the precision of each algorithm for a varying number of history days (from 1 to 6). The itinerary recommendation algorithm shows higher precision, indicating that an itinerary better satisfies the user's needs.

⁶ The average number of joined activities per day is 18.

4 DISCUSSION AND CONCLUSION
In this paper, we have considered the problem of personalised itinerary recommendation, with a special focus on cruises. We have identified the characteristics of the data used in itinerary recommendation and have presented an overview of the available datasets. To the best of our knowledge, this is the first attempt to classify and summarise the existing datasets and describe them with respect to these characteristics. Moreover, we have undertaken a user study in order to build a preliminary dataset that satisfies all the characteristics and helps to understand individuals' behaviour in the activity selection process. Though the dataset is not large-scale, the user study reveals general trends in users' behaviour while on board a cruise or while attending a distributed event. We have also discussed the challenges faced by itinerary recommendation and illustrated them with the performed data analysis.

As future work, we plan to create a dataset via crowdsourcing using the CrowdFlower platform. The characteristics presented in Sec. 3.1 will serve as the basis for the new dataset. Another direction of future work consists in proposing a more accurate solution for itinerary recommendation that would address all of the challenges discussed above.

REFERENCES
[1] Igo Brilhante, Jose Antonio Macedo, Franco Maria Nardini, Raffaele Perego, and Chiara Renso. 2013. Where Shall We Go Today?: Planning Touristic Tours with Tripbuilder. In Proc. of the 22nd ACM International Conference on Information & Knowledge Management (CIKM '13). 757–762.
[2] Adriel Dean-Hall, Charles L. A. Clarke, Jaap Kamps, Julia Kiseleva, and Ellen Voorhees. 2015. Overview of the TREC 2015 Contextual Suggestion Track. In Proc. of the 24th Text REtrieval Conference (TREC 2015).
[3] Adriel Dean-Hall, Charles L. A. Clarke, Jaap Kamps, Paul Thomas, Nicole Simone, and Ellen Voorhees. 2013. Overview of the TREC 2013 Contextual Suggestion Track. In NIST Special Publication 500-302: The Twenty-Second Text REtrieval Conference Proceedings (TREC 2013), Ellen M. Voorhees (Ed.).
[4] Adriel Dean-Hall, Charles L. A. Clarke, Jaap Kamps, Paul Thomas, and Ellen Voorhees. 2014. Overview of the TREC 2014 Contextual Suggestion Track. In NIST Special Publication 500-308: The Twenty-Third Text REtrieval Conference Proceedings (TREC 2014), Ellen M. Voorhees and Angela Ellis (Eds.).
[5] Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, and Eric P. Xing. 2010. A latent variable model for geographic lexical variation. In Proc. of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1277–1287.
[6] Florida-Caribbean Cruise Association (FCCA). 2017. Cruise Industry Overview. 11200 Pines Blvd., Suite 201, Pembroke Pines, Florida 33026. http://www.f-cca.com/downloads/2017-Cruise-Industry-Overview-Cruise-Line-Statistics.pdf
[7] Xingjie Liu, Qi He, Yuanyuan Tian, Wang-Chien Lee, John McPherson, and Jiawei Han. 2012. Event-based Social Networks: Linking the Online and Offline Social Worlds. In Proc. of the 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '12).
[8] Augusto Q. Macedo, Leandro B. Marinho, and Rodrygo L. T. Santos. 2015. Context-Aware Event Recommendation in Event-based Social Networks. In Proc. of the 9th ACM Conference on Recommender Systems (RecSys '15). 123–130.
[9] Diana Nurbakova, Léa Laporte, Sylvie Calabretto, and Jérôme Gensel. 2017. Recommendation of Short-Term Activity Sequences During Distributed Events. Procedia Computer Science 108 (2017), 2069–2078. International Conference on Computational Science (ICCS 2017), 12-14 June 2017, Zurich, Switzerland.
[10] Wouter Souffriau, Pieter Vansteenwegen, Greet Vanden Berghe, and Dirk Van Oudheusden. 2013. The Multiconstraint Team Orienteering Problem with Multiple Time Windows. Transportation Science 47, 1 (Feb. 2013), 53–63.
[11] Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2016. YFCC100M: The New Data in Multimedia Research. Commun. ACM 59, 2 (2016), 64–73.
[12] Pieter Vansteenwegen, Wouter Souffriau, and Dirk Van Oudheusden. 2011. The orienteering problem: A survey. Eur. J. Oper. Res. 209, 1 (2011), 1–10.
[13] Pieter Vansteenwegen, Wouter Souffriau, Greet Vanden Berghe, and Dirk Van Oudheusden. 2009. Iterated Local Search for the Team Orienteering Problem with Time Windows. Comput. Oper. Res. 36, 12 (Dec. 2009), 3281–3290.
[14] Dingqi Yang, Daqing Zhang, and Bingqing Qu. 2016. Participatory Cultural Mapping Based on Collective Behavior Data in Location-Based Social Networks. ACM Transactions on Intelligent Systems and Technology (TIST) 7, 3 (2016), 30.
[15] Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. 2009. Mining Interesting Locations and Travel Sequences from GPS Trajectories. In Proc. of the 18th International Conference on World Wide Web (WWW '09). 791–800.
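The interest and attendance categories used to illustrate challenge C-2 in Section 3.3 can be computed in a few lines. The following is a minimal sketch, not the authors' code: it assumes that each user's survey answers are available as a hypothetical mapping from activity identifiers to 1-5 grades (Users' Preferences part) plus a set of identifiers marked "Going" (Itinerary Planner part), and that both refer to the same activities. The function names and the toy grades are illustrative only.

```python
from collections import Counter
from typing import Dict, Set


def is_interested(rating: int, max_rating: int) -> bool:
    """Interest rule of Section 3.3: a grade of 4 or 5 counts as interest;
    a grade of 3 counts only when 3 is the user's highest grade overall."""
    return rating >= 4 or (rating == 3 and max_rating == 3)


def quadrant_counts(ratings: Dict[str, int], going: Set[str]) -> Counter:
    """Count one user's activities per interest/attendance category.

    ratings -- hypothetical layout: activity id -> grade on the 1-5 survey scale
    going   -- ids of the activities the user marked as "Going"
    """
    max_rating = max(ratings.values())  # raises ValueError if ratings is empty
    counts: Counter = Counter()
    for activity, rating in ratings.items():
        interest = "Interested" if is_interested(rating, max_rating) else "Not Interested"
        attendance = "Going" if activity in going else "Not Going"
        counts[f"{interest} & {attendance}"] += 1
    return counts


def not_interested_share_of_joined(counts: Counter) -> float:
    """Fraction of joined activities that the user was not interested in."""
    joined = counts["Interested & Going"] + counts["Not Interested & Going"]
    return counts["Not Interested & Going"] / joined if joined else 0.0


# Toy example with invented grades (not data from the study).
user_ratings = {"a1": 5, "a2": 4, "a3": 3, "a4": 2, "a5": 1}
user_going = {"a1", "a4"}
counts = quadrant_counts(user_ratings, user_going)
print(dict(counts))
print(not_interested_share_of_joined(counts))  # 0.5
```

Because `Counter` returns 0 for missing keys, users with an empty category need no special handling. Summing the per-user counters, e.g. `sum(per_user_counts, Counter())`, yields aggregate quantities of the kind reported in Section 3.3, such as the ratio of Interested & Not Going to Interested & Going activities.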