Ski Resorts Recommendation using Deep Neural Networks

  YASMINE SERDOUK, TIMOTHÉE COUBLE, ERIC COUBLE, and CÉDRIC MARCONE, Valraiso -
  e-Commerce Web Agency, France

  In this work, we present a recommender system based on deep neural networks for a novel application: predicting suitable ski resorts
  for the customers of a web agency. Special attention is paid to feature construction: the features are derived from heterogeneous
  historical data so that they match the answers to a user-friendly questionnaire. The evaluation uses the accuracy metric, and the results
  are promising, with a TOP-10 accuracy of 98.95% and 99.30% for French and non-French customers, respectively.

  CCS Concepts: • Computing methodologies → Artificial intelligence; Machine learning approaches.

  Additional Key Words and Phrases: neural networks, deep learning, recommender systems, ski resorts


  1     INTRODUCTION
  Traditionally, Recommender Systems (RS) are designed to predict missing ratings and provide accurate top-N recommendations
  for queries. However, traditional techniques rely only on user ratings of the items and cannot
  exploit feature vectors that better capture the complex relations between users and items. Deep learning
  has therefore dramatically changed recommendation architectures in recent years [6, 8], with MultiLayer Perceptrons (MLP) [2, 7],
  AutoEncoders (AE) [1, 13], Convolutional Neural Networks (CNN) [9] and other hybrid methods [11, 14].
      This work proposes a novel deep-learning-based RS for a web business company, in order to provide its customers
  with the best online experience. The company, Valraiso, builds commercial websites for the French Ski School
  (ESF: École du Ski Français), and this work aims at recommending suitable ski resorts to the customers visiting these
  websites. In other words, we want to develop an online interface that interacts with clients to determine their
  needs and collect data such as their budget and location. Then, on the basis of these data and the history
  of previously collected orders, we build a deep learning model that recommends suitable
  ski resorts. In the literature, some related tourism applications have been explored. In [4], the authors analyse ski-lift
  transportation data to inform decision making and improve safety in ski resorts. Another interesting
  work deals with RS for the travel and hospitality industries and includes a skimatcher application [3]. Finally, a case study was
  carried out in [5] on a constraint-based RS integrated into a travel advisory system for an Austrian spa
  resort.
      The rest of this paper is organized as follows. Section 2 presents the dataset that we created and the process we followed
  to collect and clean it. Section 3 is devoted to the experimental results. Finally, the main conclusions and future
  work are presented in Section 4.

  2     DATASET CONSTRUCTION
  The dataset is named SRESP, for "Ski RESorts Prediction". Valraiso has archived various
  heterogeneous data since its creation. In order to know which data to select, we first define the questionnaire.

  2.1    Define the questions
  In a real-time context, a questionnaire interacts with the customer and asks a few questions in order to elicit their needs
  and desires. From the customer’s responses, we reconstruct a feature vector that is fed into the prediction


Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


            model for recommendation. To keep the interface user-friendly, the questions must be few and
            concise, and must allow simple answers divided into categories, as follows.

               (1) What kind of skier(s) are you? (In the case of a group or a family, please specify the average level.)
                      Beginner, Intermediate, Confirmed, or Expert
               (2) What type of ski area suits you best? Small, Medium, Large, or Very large resort
               (3) How high would you like to ski? Low, Average, High, or Very high altitude
               (4) What week do you plan to go?
               (5) What is your budget level for this trip? Low, Average, or No limit cost
               (6) Where do you live?
               (7) How old are your children? Under 6 years old, Between 6 and 12 years old, More than 12 years old, or No children

            2.2      Data Selection and Creation
            From Valraiso's archived data, we select the most relevant features, i.e. those that can contribute to the choice of a ski resort
            and that can be used to reconstruct the responses to the questionnaire. The dataset is organized such that
            each row is a client order for a specific ski resort, while the columns represent the features. The created features are:

            2.2.1     Group. This feature combines four pieces of information about the composition of the group, each encoded as a numeric value.

                    • The existence of at least one child under 6 years old: numeric value 3
                    • The existence of at least one child aged between 6 and 12: numeric value 5
                    • The existence of children in both age categories, which means aged under 12: numeric value 8
                    • Or no children within the order, which implies only teenagers or adults in the order: numeric value 0
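
As an illustration, the Group encoding above can be sketched in Python as follows. The function name encode_group and its input (a list of children's ages in years) are assumptions made for this example and are not part of the SRESP dataset. The age boundaries (under 6, and 6 to 12 inclusive) are our reading of the questionnaire wording.

```python
def encode_group(children_ages):
    """Return the Group value (0, 3, 5 or 8) for the children's ages in one order.

    Assumption: "under 6" means age < 6 and "between 6 and 12" means 6 <= age <= 12.
    """
    has_under_6 = any(age < 6 for age in children_ages)
    has_6_to_12 = any(6 <= age <= 12 for age in children_ages)
    if has_under_6 and has_6_to_12:
        return 8  # children in both age categories
    if has_under_6:
        return 3  # at least one child under 6
    if has_6_to_12:
        return 5  # at least one child aged between 6 and 12
    return 0      # only teenagers or adults in the order


# Hypothetical orders: no children, one toddler, and a mixed family.
print(encode_group([]), encode_group([4]), encode_group([4, 10]))
```

Note that the value for both categories (8) equals the sum of the two single-category values (3 + 5), so the feature can also be read as the sum of two indicators.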

            2.2.2     Budget. As most orders are for at least one week, this feature includes the weekly pass fees and the total cost of the order, in euros.
            Since the daily_pass price is for one person (an adult), it is multiplied by the number of
            people included in the order: $\mathit{Budget} = (5 \times \mathit{daily\_pass} \times \mathit{nb\_people}) + \mathit{totalEuro}$
               Since the questionnaire allows only a few answer categories, the feature values are categorized.

                    • Low cost: ]0, 300] €, numeric value 0
                    • Average cost: ]300, 400] €, numeric value 1
                    • No limit cost: ]400, +∞[ €, numeric value 2

               For each categorized feature, the range of each category is chosen so that the data are fairly evenly distributed across the categories.
            Each category is converted into a unique numeric value.
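
The Budget formula and its three categories can be sketched as below. The function names and the sample prices are hypothetical. Only the formula and the thresholds (300 € and 400 €, with right-closed intervals) come from the text.

```python
def compute_budget(daily_pass, nb_people, total_euro):
    """Budget = (5 x daily_pass x nb_people) + totalEuro, in euros."""
    return 5 * daily_pass * nb_people + total_euro


def budget_category(budget):
    """Map a budget in euros to 0 (Low), 1 (Average) or 2 (No limit)."""
    if budget <= 300:
        return 0  # ]0, 300]
    if budget <= 400:
        return 1  # ]300, 400]
    return 2      # ]400, +inf[


# Hypothetical order: one adult, 50 EUR daily pass, 100 EUR order total.
budget = compute_budget(daily_pass=50, nb_people=1, total_euro=100)
print(budget, budget_category(budget))
```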

            2.2.3     Delta_week. The delta between the stay week and the validation week (i.e. the week the order was placed) is
            important and can greatly influence the choice of the ski resort: $\mathit{Delta\_week} = \mathit{stay\_week} - \mathit{validation\_week}$

            2.2.4     stay_week. In addition to Delta_week, the stay week itself has an important influence on the choice of the resort.

            2.2.5     altitude_max. Some customers like to ski at very high altitude, while others (those with children, for instance)
            would choose resorts with a low or an average altitude. Four categories are defined:

                    • Low altitude: ]0, 2100] meters, numeric value 0
                    • Average altitude: ]2100, 2550] meters, numeric value 1
                    • High altitude: ]2550, 2800] meters, numeric value 2
                    • Very high altitude: ]2800, +∞[ meters, numeric value 3




                                                  Table 1. Composition of the dataset.


      Dataset         # classes      # training samples      # test samples      Total # samples
      SRESP_fr           147              916451                 392766              1309217
      SRESP_nonFr        147              160365                  68729               229094


  2.2.6     slopes_difficulty. This feature combines the number of green (nb_g), blue (nb_b), red (nb_r) and black (nb_black)
  ski slopes into a new feature that gives a notion of the resort’s difficulty and thus lets us infer the average level of the
  skiers in the order. It is calculated so as to weight nb_g and nb_b lightly and nb_r and nb_black heavily,
  as follows: $\mathit{slopes\_d} = (0.1 \times \mathit{nb\_g}) + (0.2 \times \mathit{nb\_b}) + (0.6 \times \mathit{nb\_r}) + \mathit{nb\_black}$
     Then, four categories are created.

          • Beginner level: slopes_d in ]0, 6.2], numeric value 0
          • Intermediate level: slopes_d in ]6.2, 14.7], numeric value 1
          • Confirmed level: slopes_d in ]14.7, 22.4], numeric value 2
          • Expert level: slopes_d in ]22.4, +∞[, numeric value 3
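
A short sketch of the slopes_d score and its four levels is given below. The function names and the sample slope counts are hypothetical, and the use of bisect_left to implement right-closed intervals is our choice. The same binning helper would apply to the other categorized features (altitude_max, km_slopes) with their own thresholds.

```python
from bisect import bisect_left

# Upper bounds of the first three right-closed intervals, taken from the text.
SLOPES_D_BOUNDS = [6.2, 14.7, 22.4]


def slopes_d(nb_g, nb_b, nb_r, nb_black):
    """Weighted slope count: green and blue weigh little, red and black weigh more."""
    return 0.1 * nb_g + 0.2 * nb_b + 0.6 * nb_r + nb_black


def categorize(value, upper_bounds):
    """Return the index of the right-closed interval ]lower, upper] containing value."""
    return bisect_left(upper_bounds, value)


# Hypothetical resort: 10 green, 10 blue, 5 red and 2 black slopes.
score = slopes_d(10, 10, 5, 2)               # 1 + 2 + 3 + 2 = 8
level = categorize(score, SLOPES_D_BOUNDS)   # 8 lies in ]6.2, 14.7]: Intermediate
print(score, level)
```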

  2.2.7     km_slopes. This feature represents the size of the resort's ski area, in kilometers of slopes.

          • Small ski resorts: km_slopes in ]0, 70] kilometers, numeric value 0
          • Medium ski resorts: km_slopes in ]70, 146] kilometers, numeric value 1
          • Large ski resorts: km_slopes in ]146, 160] kilometers, numeric value 2
          • Very large ski resorts: km_slopes in ]160, 250] kilometers, numeric value 3

  2.2.8     country_id. The country from which the order was placed is converted into a numeric value. As the dataset
  contains 174 different countries, the numeric values range from 0 to 173.
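
The text does not state how the numeric identifiers are assigned. A minimal sketch, assuming that the distinct country codes are simply sorted and numbered from 0, could look as follows. The function name and the two-letter codes are hypothetical.

```python
def build_country_ids(country_codes):
    """Map each distinct country code to an integer in 0..n-1 (sorted order, an assumption)."""
    return {code: idx for idx, code in enumerate(sorted(set(country_codes)))}


orders = ["GB", "BE", "NL", "GB", "CH", "BE"]  # hypothetical order origins
country_to_id = build_country_ids(orders)
encoded = [country_to_id[code] for code in orders]
print(country_to_id, encoded)
```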

  2.2.9     dep. This feature is dedicated to customers who ordered from France. Since more than 85% of the
  orders in the dataset come from French customers and, according to market experts, French and non-French
  customers behave differently, we decided to build two models: one for French customers and the other for
  customers from the remaining countries. The dep feature therefore represents the department of French customers.


  2.3      SRESP construction
  Two different input vectors are fed to the learning models. The first is for French customers and the second for
  non-French customers:
     $[\mathit{Group}, \mathit{stay\_week}, \mathit{Delta\_week}, \mathit{Budget}, \mathit{altitude\_max}, \mathit{slopes\_diff}, \mathit{km\_slopes}, \mathit{dep}]$,
     $[\mathit{Group}, \mathit{stay\_week}, \mathit{Delta\_week}, \mathit{Budget}, \mathit{altitude\_max}, \mathit{slopes\_diff}, \mathit{km\_slopes}, \mathit{country\_id}]$.
     Table 1 details the composition of the dataset, which can be downloaded from [12]. We split SRESP
  into two subsets: training and test. As our dataset has unbalanced classes, the train-test split is stratified by class,
  i.e. 70% of the samples of each class are used for training, while the rest are used for
  evaluation. Finally, as the feature values have widely different ranges, min-max scaling to the range [0, 1] is applied
  for faster convergence [10].
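
The per-class 70/30 split and the min-max scaling described above can be sketched in plain Python as follows. This is not the authors' code: the function names, the random seed, the rounding of the per-class cut, and the choice to fit the scaling bounds on the training rows only are all assumptions of this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))  # 70% of this class
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return train_idx, test_idx


def fit_min_max(rows):
    """Compute the per-feature minimum and maximum over the given rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale each feature with (x - min) / (max - min); constant features map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Toy example: two unbalanced classes (10 and 20 samples) and three feature rows.
labels = ["resort_a"] * 10 + ["resort_b"] * 20
train_idx, test_idx = stratified_split(labels)
rows = [[0, 10], [5, 20], [10, 30]]
mins, maxs = fit_min_max(rows)
scaled = apply_min_max(rows, mins, maxs)
print(len(train_idx), len(test_idx), scaled)
```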




                                       Table 2. Accuracy (%) results using DNN for both French and non-French data.


                  TOP Accuracy          TOP-1      TOP-2      TOP-3      TOP-4       TOP-5      TOP-6      TOP-7      TOP-8      TOP-9      TOP-10
                  French model           52.15      72.95      83.24      89.11      92.77      95.11       96.66      97.75      98.45       98.95
                non-French model         56.50      77.54      88.22      93.14      96.04      97.76       98.67      98.95      99.14       99.30


            3   EXPERIMENTAL RESULTS
            In this work, the hyperparameters are selected through 5-fold cross-validation (5 different train-test splits). Note that
            reducing the number of input features reduces the variability and complexity of the information fed into the
            model, and that categorizing the feature values further reduces the variability of the inputs. Therefore,
            the challenge is to find an architecture strong enough to deal with such inputs. Our problem comes down to
            a supervised classification problem that aims to predict resorts for new customers. The evaluation is carried out by
            computing the accuracy metric, which assesses how often the model is correct overall. It is reported for both the French and
            non-French models.
                After many experiments, the selected architecture has four hidden layers with 550 nodes each. A ReLU activation is
            placed after each layer, except for the last layer, where a softmax function is used. Batch normalization is then
            applied to normalize each layer’s inputs. Lastly, a dropout layer is added to prevent over-fitting by ignoring 25% of the
            neurons during training. Training runs for 1000 epochs on the French data and only 100 epochs on
            the non-French data. The Adam optimizer is used with a batch size of 6360 and an initial learning rate of 0.003 that is
            decreased by exponential decay: $lr = lr_0 \cdot \beta^{\mathit{step}/\gamma}$, where $lr_0$ is the initial learning rate, $\mathit{step}$ is the current step,
            and $\beta$, $\gamma$ are two parameters set to $\beta = 0.96$ and $\gamma = 90000$.
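
To give a feel for how slowly this schedule decays, the formula can be evaluated directly. The function name below is hypothetical, and the default values are the ones reported in the text.

```python
def learning_rate(step, lr0=0.003, beta=0.96, gamma=90000):
    """Exponential decay: lr = lr0 * beta ** (step / gamma)."""
    return lr0 * beta ** (step / gamma)


# The rate is multiplied by 0.96 every 90000 steps.
for step in (0, 90000, 180000):
    print(step, learning_rate(step))
```

With these settings, the learning rate drops from 0.003 to 0.00288 after 90000 steps, i.e. by 4% per 90000 steps.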
                The DNN results are presented in Table 2 for the French and non-French models. Since we want to
            recommend more than one ski resort, we report the accuracy from TOP-1 to TOP-10. The TOP-n
            accuracy is the fraction of samples for which the true class is among the n most probable classes
            predicted by the model. As expected, the accuracy increases with n. The results show a promising
            TOP-10 performance of 98.95% and 99.30% for the French and non-French models, respectively. The non-French accuracies are also
            always higher than the French ones. This may be explained by the fact that the non-French input vector
            contains the country_id feature instead of the department feature of the French input. Indeed, since we have 171 countries
            and only 99 different French departments, we think that the variability of the countries helps more in discriminating between
            the different classes (i.e. resorts). Finally, the TOP-1 accuracy remains rather low, which can be explained
            by the constraints of the questionnaire, which allows little variability in the data.
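
The TOP-n accuracy defined above can be computed as in the following sketch. The function name and the toy probabilities (three samples, three classes) are made up for illustration and do not come from SRESP.

```python
def top_n_accuracy(probabilities, true_labels, n):
    """Fraction of samples whose true class is among the n most probable classes."""
    hits = 0
    for probs, label in zip(probabilities, true_labels):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label in ranked[:n]:
            hits += 1
    return hits / len(true_labels)


# Toy predictions: each row holds the predicted probability of classes 0, 1 and 2.
probabilities = [
    [0.6, 0.3, 0.1],  # true class 0: ranked first
    [0.2, 0.5, 0.3],  # true class 2: ranked second
    [0.1, 0.2, 0.7],  # true class 1: ranked second
]
true_labels = [0, 2, 1]
top1 = top_n_accuracy(probabilities, true_labels, 1)  # 1 hit out of 3
top2 = top_n_accuracy(probabilities, true_labels, 2)  # 3 hits out of 3
print(top1, top2)
```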

            4   CONCLUSION
            In this paper, the proposed system learns from feature vectors that describe important information specific to both
            customers and target resorts. The proposed DNN architecture achieves promising results,
            with a TOP-10 accuracy of 98.95% and 99.30% for French and non-French data, respectively. As a further improvement,
            we plan to enhance the customers' online experience by recommending a full stay in the resort through a chatbot.

            REFERENCES
             [1] Bing Bai, Yushun Fan, Wei Tan, and Jia Zhang. 2020. DLTSR: A Deep Learning Framework for Recommendations of Long-Tail Web Services. IEEE
                 Transactions on Services Computing 13, 1 (Feb. 2020), 73–85. https://doi.org/10.1109/TSC.2017.2681666




   [2] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM
       Conference on Recommender Systems (RecSys ’16). Association for Computing Machinery, Boston, USA, 191–198.
   [3] Joaquin Delgado and Richard Davidson. 2002. Knowledge Bases and User Profiling in Travel and Hospitality Recommender Systems. In Proceedings
       of the 9th International Conference on Information Technologies in Tourism (ENTER 2002). Springer, Innsbruck, Austria, 1–16.
   [4] Boris Delibasic, S. Radovanović, M. Jovanovic, and Milija Suknovic. 2020. Improving Decision-Making in Ski Resorts by Analysing Ski Lift
       Transportation—A Review. In Advances in Operational Research in the Balkans. Springer, 265–273.
   [5] Dietmar Jannach, Markus Zanker, and Matthias Fuchs. 2009. Constraint-Based Recommendation in Tourism: A Multiperspective Case Study.
       Information Technology and Tourism 11 (2009), 139–155.
   [6] Travis Ebesu and Yi Fang. 2017. Neural Citation Network for Context-Aware Citation Recommendation. In Proceedings of the 40th International ACM
       SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’17). Association for Computing Machinery, Tokyo, Japan, 1093–1096.
   [7] Jianfeng Gao, Patrick Pantel, Michael Gamon, Xiaodong He, and Li Deng. 2014. Modeling Interestingness with Deep Neural Networks. In Proceedings
       of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP ’14). Association for Computational Linguistics, Doha, Qatar,
       2–13.
   [8] Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In Proceedings of the 40th International ACM
       SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’17). Association for Computing Machinery, Tokyo, Japan, 355–364.
   [9] Donghyun Kim, Chanyoung Park, Jinoh Oh, Sungyoung Lee, and Hwanjo Yu. 2016. Convolutional Matrix Factorization for Document Context-Aware
       Recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys ’16). Association for Computing Machinery, Boston,
       USA, 233–240.
  [10] Gopal Krishna Patro and Kishore Kumar Sahu. 2015. Normalization: A Preprocessing Stage. International Advanced Research Journal in Science,
       Engineering and Technology 2, 3 (March 2015), 20–22.
  [11] Maria Nadia Postorino and Giuseppe M. L. Sarnè. 2010. A Neural Network Hybrid Recommender System. In The 20th Italian Workshop on Neural
       Nets. 180–187.
  [12] skiResortData 2021. Ski Resort dataset. Retrieved September 14, 2021 from https://www.kaggle.com/serdoukyasmine/data-frenchskiresorts
  [13] Florian Strub, Romaric Gaudel, and Jérémie Mary. 2016. Hybrid recommender system based on autoencoders. In Proceedings of the 1st Workshop on
       Deep Learning for Recommender System (DLRS ’16). Association for Computing Machinery, Boston, USA, 11–16.
  [14] Muhammet Çakır, Şule Gündüz Öğüdücü, and Resul Tugay. 2019. A Deep Hybrid Model for Recommendation Systems. In International Conference
       of the Italian Association for Artificial Intelligence. Springer, 321–335.

