=Paper=
{{Paper
|id=Vol-1685/paper3
|storemode=property
|title=Understanding the Impact of Weather for POI Recommendations
|pdfUrl=https://ceur-ws.org/Vol-1685/paper3.pdf
|volume=Vol-1685
|authors=Christoph Trattner,Alexander Oberegger,Lukas Eberhard,Denis Parra,Leandro Balby Marinho
|dblpUrl=https://dblp.org/rec/conf/recsys/TrattnerOEPM16
}}
==Understanding the Impact of Weather for POI Recommendations==
Christoph Trattner<sup>*</sup> (Know-Center, Austria), Alexander Oberegger (TUG, Austria), Lukas Eberhard (TUG, Austria), Denis Parra (PUC, Chile), Leandro Marinho (UFCG, Brasil)

<sup>*</sup> Corresponding author: ctrattner@know-center.at

===ABSTRACT===
POI recommender systems for location-based social network services, such as Foursquare or Yelp, have gained tremendous popularity in the past few years. Much work has been dedicated to improving recommendation services in such systems by integrating different features that are assumed to have an impact on people's preferences for POIs, such as time and geolocation. Yet, little attention has been paid to the impact of weather on the users' final decision to visit a recommended POI. In this paper we contribute to this area of research by presenting the first results of a study that aims to recommend POIs based on weather data. To this end, we extend the state-of-the-art Rank-GeoFM POI recommender algorithm with additional weather-related features, such as temperature, cloud cover, humidity and precipitation intensity. We show that using weather data not only significantly increases the recommendation accuracy in comparison to the original algorithm, but also outperforms its time-based variant. Furthermore, we present the magnitude of impact of each feature on the recommendation quality, showing the need to study the weather context in more detail in the light of POI recommendation systems.

===Keywords===
POI Recommender Systems; Location-based services; Weather-Context

===1. INTRODUCTION===
Location-based social networks (LBSN) enable users to check in and share places and relevant content, such as photos, tips and comments that help other users in exploring novel and interesting places in which they might not have been before. Foursquare<sup>1</sup>, for example, is a popular LBSN with millions of subscribers doing millions of check-ins every day all over the world<sup>2</sup>. This vast amount of check-in data, publicly available through Foursquare's data access APIs, has inspired many researchers to investigate human mobility patterns and behaviors with the aim of assisting users by means of personalized POI (point of interest) recommendation services [15, 16].

'''Problem Statement.''' The problem we address in this paper is the POI recommendation problem. Hence, given a user <math>u</math> and their check-in history <math>L_u</math>, i.e., the POIs that they have visited in the past, and current weather conditions <math>C = \{c_1, \ldots, c_{|C|}\}</math>, where <math>c_i</math> are weather features such as temperature, wind speed, pressure, etc., we want to recommend the POIs <math>\hat{L}_u = \{l_1, \ldots, l_{|L|}\}</math> that they will likely visit in the future that are not in <math>L_u</math>.

'''Objective.''' Most of the existing approaches on POI recommendation exploit three main factors (aka contexts) of the data, namely, social, time and geolocation [5, 10, 15]. While these approaches work reasonably well, little attention has been paid to weather, a factor that may potentially have a major impact on users' decisions about visiting a POI or not. For example, if it is raining in a certain place in a certain period of time, the user may prefer to check in at indoor POIs.

In this paper we contribute to this area of research by presenting the first results of a recently started project that exploits weather data to recommend, for a given user within a given city, the POIs that they will likely visit in the future. To this end, we extract several weather features based on data collected from forecast.io, such as temperature, cloud cover, humidity or precipitation intensity, and feed them into a state-of-the-art POI recommender algorithm called Rank-GeoFM [10]. The reason why we decided to build our approach on top of this algorithm is twofold: (i) Rank-GeoFM has been shown to outperform other strong baselines from the literature and (ii) it is very easy to extend it with additional contextual data.
'''Research Questions.''' To drive our research the following three research questions were defined:
* '''RQ1.''' Do weather conditions have a relation with the check-in behavior of Foursquare users?
* '''RQ2.''' Is it possible to improve current POI recommendation quality using these weather features?
* '''RQ3.''' Which weather features provide the highest impact on the recommendations?

'''Contributions.''' To the best of our knowledge, this is the first paper that investigates in detail the extent to which weather features such as temperature, cloud cover, humidity or precipitation intensity impact users' check-in behaviors and how these features perform in the context of POI recommender systems. Although there is literature showing that POI recommender systems can be improved by using some kind of weather context, such as temperature, it is not yet clear how much they add or what type of weather feature is the most or least useful one. Another contribution of this paper is the introduction of a weather-aware recommender method that builds upon a very strong state-of-the-art POI recommender system called Rank-GeoFM. The method is implemented and embedded into the very popular recommender framework MyMediaLite [7] and can be downloaded for free from our GitHub repository (details in Section 8).

'''Outline.''' The structure of this paper is as follows: In Section 2 we highlight relevant related work in this field. Section 3 describes how we enriched Rank-GeoFM with weather data. Section 4 describes the experimental setup and presents results from our empirical analysis. Section 5 presents insights on the results obtained with our weather-aware recommender approach. Finally, Sections 6 and 7 conclude the paper with a summary of our main findings and future directions of the work.

<sup>1</sup> https://foursquare.com/

<sup>2</sup> https://foursquare.com/infographics/10million

Copyright held by the author(s). RecTour 2016 - Workshop on Recommenders in Tourism held in conjunction with the 10th ACM Conference on Recommender Systems (RecSys), September 15, 2016, Boston, MA, USA.

===2. RELATED WORK===
With the advent of LBSNs, POI recommendation rapidly became an active area of research within the recommender systems, machine learning and Geographic Information Systems research communities [2]. Most of the existing research works in this area exploit some sort of combination between some (or all) of the following data sources: check-in history, social relations (e.g. friendship relations), time and geolocations [1, 5, 6, 8, 10, 13, 15]. While these different sources of data (aka contexts) affect the user's decision on visiting a POI in different ways, weather data, which according to common sense may have a great influence on this decision, are still rarely used.

Martin et al. [11] proposed a mobile application whose architecture considered the use of weather data to personalize a geocoding mobile service, but no implementation or evaluation was presented. A similar contribution was made by Meehan et al. [12], who proposed a hybrid recommender system based on time, weather and media sentiment when introducing the VISIT mobile tourism recommender, but they neither implemented nor evaluated it.

Among the few works that have actually used weather in the recommendation pipeline, Braunhofer et al. [3] introduced a recommender system designed to run in mobile applications for recommending touristic POIs in Italy. The authors conducted an online study with 54 users and found that recommendations that take weather information into consideration were indeed able to increase user satisfaction. Compared to this work, our implementation is based on a more recent and state-of-the-art algorithm, and we also provide details of which weather features contribute the most to the recommender performance. In an extension of their initial work, Braunhofer et al. [4] implemented and evaluated a context-aware recommender system which uses weather data. They find that the model which leverages the weather context outperformed the version without it. Although more similar to our current work, they did not provide a detailed feature analysis as the present article does.

In summary, compared to previous works which have used weather as a contextual factor for recommendation systems, we provide detailed information about the recommendation algorithm and we contribute an implementation extending a state-of-the-art matrix factorization model exploiting rich weather data. Moreover, we also provide details on how the weather features were exploited by it, as well as a detailed analysis about the impact of the features on the recommendation performance.

===3. RECOMMENDATION APPROACH===
Our recommendation approach is built upon a state-of-the-art POI recommender algorithm named Rank-GeoFM [10], a personalized ranking based matrix factorization method. We have selected Rank-GeoFM over other alternatives because it has been shown to be a very strong POI recommender method compared to other approaches often cited in the literature. In Li et al. [10] the authors compared Rank-GeoFM against twelve other recommender methods, showing that Rank-GeoFM significantly outperforms strong generic baselines, such as user-KNN, item-KNN CF, WRMF and BPR-MF [7], as well as specialized POI recommender methods, such as BPP [17]. Another reason for choosing Rank-GeoFM is related to its ability to easily accommodate additional features, such as the ones that we plan to use in this work. The aim of Rank-GeoFM is to learn latent parameters that model the relationship between the context of interest.

Table 2 describes the symbols used in the recommender algorithm. For each type of contextual data considered, latent model parameters are introduced. The prediction score of a triple is then computed from these learned latent parameters. The parameters are trained using a fast learning scheme introduced by the authors that is based on Stochastic Gradient Descent (SGD).

{| class="wikitable"
|+ Table 2: The notations used to describe Rank-GeoFM and the incorporation of the weather context.
! Symbol
! Description
|-
| <math>U</math>
| set of users <math>u_1, u_2, \ldots, u_{|U|}</math>
|-
| <math>L</math>
| set of POIs <math>l_1, l_2, \ldots, l_{|L|}</math>
|-
| <math>FC_f</math>
| set of classes for feature <math>f</math>
|-
| <math>F</math>
| set of weather feature classes <math>f_1, f_2, \ldots, f_{|FC_f|}</math>
|-
| <math>\Theta</math>
| latent model parameters containing the learned weights <math>\{L^{(1)}, L^{(2)}, L^{(3)}, U^{(1)}, U^{(2)}, F^{(1)}\}</math> for locations, users and weather features.
|-
| <math>X_{ul}</math>
| <math>|U| \times |L|</math> matrix containing the check-ins of users at POIs.
|-
| <math>X_{ulc}</math>
| <math>|U| \times |L| \times |FC_f|</math> matrix containing the check-ins of users at POIs at a specific feature class <math>c</math>.
|-
| <math>D_1</math>
| user-POI pairs: <math>(u, l) \mid x_{ul} > 0</math>.
|-
| <math>D_2</math>
| user-POI-feature class triples: <math>(u, l, c) \mid x_{ulc} > 0</math>.
|-
| <math>W</math>
| geographical probability matrix of size <math>|L| \times |L|</math> where <math>w_{ll'}</math> contains the probability of <math>l'</math> being visited after <math>l</math> has been visited according to their geographical distance. <math>w_{ll'} = (0.5 + d(l, l'))^{-1}</math>, where <math>d(l, l')</math> is the geographical distance between the latitude and longitude of <math>l</math> and <math>l'</math>.
|-
| <math>WI</math>
| probability that a weather feature class <math>c</math> is influenced by feature class <math>c'</math>. <math>wi_{cc'} = \mathrm{cos\_sim}(c, c')</math>.
|-
| <math>N_k(l)</math>
| set of <math>k</math> nearest neighbors of POI <math>l</math>.
|-
| <math>y_{ul}</math>
| the recommendation score of user <math>u</math> and POI <math>l</math>.
|-
| <math>y_{ulc}</math>
| the recommendation score of user <math>u</math>, POI <math>l</math> and weather feature class <math>c</math>.
|-
| <math>I(\cdot)</math>
| indicator function returning <math>I(a) = 1</math> when <math>a</math> is true and 0 otherwise.
|-
| <math>\epsilon</math>
| margin to soften ranking incompatibility.
|-
| <math>\gamma_w</math>
| learning rate for updates on weather latent parameters.
|-
| <math>\gamma_g</math>
| learning rate for updates on latent parameters from the base approach.
|-
| <math>E(\cdot)</math>
| a function that turns the rating incompatibility <math>\mathrm{Incomp}(y_{ulc}, \epsilon)</math>, which counts the number of locations <math>l' \in L</math> that should be ranked lower than <math>l</math> at the current weather context <math>c</math> and user <math>u</math> but are ranked higher by the model, into a loss <math>E(r) = \sum_{i=1}^{r} \frac{1}{i}</math>.
|-
| <math>\delta_{ucll'}</math>
| function to approximate the indicator function with a continuous sigmoid function <math>s(a) = \frac{1}{1 + \exp(-a)}</math>. <math>\delta_{ucll'} = s(y_{ul'c} + \epsilon - y_{ulc})\,(1 - s(y_{ul'c} + \epsilon - y_{ulc}))</math>
|-
| <math>\left\lfloor \frac{|L|}{n} \right\rfloor</math>
| if the <math>n</math>th location <math>l'</math> was ranked incorrectly by the model, the expectation is that overall <math>\left\lfloor \frac{|L|}{n} \right\rfloor</math> locations are ranked incorrectly.
|-
| <math>g, \mu</math>
| auxiliary variables that save partial results of the calculation of the stochastic gradient.
|}

To add the weather context into Rank-GeoFM, the weather features' values needed to be discretized. This was done to reduce data sparsity. For example, if we considered temperature as a real number, the check-in counts for most specific temperature values would probably be zero. Thus, transforming continuous values of weather features (e.g., temperature) into intervals might alleviate this problem. Hence, a mapping function is introduced (see Equation 1) that converts the weather features into interval bins. <math>|FC_f|</math> defines the number of bins for the current weather feature. We will refer to these bins as feature classes. The best results were obtained with <math>|FC_f| = 20</math> (validated on hold-out data).

:<math>c_f(\mathit{value}) = \left\lfloor \frac{(\mathit{value} - \min(f)) \cdot (|FC_f| - 1)}{\max(f) - \min(f)} \right\rfloor \qquad (1)</math>

To extend the original Rank-GeoFM approach with weather context, three additional latent factors are introduced that are represented by matrices in a <math>K</math>-dimensional space. The first one incorporates the weather-popularity score, which models whether or not a location is popular with respect to a specific weather feature class, and is named <math>L^{(2)} \in \mathbb{R}^{|L| \times K}</math>, where <math>K</math> denotes the size of the latent parameter space. Furthermore, a matrix <math>L^{(3)} \in \mathbb{R}^{|L| \times K}</math> is introduced to model the influence between two feature classes. In other words, <math>L^{(3)}</math> softens the borders between the particular feature classes. The third latent parameter <math>F^{(1)} \in \mathbb{R}^{|FC_f| \times K}</math> is then used to parametrize the feature classes of the specific weather feature. In addition to the latent parameters, a matrix <math>WI \in \mathbb{R}^{|FC_f| \times |FC_f|}</math> is introduced for storing the probability that a weather feature class <math>c</math> is influenced by feature class <math>c'</math>. Denoting <math>x_{ulc}</math> as the frequency with which a user <math>u</math> checked in at POI <math>l</math> with the current weather context <math>c</math>, this probability is calculated as follows:

:<math>wi_{cc'} = \frac{\sum_{u \in U} \sum_{l \in L} x_{ulc}\, x_{ulc'}}{\sqrt{\sum_{u \in U} \sum_{l \in L} x_{ulc}^2}\; \sqrt{\sum_{u \in U} \sum_{l \in L} x_{ulc'}^2}} \qquad (2)</math>

To calculate the recommendation score for a given user <math>u</math>, POI <math>l</math> and weather feature class <math>c</math>, Equation 3 is introduced, where <math>y_{ul}</math> denotes the recommendation score as computed in Li et al. [10].

:<math>y_{ul} = u_u^{(1)} \cdot l_l^{(1)} + u_u^{(2)} \cdot \sum_{l^* \in N_k(l)} w_{ll^*}\, l_{l^*}^{(1)}</math>
:<math>y_{ulc} = y_{ul} + f_c^{(1)} \cdot l_l^{(2)} + l_l^{(3)} \cdot \sum_{c^* \in FC_f} wi_{cc^*}\, f_{c^*}^{(1)} \qquad (3)</math>

Algorithm 1 describes how we incorporated the weather context features into the base Rank-GeoFM approach. Taking the initialization and the hyperparameters from the original approach, we first iterate over all pairs of users and POIs <math>(u, l) \in D_1</math>, where <math>D_1</math> is the set of all check-ins, and do the adjustments of the latent parameters as described in Li et al. [10]. We then introduce an iteration over all triples <math>(u, l, c) \in D_2</math> in order to adjust the latent parameters on the incorrectly ranked venues according to the specific weather context. This adjustment is necessary because the algorithm might rank a triple <math>(u, l, c)</math> correctly while <math>(u, l, c')</math> might be ranked incorrectly. The adjustments are then done in the same way as in the base algorithm, in lines 6-20.

<pre>
Algorithm 1: Rank-GeoFM with weather context

Input:  check-in data D1, D2, geographical influence matrix W,
        weather influence matrix WI, hyperparameters ε, C, α,
        and learning rates γ_g and γ_w
Output: parameters of the model Θ = {L(1), L(2), L(3), U(1), U(2), F(1)}

 1  init: Initialize Θ with N(0, 0.01); shuffle D1 and D2 randomly
 2  repeat
 3    for (u, l) ∈ D1 do
 4      approach from Li et al. [10]
 5    end
 6    for (u, l, c) ∈ D2 do
 7      Compute y_ulc as Equation 3 and set n = 0
 8      repeat
 9        Sample l' and c', compute y_ul'c' as Equation 3
10        n++
11      until I(x_ulc > x_ul'c') I(y_ulc < y_ul'c' + ε) = 1 or n > |L|
12      if I(x_ulc > x_ul'c') I(y_ulc < y_ul'c' + ε) = 1 then
13        η = E(⌊|L| / n⌋) δ_ucll'
14        g = Σ_{c* ∈ FC_f} wi_c'c* f_c*^(1) − Σ_{c+ ∈ FC_f} wi_cc+ f_c+^(1)
15        f_c^(1)  ← f_c^(1)  − γ_w η (l_l'^(2) − l_l^(2))
16        l_l^(3)  ← l_l^(3)  − γ_w η g
17        l_l'^(2) ← l_l'^(2) − γ_w η f_c^(1)
18        l_l^(2)  ← l_l^(2)  + γ_w η f_c^(1)
19      end
20      Project updated factors to accomplish constraints
21    end
22  until convergence
23  return Θ = {L(1), L(2), L(3), U(1), U(2), F(1)}
</pre>

During our studies we found that with a learning rate of <math>\gamma_g = .0001</math>, as used in Li et al. [10], the algorithm did not converge. The reason is that the adjustments are done at a finer granularity, for each <math>(u, l, c)</math> triple and not just on the <math>(u, l)</math> level. Hence, we introduce a new learning rate parameter <math>\gamma_w = .00001</math> for the weather con-

Figure 1: Check-in distributions over the eight weather features. Panels: (a) Cloud cover, (b) Visibility, (c) Moonphase, (d) Precipitation intensity, (e) Pressure, (f) Temperature, (g) Humidity, (h) Windspeed.

====4.1 Datasets====
The dataset we used in this study was obtained from the work of Yang et al. [14]. It is a Foursquare crawl comprising user check-in data from April 2012 to September 2013. The original dataset contains more than 33 million check-ins from 415 cities in 77 countries.

{| class="wikitable"
|+ Table 1: Basic statistics of the dataset.
! City
! #Check-Ins
! #Venues
! #Users
! Sparsity
|-
| Minneapolis || 37,737 || 797 || 436 || 89.1%
|-
| Boston || 42,956 || 1,141 || 637 || 94.3%
|-
| Miami || 29,222 || 796 || 410 || 91.0%
|-
| Honolulu || 16,042 || 410 || 173 || 77.4%
|}
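Table 1 does not say how its Sparsity column was computed. The sketch below tries one common definition, which is an assumption and not taken from the paper: one minus the ratio of check-ins to user-venue cells, counting every check-in as if it filled a distinct cell. The function name `sparsity` and the dictionary `TABLE_1` are illustrative; the numbers are copied from Table 1.

```python
# Check-in statistics copied from Table 1: (check-ins, venues, users).
TABLE_1 = {
    "Minneapolis": (37_737, 797, 436),
    "Boston": (42_956, 1_141, 637),
    "Miami": (29_222, 796, 410),
    "Honolulu": (16_042, 410, 173),
}


def sparsity(check_ins: int, venues: int, users: int) -> float:
    """Assumed definition (not stated in the paper): fraction of
    user-venue cells without a check-in, treating each check-in as
    filling its own cell."""
    return 1.0 - check_ins / (venues * users)


# Print the assumed sparsity per city as a percentage with one decimal.
for city, stats in TABLE_1.items():
    print(f"{city}: {100 * sparsity(*stats):.1f}%")
```

Under this assumption the values for Minneapolis, Miami and Honolulu round to the figures reported in Table 1, while Boston comes out slightly lower than the reported 94.3%, so the authors' exact definition may differ. Repeated check-ins by the same user at the same venue, for instance, would fill fewer cells and give a higher sparsity than this sketch computes.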
However, before dealing with our problem on such a large scale, we decided to first concentrate our investigation on a small set of US cities. We selected four cities that could represent some weather variety in order to investigate whether our model is robust to such a variety of weather conditions (see Figure 3). Table 1 provides an overview of the check-in statistics of the four target cities chosen for our experiments: Minneapolis, Boston, Miami and Honolulu. Concerning the weather information, we used the API of forecast.io<sup>3</sup> to collect, for each