=Paper=
{{Paper
|id=Vol-1405/paper-05
|storemode=property
|title=Personalized Location Recommendation by Aggregating Multiple Recommenders in Diversity
|pdfUrl=https://ceur-ws.org/Vol-1405/paper-05.pdf
|volume=Vol-1405
|dblpUrl=https://dblp.org/rec/conf/recsys/LuWMTC15
}}
==Personalized Location Recommendation by Aggregating Multiple Recommenders in Diversity==
Ziyu Lu (Dept. of Computer Science, The Univ. of Hong Kong; zylu@cs.hku.hk)
Hao Wang (State Key Lab. for Novel Software Technology, Nanjing University; wanghao@nju.edu.cn)
Nikos Mamoulis (Dept. of Computer Science, The Univ. of Hong Kong; nikos@cs.hku.hk)
Wenting Tu (Dept. of Computer Science, The Univ. of Hong Kong; wttu@cs.hku.hk)
David W. Cheung (Dept. of Computer Science, The Univ. of Hong Kong; dcheung@cs.hku.hk)

ABSTRACT

Location recommendation is an important feature of social network applications and location-based services. Most existing studies focus on developing a single method or model for all users. By analyzing real location-based social networks, in this paper we reveal that users' decisions on place visits depend on multiple factors, and that different users may be affected differently by these factors. We design a location recommendation framework that combines the results of various recommenders, each considering different factors. Our framework estimates, for each individual user, the underlying influence of each factor on her. Based on this estimation, we aggregate the suggestions from the different recommenders to derive personalized recommendations. Experiments on Foursquare and Gowalla show that our proposed method outperforms state-of-the-art location recommendation methods.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Information Filtering

Keywords: Location recommendation, personalization, aggregation

1. INTRODUCTION

Due to the proliferation of mobile devices, location-based social networks (LBSNs) have emerged. Some of these social networks are dedicated to location sharing (e.g., Gowalla, Foursquare), while others provide location-based features together with other social networking services (e.g., Facebook, Twitter). In either case, users can share with their friends the experience of visiting locations (such as restaurants, view spots, etc.). Location recommendation is very important for these applications, because it provides a better user experience and may thus contribute to business promotion in cyber-physical space.

This paper focuses on check-in records of LBSN users. An LBSN user may check in at different locations from time to time, either by explicit check-in operations via mobile applications or by implicit actions such as posting photos with location annotations. Therefore, over time, a user generates a sequence of check-in records. It is important to distinguish user-location check-ins from their analogues in classic recommender systems (e.g., user-item ratings [1]). As Hu et al. [12] have pointed out, a rating is explicit, indicating directly whether and how much a user likes an item, whereas a check-in record is implicit, with some unique characteristics:

1. There is no negative feedback in check-in records, i.e., check-ins do not indicate whether a user dislikes a location.

2. The check-in frequency, even after normalization, is not a reliable indicator of how much a user likes a location.

Despite any differences between ratings and check-ins, conventional collaborative filtering (CF) methods (e.g., [1]) are computationally applicable to check-in data. Recently, many new approaches have been proposed, making use of the social, geographical, temporal, and semantic information in LBSN data (e.g., [7, 10, 18, 26] and others discussed later in Section 2). However, most existing methods take a unified perspective towards the recommendation problem, even though some of them do consider more than one aspect of the check-in data. In other words, most existing methods focus on developing one single method or model for all users.

In this paper we argue, and reveal by analyzing real LBSN datasets, that users' decisions on where to go depend on multiple factors, and different users may be affected differently by these factors. For example, some users are easily influenced by their friends and do not care much about the places to visit, while others are more independent and more concerned about the places themselves. Therefore, we aim to personalize location recommenders. To this end, we propose a general framework for combining multiple, potentially useful recommenders. A pair-based optimization problem is formulated and solved to identify the underlying preference of each user. The contributions of this paper are as follows:

• By analyzing two real LBSN datasets, Foursquare [10] and Gowalla [6], we reveal the diversity among recommendations made by different recommenders and the diversity of user preferences over recommenders.

• We propose a framework for location recommendation. Under our framework we are able to personalize location recommenders to better serve the users.

• We evaluate our method using Foursquare and Gowalla data. The results show that our method outperforms the state-of-the-art location recommenders.

Copyright held by the author(s). LocalRec'15, September 19, 2015, Vienna, Austria.
The rest of the paper is organized as follows. By analyzing Foursquare and Gowalla data, Section 2 shows the diversity in recommenders and user preferences. Motivated by certain observations, Section 3 describes LURA, our proposed framework, which is extensively evaluated in Section 4. Finally, Section 5 presents some related work and Section 6 concludes the paper.

2. DATA AND RECOMMENDERS

In this section we first introduce the datasets and formally define the location recommendation problem. Then, we select 11 representative recommenders which, to the best of our knowledge, cover all the factors that state-of-the-art location recommendation methods consider. By analyzing their performance on different users, we demonstrate the diversity of (i) the recommendations from the representative recommenders and (ii) users' check-in behaviors.

2.1 Datasets & Recommendation Problem

We use the Foursquare [10] and Gowalla [6] datasets. In both datasets, there are users (U), locations (L), and check-ins, each in the form of a tuple (u, ℓ, t) recording the event that user u visited location ℓ at time t. Note that from the tuples we can infer the check-in frequency c_{u,ℓ} of user u at location ℓ. In many recommenders, the matrix C = (c_{u,ℓ})_{u∈U, ℓ∈L} is the primary data source. Associated with each location there is a longitude-latitude coordinate. Friendships are represented as undirected, unweighted pairs of users. In addition, for each location we have collected its semantic information (i.e., category) from the Foursquare API¹.

¹https://developer.foursquare.com/

We filtered the datasets to include only users with at least 10 check-in records and locations visited at least twice. The resulting Foursquare dataset has 11,326 users, 96,002 locations, and 2,199,782 check-in records over 364 days; Gowalla is a bit larger, containing 74,725 users, 767,936 locations, and 5,829,873 check-in records over a period of 626 days.

Top-N location recommendation. Given a user u, the top-N location recommendation problem is to suggest to u a list of N locations, previously unvisited by u, with the expectation that u will be intrigued to visit (some of) these locations.

2.2 Representative Recommenders

We select 11 representative recommenders that consider social, geographical, temporal, as well as semantic aspects of the data.

2.2.1 User-based CF Methods

User-based CF methods assume that similar users have similar preferences over locations; thus the score of location ℓ for user u, score(u, ℓ), is computed as the similarity-weighted average of other users' visits to ℓ:

score(u, ℓ) = Σ_{v∈U} w_{u,v} · c_{v,ℓ} / Σ_{v∈U} w_{u,v}.

The N locations with the largest scores are recommended to user u. Different weighting schemes (i.e., w_{u,v}) yield different recommenders (R1–R5).

R1: User-based CF (UCF). w_{u,v} can be the cosine similarity between users u and v, i.e., w_{u,v} = cos(c_u, c_v), where c_u and c_v are the corresponding rows in the check-in matrix C. This is the conventional user-based CF method [1].

R2: Friend-based CF (FCF). Similarity between users can be reflected by their common friends. For friends u and v, let w_{u,v} = Jaccard(F_u, F_v), where F_* denotes the set of friends of a user and Jaccard(·, ·) computes the Jaccard index. This recommender was proposed by Konstas et al. [14].

R3: Friend-location CF (FLCF). In addition to common friends, commonly visited locations may also reflect the closeness of two friends: w_{u,v} = η · Jaccard(F_u, F_v) + (1 − η) · Jaccard(L_u, L_v), where u and v are friends and L_* is the set of locations a user has visited [14]. Parameter η ∈ [0, 1] weighs common friends against common locations. Setting η = 0.1 achieves the best performance for this method on our data.

R4: Geo-distance CF (GCF). The rationale of GCF is that nearby friends are more influential than faraway ones. This intuition is used by Ying et al. [28], where the weight w_{u,v} is a rescaling of the geographical distance between u and v:

w_{u,v} = 1 − geodist(u, v) / max_{w∈F_u} geodist(u, w).

The location of each user can be inferred from her most frequently visited locations.

R5: Category CF (CCF). Consider users as keywords and location categories as documents; between user u and category C there can be a relevance score, rel(u, C) (e.g., TF-IDF). To measure the similarity between two users u and v, Bao et al. [2] consider the sum of the minimum relevance scores over all categories, i.e., sim(u, v) = Σ_C min{rel(u, C), rel(v, C)}. This value is then penalized by the difference between the users' randomness in preferences; thus the weight w_{u,v} is

w_{u,v} = sim(u, v) / (1 + |ent(u) − ent(v)|),

where ent(·) is the entropy of a user's preference over categories.

2.2.2 Item-based CF Methods

Item-based CF methods take a "transposed" view of the data, i.e., users are recommended to items instead of the other way round. When the weight w_{ℓ,ℓ'} between two locations is properly defined, the score of user u for location ℓ, score(u, ℓ), can be computed as

score(u, ℓ) = Σ_{ℓ'∈L} w_{ℓ,ℓ'} · c_{u,ℓ'} / Σ_{ℓ'∈L} w_{ℓ,ℓ'}.

R6: Item-based CF (ICF). Similar to UCF, w_{ℓ,ℓ'} can be the cosine similarity cos(c_ℓ, c_{ℓ'}), where c_ℓ and c_{ℓ'} are the corresponding columns in the check-in matrix C [1, 21].

R7: Time-weighted CF (TCF). Recent check-in records are relatively more reliable if we consider any evolution of user preference. Ding and Li [7] use an exponential discounting function f(t) = e^{−αt} to describe the temporal bias in w_{ℓ,ℓ'}. Namely, w_{ℓ,ℓ'} = cos(c_ℓ, c_{ℓ'}) · f(t_{u,ℓ'}), where t_{u,ℓ'} is the time of the user's check-in at ℓ' and α is the decreasing rate. We set α = 17, with which TCF achieves its best performance.

2.2.3 Probabilistic Methods

Besides the CF family, another methodology for making recommendations is to estimate the probability of user u visiting location ℓ, conditioned on the check-in history of u, i.e.,

score(u, ℓ) = Pr{ℓ | L_u}.

The N most probable locations are recommended to the user. The next three recommenders (R8–R10) attempt to estimate Pr{ℓ | L_u}.
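The user-based CF scoring above (Section 2.2.1, with the cosine weights of R1) can be sketched as follows. This is a minimal, dense-matrix illustration with hypothetical helper names, not the authors' implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two check-in count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def ucf_scores(C, u):
    """User-based CF (R1): score(u, l) is the cosine-weighted average of
    other users' check-ins at l; already-visited locations are excluded."""
    weights = [cosine(C[u], C[v]) if v != u else 0.0 for v in range(len(C))]
    denom = sum(weights) or 1.0
    scores = {}
    for l in range(len(C[u])):
        if C[u][l] == 0:  # recommend only previously unvisited places
            scores[l] = sum(w * C[v][l] for v, w in enumerate(weights)) / denom
    return scores

# Toy check-in matrix C: 3 users x 4 locations
C = [[2, 0, 1, 0],
     [1, 0, 1, 1],
     [0, 3, 0, 1]]
scores = ucf_scores(C, 0)          # candidate locations for user 0: 1 and 3
best = max(scores, key=scores.get)
```

In this toy example, user 1 is the only user similar to user 0, so user 1's visit to location 3 dominates the recommendation.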
R8: Power-law model (PLM). Ye et al. [26] study pairs of locations that have at least one common visitor, P = {(ℓ, ℓ') | U_ℓ ∩ U_{ℓ'} ≠ ∅}, and report that the set of all distance values, {geodist(ℓ, ℓ')}_{(ℓ,ℓ')∈P}, follows a power-law distribution. Hence, Pr{ℓ | L_u} = Π_{ℓ'∈L_u} a · geodist(ℓ, ℓ')^b, where the parameters a and b are determined by fitting the data.

R9: Kernel density model (KDM). For a user u with past check-ins at locations L_u = {ℓ_1, ℓ_2, ..., ℓ_n}, this factor estimates the probability of u visiting a new location ℓ using kernel techniques: Pr{ℓ | L_u} = (1/n) Σ_{j=1}^{n} K_h(geodist(ℓ, ℓ_j)), where K_h(·) is a scaled kernel [4] trained on an individual basis. h is set to n^{−1/(d+4)}, where d = 1 is the dimensionality of the data (distance values). This model is used by Zhang and Chow [30].

R10: Spatial kernel density model (SKDM). This is our improvement over R9. Instead of geodist(ℓ, ℓ'), we directly consider distances between latitudes and longitudes and employ a 2D kernel density estimation, i.e., Pr{ℓ | L_u} = (1/n) Σ_{j=1}^{n} K_h(lat(ℓ) − lat(ℓ_j), lon(ℓ) − lon(ℓ_j)); here, lat(·) and lon(·) give the latitude and longitude of a location, and K_h is a 2D scaled kernel, with h = n^{−1/(d+4)} = n^{−1/6}.

2.2.4 Matrix Factorization

The last recommender belongs to the matrix factorization (MF) family. MF attempts to find latent structures of the check-in matrix C. In particular, MF tries to find low-rank matrices (i.e., latent user and location profiles) P and Q such that Ĉ = PQ is a good approximation of C. A zero-valued entry c_{u,ℓ} in C may thus have a good estimation ĉ_{u,ℓ} in Ĉ; recommendations can then be made.

R11: Implicit matrix factorization (IMF). Proposed by Hu et al. [12], this is a modification of conventional MF for implicit user feedback (such as check-ins).

Figure 1: Distribution of diversity values. ((a) Foursquare; (b) Gowalla.)
2.3 Diversity in Recommendations

We now show that the 11 recommenders (R1–R11) generate very different recommendations. We use the first 300 days of Foursquare and the first 420 days of Gowalla to construct the recommenders, involving 3,942 and 1,307 users respectively. We compare the top-N locations suggested by the different recommenders, for N = 10.² For two such length-N recommendation lists, L1 and L2, the Jaccard index can be used to measure their similarity. Therefore, for each user, we measure the diversity of the 11 recommenders by pairwise aggregation [11]:

diversity(L1, L2, ..., Ln) = 2 · Σ_{i<j} (1 − Jaccard(L_i, L_j)) / (n · (n − 1)).

Intuitively, this value is close to 0 if the L_i's are all highly similar, and close to 1 if they are pairwise different.

Figure 1 shows the distribution of the diversity values. In Foursquare, the diversity values are all between 0.79 and 0.99, with an average of 0.92, whereas in Gowalla the minimum, maximum, and average values are 0.58, 0.99, and 0.93 respectively. This indicates that the 11 recommenders generate very different results.

2.4 Diversity in User Preferences

Users may also have different preferences over the aspects on which recommenders are based (i.e., friendship, geographical distance, etc.). To understand user preferences, we test the 11 recommenders using Foursquare (Days 1-300 for training and Days 301-360 for testing). For each user u, each recommender generates a list of 10 locations. The performance of the recommender is then measured by the number of hits (i.e., the number of recommended locations that are actually visited by u). Hence, a user u can be depicted by an 11-dimensional vector whose jth element is the performance of recommender Rj.

Figure 2: Diversity in user preferences. ((a) Centers of the clusters; (b) statistics of the clusters.)

To present the preferences of the 3,942 Foursquare users, we group the users into 8 clusters using the K-means algorithm. (The number 8 is chosen to get the best modularity from the resulting clusters.) Figure 2(a) shows the centers of the 8 clusters in parallel coordinates, where an 11-dimensional vector is represented as a sequence of 11 values. Figure 2(b) shows some statistics of each cluster. For example, the entry "1, 1.65%, 3.06" means that Cluster 1 contains 1.65% of the users, and the variance within the cluster is 3.06. We see that the 11 recommenders perform differently on the 8 clusters.

The above analysis supports our claims: (i) different recommenders make very different recommendations and (ii) different users have different behavioral patterns with regard to visiting locations. This basically indicates that a good recommender for one person is not necessarily good for another. Combining different recommenders may produce better recommendations, since multiple recommenders together may cover a wider range of factors that potentially affect users' behaviors.

3. OUR METHOD

In this section we explain LURA, our framework for combining different location recommenders. (The name "LURA" comes from its execution cycle of Learn-Update-Recommend-Aggregate.) Algorithm 1 outlines the steps of LURA, which can be repeated every Δt time (e.g., every 3 days or 1 week), as long as there is a need for new recommendations and there are sufficient data. LURA is based on two important elements: recommenders and user preferences; every time LURA runs, it keeps these elements up to date. Specifically, if invoked at time t, LURA first learns the user's current preferences α_u^t(i) (i.e., weights) over the different recommenders; this is done by testing the recommendations made at time t − Δt against the actual check-ins during the period (t − Δt, t] (Line 1). Then, LURA updates the recommenders to make use of all the data available (Line 2). After that, LURA makes recommendations to users by aggregating the results of the component recommenders (Lines 3-4).

Algorithm 1 LURA(G_t, R^{t−Δt}, u)
Input: G_t: the snapshot of the LBSN up to the t-th (current) day; R^{t−Δt}: a set of n recommenders, trained using G_{t−Δt}; u: a user
Output: N recommended locations for user u at time t
(At time t − Δt, each recommender R_i^{t−Δt} has recommended N locations to u, R_i^{t−Δt}(u) = [ℓ_{ij}^{t−Δt}]_{j=1,2,...,N}.)
1: Learn user u's current preferences, α_u^t(i), on each recommender R_i^{t−Δt}, based on the recommendations R_i^{t−Δt}(u) and u's check-in facts during the time period (t − Δt, t]
2: Update R^{t−Δt} to R^t (or rebuild from scratch) using G_t
3: Recommend: each recommender R_i^t recommends N locations to u, in the form of N location-score pairs, R_i^t(u) = [ℓ_{ij}^t]_{j=1,2,...,N}
4: Aggregate the recommendations R^t(u) = {R_i^t(u)}_{i=1,2,...,n} using the weights α_u^t = {α_u^t(i)}_{i=1,2,...,n} to generate the final recommendation of N locations to u

Although maintaining and executing the individual recommenders (Lines 2-3) is straightforward, LURA's novelty lies in the learning of the user preferences α_u^t (Section 3.1)³ and in the strategies for aggregating the outputs R^t(u) of the different recommenders (Section 3.2). In general, the aggregation is a linear combination:

s^t(u, ℓ; α_u^t, φ) = Σ_{i=1}^{n} α_u^t(i) · φ(R_i^t(u, ℓ)).  (1)

Here R_i^t(u, ℓ) is the score of the user-location pair (u, ℓ) estimated by recommender R_i^t, and φ(·) is a strategy for handling individual scores. When there is no ambiguity about α_u^t or φ in the context, we may omit one or both of them to simplify the notation.

3.1 Learning User Preferences

At time t, LURA has a set of recommenders R^{t−Δt} = {R_1^{t−Δt}, R_2^{t−Δt}, ..., R_n^{t−Δt}}, and each R_i^{t−Δt} has suggested N locations to user u at time t − Δt. The new data during the period (t − Δt, t] are used to evaluate the quality of each R_i^{t−Δt} with regard to user u. This evaluation eventually results in weights α_u^t(i), which are then used to guide the aggregation process.

As user check-in data are implicit, we adopt a pairwise methodology for preference learning. Consider a user u and her visiting history up to time t, L_u^t; let L_+^t = L_u^t \ L_u^{t−Δt} and L_−^t = {ℓ | ℓ ∉ L_u^t}. For two locations ℓ ∈ L_+^t and ℓ' ∈ L_−^t, it is to some extent reasonable to assume that u prefers ℓ to ℓ' (denoted as ℓ >_u ℓ'). Clearly, a good recommendation should be highly consistent with >_u. This means that, for the linear aggregation of Formula 1, the pairs (ℓ, ℓ') ∈ P_u^t = L_+^t × L_−^t can be used to tune the weights α_u^t(i). We consider maximizing the likelihood Pr{P_u^t | α_u^t}, which is the probability of observing P_u^t given the preference α_u^t. Assuming that the pairs (ℓ, ℓ') ∈ P_u^t are independent, then

Pr{P_u^t | α_u^t} = Π_{(ℓ,ℓ')∈P_u^t} Pr{ℓ >_u ℓ' | α_u^t}.

Computing Pr{ℓ >_u ℓ' | α_u} is nontrivial, but the intuition is that this probability should be proportional to the difference between the scores s(u, ℓ; α_u) and s(u, ℓ'; α_u): the larger the difference s(u, ℓ; α_u) − s(u, ℓ'; α_u), the more confident we are in concluding ℓ >_u ℓ'. Based on this idea, we set

Pr{ℓ >_u ℓ' | α_u} = σ(d_u^{ℓ,ℓ'}(α_u)),

where σ is the logistic function, σ(x) = 1 / (1 + e^{−x}), which produces a probability in the range [0, 1], and d_u^{ℓ,ℓ'}(α_u) = s(u, ℓ; α_u) − s(u, ℓ'; α_u). For ease of algorithm design, the actual objective function being optimized is

ln Pr{P_u^t | α_u} = Σ_{(ℓ,ℓ')∈P_u^t} ln σ(d_u^{ℓ,ℓ'}(α_u)).

The learning approach (LEARNPREFERENCE) for finding α_u^t based on α_u^{t−Δt} uses a stochastic gradient descent technique, as shown in Algorithm 2. Since the training space P_u^t is large, samples are used for learning (Lines 3-5).

Algorithm 2 LEARNPREFERENCE(P_u^t, α_u^{t−Δt}, M, K)
Input: P_u^t = L_+^t × L_−^t; α_u^{t−Δt}; number of iterations M; number of samples K
Output: updated user preference α_u^t
1: for j = 1, 2, ..., M do
2:   α(j) ← α_u^{t−Δt}
3:   Draw K sample pairs from the training space P_u^t
4:   for each sample pair p = (ℓ, ℓ') do
5:     α(j) ← update with p and α(j) (gradient descent)
6: α_u^t ← (1/M) Σ_{j=1}^{M} α(j)

The gradient descent update in Line 5 is done with the following formula:

α(j) ← (1 − γ) · α(j) + γ · τ · ∇_{α(j)} d_u^{ℓ,ℓ'}(α(j)) / (1 + e^{d_u^{ℓ,ℓ'}}),  (2)

where γ ∈ (0, 1) is the learning rate, τ is the step size, and ∇_{α(j)} d_u^{ℓ,ℓ'} is the gradient:

∇_{α(j)} d_u^{ℓ,ℓ'} = [ φ(R_1^{t−Δt}(u, ℓ)) − φ(R_1^{t−Δt}(u, ℓ')), φ(R_2^{t−Δt}(u, ℓ)) − φ(R_2^{t−Δt}(u, ℓ')), ..., φ(R_n^{t−Δt}(u, ℓ)) − φ(R_n^{t−Δt}(u, ℓ')) ]^T.  (3)

After M independent iterations, Algorithm 2 terminates with a personalized weight vector α_u^t (Line 6).

The quality of the samples (Line 3) is of essential importance in Algorithm 2. Intuitively, if a sampled pair (ℓ, ℓ') is such that φ(R_i^{t−Δt}(u, ℓ)) ≈ φ(R_i^{t−Δt}(u, ℓ')) for all R_i^{t−Δt} (i.e., no recommender has a distinct preference between ℓ and ℓ'), then it does not provide much information for learning (i.e., ∇_{α(j)} d_u^{ℓ,ℓ'} ≈ 0). Therefore, we use the strategies proposed by Rendle and Freudenthaler [19] to sample informative pairs. Since the size of L_+^t is typically small, the number of samples K is usually proportional to |L_+^t|, e.g., K = K_0 · |L_+^t| for some integer K_0 > 1. Therefore, sampling in L_+^t simply samples each ℓ ∈ L_+^t uniformly at random (i.e., K_0 times on average), and the strategies for sampling informative pairs in fact focus on the set L_−^t.

Random sampling (RS). This is the basic strategy. Locations in L_−^t are selected uniformly at random, i.e.,

Pr{ℓ' | u} = 1 / |L_−^t|.

Static sampling (SS). This strategy favors popular locations, i.e., locations with many visitors have higher chances of being selected. Specifically,

Pr{ℓ' | u} ∝ exp(−rank(ℓ') / λ), λ > 0,

where rank(·) is the "1234" ranking of L_−^t by check-in frequencies; smaller numbers are assigned to more popular locations.

Adaptive sampling (AS). When sampling, this strategy gives higher chances to locations with higher scores (i.e., locations considered as promising by the recommender):

Pr{ℓ' | u} ∝ exp(−rank(u, ℓ') / λ), λ > 0,

where rank(u, ℓ') is the "1234" ranking of ℓ' based on the score s^{t−Δt}(u, ℓ'). Such promising yet unvisited locations may be more informative in identifying a user's preference.

²Note that sometimes a recommender may fail to generate a list of length 10. For example, FCF (R2) requires that the target user has some friends, but loners do exist in LBSNs. In such cases, we complement the length-10 list with the most popular locations.
³It is worth mentioning that a trivial implementation of LURA is to put equal weights on the component recommenders. This, however, usually leads to very bad recommendations due to the diversities we studied in Section 2 (Figures 1-2). Indeed, the essence of learning is to identify the good recommenders out of a large population of bad ones, with regard to some individual user.
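The pairwise learning loop of Section 3.1 can be sketched as follows. This is a minimal sketch of Algorithm 2 with the Formula 2 update and random sampling (RS); the function name, data layout, and toy parameters are illustrative assumptions, not the authors' code:

```python
import math
import random

def learn_preference(phi, pos, neg, alpha0, M=20, K=50, gamma=0.1, tau=1.0, seed=0):
    """Sketch of LEARNPREFERENCE: pairwise SGD on ln sigma(d), using the
    update alpha <- (1 - gamma)*alpha + gamma*tau*grad / (1 + e^d)  (Formula 2).
    phi[i][l] is the scaled score phi(R_i(u, l)) of recommender i for
    location l; pos/neg play the roles of L+ and (a sample of) L-."""
    rng = random.Random(seed)
    n = len(alpha0)
    runs = []
    for _ in range(M):                                  # M independent iterations
        a = list(alpha0)
        for _ in range(K):                              # K sampled pairs (RS)
            l, lp = rng.choice(pos), rng.choice(neg)    # (l, l') in L+ x L-
            grad = [phi[i][l] - phi[i][lp] for i in range(n)]   # Formula 3
            d = sum(a[i] * grad[i] for i in range(n))           # d_u^{l,l'}(a)
            coef = gamma * tau / (1.0 + math.exp(d))
            a = [(1 - gamma) * a[i] + coef * grad[i] for i in range(n)]
        runs.append(a)
    return [sum(r[i] for r in runs) / M for i in range(n)]      # Line 6: average

# Two recommenders: #0 ranks the newly visited location (0) above the
# unvisited one (1); #1 does the opposite. Learning should weight #0 higher.
phi = [{0: 1.0, 1: 0.1}, {0: 0.1, 1: 1.0}]
alpha = learn_preference(phi, pos=[0], neg=[1], alpha0=[0.5, 0.5])
```

The update rewards recommenders whose scaled scores separate visited from unvisited locations, which is exactly the informativeness criterion the sampling strategies above try to maximize.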
Then, P rank(u, `0 ) t Pr `0 |u ∝ exp − , λ > 0, u∈A Hu λ Precision@N = , N · |A| P where rank(u, `0 ) is the “1234” ranking of `0 based on the score t u∈A Hu st−∆t (u, `0 ). Such promising yet unvisited locations may be more Recall@N = P . (t,t+∆t] informative in identifying a user’s preference. u∈A Lu 3.2 Recommendation Aggregation where A is the set of active users. For a user u to be active, she must The last step in our LURA framework is to aggregate the rec- t (i) visit at least 5 new locations in the learning period Tlearn so that ommendations from all the component recommenders, and then LURA can infer her preference, and (ii) visit at least 1 new location provide the user u with a final list of N items (Formula 1). We t in the testing period Ttest so that the evaluation is nontrivial. consider the following two different aggregation strategies. Score-based aggregation (SA). This strategy is to use the scaled 4.2 Different Implementations of LURA score of user-item pair We first study different implementations of LURA, i.e., we com- (u, `) estimated by each recommender (at pare different sampling and aggregation strategies presented in Sec- time t). ϕ Rit (u, `) is defined as tions 3.1 and 3.2, and see how they affect the performance of Rit (u, `) LURA. We have 3 sampling strategies and 2 aggregation strategies, ϕ Rit (u, `) = , i = 1, 2, · · · , n. max`0 ∈Lt− Rit (u, `0 ) thus in total we have 6 different implementations of LURA. Figure 3 shows the comparison result among these 6 implemen- Rank-based aggregation (RA). This strategy considers the tations with respect to varying N . As we can see, the performance ranked position of a location `. Given a ranked list of N locations, of different implementations are nearly the same, with adaptive `1 , `2 , · · · , `N , this strategy assigns higher scores to top locations. sampling (AS) being slightly better than the other two sampling In particular, ϕ(·) is defined as strategies in most of the cases. 
On the other hand, we observe 1 that the performance of aggregation strategies is data-dependent: ϕ Rit (u, `) = 1 − (ranki (u, `) − 1) , i = 1, 2, · · · , n, on Foursquare, rank-based aggregation (RA) is slightly better than N score-based aggregation (SA) while on Gowalla it is the other way where ranki (u, `) is the ranked position of a location ` in the rec- round. This could be due to the fact that Gowalla has relatively ommendation list provided to u by Rit . This strategy is a common more data for learning user preferences (29,996 check-ins for 1,307 variant of Borda count [8]. active users, comparing to 38,688 check-ins for 3,942 active users in Foursquare), and therefore the final scores are more accurate. 4. EXPERIMENTS For both Foursquare and Gowalla, precision and recall values In this section, we evaluate LURA on Foursquare and Gowalla are all at the level of 1%-10%. Such performance of location rec- datasets. In Section 4.1 we explain the experiment setup as well as ommendation is due to data sparsity, i.e., out of a huge collection the evaluation metrics. Then, in Section 4.2 we study how different of locations (96,002 in Foursquare and 767,936 in Gowalla) most implementations, i.e., different sampling strategies (Section 3.1) people merely visit several to dozens of them. and aggregation strategies (Section 3.2), affect the performance of LURA. After that, in Sections 4.3 and 4.4, we compare the best 4.3 Comparing with the 11 Component Rec- implementations of LURA with (i) their own component recom- ommenders menders, and (ii) other advanced recommenders, respectively. We compare LURA with its 11 component recommenders. We use the best sampling and aggregation strategies for LURA, i.e., 4.1 Setup we use LURA-ASSA and LURA-ASRA based on the findings in The experiments involve three time periods: (i) a base period Section 4.2. The comparison results are shown in Figure 4. 
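The two aggregation strategies can be sketched compactly. The per-recommender weights, which LURA learns for each user, are assumed given here; all names are illustrative.

```python
def score_based(scores):
    """SA: scale a recommender's scores by its per-user maximum score."""
    m = max(scores.values())
    return {loc: s / m for loc, s in scores.items()}

def rank_based(ranked_list):
    """RA (Borda-count variant): phi = 1 - (rank - 1) / N."""
    n = len(ranked_list)
    return {loc: 1 - i / n for i, loc in enumerate(ranked_list)}

def aggregate(per_recommender_phi, weights):
    """Weighted sum of scaled scores across component recommenders."""
    out = {}
    for w, phi in zip(weights, per_recommender_phi):
        for loc, v in phi.items():
            out[loc] = out.get(loc, 0.0) + w * v
    return out
```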
t Tbase = [1, t−∆t], in which the data are used for constructing com- Both LURA-ASSA and LURA-ASRA outperform all the other t ponent recommenders; (ii) a learning period Tlearn = (t − ∆t, t], methods in all cases. This justifies that combining different rec- for learning user preferences, updating component recommenders, ommenders may provide better recommendations. Among the 11 7 Foursquare Foursquare 5 Gowalla 5 Gowalla LURA-RSSA LURA-SSRA 12 LURA-RSSA LURA-SSRA LURA-RSSA LURA-SSRA LURA-RSSA LURA-SSRA 6 Precision@N ( ×10−2 ) Precision@N ( ×10−2 ) LURA-RSRA LURA-ASSA LURA-RSRA LURA-ASSA LURA-RSRA LURA-ASSA LURA-RSRA LURA-ASSA Recall@N ( ×10−2 ) Recall@N ( ×10−2 ) LURA-SSSA LURA-ASRA 10 LURA-SSSA LURA-ASRA 4 LURA-SSSA LURA-ASRA 4 LURA-SSSA LURA-ASRA 5 4 8 3 3 3 6 2 2 2 4 1 2 1 1 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 N N N N Figure 3: Comparing different implementations of LURA 7 Foursquare 5 Gowalla 6 4 Precision@N ( ×10−2 ) 5 4 3 3 2 2 1 1 0 5 10 15 20 25 0 5 10 15 20 25 N N 12 Foursquare 5 Gowalla 10 4 Recall@N ( ×10−2 ) 8 3 6 2 4 2 1 0 5 10 15 20 25 0 5 10 15 20 25 N N Figure 4: Comparing LURA with its 11 component recommenders component recommenders of LURA, UCF always performs the we do as what [26] has proposed: we randomly select 70% of the best. This means that the behaviors of most users are best reflected data before time t to construct the three component recommenders: by other similar users. Nonetheless, comparing to UCF, LURA can UCF, FLCF, and PLM, and then use the rest 30% to determine pa- be better in Foursqure by up to 11.82% in precision and 11.72% rameters α and β. The final parameters are α = β = 0.1 for in recall; in Gowalla these numbers are 8.53% and 8.52% respec- Foursquare, and α = 0.1, β = 0.2 for Gowalla. These parame- tively. ters are also consistent with what [26] has suggested. 
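The paired t-test used for the significance checks can be reproduced from two aligned lists of per-user hit counts. Below is a stdlib-only sketch that returns the t statistic; turning it into a p-value requires a t-distribution CDF (e.g., from scipy.stats). The sample numbers in the usage are made up, not the paper's data.

```python
import math

def paired_t_test(hits_a, hits_b):
    """t statistic of a paired t-test on per-user hit counts.

    hits_a, hits_b : per-user numbers of hits for two recommenders,
                     aligned by user; degrees of freedom = len - 1.
    """
    diffs = [a - b for a, b in zip(hits_a, hits_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```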
Note that the Considering the small absolute values of precision and recall, we parameter settings are the same for all users, i.e., not personalized. also carry out tests of statistical significance. Specifically, we run iGLSR [30]. This method combines GCF and KDM, i.e., the a paired t-test between the numbers of hits of LURA and UCF, the component recommenders R4 and R9 of LURA, having a clear fo- best-performing component recommender. The p-values are both cus on the geographical relationships between locations. Differ- less than 0.01 (2.99 × 10−4 for Foursquare and 1.79 × 10−4 for ent from USG and LURA which aggregate scores linearly, iGLSR Gowalla), indicating that the improvement of LURA over UCF is adopts the following combination: statistically significant. iGLSR(u, `) = ϕ (R4 (u, `)) · ϕ (R9 (u, `)) , 4.4 Comparison with Other Methods where ϕ(·) is as in USG and LURA. We also compare LURA with other existing methods that adopt RankBoost [9]. This is a method to combine a given set of weak similar ideas to what we use for LURA. recommenders. For a target user u, the training data for RankBoost USG [26]. This is a linear combination of UCF, FLCF, and PLM, is Put = Lt+ × Lt− , as what is fed to LURA (Section 3.1). Rank- i.e., the component recommenders R1 , R3 , and R8 of LURA. Boost takes K iterations to get a “boosted” recommender. During USG(u, `) = (1 − α − β) · ϕ (R1 (u, `)) the k-th iteration, it aims at maximizing the discrimination between Lt+ and Lt− , + α · ϕ (R3 (u, `)) + β · ϕ (R8 (u, `)) , X 0 where α and β are parameters, and ϕ(·) is a rescale of scores as Zk = Dk (`, `0 )eαk ·(ϕ(hk (u,`))−ϕ(hk (u,` ))) , what we use for LURA (Section 3.2). 
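Given the φ-scaled component scores, the two baselines reduce to one-liners; a sketch with illustrative names, where each argument is the scaled score ϕ(R_i(u, ℓ)) of the corresponding component recommender:

```python
def usg(phi1, phi3, phi8, alpha=0.1, beta=0.1):
    """USG: linear blend of UCF (R1), FLCF (R3), and PLM (R8) scores."""
    return (1 - alpha - beta) * phi1 + alpha * phi3 + beta * phi8

def igslr(phi4, phi9):
    """iGSLR: multiplicative blend of GCF (R4) and KDM (R9) scores."""
    return phi4 * phi9
```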
To construct USG at time t, (`,`0 )∈Pu t 7 Foursquare Foursquare 5 Gowalla 4 Gowalla 6 10 Precision@N ( ×10−2 ) Precision@N ( ×10−2 ) Recall@N ( ×10−2 ) Recall@N ( ×10−2 ) 4 3 5 8 4 6 3 2 3 2 4 2 1 1 2 1 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 N N N N Figure 5: Comparing LURA with other existing methods where Dk is a weight distribution over location pairs in Put , mea- 5. RELATED WORK suring how important the pair (`, `0 ) currently is; ϕ(·) is for scaling recommendation scores as introduced in Section 3.2; and hk is the 5.1 Other POI Recommendation Methods selected weak recommender during the k-th iteration. While pur- GPS data from mobile services are also used for location recom- suing this goal, RankBoost estimates a weight βk for the selected mendation. Zhang et al. [33] analyzed GPS trajectories to discover recommender hk and updates the distribution Dk . Eventually, rec- points of interests for recommendation. Zhang et al. [32] studied ommendations are made based on the following scoring function: location-and-activity recommendation using GPS data, where ac- tivities could be various human behaviors such as shopping, watch- K X ing movies, etc. Leung et al. [15] proposed a co-clustering frame- RankBoost(u, `) = βk · ϕ (hk (u, `)) . work for location recommendation based on user-activity-location k=1 tripartite graphs. Some recent studies view location recommenda- tion problem from a perspective of topic modeling (e.g., [17, 27]). Similar to LURA, RankBoost also has two options for ϕ(·): score- In general, with additional information such as location contents, based (SA) and rank-based (RA). In our experiments we consider user-provided tips, etc., these methods build topic models for users both options, i.e., we have RankBoost-SA and RankBoost-RA as and locations, based on which the likelihood of a user visiting a two different implementations of RankBoost. location can be estimated. In addition, Yuan et al. [29] consid- BPRMF [20]. 
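The two RankBoost quantities can be sketched as follows, assuming each weak recommender's φ-scaled scores are available as a dict; the helper names are illustrative, not from RankBoost's reference implementation.

```python
import math

def z_k(pairs, D, alpha, phi_k):
    """Z_k: sum over pairs (l, l') of D(l, l') * exp(alpha * (phi(l) - phi(l')))."""
    return sum(D[(l, l2)] * math.exp(alpha * (phi_k[l] - phi_k[l2]))
               for (l, l2) in pairs)

def rankboost_score(weak_phis, betas, loc):
    """Final scoring: sum_k beta_k * phi(h_k(u, loc))."""
    return sum(b * phis[loc] for b, phis in zip(betas, weak_phis))
```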
This method also takes a pairwise view of loca- ered time-dependent location recommendation, arguing that the lo- tions, i.e., it considers (`, `0 ) ∈ Put as training samples. Given the cations recommended to a user at lunch time should be different check-in frequency matrix at time t, C t , BPRMF manages to max- from the ones recommended at leisure time. imize the posterior probability Pr Θ Put , where Θ is a comple- tion of the matrix C t (i.e., Θ has an estimation on every unknown 5.2 Recommender Ensemble (u, `) entries in C t ). BPRMF transforms this posterior probability Ensemble techniques have been used to combine several simple using the Bayes’ theorem, and optimizes Pr Put |Θ Pr {Θ} using recommenders to form a stronger one. To name a few, Bar et al. [3] a stochastic gradient descent method. studied different ways to combine simple CF methods, including SBPR [31]. SBPR integrates social information into BPRMF. It bagging, boosting, fusion, and randomness injection; Schclar et divides items into three disjoint sets : positive items I + , negative al. [22] considered AdaBoost regression on a neighbor-based CF items I − , and social items IS . It assumes that items in I + are more method. Experiments on movie ratings data showed that these en- preferable than those in IS , while the ones I − are the least prefer- semble methods had improved performance. Besides, Tiemann and able. Two partial orders are thus used in a BPR-like framework. Pauws [24] considered ensembles of content-based methods, which GeoMF [16]. GeoMF integrates geographical information into succeeded in recommending music and news. weighted MF methods [12]. Specifically, it assumes that a loca- These ensemble methods attempt to minimize the root mean tion may have a Gaussian impact in its neighboring area, which is square error (RMSE) on ratings data. However, as we have em- considered as weights in an MF framework. 
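BPRMF's optimization can be illustrated with a single stochastic-gradient ascent step on one training pair (ℓ, ℓ′). This is a generic BPR sketch under the usual matrix-factorization parameterization, not the authors' code; learning rate and regularization values are placeholders.

```python
import math

def bpr_sgd_step(P, Q, u, pos, neg, lr=0.05, reg=0.01):
    """One BPR update: widen the score gap x(u, pos) - x(u, neg).

    P, Q : dicts mapping users / locations to latent-factor lists.
    """
    x_uij = sum(pu * (qi - qj)
                for pu, qi, qj in zip(P[u], Q[pos], Q[neg]))
    g = 1.0 / (1.0 + math.exp(x_uij))  # sigmoid(-x): gradient of ln sigmoid(x)
    for f in range(len(P[u])):
        pu, qi, qj = P[u][f], Q[pos][f], Q[neg][f]
        P[u][f] += lr * (g * (qi - qj) - reg * pu)
        Q[pos][f] += lr * (g * pu - reg * qi)
        Q[neg][f] += lr * (-g * pu - reg * qj)
```

Repeating this step over sampled pairs pushes positive (visited) locations above negative ones in each user's ranking, which is exactly the pairwise objective described above.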
phasized earlier in this paper, check-in data are implicit, and thus The comparison results are shown in Figure 5. USG and Rank- pursuing numerical estimations of check-in frequencies will essen- Boost consistently perform the best among methods other than tially be a distraction from the goal of location recommendation. LURA. LURA-ASSA and LURA-ASRA are clearly superior to USG and RankBoost. In Foursquare, LURA can outperform the 5.3 (Personalized) Learning to Rank best of USG and RankBoost by up to 7.16% in precision and 6.22% The problem of learning to rank is to determine a ranking among in recall; in Gowalla, these numbers are 7.35% and 8.12% respec- a set of items. Given some training data (e.g., observed user click- tively. Compared to USG which adopts unified parameters for throughs that imply preference over search results), the problem all users, the performance improvement of LURA comes from its is to find a ranking function that best agree with the training data, awareness of users’ preferences over recommenders. which can be done by machine learning techniques (e.g., [4, 5, 13]). Similar to what we have done in Section 4.3, we also did a Recently, Wang et al. [25] and Song et al. [23] studied personalized paired t-test for the numbers of hits of LURA and USG, the best- learning to rank, where personalized rankings were adapted from a performing competitor. The p-values are both less than 0.05 (0.021 global user-independent ranking. for Foursquare and 0.005 for Gowalla), implying that LURA’s im- The problem of (personalized) learning to rank is related to our provement over existing methods is also statistically significant. problem in general, in that they both pursue the best ranking of items. However, our problem is to infer user preference over dif- [15] K. W.-T. Leung, D. L. Lee, and W.-C. Lee. 
CLR: A ferent recommenders, which are themselves ranking functions over collaborative location recommendation framework based on items (locations); the above-mentioned learning to rank techniques co-clustering. In SIGIR, 2011. are thus not directly applicable to solve our problem. [16] D. Lian, C. Zhao, X. Xie, G. Sun, E. Chen, and Y. Rui. GeoMF: Joint geographical modeling and matrix 6. CONCLUSION factorization for point-of-interest recommendation. In KDD, In this paper we study the problem of location recommenda- 2014. tion. We first investigate two real-life LBSN datasets, discover- [17] B. Liu, Y. Fu, Z. Yao, and H. Xiong. Learning geographical ing diversities in (i) recommendations generated by representative preferences for point-of-interest recommendation. In KDD, recommenders, and (ii) user preferences over the recommenders. 2013. Based on these observations, we consider personalized location [18] X. Liu, Y. Liu, K. Aberer, and C. Miao. Personalized recommendation by inferring user preferences over multiple rec- point-of-interest recommendation by mining users’ ommenders. We propose a LURA framework to achieve our goal preference transition. In CIKM, 2013. and test it with extensive experiments. The results show that LURA [19] S. Rendle and C. Freudenthaler. Improving pairwise learning achieves significant performance improvements over existing rec- for item recommendation from implicit feedback. In WSDM, ommendation methods. In the future, we plan to investigate other 2014. possible ways to aggregate recommenders. [20] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking 7. ACKNOWLEDGMENTS from implicit feedback. In UAI, 2009. The authors would like to acknowledge the support from Hong [21] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based Kong RGC (Grant No. HKU715413E) and the National Natural collaborative filtering recommendation algorithms. In WWW, Science Foundation of China (Grant No. 
61432008). 2001. [22] A. Schclar, A. Tsikinovsky, L. Rokach, A. Meisels, and L. Antwarg. Ensemble methods for improving the 8. REFERENCES performance of neighborhood-based collaborative filtering. [1] G. Adomavicius and A. Tuzhilin. Toward the next generation In RecSys, 2009. of recommender systems: A survey of the state-of-the-art [23] Y. Song, H. Wang, and X. He. Adapting deep RankNet for and possible extensions. TKDE, 17(6):734–749, 2005. personalized search. In WSDM, 2014. [2] J. Bao, Y. Zheng, and M. F. Mokbel. Location-based and [24] M. Tiemann and S. Pauws. Towards ensemble learning for preference-aware recommendation using sparse geo-social hybrid music recommendation. In RecSys, 2007. networking data. In SIGSPATIAL GIS, 2012. [25] H. Wang, X. He, M.-W. Chang, Y. Song, R. W. White, and [3] A. Bar, L. Rokach, G. Shani, B. Shapira, and A. Schclar. W. Chu. Personalized ranking model adaptation for web Improving simple collaborative filtering models using search. In SIGIR, 2013. ensemble methods. In MCS, 2013. [26] M. Ye, P. Yin, W.-C. Lee, and D.-L. Lee. Exploiting [4] C. Burges, R. Ragno, and Q. V. Le. Learning to rank with geographical influence for collaborative point-of-interest nonsmooth cost functions. In NIPS, 2007. recommendation. In SIGIR, 2011. [5] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, [27] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen. LCARS: A N. Hamilton, and G. Hullender. Learning to rank using location-content-aware recommender system. In KDD, 2013. gradient descent. In ICML, 2005. [28] J. J.-C. Ying, E. H.-C. Lu, W.-N. Kuo, and V. S. Tseng. [6] E. Cho, S. A. Myers, and J. Leskovec. Friendship and Urban point-of-interest recommendation by mining user mobility: User movement in location-based social networks. check-in behaviors. In UrbComp, 2012. In KDD, 2011. [29] Q. Yuan, G. Cong, Z. Ma, A. Sun, and N. M. Thalmann. [7] Y. Ding and X. Li. Time weight collaborative filtering. In Time-aware point-of-interest recommendation. 
In SIGIR, CIKM, 2005. 2013. [8] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank [30] J.-D. Zhang and C.-Y. Chow. iGSLR: Personalized aggregation methods for the web. In WWW, 2001. geo-social location recommendation - a kernel density [9] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient estimation approach. In SIGSPATIAL GIS, 2013. boosting algorithm for combining preferences. JMLR, [31] T. Zhao, J. J. McAuley, and I. King. Leveraging social 4:933–969, 2003. connections to improve personalized ranking for [10] H. Gao, J. Tang, and H. Liu. gSCorr: Modeling geo-social collaborative filtering. In CIKM, 2014. correlations for new check-ins on location-based social [32] V. W. Zheng, Y. Zheng, X. Xie, and Q. Yang. Collaborative networks. In CIKM, 2012. location and activity recommendations with GPS history [11] M. Halvey, P. Punitha, D. Hannah, R. Villa, F. Hopfgartner, data. In WWW, 2010. A. Goyal, and J. M. Jose. Diversity, assortment, dissimilarity, [33] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma. Mining variety: A study of diversity measures using low level interesting locations and travel sequences from GPS features for video retrieval. In ECIR, 2009. trajectories. In WWW, 2009. [12] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In ICDM, 2008. [13] T. Joachims. Optimizing search engines using clickthrough data. In KDD, 2002. [14] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. In SIGIR, 2009.