=Paper= {{Paper |id=Vol-1405/paper-03 |storemode=property |title="Where Far Can Be Close": Finding Distant Neighbors In Recommender Systems |pdfUrl=https://ceur-ws.org/Vol-1405/paper-03.pdf |volume=Vol-1405 |dblpUrl=https://dblp.org/rec/conf/recsys/KumarJAKH15 }}
“Where Far Can Be Close”: Finding Distant Neighbors In Recommender Systems

Vikas Kumar, Daniel Jarratt, Rahul Anand, Joseph A. Konstan, Brent Hecht
GroupLens Research
Dept. of Computer Science
University of Minnesota, Twin Cities
Minneapolis, USA
{vikas, jarratt, anand037, konstan, bhecht}@cs.umn.edu

ABSTRACT

Location and its corollary, distance, are critical concepts in social computing. Recommender systems that incorporate location have generally assumed that the utility of location-awareness monotonically decreases as entities get farther apart. However, it is well known in geography that places that are distant “as the crow flies” can be more similar and connected than nearby places (e.g., by demographics, experiences, or socioeconomic status). We adopt theory and statistical methods from geography to demonstrate that a more nuanced consideration of distance in which “far can be close” – that is, grouping users with their “distant neighbors” – moderately improves both traditional and location-aware recommender systems. We show that the distant neighbors approach leads to small improvements in the predictive accuracy and recommender utility of an item-item recommender compared to a “nearby neighbors” approach as well as other baselines. We also highlight an increase in recommender utility for new users with the use of distant neighbors compared to other traditional approaches.

Keywords
location-aware recommendations, user clustering, distant neighbors

Copyright held by the author(s).
LocalRec’15, September 19, 2015, Vienna, Austria.

1. INTRODUCTION

Collaborative filtering starts with the assumption that past agreement is predictive of future agreement. Separating recommenders into user and item domains, within which users’ agreement is more predictive, increases the extent to which that core assumption is true. For this reason, we do not build recommender systems that mix movies, baseball cards, and sweaters into a single model. Konstan et al. [11] give empirical evidence that partitioning user-user recommenders by item domain improves predictive accuracy, showing that recommending within Usenet newsgroups rather than across them produces the lowest error. Similarly, partitioning users has resulted in better predictive accuracy than the full model [18, 25].

Location-aware recommenders, for example, have used geographic information as a filter to narrow the user-item rating space for improved performance. LARS [14] partitions users into geographic grid squares: an assumption that users within a contiguous, compact grid square are more alike than users not in that square. Other location-aware recommenders [28, 1] assume similarity is proportional to straight-line (geodesic) distance.

This paper extends that approach by recognizing that location-aware recommenders can incorporate non-proximate – distant, non-contiguous, and non-compact – geographies into their recommendation models. Current location-aware recommenders are grounded in Tobler’s First Law of Geography [22] (TFL): “everything is related to everything else, but near things are more related than distant things”. However, Tobler emphasizes that it is a rule of thumb rather than a law, and urges researchers to consider more sophisticated notions of distance (e.g., population density-controlled distance, border-aware distance, socioeconomic distance). Recent work in peer production communities finds evidence for different ways of calculating distance with regard to the First Law [7]. We follow this direction by considering “ratings preference distance” between locations, combined with geodesic distance.

We use this non-trivial notion of distance to determine similar locations (specifically postal codes: e.g., 19104 in Philadelphia, Pennsylvania, is most similar to 61801 in Urbana, Illinois), partition users based on their associated location, and then build collaborative filtering models for each partition. We demonstrate that general collaborative filtering systems – even those that consider items that do not have a unique spatial footprint, such as movies – can benefit from an understanding of user geography, and specifically an understanding that goes beyond a “closer = more relevant” assumption. We provide the first evidence that similarity of locations (based on ratings) does not monotonically decrease with geodesic distance, meaning that locations often have “distant neighbors”. We demonstrate that creating a user space based on location similarity moderately improves predictive accuracy and recommender utility, relative to baseline approaches and location-aware recommender systems that rely just on proximate users.

At a high level, this paper demonstrates that we can harness geographic techniques to improve recommender systems by limiting the recommender model to groups of users in similar, though not necessarily adjacent, locations. Intuitively, the argument we make here for partitioning recommender systems based on users rests on whether one can find sets of users who have significantly different “views of the world” that make past agreement less predictive of future agreement. We can think of this in the context of item-item collaborative filtering, where we build a model of item similarity (rating correlations) or item association (item co-rating or co-purchase). If items are considered related by one set of users, but not by others (e.g., if Belgians see french fries associated with mayonnaise while Americans see them associated with ketchup but not mayonnaise), then an item model built across these diverse groups of users may be less effective at recommendation than one based on partitions of users. Based on this intuition, we look at partitions based on location similarity by forming groups of distant neighbors, and make the following contributions (compared to an item-item recommender that uses no partitioning, a geographic partitioning based only on local neighbors, and a random-partitioning baseline):

1. We show improvement in prediction accuracy using partitioning on similar locations (“distant neighbors”).

2. In new user cold start, we show improvement in recommender utility for Top-N lists.

2. RELATED WORK

This paper is motivated by research in two disciplines: geography and recommender systems. Below we outline related work in each of these areas.

2.1 Tobler’s First Law

Current use of geography in collaborative filtering is almost always under the assumption that the most valuable ratings come from nearby people – an assumption grounded in the First Law of Geography (or “Tobler’s First Law of Geography”, 1970) [22]. In other words, the recommender systems community has used distance decay as the relevant property of location. But as noted by Tobler in 2004 [21], deviations from pure distance decay, such as population density, are common. Tobler’s 2004 assertion has received empirical support in the work of Li et al. [16] and Hecht and Moxley [7], who found that the relatedness between geographic entities in Wikipedia generally decreases with geodesic distance, but that (as advocated by Tobler himself) we need to consider more sophisticated notions of distance to more completely model the relationship between relatedness and distance. Our distant neighbors approach uses “ratings preference distance” in combination with geodesic distance: we show evidence that people do have similarity with people around them, but that at a certain geographic scale (for which we examine postal codes), they are just as similar to a subset of distant people. This paper is the first, to our knowledge, to show that an understanding of TFL that combines ratings preference similarity and geodesic distance can lead to increased accuracy in recommendations.

2.2 Location-aware recommendation

Location-aware recommender systems consider location in two ways: (1) “nearby neighbors” filtering systems that explicitly or implicitly encode a strict form of TFL and (2) “just another feature” recommenders that treat location as a profile feature but do not apply geographic methods.

2.2.1 “Nearby neighbors” filtering systems

When recommenders are aware of location (e.g., from mining GPS tracks [15] or asking a person for her postal code [20]), the systems use that context to personalize suggestions to those relevant to nearby users or items [1]. Some location-aware recommenders [9, 27] operate within domains of items which have specific spatial footprints [6] (such as restaurants, people, or homes). For instance, recommenders meant for mobile queries can calculate an area around a person where her query remains valid [14][28] if, for instance, she requires updated coffee shop recommendations as she moves from neighborhood to neighborhood. Some point-of-interest recommenders [3] use co-visitation rates (implicitly geographic).

Other recommenders operate in domains of items which do not have specific spatial footprints, such as movies or consumer products. Levandoski et al. [14] tessellate users into geographic grid cells, which is discussed in detail below. Similarly, Das et al. [5] apply Voronoi tessellation, a strictly “nearby neighbors” method.

The primary motivation for this work is that of Levandoski et al. [14], who introduce the “Location-Aware Recommender System” (LARS). Like our work, LARS is a partitioning process that occurs before collaborative filtering model-building. LARS lays a hierarchy of grids over the surface of the Earth at various levels of granularity, so that each user belongs to one geographically contiguous and compact square. LARS then creates an item-item recommender model for each grid cell, suggesting movies – items without specific spatial footprints – based on the ratings of other people in the same grid cell. Its authors find an inflection point where quality of recommendations is maximized: a 4x4 grid where users are partitioned into 16 geographic squares. Smaller squares have too little data; larger squares include too many less-similar users. LARS thereby encodes physical proximity as a proxy for similarity, and is the first paper to contribute geographic user partitioning for collaborative filtering. Our contribution contra LARS is that geographic user partitions need be neither contiguous nor compact.

2.2.2 Location as “just another feature”

Still other work in general recommender research treats location information as “just another feature”. Seroussi et al. [20] use standard matrix factorization techniques to find latent relevant features in a person’s preferences, with U.S. state names drawn from postal codes as possible features. Similarly, Kucuktunc et al. [12] match postal codes with census data, inferring demographics to match users on a question-and-answer site. We see promise in Kucuktunc’s method, which uses postal codes as a way to guess demographic tags for users. Their method is based on Weber and Castillo’s demographic analysis, which is meant to “de-personalize and de-localize” location information [24]; yet demographics are themselves geographically clustered [4], resulting in a “distant neighbors” type of approach through a demographic step. This approach has the benefit of accounting for locations a recommender has never before encountered – “location cold start” – since demographics are often widely available (e.g., www.ipums.org).
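As a concrete, much-simplified illustration of this kind of grid-based “nearby neighbors” partitioning, the sketch below assigns users to cells of a single n x n equal-angle grid laid over the globe. This is a hypothetical sketch, not LARS’s actual hierarchical grid: the function names and the single-level grid are our own simplifications.

```python
from collections import defaultdict

def grid_cell(lat, lon, n=4):
    """Map a latitude/longitude pair to one cell of an n x n
    equal-angle grid covering the globe (row by latitude,
    column by longitude)."""
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    return (row, col)

def partition_users_by_grid(user_locations, n=4):
    """Group user ids by grid cell; a separate item-item model
    would then be built from each cell's ratings."""
    cells = defaultdict(list)
    for user, (lat, lon) in user_locations.items():
        cells[grid_cell(lat, lon, n)].append(user)
    return dict(cells)
```

With n = 4 this yields the 16 geographic squares at which Levandoski et al. report recommendation quality is maximized; note that, by construction, every cell is contiguous and compact, which is exactly the assumption the distant neighbors approach relaxes.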
                          Figure 1: Illustration of Distant Neighbors Partitioning Approach
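The first two stages of the partitioning pipeline that Figure 1 illustrates (detailed in Section 4.1) can be sketched as follows. This is a minimal sketch under stated assumptions: the paper uses the Bayesian average with a damping value of 5, but does not pin down the damping prior, so we assume the global mean rating; we likewise assume cosine similarity computed over shared items only. The clustering stage (spectral clustering in the paper) is omitted.

```python
import math
from collections import defaultdict

DAMPING = 5  # damping value reported in Section 4.1

def location_item_matrix(ratings, prior):
    """Stage 1: collapse (postal_code, item, rating) triples into a
    location-item matrix of damped (Bayesian) average ratings, so that
    items rated by only a few users in a postal code are pulled toward
    the prior (here assumed to be the global mean rating)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for postal, item, rating in ratings:
        sums[(postal, item)] += rating
        counts[(postal, item)] += 1
    return {key: (s + DAMPING * prior) / (counts[key] + DAMPING)
            for key, s in sums.items()}

def location_similarity(u, v):
    """Stage 2: cosine similarity between two locations' damped rating
    vectors, computed over their shared items (an assumption)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)
```

The resulting location-to-location similarities form the affinity matrix that stage 3 would cluster into C groups of postal codes (the paper uses spectral clustering), after which each user inherits the cluster of her postal code.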


3. DATASET

We use movie ratings from the MovieLens rating community (www.movielens.org), whose users have the option of entering their postal code while signing up for the site. In MovieLens, users can rate movies on a scale from 0.5 to 5.0 in increments of 0.5. For our experimental dataset, we select only those users who entered a valid US postal code (about 22% of all users). To ensure data density, we further exclude postal codes that do not have at least 20 MovieLens users who rated at least 20 movies. We finally randomly sample 5000 users to create a test dataset, resulting in 1779 postal codes and 1.01 million ratings (this dataset is public at http://cs.umn.edu/~vikas). The rating count is equivalent to the extensively-used publicly available MovieLens dataset of 1 million ratings (www.grouplens.org/datasets/movielens).

4. PARTITIONING APPROACHES

We consider three partitioning approaches – distant neighbors, local neighbors, and random – to partition the user-item rating matrix.

4.1 Distant neighbors partitioning

Distant neighbors refers to users in a cluster of postal codes grouped by rating similarity. In this section, we explain how we identify distant neighbors from a user-item rating dataset. We then use a traditional collaborative filtering recommender on each of those clusters. Partitioning of the user-item matrix into distant neighbors clusters occurs in three stages:

1. Convert the user-item rating matrix to a location-item rating matrix

2. Determine location-to-location similarity

3. Cluster locations to form groups of users associated with their respective locations

We create a location-item rating matrix from the user-item rating matrix by matching postal codes with their respective user identifiers. We then determine the rating of a postal code for an item by averaging over the location’s users’ ratings for the same item. Based on previously published work on MovieLens data [10], we use the Bayesian average [26] with a damping value of 5 to dampen the effect of ratings of movies rated by only a few users in a postal code.

In the above process, for example, we determine the Toy Story-15213 postal code (Pittsburgh, Pennsylvania) rating from the ratings of all users in that postal code who rated Toy Story.

We then cluster locations in the location-item rating matrix using spectral clustering (we find that 20 features and 50 iterations for a given cluster results in better inter-cluster and intra-cluster density). After each process that generates C clusters of postal codes, we partition the original user-item rating matrix into C partitions. Recall that each user belongs to a single postal code and hence only to one cluster. For a cluster count C = 1, the user-item rating matrix remains the same as the traditional full model used for recommendation. The process is illustrated in Figure 1.

Why distant neighbors?

To understand how postal codes reflect rating preference distance compared to geodesic distance, we use geostatistics. The correlogram, a geostatistical tool, helps us understand the correlation between geodesic distance and similarity among places [17]. We show this correlation in Figure 2, where on the x-axis we plot pairwise geodesic distance between postal codes (lag = 100 km) and on the y-axis the average rating similarity of the postal codes separated by the corresponding distance on the x-axis. To determine similarity we compare the rating vectors for each postal code using (a) cosine similarity, (b) Spearman’s rank correlation, and (c) the Jaccard index. We also evaluate (d) cosine similarity on postal codes’ item-popularity vectors. As an example, in Figure 2 we see that for postal codes separated by 1000-1100 km, the average cosine similarity is approximately 0.28.

Defining Tobler’s First Law purely in terms of geodesic distance leads to the assumption that the similarity of postal codes monotonically decreases as the straight-line distance between them increases. However, as shown in Figure 2, it is immediately obvious that Tobler’s First Law in its geodesic distance interpretation does not hold in the movie domain. While nearby postal codes
Figure 2: Correlograms: Distance vs. Similarity

are more similar than slightly more distant postal codes, average similarity does not decrease as distance increases. In fact, there are several points where the average similarity of locations separated by thousands of kilometers exceeds the average similarity of nearby locations.

Thus, recommender systems that consider only geodesic distance, using (often) the correlogram’s first bucket(s), ignore the higher average similarity displayed by distant postal code pairs. If these systems increase their geodesic distance threshold in an attempt to include more users, the newly-added nearby locations may actually have lower average similarity than distant locations.

We reasoned that perhaps the nationwide popularity of many movies causes the average similarity of distant postal codes to increase. To test this hypothesis, we re-evaluated the correlograms ignoring the most popular 20% of movies and found the same result.

4.2 Local neighbors

For each postal code, we build a “local neighbors” user-item matrix, which is a subsample of the entire dataset. This consists of the location’s own users, plus users from the successively nearest locations (geodesic) until a given number of users is reached (matching the number of users in the corresponding distant neighbors cluster). We use this method to compare our approach to location-aware recommenders that are based on location proximity instead of similarity.

4.3 Random partitioning

As a baseline, we use random partitions where each partition consists of a given number of users (matched with the number of users in the corresponding distant neighbors cluster), randomly selected from the original user-item rating matrix.

5. BUILDING RECOMMENDATIONS

We use the item-item collaborative filtering approach with log-likelihood similarity [19] for the prediction and evaluation of the different partitioning approaches. We build a model for each partition (or cluster) independently, i.e., if a partitioning approach creates N user-item matrices, then we build N recommendation models, one for each matrix. For item recommendations to a user, we select her respective partition (determined by her postal code) and use the respective model.

Note that with many clusters we may have small user and item proportions per cluster. In such cases, a recommender may fail to predict or recommend, resulting in a lower number of successful predictions and thus affecting the coverage of the recommender. For example, item-item collaborative filtering fails to predict the rating of an item for a given user if that item has no correlation with any other item that user has rated. Hence, we compare the accuracy of the different partitioning approaches with respect to their coverage. Using fallback predictions, we also compare the accuracy of each partitioning approach at 100% coverage, i.e., whenever the recommender fails to predict, we use either (a) the item mean rating, if the item exists in the partition’s training set, or (b) the global mean rating, if the item does not exist in the partition’s training set.

6. EVALUATION

At a high level, our methodology involves three stages. First, we generate user-item rating matrix partitions in several sizes for two approaches, distant neighbors and random. For local neighbors, we build user-item rating matrices in several sizes for each postal code, which we call “partitions” for convenience. Next, for each partition, we build an independent item-item recommender. Finally, we evaluate prediction accuracy using RMSE (root mean squared error) and recommendation accuracy with MAP (mean average precision). We compare the performance of each approach to that of an item-item recommender built using the full rating matrix. We evaluate the techniques on four different numbers of disjoint clusters – 1, 5, 20 and 50 – with user proportion sizes of 100%, 20%, 5% and 2% respectively.

For RMSE evaluation, we use 90% of ratings for each user in the training data and keep 10% of their ratings in the test set. In Top-N evaluation, for each user we build an evaluation set of relevant items [23]: those rated 4 (of 5) or higher, following [10]. We then keep only 80% of that user’s ratings in the training data, along with all other users’ ratings. We repeat this process for a random sample of 10% of
users in the dataset and take the average of MAP for Top-50 recommendations (MAP@50) over this sample of users.

6.1 Results

The distant neighbors approach with 50 clusters shows the best RMSE among all partitioning approaches and cluster counts, including the full model (“1 partition”) recommender, as shown in Table 1. We find the improvement over the full model to be statistically significant (p < 0.001, per user, using the Wilcoxon rank-sum test). We also note that for any given cluster count, distant neighbors has more accurate predictions than the other partitioning approaches, i.e., local neighbors and random partitions; however, we find the difference between them to be statistically significant only for 20 clusters.

The distant neighbors approach’s low coverage of 88.75%, compared to the full model’s 99.87%, means that we cannot directly infer better performance. We note that at 20 and 50 partitions, distant neighbors has higher coverage than local neighbors and random partitioning. Controlling for 100% coverage using fallback predictions, we observe that distant neighbors at 20 clusters remains statistically significantly more accurate than full-model item-item (p < 0.05), but the other two partitioning methods are no longer significantly different from the full model.

For recommendation metrics, distant neighbors shows significantly better MAP@50 for cluster sizes of 20 and 50, especially over the full model. We hypothesize that due to very small user proportions (and therefore items) per cluster, the recommendations are limited to a specific set of items liked by most of the users in that cluster. To understand this, we calculate the intra-list similarity (or diversity) of the items, using the full training data, and find much lower diversity under any partitioning method compared to the full model.

7. NEW USER COLD START EVALUATION

New users are a crucial part of a recommender system’s success, and this problem forms an important part of system design decisions. Previous work has used location in-

method is able to beat the full model RMSE. Moreover, the RMSE is better with fallback predictions than without. This suggests that the item baseline predictor is better than the full model in cold start; we find this result consistent with the results of Kluver et al. [10] on the evaluation of item-item for new users.

For recommendation utility, we observe that local neighbors has better MAP@50 for 20 and 50 clusters than any other partition or the full model. Distant neighbors does better than the other partitions only for 5 clusters. In contrast, with fallback predictions, for all numbers of clusters we observe that distant neighbors produces better MAP@50 results (0.1972 with 50 clusters), improving over local neighbors (0.14755), random (0.09753), and the full model (0.0533).

8. DISCUSSION

In this section, we summarize the results in light of the assumptions and the limitations of our approach. We further highlight the key takeaways and some anecdotes from our dataset.

We show that, controlling for coverage, grouping users based on their location similarity – distant neighbors – yields significant improvement in prediction accuracy over other partitioning approaches for 20 clusters. In cold start, though, we notice that partitioning fails to improve prediction accuracy (RMSE). However, since Kluver et al. [10] state that recommendation utility is more important for new users, we regard distant neighbors’ better mean average precision as a positive result.

We also note that Top-N metrics like MAP have a popularity bias [10]: mean average precision will be higher for any algorithm that favors popular items. To understand whether such bias exists in our results, we look at the popularity of the recommended items. We find that the local neighbors recommender favors relatively more popular items than distant neighbors, which recommends more popular items than random, and in turn than the full model. We calculate popularity based on the number of users who rated the item in the full matrix. We also determine the average user rating for the items in the cold start situation to understand the relevance of recommendation, by taking the mean of ratings by the user
formation to solve the cold start problem [2], because user       for the relevant items in the recommendation list. We find
context data can substitute for user rating data. We ana-         that distant neighbors consistently provides recommenda-
lyze the effect of our location-aware approach by simulating      tions that have higher user average ratings: 4.57, compared
the new user experience. Here we randomly sample 10% of           to local neighbors (4.25) and random (4.23). We therefore
users from the training data (that contains 90% of ratings        find the distant neighbors performance on Top-N metrics to
for each user) and retain only a few ratings for those users      be a positive result.
in the training data – 5 ratings per user – and discard other        We consider these results as a proof of concept. We note
ratings for that user [10]. We evaluate the metrics on the        that the magnitude of the improvements is small, even if sta-
test data containing only these users.                            tistically significant. We recognize that such small decreases
                                                                  in RMSE are unlikely to, in and of themselves, result in a
7.1    Results                                                    significant change in user experience. Rather, we hope to see
   In case of cold start with no fallback predictions, we ob-     further development and optimization of geographic similar-
serve distant neighbors with 20 clusters to have the statis-      ity into recommender systems with the promise of further
tically significantly best predictive accuracy against the full   improvements in performance. We also note that the Top-N
model (p < 0.05, significance per user). However, we find         results show even more promise, and deserve future user-
this difference to be not significant against local neighbors     centered evaluation.
based partitions for the same clusters.                              To further interpret our results we revisit our intuition
   The coverage, like the previous evaluation, remains low        of user worldviews that can manifest in rating space. By
for larger number of clusters. With fallback predictions,         “worldviews” we mean the agreement among a subset of
although distant neighbors has better RMSE compared to            users who have significantly different “views of the world”
local neighbors and random (not significant), no partition        compared to other subsets. We see empirical evidence of how
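The simulated new-user protocol above (sample 10% of users and keep only 5 of
their training ratings) can be sketched as follows. This is a minimal
illustration in Python; the function and variable names are ours, not the
paper's actual pipeline:

```python
import random

def simulate_new_users(train, sample_frac=0.10, keep_ratings=5, seed=0):
    """Simulate cold start: sample a fraction of users and retain only
    `keep_ratings` of their training ratings, discarding the rest.

    `train` maps user -> {item: rating}.
    Returns (reduced_train, cold_users).
    """
    rng = random.Random(seed)
    users = sorted(train)
    n_cold = max(1, int(len(users) * sample_frac))
    cold_users = set(rng.sample(users, n_cold))

    reduced = {}
    for user, ratings in train.items():
        if user in cold_users and len(ratings) > keep_ratings:
            # Keep a random subset of this user's ratings.
            kept = rng.sample(sorted(ratings), keep_ratings)
            reduced[user] = {item: ratings[item] for item in kept}
        else:
            reduced[user] = dict(ratings)
    return reduced, cold_users
```

Metrics are then evaluated only on the held-out test ratings of the sampled
cold-start users, mirroring the procedure described above.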
                                    No fallback predictions                With fallback predictions (100% coverage)
 Partitions               Distant       Local         Random              Distant       Local         Random

 1 (full model)  RMSE     0.98253 (99.87% coverage)                       0.9875
                 MAP@50   0                                               0

 5               RMSE     0.98076       0.98206       0.98124             0.9822        0.98131       0.98073
                 (cov.)   (98.81%)      (98.68%)      (99.12%)
                 MAP@50   0             0             0                   0             0.02632       0

 20              RMSE     0.96914       0.97346       0.97063             0.98131       0.98367       0.98234
                 (cov.)   (96.34%)      (95.82%)      (96.26%)
                 MAP@50   0.03846       0.03015       0.03014             0.03849       0.02381       0.02042

 50              RMSE     0.96774       0.96946       0.96812             0.98996       0.99461       0.99373
                 (cov.)   (88.75%)      (87.57%)      (87.27%)
                 MAP@50   0.13854       0.06105       0.03107             0.06639       0.0407        0.04532

Table 1: Evaluation of partitioning approaches (distant neighbors, local
neighbors, and random partitions). In the typeset original, the best results
for each partition count per coverage condition are in bold, and the best
results in each coverage condition are underlined.
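The MAP@50 values reported in Tables 1 and 2 follow the usual mean average
precision definition. Below is a small reference sketch (our own code, not
the authors'); note that AP normalization conventions vary – here we divide
by min(|relevant|, k):

```python
def average_precision_at_k(recommended, relevant, k=50):
    """AP@k for one user: sum of precision@i at each rank i that hits a
    relevant item, divided by the number of achievable hits."""
    hits, precision_sum = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i
    denom = min(len(relevant), k)
    return precision_sum / denom if denom else 0.0

def map_at_k(rec_lists, relevant_sets, k=50):
    """MAP@k: mean of per-user AP@k over all users."""
    users = list(rec_lists)
    return sum(
        average_precision_at_k(rec_lists[u], relevant_sets.get(u, set()), k)
        for u in users
    ) / len(users)
```

For example, a list whose relevant items land at ranks 1 and 3 scores
(1/1 + 2/3) / 2 under this convention.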




                                    No fallback predictions                With fallback predictions (100% coverage)
 Partitions               Distant       Local         Random              Distant       Local         Random

 1 (full model)  RMSE     1.09396 (99.87% coverage)                       1.07119
                 MAP@50   0.0533                                          0.0533

 5               RMSE     1.09025       1.08774       1.08122             1.08078       1.07376       1.07596
                 (cov.)   (99.58%)      (98.85%)      (98.61%)
                 MAP@50   0.05309       0.03452       0.0303              0.125         0.10346       0.09515

 20              RMSE     1.09          1.09278       1.10356             1.08069       1.08502       1.08672
                 (cov.)   (95.36%)      (95.74%)      (94.74%)
                 MAP@50   0.11268       0.1241        0.05956             0.13668       0.09609       0.097

 50              RMSE     1.10404       1.10414       1.11138             1.07392       1.0867        1.07834
                 (cov.)   (92.46%)      (91.36%)      (91.04%)
                 MAP@50   0.08049       0.15347       0.11289             0.1972        0.14755       0.09753

Table 2: Evaluation of partitioning approaches in simulated cold start. In
the typeset original, the best results for each partition count per coverage
condition are in bold, and the best results in each coverage condition are
underlined.
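The "with fallback predictions" columns control for coverage by backing off
to the full model whenever a partition model cannot score a user-item pair.
A hedged sketch of that control (the names and the callable interface are our
assumptions, not the paper's implementation):

```python
def predict_with_fallback(user, item, partition_model, full_model):
    """Return the partition model's prediction when it can score the pair;
    otherwise fall back to the full model, guaranteeing 100% coverage.

    Each model is a callable returning a float prediction, or None when
    the (user, item) pair is outside its coverage.
    """
    pred = partition_model(user, item)
    if pred is None:
        pred = full_model(user, item)
    return pred

def coverage(model, pairs):
    """Fraction of (user, item) pairs the model can score."""
    scored = sum(1 for u, i in pairs if model(u, i) is not None)
    return scored / len(pairs) if pairs else 0.0
```

With this wrapper, a low-coverage partition model and the full model can be
compared on an equal footing, as in the 100%-coverage columns above.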




                       Place                    Most Similar            2nd Most Similar         3rd Most Similar
              Jersey City, NJ 07087        San Francisco, CA 94104   Redwood City, CA 94063   Bala Cynwyd, PA 19004
              Philadelphia, PA 19104          Urbana, IL 61801        New York, NY 10003       Cambridge, MA 02139
              Ann Arbor, MI 48104          Minneapolis, MN 55455       Brooklyn, NY 11215     Minneapolis, MN 55414
               Palo Alto, CA 94305            Chicago, IL 60614         Urbana, IL 61801       Ann Arbor, MI 48103

                                Table 3: Most similar locations for selected US postal codes
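A table like this can, in principle, be reproduced by representing each
postal code with Bayesian-averaged item rating midpoints (the paper computes
midpoints with a Bayesian average) and ranking places by similarity. The
sketch below is our own illustration: the prior strength, the mean-centering,
and the choice of cosine similarity are assumptions, not necessarily the
paper's exact geostatistical method:

```python
import math
from collections import defaultdict

def place_vectors(ratings, global_mean, prior_strength=5.0):
    """Represent each place by mean-centered Bayesian-average item midpoints:
    (prior_strength * global_mean + sum of ratings) / (prior_strength + n),
    minus the global mean. `ratings` is an iterable of (place, item, rating)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for place, item, r in ratings:
        sums[(place, item)] += r
        counts[(place, item)] += 1
    vecs = defaultdict(dict)
    for (place, item), total in sums.items():
        n = counts[(place, item)]
        midpoint = (prior_strength * global_mean + total) / (prior_strength + n)
        vecs[place][item] = midpoint - global_mean  # center so cosine is informative
    return dict(vecs)

def cosine(a, b):
    """Cosine similarity over the items two places have in common."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    na = math.sqrt(sum(a[i] ** 2 for i in shared))
    nb = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(place, vecs, top_n=3):
    """The top_n places whose centered midpoints best match `place`."""
    scored = sorted(((cosine(vecs[place], vecs[p]), p)
                     for p in vecs if p != place), reverse=True)
    return [p for _, p in scored[:top_n]]
```

Note that nothing in this ranking uses geodesic distance: two postal codes
whose users agree on items are neighbors regardless of how far apart they are.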
   We see empirical evidence of how college towns are more similar to other
college towns than to other postal codes in their own city, and hypothesize
that people on a particular college campus may share an understanding of
concepts (for example: registration, fraternities, and late-night pizza) with
people in places far apart. For instance, as seen in Table 3, the postal code
48104 (representing a university campus in Ann Arbor, Michigan) is most
similar to 55414, another university campus in Minneapolis, Minnesota, about
900 km away. We further hypothesize that similarity among users is not
primarily based on straight-line distance but on a more nuanced – cultural or
socioeconomic – distance that manifests in a community consensus on item
likeness, in the same way that the semantic relatedness of
spatially-referenced Wikipedia articles is not fully consistent with geodesic
proximity [7].
   We also believe that clusters based on location provide advantages over
clusters based directly on users' ratings [18]: (a) the number of postal
codes is small compared to the number of users, an important factor when
clustering within a very large rating matrix; and (b) the exogenous
information from location can help explain the clusters.

9.   LIMITATIONS
   An important limitation of our work is that our experiment considered only
United States postal codes. The United States has a large variety of cultures
and demographics spread throughout the country; different countries may
display different properties.
   Another limitation is the low user density within the postal codes used
for evaluation. We find 1779 postal codes associated with 5000 users, which
means we have on average 3 users per postal code. Recognizing this
limitation, we performed a predictive accuracy experiment on an equivalently
sized dataset built by picking postal codes (which has its own bias of
oversampling users from dense locations) and found the RMSE results were
consistent.
   In addition, our experiment considered only a Bayesian average to
calculate postal codes' item rating midpoints. Different similarity methods,
such as the Kullback–Leibler divergence [13] between rating distributions,
may provide a better understanding of the similarity between two places.
   While we calculate local neighbors by proximity, LARS tessellated users
into geographic grid squares. One of LARS' major contributions was spatial
data structure optimization for best storage and recommendation performance;
the use of a spatial database is out of scope for our paper.
   Finally, MovieLens users enter postal codes at registration and rarely
update that information, even if they move.

10.   FUTURE WORK
   There is significant complexity around what kinds of location-aware
partitioning may improve recommendations. Other methods might include ESRI's
Tapestry dataset, which groups postal codes not by item rating similarity but
by weighted "lifestyle" – that is, demographic and socioeconomic – tags. For
example, the top lifestyle segments for 94043 (Mountain View, California) are
Enterprising Professionals (38%), Trendsetters (14%), and Laptops and Lattes
(10%).
   We are interested in how this process applies to item domains other than
movies, especially items with specific spatial footprints. We intuit that a
distant neighbors approach may work less well for individual places (say, a
specific restaurant) but may work just as well for place features (say, a
cuisine that applies to many restaurants).
   Location-aware recommenders should be able to use the context of location
to explain their suggestions: e.g., that people in a similar location offered
relevant opinions. Future work involves exploring the value of location-based
explanations over those focused on other dimensions of similarity. Herlocker
et al. find that 86% of recommender system users value additional
explanations [8], and location-based partitions are explainable: place names
are immediately available, and demographic inference (e.g., "another college
town") is possible.
   Finally, although we chose postal codes as the unit of spatial granularity
in our system, location is defined as part of a hierarchy [3], and we could
have been more or less precise. In an area of the world with smaller nations
and stricter cultural borders – say, Europe or Central America – we may find
the applicable scale is at the level of nations and languages. Spatial
granularity may also be inconsistently scaled within the same dataset: if
speaking German is predictive, it may be predictive at the national level in
Europe, the regional level in the United States, and the city level in South
America.

11.   CONCLUSION
   We demonstrate that we can harness geographic techniques to improve
recommender systems by limiting the recommender model to groups of users in
similar locations. We show that similar locations are determined by a
combination of rating preference and geodesic distance, an understanding that
goes beyond a "closer = more relevant" assumption. We provide the first
evidence that the rating similarity of locations does not monotonically
decrease with geodesic distance, and describe a geostatistical method to
discover "distant neighbors". We show that creating a user space based on
location similarity improves predictive accuracy and recommender utility,
including in the new user cold start scenario and when controlling for full
coverage.
   Over-relying on the opinions of nearby places can lead to the inclusion of
proximate but less similar people; the power of distant neighbors in
recommender systems is smart partitioning of the underlying dataset to
include the right users among all those far away.
   We urge a reconsideration of geographic relationships in recommender
systems, where the value of geography goes beyond geodesic distance.

12.   NOTES
   Our code and datasets, as well as results for additional metrics, are
available at http://cs.umn.edu/~vikas.

13.   ACKNOWLEDGMENTS
   This work was supported in part by the National Science Foundation
(IIS-0808692 and IIS-0964695) and the University of Minnesota's Social Media
and Business Analytics Collaborative (SOBACO). We also thank Daniel Kluver
(University of Minnesota) and Michael Ekstrand (Texas State University) for
their continuous feedback and input on the paper.
14.   REFERENCES
 [1] G. Adomavicius and A. Tuzhilin. Context-aware recommender systems. In
     Recommender Systems Handbook, pages 217–253. Springer, 2011.
 [2] J. Bao, Y. Zheng, and M. F. Mokbel. Location-based and preference-aware
     recommendation using sparse geo-social networking data. In Proceedings
     of the 20th International Conference on Advances in Geographic
     Information Systems, pages 199–208. ACM, 2012.
 [3] J. Bao, Y. Zheng, D. Wilkie, and M. Mokbel. Recommendations in
     location-based social networks: a survey. GeoInformatica, pages 1–41,
     2015.
 [4] B. Bishop. The Big Sort: Why the Clustering of Like-Minded America Is
     Tearing Us Apart. New York: Houghton Mifflin Company, 2008.
 [5] J. Das, S. Majumder, and P. Gupta. Voronoi-based location aware
     collaborative filtering. In 3rd National Conference on Emerging Trends
     and Applications in Computer Science (NCETACS), pages 179–183. IEEE,
     2012.
 [6] K. E. Grossner, M. F. Goodchild, and K. C. Clarke. Defining a digital
     earth system. Transactions in GIS, 12(1):145–160, 2008.
 [7] B. Hecht and E. Moxley. Terabytes of Tobler: evaluating the first law in
     a massive, domain-neutral representation of world knowledge. In Spatial
     Information Theory, pages 88–105. Springer, 2009.
 [8] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative
     filtering recommendations. In Proceedings of the 2000 ACM Conference on
     Computer Supported Cooperative Work, pages 241–250. ACM, 2000.
 [9] T. Horozov, N. Narasimhan, and V. Vasudevan. Using location for
     personalized POI recommendations in mobile environments. In
     International Symposium on Applications and the Internet (SAINT). IEEE,
     2006.
[10] D. Kluver and J. A. Konstan. Evaluating recommender behavior for new
     users. In Proceedings of the 8th ACM Conference on Recommender Systems,
     pages 121–128. ACM, 2014.
[11] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon,
     and J. Riedl. GroupLens: applying collaborative filtering to Usenet
     news. Communications of the ACM, 40(3):77–87, 1997.
[12] O. Kucuktunc, B. B. Cambazoglu, I. Weber, and H. Ferhatosmanoglu. A
     large-scale sentiment analysis for Yahoo! Answers. In Proceedings of the
     Fifth ACM International Conference on Web Search and Data Mining, pages
     633–642. ACM, 2012.
[13] S. Kullback and R. A. Leibler. On information and sufficiency. The
     Annals of Mathematical Statistics, pages 79–86, 1951.
[14] J. J. Levandoski, M. Sarwat, A. Eldawy, and M. F. Mokbel. LARS: a
     location-aware recommender system. In 28th International Conference on
     Data Engineering (ICDE), pages 450–461. IEEE, 2012.
[15] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma. Mining user
     similarity based on location history. In Proceedings of the 16th ACM
     SIGSPATIAL International Conference on Advances in Geographic
     Information Systems, page 34. ACM, 2008.
[16] T. J.-J. Li, S. Sen, and B. Hecht. Leveraging advances in natural
     language processing to better understand Tobler's first law of
     geography. In Proceedings of the 2014 SIGSPATIAL Conference. ACM, 2014.
[17] N. L. Oden. Assessing the significance of a spatial correlogram.
     Geographical Analysis, 16(1):1–16, 1984.
[18] A. M. Rashid, S. K. Lam, G. Karypis, and J. Riedl. ClustKNN: a highly
     scalable hybrid model- & memory-based CF algorithm. Proceedings of
     WebKDD, 2006.
[19] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based
     collaborative filtering recommendation algorithms. In Proceedings of the
     10th International Conference on World Wide Web, pages 285–295. ACM,
     2001.
[20] Y. Seroussi, F. Bohnert, and I. Zukerman. Personalised rating prediction
     for new users using latent factor models. In Proceedings of the 22nd ACM
     Conference on Hypertext and Hypermedia, pages 47–56. ACM, 2011.
[21] W. Tobler. On the first law of geography: a reply. Annals of the
     Association of American Geographers, 94(2):304–310, 2004.
[22] W. R. Tobler. A computer movie simulating urban growth in the Detroit
     region. Economic Geography, pages 234–240, 1970.
[23] S. Vargas and P. Castells. Rank and relevance in novelty and diversity
     metrics for recommender systems. In Proceedings of the 5th ACM
     Conference on Recommender Systems, RecSys '11, pages 109–116. ACM, 2011.
[24] I. Weber and C. Castillo. The demographics of web search. In Proceedings
     of the 33rd International ACM SIGIR Conference on Research and
     Development in Information Retrieval, pages 523–530. ACM, 2010.
[25] B. Xu, J. Bu, C. Chen, and D. Cai. An exploration of improving
     collaborative recommender systems via user-item subgroups. In
     Proceedings of the 21st International Conference on World Wide Web,
     pages 21–30. ACM, 2012.
[26] X. Yang and Z. Zhang. Combining prestige and relevance ranking for
     personalized recommendation. In Proceedings of the 22nd ACM
     International Conference on Information & Knowledge Management, pages
     1877–1880. ACM, 2013.
[27] Y. Yu and X. Chen. A survey of point-of-interest recommendation in
     location-based social networks. 2015.
[28] J. Zhang, M. Zhu, D. Papadias, Y. Tao, and D. L. Lee. Location-based
     spatial queries. In Proceedings of the ACM SIGMOD International
     Conference on Management of Data, pages 443–454. ACM, 2003.