=Paper=
{{Paper
|id=Vol-1405/paper-03
|storemode=property
|title="Where Far Can Be Close": Finding Distant Neighbors In Recommender Systems
|pdfUrl=https://ceur-ws.org/Vol-1405/paper-03.pdf
|volume=Vol-1405
|dblpUrl=https://dblp.org/rec/conf/recsys/KumarJAKH15
}}
=="Where Far Can Be Close": Finding Distant Neighbors In Recommender Systems==
Vikas Kumar, Daniel Jarratt, Rahul Anand, Joseph A. Konstan, Brent Hecht
GroupLens Research
Dept. of Computer Science
University of Minnesota, Twin Cities
Minneapolis, USA
{vikas, jarratt, anand037, konstan, bhecht}@cs.umn.edu
ABSTRACT
Location and its corollary, distance, are critical concepts in social computing. Recommender systems that incorporate location have generally assumed that the utility of location-awareness monotonically decreases as entities get farther apart. However, it is well known in geography that places that are distant “as the crow flies” can be more similar and connected than nearby places (e.g., by demographics, experiences, or socioeconomic status). We adopt theory and statistical methods from geography to demonstrate that a more nuanced consideration of distance in which “far can be close” – that is, grouping users with their “distant neighbors” – moderately improves both traditional and location-aware recommender systems. We show that the distant neighbors approach leads to small improvements in predictive accuracy and recommender utility of an item-item recommender compared to a “nearby neighbors” approach as well as other baselines. We also highlight an increase in recommender utility for new users with the use of distant neighbors compared to other traditional approaches.

Keywords
location-aware recommendations, user clustering, distant neighbors

Copyright held by the author(s).
LocalRec’15, September 19, 2015, Vienna, Austria.

1. INTRODUCTION
Collaborative filtering starts with the assumption that past agreement is predictive of future agreement. Separating recommenders into user and item domains, within which users’ agreement is more predictive, increases the extent to which that core assumption is true. For this reason, we do not build recommender systems that mix movies, baseball cards, and sweaters into a single model. Konstan et al. [11] give empirical evidence that partitioning user-user recommenders by item domain improves predictive accuracy, showing that recommending within Usenet newsgroups rather than across them produces the lowest error. Similarly, partitioning users has resulted in better predictive accuracy than the full model [18, 25].

Location-aware recommenders, for example, have used geographic information as a filter to narrow the user-item rating space for improved performance. LARS [14] partitions users into geographic grid squares: an assumption that users within a contiguous, compact grid square are more alike than users not in that square. Other location-aware recommenders [28, 1] assume similarity is proportional to straight-line (geodesic) distance.

This paper extends that approach by recognizing that location-aware recommenders can incorporate non-proximate – distant, non-contiguous, and non-compact – geographies into their recommendation models. Current location-aware recommenders are grounded in Tobler’s First Law of Geography [22] (TFL): “everything is related to everything else, but near things are more related than distant things”. However, Tobler emphasizes that it is a rule-of-thumb rather than a law, and urges researchers to consider more sophisticated notions of distance (e.g., population density-controlled distance, border-aware distance, socioeconomic distance). Recent work in peer production communities finds evidence for different ways of calculating distance with regard to the First Law [7]. We follow this direction by considering “ratings preference distance” between locations, combined with geodesic distance.

We use this non-trivial notion of distance to determine similar locations (specifically postal codes: e.g., 19104 in Philadelphia, Pennsylvania, is most similar to 61801 in Urbana, Illinois) and partition users based on their associated location, then build collaborative filtering models for each partition. We demonstrate that general collaborative filtering systems – even those that consider items that do not have a unique spatial footprint, such as movies – can benefit from an understanding of user geography, and specifically an understanding that goes beyond a “closer = more relevant” assumption. We provide the first evidence that similarity of locations (based on ratings) does not monotonically decrease with geodesic distance, meaning that locations often have “distant neighbors”. We demonstrate that creating a user space based on location similarity moderately improves predictive accuracy and recommender utility, relative to baseline approaches and location-aware recommender systems that rely just on proximate users.
At a high level, this paper demonstrates that we can harness geographic techniques to improve recommender systems by limiting the recommender model to groups of users in similar, though not necessarily adjacent, locations. Intuitively, the argument we make here for partitioning recommender systems based on users rests on whether one can find sets of users who have significantly different “views of the world” that make past agreement less predictive of future agreement. We can think of this in the context of item-item collaborative filtering, where we build a model of item similarity (rating correlations) or item association (item co-rating or co-purchase). If items are considered related by one set of users, but not by others (e.g., if Belgians see french fries associated with mayonnaise while Americans see them associated with ketchup but not mayonnaise), then an item model built across these diverse groups of users may be less effective at recommendation than one based on partitions of users. Based on this intuition, we look at partitions based on location similarity by forming groups of distant neighbors, and make the following contributions (compared to an item-item recommender that uses no partitioning, a geographic partitioning based only on local neighbors, and a random-partitioning baseline):

1. We show improvement in prediction accuracy using partitioning on similar locations (“distant neighbors”).

2. In new user cold start, we show improvement in recommender utility for Top-N lists.
2. RELATED WORK
This paper is motivated by research in two disciplines: geography and recommender systems. Below we outline related work in each of these areas.

2.1 Tobler’s First Law
Current use of geography in collaborative filtering is almost always under the assumption that the most valuable ratings come from nearby people – an assumption grounded in the First Law of Geography (or “Tobler’s First Law of Geography”, 1970) [22]. In other words, the recommender systems community has used distance decay as the relevant property of location. But as noted by Tobler in 2004 [21], deviations from pure distance decay, such as population density, are common. Tobler’s 2004 assertion has received empirical support in the work of Li et al. [16] and Hecht and Moxley [7], which found that the relatedness between geographic entities in Wikipedia generally decreases with geodesic distance, but that (as advocated by Tobler himself) we need to consider more sophisticated notions of distance to more completely model the relationship between relatedness and distance. Our distant neighbors approach uses “ratings preference distance” in combination with geodesic distance: we show evidence that people do have similarity with people around them, but that at a certain geographic scale (for which we examine postal codes), they are just as similar to a subset of distant people. This paper is the first, to our knowledge, to show that an understanding of TFL that combines ratings preference similarity and geodesic distance can lead to increased accuracy in recommendations.

2.2 Location-aware recommendation
Location-aware recommender systems consider location in two ways: (1) “nearby neighbors” filtering systems that explicitly or implicitly encode a strict form of TFL and (2) “just another feature” recommenders that treat location as profile features but do not apply geographic methods.

2.2.1 “Nearby neighbors” filtering systems
When recommenders are aware of location (e.g., from mining GPS tracks [15] or asking a person for her postal code [20]), the systems use that context to personalize suggestions to those relevant to nearby users or items [1]. Some location-aware recommenders [9, 27] operate within domains of items which have specific spatial footprints [6] (such as restaurants, people, or homes). For instance, recommenders meant for mobile queries can calculate an area around a person where her query remains valid [14, 28] if, for instance, she requires updated coffee shop recommendations as she moves from neighborhood to neighborhood. Some point-of-interest recommenders [3] use co-visitation rates (implicitly geographic).

Other recommenders operate in domains of items which do not have specific spatial footprints, such as movies or consumer products. Levandoski et al. [14] tessellate users into geographic grid cells, which is discussed in detail below. Similarly, Das et al. [5] apply Voronoi tessellation, a strictly “nearby neighbors” method.

The primary motivation for this work is that of Levandoski et al. [14], who introduce the “Location-Aware Recommender System” (LARS). Like our work, LARS is a partitioning process that occurs before collaborative filtering model-building. LARS lays a hierarchy of grids over the surface of the Earth at various levels of granularity, so that each user belongs to one geographically contiguous and compact square. LARS then creates an item-item recommender model for each grid cell, suggesting movies – items without specific spatial footprints – based on the ratings of other people in the same grid cell. Its authors find an inflection point where quality of recommendations is maximized: a 4x4 grid where users are partitioned into 16 geographic squares. Smaller squares have too little data; larger squares include too many less-similar users. LARS thereby encodes physical proximity as a proxy for similarity, and is the first paper to contribute geographic user partitioning for collaborative filtering. Our contribution contra LARS is that geographic user partitions need not be contiguous nor a compact space.

2.2.2 Location as “just another feature”
Still other work in general recommender research treats location information as “just another feature”. Seroussi et al. [20] use standard matrix factorization techniques to find latent relevant features in a person’s preferences, with U.S. state names drawn from postal codes as possible features. Similarly, Kucuktunc et al. [12] match postal codes with census data, inferring demographics to match users in a question-and-answer site. We see promise in Kucuktunc’s method, which uses postal codes as a way to guess demographic tags for users. Their method is based on Weber and Castillo’s demographic analysis, which is meant to “de-personalize and de-localize” location information [24]; yet demographics are themselves geographically clustered [4], resulting in a “distant neighbors” type of approach through a demographic step. This approach has the benefit of accounting for locations a recommender has never before encountered – “location cold start” – since demographic data are often widely available (e.g., from www.ipums.org).
Figure 1: Illustration of Distant Neighbors Partitioning Approach
3. DATASET
We use movie ratings from the MovieLens rating community (www.movielens.org), whose users have the option of entering their postal code while signing up for the site. In MovieLens, users can rate movies on a rating scale from 0.5 to 5.0 in increments of 0.5. For our experimental dataset, we select only those users who entered a valid US postal code (which is about 22% of all users). To ensure data density, we further exclude postal codes that do not have at least 20 MovieLens users who rated at least 20 movies. We finally randomly sample 5000 users to create a test dataset, resulting in 1779 postal codes and 1.01 million ratings (this dataset is public at http://cs.umn.edu/∼vikas). The rating count is equivalent to the extensively-used publicly available MovieLens dataset of 1 million ratings (www.grouplens.org/datasets/movielens).
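The selection steps above are concrete enough to sketch. The following Python/pandas fragment is an illustrative reconstruction of the filtering pipeline, not the authors' actual preprocessing code; the file name ratings.csv and the column names user_id, item_id, rating, and zipcode are assumptions.

```python
import pandas as pd

# Hypothetical input: one row per rating, with the user's self-reported postal code.
ratings = pd.read_csv("ratings.csv")  # columns: user_id, item_id, rating, zipcode

# Keep users who entered a valid 5-digit US postal code.
ratings = ratings[ratings["zipcode"].astype(str).str.fullmatch(r"\d{5}")]

# Per postal code, count users who rated at least 20 movies.
per_user = ratings.groupby(["zipcode", "user_id"]).size().rename("n_ratings").reset_index()
active = per_user[per_user["n_ratings"] >= 20]
dense_zips = active.groupby("zipcode")["user_id"].nunique()
dense_zips = dense_zips[dense_zips >= 20].index

# Restrict to dense postal codes and sample 5000 users for the experiment.
eligible = active[active["zipcode"].isin(dense_zips)]
sampled_users = eligible["user_id"].drop_duplicates().sample(5000, random_state=0)
dataset = ratings[ratings["user_id"].isin(sampled_users)]
```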
4. PARTITIONING APPROACHES
We consider three partitioning approaches – distant neighbors, local neighbors, and random – to partition the user-item rating matrix.

4.1 Distant neighbors partitioning
Distant neighbors refers to users in a cluster of postal codes based on rating similarity. In this section, we explain how we identify distant neighbors from a user-item rating dataset. We then use a traditional collaborative filtering recommender on each of those clusters. Partitioning of the user-item matrix into distant neighbors clusters occurs in three stages:

1. Convert the user-item rating matrix to a location-item rating matrix

2. Determine location-to-location similarity

3. Cluster locations to form groups of users associated with their respective locations

We create a location-item rating matrix from the user-item rating matrix by matching postal codes with their respective user identifiers. We then determine the rating of a postal code for an item by averaging over the location's users' ratings for the same item. Based on previously published work on MovieLens data [10], we use the Bayesian average [26] with a damping value of 5 to dampen the effect of ratings of movies rated by only a few users in a postal code.

In the above process, for example, we determine the Toy Story-15213 postal code (Pittsburgh, Pennsylvania) rating from the ratings of all users in that postal code who rated Toy Story.

We then cluster locations in the location-item rating matrix using spectral clustering (we find that, for a given cluster count, 20 features and 50 iterations result in better inter-cluster and intra-cluster density). After this process generates C clusters of postal codes, we partition the original user-item rating matrix into C partitions. Recall that each user belongs to a single postal code and hence only to one cluster. For a cluster count C = 1, the user-item rating matrix remains the same as the traditional full model used for recommendation. The process is illustrated in Figure 1.
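A minimal sketch of the first and third stages follows, again under our own assumptions: it reuses the hypothetical dataset DataFrame from the previous sketch, shrinks each postal code's item average toward the item's global mean (the paper specifies a damping value of 5 but not the prior, so the global item mean is our choice), and feeds a cosine-similarity affinity matrix to scikit-learn's spectral clustering.

```python
import pandas as pd
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

DAMPING = 5  # damping value used in the paper for the Bayesian average

def location_item_matrix(dataset: pd.DataFrame) -> pd.DataFrame:
    """Damped per-postal-code item averages (rows: zipcodes, columns: items)."""
    global_item_mean = dataset.groupby("item_id")["rating"].mean()
    grouped = dataset.groupby(["zipcode", "item_id"])["rating"].agg(["sum", "count"]).reset_index()
    prior = grouped["item_id"].map(global_item_mean)
    # Bayesian (damped) average: shrink sparse zipcode-item cells toward the item mean.
    grouped["damped"] = (grouped["sum"] + DAMPING * prior) / (grouped["count"] + DAMPING)
    return grouped.pivot(index="zipcode", columns="item_id", values="damped").fillna(0.0)

def cluster_postal_codes(loc_item: pd.DataFrame, n_clusters: int) -> pd.Series:
    """Group postal codes by rating similarity using spectral clustering."""
    affinity = cosine_similarity(loc_item.values)
    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                random_state=0).fit_predict(affinity)
    return pd.Series(labels, index=loc_item.index, name="cluster")

# Each user then inherits the cluster of their postal code, giving C user-item partitions.
```

With C = 1 the clustering step is effectively skipped and the full user-item matrix is used unchanged, matching the full-model baseline.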
Why distant neighbors?
To understand how postal codes reflect rating preference distance compared to geodesic distance, we use geostatistics. The correlogram, a geostatistical tool, helps us understand the correlation between geodesic distance and similarity among places [17]. We show this correlation in Figure 2, where on the x-axis we plot pairwise geodesic distance between postal codes (lag = 100 km) and on the y-axis the average rating similarity of postal codes separated by the corresponding distance on the x-axis. To determine similarity we compare the rating vectors for each postal code using (a) cosine similarity, (b) Spearman's rank correlation, and (c) the Jaccard index. We also evaluate (d) cosine similarity on postal codes' item-popularity vectors instead. As an example, in Figure 2 we see that for postal codes separated by 1000-1100 km, their average cosine similarity is approximately 0.28.

Now, going back to the definition of Tobler's First Law: a purely geodesic-distance reading leads to the assumption that the similarity of postal codes monotonically decreases as the straight-line distance between them increases. However, as shown in Figure 2, it is immediately obvious that Tobler's First Law in its geodesic distance interpretation does not hold in the movie domain. While nearby postal codes are more similar than slightly more distant postal codes, average similarity does not decrease as distance increases. In fact, there are several points where the average similarity of locations separated by thousands of kilometers exceeds the average similarity of nearby locations.

Thus, recommender systems that only consider geodesic distance using (often) the correlogram's first bucket(s) ignore the higher average similarity displayed by distant postal code pairs. If these systems increase their geodesic distance threshold in an attempt to include more users, the newly-added nearby locations may actually have lower average similarity than distant locations.

We reasoned that perhaps the nationwide popularity of many movies causes the average similarity of distant postal codes to increase. To test this hypothesis, we re-evaluated the correlograms ignoring the most popular 20% of movies and found the same result.

Figure 2: Correlograms: Distance vs. Similarity
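Figure 2 itself is not reproduced in this text version, but the correlogram it plots can be sketched as follows. This is an illustrative computation under stated assumptions: zip_coords is a hypothetical mapping from postal code to a (latitude, longitude) centroid, loc_item is the location-item matrix from the sketch above, and only the cosine-similarity variant (a) is shown.

```python
import numpy as np
from itertools import combinations
from sklearn.metrics.pairwise import cosine_similarity

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (geodesic) distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def correlogram(loc_item, zip_coords, lag_km=100, max_km=4500):
    """Average pairwise cosine similarity of postal codes, bucketed by distance (lag = 100 km)."""
    zips = list(loc_item.index)
    sims = cosine_similarity(loc_item.values)
    n_bins = max_km // lag_km
    totals, counts = np.zeros(n_bins), np.zeros(n_bins)
    for i, j in combinations(range(len(zips)), 2):
        lat1, lon1 = zip_coords[zips[i]]
        lat2, lon2 = zip_coords[zips[j]]
        b = int(haversine_km(lat1, lon1, lat2, lon2) // lag_km)
        if b < n_bins:
            totals[b] += sims[i, j]
            counts[b] += 1
    return totals / np.maximum(counts, 1)  # mean similarity per distance bucket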
4.2 Local neighbors
For each postal code, we build a "local neighbors" user-item matrix, which is a subsample of the entire dataset. This consists of the location's own users, plus users from the successively nearest locations (geodesic) until a given number of users is reached, matching the number of users in the corresponding distant neighbors cluster. We use this method to compare our approach to location-aware recommenders that are based on location proximity instead of similarity.
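As a rough illustration (not the authors' implementation), the sketch below grows a local-neighbors partition outward from one postal code until it matches the size of the corresponding distant-neighbors cluster; users_by_zip is a hypothetical postal-code-to-users lookup, and haversine_km and zip_coords come from the correlogram sketch above.

```python
def local_neighbors_partition(center_zip, target_n_users, users_by_zip, zip_coords):
    """Grow a 'local neighbors' user set around one postal code by geodesic proximity.

    users_by_zip: hypothetical dict mapping zipcode -> list of user ids.
    Stops once the partition reaches the size of the corresponding
    distant-neighbors cluster (target_n_users)."""
    lat0, lon0 = zip_coords[center_zip]
    by_distance = sorted(users_by_zip,
                         key=lambda z: haversine_km(lat0, lon0, *zip_coords[z]))
    users = []
    for z in by_distance:  # the center postal code itself comes first (distance 0)
        users.extend(users_by_zip[z])
        if len(users) >= target_n_users:
            break
    return users[:target_n_users]
```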
4.3 Random partitioning
As a baseline, we use random partitions where each partition consists of a given number of users (matched with the number of users in the corresponding distant neighbors cluster), randomly selected from the original user-item rating matrix.
5. BUILDING RECOMMENDATIONS
We use the item-item collaborative filtering approach with log-likelihood similarity [19] for the prediction and evaluation of the different partitioning approaches on recommendation. We build a model for each partition (or cluster) independently, i.e., if a partitioning approach creates N user-item matrices then we build N recommendation models, one for each matrix. For item recommendations to a user we select her respective partition (determined by her postal code) and use the respective model.

Note that with many clusters we may have small user and item proportions per cluster. In such cases, a recommender may fail to predict or recommend, resulting in a lower number of successful predictions and thus affecting the coverage of the recommender. For example, item-item collaborative filtering fails to predict the rating of an item for a given user if that item has no correlation with any other item that user has rated. Hence, we compare the accuracy of different partitioning approaches with respect to their coverage. Using fallback predictions, we also compare the accuracy of each partitioning approach at 100% coverage, i.e., whenever the recommender fails to predict we use either (a) the item mean rating, if the item exists in the partition's training set, or (b) the global mean rating, if the item does not exist in the partition's training set.
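The fallback rule is simple enough to state directly. In the sketch below, model.predict is a stand-in for the partition's item-item recommender (assumed to return None when it cannot score a user-item pair) and partition_stats is a hypothetical per-partition summary; neither name refers to a specific library API.

```python
def predict_with_fallback(model, partition_stats, user_id, item_id, global_mean):
    """Fallback used to reach 100% coverage when comparing partitioning approaches."""
    prediction = model.predict(user_id, item_id)
    if prediction is not None:          # the item-item model could score this pair
        return prediction
    item_means = partition_stats["item_means"]
    if item_id in item_means:           # (a) item seen in this partition's training set
        return item_means[item_id]
    return global_mean                  # (b) otherwise fall back to the global mean rating
```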
6. EVALUATION
At a high level, our methodology involves three stages. First, we generate user-item rating matrix partitions in several sizes for two approaches, distant neighbors and random. For local neighbors, we build user-item rating matrices in several sizes for each postal code, which we call "partitions" for convenience. Next, for each partition, we build an independent item-item recommender. Finally, we evaluate prediction accuracy using RMSE (root mean squared error) and recommendation accuracy with MAP (mean average precision). We compare the performance of each approach to that of an item-item recommender built using the full rating matrix. We evaluate the techniques on four different numbers of disjoint clusters – 1, 5, 20 and 50 – with user proportion sizes of 100%, 20%, 5% and 2% respectively.

For RMSE evaluation, we use 90% of ratings for each user in the training data and keep 10% of their ratings in the test set. In Top-N evaluation, for each user we build an evaluation set of relevant items [23]: those rated 4 (of 5) or higher, following [10]. We then keep only 80% of that user's ratings in the training data, along with all other users' ratings. We repeat this process for a random sample of 10% of users in the dataset and take the average of MAP for Top-50 recommendations (MAP@50) over this sample of users.
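To make the protocol concrete, here is a minimal sketch (our own, not the authors' evaluation harness) of a per-user holdout split and the average-precision component of MAP@50; the normalization by min(|relevant|, k) is one common convention.

```python
import numpy as np

def per_user_split(dataset, holdout_frac=0.1, seed=0):
    """Hold out a fraction of each user's ratings for testing (e.g., 10% for RMSE)."""
    rng = np.random.default_rng(seed)
    test = dataset.groupby("user_id", group_keys=False).apply(
        lambda df: df.sample(frac=holdout_frac, random_state=int(rng.integers(1 << 31))))
    train = dataset.drop(test.index)
    return train, test

def average_precision_at_k(recommended, relevant, k=50):
    """Average precision of one ranked list against a set of relevant items."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0
```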
6.1 Results
The distant neighbors approach with 50 clusters shows the best RMSE among all partitioning approaches and numbers of clusters, including the full model ("1 partition") recommender, as shown in Table 1. We find the improvement against the full model to be statistically significant (p < 0.001, per user using the Wilcoxon rank-sum test). We also note that for any given cluster count, distant neighbors has more accurate predictions than the other partitioning approaches, i.e., local neighbors and random partitions; however, we find the difference between them to be statistically significant only for 20 clusters.

The distant neighbors approach's low coverage of 88.75%, compared to the full model's 99.87%, means that we cannot directly infer better performance. We note that at 20 and 50 partitions, distant neighbors has higher coverage than local neighbors and random partitioning. Controlling for 100% coverage using fallback predictions, we observe that distant neighbors at 20 clusters remains statistically significantly more accurate than full-model item-item (p < 0.05), but the other two partitioning methods are no longer significantly different than the full model.

For recommendation metrics, distant neighbors shows significantly better MAP@50 for cluster sizes of 20 and 50, especially over the full model. We hypothesize here that, due to very small user proportions (and therefore items) per cluster, the recommendations are limited to a specific set of items liked by most of the users in that cluster. To understand this, we calculate the intra-list similarity (or diversity) between the items, using the full training data, and find a much lower diversity using any partitioning method compared to the full model.

7. NEW USER COLD START EVALUATION
New users are a crucial part of a recommender system's success, and this problem forms an important part of system design decisions. Previous work has used location information to solve the cold start problem [2], because user context data can substitute for user rating data. We analyze the effect of our location-aware approach by simulating the new user experience. Here we randomly sample 10% of users from the training data (which contains 90% of ratings for each user) and retain only a few ratings for those users in the training data – 5 ratings per user – discarding the other ratings for each such user [10]. We evaluate the metrics on the test data containing only these users.

7.1 Results
In the case of cold start with no fallback predictions, we observe distant neighbors with 20 clusters to have the statistically significantly best predictive accuracy against the full model (p < 0.05, significance per user). However, we find this difference to be not significant against local neighbors based partitions for the same cluster count.

The coverage, as in the previous evaluation, remains low for larger numbers of clusters. With fallback predictions, although distant neighbors has better RMSE compared to local neighbors and random (not significant), no partitioning method is able to beat the full model RMSE. Moreover, the RMSE is better with fallback predictions than without fallback predictions. This suggests that the item baseline predictor is better than the full model in cold start; we find this result consistent with the results from Kluver et al. [10] on evaluating item-item recommenders for new users.

For recommendation utility, we observe that local neighbors has better MAP@50 for 20 and 50 clusters compared to any other partition or the full model. Distant neighbors is able to do better than other partitions only for 5 clusters. In contrast, with fallback predictions, for all numbers of clusters we observe that distant neighbors produces better MAP@50 results (0.1972 with 50 clusters), with improvement over local neighbors (0.14755), random (0.09753) and the full model (0.0533).
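The cold-start simulation described in Section 7 can be sketched as follows, reusing the hypothetical DataFrame layout from the earlier sketches; this is an illustration of the protocol, not the authors' code.

```python
import pandas as pd

def simulate_new_users(train, n_keep=5, user_frac=0.10, seed=0):
    """Simulate new-user cold start: for a random 10% of users, keep only
    n_keep of their training ratings and drop the rest; test data is unchanged."""
    users = train["user_id"].drop_duplicates()
    cold_users = set(users.sample(frac=user_frac, random_state=seed))
    is_cold = train["user_id"].isin(cold_users)
    kept = train[is_cold].groupby("user_id", group_keys=False).apply(
        lambda df: df.sample(n=min(n_keep, len(df)), random_state=seed))
    return pd.concat([train[~is_cold], kept]), cold_users
```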
No fallback predictions:
Partitions | Distant neighbors | Local neighbors | Random
1 (full model) | RMSE 0.98253 (99.87% coverage), MAP 0 (single unpartitioned model, common to all approaches)
5 | RMSE 0.98076 (98.81%), MAP 0 | RMSE 0.98206 (98.68%), MAP 0 | RMSE 0.98124 (99.12%), MAP 0
20 | RMSE 0.96914 (96.34%), MAP 0.03846 | RMSE 0.97346 (95.82%), MAP 0.03015 | RMSE 0.97063 (96.26%), MAP 0.03014
50 | RMSE 0.96774 (88.75%), MAP 0.13854 | RMSE 0.96946 (87.57%), MAP 0.06105 | RMSE 0.96812 (87.27%), MAP 0.03107

With fallback predictions (100% coverage):
Partitions | Distant neighbors | Local neighbors | Random
1 (full model) | RMSE 0.9875, MAP 0 (single unpartitioned model, common to all approaches)
5 | RMSE 0.9822, MAP 0 | RMSE 0.98131, MAP 0.02632 | RMSE 0.98073, MAP 0
20 | RMSE 0.98131, MAP 0.03849 | RMSE 0.98367, MAP 0.02381 | RMSE 0.98234, MAP 0.02042
50 | RMSE 0.98996, MAP 0.06639 | RMSE 0.99461, MAP 0.0407 | RMSE 0.99373, MAP 0.04532

Table 1: Evaluation of partitioning approaches. Each cell reports RMSE (with coverage where below 100%) and MAP@50. In the original typeset table, the best results for each partition count per coverage condition are in bold and the best results in each coverage condition are underlined.
No fallback predictions:
Partitions | Distant neighbors | Local neighbors | Random
1 (full model) | RMSE 1.09396 (99.87% coverage), MAP 0.0533 (single unpartitioned model, common to all approaches)
5 | RMSE 1.09025 (99.58%), MAP 0.05309 | RMSE 1.08774 (98.85%), MAP 0.03452 | RMSE 1.08122 (98.61%), MAP 0.0303
20 | RMSE 1.09 (95.36%), MAP 0.11268 | RMSE 1.09278 (95.74%), MAP 0.1241 | RMSE 1.10356 (94.74%), MAP 0.05956
50 | RMSE 1.10404 (92.46%), MAP 0.08049 | RMSE 1.10414 (91.36%), MAP 0.15347 | RMSE 1.11138 (91.04%), MAP 0.11289

With fallback predictions (100% coverage):
Partitions | Distant neighbors | Local neighbors | Random
1 (full model) | RMSE 1.07119, MAP 0.0533 (single unpartitioned model, common to all approaches)
5 | RMSE 1.08078, MAP 0.125 | RMSE 1.07376, MAP 0.10346 | RMSE 1.07596, MAP 0.09515
20 | RMSE 1.08069, MAP 0.13668 | RMSE 1.08502, MAP 0.09609 | RMSE 1.08672, MAP 0.097
50 | RMSE 1.07392, MAP 0.1972 | RMSE 1.0867, MAP 0.14755 | RMSE 1.07834, MAP 0.09753

Table 2: Evaluation of partitioning approaches in simulated cold start. Each cell reports RMSE (with coverage where below 100%) and MAP@50. In the original typeset table, the best results for each partition count per coverage condition are in bold and the best results in each coverage condition are underlined.
Place | Most Similar | 2nd Most Similar | 3rd Most Similar
Jersey City, NJ 07087 | San Francisco, CA 94104 | Redwood City, CA 94063 | Bala Cynwyd, PA 19004
Philadelphia, PA 19104 | Urbana, IL 61801 | New York, NY 10003 | Cambridge, MA 02139
Ann Arbor, MI 48104 | Minneapolis, MN 55455 | Brooklyn, NY 11215 | Minneapolis, MN 55414
Palo Alto, CA 94305 | Chicago, IL 60614 | Urbana, IL 61801 | Ann Arbor, MI 48103

Table 3: Most similar locations for selected US postal codes
8. DISCUSSION
In this section, we summarize the results in light of the assumptions and the limitations of our approach. We further highlight the key takeaways and some anecdotes from our dataset.

We show that, controlling for coverage, grouping users based on their location similarity – distant neighbors – yields a significant improvement in prediction accuracy over other partitioning approaches for 20 clusters. In cold start, though, we notice that partitioning fails to improve prediction accuracy (RMSE). However, since Kluver et al. [10] argue that recommendation utility is more important than prediction accuracy for new users, we regard distant neighbors' better mean average precision as a positive result.

Also, we note that Top-N metrics like MAP have a popularity bias [10]: mean average precision will be higher for any algorithm that favors popular items. To understand whether such bias exists in our results, we look at the popularity of the recommended items. We find that the local neighbors recommender favors relatively more popular items than distant neighbors, which recommends more popular items than random, and than the full model. We calculate popularity based on the number of users who rated the item in the full matrix. We also determine the average user rating of the items in the cold start situation to understand the relevance of recommendation, by taking the mean of the user's ratings for the relevant items in the recommendation list. We find that distant neighbors consistently provides recommendations that have higher user average ratings: 4.57, compared to local neighbors (4.25) and random (4.23). We therefore find the distant neighbors performance on Top-N metrics to be a positive result.

We consider these results a proof of concept. We note that the magnitude of the improvements is small, even if statistically significant. We recognize that such small decreases in RMSE are unlikely to, in and of themselves, result in a significant change in user experience. Rather, we hope to see further development and optimization of geographic similarity in recommender systems, with the promise of further improvements in performance. We also note that the Top-N results show even more promise, and deserve future user-centered evaluation.

To further interpret our results we revisit our intuition of user worldviews that can manifest in rating space. By "worldviews" we mean the agreement among a subset of users who have significantly different "views of the world" compared to other subsets. We see empirical evidence of how college towns are more similar to other college towns than to other postal codes in their own city, and hypothesize that people on a particular college campus may share an understanding of concepts (for example: registration, fraternities, and late-night pizza) with people in places far apart. For instance, as seen in Table 3, the postal code 48104 (representing a university campus in Ann Arbor, Michigan) is most similar to 55414, another university campus in Minneapolis, Minnesota, about 900 km away. We further hypothesize that similarity among users is not primarily based on straight-line distance but on a more nuanced distance – cultural or socioeconomic – that manifests in a community consensus of agreement on item likeness, in the same way that the semantic relatedness of spatially-referenced Wikipedia articles is not fully consistent with geodesic proximity [7].

We also believe that clusters based on location provide advantages over clusters based directly on users' ratings [18]: (a) the number of postal codes is small compared to the number of users, an important factor when clustering within a very large rating matrix; and (b) the exogenous information from location can help explain the clusters.

9. LIMITATIONS
An important limitation of our work is that our experiment considered only United States postal codes. The United States has a large variety of cultures and demographics spread throughout the country. Different countries may display different properties.

The other limitation is the low user density within the postal codes used for evaluation. We find 1779 postal codes associated with 5000 users, which means we have on average 3 users per postal code. Recognizing this limitation, we performed a predictive accuracy experiment on an equivalently sized dataset built by picking postal codes (which has its own bias of oversampling users from dense locations) and found the RMSE results were consistent.

In addition, our experiment considered only a Bayesian average to calculate postal codes' item rating midpoints. Different similarity methods, such as Kullback–Leibler divergence [13] of rating distributions, may provide a better understanding of the similarity between two places.

While we calculate local neighbors by proximity, LARS tessellated users into geographic grid squares. One of LARS' major contributions was spatial data structure optimization for best storage and recommendation performance. The use of a spatial database is out of the scope of our paper.

Finally, MovieLens users enter postal codes at registration and rarely update that information, even if they move.

10. FUTURE WORK
There is significant complexity around what kind of location-aware partitioning may improve recommendations. Other methods might include ESRI's Tapestry dataset, which groups postal codes not on item rating similarity but on "lifestyle" – that is, demographic and socioeconomic – weighted tags. For example, the top lifestyle segments for 94043 (Mountain View, California) are Enterprising Professionals (38%), Trendsetters (14%), and Laptops and Lattes (10%).

We are interested in how this process applies to item domains other than movies, especially for items with specific spatial footprints. We intuit that a distant neighbors approach may work less well considering individual places (say, a specific restaurant) but may work just as well considering place features (say, a cuisine that applies to many restaurants).

Location-aware recommenders should be able to use the context of location to explain their suggestions: e.g., that people in a similar location offered relevant opinions. Future work involves exploring the value of location-based explanations over those focused on other dimensions of similarity. Herlocker et al. find that 86% of recommender system users value additional explanations [8], and location-based partitions are explainable. Placenames are immediately available, and demographic inference (e.g., "another college town") is possible.

Finally, although we chose postal codes as the unit of spatial granularity in our system, location is defined as part of a hierarchy [3] and we could have been more or less precise. In an area of the world with smaller nations and stricter cultural borders – say, Europe or Central America – we may find the applicable scale is at the level of nations and languages. Spatial granularity may also be inconsistently scaled within the same dataset: if speaking German is predictive, it may be at the national level in Europe, the regional level in the United States, and the city level in South America.

11. CONCLUSION
We demonstrate that we can harness geographic techniques to improve recommender systems by limiting the recommender model to groups of users in similar locations. We show that similar locations are determined by a combination of ratings preference and geodesic distance, an understanding that goes beyond a "closer = more relevant" assumption. We provide the first evidence that rating similarity of locations does not monotonically decrease with geodesic distance, and describe a geostatistical method to discover "distant neighbors". We show that creating a user space based on location similarity improves predictive accuracy and recommender utility, including in the new user cold start scenario and when controlling for full coverage.

Over-relying on the opinions of nearby places can lead to the inclusion of proximate but less similar people, whereas the power of distant neighbors in recommender systems is smart partitioning of the underlying dataset to include the right users among all those far away.

We urge a reconsideration of geographic relationships in recommender systems, where the value of geography goes beyond geodesic distance.

12. NOTES
Our code and datasets, as well as results for additional metrics, are available at http://cs.umn.edu/∼vikas.

13. ACKNOWLEDGMENTS
This work was supported in part by the National Science Foundation (IIS-0808692 and IIS-0964695) and the University of Minnesota's Social Media and Business Analytics Collaborative (SOBACO). We also would like to thank Daniel Kluver (University of Minnesota) and Michael Ekstrand (Texas State University) for their continuous feedback and input on the paper.
14. REFERENCES
[1] G. Adomavicius and A. Tuzhilin. Context-aware recommender systems. In Recommender Systems Handbook, pages 217–253. Springer, 2011.
[2] J. Bao, Y. Zheng, and M. F. Mokbel. Location-based and preference-aware recommendation using sparse geo-social networking data. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pages 199–208. ACM, 2012.
[3] J. Bao, Y. Zheng, D. Wilkie, and M. Mokbel. Recommendations in location-based social networks: a survey. GeoInformatica, pages 1–41, 2015.
[4] B. Bishop. The Big Sort: Why the Clustering of Like-Minded America Is Tearing Us Apart. Houghton Mifflin Company, New York, 2008.
[5] J. Das, S. Majumder, and P. Gupta. Voronoi based location aware collaborative filtering. In 3rd National Conference on Emerging Trends and Applications in Computer Science (NCETACS), pages 179–183. IEEE, 2012.
[6] K. E. Grossner, M. F. Goodchild, and K. C. Clarke. Defining a digital earth system. Transactions in GIS, 12(1):145–160, 2008.
[7] B. Hecht and E. Moxley. Terabytes of Tobler: evaluating the first law in a massive, domain-neutral representation of world knowledge. In Spatial Information Theory, pages 88–105. Springer, 2009.
[8] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, pages 241–250. ACM, 2000.
[9] T. Horozov, N. Narasimhan, and V. Vasudevan. Using location for personalized POI recommendations in mobile environments. In International Symposium on Applications and the Internet (SAINT). IEEE, 2006.
[10] D. Kluver and J. A. Konstan. Evaluating recommender behavior for new users. In Proceedings of the 8th ACM Conference on Recommender Systems, pages 121–128. ACM, 2014.
[11] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl. GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77–87, 1997.
[12] O. Kucuktunc, B. B. Cambazoglu, I. Weber, and H. Ferhatosmanoglu. A large-scale sentiment analysis for Yahoo! Answers. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, pages 633–642. ACM, 2012.
[13] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, pages 79–86, 1951.
[14] J. J. Levandoski, M. Sarwat, A. Eldawy, and M. F. Mokbel. LARS: A location-aware recommender system. In 28th International Conference on Data Engineering (ICDE), pages 450–461. IEEE, 2012.
[15] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma. Mining user similarity based on location history. In Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, page 34. ACM, 2008.
[16] T. J.-J. Li, S. Sen, and B. Hecht. Leveraging advances in natural language processing to better understand Tobler's first law of geography. In Proceedings of the 2014 SIGSPATIAL Conference. ACM, 2014.
[17] N. L. Oden. Assessing the significance of a spatial correlogram. Geographical Analysis, 16(1):1–16, 1984.
[18] A. M. Rashid, S. K. Lam, G. Karypis, and J. Riedl. ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm. Proceedings of WebKDD, 2006.
[19] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295. ACM, 2001.
[20] Y. Seroussi, F. Bohnert, and I. Zukerman. Personalised rating prediction for new users using latent factor models. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, pages 47–56. ACM, 2011.
[21] W. Tobler. On the first law of geography: A reply. Annals of the Association of American Geographers, 94(2):304–310, 2004.
[22] W. R. Tobler. A computer movie simulating urban growth in the Detroit region. Economic Geography, pages 234–240, 1970.
[23] S. Vargas and P. Castells. Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, RecSys '11, pages 109–116. ACM, 2011.
[24] I. Weber and C. Castillo. The demographics of web search. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 523–530. ACM, 2010.
[25] B. Xu, J. Bu, C. Chen, and D. Cai. An exploration of improving collaborative recommender systems via user-item subgroups. In Proceedings of the 21st International Conference on World Wide Web, pages 21–30. ACM, 2012.
[26] X. Yang and Z. Zhang. Combining prestige and relevance ranking for personalized recommendation. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 1877–1880. ACM, 2013.
[27] Y. Yu and X. Chen. A survey of point-of-interest recommendation in location-based social networks. 2015.
[28] J. Zhang, M. Zhu, D. Papadias, Y. Tao, and D. L. Lee. Location-based spatial queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 443–454. ACM, 2003.