Detecting Trending Venues Using Foursquare’s Data

Stephanie Yang
Foursquare Labs
568 Broadway, 10th Floor
New York, NY
stpyang@foursquare.com

Max Sklar
Foursquare Labs
568 Broadway, 10th Floor
New York, NY
max@foursquare.com

ABSTRACT

Foursquare is a search and discovery tool which helps users discover venues around the world. Much of the data for these recommendations comes from its sister app Swarm, a location-based social network where users can “check in” to places they visit.

Older versions of Foursquare had a strongly static component to their recommendations. For instance, the top restaurants in New York City did not vary from month to month, and venues with years of consistently strong signals dominated search results.

In this paper we outline a new algorithm which Foursquare uses in order to discover fresh recommendations. Promoting younger venues with fewer check-ins or older venues with a recent surge of activity increases turnover in our recommendations and yields a better user experience.

Keywords

Recommender systems; Ratings; Foursquare

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the authors.
RecSys ’16, Sept 15–19, 2016, Boston, MA, USA
© 2016 Copyright held by the authors.

1. INTRODUCTION

Foursquare has a database of nine billion check-ins and 85 million public venues around the world. Using this data, the mobile app provides personalized venue recommendations to users. Core components of these recommendations are based on foot traffic data in the form of check-ins and passively generated visits from a background location service called Pilgrim [4, 7], as well as other user interactions in the form of venue feedback, tips, and photos.

There is a constant tension between consistency and freshness in Foursquare’s recommendations. For example, Thomas Keller’s Per Se is always at the top of the results for restaurants in New York City, but most users find value in discovering a more accessible venue like a new mom-and-pop coffee shop around the corner. Likewise, a celebrity chef moving to a new restaurant results in a flurry of activity which is not always captured well by Foursquare’s long-term signals.

Foursquare has successfully implemented short-term trend detection to showcase real-time events as they happen [6]. The algorithm in this paper fills the gap between the near-instantaneous discovery of popular events and the long-term detection of quality venues.

2. FEATURES

All of the features described below are generated by users’ interaction with the Foursquare and Swarm apps and by passively generated visits from Pilgrim. Noteworthy venues inspire users to interact with their apps, and so most user activity for a venue is seen as positive.

2.1 User generated signals

Check-ins and visits: The primary signals for trendiness are based on foot traffic in the form of active check-ins and passive visits. Active check-ins typically indicate better venues, since Swarm users tend to broadcast special outings more often than their day-to-day activities.

Saves: Foursquare users have the option of saving a venue to a list for later. This distinguishes trendy new places from average ones, because it indicates an aspiration to visit.

Tips from users: Users have the option of writing tips at any venue, which are shown to other users as part of the local discovery experience [5]. Trendy venues consistently attract a larger number of tips than average venues.

Tips from vetted accounts: A handful of user accounts are unusually influential. For example, some celebrities and local food blogs maintain active Foursquare accounts with tens of thousands of followers. Tips from these accounts drive foot traffic and are a leading signal of venue trendiness.

Explicit feedback: The Foursquare app prompts its users to leave explicit ratings (like, dislike, or neutral) about the places they visit.

Photos: The excitement of visiting a noteworthy venue is often reflected by our users documenting their visit with photographs.

2.2 Trend detection

For each of the activities listed above, we calculate two statistics.

The first statistic is derived from fitting a trend line through the time series of the activity. The signal that we use is given by the equation

$$S = \frac{\hat{\beta}}{\sigma_{\hat{\beta}}},$$

where $\hat{\beta}$ denotes the slope of the trend line through the time series with 56 days of data, and $\sigma_{\hat{\beta}}$ denotes the standard error of the estimate $\hat{\beta}$.

[Figure 1 plot: check-ins per day (vertical axis, 0 to 400) against day (horizontal axis, 0 to 60) for Suppa and Kafe Pi.]

Figure 1: Number of check-ins per day for two restaurants in Istanbul. Note that check-ins exhibit a weekly cycle. The spike in activity for Kafe Pi on day 39 was due to a single event.

In Figure 1 we display the number of check-ins per day for two restaurants in Istanbul, Turkey. The trend lines for both time series (omitted from the figure) have similar slopes. Although the value of $\hat{\beta}$ is positive and similar for both venues, the value of $\sigma_{\hat{\beta}}$ is lower for Suppa than it is for Kafe Pi. Hence the signal $S$ for Suppa is larger than the corresponding signal for Kafe Pi. In general, venues with erratic or spiky activity do not benefit from one-time events for this class of signals.

The second statistic is a decayed sum of the activity, calculated with a half life of 56 days:

$$D = \sum_{d} c_d \, e^{\lambda d},$$

where $d$ is the number of days prior to the current day, $c_d$ is the total amount of user activity on that day, and $\lambda = -\ln 2 / 56$. Short half lives are associated with noisier data. For example, the venue Kafe Pi in Figure 1 has a spike in activity on day 39, which would have dominated the signal if the half life were too short. Longer half lives are more stable, but we found that very long half lives lead to a lack of freshness in our recommendations. In our research, a half life of 56 days gave the best balance between stability and freshness.

3. COMBINING THE SIGNALS

The distribution of the S-scores is roughly bell-shaped, while the distribution of the D-scores has a long tail. In order to combine the two classes of scores, we normalize each signal to a Gaussian distribution using the function

$$N = \Phi^{-1}(r),$$

where $\Phi$ is the cdf of the standard $\mathcal{N}(0, 1)$ distribution and $r$ is the relative rank, between 0 and 1, of the venue when compared to all other venues and sorted by a given score.

We then combine the signals linearly with hand-tuned coefficients. The largest coefficients are associated with the S-score of tips left by vetted accounts (a sparse but strong signal) and the S-score of Pilgrim-generated visits. These two scores account for more than 60% of the final signal.

4. SUMMARY AND RESULTS

The combined signal is now being used as a primary component of venue recommendations, and is showcased in the main Foursquare app and on the website in the form of weekly billboard-style “Trending This Week” lists in major metropolitan areas (Figure 2). It is also frequently covered in articles which feature best-of lists for many cities [1, 2, 3]. Weekly e-mails featuring these lists have click-through rates that far exceed the industry average and drive regular in-app activity. The signal has also been integrated into Foursquare’s core venue ratings algorithm, resulting in greater freshness and turnover.

Figure 2: A screenshot from the Foursquare website for New York City.

5. REFERENCES

[1] R. Bruner. 10 trendy Austin restaurants you need to try right now. http://www.businessinsider.com/the-hottest-restaurants-in-austin-tx-2016-3. Accessed: 2016-06-30.
[2] R. Bruner. 12 up-and-coming New York City restaurants you need to try right now. http://www.businessinsider.com.au/12-trendy-new-nyc-restaurants-to-try-now-2016-1. Accessed: 2016-06-30.
[3] R. Bruner. 15 trendy New York City restaurants you need to try right now. http://www.businessinsider.com/15-new-nyc-restaurants-to-try-now-2016-2. Accessed: 2016-06-30.
[4] A. Heath. Foursquare’s location data is way more powerful than people realize. Tech Insider, January 2016.
[5] M. Sklar. Timely tip selection for Foursquare recommendations. In RecSys Posters, October 6–10 2014.
[6] M. Sklar, B. Shaw, and A. Hogue. Recommending interesting events in real-time with Foursquare check-ins. In RecSys 2012 Poster Proceedings, pages 311–312, September 9 2012.
[7] R. Tate. The brilliant hack that brought Foursquare back from the dead. Wired, December 2013.
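
As a rough illustration of the scoring steps described in Sections 2.2 and 3, the following Python sketch computes the trend score S, the decayed sum D, the rank-based Gaussian normalization, and a linear combination. It is not Foursquare's production code. The function names, the ordinary least squares formula used for the standard error of the slope, the midpoint rank convention (rank + 0.5)/n, the handling of perfectly linear series, and the example data and weights are all assumptions made for the sake of the example.

```python
import math
from statistics import NormalDist


def trend_score(daily_counts):
    """Return S = slope / standard error for an OLS line through daily counts.

    daily_counts[i] is the activity on day i, oldest first (assumed layout).
    """
    n = len(daily_counts)
    if n < 3:
        raise ValueError("need at least three days to estimate a standard error")
    x_mean = (n - 1) / 2
    y_mean = sum(daily_counts) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_counts))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    rss = sum((y - intercept - slope * x) ** 2 for x, y in enumerate(daily_counts))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0.0:
        # Perfectly linear series: the ratio is undefined. Returning a signed
        # infinity (or zero for a flat line) is an assumption of this sketch.
        return math.copysign(math.inf, slope) if slope else 0.0
    return slope / std_err


def decayed_sum(counts_by_age, half_life=56.0):
    """Return D = sum_d c_d * exp(lambda * d) with lambda = -ln(2) / half_life.

    counts_by_age[d] is the activity d days before the current day.
    """
    lam = -math.log(2) / half_life
    return sum(c * math.exp(lam * d) for d, c in enumerate(counts_by_age))


def gaussian_normalize(scores):
    """Map each score to Phi^{-1}(r), where r is its relative rank in (0, 1).

    Uses r = (rank + 0.5) / n so that r never reaches 0 or 1 (an assumption).
    Ties are broken by input order.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    standard_normal = NormalDist()
    normalized = [0.0] * n
    for rank, i in enumerate(order):
        normalized[i] = standard_normal.inv_cdf((rank + 0.5) / n)
    return normalized


def combine_signals(normalized, weights):
    """Linearly combine per-venue normalized signals with the given weights."""
    n_venues = len(next(iter(normalized.values())))
    return [
        sum(weights[name] * normalized[name][i] for name in weights)
        for i in range(n_venues)
    ]


if __name__ == "__main__":
    # Made-up data: two venues with the same slope, one with a one-day spike.
    steady = [1, 3, 3, 5, 5, 7, 7]
    spiky = [1, 3, 3, 50, 5, 7, 7]
    s_scores = [trend_score(steady), trend_score(spiky)]
    d_scores = [decayed_sum(steady[::-1]), decayed_sum(spiky[::-1])]
    signals = {
        "S": gaussian_normalize(s_scores),
        "D": gaussian_normalize(d_scores),
    }
    # Hypothetical weights; the paper's hand-tuned coefficients are not given.
    print(combine_signals(signals, {"S": 0.6, "D": 0.4}))
```

In the made-up data, the spike sits on the middle day, so it leaves the fitted slope unchanged and only inflates the standard error. This mirrors the Suppa and Kafe Pi comparison: the spiky venue gets the lower S even though its decayed sum D is higher.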