Timely Tip Selection for Foursquare Recommendations

Max Sklar
Foursquare Labs
568 Broadway, 10th Floor
New York, NY
max@foursquare.com

Kristian J. Concepcion
Foursquare Labs
568 Broadway, 10th Floor
New York, NY
kjc@foursquare.com


ABSTRACT
This poster summarizes the techniques we use to serve Foursquare tips for a given venue, and more specifically the strategies we employ for choosing timely and seasonal tips.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information filtering; G.3 [Probability and Statistics]: Time series analysis; I.2.7 [Natural Language Processing]: Text analysis; I.5.1 [Pattern Recognition]: Models–Statistical

Keywords
Bhattacharyya coefficient, context-aware recommenders, Foursquare, machine learning, natural language processing, text classification

1. INTRODUCTION
Foursquare is a location-based recommendation engine. A primary action for users is to write a tip, which is a short public note attached to a venue, often a review or suggestion. Any given venue is likely to have many tips attached to it, and these vary in quality and relevance. With the recent focus on search and discovery as well as passive location awareness, we have developed a number of heuristics to serve the right tips to the right people at the right time.

2. TIP SELECTION COMPONENTS
Language Identification: To avoid serving tips in languages that a user does not understand, we built a language classifier for Foursquare tips using an ensemble of open-source and home-grown solutions.
Global Quality: We created a hand-labelled training set of high- and low-quality tips based on a strict set of quality guidelines. Raw scores from several statistical classifiers, each trained to identify a specific trait such as sentiment or spam, were used as features to train a quality model.
Personalization: We developed a number of signals that take into account the user's tastes and social connections.
Timeliness and Seasonality: For any given date and time, a tip is analyzed to determine whether it is appropriate for that particular time of week or time of year. In this poster, we describe the system that analyzes this component in more detail.

Copyright is held by the author/owner(s).
RecSys 2014 Poster Proceedings, October 6-10, 2014, Foster City, Silicon Valley, USA

3. TIMELINESS OF PHRASES AND TIPS
Through our Swarm app, users check in to share their location and leave a short update for their friends called a shout. To find time-sensitive phrases, we looked at shouts instead of tips, because shouts are more specific to what users are doing at a particular time.
Our model for phrase popularity over the course of the week mirrors our model for venue popularity [4]. For each supported language, we divided the week into 168 hour buckets and counted the number of times each phrase was used in each bucket. We also counted the total number of shouts in each bucket to produce a baseline distribution.
The Bhattacharyya coefficient [1] is a measure of similarity between two probability distributions. Given two phrase distributions P and Q, we define their similarity to be

    $$ S(P, Q) = \sum_{w \in W} \sqrt{P(w)\,Q(w)} $$

where W is the set of all 168 weekhour buckets.
For example, the Bhattacharyya coefficient between any phrase and the word "lunch" measures how appropriate that phrase is for lunchtime. The food items that rank highest on this metric in English and Thai give interesting insights into the lunch habits of different language groups (Table 1).

Table 1: Comparing Food Items with the Highest Bhattacharyya Similarity to "Lunch"

    English phrase     Score     Thai phrase    Translation    Score
    salad sandwich     0.943     ก๋วยเตี๋ยว        noodles        0.884
    turkey sandwich    0.929     ชาเขียว         green tea      0.845
    cuban sandwich     0.918     กาแฟ           coffee         0.832
    panini             0.913     ขาหมู           pig's feet     0.827

Furthermore, the Bhattacharyya coefficient between any phrase and the baseline distribution measures the time sensitivity of that phrase: a phrase whose weekly distribution closely tracks overall shout activity scores near 1 and is not time-sensitive. We extracted all the phrases that meet a chosen threshold for time sensitivity. Each phrase-bucket pair was then assigned a timeliness score: the log-ratio of the phrase probability to the baseline probability in that bucket.
We define C(p) to be the total number of times phrase p appears in the corpus, and C(p_w) to be the number of times p appears in weekhour w. Finally, α is a 168-dimensional Dirichlet smoothing constant on phrase count data [5], and b is a pseudo-phrase whose counts are the baseline counts, so that C(b_w) is the number of shouts in weekhour w and C(b) is the total number of shouts.
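The bucketing, similarity, and scoring steps described in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Foursquare's production code: bucket 0 is assumed to be Monday 00:00, a shout is assumed to count at most once per phrase, phrases are matched by naive lowercase substring search as a stand-in for real phrase extraction, and the function names, the uniform α values, and the similarity threshold are all hypothetical.

```python
import math
from datetime import datetime

NUM_BUCKETS = 168  # 7 days x 24 hours


def weekhour(ts: datetime) -> int:
    """Hour-of-week bucket in [0, 168); assumes bucket 0 is Monday 00:00-00:59."""
    return ts.weekday() * 24 + ts.hour


def count_buckets(shouts, phrases):
    """Per-weekhour counts C(p_w) for each phrase, plus baseline shout counts C(b_w).

    shouts: iterable of (datetime, text) pairs. A shout counts at most once per
    phrase; matching is a naive lowercase substring test (hypothetical stand-in
    for real phrase extraction).
    """
    phrase_counts = {p: [0] * NUM_BUCKETS for p in phrases}
    baseline = [0] * NUM_BUCKETS
    for ts, text in shouts:
        w = weekhour(ts)
        baseline[w] += 1
        lowered = text.lower()
        for p in phrases:
            if p.lower() in lowered:
                phrase_counts[p][w] += 1
    return phrase_counts, baseline


def smoothed_distribution(counts, alpha):
    """Dirichlet-smoothed distribution: (C(p_w) + alpha_w) / (C(p) + sum_i alpha_i)."""
    denom = sum(counts) + sum(alpha)
    return [(c + a) / denom for c, a in zip(counts, alpha)]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient sum_w sqrt(P(w) Q(w)); 1.0 for identical distributions."""
    return sum(math.sqrt(pw * qw) for pw, qw in zip(p, q))


def is_time_sensitive(phrase_counts, baseline_counts, alpha, threshold):
    """Hypothetical filter: low similarity to the baseline means time-sensitive."""
    p = smoothed_distribution(phrase_counts, alpha)
    b = smoothed_distribution(baseline_counts, alpha)
    return bhattacharyya(p, b) < threshold


def timeliness(phrase_counts, baseline_counts, alpha, w):
    """Log-ratio of smoothed phrase probability to smoothed baseline probability at w."""
    p = smoothed_distribution(phrase_counts, alpha)
    b = smoothed_distribution(baseline_counts, alpha)
    return math.log(p[w] / b[w])


if __name__ == "__main__":
    # Toy shouts invented for illustration: "lunch" clusters at Monday noon.
    shouts = [
        (datetime(2014, 6, 2, 12, 5), "Lunch at the taqueria"),
        (datetime(2014, 6, 2, 12, 40), "quick lunch break"),
        (datetime(2014, 6, 9, 12, 10), "lunch!"),
        (datetime(2014, 6, 2, 12, 30), "back at the office"),
        (datetime(2014, 6, 6, 0, 30), "night out with friends"),
        (datetime(2014, 6, 6, 0, 45), "late night pierogi run"),
    ]
    alpha = [0.01] * NUM_BUCKETS  # hypothetical small uniform pseudo-count
    counts, base = count_buckets(shouts, ["lunch"])
    print(timeliness(counts["lunch"], base, alpha, 12))  # positive: Monday noon
    print(timeliness(counts["lunch"], base, alpha, 96))  # negative: Friday 00:00
```

Using the same α for the phrase and for the baseline b keeps every smoothed probability strictly positive, so the log-ratio stays finite even in buckets where a phrase never appears.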
The timeliness score for a phrase p at weekhour w is computed as follows:

    $$ T(p, w) = \ln\left( \frac{C(p_w) + \alpha_w}{C(p) + \sum_i \alpha_i} \div \frac{C(b_w) + \alpha_w}{C(b) + \sum_i \alpha_i} \right) $$

The timeliness score for a tip is the sum of the scores of its phrases. For example, at Veselka (a popular Ukrainian restaurant in New York's East Village), a user wrote "They're open 24/7 - turn up after your night out and partake of the pierogis with applesauce." The terms "24/7", "turn up", "night out", and "pierogi" all meet the Bhattacharyya threshold. Their respective scores for Sunday night at midnight are 1.2, 0.6, 0.3, and -0.6. These sum to 1.5, which is positive and indicates that this tip is timely on Sunday night.
We supplemented our shout counts with the English WordNet [2] food corpus and our English menu database. This allowed us to associate entries in the WordNet corpus with specific meals (breakfast, lunch, dinner, dessert, and late night). For phrases in the WordNet food corpus with insufficient shout data, we replaced the phrase's distribution with that of the matching abstract mealtime.
One problem we encountered was with non-compositional compound phrases. The timeliness of "burrito" is very different from that of "breakfast burrito", but because the burrito data also included every mention of "breakfast burrito", the timeliness score of "burrito" was dampened. To counteract this problem, we merged phrases that frequently appeared together in our training data, so that each merged phrase was treated as an entity completely separate from its constituent tokens. For burritos, this meant that all mentions of "breakfast burrito" were counted as one term, and all mentions of "burrito" not preceded by "breakfast" were counted as an entirely separate term.

[Figure: Phrase timeliness score (y-axis) by hour of the week (x-axis; hours 96 to 120 span midnight to midnight on Friday) for "Burrito", "Breakfast Burrito", and "Non-Breakfast Burrito".]

3.1 Evaluation
We evaluated the timeliness score on a hand-labelled set of 825 tips, each considered at four abstract mealtimes: breakfast, lunch, dinner, and late night. For each tip and time period, we applied a label of timely, neutral, or untimely. We then compared those labels to our timeliness scores; the agreement was good enough for us to use the feature in the product.
The timeliness score serves two purposes: detecting specifically timely tips, and disqualifying untimely tips. With our chosen threshold, we achieved 71.3% precision and 74.5% recall for timely tips against our hand-labelled set. Untimely scoring used a different threshold and achieved 74.7% precision and 67.0% recall.

4. EXTENSION TO SEASONALITY
The ability to detect and exploit seasonality is an important feature for search and recommendation systems [3]. There was not enough data to create 365 day buckets, so instead we created buckets based on weeks. Unfortunately, in the Unix calendar utility, many popular holidays crucial to seasonality fall in different week buckets each year. To ameliorate this, we forced every month into a 4-week model, with the last week of each month subsuming all extra days beyond the 28th. The last week of each month was then normalized to account for the extra days before the Bhattacharyya coefficients were calculated.
Another issue was caused by phrases that were seasonal in only one year. Very popular movies caused us to associate "James Bond" with mid-November and "Star Trek" with June. We solved this problem by looking at the data for each year individually and flagging outliers. Once an outlier year was flagged, we smoothed its counts to bring it more in line with the rest of the data.

4.1 Future Work
Some terms follow a different seasonal pattern depending on geographic region, and performance would improve if phrase distributions were geo-fenced by region. For example, the term "fireworks" was found to be highly timely during the first week of July for American Independence Day, but it also shows a smaller spike in the first week of November for Guy Fawkes Day in Great Britain. Another example is the term "Rangers", which was timely in both summer and winter: the Texas Rangers (a baseball team that plays during the summer) were being conflated with the New York Rangers (a hockey team that plays during the winter).
Geo-fencing by climate zone, as opposed to national borders or metropolitan areas, would improve results for weather-related phrases such as "outdoor seating", "hot soup", and "air conditioning".

5. REFERENCES
[1] Bhattacharyya, A. (1946). On a measure of divergence between two multinomial populations. Sankhyā: The Indian Journal of Statistics, 401-406.
[2] Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39-41.
[3] Shokouhi, M. (2011, July). Detecting seasonal queries by time-series analysis. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1171-1172). ACM.
[4] Sklar, M., Shaw, B., & Hogue, A. (2012, September). Recommending interesting events in real-time with foursquare check-ins. In Proceedings of the Sixth ACM Conference on Recommender Systems (pp. 311-312). ACM.
[5] Sklar, M. (2014). Fast MLE computation for the Dirichlet multinomial. arXiv preprint arXiv:1405.0099.