Cicerone: Design of a Real-Time Area Knowledge-Enhanced Venue Recommender

Daniel Villatoro, Jordi Aranda, Marc Planagumà, Rafael Gimenez and Marc Torrent-Moreno
Barcelona Digital Technology Centre, Barcelona, Spain
{dvillatoro,jaranda,mplanaguma,rgimenez,mtorrent}@bdigital.org

Abstract

Smart devices that can share information anytime and anywhere have opened a wide range of ubiquitous applications. Within urban environments citizens have a plethora of locations to choose from and, with the advent of the smart-cities paradigm, it falls to location-based recommender systems to provide citizens with adequate suggestions. In this work we present the design of an in-situ location-based recommender system, in which venue recommendations are built upon the user's location at request time and also incorporate the social dimension and the area expertise of the neighboring users whose knowledge is used to build the recommendations. Moreover, we propose a specific easy-to-deploy architecture that bases its functioning on participatory social media platforms such as Twitter or Foursquare. Our system constructs its knowledge base from the accessible data in Foursquare, and similarly obtains ratings from geopositioned tweets.

1 Introduction

Urban environments host a plethora of interesting locations such as restaurants, shops, museums, theaters and a wide range of other venues. No user can know all of them, nor would any user be interested in visiting them all. However, each citizen can potentially become an expert on the neighborhood where they live or that they visit most often, as they will know, and may have visited, more venues in that area. It is therefore straightforward to see that a specific citizen may have no problem finding a venue to their taste in their neighborhood of expertise, while the same task potentially becomes cumbersome in a different, less-known neighborhood. Finding locations they might enjoy thus becomes a problem for citizens who are away from their area of expertise. The problem of finding adequate items for specific users is the one classically solved by recommender systems.

In our case, we focus on location-based recommender systems [Zheng et al., 2009; Park et al., 2007], where users are recommended locations to visit with the aim of maximizing their satisfaction. This type of recommender complements the classical ones, as it also has to take into account the context, the distance from the user to the recommended venue, and possibly several other factors.

With the penetration of smart devices, users can access information anytime and anywhere, and the system we present in this work profits from those ubiquitous computing capabilities: our on-site location-based recommender system allows users to obtain the most adequate venue with respect to their current position. Our approach profits from a different dimension of the users' parameter space, namely their social relationships and their relative geographical knowledge with respect to the location of the items. This model provides an alternative solution to the problem of providing personalized recommendations in a geospatial domain: user expertise in this type of domain conveys an implicit, continuous knowledge of the surrounding geospatial area and of the locations within that area. Our solution combines this geospatial knowledge of the users with the classical social distances amongst users used in state-of-the-art recommenders.

Our system personalizes recommendations of locations by considering not only the past history of a specific user, but also (1) the current location of the user, (2) the social distance to other similar users and (3) their expertise in the area where the recommendation is going to be provided. This aggregation function basically expresses the tendency of a user to visit a certain location given their distance to the location, the past history of the user and of their friends, and their knowledge of the area.

To the best of our knowledge, there are no existing recommender systems that profit from the inherent characteristics of the geographical location, such as continuity in space, users' area expertise or word-of-mouth location suggestions, to generate recommendations to users.

2 State of the Art

As discussed previously, the problem of finding adequate venues for citizens to visit has already been tackled by the recommender community under the heading of location-based recommender systems [Zheng et al., 2009; Park et al., 2007]. Despite the impressive amount of literature in this area, it is still an open problem, even for those with access to complete datasets and user profiles [Sklar et al., 2012], and new methods and algorithms are still being proposed to boost their efficiency.

In this work, however, we propose the integration of social information into the calculation of the recommendations. Some authors have investigated the explicit inclusion of information about users' relationships from social networks to generate the neighborhood used in classical collaborative filtering (CF) algorithms (social filtering), improving the results obtained by classic CF in the analyzed scenarios [Groh and Ehmig, 2007].

Others [Bonhard and Sasse, 2006] have analyzed how the relationship between advice-seeker and recommender is extremely important in user-centered recommendations, concluding that familiarity and similarity amongst the different roles in the recommendation process aid judgement and decision making. As in our approach, some researchers have considered the important role of experts [Amatriain et al., 2009; Bao et al., 2012]; in our case, however, these experts are calculated automatically for each specific area of the city and weighted with respect to the social distance between the advice-seeker and the recommender.

A similar recommendation approach is presented in [Ye et al., 2010], where the authors also propose the usage of Foursquare information to provide venue recommendations to users. More importantly, the social perspective is integrated into their recommendations through Friend-Based Collaborative Filtering (where the neighbours for CF are selected from the social network of users) and an extension of this method, Geo-measured Friend-Based Collaborative Filtering (where only closely located friends are selected as neighbours for CF).

Our method thus proposes a combination of Geo-measured Friend-Based Collaborative Filtering [Ye et al., 2010] and experts [Amatriain et al., 2009; Bao et al., 2012], in our specific case neighborhood or area experts.

3 Cicerone Recommender System

In this section we provide the theoretical framework of the Cicerone location-based recommender system. First we describe the basic terminology used later in the recommendation algorithm. As sketched previously, our system bases its functioning on three information elements: the users' social network, the users' area knowledge and the current location of the requesting user.

3.1 Basic Terminology

As used herein, the term "location data item" stands for any location item or representation of a location. A "location item" is intended to encompass any type of location that can be represented on a map using a latitude, a longitude, and possibly a category.

The location recommender selects relevant locations for a given target user. To do so, users must be comparable entities, and so must locations. The implementations described herein are not item-specific and may operate with any other type of item visited or shared by a community of users. For the specific case of bar or restaurant items, users interact with the items by visiting them. The Recommendations Set is the set of locations formed by the items the user is being recommended. The Recommendations Set of a user A is denoted herein as $R_A$.

An essential concept is that of the "Check-in" ($CI^{t}_{U,L}$)¹, which represents the attendance of a user U at a certain location L in the last t days. Our system generates location recommendations for users by considering not only their geographical position, but also their social relationships with other users and those users' degree of knowledge of the visited locations.

¹ The attendance of a user at a certain location can be captured in several ways, for example a Foursquare check-in, a geopositioned tweet, or a CDR trace of a phone call.

In order to obtain more adequate recommendations in this type of environment, we need certain estimators. Firstly, we need to quantify how well a certain user knows a specific area (by considering the frequency of attendance at locations in that area with respect to the rest of the city). Moreover, it is necessary to understand the social distance between the target user and the other users whose opinions are used to create the recommendations for the target user. These measures are specified next.

The Area Knowledge ($AK^{t}_{U,L}$) of a user U with respect to a location L is calculated as

$$AK^{t}_{U,L} = \frac{\sum_{l' \in \mathrm{PostalCode}_{L}} CI^{t}_{U,l'}}{\sum_{\forall a \in \mathrm{Locations}} CI^{t}_{U,a}} \qquad (1)$$

and represents how familiar a user is with a specific area of the city (represented by its postal code).

The Location Frequency ($LF^{t}_{U,L}$) of a user U at a certain location L is calculated as

$$LF^{t}_{U,L} = \frac{CI^{t}_{U,L}}{\sum_{\forall a \in \mathrm{Locations}} CI^{t}_{U,a}} \qquad (2)$$

and normalizes the number of visits of the user U to the location L.

The Social Importance ($SI^{t}_{U,U'}$) of a user U' for a user U is calculated as

$$SI^{t}_{U,U'} = \frac{(\mathrm{Degree}_{U'})^{\frac{1}{d(U,U')}}}{(\mathrm{nodes} - 1)} \qquad (3)$$

where $\mathrm{Degree}_{U}$ represents the number of connections that U has in its social network, nodes represents the total number of nodes in the social network (used to normalize the SI), and $d(U,U')$ represents the geodesic distance, i.e. the minimum number of hops necessary to reach U' from U using the shortest path in their social network².

² $d(U,U') < 0$ means that there is no possible path that connects U and U'.

The Location Value ($LV^{t}_{L,U}$) of a location L for a user U at time t is calculated as

$$LV^{t}_{L,U} = \frac{\sum_{\mathrm{users\ in\ } L} \left( LF^{t}_{U,L} \times AK^{t}_{U,L} \times SI^{t}_{U,U'} \right)}{|\mathrm{users}|} \qquad (4)$$

where |users| represents the number of users that have "checked in" to that location.

The resulting value aggregates the information of the surrounding detected locations, considering the social distance from our specific user to the users that visited each location (social information) and their familiarity with the area of the location (geographical information).

3.2 Cicerone Recommendation Algorithm

The general algorithm works as follows:

1. Once the position of a user A is detected, the system automatically captures its latitude and longitude and launches the process that builds the personalized recommendation set for that position and user at that moment.
2. The system retrieves all the locations within a 100 m radius of the current position.
3. The candidate set for user A, $R_A$, is constructed with all the locations within a 100 m radius of the user's current position.
4. The system calculates the Location Value of each of the locations in that set.
5. The system orders the retrieved locations according to the calculated Location Values and constructs the Recommendation Set with the 3 locations with the highest value.

4 Functional implementation of Cicerone

As explained previously, the theoretical framework for building the recommendation needs a number of data sources, namely users' locations, venues and the social relationships amongst users. As this recommendation process is envisioned to be executed when users are in situ, the main functional requirement for our system is to work from a mobile device. Existing working systems (such as Yelp, TimeOut or TripAdvisor) have opted for the development of a dedicated app that users have to download to their devices. An app provides several advantages, such as explicit user profiling and the definition of the information necessary to obtain the recommendation. However, for us it implies the big problem of reaching the critical mass of users that would make the knowledge base and the recommendations more accurate. To avoid this limitation we have opted to develop our system as a service embedded within already massive social networks. Twitter seems to be the ideal candidate for us for the following reasons:

• Twitter shows a widespread, uniform penetration almost worldwide, with a continuously increasing number of users (288 million monthly active users in July 2012, showing an increase of 40% since July 2009 [GlobalWebIndex, 2013]).
• It allows users to associate their location when posting a message, attaching the specific coordinates as meta-information.
• It provides developers with an accessible API to obtain the publicly published tweets in near real-time.
• Twitter captures a social network of followers and followings, publicly available for each user.

As we have initially decided to deploy our application in the city of Barcelona, Twitter proves to be an ideal candidate. The number of geopositioned tweets captured daily within Barcelona is 6200 (from data coming from 2012), and the social network inferred from the users posting them can be seen in Figure 1 (with an average degree of 2.93).

Figure 1: Twitter-sensed Barcelona Social Network

Similarly, to populate our items database we opt to use the crowdsourced database of Foursquare. Foursquare is a location-based social media platform on which users communicate the venues they are in. This platform allows users to add new locations to its database by introducing not only the venue's name and specific location (with the GPS position and postal address) but also a semantic category. Foursquare describes places according to a rather complete taxonomy, in which about 400 kinds of places are identified and grouped into 9 wide categories³. Foursquare provides an accessible API that allows us to take snapshots of the existing locations in a certain city. Within the city of Barcelona, from a snapshot taken in April 2013, we have detected over 66,000 Foursquare locations, uniformly distributed amongst the different districts.

³ The detailed categorization of Foursquare categories and parent categories can be found at http://aboutfoursquare.com/foursquare-categories/ (Last access April 2nd 2013).

Moreover, within the OpenData movement, the city of Barcelona provides the machine-readable administrative division necessary for our theoretical calculations (namely the district divisions).

5 System Architecture

In this section we describe the system architecture (sketched in Figure 2) needed to implement a functional instantiation of the theoretical framework previously described. First we describe the social network monitoring used as data input for our platform and also as the user interface for interaction with the recommendation engine. After that, we sketch the persistence infrastructure used to save the information related to venues and users, and finally we describe the information update process and the component needed to develop the recommendation platform.

Figure 2: Cicerone Architecture

5.1 Social Networks Monitoring: Sensing the City

The usage of social media platforms in our system is twofold: (1) information acquisition to feed the knowledge base of our platform, and (2) a channel for users' interaction with our technology.

The participatory information provided by users in Foursquare is used to populate our items database; similarly, we use geopositioned tweets to calculate users' Area Knowledge. The social network monitor is therefore the first layer of our architecture, and it is composed of two crawlers: a Foursquare crawler and a Twitter crawler. The Foursquare crawler is in charge of scanning the target city for new venues. Once a new venue is identified, it is stored in the items database with its associated meta-information, such as its specific coordinates, the address or the category. The Twitter crawler is in charge of capturing all the tweets generated in the target city. Its scope is threefold: (1) to build and update the users' social network, (2) to update each user's area knowledge using their geopositioned tweets, and (3) to permit users' communication with the system.

Moreover, given its popularity, we use Twitter as the communication channel of our recommender system through a bot account managed by our intelligent agent.

5.2 Persistence Infrastructure: Urban data Model

Any recommender system bases its functioning on three main elements: users, items and ratings. These three elements have to be stored according to the inherent properties of the system, which in this case imply real-time information access and update. Fed by the crawlers, the data required for our recommendation solution arrives at the persistence manager, and each of these elements is stored in a persistence infrastructure in the following way:

Users: One of the main functional requirements of the recommendation algorithm is access to the social network of users. In order to store this information effectively, we opt for a graph-oriented database, namely Neo4j⁴. This type of database allows us to persist the users' social network in the form of a directed weighted graph. In this database, we persist users as nodes and establish an edge between two nodes if there exists a social relation between them according to the users' Twitter profiles; specifically, an edge is created from user A towards user B if user A follows user B on Twitter. The edge's weight is defined depending on the users' interactions: different types of Twitter interactions (such as mentions, retweets or favourites) affect the weight differently⁵. Another important piece of information about users is also saved, namely their "Check-ins" (as described in the basic terminology above). These check-ins (the specific coordinates of each geopositioned tweet of a user) are stored in MongoDB⁶, as we can profit from its built-in geospatial index.

⁴ http://www.neo4j.org
⁵ Specific values and functions for edge weight determination will be developed in later versions of our software using empirical information.
⁶ http://www.mongodb.org

Items: The items in our system are the locations within the target city. The locations database needs to provide efficient information access, as the recommender algorithm needs a high average number of accesses to it to build the recommendations. Moreover, an important characteristic of an item is its location in space, which geospatial indexing exploits. Given these two characteristics (rapid information access and geospatial indexing), as well as the potential for distributed computing, we opt to implement this database using MongoDB.

Ratings: The notion of rating is classically treated as an explicit evaluation of an item by a user. In this work, however, we take an alternative approach to ratings: we consider as a constant rating value the user's presence in a location, sensed through the geopositioned tweets posted from or close to the venue location. This value is not obtained directly: the ratings are part of the knowledge obtained by the recommendation engine and its information update process, which captures users' visits to specific locations, as explained in Section 5.3.

5.3 Recommendation Engine: Information Update Process

The last component in our proposed architecture is the recommendation engine, which contains the implementation of the theoretical algorithms explained in Section 3. Once we have our social networks monitor as an urban data sensor, and the ability to persist all the raw data required by the system, this component is responsible for the knowledge extraction process and the business logic triggered to generate a recommendation.

Because of the real-time aspect of our system, our recommendation platform (whose workflow is detailed in Figure 3) needs to continuously update some information elements, such as users' Area Knowledge and Location Frequency, the creation or update of social relationships amongst users, or the appearance of new locations. Specifically, we envision the users' communication with our system through a Twitter persona that encapsulates our recommendation platform: every time a user mentions our system's username, the platform captures this tweet (through the Mention's Service sketched in Figure 2) and identifies it as an explicit request for a recommendation, which triggers the whole intelligent process. Even though our technological platform allows us to generate recommendations every time a user's location is captured (with every geopositioned tweet), we prefer to restrict its functioning to a mention system, reducing the overall intrusiveness.

Figure 3: Cicerone Workflow

After the recommendation is generated, it is returned to the user, also through Twitter, in a message posted by our intelligent agent.

6 Conclusion and Future Work

The designed recommender system plans to profit from the information proactively shared by users on the analyzed participatory platforms. However, as recently argued in [Jeske, 2013], this type of crowdsourced system is vulnerable to malicious attacks: in our case, given the lack of restrictions on posting geopositioned content on Twitter, one can easily envision a method for creating a fake user that becomes the one with the highest area knowledge in every area of the city and then directly influences the resulting recommendations at will.

Despite this potential problem associated with the publishing policy of Twitter and Foursquare, and as we have analyzed in Sec. 2, many others have used information from these sources to generate location-based recommendations. However, to the best of our knowledge, the presented algorithm is the first to explicitly include the user's expertise about one of the fundamental properties of the items: the area where they are located. By combining this information with social information, we hypothesize that our system will be able to outperform other location-based recommender systems.

Our main long-term research task is the development of user profiling in terms of the type of venues the user attends, with the overall objective of combining area expertise with specific user profiles.

Acknowledgments

This work has been completed with the support of ACC1Ó, the Catalan Agency to promote applied research and innovation.

References

[Amatriain et al., 2009] Xavier Amatriain, Neal Lathia, Josep M. Pujol, Haewoon Kwak, and Nuria Oliver. The wisdom of the few: a collaborative filtering approach based on expert opinions from the web. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '09, pages 532–539, New York, NY, USA, 2009. ACM.

[Bao et al., 2012] Jie Bao, Yu Zheng, and Mohamed F. Mokbel. Location-based and preference-aware recommendation using sparse geo-social networking data. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, SIGSPATIAL '12, pages 199–208, New York, NY, USA, 2012. ACM.

[Bonhard and Sasse, 2006] P. Bonhard and M. A. Sasse. 'Knowing me, knowing you' – using profiles and social networking to improve recommender systems. BT Technology Journal, 24(3):84–98, July 2006.

[GlobalWebIndex, 2013] GlobalWebIndex. Twitter now the fastest growing social platform in the world. Web Report, Jan 2013.

[Groh and Ehmig, 2007] Georg Groh and Christian Ehmig. Recommendations in taste related domains: collaborative filtering vs. social filtering. In Proceedings of the 2007 international ACM conference on Supporting group work, GROUP '07, pages 127–136, New York, NY, USA, 2007. ACM.

[Jeske, 2013] Tobias Jeske. Floating car data from smartphones: What google and waze know about you and how hackers can control traffic. https://media.blackhat.com/eu-13/briefings/Jeske/bh-eu-13-floating-car-data-jeske-wp.pdf (Last access April 1st 2013), 2013.

[Park et al., 2007] Moon-Hee Park, Jin-Hyuk Hong, and Sung-Bae Cho. Location-based recommendation system using bayesian users preference model in mobile devices. In Ubiquitous Intelligence and Computing, pages 1130–1139. Springer, 2007.

[Sklar et al., 2012] Max Sklar, Blake Shaw, and Andrew Hogue. Recommending interesting events in real-time with foursquare check-ins. In Proceedings of the sixth ACM conference on Recommender systems, RecSys '12, pages 311–312, New York, NY, USA, 2012. ACM.

[Ye et al., 2010] Mao Ye, Peifeng Yin, and Wang-Chien Lee. Location recommendation for location-based social networks. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '10, pages 458–461, New York, NY, USA, 2010. ACM.

[Zheng et al., 2009] Yu Zheng, Yukun Chen, Xing Xie, and Wei-Ying Ma. Geolife2.0: a location-based social networking service. In Mobile Data Management: Systems, Services and Middleware, 2009. MDM'09. Tenth International Conference on, pages 357–358. IEEE, 2009.