Context and Intention-Awareness in POIs Recommender Systems

Hernani Costa, Barbara Furtado, Durval Pires, Luis Macedo, Amilcar Cardoso
CISUC, University of Coimbra, Coimbra, Portugal
hpcosta@dei.uc.pt, bfurtado@student.dei.uc.pt, durval@student.dei.uc.pt, macedo@dei.uc.pt, amilcar@dei.uc.pt

ABSTRACT

This paper describes an agent-based approach for making context and intention-aware recommendations of Points of Interest (POI). A two-part agent architecture was used, with an agent responsible for gathering POIs from a location-based mobile application, and a set of Personal Assistant Agents (PAA) collecting information about the context and intentions of their respective users. Each PAA includes a probabilistic classifier for making recommendations given its information about the user's context and intentions. Supervised, incremental learning occurs when the user gives his PAA feedback on the true relevance of each recommendation. To evaluate the system's recommendations, we performed an experiment based on the profile used in the training process, using different locations, contexts and goals.

Keywords

Context, Information Overload, Machine Learning, Personal Assistant Agents, Points of Interest Recommendation.

1. INTRODUCTION

With the technological advances registered in the last decades, there has been an exponential growth of the information available. In order to cope with this superabundance, Recommender Systems (RS) are a promising technique to be used in location-based systems (see [13, 4]). Most existing RS approaches focus on either finding a match between an item's description and the user's profile (Content-based [2, 12, 10]), or finding users with similar tastes (Collaborative Filtering [8, 5, 6]). These traditional RS consider only two types of entities, users and items, and do not put them into a context when providing recommendations. Nevertheless, the most relevant information for the user may depend not only on his preferences, but also on his context. In addition, the very same content can be relevant to a user in a particular context, and completely irrelevant in a different one. For this reason, we believe that it is important to take the user's context into consideration during the recommendation process [14, 1]. Such systems can be useful in POIs RS [4, 7, 3].

In this paper, we intend to analyse the advantages of using a Multiagent System (MAS) capable of filtering irrelevant information while taking into account the user's context. Our system uses standard POI attributes, and also integrates dynamic context data, such as the user's context and goal, in order to process the requests. The system is able to understand the differences between users, since each one of them has unique preferences, intentions and behaviours, resulting in different recommendations for different users, even if their context is the same.

The remainder of the paper starts with a presentation of the system's architecture (section 2). In section 3, we present the experimentation performed. Finally, section 4 presents our conclusions.

2. SYSTEM ARCHITECTURE

In this section, we present the system's architecture and all its components (see figure 1). This architecture can be seen as a middleware between the user's needs and the information available in our system. More specifically, the Master Agent is responsible for starting not only the agents (described in figure 1 as Agent 1 ... Agent n) that gather data from the Web resources (i.e., location-based mobile applications), but also the users' Personal Assistant Agents (PAA). Moreover, it is capable of aggregating the POIs returned from the Web agents into a well-defined knowledge representation.
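The aggregation step performed by the Master Agent can be sketched as follows. This is a minimal illustration with hypothetical names (`PoI`, `aggregate_pois`); the paper does not detail the actual knowledge representation, so the record layout and deduplication policy below are assumptions:

```python
from dataclasses import dataclass

# Hypothetical, simplified POI record; the real system stores the attributes
# described in section 3.1 (category, price, timetable, day off, ...).
@dataclass(frozen=True)
class PoI:
    poi_id: int
    category: str
    price: str

def aggregate_pois(*agent_results):
    """Merge the POI lists returned by the Web agents into a single
    collection, keeping one record per POI id."""
    merged = {}
    for result in agent_results:
        for poi in result:
            merged.setdefault(poi.poi_id, poi)  # first agent's record wins
    return list(merged.values())

agent_1 = [PoI(7086048, "Bakery", "Cheap")]
agent_2 = [PoI(7086048, "Bakery", "Cheap"), PoI(1512823, "Pub", "Cheap")]
print(len(aggregate_pois(agent_1, agent_2)))  # 2 distinct POIs
```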
The main purpose of each Web agent is to obtain all the POIs' information available in pre-defined Web sources. These autonomous agents are constantly searching for new information, and verifying whether the data stored in the database (presented in figure 1 as POIs Database) is up-to-date.

As we can see in figure 1, each user has a PAA assigned to him. This agent expects a request from the user and, based on his context, recommends a list of nearby POIs (see section 3). The PAA learns from the user's past experiences in order to improve its recommendations. Specifically, a probabilistic classifier is used for that purpose, i.e., the PAA assigns a probability value to the relevance of the POI, given its information and the current user's context and intentions. Therefore, when the user gives his PAA feedback on the true relevance of each recommendation, the PAA updates its memory. As a result, the agent learns every time the user decides to make a request and give his feedback.

[Figure 1: System's Architecture - the Master Agent, the Web agents (Agent_1 ... Agent_n) with their memories, the POIs aggregation module and POIs Database, and one PAA per user (PAA_1 ... PAA_n serving User_1 ... User_n).]

CARS-2012, September 9, 2012, Dublin, Ireland. Copyright is held by the author/owner(s).

3. EXPERIMENTAL WORK

Our main goal is to show that we can face the problem of location-based, context-aware recommendations with a MAS architecture. In addition, we intend to verify how machine learning algorithms suit the task of predicting the user's preferences based on his context. An effectiveness evaluation of our RS, in terms of the accuracy of its predictions, will be performed. This section presents the experimentation, in a controlled simulation, carried out to study the system's performance while recommending POIs. Firstly, the experiment set-up is presented (see 3.1), followed by an exhaustive analysis of the results (see 3.2).

3.1 Experiment Set-up

Our MAS contains agents responsible for obtaining POIs from Web sources. The purpose of these agents is to keep the information in the database up-to-date (see 3.1.1). On the other side, the system has PAAs that use memory to save the users' experiences (see 3.1.2). In this experiment, only one information source was used (see 3.1.1) and only one user's profile (see 3.1.3). The experimentation was performed in a specific area of the city of Coimbra (Portugal), explained in detail in 3.1.5. Different scenarios were used to specify both the user's and the POIs' contexts (see 3.1.4). To evaluate the system, some well-known metrics are presented in 3.1.6.

3.1.1 Agent Gowalla

As previously mentioned, our system could receive input from various location-based applications. In this particular experiment, one of the existing POIs' sources, Gowalla, is used, together with an agent to obtain POI information. Agent Gowalla obtains all the information through calls to Gowalla's API. It starts by requesting POIs in a pre-defined area (see 3.1.4). During this process, it filters out all the POIs that do not belong to the categories we will use (see 3.1.4). This process is repeated every 30 seconds, to allow the agent to detect new POIs whenever they are created, or to discover changes in an existing POI.

Because Gowalla's database does not have all the information needed for the experiment, we decided to gather more information about the POIs in the field. This allowed us to have more details about each POI, in order to fulfil the requirements of the experiment (see 3.1.5 for more details). After filtering out the unused categories (irrelevant to this experiment), this extra information was combined with Gowalla's info in the aggregation module, and then saved to the database (see section 3.1.5).

3.1.2 Dataset

The recommendation process resorts to WEKA's API (http://weka.sourceforge.net/doc). In order to predict whether a POI would be useful for the user and whether its recommendation is worthwhile, a probabilistic classifier trained with the Naive Bayes Updateable algorithm was used. The predicted values vary from 1 (totally irrelevant) to 5 (most relevant), and the algorithm automatically distributes the probability ranges on this scale. POIs with a classification of at least 3 are recommended to the user.

When an agent recommends POIs to its user, the agent expects the user to rate each recommendation, and saves this information into its memory, which allows it to learn from the experience. The agent's memory is a set of instances, which we call a dataset. In table 1, we can see an example of a dataset. The first five columns correspond to the information related to the POI: ID, category, price, schedule (morning, afternoon and night) and day off. The distance field corresponds to the distance between the POI and the user (near, average or far). The following three columns correspond to the user's context information: time of day, day of the week and his current goal (coffee, lunch, dinner or party). The last column (Label) corresponds to the algorithm's prediction.

Table 1: Dataset example.

| POI id  | Category | Price | Schedule                | DayOff | Distance | TimeOfDay | DayOfTheWeek | Goal   | Label |
| 7086048 | Bakery   | Cheap | Morning/Afternoon/Night | Sunday | Average  | Night     | Saturday     | Coffee | 5     |
| 7023528 | Apparel  | Cheap | Morning/Afternoon       | Sunday | Far      | Afternoon | Friday       | Lunch  | 1     |
| 1512823 | Pub      | Cheap | Night                   | Sunday | Far      | Night     | Friday       | Party  | 4     |

3.1.3 User's Profile

As explained in section 2, each user has his own PAA (i.e., a dataset with his own preferences). We performed a simulation period in order to train the PAAs' classifiers. Since we had various PAAs' classifiers (each one with a different user's profile), it was impossible to evaluate all of them, and we had to choose only one profile. This profile can be seen as a stereotype of a user who prefers POIs that are near, cheap and not closed. For the sake of clarity, the feedback given by the user only considers the POIs' categories and not their names.

3.1.4 Definition of Scenario

A scenario is defined as the set of information related to the user which a PAA needs to classify a POI in a certain context. More precisely, a scenario results from the combination of the user's context with the POI's context. We have defined the user's context by: i) proximity to a specific POI (far, average or near, where we consider near ≤ 100m, 100m < average ≤ 200m, and far > 200m; these "small distance amplitudes" were used because in this experiment we only considered situations in which the user reaches his destination on foot); ii) current time of day (morning, afternoon and night); iii) current day of the week; iv) user's goal (coffee, lunch, dinner and party). The POI's context is defined by the POI's: a) id; b) category; c) price (cheap, average, expensive); d) timetable (morning, afternoon, night, or combinations); e) day off (a day of the week or combinations).

3.1.5 Area of work

The number of POIs that exist in Coimbra (Gowalla returned about 954) made it impossible to manually evaluate the whole city. For this reason, we used a smaller part of the city with a higher POI density and diversity (Coimbra's Downtown). We studied the type of POIs in that area, and also restricted the set to the three main categories ({Food, Shopping, Nightlife}, the categories that contain the most POIs). The number of sub-categories for Food is 44, for Shopping 51 and for Nightlife 11, with 59, 29 and 29 different POIs, respectively.

As referred above (see 3.1.1), we gathered more information about the POIs. The extra information we manually gathered from the places was the POI's: Price (cheap, average or expensive); DayOff (day(s) on which the POI closes); Timetable (part of the day in which the POI is open). The combination of this new data with Gowalla's information fulfils the POI's context.

3.1.6 Metrics

In this topic we present the metrics that will be used in our experiment. Equation 1 will be used to correlate two different types of data. The precision, recall and F1 formulas will be used to analyse the system's accuracy.

The Correlation Coefficient (ρ) returns the correlation between two arrays, m_i and x_i, where {m_i, x_i} ∈ R, ρ ∈ R: −1 ≤ ρ ≤ 1, with i ∈ N corresponding to the array index:

    ρ(m, x) = Σ_i (m_i − m̄)(x_i − x̄) / √( Σ_i (m_i − m̄)² · Σ_i (x_i − x̄)² )    (1)

Precision will be used to evaluate the quality of the recommendations. Specifically, it is the number of correctly recommended POIs divided by the total number of recommended POIs:

    Precision = Correctly recommended POIs / Total recommended POIs    (2)

Recall evaluates the quantity of POIs extracted. More precisely, it is the number of correctly recommended POIs divided by the total number of POIs that should have been recommended:

    Recall = Correctly recommended POIs / Total correct POIs    (3)

The F1 score can be interpreted as a weighted average of the precision and recall:

    F1 = (2 × Precision × Recall) / (Precision + Recall)    (4)

3.2 Results

Our experiment can be divided into two different evaluations: cross validation (3.2.1), and the use of the metrics (3.1.6) to compare the output recommendations given by the system with a manual evaluation (3.2.2 and 3.2.3). It is important to explain that the system's classifier was trained using a dataset (see 3.1.2) containing: the original training dataset (which has correct classifications given by us); and a list of instances created from all the POIs the system recommended during the simulation period. These POIs recommended by the system were inserted in that dataset not with the classification given by the system, but with the feedback given by the user during the simulation. The resulting classifier was used to do the cross validation experiment (3.2.1).

3.2.1 Cross Validation

We chose to do 10 runs and 10 folds, because this combination guarantees a better evaluation [11]. In table 2 we can see the percentage of correctly and incorrectly classified instances, and check some statistics of our classifier's performance. The results show that, out of a total of 14616 instances, the classifier correctly classified 9246 (63%), which can be seen as a good start.

Table 2: Classifier's statistics.

| Correctly Classified Instances   | 9246   | 63.2594% |
| Incorrectly Classified Instances | 5370   | 36.7406% |
| Kappa statistic                  | 0.3909 |          |
| Mean absolute error              | 0.1729 |          |
| Root mean squared error          | 0.3163 |          |
| Relative absolute error          | 73.0797% |        |
| Root relative squared error      | 91.9724% |        |
| Total Number of Instances        | 14616  |          |

Table 3 shows the detailed accuracy of our classifier, by class (Cl). Each class corresponds to the prediction values, on a scale of 1 to 5, as explained in section 3.1.2. For each class, the table shows the true positive rate (TP), false positive rate (FP), precision (P), recall (R), F1 score (F1) and ROC Area. The results demonstrate that class 1 has the best results. This is due to the greater number of instances in the training dataset classified with 1. Indeed, this happens because in many of the user's contexts there are always some irrelevant POIs (for instance, POIs that do not suit the user's goal). This makes the classifier more accurate in this class. Although the remaining classes do not have the same accuracy, their results are also very promising.
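The evaluation metrics defined in section 3.1.6 (equations 1-4) can be sketched in a few lines of Python. This is an illustrative re-implementation under our own naming, not the evaluation code actually used in the paper (the system itself relies on WEKA):

```python
import math

def precision(correctly_recommended, total_recommended):
    # Eq. 2: fraction of recommended POIs that were correct.
    return correctly_recommended / total_recommended

def recall(correctly_recommended, total_correct):
    # Eq. 3: fraction of the relevant POIs that were actually recommended.
    return correctly_recommended / total_correct

def f1(p, r):
    # Eq. 4: harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

def correlation(m, x):
    # Eq. 1: Pearson correlation coefficient between two rating vectors.
    n = len(m)
    m_mean = sum(m) / n
    x_mean = sum(x) / n
    num = sum((mi - m_mean) * (xi - x_mean) for mi, xi in zip(m, x))
    den = math.sqrt(sum((mi - m_mean) ** 2 for mi in m)
                    * sum((xi - x_mean) ** 2 for xi in x))
    return num / den

p = precision(9, 12)                      # 0.75
r = recall(9, 15)                         # 0.6
print(round(f1(p, r), 3))                 # 0.667
print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0 (perfectly correlated)
```

The example counts (9 correct out of 12 recommended, 15 relevant in total) are made up for illustration; they are not results from the experiment.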
Table 3: Cross validation's statistics.

| TP    | FP    | P     | R     | F1    | ROC Area | Cl |
| 0.717 | 0.283 | 0.745 | 0.717 | 0.731 | 0.745    | 1  |
| 0.584 | 0.416 | 0.443 | 0.584 | 0.504 | 0.885    | 2  |
| 0.552 | 0.448 | 0.410 | 0.552 | 0.470 | 0.914    | 3  |
| 0.413 | 0.587 | 0.490 | 0.413 | 0.448 | 0.816    | 4  |
| 0.489 | 0.511 | 0.630 | 0.489 | 0.550 | 0.957    | 5  |

3.2.2 Manual Evaluation

To test our approach, we used a set of pre-defined scenarios that simulate real situations. Although we only used three different user locations (the ones with the highest POI density), we analysed 18 different user contexts (see section 3.1.4). These 18 combinations were named runs. Each of the 18 runs resulted from the combination of a user's request, in a specific context (see section 3.1.4), with all the nearby POIs recommended by the system. More precisely, in this experiment three user locations were used in six different situations (goal, time of day, day of the week): [Coffee, Morning, Sunday]; [Coffee, Afternoon, Monday]; [Coffee, Night, Tuesday]; [Lunch, Afternoon, Wednesday]; [Dinner, Night, Friday]; [Party, Night, Saturday]. Our goal was to compare the system's recommendations with a manual evaluation made by human judges, and to apply some metrics to analyse our system's performance.

The judges evaluated every POI, from every run, according to the current user's context and POI's context, using the following scale: 0 - the POI does not satisfy the user's context or the user's goal; 1 - the POI satisfies the user's context and the user's goal, but is expensive or too far from the user; 2 - the POI satisfies the user's context and the user's goal, and is neither expensive nor far. It is important to note that the classifier's training dataset was built based on the preferences of a particular user's profile (i.e., POIs that were near, cheap and not closed; see section 3.1.3 for more details). The evaluation performed by the human judges was also based on the preferences of the same user's profile. They were asked to give their personal opinion for a list of scenarios, but never to contradict the user's profile they were simulating.

To perform the manual evaluation, we created a user interface using Google Maps (http://code.google.com/apis/maps/index.html). The POIs' names were omitted to prevent the judges' personal opinions from influencing the evaluation, since the classifier was trained based on the POI's category. We had to do this to prevent discrepancies between the judges' preferences and the user's profile (3.1.3). The manual evaluation was important to evaluate the performance of the system in ambiguous cases (e.g., a POI with an average price and an average distance). In these specific situations, the agreement between the human judges was low (14.3%). However, the PAA was trained for a specific user's profile, and it is expected, in these ambiguous cases, to give better results for the judge whose preferences are closer to the user's profile used in the training process (notice that each user has a PAA that learns from his preferences individually). Furthermore, the exact agreement among judges was 93.3% (using the three values in the scale: {0, 1, 2}). In addition, we also calculated the relaxed agreement (using a scale of {0, 2}, considering POIs classified as 1 and 2 as correct), resulting in 95.7%.

3.2.3 Manual Evaluation vs. Automatic Recommendations

In order to observe the relationships between the manual evaluation and the output values given by the RS, the correlation coefficients between them were computed, and are shown in figure 2. In addition, the F1 score was calculated (figure 3). The x-axis represents a run, which corresponds to the simulation of a user's request in a specific context (see 3.1.4) and all the nearby POIs recommended by the system (see 3.2.2). We simulated different requests, leading to a total of 18 runs. More specifically, runs {1, 2, 3, 7, 8, 9, 13, 14, 15} correspond to the goal Coffee; {4, 10, 16} to the goal Lunch; {5, 11, 17} to the goal Dinner; and {6, 12, 18} to the goal Party.

In order to avoid some of the ambiguity that could arise when using a 1-5 scale, the human judges evaluated the system on a scale of 0 to 2 (see sections 3.1.2 and 3.2.2, respectively). Furthermore, to calculate the correlation (eq. 1) between the system's recommendations and the evaluation of the human judges, the scales of both evaluations were standardised. The system's scale was converted to a scale from 0 to 2, where 1 and 2 correspond to 0; 3 corresponds to 1; and 4 and 5 correspond to 2. Figure 2 shows the correlation coefficients between the most common evaluated value (i.e., the exact agreement correlation, represented as EA) given by each of the human judges (corresponding to H1, H2 and H3 in the chart) and the system's recommendations, through the 18 runs.

[Figure 2: Correlation coefficients between manual evaluation (with exact agreement) and the system's recommendations.]

As we can see in figure 2, the results are promising. However, some of the results have low correlation values because, when we trained the system, we discarded all contexts that make no sense, like having lunch at night or in the morning, or having dinner or partying in the morning or afternoon. On the other hand, the goal Coffee is valid at all times of day, resulting in a lot more instances and, consequently, the system performed better when this was the user's goal. In order to overcome this problem, more instances with the goals Lunch, Dinner and Party should be added to the training dataset.

Figure 3 shows the evolution of the F1 (eq. 4) values (y-axis) over all the 18 runs (x-axis). In figure 3, the results represented by the legends named High and Low correspond to recommendations given by the system with a score of 2, and with a score of 2 or 1, respectively. This allows us to compare the results with a high filter, considering only the best recommendations (score 2), and with a low filter, considering all the good recommendations (scores 1 and 2). As we expected, higher values are obtained for the goal Coffee (see for example runs 7 and 8), and lower values are obtained for the goals Lunch and Dinner (see for instance runs 16 and 17). This happens because, as we mentioned before, the goal Coffee is valid at all times of day, and the system performed better in these situations.

[Figure 3: F-Measure.]

4. CONCLUSIONS

In this paper, we discussed the combination of context and intention-awareness with RS, applied in a location-based application. We pointed out what advantages are gained in using, besides the context, the user's intentions, and how to integrate both into a location-based RS. We also presented our system's architecture and described its advantages, such as its modular nature. Machine learning techniques were used to train the classifiers, more precisely the Naive Bayes Updateable algorithm. Machine learning can be a powerful tool to predict which content will be interesting for a given user, but it should be used with caution, and the datasets must be well defined.

Then, we created an experimental set-up to evaluate the system's performance. To test the accuracy of our system, we used various evaluation methods. First, we did a cross-validation test. Next, in order to observe the relationships between the manual evaluation and the output values given by the PAA, the correlation coefficients between them were computed. After analysing the results in general, the recommendations can be considered very promising, making this a good starting point to develop a context and intention-aware POI RS.

In the future, we are planning numerous improvements to our work, such as: taking into account new attributes (e.g., POI's quality); testing and comparing other machine learning algorithms; analysing other users' profiles; using new information sources; and making the system available to the community in order to get more feedback. We think that with more data and more training, the results of our system could improve. Furthermore, we intend to give the user the possibility of changing what values fit each attribute (e.g., what price is considered cheap). Moreover, we plan to analyse the system's accuracy when applying selective attention metrics, such as surprise [9], to the recommendation outputs.

Acknowledgments

Work funded by Fundação para a Ciência e Tecnologia — Project PTDC/EIA-EIA/108675/2008, and by FEDER through Programa Operacional Factores de Competitividade do QREN — COMPETE:FCOMP-01-0124-FEDER-010146.

5. REFERENCES

[1] G. Adomavicius, B. Mobasher, F. Ricci, and A. Tuzhilin. Context-Aware Recommender Systems. AI Magazine, 32(3):67-80, 2011.
[2] M. Balabanović and Y. Shoham. Fab: content-based, collaborative recommendation. Commun. ACM, 40(3):66-72, 1997.
[3] L. Baltrunas, B. Ludwig, S. Peer, and F. Ricci. Context relevance assessment and exploitation in mobile recommender systems. Personal and Ubiquitous Computing, 16(5):507-526, 2012.
[4] C. Biancalana, A. Flamini, F. Gasparetti, A. Micarelli, S. Millevolte, and G. Sansonetti. Enhancing Traditional Local Search Recommendations with Context-Awareness. In User Modeling, Adaption and Personalization, pages 335-340. Springer, 2011.
[5] D. Billsus and M. J. Pazzani. Learning Collaborative Information Filters. In Proc. 15th Int. Conf. on Machine Learning, pages 46-54, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.
[6] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In Proc. 16th Int. Conf. on World Wide Web, pages 271-280, New York, NY, USA, 2007. ACM.
[7] H. Huang and G. Gartner. Using Context-Aware Collaborative Filtering for POI Recommendations in Mobile Guides. In Advances in Location-Based Services, Lecture Notes in Geoinformation and Cartography, pages 131-147, Vienna, Austria, 2012. Springer.
[8] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl. GroupLens: applying collaborative filtering to Usenet news. Commun. ACM, 40(3):77-87, 1997.
[9] L. Macedo. A Surprise-based Selective Attention Agent for Travel Information. In Proc. 9th Int. Conf. on Autonomous Agents and Multiagent Systems, 6th Workshop on Agents in Traffic and Transportation, pages 111-120, 2010.
[10] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proc. 18th National Conf. on AI, pages 187-192, Menlo Park, CA, USA, 2002. AAAI.
[11] P. Refaeilzadeh, L. Tang, and H. Liu. Cross-Validation. In Encyclopedia of Database Systems, pages 532-538. Springer, 2009.
[12] W. van Meteren and M. van Someren. Using Content-Based Filtering for Recommendation. In Proc. ECML/MLNET Workshop on Machine Learning and the New Information Age, pages 47-56, Barcelona, Spain, 2000.
[13] M. van Setten, S. Pokraev, and J. Koolwaaij. Context-Aware Recommendations in the Mobile Tourist Application COMPASS. In Proc. 3rd Conf. on Adaptive Hypermedia and Adaptive Web-Based Systems, pages 235-244, Berlin, 2004. Springer.
[14] W. Woerndl and J. Schlichter. Introducing context into recommender systems. In Proc. AAAI Workshop on RS in e-Commerce, pages 22-23, Vancouver, Canada, 2007.