Effects of relevant contextual features in the performance of a restaurant recommender system

Blanca Vargas-Govea, Gabriel González-Serna, Rafael Ponce-Medellín
Centro Nacional de Investigación y Desarrollo Tecnológico
Interior Internado Palmira S/N, Col. Palmira, Cuernavaca Mor., México
{blanca.vargas,gabriel,rafaponce}@cenidet.edu.mx

ABSTRACT
Contextual information in recommender systems aims to improve user satisfaction. Usually, it is assumed that the complete set of contextual features is significant. However, identifying the relevant context variables is important, as increasing their number may lead the system to dimensionality problems. In this paper, relevant contextual attributes are identified using a simple feature selection approach. Once the features have been identified, their impact on different performance aspects of the system is shown. The approach was applied to a semantic-based restaurant recommender system. Results show that feature selection techniques can be applied successfully to identify relevant contextual data. These results are useful for modeling contextual user profiles with meaningful information, for reducing dimensionality, and for analyzing users' decision criteria.

Keywords
contextual information, feature selection

1. INTRODUCTION
When we want to know new places to eat, it is common to ask friends for suggestions. Tourist and gastronomic guides are other alternatives for finding good restaurants. However, friends are likely to know our taste, current location, and favorite environment; consequently, their suggestions tend to be more precise than those provided by the guides. Recommender systems are now common online services that help users cope with information overload by retrieving useful items according to their preferences. Collaborative Filtering (CF) is a successful technique that automates this social recommendation scheme. CF predicts a user's preferences from the opinions of users with similar interests, whereas content-based recommender systems build a model only from the user's favorite items. Common recommendation approaches take into account only user-item-rating data, ignoring contextual information. The drawback of this scheme is the lack of personalization; thus, a tourist visiting Africa could receive a recommendation for a restaurant located in Brazil. Information such as time, location, or weather can generate recommendations tailored to the user's current situation.

Contextual information has become a key factor in improving user satisfaction. However, not all of the contextual information given to the system is relevant. Moreover, if the system requires explicit information, asking for a large amount of data can be intrusive; conversely, a lack of information can lead the system to generate poor recommendations. A careful selection of relevant information could improve the efficiency and predictive accuracy of recommendation algorithms. Feature selection techniques have been used for this purpose in several domains, but they have not been widely exploited in contextual recommender systems. Usually, the contextual information provided to the system is chosen from experience and assumed to be important.

This paper presents an analysis of the effects of contextual attributes on the predictive ability of a recommender system. The study focuses on Surfeous, a contextual recommender system prototype based on CF and semantic models. Relevant variables were identified by applying a simple feature selection approach; once the meaningful variables had been identified, the effect of each relevant contextual variable on the predictive performance of the system was analyzed. The results of this research have an impact on three aspects: i) identification of the relevant contextual attributes that users take into account when selecting a restaurant, ii) reduction of dimensionality, and iii) new insights into the effects of contextual variables on the predictive performance of recommender systems.

This paper is structured as follows. Section 2 presents an overview of relevant work on the effects of context and on feature selection techniques applied to recommender systems. Section 3 describes Surfeous and its contextual features. Section 4 presents general concepts about feature selection techniques and describes the approach applied in our analysis. The experiments and results are presented in Section 5. Finally, conclusions and future research directions are given in Section 6.
CARS-2011, October 23, 2011, Chicago, Illinois, USA.
Copyright is held by the author/owner(s).

2. RELATED WORK
Our literature review focuses on two topics: the importance of context, and the use of feature selection techniques in recommender systems. As an emerging and under-explored research area, context-aware recommender systems are receiving increasing attention [10]. Although the effect of context has not been widely studied, related work reveals that contextual information is important [1]. For example, in [8] the effect of context variables in a content-based system was analyzed: several clusters were built according to the statistical dependence between pairs of context variables, and the system was then evaluated over each cluster to observe changes in the recommendations. The results showed that context variables improve predictive accuracy. In [13], the benefits of contextual information are shown in terms of precision, using an ontology to exploit semantic concepts. Another interesting strategy consists of splitting the user profile into several sub-profiles [2], each representing the user in one period of the day. Using sub-profiles, the system could recommend music more precisely because it took time into account; it was shown that accuracy could be increased by making recommendations with sub-profiles instead of a single profile.

As a consequence of the integration of contextual information, the adaptability and dimensionality-reduction requirements of recommender systems have increased. Machine learning and data mining techniques have proved effective for these problems; however, feature selection and machine learning techniques have mainly been applied to content-based systems. For example, [9] presents an interactive recommender system based on reinforcement learning: when the system returns a very large result set, a feature selection algorithm is applied to reduce the results list. Another approach to feature selection is presented in [3], where the similarity between users is calculated taking into account the subset of common items that best describes the user's preferences; the results show that predictive performance can be improved by a careful selection of item ratings. That recommender system is based on CF and does not include contextual information. In [4], the authors tested their methodology on a recommender system based on Semantic Web technologies and collaborative filtering; their work focuses on assessing the relevant model features using decision trees and feature selection techniques, but the system was not evaluated with the newly learned models.

In contrast to the reviewed works, our primary goal in this paper is to identify relevant attributes and then evaluate their effects on the system's predictive performance. A semantic recommender system that fuses social and contextual aspects was used as the test bed.
3. SURFEOUS: WHERE TO EAT?
Surfeous is a recommender system prototype that uses social annotations (e.g., tags) and contextual models to find the restaurants that best suit the user's preferences. The recommendations are shown as an ordered list (top-n).

Regarding the social aspect, Surfeous uses an item-based collaborative filtering approach. Its prediction process is based on the extended technique of Tso-Sutter et al. [12], which includes tags. In contrast to the common two-dimensional item-attribute relationship, tags form a three-dimensional user-item-tag relation. These three dimensions are handled as three two-dimensional problems (user-tag, item-tag, and user-item) by augmenting the standard user-item matrix horizontally with user tags and vertically with item tags. Thus, user tags are treated as items, and item tags are viewed as users in the user-item matrix. After the extension, user-based and item-based CF are recomputed on the new matrix.
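The matrix extension of [12] described above can be sketched as follows. The toy matrices are purely illustrative (they are not Surfeous data), and the binary encoding of tags is an assumption made for the sketch.

```python
# Sketch of the tag-based matrix extension: user tags are appended as
# pseudo-items (horizontal) and item tags as pseudo-users (vertical).
# Toy data, for illustration only.
R  = [[1, 0, 1, 0],          # user-item ratings (3 users x 4 items)
      [0, 1, 0, 0],
      [1, 1, 0, 1]]
UT = [[1, 0], [0, 1], [1, 1]]            # user-tag: tags each user has used
IT = [[1, 0], [0, 1], [1, 1], [0, 0]]    # item-tag: tags given to each item

# Horizontal extension for user-based CF: each user's tags become items.
R_h = [row + tags for row, tags in zip(R, UT)]        # 3 x 6

# Vertical extension for item-based CF: each tag becomes a pseudo-user.
R_v = R + [list(col) for col in zip(*IT)]             # 5 x 4
```

User-based and item-based CF are then recomputed on R_h and R_v, respectively, as described above.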
In this paper, Semantic Web technologies are exploited to manage the contextual information, using ontologies to model the user and restaurant profiles [11]. Surfeous specifies three context models, each with the set of attributes shown in Table 1.

Table 1: Context attributes
Service model (23 attributes):
    latitude, longitude, address, city, state, country, fax, ZIP, alcohol, smoking, dress, accessibility, price, franchise, ambiance, space, services, parking, cuisine, phone, accepts, days, hours
User model (21 attributes):
    latitude, longitude, smoking, alcohol, dress, ambiance, age, transportation, marital-status, children, interests, personality, religion, occupation, favorite-color, weight, height, budget, accepts, accessibility, cuisine
Environment model (2 attributes):
    time, weather

The models are described as follows:
1. Service model. It describes the restaurant characteristics. The model has 23 attributes; 6 of them (cuisine, alcohol, smoking, dress, accepts (type of payment), and parking) were defined according to http://chefmoz.org, an online dining guide. Values are selected by the user from several possible options shown by a GUI when he/she rates a new restaurant.
2. User model. It describes the user profile. The model has 21 attributes; 19 of them are provided by the user when he/she signs into the system for the first time or modifies his/her personal information.
3. Environment model. It refers to the time and weather at the user's location; their values are acquired from Web services. This information restricts the search to available restaurants with appropriate installations. In this paper, Surfeous considers a 3 km radius around the user's location to select the restaurants.

To generate the recommendations, Surfeous gets the user's location and searches a spatial database for the closest restaurants. With this information, an ontology is created at execution time; the closest restaurants become the instances that populate the ontology. Then, to match the context models, the Semantic Web Rule Language (SWRL) is applied to a set of semantic rules. This set of rules was defined based on a market study of consumer behavior.

From the attributes of the restaurant profile (i.e., the service model), a relation is created to determine whether each value matches the corresponding value in the user profile. Based on a spatio-temporal attribute, an antecedent and a consequent are created to describe the situation. For example, if the user does not smoke, the recommended restaurants need a no-smoking area. The rule is as follows:

    smokingArea(R, no) ∧ restaurant(R) → noSmoking(R, true)

Some examples of the rules and relations are shown in Table 2. The Semantic Web rules use the ontology to infer the places that fulfill the premises. Results are ranked based on the number of context rules that hold for each user query: for each restaurant, a score is computed by counting and normalizing the rules that hold. The social results are added to this score using weights between 0.1 and 0.9 in intervals of 0.1, where 0.0 stands for context-free (i.e., only tags) and 1.0 is 100% context (i.e., only rules). In this paper, fusion is the average over the weights between 0.1 and 0.9. Our analysis focuses on the service model, to explore what users look for when selecting a restaurant.

Table 2: Some rules and relations
user - service profile:
    person(X) ∧ hasOccupation(X, student) ∧ restaurant(R) ∧ hasCost(R, low) → select(X, R)
user - environment profile:
    person(X) ∧ isJapanese(X, true) ∧ queryPlace(X, USA) ∧ restaurant(R) ∧ isVeryClose(R, true) → select(X, R)
environment - service profile:
    currentWeather(today, rainy) ∧ restaurant(R) ∧ space(R, closed) → select(R)
Relations:
    likesFood(X, Y)         X: person, Y: cuisine-type
    currentWeather(X, Y)    X: query, Y: weather
    space(X, Y)             X: restaurant, Y: {closed, open}
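The scoring and fusion step described above can be sketched as follows. The rule predicates and the combination formula (a weighted sum of the context and social scores, averaged over the 0.1-0.9 grid) are illustrative assumptions; the text does not spell out the exact formula or rule implementation.

```python
# Hedged sketch of context scoring and social/context fusion.
# Rules modeled as boolean predicates over a restaurant record
# (names and values are illustrative, not Surfeous's implementation).
rules = [
    lambda r: r["smoking_area"] == "no",   # non-smoking user
    lambda r: r["price"] == "low",         # student -> low cost
    lambda r: r["space"] == "closed",      # rainy weather -> closed space
]

def context_score(restaurant):
    # Count the rules that hold, then normalize by the number of rules.
    return sum(rule(restaurant) for rule in rules) / len(rules)

def fused_score(social, context):
    # Average w*context + (1-w)*social over w = 0.1, 0.2, ..., 0.9;
    # w = 0.0 would be context-free (tags only), w = 1.0 rules only.
    weights = [w / 10 for w in range(1, 10)]
    return sum(w * context + (1 - w) * social for w in weights) / len(weights)

r = {"smoking_area": "no", "price": "low", "space": "open"}
score = fused_score(0.8, context_score(r))   # 2 of the 3 rules hold
```

Because the weight grid is symmetric around 0.5, this particular fusion reduces to the plain average of the two scores.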
4. FEATURE SELECTION
Feature selection techniques have proven their usefulness in machine learning for improving predictive performance, relieving storage requirements, providing a better understanding of the model, and easing data visualization.

Attributes are relevant with regard to a class if their values can separate one class from the others. For example, if a tourist has to choose a restaurant for breakfast, attributes such as the fax number are likely to be irrelevant, whereas attributes such as cuisine and location are frequently decision criteria. When attributes can be derived from other attributes, they are redundant and can be removed; for instance, the restaurant's address can be computed from the latitude and longitude values.

Feature selection finds the minimum subset of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained with the whole attribute set. There are two main approaches to feature selection [5]. The filter method performs an independent subset evaluation based on general characteristics of the data, such as distance, information gain, dependency, and consistency. The wrapper method evaluates each attribute subset using the learning algorithm itself, which makes its computational cost high. In this paper, the filter method was chosen because of its independence from the learning algorithm and its low computational cost compared with the wrapper method.
To select the relevant context features, the Las Vegas Filter (LVF) algorithm [7] was chosen. LVF generates a random attribute subset S out of the N attributes. If the number of attributes C in S is smaller than the current best (Cbest), the algorithm computes an evaluation measure based on an inconsistency criterion; if the criterion is satisfied, Cbest and Sbest are replaced and S restarts the Solutions list. Otherwise, if C is equal to Cbest and the inconsistency criterion is satisfied, S is appended to Solutions. Max was defined from experimentation (77 × N^5); see Algorithm LVF.

Algorithm LVF (Las Vegas Filter)
Input: maximum number of iterations (Max), dataset (D), number of attributes (N), allowable inconsistency rate (γ)
Output: sets of M features satisfying the inconsistency criterion (Solutions)

    Solutions = ∅
    Cbest = N
    for i = 1 to Max do
        S = randomSet(seed); C = numOfFeatures(S)
        if C < Cbest then
            if InconCheck(S, D) < γ then
                Sbest = S; Cbest = C
                Solutions = S
            end if
        else if C = Cbest and InconCheck(S, D) < γ then
            append(Solutions, S)
        end if
    end for

Inconsistency criterion. The algorithm considers two instances inconsistent if their attribute values are the same but their class labels differ. For the matching instances, regardless of the class labels, the inconsistency count IC_S(A) (Eq. 1) of an instance A ∈ S is the number of instances in S equal to A minus the number of instances of the most frequent class k among them:

    IC_S(A) = S(A) − max_k S_k(A)                      (1)

The inconsistency rate IR of S (Eq. 2) is the sum of the inconsistency counts divided by the total number of instances:

    IR(S) = ( Σ_{A∈S} IC_S(A) ) / |S|                  (2)

To characterize our feature selection problem, Surfeous can be seen as a classifier that predicts whether a restaurant will be highly rated by the user. The contextual attributes are represented as a vector, and the class of each training instance is labeled with the rating value (i.e., 0, 1, 2). The goal is to find the minimum set of contextual attributes that achieves at least the same predictive performance as the whole attribute set. Once the minimum attribute subset has been found, the next step is to analyze the effects of the different contextual attributes.
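Algorithm LVF and the inconsistency rate of Eq. 2 can be sketched in Python as follows. This is a simplified reading of [7]; the subset-sampling details and function names are ours.

```python
import random
from collections import Counter, defaultdict

def inconsistency_rate(data, labels, subset):
    """Eq. 2: group instances that share the same values on `subset`;
    each group contributes its size minus its majority-class count."""
    groups = defaultdict(Counter)
    for row, label in zip(data, labels):
        groups[tuple(row[i] for i in subset)][label] += 1
    count = sum(sum(g.values()) - max(g.values()) for g in groups.values())
    return count / len(data)

def lvf(data, labels, n_attrs, max_iter, gamma, seed=0):
    """Las Vegas Filter sketch: sample random subsets and keep the
    smallest ones whose inconsistency rate stays below gamma."""
    rng = random.Random(seed)
    c_best, solutions = n_attrs, []
    for _ in range(max_iter):
        s = sorted(rng.sample(range(n_attrs), rng.randint(1, n_attrs)))
        if len(s) <= c_best and inconsistency_rate(data, labels, s) < gamma:
            if len(s) < c_best:
                c_best, solutions = len(s), [s]   # smaller subset: restart list
            else:
                solutions.append(s)               # same size: append
    return solutions
```

On a toy dataset whose first attribute fully determines the class, LVF settles on that single attribute.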
5. EXPERIMENTS
The experiments have three purposes: i) to identify the relevant contextual attributes, ii) to show that, with the minimum attribute subset, the predictive performance is at least the same as with the whole attribute set, and iii) to analyze the effects of the relevant contextual attributes.

Data description. The experiments were conducted on data collected over a seven-month period (July 2010 to February 2011). Test users added and rated new and existing restaurants, filling in the attribute values described in Section 3. The data comprise 111 users who contributed information about 237 restaurants and accumulated 1,251 ratings. Possible rating values are 0, 1, and 2, where 0 indicates that the user does not like the restaurant and 2 denotes a high preference. The average is about 11.2 ratings per user; half of the ratings concentrate on the 38 best-rated restaurants. There are numerous restaurants with few ratings (65 restaurants have a single rating), whereas fewer than 5 restaurants gathered more than 15 ratings. Although our sample is not in the range of thousands of instances, it presents the power-law distribution usually found in recommender systems: a small number of items dominates the ratings, whereas many items obtain only a few.

Attribute selection. As input to the feature selection algorithm, we built a set of 5,802 instances. For each rated item, several instances were created by replacing its attribute values with the different possible values. Each instance was a vector consisting of the 23 restaurant attributes described in Table 1, with the rating values given as nominal class labels. The consistency-based subset selector [7] was taken from WEKA [6]; it used a best-first search with a forward approach and 3-fold cross-validation, with the remaining parameters set to their default values. The output was a subset consisting of the following 5 attributes: cuisine, hours, days, accepts, and address; 18 features were removed from the original set (i.e., 78.26% of the whole set).
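The search used above can be approximated by a greedy forward search with a consistency-based merit, as sketched below. This is a simplified stand-in for WEKA's best-first forward search, not its exact implementation, and the merit function is the complement of the inconsistency rate of Eq. 2.

```python
from collections import Counter, defaultdict

def consistency(data, labels, subset):
    # 1 - inconsistency rate: majority-class agreement among instances
    # sharing the same values on `subset` (cf. Eq. 2 in Section 4).
    groups = defaultdict(Counter)
    for row, label in zip(data, labels):
        groups[tuple(row[i] for i in subset)][label] += 1
    bad = sum(sum(g.values()) - max(g.values()) for g in groups.values())
    return 1 - bad / len(data)

def forward_selection(data, labels, n_attrs):
    # Greedy forward search: repeatedly add the attribute that most
    # improves consistency; stop when no candidate improves the merit.
    selected, best = [], consistency(data, labels, [])
    while len(selected) < n_attrs:
        score, attr = max((consistency(data, labels, selected + [a]), a)
                          for a in range(n_attrs) if a not in selected)
        if score <= best:
            break
        best, selected = score, selected + [attr]
    return selected
```

On a toy dataset where the first attribute determines the class, the search returns that attribute alone.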
Tests with Surfeous. The experimental setup is based on a leave-one-out scheme: one instance of each user was randomly taken to build the test subset (111 instances), while the remaining instances became the training subset (1,140 instances). Seven datasets were defined: the subset All consists of the original 23 attributes; B is the minimum attribute subset (5 attributes) computed by the feature selection algorithm (i.e., accepts, cuisine, hours, days, address); and the remaining subsets (C-G) were built by removing one different attribute from B. For each set, 10 executions with normalized data were performed.

A test consists of executing Surfeous with each attribute set and measuring its performance. Evaluation was performed over three types of recommendations: those generated by the system without contextual features (i.e., context-free), those generated by the fusion of the social and contextual aspects, and those produced only by the set of semantic rules. Furthermore, two facets of the system's performance were evaluated: its capacity to retrieve relevant items, and its effectiveness in showing the expected items in the first positions of the recommendation lists.

Figure 1 shows the results for precision. For fusion, the highest value was obtained with subset D (0.0788). Although context-free precision was not outperformed, the difference from the best result is only 0.756%; it is a trade-off between feature reduction and performance. For the semantic rules, subset F (i.e., address, cuisine, hours, and accepts) obtained the best value (0.0228). Although the semantic rules do not show good performance, they contribute personalized features to the social approach with similar precision results. The relevant features of the best subsets are hours, days, accepts, and cuisine.

subset  C-free  Fusion  Rules
All     0.0794  0.0767  0.0186
B       0.0794  0.0771  0.0222
C       0.0794  0.0785  0.0226
D       0.0794  0.0788  0.0178
E       0.0794  0.0758  0.0177
F       0.0794  0.0733  0.0228
G       0.0794  0.0754  0.0206

Figure 1: Precision. Using subset D (hours, days, accepts, address), fusion performed similarly to the context-free model.

Figure 2 shows the results for recall. For fusion, the majority of the subsets outperformed the context-free performance (0.2975). Subset C generated the best recall value both for fusion (0.3246) and for the semantic rules (0.1414). As with precision, the most important attributes are cuisine, hours, and days. However, recall does not take into account the item's position in the recommendation list; consequently, a system that shows the useful items in lower positions could obtain the same recall value as a system that presents the expected items in higher positions.

subset  C-free  Fusion  Rules
All     0.2975  0.3097  0.1025
B       0.2975  0.3086  0.1293
C       0.2975  0.3246  0.1414
D       0.2975  0.3048  0.1053
E       0.2975  0.3124  0.1215
F       0.2975  0.2997  0.1335
G       0.2975  0.2939  0.1174

Figure 2: Recall. Fusion outperformed the context-free model with most of the subsets.
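Under the leave-one-out protocol above, hit-based precision and recall over top-n lists can be sketched as follows. The exact metric definitions used by Surfeous are not given in the text, so the formulas below are one common reading (one held-out relevant item per user).

```python
def precision_recall_at_n(top_lists, held_out, n):
    # Assumed definitions: a hit occurs when a user's single held-out
    # item appears in his/her top-n list;
    # precision = hits / (n * users), recall = hits / users.
    hits = sum(held_out[u] in recs[:n] for u, recs in top_lists.items())
    return hits / (n * len(top_lists)), hits / len(top_lists)

# Hypothetical toy run: two users, one held-out restaurant each.
top_lists = {"u1": ["r3", "r7", "r1"], "u2": ["r2", "r9", "r4"]}
held_out = {"u1": "r7", "u2": "r5"}
precision, recall = precision_recall_at_n(top_lists, held_out, n=3)
```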
Since recall is unable to measure this aspect, Surfeous was also evaluated with NDCG (Normalized Discounted Cumulative Gain). To compute the value, the top-k list is represented as a binary vector of 10 positions: a value of 1 is assigned to the position where the expected item appears, and 0 otherwise. When the expected restaurant appears in the first position, it achieves the optimal score of 1 (i.e., 1/log2(2)). Results were averaged over 23,294 queries.

Figure 3 shows that for all the attribute sets, Surfeous presented the expected items within the top-5 list. For fusion, subset D obtained a value (0.4923) very similar to the context-free performance (0.4994). For the semantic rules, subset G (i.e., cuisine, hours, days, accepts) obtained the best score. Even though the attribute address appears as important in the subsets, Surfeous selects the recommended restaurants based on the user's location; for this reason, address is an implicit feature that can be discarded.

subset  C-free  Fusion  Rules
All     0.4994  0.4537  0.3614
B       0.4994  0.4523  0.3428
C       0.4994  0.4502  0.3238
D       0.4994  0.4923  0.3518
E       0.4994  0.4336  0.3249
F       0.4994  0.4598  0.3522
G       0.4994  0.4526  0.3624

Figure 3: NDCG. For fusion, the best NDCG score was achieved when using subset D (hours, days, accepts, address).

To sum up, the best subset for precision and NDCG is D (i.e., hours, days, accepts), whereas C (i.e., cuisine, hours, days) is the best only for recall. The results suggest that the restaurant's opening times and its accepted types of payment are likely to be the most important factors in making a choice. The majority of the test users are students with irregular meal hours; thus, they prefer a restaurant with flexible opening hours and several types of payment, even if it does not offer their favorite food.

The results for recall show that a context-free approach can be improved by the use of context features. Although the performance achieved by the semantic rules is low, they provide the social approach with features that enrich the decision process; a deeper analysis of the rule set is needed to determine the reason for their weak performance.

The identification of relevant contextual features facilitates a better understanding of users' decision criteria. This knowledge is potentially useful for modeling user/item profiles with meaningful information, for designing efficient user interfaces, and for improving services based on people's preferences.
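The binary NDCG used above (a single expected item in a top-10 list) reduces to a simple position discount, sketched here under that single-relevant-item assumption:

```python
import math

def ndcg_binary(position, k=10):
    # Binary relevance vector with a single 1 at `position` (1-based);
    # the ideal ranking puts the expected item first, so
    # IDCG = 1/log2(2) = 1 and the score at position 1 is exactly 1.
    if position is None or position > k:
        return 0.0
    return (1.0 / math.log2(position + 1)) / (1.0 / math.log2(2))
```

Averaging this score over all queries gives the per-subset NDCG values reported in Figure 3.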
6. CONCLUSIONS AND FUTURE WORK
In this work, a feature selection approach was applied to a recommender system that fuses social annotations and contextual models to recommend restaurants. The feature selection problem was characterized as a classification task: a subset evaluator was applied to the complete feature set, and then the effects of the relevant features on the system's performance were evaluated. It was shown that using the reduced subset of attributes did not degrade the system's performance. Feature selection techniques can thus contribute to improving the efficiency of a contextual recommender system.

As a future research direction, we want to extend the approach to other application domains in order to deepen the understanding of contextual information in recommender systems. Turning the feature selection approach into a context-oriented technique is an interesting open issue that we would like to pursue. Taking context variables into account in an evaluation methodology is also part of our future work.

The authors thank the reviewers for their useful comments. This research was sponsored by CONACYT under grant 290586.

7. REFERENCES
[1] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Transactions on Information Systems, 23:103-145, January 2005.
[2] L. Baltrunas and X. Amatriain. Towards time-dependant recommendation based on implicit feedback. In RecSys'09: Workshop on Context-Aware Recommender Systems (CARS-2009), 2009.
[3] L. Baltrunas and F. Ricci. Locally adaptive neighborhood selection for collaborative filtering recommendations. In Proceedings of the 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, pages 22-31, Berlin, Heidelberg, 2008. Springer-Verlag.
[4] A. Bellogín, I. Cantador, P. Castells, and A. Ortigosa. Discovering relevant preferences in a personalised recommender system using machine learning techniques. In Proceedings of the 8th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Preference Learning Workshop, pages 82-96, 2008.
[5] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157-1182, March 2003.
[6] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: an update. SIGKDD Explorations Newsletter, 11:10-18, November 2009.
[7] H. Liu and R. Setiono. A probabilistic approach to feature selection - a filter solution. In Proceedings of the 13th International Conference on Machine Learning, pages 319-327, 1996.
[8] S. Lombardi, S. Anand, and M. Gorgoglione. Context and customer behavior in recommendation. In RecSys'09: Workshop on Context-Aware Recommender Systems (CARS-2009), 2009.
[9] T. Mahmood and F. Ricci. Learning and adaptivity in interactive recommender systems. In Proceedings of the Ninth International Conference on Electronic Commerce, pages 75-84, New York, USA, 2007. ACM.
[10] B. Mobasher. Contextual user modeling for recommendation. In RecSys'10: Workshop on Context-Aware Recommender Systems (CARS-2010), Barcelona, Spain, 2010.
[11] R. Ponce-Medellín, J. G. González-Serna, R. Vargas, and L. Ruiz. Technology integration around the geographic information: A state of the art. International Journal of Computer Science Issues, 5:17-26, 2009.
[12] K. H. L. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In Proceedings of the 2008 ACM Symposium on Applied Computing, pages 1995-1999, New York, USA, 2008.
[13] D. Vallet, M. Fernández, P. Castells, P. Mylonas, and Y. Avrithis. A contextual personalization approach based on ontological knowledge. In 17th European Conference on Artificial Intelligence, Contexts and Ontologies: Theory, Practice and Applications Workshop, Riva del Garda, Italy, August 2006.