 Effects of relevant contextual features in the performance
            of a restaurant recommender system

                  Blanca Vargas-Govea, Gabriel González-Serna, Rafael Ponce-Medellín
                                   Centro Nacional de Investigación y Desarrollo Tecnológico
                            Interior Internado Palmira S/N, Col. Palmira, Cuernavaca Mor., México

ABSTRACT
Contextual information in recommender systems aims to improve user
satisfaction. Usually, it is assumed that the complete set of contextual
features is significant. However, identifying relevant context variables is
important, as increasing their number may lead the system to dimensionality
problems. In this paper, relevant contextual attributes are identified by
using a simple feature selection approach. Once the features have been
identified, their impact on different performance aspects of the system is
shown. This approach was applied to a semantic-based restaurant recommender
system. Results show that feature selection techniques can be applied
successfully to identify relevant contextual data. These results are
important to model contextual user profiles with meaningful information, to
reduce dimensionality, and to analyze users' decision criteria.

Keywords
contextual information, feature selection

1. INTRODUCTION
   When we want to discover new places to eat, it is common to ask friends
for suggestions. Tourist and gastronomic guides are other alternatives to
find good restaurants. However, friends are likely to know our taste,
current location and favorite environment. Consequently, their suggestions
would be more precise than those provided by the guides. Nowadays,
recommender systems are common online services that help users to cope with
information overload by retrieving useful items according to their
preferences. Collaborative Filtering (CF) is a successful technique that
automates the social recommendation scheme. CF predicts a user's preferences
considering the opinions of users with similar interests, whereas
content-based recommendation systems build a model only from the user's
favorite items. Common recommendation approaches only take into account
user-item-rating data, ignoring contextual information. The drawback of this
scheme is the lack of personalization; thus, a tourist visiting Africa could
receive a recommendation of a restaurant located in Brazil. Information such
as time, location, or weather can generate tailored recommendations
according to the user's current situation.
   Contextual information has become a key factor to improve user
satisfaction. However, not all of the contextual information given to the
system is relevant. Moreover, if the system requires explicit information,
asking for a huge amount of data can be intrusive. Conversely, a lack of
information can lead the system to generate poor recommendations. A careful
selection of relevant information could improve the efficiency and
predictive accuracy of recommendation algorithms. To deal with this problem,
feature selection techniques have been used in different domains but have
not been widely exploited in contextual recommender systems. Usually, the
contextual information provided to the system is chosen by experience and is
assumed to be important.
   This paper presents an analysis of the effects of contextual attributes
on the predictive ability of a recommender system. The study focuses on
Surfeous, a contextual recommender system prototype based on CF and semantic
models. Relevant variables were identified by applying a simple feature
selection approach. Once the meaningful variables had been identified, the
effect of each relevant contextual variable on the predictive performance of
the system was analyzed. The results of this research have an impact on
three aspects: i) identification of relevant contextual attributes that
users take into account when selecting a restaurant, ii) reduction of
dimensionality, and iii) new insights about the effects of contextual
variables on the predictive performance of recommender systems.
   This paper is structured as follows. Section 2 presents an overview of
relevant work about the effects of context and feature selection techniques
applied to recommender systems. Section 3 describes Surfeous and its
contextual features. Section 4 presents general concepts about feature
selection techniques and describes the approach applied to our analysis. The
experiments and results are presented in Section 5. Finally, conclusions and
future research directions are given in Section 6.

2. RELATED WORK
   Our literature review focuses on two topics: the importance of context,
and the use of feature selection techniques in recommender systems. As an
emerging and under-explored research area, context-aware recommender systems
are receiving increasing attention [10]. Although the effect of context has
not been widely studied, related work reveals that contextual information is
important [1]. For example,
the statistical dependence between pairs of context variables. Next, the
system was evaluated over each cluster to observe changes in
recommendations. Results showed that context variables improve predictive
accuracy. In [13], the benefits of contextual information are shown, both in
precision and using an ontology to exploit semantic concepts. Another
interesting strategy consists of splitting the user profile into several
sub-profiles [2]. Each sub-profile represents the user in a time period of
the day. Using sub-profiles, the system could recommend music more precisely
because it took time into account. It was shown that accuracy could increase
by making recommendations using sub-profiles instead of a single profile.
   As a consequence of the integration of contextual information,
adaptability and dimensionality reduction requirements on recommender
systems have increased. To deal with these problems, machine learning and
data mining techniques have proved to be effective. However, feature
selection and machine learning techniques have mainly been applied to
content-based systems. For example, [9] presents an interactive recommender
system based on reinforcement learning. When the system returns a huge
number of results, a feature selection algorithm is applied to reduce the
results list. Another approach to feature selection is presented in [3],
where similarity between users is calculated taking into account the subset
of common items that best describes the user preferences. The results show
that predictive performance can be improved by a careful selection of item
ratings. The recommender system is based on CF and does not include
contextual information. In [4] the authors tested their methodology with a
recommender system based on Semantic Web technologies and collaborative
filtering. Their work focuses on the assessment of relevant model features
using decision trees and feature selection techniques. However, the system
was not evaluated with the newly learned models.
   In contrast to the reviewed works, in this paper our primary goal is to
identify relevant attributes and then evaluate their effects on the system's
predictive performance. A semantic recommender system that fuses social and
contextual aspects was used as a test bed.

3. SURFEOUS: WHERE TO EAT?
   Surfeous is a recommender system prototype that uses social annotations
(e.g., tags) and contextual models to find restaurants that best suit the
user preferences. The recommendations are shown as an ordered list (top-n).
   In regard to the social aspect, Surfeous uses an item-based collaborative
filtering approach. Its prediction process is based on the extended
technique of Tso-Sutter [12], which includes tags. In contrast to the common
two-dimensional item-attribute relationship, tags are represented as a
three-dimensional user-item-tag relation. These three dimensions are
arranged as three two-dimensional problems: user-tag, item-tag and
user-item, by augmenting the standard user-item matrix horizontally and
vertically with user and item tags, respectively. Thus, user tags are
considered as items, and item tags are viewed as users in the user-item
matrix. After the extension, user-based and item-based CF have to be
recomputed with the new matrix.
   In this paper, Semantic Web technologies are exploited to manage the
contextual information by using ontologies to model the user and restaurant
profiles [11]. Surfeous specifies three context models, each one with the
set of attributes shown in Table 1. The models are described as follows:
  1. Service model. It describes the restaurant characteristics. The model
     has 23 attributes; 6 of them (cuisine, alcohol, smoking, dress, accepts
     (type of payment) and parking) were defined according to
     http://chefmoz.org, an online dining guide. Values are selected by the
     user from several possible options shown by a GUI when he/she rates a
     new restaurant.
  2. User model. It describes the user profile. The model has 21
     attributes; 19 of them are provided by the user when he/she signs into
     the system for the first time or modifies his/her personal information.
  3. Environment model. It refers to the time and weather at the user's
     location; their values are acquired from Web services. This information
     restricts the search to available restaurants that have appropriate
     installations. In this paper, Surfeous considers a 3 km radius from the
     user's location to select the restaurants.

                 Table 1: Context attributes
  Service model (23 attributes)
  User model (21 attributes)
     latitude, longitude, smoking, alcohol, dress, ambiance, age,
     transportation, marital-status, children, interests, personality,
     religion, occupation, favorite-color, weight, height, budget, accepts,
     accessibility, cuisine
  Environment model (2 attributes)
     time, weather

   To generate the recommendations, Surfeous gets the user location and
searches for the nearest restaurants in a spatial database. With this
information, an ontology is created at execution time. The nearest
restaurants become the instances that populate the ontology. Then, to match
the context models, a set of semantic rules written in the Semantic Web Rule
Language (SWRL) is applied. This set of rules was defined based on a market
study of consumer behavior.
   From the attributes of the restaurant profile (i.e., service model) a
relation is created to determine if its value matches the corresponding
value in the user profile. Based on a spatio-temporal attribute, an
antecedent and a consequent are created to describe the situation. For
example, if the user does not smoke, the recommended restaurants need a
no-smoking area. The rule is as follows:

smokingArea(R, no) ∧ restaurant(R) → noSmoking(R, true)

   Some examples of the rules and relations are shown in Table 2. Semantic
Web rules use the ontology and infer the places that fulfill the premises.
Results are ranked based on the number of context rules that hold for each
user query: for each different restaurant a score is computed by counting
and normalizing the rules that hold. The social results are added to this
score considering weights between 0.1 and 0.9 in steps of 0.1, where 0.0
stands for context-free (i.e., only tags) and 1.0 is 100% context (i.e.,
only rules). In this paper, fusion is the average of the results over the
weights between 0.1 and 0.9. Our analysis is focused on the service model to
explore what users look for when selecting a restaurant.
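   The ranking and fusion step described above could be sketched as follows.
This is only an illustrative sketch: the function names are hypothetical, and
the convex combination of the social and context scores is an assumption
consistent with the stated end points (0.0 is context-free, 1.0 is rules
only), not necessarily the formula implemented in Surfeous.

```python
from typing import Dict, List

# Weights 0.1 .. 0.9 in steps of 0.1, as described in the text.
WEIGHTS = [round(0.1 * k, 1) for k in range(1, 10)]


def context_score(rules_held: int, total_rules: int) -> float:
    """Fraction of context rules that hold (assumed normalization)."""
    if total_rules <= 0:
        return 0.0
    return rules_held / total_rules


def fused_score(social: float, context: float, w: float) -> float:
    """Assumed convex combination: w = 0.0 is context-free, w = 1.0 is rules only."""
    return (1.0 - w) * social + w * context


def rank(social: Dict[str, float], context: Dict[str, float], w: float) -> List[str]:
    """Return restaurant ids ordered by decreasing fused score (ties by id)."""
    ids = set(social) | set(context)
    scored = {r: fused_score(social.get(r, 0.0), context.get(r, 0.0), w) for r in ids}
    return sorted(scored, key=lambda r: (-scored[r], r))
```

Under this reading, a fusion result would be obtained by evaluating the
ranking produced for each weight in WEIGHTS and averaging the resulting
metric values.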
          Table 2: Some rules and relations
  user - service profile
     person(X) ∧ hasOccupation(X, student) ∧
     restaurant(R) ∧ hasCost(R, low) → select(X, R)
  user - environment profile
     person(X) ∧ isJapanese(X, true) ∧
     queryPlace(X, USA) ∧ restaurant(R) ∧
     isVeryClose(R, true) → select(X, R)
  environment - service profile
     currentWeather(today, rainy) ∧ restaurant(R) ∧
     space(R, closed) → select(R)

     likesFood(X, Y)       X: person, Y: cuisine-type
     currentWeather(X, Y)  X: query, Y: weather
     space(X, Y)           X: restaurant, Y: {closed, open}

4. FEATURE SELECTION

 Algorithm LVF (Las Vegas Filter)
 Input: maximum number of iterations (Max), dataset (D),
        number of attributes (N), allowable inconsistency rate (γ)
 Output: sets of M features satisfying the inconsistency
         criterion (Solutions)
 Solutions = ∅
 Cbest = N
 for i = 1 to Max do
    S = randomSet(seed); C = numOfFeatures(S)
    if C < Cbest then
       if InconCheck(S, D) < γ then
          Sbest = S; Cbest = C
          Solutions = S
       end if
    else if C = Cbest and InconCheck(S, D) < γ then
       append(Solutions, S)
    end if
 end for
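   The LVF pseudocode, together with the inconsistency rate defined in this
section, can be sketched in Python as follows. This is a minimal sketch under
stated assumptions, not the implementation used in the experiments: the names
inconsistency_rate and lvf are hypothetical, inconsistency counts are summed
once per group of matching instances, and random subsets are drawn by first
picking a size uniformly and then sampling that many attribute indices.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple

Instance = Tuple[Sequence[Hashable], Hashable]  # (attribute values, class label)


def inconsistency_rate(data: Sequence[Instance], subset: Sequence[int]) -> float:
    """Sum of inconsistency counts over matching groups, divided by |data|."""
    if not data:
        return 0.0
    groups = defaultdict(Counter)
    for values, label in data:
        key = tuple(values[i] for i in subset)  # projection onto the subset
        groups[key][label] += 1
    # Per group: group size minus the count of its most frequent class.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(data)


def lvf(data: Sequence[Instance], n_attrs: int, max_iter: int,
        gamma: float, seed: int = 0) -> List[Tuple[int, ...]]:
    """Return the smallest attribute subsets whose inconsistency rate is below gamma."""
    rng = random.Random(seed)
    c_best = n_attrs
    solutions: List[Tuple[int, ...]] = []
    for _ in range(max_iter):
        size = rng.randint(1, n_attrs)  # assumed sampling scheme
        subset = tuple(sorted(rng.sample(range(n_attrs), size)))
        c = len(subset)
        if c < c_best:
            if inconsistency_rate(data, subset) < gamma:
                c_best = c
                solutions = [subset]  # a smaller subset replaces earlier solutions
        elif c == c_best and inconsistency_rate(data, subset) < gamma:
            if subset not in solutions:
                solutions.append(subset)
    return solutions
```

For a toy dataset in which only the first attribute determines the class,
lvf is expected to return that single attribute as the only solution.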
   Feature selection techniques have proven their usefulness in machine
learning to improve predictive performance, to relieve storage requirements,
to provide a better model understanding, and to ease data visualization.
   Attributes are relevant with regard to a class if their values can
separate one class from the others. For example, if a tourist has to choose
a restaurant to have breakfast, attributes such as the fax number are likely
to be irrelevant. Conversely, attributes such as cuisine and location are
frequently a decision criterion. When attributes can be derived from other
attributes, they are redundant and can be removed. For instance, the
restaurant's address can be calculated from the latitude and longitude
values.
   Feature selection finds the minimum subset of attributes such that the
resulting probability distribution of the data classes is as close as
possible to the original distribution obtained using the whole attribute
set. There are two main methods [5] for feature selection. One is the filter
method, which makes an independent subset evaluation considering general
characteristics of the data such as distance, information gain, dependency
and consistency. The second one is the wrapper method, which evaluates the
attribute subset using the learning algorithm; its computational cost is
high. In this paper, the filter method was chosen because of its
independence from the learning algorithm and its low computational cost in
contrast to the wrapper method.
   To select the relevant context features, the Las Vegas Filter (LVF)
algorithm [7] was chosen. LVF generates a random subset S of the N
attributes. If the number of attributes C is less than the best (Cbest),
then the algorithm computes its evaluation measure based on an inconsistency
criterion; if the inconsistency criterion is satisfied, Cbest and Sbest are
updated and S replaces the Solutions list. Otherwise, if the number of
attributes C is equal to the best (Cbest) and the inconsistency criterion is
satisfied, S is added to Solutions. Max was defined from experimentation
(77 × N^5) (see Algorithm LVF).
   Inconsistency criterion. The algorithm considers that two instances are
inconsistent if their attributes have the same values except for their class
labels. For the matching instances, regardless of the class labels, the
inconsistency count IC (Eq. 1) of an instance A ∈ S is the number of
instances in S equal to A minus the number of instances of the most frequent
class (k) with the same attribute values.

      IC_S(A) = S(A) - \max_k S_k(A)                        (1)

   The inconsistency rate IR of S (Eq. 2) is the sum of the inconsistency
counts divided by the total number of instances.

      IR(S) = \frac{\sum_{A \in S} IC_S(A)}{|S|}             (2)

   To characterize our feature selection problem, Surfeous can be seen as a
classifier that predicts if a restaurant will be highly rated by the user.
Contextual attributes are represented as a vector, where the class of the
training instances is labeled with the rating values (i.e., 0, 1, 2). The
goal is to find the minimum set of contextual attributes that obtains at
least the same predictive performance as the whole attribute set. Once the
minimum attribute subset has been found, the next step is to analyze the
effects of different contextual attributes.

5. EXPERIMENTS
   The experiments have three purposes: i) to identify relevant contextual
attributes, ii) to show that with the minimum attribute subset, the
predictive performance is at least the same as with the whole attribute set,
and iii) to analyze the effects of relevant contextual attributes.
   Data description. The experiments were conducted using the data collected
during a seven-month period (i.e., from July 2010 to February 2011). Test
users added and rated new and existing restaurants; they filled in the
attribute values described in Section 3. The data comprise 111 users who
contributed information about 237 restaurants and accumulated 1,251 ratings.
Possible rating values are 0, 1, and 2, where 0 indicates that the user does
not like the restaurant, and 2 denotes a high preference. The average is
about 11.2 ratings per user; half of the ratings concentrate on the 38 best
rated restaurants. There are numerous restaurants with few ratings (i.e., 65
restaurants have 1 rating), whereas fewer than 5 restaurants gathered more
than 15 ratings. Although our sample is not in the range of thousands of
instances, it presents a power law distribution usually found in recommender
systems: a small number
of items dominates the ratings whereas many items obtain only a few.
   Attribute selection. As input to the feature selection algorithm we built
a set of 5,802 instances. For each rated item, several instances were
created by replacing their attribute values with different possible values.
Each instance was a vector consisting of the 23 restaurant attributes
described in Table 1, and rating values were given as nominal class labels.
The consistency selector algorithm [7] was taken from WEKA [6]; it involved
a best-first search with a forward approach and 3-fold cross-validation. The
remaining parameters were set to their default values. The output was a
subset consisting of the following 5 attributes: cuisine, hours, days,
accepts and address; 18 features were removed from the original set (i.e.,
78.26% of the whole set).
   Tests with Surfeous. The experimental setup is based on a leave-one-out
scheme: an instance of each user was randomly taken to build the test subset
(111 instances) while the remaining instances became the training subset
(1,140 instances). Seven different datasets were defined: the subset All
consists of the original 23 attributes. B is the minimum attribute subset
(5) calculated with the feature selection algorithm (i.e., accepts, cuisine,
hours, days, address). The remaining subsets (C-G) were built by removing
one different attribute from B. For each set, 10 executions with normalized
data were performed.
   A test consists of executing Surfeous with each attribute set and
measuring its performance. Evaluation was performed over three types of
recommendations: those generated by the system without contextual features
(i.e., context-free), those generated by the fusion of social and contextual
aspects, and those produced only by the set of semantic rules. Furthermore,
two facets of the system's performance were evaluated: its capacity to
retrieve relevant items and its effectiveness in showing the expected items
in the first positions of the recommendation lists.
   Figure 1 shows the results for precision. For fusion, the highest value
was obtained with the subset D (0.0788). Although context-free precision was
not outperformed, in comparison with the best result there is only a
difference of 0.756%. It is a trade-off between feature reduction and
performance. For semantic rules, the subset F (i.e., address, cuisine, hours
and accepts) got the best value (0.0228). Although semantic rules do not
show a good performance, they contribute personalized features to the social
approach with similar precision results. The relevant features of the best
subsets are: hours, days, accepts and cuisine.

[Figure 1 plot: precision per subset (All, B-G); series: context.free, fusion]

      subset    C-free    Fusion    Rules
      All       0.0794    0.0767    0.0186
      B         0.0794    0.0771    0.0222
      C         0.0794    0.0785    0.0226
      D         0.0794    0.0788    0.0178
      E         0.0794    0.0758    0.0177
      F         0.0794    0.0733    0.0228
      G         0.0794    0.0754    0.0206

Figure 1: Precision. Using the subset D (hours, days, accepts, address),
fusion performed similarly to the context-free model.

   Figure 2 shows the results for recall. For fusion, the majority of the
subsets outperformed the context-free performance (0.2975). Subset C
generated the best recall value both for fusion (0.3246) and for semantic
rules (0.1414). As with precision, the most important attributes are:
cuisine, hours and days. However, recall does not take into account the
item's position in the recommendation list. Consequently, a system that
shows the useful items in lower positions could obtain the same recall value
as another system that presents the expected items in higher positions.
Since recall is unable to measure this aspect, Surfeous was evaluated with
NDCG (Normalized Discounted Cumulative Gain). To compute the value, the
top-k list is represented as a binary vector of 10 positions. A value of 1
is assigned to the position where the expected item appears; otherwise its
value is 0. When the expected restaurant appears in the first position, it
achieves the optimal score of 1 (i.e., 1/\log_2 2). Results were averaged
over 23,294 queries.

[Figure 2 plot: recall per subset (All, B-G); series: fusion, context]

      subset    C-free    Fusion    Rules
      All       0.2975    0.3097    0.1025
      B         0.2975    0.3086    0.1293
      C         0.2975    0.3246    0.1414
      D         0.2975    0.3048    0.1053
      E         0.2975    0.3124    0.1215
      F         0.2975    0.2997    0.1335
      G         0.2975    0.2939    0.1174

Figure 2: Recall. Fusion outperformed the context-free model with most of
the subsets.

   Figure 3 shows that for all the attribute sets, Surfeous presented the
expected items in the top-5 list. For fusion, the subset D obtained a value
(0.4923) very similar to the context-free performance (0.4994). For semantic
rules, the subset G (i.e., cuisine, hours, days, accepts) got the best
score. Even though the attribute address appears as important in the
subsets, Surfeous selects the recommended restaurants based on the user's
location. For this reason, address is an implicit feature that can be
discarded.
   To sum up, the best subset for precision and NDCG is D (i.e., hours,
days, accepts), whereas C (i.e., cuisine, hours, days) is the best only for
recall. Results suggest that the restaurant opening times and its type of
payment are likely
